Kalmantic Labs

Research

Applied AI research combining benchmarks, inference optimization, and open-source publishing.

Research Philosophy

Standard benchmarks measure model capability in isolation. We measure what happens when those models run against real infrastructure, real data, and real cost constraints. All research is published openly. All tools are open source.

Industry Benchmarking

Domain-specific evaluations of autonomous agents in automotive, legacy code, finance, healthcare, and other industries.

Inference Optimization

Research on MoE models, weight optimization, and techniques for efficient AI deployment at scale.

AI Safety & Harness

Building the right harness and designing benchmarks that measure AI safety in production environments.

Papers & Reports

Publications

2025 Paper

LegacyCodeBench: A Benchmark for Evaluating AI Agents on Real-World Legacy Modernization

Kalmantic Labs

We introduce LegacyCodeBench, a comprehensive benchmark for evaluating how well AI systems understand and modernize legacy code across COBOL, Fortran, and enterprise Java systems with real-world production constraints.

2025 Paper

PeakWeights: Weight Optimization Techniques for Efficient Model Deployment

Kalmantic Labs

Weight optimization techniques for production model deployment, covering quantization, pruning, and compression methods that maintain output quality at lower inference cost.

2026 Research

Inference Optimization and MoE Models for Production Systems

Kalmantic Labs

Deep research into inference optimization strategies, Mixture of Experts model architectures, and their practical implications for AI safety, AI harness design, and autonomous agent deployment.

2026 Research

Beyond Benchmarks: Measuring Real-World Impact of Autonomous Agents

Kalmantic Labs

A framework for collecting and analyzing real-world feedback on how autonomous agents impact humans, workflows, and organizational structures across industries.
