Research
Applied AI research combining benchmarks, inference optimization, and open-source publishing.
Research Philosophy
Standard benchmarks measure model capability in isolation. We measure what happens when those models run against real infrastructure, real data, and real cost constraints. All research is published openly. All tools are open source.
Industry Benchmarking
Domain-specific evaluations for autonomous agents across automotive, legacy code, finance, healthcare, and other domains.
Inference Optimization
Research on MoE models, weight optimization, and techniques for efficient AI deployment at scale.
AI Safety & Harness
Building the right harness and designing benchmarks that measure AI safety in production environments.
Papers & Reports
Publications
LegacyCodeBench: A Benchmark for Evaluating AI Agents on Real-World Legacy Modernization
Kalmantic Labs
We introduce LegacyCodeBench, a comprehensive benchmark for evaluating how well AI systems understand and modernize legacy code in COBOL, Fortran, and enterprise Java systems under real-world production constraints.
PeakWeights: Weight Optimization Techniques for Efficient Model Deployment
Kalmantic Labs
Weight optimization techniques for production model deployment. Quantization, pruning, and compression methods that maintain output quality at lower inference cost.
Inference Optimization and MoE Models for Production Systems
Kalmantic Labs
Deep research into inference optimization strategies, Mixture of Experts model architectures, and their practical implications for AI safety, AI harness design, and autonomous agent deployment.
Beyond Benchmarks: Measuring Real-World Impact of Autonomous Agents
Kalmantic Labs
A framework for collecting and analyzing real-world feedback on how autonomous agents impact humans, workflows, and organizational structures across industries.