The post-conference proceedings of CML 2026 will be published in the Scopus-indexed Springer book series "Lecture Notes in Networks and Systems".

Reeshav Kumar

Post-Training Optimization Techniques for AI Models

Abstract:

Post-training optimization is essential for turning trained AI models into deployable production systems. This talk introduces a strata model that integrates model-, runtime-, and system-level strategies to help practitioners build high-performance systems under resource constraints. Model-level techniques, including post-training quantization (PTQ), sparsity pruning, low-rank adaptation (LoRA), and knowledge distillation, improve parameter efficiency. Compiler- and runtime-level optimizations, such as operator fusion, memory layout restructuring, constant folding, kernel auto-tuning, and compiler re-architecture, enhance computational performance. System-level methods, including dynamic batching, KV-cache reuse, paged attention, request coalescing, and model routing, tailor deployment to specific environments and enable efficient resource utilization. The talk also presents a structured framework for evaluating trade-offs among quality, latency, throughput, memory, energy use, and cost. This approach enables practitioners to make informed optimization decisions that improve efficiency and reliability across applications and serving infrastructure.

Profile:

Reeshav Kumar is a Senior Product Manager at Meta Platforms, leading product development for Instagram's business analytics and next-generation Mixed Reality devices. With over a decade of experience spanning hardware, software, and AI, he shapes how people connect and businesses thrive in the digital ecosystem. Before Meta, Reeshav spent four years at Apple as a Product Manager and Engineering Project Manager, spearheading computational photography and neural engine capabilities for iPhones and Apple Silicon. His AI/ML optimization work dramatically improved latency and power consumption, while his automation initiatives delivered over $10 million in cost savings. His technical foundation includes engineering roles at Apple, Barefoot Networks, and Oracle, where he designed components for 517+ million devices, developed programmable switches for AI data center workloads, and created processors generating over $1 billion in revenue. Reeshav holds an MBA from UC Berkeley's Haas School of Business and an MS in Electrical Engineering from Texas A&M University, where he also taught. He earned his BE with Honours from BITS Pilani, India, and has published research on IC design and network architectures. Beyond work, Reeshav supported educational initiatives for 120+ underprivileged children in rural India and pursues underwater exploration as a PADI Master Scuba Diver.