Mahendran Vasagam
How AI Learns to Route Queries: ML-Driven Resource Prediction for Distributed Data Platforms
Abstract:
Organizations operating distributed query engines across multiple compute clusters face a persistent challenge: routing queries to appropriately sized infrastructure. Current approaches such as round-robin, hash-based routing, and manual rules waste resources when simple queries land on oversized clusters and cause failures when complex queries overwhelm undersized ones. Production environments commonly experience 30-50% infrastructure waste and 15-25% query failure rates from these mismatches.
This talk presents a machine learning-based system that predicts query resource requirements before execution by vectorizing query execution plans and performing similarity search against historical performance data. The system employs three complementary approaches: feature-based k-NN similarity search for fast, interpretable routing; gradient boosting for balanced production deployment; and graph neural networks that capture execution plan topology for maximum accuracy. An active learning feedback loop enables the system to adapt continuously to evolving workload patterns.
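To make the first of these approaches concrete, the following is a minimal sketch of feature-based k-NN routing, not the system presented in the talk. It assumes a query plan has already been summarized into a few numeric features; the field names (num_joins, num_scans, estimated_rows), the peak_memory_gb target, the cluster names and capacities, and the sample history are all hypothetical, chosen only for illustration.

```python
import math

def plan_features(plan):
    """Turn a (hypothetical) plan summary into a numeric feature vector."""
    return [
        float(plan["num_joins"]),
        float(plan["num_scans"]),
        math.log10(1 + plan["estimated_rows"]),  # compress the row-count range
    ]

def predict_memory_gb(plan, history, k=3):
    """Average the observed peak memory of the k most similar past plans."""
    target = plan_features(plan)
    nearest = sorted(
        history,
        key=lambda rec: math.dist(target, plan_features(rec["plan"])),
    )[:k]
    return sum(rec["peak_memory_gb"] for rec in nearest) / len(nearest)

def route(plan, history, clusters, k=3):
    """Pick the smallest cluster whose capacity covers the predicted need."""
    needed = predict_memory_gb(plan, history, k)
    for name, capacity_gb in sorted(clusters.items(), key=lambda kv: kv[1]):
        if capacity_gb >= needed:
            return name
    return max(clusters, key=clusters.get)  # nothing fits: use the largest

# Illustrative data only; a real deployment would use logged executions.
history = [
    {"plan": {"num_joins": 0, "num_scans": 1, "estimated_rows": 1_000},
     "peak_memory_gb": 2.0},
    {"plan": {"num_joins": 1, "num_scans": 2, "estimated_rows": 10_000},
     "peak_memory_gb": 4.0},
    {"plan": {"num_joins": 5, "num_scans": 6, "estimated_rows": 1_000_000_000},
     "peak_memory_gb": 200.0},
    {"plan": {"num_joins": 6, "num_scans": 7, "estimated_rows": 5_000_000_000},
     "peak_memory_gb": 300.0},
]
clusters = {"small": 16, "medium": 128, "large": 512}  # capacity in GB

simple_query = {"num_joins": 0, "num_scans": 1, "estimated_rows": 2_000}
heavy_query = {"num_joins": 5, "num_scans": 7, "estimated_rows": 2_000_000_000}

print(route(simple_query, history, clusters, k=2))
print(route(heavy_query, history, clusters, k=2))
```

Because each prediction is just an average over named historical neighbors, a routing decision can be explained by listing the past queries it was based on, which is one reason a k-NN approach is described as interpretable.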
Drawing from experience building petabyte-scale data infrastructure in enterprise environments, the session covers real-world architecture decisions, production deployment considerations, and measurable outcomes including 78% routing accuracy, 80% reduction in query failures, and 40% infrastructure cost savings. Attendees will gain practical insights into applying ML techniques to distributed systems optimization, applicable to any organization managing multi-cluster query workloads.
Profile:
Mahendran Vasagam is a Staff Software Engineer at Slack, specializing in data engineering, with hands-on experience building and operating distributed query engines, compute platforms, and orchestration systems at petabyte scale. His production experience managing multi-cluster environments and optimizing infrastructure efficiency motivated his research into ML-driven query routing.
With 20+ years of experience spanning data platform engineering, distributed systems, cloud security, and network security, Mahendran has held senior technical roles at multiple technology companies, including two startups acquired by major enterprises. As a founding engineer at Skyhigh Networks (acquired by McAfee), he helped build the platform that defined the Cloud Access Security Broker (CASB) market, scaling to 500+ enterprise customers and processing 10 billion security events daily. He also served as Principal Engineer at Armorblox (acquired by Cisco), building AI-driven email security infrastructure. Mahendran is an IEEE Senior Member and serves on the IEEE Consumer Technology Society Technical Committee for Security and Privacy in Consumer Technology. He also serves as a peer reviewer and Technical Program Committee (TPC) member for international conferences and journals, including ACM SIGCHI, ACM DIS, ACM PEARC, ICLR, and IEEE ICCE-Taiwan.

