AIR2025

Mr. Syamprasad Guda

Intelligent Autoscaling for AI Workloads: A Cost-Efficient, Multi-Cloud Approach with KEDA and Event-Driven Microservices

Abstract:

As AI workloads become more dynamic and resource-intensive, organizations must keep them scalable, resilient, and cost-efficient across cloud environments. This session presents a real-world architecture that combines Kubernetes Event-driven Autoscaling (KEDA) with serverless Java microservices deployed in an active-active multi-cloud setup across AWS, Azure, and GCP.

We’ll explore how event-driven scaling, triggered by real workload signals such as Kafka consumer lag, queue depth, and database backlog, can outperform traditional CPU-based autoscaling in both responsiveness and cost. In experimental validation, the architecture achieved a 45% cost reduction and 99.99% uptime under peak conditions, which makes it particularly well suited to AI applications such as real-time inference, NLP pipelines, and distributed analytics.
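
To make the contrast with CPU-based autoscaling concrete, here is a minimal sketch of lag-based sizing: the replica count is derived directly from the backlog, so a burst of queued messages raises the target before CPU utilization has had time to climb. It is a simplified model, not the session's implementation. The class name `LagBasedScaling`, the method `desiredReplicas`, and all numbers are hypothetical, and the formula (total lag divided by a per-replica lag threshold, rounded up and clamped) only approximates what KEDA and the Kubernetes Horizontal Pod Autoscaler compute. Real behavior also depends on polling intervals, stabilization windows, and the topic's partition count.

```java
// Simplified, hypothetical model of lag-based replica sizing.
// Not KEDA's actual code: it only illustrates sizing on backlog instead of CPU.
final class LagBasedScaling {

    /**
     * Returns the replica count at which each replica would own at most
     * lagThreshold pending messages, clamped to [minReplicas, maxReplicas].
     */
    static int desiredReplicas(long totalLag, long lagThreshold,
                               int minReplicas, int maxReplicas) {
        if (lagThreshold <= 0) {
            throw new IllegalArgumentException("lagThreshold must be positive");
        }
        if (minReplicas < 0 || maxReplicas < minReplicas) {
            throw new IllegalArgumentException("invalid replica bounds");
        }
        long lag = Math.max(0L, totalLag);            // treat negative lag as no backlog
        long needed = lag / lagThreshold
                + (lag % lagThreshold == 0 ? 0 : 1);  // ceiling division without overflow
        return (int) Math.max(minReplicas, Math.min(maxReplicas, needed));
    }

    public static void main(String[] args) {
        // Assumed values: 100 pending messages per replica, at most 20 replicas.
        System.out.println(desiredReplicas(0, 100, 0, 20));    // idle: 0 (scale to zero)
        System.out.println(desiredReplicas(950, 100, 1, 20));  // moderate backlog: 10
        System.out.println(desiredReplicas(5000, 100, 1, 20)); // burst: capped at 20
    }
}
```

With a minimum of zero replicas, an empty backlog scales the consumer away entirely, which is one source of the cost savings that event-driven scaling targets.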

Attendees will gain practical insights into designing resilient, event-driven systems for AI services, implementing cross-cloud fault tolerance, and planning for future enhancements such as predictive autoscaling and green cloud optimization. Whether you're an AI architect, DevOps leader, or cloud practitioner, this session will equip you with a scalable blueprint for next-generation AI infrastructure.