Mr. Vishal Mukeshbhai Shah
Intelligent Anomaly Detection for Complex Cloud Systems: A Deep Learning Framework for Scalable, Real-Time Monitoring
Abstract:
Modern cloud infrastructures generate between 7.2 and 18.5 terabytes of monitoring data each day across distributed services, introducing challenges that exceed the capabilities of traditional rule-based systems. Studies show that these legacy tools detect only 63.7 percent of critical anomalies while producing 29.4 percent false positives, highlighting the need for AI-driven monitoring approaches. This presentation proposes a deep-learning-based anomaly detection framework that integrates Long Short-Term Memory (LSTM) autoencoders for time-series telemetry with Transformer models for log sequence analysis. LSTM autoencoders trained on 12.4 million telemetry points achieved 91.3 percent detection accuracy with 7.8 percent false positives, while the Transformer models reached an F1 score of 0.921, detecting 91.7 percent of true incidents with a 5.3 percent false positive rate. Together, they detect precursor anomalies early, often five to fourteen minutes before service degradation becomes apparent in conventional tools. The system employs a hierarchical distributed architecture in which edge preprocessing reduces data volume by 72.5 percent while preserving 97.8 percent of detection capability. Regional GPU-accelerated nodes process 312,000 metric streams and 64.7 million log entries per minute while maintaining an inference latency of 124 milliseconds. A federated learning component enhances cross-regional model performance, improving detection F1 scores by 15.8 percent and reducing inter-region data transfers by 92.7 percent, which supports compliance with data sovereignty requirements. Deployed across 12,400 production servers, the framework reduced mean time to detection from 76.3 minutes to 11.8 minutes, significantly improving operational efficiency and cloud reliability. The approach demonstrates how LSTM-Transformer integration and federated scalability can transform cloud observability by enabling real-time, accurate, and adaptive anomaly detection in complex distributed systems.
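As a rough illustration of two ideas named in the abstract, the sketch below shows how reconstruction errors from an autoencoder might be turned into anomaly flags, and how regional model weights might be combined without moving raw data between regions. It is a minimal sketch under stated assumptions, not the framework's actual implementation: the abstract does not specify the thresholding rule or the aggregation scheme, so the mean-plus-k-standard-deviations threshold, the sample-weighted averaging (in the style of federated averaging), and all function names here are hypothetical.

```python
from statistics import mean, stdev


def fit_threshold(train_errors, k=3.0):
    """Assumed rule: threshold = mean + k * sample std of the
    reconstruction errors observed on normal (non-anomalous) data."""
    return mean(train_errors) + k * stdev(train_errors)


def flag_anomalies(errors, threshold):
    """Return the indices of windows whose reconstruction error
    exceeds the threshold."""
    return [i for i, err in enumerate(errors) if err > threshold]


def federated_average(regional_weights, sample_counts):
    """Assumed aggregation: average each weight across regions,
    weighting every region by its number of training samples.
    Only weights and counts are shared, never raw telemetry."""
    total = sum(sample_counts)
    dim = len(regional_weights[0])
    return [
        sum(w[j] * n for w, n in zip(regional_weights, sample_counts)) / total
        for j in range(dim)
    ]


# Hypothetical reconstruction errors from a model trained on normal data.
normal_errors = [0.10, 0.12, 0.11, 0.09, 0.13]
threshold = fit_threshold(normal_errors)

# New windows: the second one reconstructs poorly and gets flagged.
flagged = flag_anomalies([0.10, 0.50, 0.12], threshold)

# Two regions with flattened weight vectors and 1 vs. 3 training samples.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In this sketch the autoencoder itself is left out; any model that yields a per-window reconstruction error could feed `fit_threshold` and `flag_anomalies`. Weighting by sample count is one common choice for combining regional models, and a production system would likely average full tensors rather than flat lists.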
Profile:
Vishal Shah is a seasoned Principal Software Engineer with over 18 years of experience architecting and developing large-scale distributed systems, cloud infrastructure, and data integration platforms. Currently at Workday Inc., he spearheads the development of resilient distributed frameworks that enable core services to achieve high availability and fault tolerance, playing a key role in architectural decisions for Workday's HCM platform. Shah's expertise spans the entire technology stack, from low-level system architecture to cloud-native solutions. At Informatica, he was instrumental in defining the foundational architecture for the company's cloud transformation, delivering critical features including Spark execution on Kubernetes via the Cloud Data Integration Elastic Secure Agent. His work significantly accelerated cloud adoption and established new standards for scalable data processing. Throughout his career at leading technology companies including Informatica and SAP Sybase, Shah has consistently delivered breakthrough solutions that enhance system performance and scalability. His notable contributions include developing dynamic lookup cache systems for real-time data processing, implementing parallel data loading infrastructure, and creating extensible frameworks that reduce development complexity while improving system reliability. Shah holds a Master of Technology degree from the International Institute of Information Technology, Hyderabad, and a Bachelor of Technology from Nirma Institute of Technology. His technical leadership combines deep architectural expertise with practical implementation skills, making him a trusted advisor for complex system design and cloud migration initiatives. Based in Sunnyvale, California, Shah continues to drive innovation in distributed systems and cloud infrastructure, with a focus on building resilient, high-performance platforms that scale to meet enterprise demands.