Mr. Aditya Gupta
From Spans to Signals: The Future of AI-Augmented Tracing in Multi-Cloud Environments
Abstract:
Distributed systems today generate an unprecedented volume of telemetry data. Every user interaction, API call, and containerized service produces spans that form the backbone of modern observability pipelines. Platforms such as AWS X-Ray and GCP Cloud Trace allow engineers to collect this data at scale, so the real challenge is no longer how much data we can capture but how effectively we can transform those spans into actionable signals.
This keynote addresses the next frontier in observability: applying artificial intelligence to tracing pipelines so that they move beyond raw collection into intelligent storytelling. By combining distributed tracing with machine learning, systems can be designed to automatically prioritize spans, detect anomalies, and highlight service interactions that might otherwise be buried under terabytes of noise. Instead of leaving engineers with millions of data points to sift through, AI-augmented tracing can surface the most relevant causal chains and guide debugging directly toward root causes.
The session will draw on practical insights gained from building secure and scalable observability systems at one of the world’s largest cloud providers, and on research in anomaly detection, explainable models, and fault classification. We will explore how lightweight ML models can be deployed at the edge of Docker-based microservices to perform early inference, reducing ingestion costs and improving signal quality. We will also discuss how multi-cloud architectures can use open standards such as OpenTelemetry to unify tracing data across environments, ensuring portability between AWS, GCP, and hybrid deployments.
Finally, the keynote will place this vision in the context of long-term resilience. As enterprises adopt AI-driven observability, the goal is not just to debug faster but to build cloud-native systems that are self-optimizing, self-healing, and resilient against unpredictable workloads.
By turning spans into signals, and signals into stories, tracing systems can evolve from passive record-keepers into active participants in the reliability and security of distributed infrastructure.