
Mr. Aditya Gupta
What They Don’t Teach You About Debugging at Scale: Observability Lessons from a Cloud Engineer
Abstract:
Everyone logs. Few observe. At small scale, debugging is easy: just grep your logs. But once your service handles millions of transactions per minute, grepping logs becomes archaeology. You need systems, not just scripts. In this talk, I’ll share what most courses and books don’t teach you: how to build observability into your systems from day one. You’ll learn how to debug the undebuggable using open-source APM tooling (such as OpenTelemetry and Prometheus), how to trace business logic across microservices, and why “working logs” often fail during real incidents. We’ll cover lessons from debugging p99 latency, silent failures, and ghost API timeouts, all in systems with 1M+ transactions per second. The lessons themselves are tool-agnostic and apply whether you’re at a startup or running production on a laptop. If you’re a backend engineer, a DevOps team member, or just tired of guessing what’s wrong in prod, this talk will give you the debugging playbook you were never taught.