Krishnarjun Senthilvelan

Autonomous Service Mesh Policies for Cloud-Native Traffic Governance

Abstract:

Cloud-native microservices environments operate under constantly changing conditions such as traffic spikes, dependency instability, latency fluctuations, and regional service degradation. Managing these disruptions through traditional routing and failover mechanisms often requires continuous human intervention, which slows response times and increases operational complexity. This presentation explores how service mesh platforms can evolve into autonomous traffic governance systems that adapt policies in real time using live telemetry signals.
The research examines how metrics such as latency patterns, dependency health indicators, and service availability can guide dynamic policy adjustments within the mesh layer. By embedding capabilities such as automated failover, adaptive routing decisions, fault isolation strategies, and dynamic retry behavior, the service mesh can respond to emerging disruptions before they escalate into system-wide failures.
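As an illustration of how telemetry could drive a policy adjustment, the following minimal sketch maps a few health signals to a routing and retry decision. The names (Telemetry, Policy, decide_policy), the thresholds, and the traffic weights are hypothetical and chosen only for illustration; they are not taken from the presentation or from any specific mesh product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Telemetry:
    """Hypothetical snapshot of signals for one upstream service."""
    p99_latency_ms: float
    error_rate: float  # fraction of failed requests, 0.0 to 1.0
    available: bool


@dataclass(frozen=True)
class Policy:
    """Hypothetical mesh policy for the primary service path."""
    traffic_weight: int  # percent of traffic kept on the primary path
    max_retries: int
    failover: bool


def decide_policy(
    t: Telemetry,
    latency_slo_ms: float = 500.0,  # assumed threshold
    error_budget: float = 0.05,     # assumed threshold
) -> Policy:
    """Pick a policy from live telemetry, most severe condition first."""
    if not t.available:
        # Service is down: move all traffic to the fallback path.
        return Policy(traffic_weight=0, max_retries=0, failover=True)
    if t.error_rate > error_budget:
        # Failing dependency: shed most traffic and stop retrying,
        # since retries would amplify load on an unhealthy service.
        return Policy(traffic_weight=20, max_retries=0, failover=False)
    if t.p99_latency_ms > latency_slo_ms:
        # Slow but working: shift part of the traffic elsewhere.
        return Policy(traffic_weight=60, max_retries=1, failover=False)
    # Healthy: full traffic with normal retry behavior.
    return Policy(traffic_weight=100, max_retries=2, failover=False)
```

In a real mesh, the returned values would be translated into the platform's own routing and retry configuration; that translation step is outside the scope of this sketch.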
Several scenarios demonstrate how telemetry-driven policy adjustments enabled the mesh to detect degradation, redirect traffic toward healthier service paths, and stabilize distributed systems without requiring immediate operator involvement. The study also highlights how built-in rate limiting mechanisms helped contain localized failures and prevented cascading outages across service dependencies.
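The abstract does not say which rate limiting algorithm the mesh used. As one common technique, a token bucket caps the request rate toward a struggling dependency while still allowing short bursts. The sketch below is a hypothetical, single-threaded illustration; the class name, the parameters, and the explicit clock argument are assumptions made so the behavior is easy to follow.

```python
class TokenBucket:
    """Minimal token bucket: refills at a fixed rate up to a capacity."""

    def __init__(self, rate_per_sec: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = now         # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may proceed."""
        # Refill according to elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        # Bucket is empty: reject so the dependency is not overloaded.
        return False
```

With a rate of one request per second and a capacity of two, two back-to-back requests pass, a third at the same instant is rejected, and one more is admitted a second later. Rejecting excess calls at the mesh layer is what keeps a localized failure from spreading to upstream callers.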
The presentation further discusses the operational impact of reducing manual intervention during routine incidents. Autonomous mesh governance can decrease on-call escalation frequency, reduce operational overhead, and improve system reliability in complex distributed environments.
By demonstrating how self-regulating mesh policies strengthen resilience, traffic governance, and operational efficiency, this work provides a practical framework for organizations operating large-scale microservices architectures.
 
Profile:

Krishnarjun Senthilvelan is a software and reliability engineer with nine years of experience building, operating, and improving cloud-native systems. He has worked across the full software development lifecycle, from Java and microservices development to automation, observability, and large-scale system reliability.
 
He has supported mission-critical platforms for organizations such as Cigna, Fannie Mae, Walmart, and Toyota, helping modernize systems, strengthen performance, and build automation frameworks that reduce operational overhead. His technical strengths include Java/Spring, Angular, React, Node.js, Terraform, AWS, Azure, Splunk, Dynatrace, CI/CD pipelines, and infrastructure as code.
 
With experience spanning both development and SRE roles, Krishnarjun specializes in root-cause analysis, monitoring, cloud resource management, performance testing, and designing scalable, resilient applications. He holds a master’s degree in software engineering and brings a mix of technical depth, problem-solving, and collaborative engineering to every project he works on.