INTELLIGENT AUTOMATION FOR SERVICE DEGRADATION PREDICTION USING LLMS AND OBSERVABILITY DATA
Abstract:
Predicting service degradation is vital to maintaining reliable operations in distributed cloud systems. Traditional threshold-based monitoring fails to capture early warning signs of an emerging issue and responds only after the service is impacted and SLAs are breached. This paper introduces an observability framework that uses Large Language Models (LLMs) to predict service degradation, extending our earlier work on sequential logs and metrics. Based on validated field trials and data analyses, the framework achieves high prediction accuracy, generates proactive insights, and improves incident response times. The paper presents the end-to-end implementation, deployment details, and a case study demonstrating the effectiveness of the models.
© Copyright @ peis2024. All Rights Reserved