Formula 1 Leverages Generative AI for Faster Race-Day Issue Resolution

Formula 1’s AI-Driven Root Cause Analysis Solution

Formula 1 (F1) races are high-stakes events where operational efficiency is paramount. During these live events, F1 IT engineers must triage critical issues across F1’s services, such as network degradation affecting one of its APIs. An issue like this impacts downstream services that consume data from the API, including products like F1 TV, which offer live and on-demand coverage of every race, as well as real-time telemetry. Determining the root cause of these issues and preventing them from recurring takes significant effort. Because of the event schedule and change freeze periods, it can take up to 3 weeks to triage, test, and resolve a critical issue, and the investigation spans teams including development, operations, infrastructure, and networking.

Implementing the Root Cause Analysis Solution Architecture

In collaboration with the AWS Prototyping team, F1 embarked on a 5-week prototype to demonstrate the feasibility of this solution. The objective was to use AWS to replicate and automate the current manual troubleshooting process for two candidate systems. As a starting point, the team reviewed real-life issues and drafted a flowchart outlining 1) the troubleshooting process, 2) the teams and systems involved, 3) the required live checks, and 4) the log investigations required for each scenario. The following is a diagram of the solution architecture.

Creating ETL Pipelines to Transform Log Data

Preparing your data to provide quality results is the first step in an AI project. AWS helps you improve your data quality over time so you can innovate with trust and confidence. Amazon CloudWatch gives you visibility into system-wide performance and allows you to set alarms, automatically react to changes, and gain a unified view of operational health.
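To make the transform step concrete, here is a minimal sketch of turning raw log lines into structured records that a downstream query layer could consume. The log format, field names, and sample line are assumptions made for illustration; the article does not describe F1's actual log schema or pipeline tooling.

```python
import re
from datetime import datetime, timezone

# Hypothetical raw log format, assumed for illustration only:
#   2024-01-01T12:00:00Z ERROR api-gateway upstream timeout after 3000ms
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)$"
)


def transform_line(line):
    """Turn one raw log line into a structured record, or None if it does not parse."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    try:
        parsed = datetime.strptime(record["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        # Skip lines whose first field is not a timestamp in the assumed format
        return None
    # Normalize the timestamp to an ISO 8601 string with an explicit UTC offset
    record["timestamp"] = parsed.replace(tzinfo=timezone.utc).isoformat()
    return record


def transform_logs(lines):
    """Transform an iterable of raw lines, dropping any that cannot be parsed."""
    records = []
    for line in lines:
        record = transform_line(line)
        if record is not None:
            records.append(record)
    return records
```

Dropping unparseable lines keeps the sketch short; a real pipeline would more likely route them to a separate location for inspection rather than discard them.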

Agentic RAG Implementation

Amazon Bedrock Agents facilitates interaction with internal systems, such as databases and Amazon Elastic Compute Cloud (Amazon EC2) instances, and with external systems, such as Jira and Datadog. Anthropic’s Claude 3 models (the latest models at the time of development) were used to orchestrate these interactions and generate high-quality responses, so that the chat assistant returns accurate and relevant information.
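An agent typically reaches systems like these through action groups backed by AWS Lambda functions. The sketch below shows what such a handler might look like for an action group defined with function details. The event and response shapes follow the Bedrock Agents Lambda contract as we understand it; the function name `get_service_status`, the `service_name` parameter, and the stubbed status data are hypothetical and do not come from F1's implementation.

```python
# Hypothetical status data standing in for a live check against a real system
STUB_STATUS = {
    "telemetry-api": "degraded: elevated network latency",
    "video-api": "healthy",
}


def check_service_status(service_name):
    """Stand-in for a live health check; returns a short text summary."""
    status = STUB_STATUS.get(service_name)
    if status is None:
        return f"No status available for service '{service_name}'."
    return f"Service '{service_name}' is {status}."


def lambda_handler(event, context):
    """Handle one action-group invocation from a Bedrock agent."""
    # Parameters arrive as a list of {"name", "type", "value"} entries
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    function = event.get("function", "")

    if function == "get_service_status":
        body = check_service_status(params.get("service_name", "unknown"))
    else:
        body = f"Unsupported function: {function}"

    # Echo the action group and function back so the agent can match the result
    return {
        "messageVersion": event.get("messageVersion", "1.0"),
        "response": {
            "actionGroup": event.get("actionGroup", ""),
            "function": function,
            "functionResponse": {
                "responseBody": {"TEXT": {"body": body}},
            },
        },
    }
```

Returning plain text keeps the handler simple, since the model reads the body directly when composing its answer.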

Chat Application

The chat assistant UI was developed using the Streamlit framework, which is Python-based and provides simple yet powerful application widgets. In the Streamlit app, users can test their Amazon Bedrock agent iterations seamlessly by providing or replacing the agent ID and alias ID. In the chat assistant, the full conversation history is displayed, and the conversation can be reset by choosing Clear. The response from the LLM application consists of two parts. On the left is the final neutral response based on the user’s questions. On the right is the trace of LLM agent orchestration plans and executions, which is hidden by default to keep the response clean and concise.
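The two-part response described above maps naturally onto the event stream returned by the `invoke_agent` call in boto3's `bedrock-agent-runtime` client, which interleaves answer chunks with trace events. The following is a minimal sketch of how an app could separate the two, not the code F1 used; the helper names `split_agent_events` and `ask_agent` are our own.

```python
import uuid


def split_agent_events(events):
    """Separate streamed agent events into the answer text and the trace steps."""
    answer_parts = []
    traces = []
    for event in events:
        if "chunk" in event:
            # Answer text arrives as UTF-8 bytes, possibly over several chunks
            answer_parts.append(event["chunk"]["bytes"].decode("utf-8"))
        elif "trace" in event:
            # Trace events describe the agent's orchestration steps
            traces.append(event["trace"])
    return "".join(answer_parts), traces


def ask_agent(agent_id, agent_alias_id, question, session_id=None):
    """Send one question to a Bedrock agent and return (answer, traces)."""
    import boto3  # imported here so split_agent_events can be used without AWS access

    client = boto3.client("bedrock-agent-runtime")
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        # Reusing a session ID across calls keeps the conversation context
        sessionId=session_id or str(uuid.uuid4()),
        inputText=question,
        enableTrace=True,
    )
    return split_agent_events(response["completion"])
```

In a Streamlit app, the agent ID and alias ID could come from `st.text_input` fields, the answer could be rendered as the main message, and the traces could sit inside a collapsed `st.expander` so they stay hidden by default. Generating a new session ID is one plausible way to implement the Clear action.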

Conclusion

In this post, we explained how F1 and AWS have developed a root cause analysis (RCA) assistant powered by Amazon Bedrock to reduce manual intervention and accelerate the resolution of recurrent operational issues during races from weeks to minutes. The RCA assistant enables the F1 team to spend more time on innovation and improving its services, ultimately delivering an exceptional experience for fans and partners. The successful collaboration between F1 and AWS showcases the transformative potential of generative AI in empowering teams to accomplish more in less time.

FAQs

Q: What is the root cause analysis (RCA) assistant?
A: The RCA assistant is a chat-based application that uses natural language processing and machine learning to help IT engineers troubleshoot and resolve complex technical issues.

Q: How does the RCA assistant work?
A: The RCA assistant uses Amazon Bedrock to query various data sources, including log files, databases, and external systems, to identify potential causes of issues and provide recommendations for resolution.

Q: What are the benefits of using the RCA assistant?
A: The RCA assistant reduces the time it takes to resolve issues from weeks to minutes, allowing IT engineers to focus on innovation and improving services, and providing a better experience for fans and partners.

Q: How does the RCA assistant integrate with existing incident management tools?
A: The RCA assistant integrates with existing incident management tools, such as Jira, to facilitate seamless communication and ticket creation.

Q: Can I use the RCA assistant for other use cases?
A: Yes, the RCA assistant can be adapted to troubleshoot and resolve issues in other industries, such as healthcare, finance, and education.
