Monte Carlo Brings GenAI to Data Observability

Monte Carlo has made a name for itself in the field of data observability, where it uses machine learning and other statistical methods to identify quality and reliability issues hiding in big data. With this week’s update, which it made during its IMPACT 2024 event, the company is adopting generative AI to help it take its data observability capabilities to a new level.

Introducing GenAI Monitor Recommendations

The update centers on the new GenAI Monitor Recommendations that Monte Carlo announced yesterday. In a nutshell, the company is using a large language model (LLM) to search through the myriad ways that data is used in a customer’s database and then recommend specific monitors, or data quality rules, to keep an eye on that data.

How GenAI Works

Here’s how it works: In the Data Profiler component of the Monte Carlo platform, sample data is fed into the LLM to analyze how the database is used, specifically the relationships between the database columns. The LLM uses this sample, as well as other metadata, to build a contextual understanding of actual database usage.
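To make the idea concrete, here is a minimal Python sketch of how sample rows and column metadata might be packaged into a prompt for an LLM. It is an illustration only, not Monte Carlo's implementation: the function name `build_profiling_prompt`, the prompt wording, and the `pitch_speed_mph` column are all assumptions, and no actual LLM call is made.

```python
import json

def build_profiling_prompt(table, columns, sample_rows, max_rows=5):
    """Assemble a plain-text prompt describing a table (hypothetical format)."""
    lines = [f"Table: {table}", "Columns (name: type):"]
    # Column metadata gives the model the schema context.
    lines += [f"- {name}: {dtype}" for name, dtype in columns.items()]
    lines.append("Sample rows (JSON):")
    # Only a small sample of rows is included, not the full table.
    lines += [json.dumps(row, sort_keys=True) for row in sample_rows[:max_rows]]
    lines.append("Suggest data quality rules that relate these columns to one another.")
    return "\n".join(lines)

# Hypothetical schema and sample data, made up for illustration.
columns = {"pitch_type": "string", "pitch_speed_mph": "float"}
sample_rows = [
    {"pitch_type": "fastball", "pitch_speed_mph": 94.1},
    {"pitch_type": "curveball", "pitch_speed_mph": 78.3},
]

prompt = build_profiling_prompt("pitch_history", columns, sample_rows)
print(prompt)
```

In a real system the resulting text would be sent to a model, and its suggestions would be reviewed before they became monitors.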

The Power of LLMs

While classical ML models do well at detecting anomalies in data, such as table freshness and volume issues, LLMs excel at detecting patterns in the data that are difficult if not impossible to discover using traditional ML, says Lior Gavish, Monte Carlo co-founder and CTO. "GenAI’s strength lies in semantic understanding," Gavish tells BigDATAwire. "For example, it can analyze SQL query patterns to understand how fields are actually used in production, and identify logical relationships between fields (like ensuring a ‘start_date’ is always earlier than an ‘end_date’). This semantic comprehension capability goes beyond what was possible with traditional ML/DL approaches."
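The date-ordering relationship Gavish mentions is easy to express as a check once it has been identified. The Python sketch below shows one way such a rule could be evaluated against rows of data. The function name and the sample rows are hypothetical, and this is not how Monte Carlo necessarily implements its monitors.

```python
from datetime import date

def find_date_order_violations(rows):
    """Return rows whose start_date is not strictly earlier than end_date."""
    return [r for r in rows if not r["start_date"] < r["end_date"]]

# Hypothetical rows, made up for illustration.
rows = [
    {"id": 1, "start_date": date(2024, 1, 1), "end_date": date(2024, 3, 1)},
    {"id": 2, "start_date": date(2024, 5, 1), "end_date": date(2024, 4, 1)},
]

violations = find_date_order_violations(rows)
print([r["id"] for r in violations])
```

The hard part, per Gavish, is not running a check like this but recognizing from usage that the two fields are related in the first place.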

Real-World Applications

The new capability will make it easier for technical and non-technical employees to build data quality rules. Monte Carlo gave the example of a data analyst for a professional baseball team who needs to quickly create rules for a "pitch_history" table. There’s clearly a relationship between the "pitch_type" column (fastball, curveball, etc.) and pitch speed. With GenAI baked in, Monte Carlo can automatically recommend data quality rules that make sense based on the history of the relationship between those two columns; for example, a "fastball" should have a pitch speed greater than 80 mph, the company says.
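As a rough illustration of the baseball example, the following Python sketch applies a "fastball should be faster than 80 mph" rule to a few rows. The column name `pitch_speed_mph`, the function name, and the sample data are assumptions made for this sketch; the article names only the "pitch_history" table and the "pitch_type" column.

```python
def find_slow_pitches(rows, pitch_type, min_mph):
    """Return rows of the given pitch type whose speed is not greater than min_mph."""
    return [
        r for r in rows
        if r["pitch_type"] == pitch_type and not r["pitch_speed_mph"] > min_mph
    ]

# Hypothetical pitch_history rows, made up for illustration.
pitch_history = [
    {"pitch_id": 1, "pitch_type": "fastball", "pitch_speed_mph": 95.2},
    {"pitch_id": 2, "pitch_type": "curveball", "pitch_speed_mph": 76.0},
    {"pitch_id": 3, "pitch_type": "fastball", "pitch_speed_mph": 9.5},
]

flagged = find_slow_pitches(pitch_history, "fastball", 80)
print([r["pitch_id"] for r in flagged])
```

The slow curveball is left alone because the rule is conditional on pitch type, which is the kind of cross-column relationship the recommendation feature is meant to surface.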

Conclusion

Thanks to its human-like capability to grasp semantic meaning and generate accurate responses, GenAI tech has the potential to transform many data management tasks that are highly reliant on human perception, including data quality management and observability. Monte Carlo’s integration of GenAI into its data observability platform marks a significant milestone in the company’s mission to provide data teams with the tools they need to ensure data reliability and quality.

Frequently Asked Questions

Q: What is GenAI?
A: GenAI is short for generative AI. In Monte Carlo’s case, it refers to using a large language model (LLM) and its semantic understanding to analyze how data is used and recommend specific monitors, or data quality rules.

Q: How does GenAI work?
A: In the Data Profiler component, sample data and other metadata are fed into the LLM, which analyzes how the database is used, specifically the relationships between database columns, and builds a contextual understanding of actual database usage.

Q: What are the benefits of GenAI?
A: GenAI excels at detecting patterns in data that are difficult or impossible to discover using traditional ML, and can automatically recommend data quality rules based on the history of relationships between data fields.

Q: How does Monte Carlo plan to use GenAI?
A: Monte Carlo is integrating GenAI into its data observability platform, starting with GenAI Monitor Recommendations, to give data teams the tools they need to ensure data reliability and quality.
