The Fragile State of AI Data: A Threat to Trust and Reliability
The Consequences of Poor Data Quality
Trust is fragile, and that is one problem for artificial intelligence, which is only as good as the data behind it. Data integrity concerns have vexed even the savviest organizations for decades, and they are resurfacing. Industry experts are sounding the alarm, warning that because of the weak or siloed data underpinning these systems, users of generative AI may be fed incomplete, duplicative, or erroneous information that can come back to bite them.
The Challenges of AI-Ready Data Architecture
An AI-ready data architecture is a different beast from traditional approaches to data delivery. AI is built on probabilistic models, so output varies with the probabilities the model assigns and with the data available at the time of the query. Many existing data systems were not designed for probabilistic models, and that constrains what they can support. Without data transformation, including data ontologies, governance, trust-building measures, and queries that reflect real-world scenarios, the cost of training and retraining models can run high.
The Threat of Hallucinations and Model Drift
Add hallucinations and model drift to these challenges. Both are reasons to keep humans in the process and to step up efforts to align data and ensure its consistency. Left unchecked, they can erode trust, which is perhaps the most valuable commodity in the AI world, according to Ian Clayton, chief product officer of Redpoint Global.
The Importance of Human Oversight
"Creating a data environment with robust data governance, data lineage, and transparent privacy regulations helps ensure the ethical use of AI within the parameters of a brand promise," said Clayton. Building a foundation of trust helps prevent AI from going rogue, which can easily lead to uneven customer experiences.
Industry Concerns Over Data Readiness
Across the industry, concern is mounting over whether data is ready for AI. "Data quality is a perennial issue that businesses have faced for decades," said Gordon Robinson, senior director of data management at SAS. Businesses should consider two essential questions about their data environments before starting an AI program: "Do you understand what data you have, the quality of the data, and whether it is trustworthy or not?" and "Do you have the right skills and tools available to you to prepare your data for AI?"
The Need for Data Consolidation and Quality
Confronting AI headwinds heightens the need for "data consolidation and data quality," Clayton said. "These entail bringing all data together and out of silos, as well as intensive data quality steps that include deduplication, data integrity, and ensuring consistency."
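The consolidation and data quality steps Clayton describes can be illustrated with a small example. The Python sketch below is an assumption-laden illustration, not Redpoint Global's method: it merges hypothetical customer records from two silos, normalizes values for consistency, rejects records that fail a basic integrity check, and deduplicates on a normalized email address. The `Record` type, the country alias table, and the choice of email as the sole matching key are all hypothetical; real-world record matching often considers several fields.

```python
from dataclasses import dataclass


# Hypothetical customer record; the fields are illustrative, not from any product.
@dataclass(frozen=True)
class Record:
    source: str       # which silo the record came from
    customer_id: str
    email: str
    country: str


# Hypothetical alias table used to enforce one canonical country code.
COUNTRY_ALIASES = {
    "us": "US", "usa": "US", "united states": "US",
    "gb": "GB", "uk": "GB", "united kingdom": "GB",
}


def normalize(r: Record) -> Record:
    """Consistency step: trim, lowercase emails, map country aliases to one code."""
    raw_country = r.country.strip()
    country = COUNTRY_ALIASES.get(raw_country.lower(), raw_country.upper())
    return Record(r.source, r.customer_id.strip(), r.email.strip().lower(), country)


def is_valid(r: Record) -> bool:
    """Integrity step: required ID present and email at least contains '@'."""
    return bool(r.customer_id) and "@" in r.email


def consolidate(*silos: list[Record]) -> tuple[list[Record], list[Record]]:
    """Merge silos, normalize, reject invalid rows, and deduplicate on email."""
    kept: dict[str, Record] = {}
    rejected: list[Record] = []
    for silo in silos:
        for raw in silo:
            r = normalize(raw)
            if not is_valid(r):
                rejected.append(r)
            elif r.email not in kept:  # first occurrence wins; later duplicates are dropped
                kept[r.email] = r
    return list(kept.values()), rejected


# Example usage with two hypothetical silos.
crm = [Record("crm", "1", " Ana@Example.com", "usa"), Record("crm", "2", "bob@example.com", "UK")]
web = [Record("web", "9", "ana@example.com", "United States"), Record("web", "", "x@example.com", "US")]
clean, bad = consolidate(crm, web)
print(len(clean), len(bad))  # 2 1
```

Keeping rejected records in a separate list, rather than silently discarding them, leaves a trail that data teams can review when tracing why a record never reached an AI system.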
Data Security Concerns
Data security also takes on a new dimension as AI is introduced. "Shortcutting security controls in an attempt to rapidly deliver AI solutions leads to a lack of oversight," said Omar Khawaja, field chief information security officer at Databricks.
Essential Elements for Ensuring Trust in AI Data
Industry observers point to several essential elements needed to ensure trust in the data behind AI:
- Agile data pipelines: AI evolves rapidly, so data pipelines must be agile and scalable enough for the business to adapt readily to new AI use cases.
- Visualization: "If data scientists find it hard to access and visualize the data they have, it severely limits their AI development efficiency," Clayton pointed out.
- Robust governance programs: Without strong data governance, businesses may encounter data quality issues, leading to inaccurate insights and poor decision-making.
- Thorough and ongoing measurements: The accuracy and effectiveness of AI models depend directly on the quality of the data they are trained on, so that quality needs to be measured continuously rather than checked once.
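Ongoing measurement can be as simple as computing a few data quality metrics for every incoming batch and flagging any that fall below an agreed threshold. The minimal Python sketch below assumes records arrive as dictionaries; the field names (`id`, `email`, `country`), the allowed country codes, and the threshold values are hypothetical, and the metric definitions are common simple ones rather than a standard. Accuracy and model performance are left out because they require ground-truth labels or evaluation data.

```python
from collections.abc import Mapping, Sequence
from typing import Any

Row = Mapping[str, Any]


def completeness(rows: Sequence[Row], field: str) -> float:
    """Share of rows where `field` is present and not empty."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(field) not in (None, "")) / len(rows)


def uniqueness(rows: Sequence[Row], key: str) -> float:
    """Distinct values of `key` divided by row count; 1.0 means no duplicates."""
    if not rows:
        return 0.0
    return len({r.get(key) for r in rows}) / len(rows)


def consistency(rows: Sequence[Row], field: str, allowed: set[str]) -> float:
    """Share of rows whose `field` uses one of the allowed canonical values."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(field) in allowed) / len(rows)


def quality_report(
    rows: Sequence[Row], thresholds: Mapping[str, float]
) -> tuple[dict[str, float], list[str]]:
    """Compute metrics for one batch and list those that fall below their threshold."""
    metrics = {
        # Field names and the allowed country codes are hypothetical.
        "email_completeness": completeness(rows, "email"),
        "id_uniqueness": uniqueness(rows, "id"),
        "country_consistency": consistency(rows, "country", {"US", "GB"}),
    }
    failures = sorted(name for name, value in metrics.items() if value < thresholds.get(name, 0.0))
    return metrics, failures


# Example usage: run on every incoming batch, not just once before training.
batch = [
    {"id": 1, "email": "a@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "GB"},
    {"id": 2, "email": "c@example.com", "country": "usa"},
    {"id": 4, "country": "US"},
]
metrics, failures = quality_report(
    batch, {"email_completeness": 0.95, "id_uniqueness": 1.0, "country_consistency": 0.7}
)
print(metrics)   # {'email_completeness': 0.5, 'id_uniqueness': 0.75, 'country_consistency': 0.75}
print(failures)  # ['email_completeness', 'id_uniqueness']
```

Tracking these numbers per batch over time, rather than as a one-off audit, is what lets a team notice when an upstream change starts degrading the data a model depends on.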
Conclusion
An AI-ready data architecture should enable IT and data teams to "measure a variety of outcomes covering data quality, accuracy, completeness, consistency, and AI model performance," said Clayton. "Organizations should take steps to continually verify that AI is paying dividends versus just implementing AI for AI’s sake."
Frequently Asked Questions
- What are the consequences of poor data quality in AI?
  - Users of generative AI may be fed incomplete, duplicative, or erroneous information that can come back to bite them.
- What are the challenges of AI-ready data architecture?
  - AI is built on probabilistic models whose output varies with the data available at query time, and many existing data systems were not designed for such models. Without data transformation, training and retraining can become costly.
- What are hallucinations and model drift in AI?
  - Hallucinations are outputs that sound plausible but are false or unsupported by the underlying data; model drift is a decline in a model's accuracy over time as the data it encounters diverges from the data it was trained on. Both are reasons to keep humans in the process and to step up efforts to align data and ensure its consistency.
- Why is human oversight important in AI?
  - Human oversight, backed by robust governance and data lineage, builds a foundation of trust that helps prevent AI from going rogue, which can easily lead to uneven customer experiences.