The Associated Press reported recently that it has interviewed more than a dozen software engineers, developers and academic researchers who take issue with a claim by artificial intelligence developer OpenAI that one of its machine learning tools, the Whisper transcription model used in medical documentation at many U.S. health systems, has human-like accuracy.
WHY IT MATTERS
Researchers at the University of Michigan and others found that AI hallucinations produced erroneous transcripts, sometimes containing racial and violent rhetoric as well as imagined medical treatments, according to the AP.
Of concern is the widespread uptake of tools built on Whisper, which is available open source or as an API, because transcription errors could lead to erroneous patient diagnoses or poor medical decision-making.
Hint Health is one medical technology vendor that added the Whisper API last year, giving doctors the ability to record patient consultations within the vendor's app and transcribe them with OpenAI's large language models.
Meanwhile, more than 30,000 clinicians and 40 health systems, such as Children's Hospital Los Angeles, use ambient AI from Nabla that incorporates a Whisper-based tool. Nabla said Whisper has been used to transcribe an estimated seven million medical visits, according to the report.
A spokesperson for the company cited a blog post published Monday that describes the specific steps the company takes to ensure its models are used appropriately and monitored in use.
"Nabla detects incorrectly generated content based on manual edits to the note and plain-language feedback," the company said in the blog post. "This gives a precise measure of real-world performance and gives us additional inputs to improve models over time."
Of note, Whisper is also integrated into some versions of OpenAI's flagship chatbot ChatGPT, and is a built-in offering in Oracle's and Microsoft's cloud computing platforms, according to the AP.
Meanwhile, OpenAI warns users that the tool should not be used in "high-risk domains" and recommends in its online disclosures against using Whisper in "decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes."
"Will the next model improve on the issue of large-v3 generating a significant amount of hallucinations?" one user asked on OpenAI's GitHub Whisper discussion board on Tuesday. The question remained unanswered at press time.
"This seems solvable if the company is willing to prioritize it," William Saunders, a San Francisco-based research engineer who left OpenAI earlier this year, told the AP. "It's problematic if you put this out there and people are overconfident about what it can do and integrate it into all these other systems."
Of note, OpenAI recently posted a job opening for a health AI research scientist whose chief responsibilities would be to "design and apply practical and scalable methods to improve safety and reliability of our models" and "evaluate methods using health-related data, ensuring models provide accurate, reliable and trustworthy information."
THE LARGER TREND
In September, Texas Attorney General Ken Paxton announced a settlement with Dallas-based artificial intelligence developer Pieces Technologies over allegations that the company's generative AI tools had put patient safety at risk by overpromising accuracy. That company uses genAI to summarize real-time electronic health record data about patient conditions and treatments.
And in a study of LLM accuracy in generating medical notes by the University of Massachusetts Amherst and Mendel, an AI company focused on hallucination detection, there were many errors.
Researchers compared OpenAI's GPT-4o and Meta's Llama-3 and found that, across 50 medical notes, GPT-4o produced 21 summaries with incorrect information and 50 with generalized information, while Llama-3 had 19 errors and 47 generalizations.
ON THE RECORD
"We take this issue seriously and are continually working to improve the accuracy of our models, including reducing hallucinations," a spokesperson for OpenAI told Healthcare IT News by email Tuesday.
"For Whisper use on our API platform, our usage policies prohibit use in certain high-stakes decision-making contexts, and our model card for open-source use includes recommendations against use in high-risk domains. We thank researchers for sharing their findings."
Andrea Fox is senior editor of Healthcare IT News.
Email: afox@himss.org
Healthcare IT News is a HIMSS Media publication.

