OpenAI Says DeepSeek May Have Improperly Harvested Its Data

OpenAI Reviews Evidence of Data Harvesting by Chinese Start-up DeepSeek

OpenAI’s Terms of Service Allegedly Breached

OpenAI, a San Francisco-based start-up valued at $157 billion, is reviewing evidence that DeepSeek, a Chinese company, may have harvested large amounts of data from its AI technologies to teach similar skills to its own systems. This process, known as distillation, is common in the AI field, but OpenAI’s terms of service prohibit anyone from using data generated by its systems to build technologies that compete in the same market.

"We know that groups in the P.R.C. are actively working to use methods, including what’s known as distillation, to replicate advanced U.S. AI models," said OpenAI spokeswoman Liz Bourgeois. "We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models, and will share information as we know more."

DeepSeek’s AI Technologies

DeepSeek has sent shockwaves through Silicon Valley by releasing AI technologies that match the performance of anything else on the market. The prevailing wisdom had been that the most powerful systems could not be built without billions of dollars in specialized computer chips, but DeepSeek says it created its technologies using far fewer resources.

AI Companies and Open Sourcing

AI companies like DeepSeek and OpenAI rely heavily on a practice called open sourcing, freely sharing code that underpins their technologies and reusing code shared by others. They see this as a way of accelerating technological development. AI companies also need massive amounts of online data to train their systems, which learn their skills by pinpointing patterns in text, computer programs, images, sounds, and videos.

Distillation and Data Harvesting

Distillation is often used to train new systems. Open source technologies allow it, but it may be legally problematic when a company takes data from proprietary technology. OpenAI’s terms of service impose exactly that restriction: no one may use data generated by its systems to build technologies that compete in the same market.

Lawsuits Against OpenAI

OpenAI is facing more than a dozen lawsuits accusing it of illegally using copyrighted internet data to train its systems. One of these lawsuits was brought by The New York Times against OpenAI and its partner Microsoft, alleging that millions of articles published by The Times were used to train automated chatbots that now compete with the news outlet as a source of reliable information. Both OpenAI and Microsoft deny the claims.

Conclusion

The situation highlights the complex issues surrounding AI development and data harvesting. OpenAI’s terms of service prohibit using its output to build competing technologies, yet the company’s own data practices are being challenged in court. OpenAI’s review of DeepSeek is ongoing, and its outcome could have significant implications for the AI industry.

FAQs

Q: What is distillation in AI?
A: Distillation is the process of training a new AI system on data generated by an existing system.

Q: What is open sourcing in AI?
A: Open sourcing is the practice of freely sharing code that underpins AI technologies and reusing code shared by others to accelerate technological development.

Q: What are the implications of the alleged breach of OpenAI’s terms of service?
A: If confirmed, the breach could have significant implications for the AI industry, including the potential for legal action and reputational damage.

Q: What is the current status of the lawsuit against OpenAI and Microsoft?
A: The lawsuit, brought by The New York Times, is ongoing. OpenAI and Microsoft both deny the claims.
