Microsoft and OpenAI Investigate Potential Breach of AI System by Chinese Start-up DeepSeek
According to Bloomberg, Microsoft and OpenAI are investigating a potential breach of OpenAI’s system by a group allegedly linked to Chinese AI start-up DeepSeek. The investigation stems from suspicious data extraction activity detected in late 2024 via OpenAI’s application programming interface (API), sparking broader concerns over international AI competition.
Large-scale Data Extraction
Microsoft, OpenAI’s largest financial backer, first identified the large-scale data extraction and informed the ChatGPT maker of the incident. Sources believe the activity may have violated OpenAI’s terms of service or that the group may have exploited loopholes to bypass restrictions limiting how much data they could collect.
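API providers typically enforce the kind of restrictions mentioned above with rate limits. A simplified token-bucket sketch illustrates how a cap on request volume limits bulk data collection; this is a generic illustration of the mechanism, not OpenAI's actual implementation, and all names and numbers are hypothetical:

```python
import time

class TokenBucket:
    """Simplified rate limiter of the kind APIs use to cap request volume."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity          # maximum burst size
        self.tokens = float(capacity)     # start with a full bucket
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens in proportion to elapsed time, up to capacity.
        now = time.monotonic()
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.refill_per_sec,
        )
        self.last = now
        # Spend one token per request; refuse when the bucket is empty.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_per_sec=1.0)
results = [bucket.allow() for _ in range(10)]  # burst of 10 rapid requests
# The first 5 requests pass; the rest are throttled until tokens refill.
```

Under a limiter like this, collecting data at scale requires either waiting for tokens to refill or finding a way around the cap, which is the kind of loophole the sources describe.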
DeepSeek’s Rise to Prominence
DeepSeek has quickly risen to prominence in the competitive AI landscape, particularly with the release of its latest model, R1, on January 20. Billed as rivaling OpenAI’s ChatGPT in performance while developed at a significantly lower cost, R1 has shaken up the tech industry. Its release triggered a sharp decline in tech and AI stocks that wiped billions from US markets in a single week.
Model Distillation
David Sacks, the White House’s newly appointed "crypto and AI czar," alleged that DeepSeek may have employed questionable methods to achieve its AI’s capabilities. In an interview with Fox News, Sacks noted evidence suggesting that DeepSeek had used "distillation" to train its AI models using outputs from OpenAI’s systems.
"There’s substantial evidence that what DeepSeek did here is they distilled knowledge out of OpenAI’s models, and I don’t think OpenAI is very happy about this," Sacks told the network.
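Distillation itself is a standard machine-learning technique: a smaller "student" model is trained to imitate the outputs of a larger "teacher" model rather than learning from original labeled data. A minimal, self-contained sketch of the idea follows, using toy linear models in place of real language models; every name and number here is illustrative and not drawn from the reporting:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical "teacher": a fixed linear model whose outputs stand in
# for responses queried from a larger system.
W_teacher = rng.normal(size=(4, 3))
X = rng.normal(size=(200, 4))                  # inputs the student queries with
soft_targets = softmax(X @ W_teacher, T=2.0)   # teacher's soft output distributions

# "Student": trained only on the teacher's outputs, never on original labels.
W_student = np.zeros((4, 3))
lr = 0.5
for _ in range(500):
    probs = softmax(X @ W_student, T=2.0)
    grad = X.T @ (probs - soft_targets) / len(X)  # cross-entropy gradient
    W_student -= lr * grad

# After training, the student's predictions closely track the teacher's.
agreement = np.mean(
    softmax(X @ W_student).argmax(axis=1) == soft_targets.argmax(axis=1)
)
```

The point of the sketch is that the student never sees the teacher's weights or training data, only its outputs, which is why distilling from another provider's API responses can reproduce much of a model's behavior at far lower cost.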
Geopolitical and Security Concerns
The growing competition between the US and China in the AI sector has underscored wider concerns regarding technological ownership, ethical governance, and national security. The US Navy has banned its personnel from using DeepSeek’s products, citing fears that the Chinese government could exploit the platform to access sensitive information.
Conclusion
The case highlights the risks posed by model distillation and the need for stricter measures to protect intellectual property and prevent data breaches. As AI systems advance and become increasingly integral to global economic and strategic planning, disputes over data usage and intellectual property are likely to intensify. The investigation into the alleged breach by DeepSeek could set a precedent for how AI developers police model usage and enforce their terms of service.
FAQs
- What is model distillation?
Model distillation is a technique in which one AI system is trained on outputs generated by another, potentially allowing a competitor to replicate similar capabilities at a fraction of the cost.
- What is the concern about DeepSeek’s data collection practices?
Critics have highlighted DeepSeek’s privacy policy, which permits the collection of data such as IP addresses, device information, and even keystroke patterns, a scope of data collection some experts consider excessive.
- Why has the US Navy banned its personnel from using DeepSeek’s products?
The US Navy has banned its personnel from using DeepSeek’s products, citing "potential security and ethical concerns associated with the model’s origin and usage."
- What are the implications for the AI industry?
Growing US–China competition in AI underscores wider concerns about technological ownership, ethical governance, and national security, and highlights the need for stricter measures to protect intellectual property and prevent data breaches.

