Data Found in Plain Sight
A leaked database reveals that China has developed an AI system to supercharge its censorship machine, extending far beyond traditional taboos like the Tiananmen Square massacre.
An LLM for Detecting Dissent
The system’s creator tasks an unnamed LLM with determining whether a piece of content touches on sensitive topics related to politics, social life, or the military. Top-priority topics include pollution and food safety scandals, financial fraud, and labor disputes, all hot-button issues in China that sometimes lead to public protests.
Inside the Training Data
From the dataset’s 133,000 examples, each of which the LLM must evaluate for censorship, TechCrunch gathered 10 representative pieces of content. Topics likely to stir up social unrest are a recurring theme. One snippet, for example, is a post by a business owner complaining about corrupt local police officers shaking down entrepreneurs, a growing problem in China as its economy struggles.
Built for “Public Opinion Work”
The dataset doesn’t include any information about its creators. But it does say that it’s intended for "public opinion work," which offers a strong clue that it’s meant to serve Chinese government goals, one expert told TechCrunch.
Repression is Getting Smarter
The dataset examined by TechCrunch is the latest evidence that authoritarian governments are seeking to leverage AI for repressive purposes. OpenAI released a report last month revealing that an unidentified actor, likely operating from China, used generative AI to monitor social media conversations — particularly those advocating for human rights protests against China — and forward them to the Chinese government.
Contact Us
If you know more about how AI is used in state oppression, you can contact Charles Rollet securely on Signal at charlesrollet.12. You can also contact TechCrunch via SecureDrop.
FAQs
Q: What is the purpose of the AI system developed by China?
A: The AI system is designed to supercharge China’s censorship machine, extending beyond traditional taboos like the Tiananmen Square massacre.
Q: What are the top-priority topics for the LLM to detect dissent?
A: Top-priority topics include pollution and food safety scandals, financial fraud, and labor disputes, which are hot-button issues in China that sometimes lead to public protests.
Q: What is the intended use of the dataset?
A: The dataset is intended for "public opinion work," which is overseen by the powerful Chinese government regulator, the Cyberspace Administration of China (CAC), and typically refers to censorship and propaganda efforts.
Q: How is the AI system used for repressive purposes?
A: The AI system is used to monitor social media conversations, particularly those advocating for human rights protests against China, and forward them to the Chinese government.
Q: Can I contact TechCrunch for more information?
A: Yes. You can contact Charles Rollet securely on Signal at charlesrollet.12, or contact TechCrunch via SecureDrop.