Data Found in Plain Sight
A leaked database reveals that China has developed an AI system to supercharge its censorship machine, extending far beyond traditional taboos like the Tiananmen Square massacre.
An LLM for Detecting Dissent
The system’s creator tasks an unnamed LLM with determining whether a piece of content touches on sensitive topics related to politics, social life, or the military. Top-priority topics include pollution and food safety scandals, financial fraud, and labor disputes, all hot-button issues in China that sometimes lead to public protests.
Inside the Training Data
From the dataset’s 133,000 examples, each of which the LLM must evaluate for censorship, TechCrunch gathered 10 representative pieces of content. Topics likely to stir up social unrest are a recurring theme. One snippet, for example, is a post by a business owner complaining about corrupt local police officers shaking down entrepreneurs, a growing problem in China as its economy struggles.
Built for “Public Opinion Work”
The dataset doesn’t include any information about its creators. But it does say that it’s intended for "public opinion work," which offers a strong clue that it’s meant to serve Chinese government goals, one expert told TechCrunch.
Repression is Getting Smarter
The dataset examined by TechCrunch is the latest evidence that authoritarian governments are seeking to leverage AI for repressive purposes. OpenAI released a report last month revealing that an unidentified actor, likely operating from China, used generative AI to monitor social media conversations — particularly those advocating for human rights protests against China — and forward them to the Chinese government.
Contact Us
If you know more about how AI is used in state oppression, you can contact Charles Rollet securely on Signal at charlesrollet.12. You can also contact TechCrunch via SecureDrop.
FAQs
Q: What is the purpose of the AI system developed by China?
A: The AI system is designed to supercharge China’s censorship machine, extending beyond traditional taboos like the Tiananmen Square massacre.
Q: What are the top-priority topics for the LLM to detect dissent?
A: Top-priority topics include pollution and food safety scandals, financial fraud, and labor disputes, which are hot-button issues in China that sometimes lead to public protests.
Q: What is the intended use of the dataset?
A: The dataset is intended for "public opinion work," which is overseen by the powerful Chinese government regulator, the Cyberspace Administration of China (CAC), and typically refers to censorship and propaganda efforts.
Q: How is the AI system used for repressive purposes?
A: The AI system is used to monitor social media conversations, particularly those advocating for human rights protests against China, and forward them to the Chinese government.
Q: Can I contact TechCrunch for more information?
A: Yes. You can contact Charles Rollet securely on Signal at charlesrollet.12, or contact TechCrunch via SecureDrop.