Apple Improves AI Models with Synthetic Data and Differential Privacy
Addressing Criticism over Underwhelming Performance
In the wake of criticism over the underwhelming performance of its AI products, especially in areas like notification summaries, Apple on Monday detailed how it is trying to improve its AI models by analyzing user data privately with the aid of synthetic data.
Using Differential Privacy and Synthetic Data
Using an approach called “differential privacy,” the company said it would first generate synthetic data and then poll users’ devices (provided they’ve opted in to share device analytics with Apple) with snippets of that synthetic data to gauge how accurate its models are, and subsequently improve them.
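The article does not say which differential privacy mechanism Apple uses. As a minimal sketch of the general idea, assuming k-ary randomized response (a textbook local mechanism, not confirmed as Apple's method), a device reports its true choice with a probability set by a privacy parameter and a random other choice otherwise. The names `truth_probability`, `randomized_response`, `estimate_counts`, `epsilon`, and `num_options` are hypothetical and chosen for illustration.

```python
import math
import random


def truth_probability(epsilon, num_options):
    """Probability of reporting the true option under k-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + num_options - 1)


def randomized_response(true_index, num_options, epsilon, rng):
    """Return a noisy report of true_index (hypothetical helper, not Apple's API)."""
    if rng.random() < truth_probability(epsilon, num_options):
        return true_index
    # Otherwise pick uniformly among the remaining options.
    other = rng.randrange(num_options - 1)
    return other if other < true_index else other + 1


def estimate_counts(reports, num_options, epsilon):
    """Debias aggregated noisy reports into estimated true counts per option.

    Requires epsilon > 0 and num_options >= 2, otherwise p equals q below.
    """
    n = len(reports)
    p = truth_probability(epsilon, num_options)
    q = (1 - p) / (num_options - 1)  # chance of reporting one specific wrong option
    observed = [0] * num_options
    for report in reports:
        observed[report] += 1
    return [(count - n * q) / (p - q) for count in observed]
```

A smaller `epsilon` makes each individual report less trustworthy, which protects the device, while the server can still recover approximate totals across many reports with `estimate_counts`.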
How Synthetic Data is Created
“Synthetic data are created to mimic the format and important properties of user data, but do not contain any actual user generated content,” the company wrote in a blog post. “To curate a representative set of synthetic emails, we start by creating a large set of synthetic messages on a variety of topics […] We then derive a representation, called an embedding, of each synthetic message that captures some of the key dimensions of the message like language, topic, and length.”
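Apple's post, as quoted here, does not name the embedding model. As a toy stand-in only, the sketch below turns a message into a fixed-size vector using hashed word counts (a rough proxy for topic) plus a log-scaled word count (a rough proxy for length). The function name `toy_embedding` and the `dims` parameter are assumptions for illustration, and a production system would use a learned model instead.

```python
import hashlib
import math


def toy_embedding(message, dims=16):
    """Map a message to a unit-length vector: hashed word counts plus a length feature."""
    vector = [0.0] * (dims + 1)
    words = message.lower().split()
    for word in words:
        # Stable hash so the same word always lands in the same bucket.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dims
        vector[bucket] += 1.0
    # Last slot: log-scaled word count as a crude length signal.
    vector[dims] = math.log1p(len(words))
    norm = math.sqrt(sum(v * v for v in vector))
    # An empty message has zero norm and is returned as all zeros.
    return [v / norm for v in vector] if norm else vector
```

Calling `toy_embedding("Dinner at 7pm tomorrow?")` returns a list of 17 floats that is the same on every run, so a server and a device would compute matching vectors for the same text.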
Comparing Synthetic Data with User Data
The company said these embeddings are then sent to a small number of user devices that have opted in to Device Analytics. Each device compares them with a sample of emails stored on the device and tells Apple which embeddings are most accurate, that is, closest to the real messages.
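The on-device comparison described above can be sketched as a nearest-neighbor vote. This is one plausible reading, not Apple's implementation: it assumes cosine similarity as the distance measure and a simple majority vote across the local sample, and the names `cosine_similarity` and `closest_synthetic_index` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_synthetic_index(synthetic_embeddings, local_embeddings):
    """Return the index of the synthetic embedding that wins the most local matches."""
    if not synthetic_embeddings or not local_embeddings:
        raise ValueError("need at least one synthetic and one local embedding")
    votes = [0] * len(synthetic_embeddings)
    for local in local_embeddings:
        scores = [cosine_similarity(local, s) for s in synthetic_embeddings]
        # Each local email votes for its single nearest synthetic embedding.
        votes[scores.index(max(scores))] += 1
    return votes.index(max(votes))
```

In this sketch only the winning index would leave the device, while the local embeddings and the emails behind them stay where they are.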
Improving AI Models
The company said it is using this approach to improve its Genmoji models, and would in the future use synthetic data for Image Playground, Image Wand, Memories Creation and Writing Tools as well as Visual Intelligence. Apple said it would also use synthetic data to poll users who opt in to share device analytics, with the aim of improving email summaries.
Conclusion
Apple’s approach pairs synthetic data with differential privacy so that it can measure and improve its AI models while analyzing user data privately. Because the synthetic messages mimic the format and important properties of real ones without containing any actual user-generated content, opted-in devices can tell Apple which synthetic samples are most accurate, and the company can refine its models from that feedback.
FAQs
Q: What is synthetic data?
A: Synthetic data is created to mimic the format and important properties of user data, but does not contain any actual user-generated content.
Q: How is synthetic data created?
A: Apple starts by generating a large set of synthetic messages on a variety of topics. It then derives a representation, called an embedding, of each synthetic message that captures key dimensions of the message such as language, topic, and length.
Q: How does Apple use synthetic data to improve its AI models?
A: Apple sends embeddings of the generated synthetic data to a small number of devices whose owners have opted in to share device analytics. Each device compares them with a sample of its own emails and tells Apple which embeddings are most accurate, and Apple uses those results to improve its models.
Q: What are the benefits of using synthetic data and differential privacy?
A: The approach lets Apple check its models against real usage and improve their accuracy, while the synthetic data it works with contains no actual user-generated content and the analysis of user data is done privately.