Apple is taking a new approach to training its AI models – one that avoids collecting or copying user content from iPhones or Macs.
According to a recent blog post, the company plans to continue relying on synthetic data (artificially constructed data that mimics user behaviour) and differential privacy to improve features like email summaries, without gaining access to personal emails or messages.
Improving Genmoji and other Apple Intelligence features
The company already uses differential privacy to improve features like Genmoji, where it collects general trends about which prompts are most popular without linking any prompt with a specific user or device. In upcoming releases, Apple plans to apply similar methods to other Apple Intelligence features, including Image Playground, Image Wand, Memories Creation, and Writing Tools.
For Genmoji, the company anonymously polls participating devices to determine whether specific prompt fragments have been seen. Each device responds with a noisy signal – some responses reflect actual use, while others are randomised. The approach ensures that only widely-used terms become visible to Apple, and no individual response can be traced back to a user or device, the company says.
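To make that noisy-signal idea concrete, the sketch below implements randomised response, the classic local differential-privacy mechanism that the description above resembles. It is only an illustration under assumed parameters: the function names, the 0.75 truth probability, and the simulated device counts are not details Apple has published.

```swift
/// With probability `p`, report the truthful answer; otherwise report a fair coin flip.
func randomizedResponse(didUseFragment: Bool, truthProbability p: Double = 0.75) -> Bool {
    if Double.random(in: 0..<1) < p {
        return didUseFragment        // truthful signal
    } else {
        return Bool.random()         // random noise: plausible deniability for any single device
    }
}

/// Server side: recover an estimate of the true usage rate from the noisy reports.
/// E[reported] = p * trueRate + (1 - p) * 0.5, so invert that relation.
func estimateTrueRate(noisyReports: [Bool], truthProbability p: Double = 0.75) -> Double {
    let observed = Double(noisyReports.filter { $0 }.count) / Double(noisyReports.count)
    return (observed - (1 - p) * 0.5) / p
}

// Example: 10,000 simulated devices, 30% of which actually used the prompt fragment.
let devices = (0..<10_000).map { _ in Double.random(in: 0..<1) < 0.3 }
let reports = devices.map { randomizedResponse(didUseFragment: $0) }
print("Estimated usage rate:", estimateTrueRate(noisyReports: reports))
```

With enough participating devices the random noise averages out, so aggregate popularity can be estimated even though no single report reveals what any one user did.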
Curating synthetic data for better email summaries
While the above method has worked well for short prompts, Apple needed a new approach for more complex tasks like summarising emails. For this, Apple generates thousands of sample messages and converts these synthetic messages into numerical representations, or ‘embeddings’, that capture language, tone, and topic. Participating user devices then compare those embeddings against locally stored samples. Again, only the selected match is shared with Apple, not the content of the local emails themselves.
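As a rough sketch of what that on-device comparison could look like, the snippet below scores hypothetical synthetic-message embeddings against a locally stored sample using cosine similarity and returns only the identifier of the best match. The type names, the similarity measure, and the toy vectors are assumptions for illustration; Apple has not published these specifics.

```swift
struct SyntheticCandidate {
    let id: String
    let embedding: [Double]
}

/// Dot product of two equal-length vectors.
func dot(_ a: [Double], _ b: [Double]) -> Double {
    var sum = 0.0
    for i in 0..<min(a.count, b.count) {
        sum += a[i] * b[i]
    }
    return sum
}

/// Cosine similarity: 1.0 means the vectors point in the same direction.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    dot(a, b) / (dot(a, a).squareRoot() * dot(b, b).squareRoot())
}

/// Pick the synthetic candidate whose embedding is closest to a locally stored sample.
/// Only the winning candidate's identifier would leave the device, never the email itself.
func selectClosestCandidate(localEmbedding: [Double],
                            candidates: [SyntheticCandidate]) -> String? {
    candidates.max { lhs, rhs in
        cosineSimilarity(lhs.embedding, localEmbedding) <
            cosineSimilarity(rhs.embedding, localEmbedding)
    }?.id
}

// Toy 3-dimensional embeddings; real ones would have hundreds of dimensions.
let candidates = [
    SyntheticCandidate(id: "dinner-invite", embedding: [0.9, 0.1, 0.0]),
    SyntheticCandidate(id: "meeting-reschedule", embedding: [0.1, 0.8, 0.3]),
]
let localSample: [Double] = [0.2, 0.7, 0.4]   // embedding of a local email; stays on device
print(selectClosestCandidate(localEmbedding: localSample, candidates: candidates) ?? "no match")
```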
Apple collects the most frequently selected synthetic embeddings from participating devices and uses them to refine its training data. Over time, this process allows the system to generate more relevant and realistic synthetic emails, helping Apple improve its AI outputs for summarisation and text generation without any apparent compromise of user privacy.
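A server-side tally of those reports might look something like the sketch below, which simply counts how often each candidate identifier was selected and keeps only the widely chosen ones. The identifiers and the threshold are hypothetical, and in practice the per-device reports would also carry differential-privacy noise.

```swift
/// Count how often each synthetic-candidate identifier was selected across devices,
/// keeping only identifiers chosen widely enough to be meaningful.
func mostSelectedCandidates(reports: [String], minimumCount: Int = 100) -> [String] {
    var counts: [String: Int] = [:]
    for id in reports {
        counts[id, default: 0] += 1
    }
    return counts
        .filter { $0.value >= minimumCount }
        .sorted { $0.value > $1.value }
        .map { $0.key }
}

// Example with a tiny report list and a low threshold.
let deviceSelections = ["dinner-invite", "meeting-reschedule", "dinner-invite"]
print(mostSelectedCandidates(reports: deviceSelections, minimumCount: 2))  // ["dinner-invite"]
```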
Available in beta
Apple is rolling out the system in beta versions of iOS 18.5, iPadOS 18.5, and macOS 15.5. According to Bloomberg’s Mark Gurman, the approach is part of Apple’s attempt to address challenges in its AI development, which have included delayed feature rollouts and the fallout from leadership changes on the Siri team.
Whether its approach will yield more useful AI outputs in practice remains to be seen, but it signals a clear public effort to balance user privacy with model performance.
Conclusion
Apple’s new approach to training its AI models is a significant step towards balancing user privacy with model performance. By relying on synthetic data and differential privacy, the company can improve its AI outputs without collecting or copying user content. The approach is expected to benefit email summaries as well as other Apple Intelligence features such as Genmoji, Image Playground, Image Wand, Memories Creation, and Writing Tools.
FAQs
Q: What is synthetic data?
A: Synthetic data is artificially constructed data designed to mimic user behaviour without containing any real user content.
Q: What is differential privacy?
A: Differential privacy is a method that introduces randomised data into broader datasets to help protect individual identities.
Q: How does Apple collect data for its AI models?
A: Apple generates synthetic data and uses differential privacy to poll participating devices, collecting only anonymised, aggregate signals so it can improve its AI models without collecting or copying user content.
Q: What features will be improved using this new approach?
A: Apple’s new approach will improve features like email summaries, Genmoji, Image Playground, Image Wand, Memories Creation, and Writing Tools.
Q: Is this approach available for public use?
A: The system is currently available in beta versions of iOS 18.5, iPadOS 18.5, and macOS 15.5.