There’s no putting toothpaste back in the tube
Lasso’s Findings
Lasso, a cybersecurity research firm, recently investigated Microsoft Copilot, an AI assistant built on large language models. Its findings revealed that despite Microsoft's attempts to remove sensitive information exposed on GitHub, the data remained accessible to Copilot, leaving private information vulnerable to exploitation.
The Fix was Only Partial
According to Lasso, Microsoft's fix involved blocking public access to a special Bing user interface, once available at cc.bingj.com. However, this fix did not purge the private pages from Bing's cache itself, so the sensitive information remained reachable by Copilot.
Cached Pages Continued to Appear in Search Results
Lasso explained that although Bing's cached-link feature was disabled, cached pages continued to appear in search results, indicating that the fix was superficial: it hid the cache from view but did not remove the underlying data.
Copilot Still Had Access to Sensitive Information
When Lasso revisited its investigation of Microsoft Copilot, it found that the model could still retrieve cached data that was no longer available to human users. In short, the fix was only partial: it stopped humans from retrieving the cached pages but left them accessible to Copilot.
The Consequences of Sensitive Information Sharing
Developers often embed security tokens, private encryption keys, and other secrets directly in their code, despite best practices that advise against it. When that code is published in a public repository, even briefly, those secrets are exposed and pose a significant security risk.
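As a rough illustration of how such embedded secrets can be caught before code is pushed, the sketch below scans files for a few well-known credential formats. This is a minimal sketch, not a substitute for dedicated scanners such as gitleaks or trufflehog, which use far more rules plus entropy analysis; the patterns and rule names here are illustrative.

```python
import re
from pathlib import Path

# Rough patterns for a few well-known credential formats.
# Illustrative only; real scanners cover many more token types.
PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub personal access token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "Private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_file(path: Path) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every suspected secret in a file."""
    hits = []
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        # Unreadable or missing files are simply skipped.
        return hits
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_tree(root: Path) -> None:
    """Print a report line for each suspected secret under a directory."""
    for path in root.rglob("*"):
        if path.is_file():
            for lineno, name in scan_file(path):
                print(f"{path}:{lineno}: possible {name}")
```

Running `scan_tree(Path("."))` before a commit, or wiring an equivalent check into a pre-commit hook, catches the most obvious leaks while the code is still local.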
The Importance of Credential Rotation
When sensitive information is exposed, the only reliable recourse is to rotate all affected credentials. This is crucial, because making the repository private again does not contain the damage; copies may persist elsewhere, such as in Bing's cache. Lasso's findings highlight the need for developers to take extra precautions to protect sensitive information.
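Rotation is far less painful when the credential never lives in the code to begin with. The minimal sketch below reads a key from the environment and fails fast if it is absent; the variable name MY_SERVICE_API_KEY is hypothetical. With this pattern, rotating a key means updating the deployment environment or secret manager, with no code change and no git history to scrub.

```python
import os


def get_api_key(var_name: str = "MY_SERVICE_API_KEY") -> str:
    """Fetch a credential from the environment instead of source code.

    The variable name is a placeholder for illustration. Raising on a
    missing value avoids silently falling back to a hardcoded default.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; refusing to fall back to a hardcoded value"
        )
    return key
```

The same idea extends to dedicated secret managers; the point is that the repository only ever contains the lookup, never the secret itself.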
Microsoft’s Statement
In an emailed statement, Microsoft wrote: "It is commonly understood that large language models are often trained on publicly available information from the web. If users prefer to avoid making their content publicly available for training these models, they are encouraged to keep their repositories private at all times."
Conclusion
The findings of Lasso's investigation serve as a stark reminder of the importance of protecting sensitive information. Despite Microsoft's efforts to remove exposed data from GitHub and Bing's cache, Copilot could still reach it. By rotating exposed credentials and keeping repositories private from the start, developers can minimize the risk of sensitive information being compromised.
FAQs
Q: What did Lasso’s investigation find?
A: Lasso’s investigation found that despite Microsoft’s attempts to remove sensitive information from GitHub, the data was still accessible to Copilot, leaving private information vulnerable to exploitation.
Q: How did Microsoft fix the issue?
A: Microsoft blocked public access to a special Bing user interface, but this fix did not clear the private pages from the cache itself, leaving the sensitive information still accessible to Copilot.
Q: What are the consequences of sensitive information sharing?
A: When secrets are embedded in code that reaches a public repository, they can be harvested and abused, compromising the accounts and systems those credentials protect.
Q: What should developers do to protect sensitive information?
A: Developers should take extra precautions to protect sensitive information, including rotating credentials, keeping repositories private, and following best practices for secure coding.

