While publishing a storage bucket of open-source training data on GitHub, Microsoft AI researchers accidentally exposed tens of terabytes of sensitive data, including private keys and passwords.
Cloud security startup Wiz said it found the Microsoft AI research division’s GitHub repository while investigating accidental cloud data exposure.
The repository offered open-source code and AI models for image recognition, and readers were instructed to download the models from an Azure Storage URL. Wiz discovered that this URL was configured to grant permissions on the entire storage account, accidentally exposing additional private data.
Two Microsoft employees’ personal computer backups were among the 38 terabytes of sensitive data. Other sensitive data included Microsoft service passwords, secret keys, and over 30,000 internal Microsoft Teams messages from hundreds of Microsoft employees.
Wiz reported that the URL, which had exposed this data since 2020, was misconfigured to allow “full control” rather than “read-only” permissions, allowing anyone to delete, replace, and inject malicious content.
The storage account itself was not directly exposed, Wiz says. Instead, Microsoft AI developers had embedded an overly permissive SAS (shared access signature) token in the URL. Azure SAS tokens let users create shareable links to data in an Azure Storage account.
“AI unlocks huge potential for tech companies,” Wiz co-founder and CTO Ami Luttwak said. As data scientists and engineers rush to produce new AI solutions, they must add security measures to the massive amounts of data they handle. Many development teams need to manipulate massive amounts of data, share it with peers, or collaborate on public open-source projects, which makes cases like Microsoft’s harder to monitor and avoid.
Wiz informed Microsoft of its findings on June 22, and Microsoft revoked the SAS token on June 24. Microsoft said its organizational impact investigation was complete on August 16.
Before publication, Microsoft’s Security Response Center said that “no customer data was exposed, and no other internal services were put at risk because of this issue.”
Based on Wiz’s research, Microsoft expanded GitHub’s secret scanning service, which monitors all public open-source code changes for plaintext credentials and other secrets, to include SAS tokens with overly permissive expirations or privileges.
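The two properties the article singles out, a SAS token scoped to the whole storage account with "full control" permissions, and an expiry far in the future, can both be read straight off the token's query parameters. The following is an added example, not code from Wiz or Microsoft, of a defender-side check that flags such tokens before a download URL is committed or published; the sample URL, its signature value and the seven-day expiry threshold are constructed for this example.

```python
"""Flag overly permissive Azure SAS tokens in a URL (added example, not Wiz's or Microsoft's code)."""
from __future__ import annotations
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import urlparse, parse_qs

# Permission letters beyond read/list that let a holder modify or delete data.
WRITE_PERMS = set("wdcaup")  # write, delete, create, add, update, process
MAX_EXPIRY_DAYS = 7          # assumption: a policy threshold chosen for this example


def assess_sas_url(url: str, now: Optional[datetime] = None) -> list:
    """Return a list of findings for the SAS token embedded in `url`.

    Reads the standard Azure SAS query parameters:
      sp  - signed permissions ("r" read-only, "racwdl" is full control)
      srt - signed resource types of an account SAS ("s" service, "c" container, "o" object)
      sr  - signed resource of a service SAS ("b" blob, "c" container)
      se  - signed expiry as ISO-8601 UTC
    """
    now = now or datetime.now(timezone.utc)
    q = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    findings = []
    if "sig" not in q:
        return ["no SAS token present"]
    if set(q.get("sp", "")) & WRITE_PERMS:
        findings.append(f"mutating permissions granted: sp={q['sp']} (expected read-only 'r' or 'rl')")
    if "srt" in q:
        # An account SAS covers every container in the storage account, not one blob.
        findings.append(f"account-level SAS (srt={q['srt']}) exposes the entire storage account")
    elif q.get("sr") == "c":
        findings.append("container-level SAS exposes every blob in the container")
    expiry = q.get("se")
    if expiry:
        se = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if se - now > timedelta(days=MAX_EXPIRY_DAYS):
            findings.append(f"expiry {expiry} is more than {MAX_EXPIRY_DAYS} days out")
    elif "si" not in q:  # a stored access policy (si) may carry the expiry instead
        findings.append("no expiry (se) on token")
    return findings


# Constructed sample: an account SAS with full-control permissions and a far-off expiry.
# The sig value is a placeholder, not a real signature.
SAMPLE_URL = ("https://exampleaccount.blob.core.windows.net/models/model.ckpt"
              "?sv=2021-06-08&ss=b&srt=sco&sp=rwdlac&se=2051-01-01T00:00:00Z&sig=PLACEHOLDER")
SAMPLE_NOW = datetime(2023, 9, 1, tzinfo=timezone.utc)  # constructed reference time

if __name__ == "__main__":
    for finding in assess_sas_url(SAMPLE_URL, now=SAMPLE_NOW):
        print("FINDING:", finding)
```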