A timely reminder that some nations already have laws that apply to cutting-edge AI has just been issued by Italy’s data protection authority, which has ordered OpenAI to stop processing locals’ data with immediate effect. The order comes two days after an open letter called for a moratorium on the development of more powerful generative AI models so regulators can catch up with tools like ChatGPT.
The maker of ChatGPT may be violating the European Union’s General Data Protection Regulation (GDPR), according to the Italian DPA.
The Garante noted in particular that it has ordered ChatGPT to be blocked over concerns that OpenAI has unlawfully processed people’s data, and also because there is no system in place to prevent children from using the technology.
The San Francisco-based company has 20 days to respond to the order, with the threat of serious penalties if it fails to comply. (Remember that fines for breaches of the EU’s data protection regime can reach up to 4% of annual turnover or €20 million, whichever is greater.)
It’s worth noting that because OpenAI does not have a legal entity established in the EU, any member state’s data protection authority can take action under the GDPR if it sees risks to local users. (So where Italy leads, others may follow.)
A range of GDPR concerns
The GDPR applies whenever the personal data of EU users is processed. And it’s clear that OpenAI’s large language model has been processing this kind of data, since it can, for instance, produce biographies of named individuals in the region on demand (we know because we’ve tested it). OpenAI has not disclosed the training data used for the latest version of the technology, GPT-4. But earlier models are known to have been trained on data scraped from the Internet, including forums such as Reddit. So if you’ve been online for a while, there’s a good chance the bot knows your name.
Moreover, ChatGPT has been shown to produce entirely false information about named individuals, apparently fabricating details that are missing from its training data. That could raise further GDPR concerns, since the regulation gives Europeans a number of rights over their data, including the right to have errors in it corrected. It’s also unclear how, or even whether, people can ask OpenAI to correct inaccurate statements the bot has generated about them, to take one hypothetical example.
The Garante’s statement also highlights a data breach ChatGPT suffered earlier this month, when OpenAI admitted that a conversation history feature had been leaking users’ chats and said it may have exposed some customers’ financial information.
Data breaches are another area the GDPR regulates, with a focus on ensuring that entities processing personal data adequately protect it. The pan-EU law also requires companies to notify the relevant supervisory authorities of significant breaches within set time frames.
Overarching all of this is the bigger question of what legal basis OpenAI relied on to process Europeans’ data in the first place. In other words, whether this processing is lawful.
The GDPR allows for a number of possibilities, from consent to public interest, but as the Garante notes (pointing to the “mass collection and storage of personal data”), the scale of processing needed to train these large language models complicates the question of legality. The regulation also places a heavy emphasis on data minimization and contains principles requiring transparency and fairness. Yet the (now) for-profit company behind ChatGPT does not, at the very least, appear to have informed the people whose data it has used to train its commercial AIs. That could present it with some awkward questions.
If OpenAI has processed Europeans’ data unlawfully, DPAs across the bloc could order the data to be deleted; but whether that would force it to retrain models built on improperly obtained data is unclear, as existing law strains to keep pace with cutting-edge technology.
On the other hand, Italy might have accidentally outlawed all machine learning.
The DPA writes in its statement today [which we’ve translated from Italian using AI]: “[T]he Privacy Guarantor notes the lack of information provided to users and all interested parties whose data is collected by OpenAI, but more importantly, the absence of a legal basis that justifies the mass collection and storage of personal data for the purpose of ‘training’ the algorithms underlying the operation of the platform.”
The information provided by ChatGPT “does not always correspond to the true data, as indicated by the checks carried out,” it continued, resulting in inaccurate processing of personal data.
The authority also raised concerns that OpenAI may be processing children’s data, since it takes no steps, such as deploying age verification technology, to actively exclude anyone under the age of 13 from signing up to use the chatbot.
The regulator has been notably active on risks to children’s data; it recently ordered a similar ban on Replika, an AI chatbot for virtual friendships, over concerns about children’s safety. In recent years, it has also pursued TikTok over minors’ use of the service, forcing the company to delete more than 500,000 accounts it could not confirm did not belong to minors.
If OpenAI cannot conclusively verify the age of users it has signed up in Italy, it could therefore be forced to delete their accounts and start over with a more robust sign-up process.
OpenAI has been contacted for a response to the Garante’s order.