You know you’ve made a mistake when you’ve managed to upset the White House, the TIME Person of the Year, and one of pop culture’s most passionate fanbases all at once. That’s what happened last week, when explicit, AI-generated deepfake images of Taylor Swift went viral on X, the Elon Musk-owned platform formerly known as Twitter.
One post featuring the nonconsensual, explicit deepfakes racked up over 45 million views and hundreds of thousands of likes, and countless other accounts reshared the images in separate posts. Once an image has circulated that widely, it becomes nearly impossible to remove.
X lacks the infrastructure to identify and remove abusive content quickly and at scale. This was already a hard problem in the Twitter era, and it has gotten much worse since Musk gutted a substantial portion of the company’s workforce, including a significant share of its trust and safety staff. So Taylor Swift’s massive, devoted fanbase took matters into their own hands, flooding search results for queries like “taylor swift ai” and “taylor swift deepfake” to make it harder for users to find the abusive images. As the White House press secretary called on Congress to act, X temporarily banned the search term “Taylor Swift”; anyone who searched the musician’s name saw a notice that an error had occurred.
This content moderation failure drew national attention because it involved Taylor Swift, one of the most famous people in the world. But if social platforms can’t protect someone with her profile, whom can they protect?
“In cases similar to Taylor Swift’s, where many people have experienced similar situations, the level of support received may not be equal due to differences in influence. Consequently, individuals may not have access to crucial support networks,” explained Dr. Carolina Are, a fellow at Northumbria University’s Centre for Digital Citizens in the U.K., in an interview with TechCrunch. “These communities of care are often the last resort for users facing such situations, highlighting the shortcomings of content moderation.”
Banning the search term “taylor swift” is like putting a piece of Scotch tape on a burst pipe. Users find obvious workarounds, the same way TikTok users search for “seggs” instead of “sex.” The search block may have made X look like it was doing something, but it did nothing to stop people from simply searching for “t swift” instead. Mike Masnick, founder of the Copia Institute and Techdirt, called the effort an excessively blunt instrument for trust and safety.
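To see why a term ban is such a blunt instrument, here is a minimal sketch of how an exact-phrase blocklist behaves; the block list, function name, and example queries are hypothetical illustrations, not X’s actual implementation.

```python
# Hypothetical illustration, not X's actual code: a search filter that
# blocks any query containing an exact phrase from a blocklist.
BLOCKED_PHRASES = {"taylor swift"}

def is_query_blocked(query: str) -> bool:
    """Return True if the lowercased, whitespace-collapsed query contains a blocked phrase."""
    normalized = " ".join(query.lower().split())
    return any(phrase in normalized for phrase in BLOCKED_PHRASES)

print(is_query_blocked("Taylor Swift"))          # True: the ban catches the obvious query
print(is_query_blocked("taylor swift deepfake")) # True: still contains the exact phrase
print(is_query_blocked("t swift"))               # False: a trivial variant slips through
print(is_query_blocked("taylorswift ai"))        # False: so does removing the space
```

Anything beyond the literal phrase, abbreviations, misspellings, or nicknames, passes straight through.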
“Platforms often fail to empower women, non-binary individuals, and queer individuals to have control over their bodies, leading to the replication of offline systems of abuse and patriarchy,” stated Are. “If moderation systems fail to respond promptly during a crisis or fail to address user reports effectively, there is a significant issue.”
What actions could X have taken to avoid the Taylor Swift controversy?
Are asks these questions in her research, arguing that social platforms need a complete overhaul of how they handle content moderation. She recently conducted a series of roundtable discussions with 45 internet users from around the world who have been affected by censorship and abuse, gathering their recommendations for how online platforms can enact meaningful change.
One recommendation is for social media platforms to be more transparent with individual users about decisions regarding their account or their reports about other accounts.
“Access to case records is restricted, despite the fact that platforms possess the necessary information. However, they choose not to disclose it,” stated Are. “I believe that addressing abuse requires a response that is tailored to the individual’s circumstances, is grounded in the specific context, and is delivered promptly. This response should ideally involve direct communication, if not in-person assistance.”
X recently announced plans to hire 100 content moderators for a new “Trust and Safety” center in Austin, Texas. But under Musk’s leadership, the platform has not set a strong precedent for protecting marginalized users from abuse. It can also be hard to take Musk at his word, given his long track record of failing to deliver on his promises. When he first acquired Twitter, Musk said he would form a content moderation council before making any major decisions. That never happened.
When it comes to AI-generated deepfakes, the responsibility extends beyond social platforms. It’s also on the companies that create consumer-facing generative AI products.
According to an investigation by 404 Media, the abusive depictions of Swift came from a Telegram group devoted to creating nonconsensual, explicit deepfakes. Members of the group often use Microsoft Designer, which draws on OpenAI’s DALL-E 3 to generate images from text prompts. Microsoft has since closed a loophole that let users generate images of celebrities with prompts like “taylor ‘singer’ swift” or “jennifer ‘actor’ aniston.”
Shane Jones, a principal software engineering lead at Microsoft, wrote in a letter to the Washington state attorney general that he discovered vulnerabilities in DALL-E 3 back in December that made it possible to bypass some of the safeguards designed to prevent the model from generating and disseminating harmful images.
Jones alerted Microsoft and OpenAI to the vulnerabilities, but after two weeks he had received no indication that the issues were being addressed. So he posted a letter on LinkedIn urging OpenAI to temporarily suspend DALL-E 3’s availability. Jones alerted Microsoft to the letter as well, and the company quickly asked him to take it down.
In his letter to the state attorney general, Jones stressed that companies must be held accountable for the safety of their products and for disclosing known risks to the public. “Employees who have concerns should not feel pressured to remain silent,” he wrote.
OpenAI said it promptly investigated Jones’ report and found that the technique he described did not bypass its safety systems.
“We have taken measures to ensure that the DALL-E 3 model is not trained on explicit or harmful content, such as graphic sexual and violent images. Additionally, we have implemented strong image classifiers to guide the model in generating safe and non-harmful images,” stated a representative from OpenAI. “We’ve also implemented additional safeguards for our products, ChatGPT and the DALL-E API – including declining requests that ask for a public figure by name.”
OpenAI added that it uses external red teaming to test its products for potential misuse. It’s still unconfirmed whether Microsoft Designer was the tool behind the explicit Swift deepfakes, but the fact stands that as of last week, both journalists and bad actors on Telegram were able to use the software to generate images of celebrities.
As influential companies pour money into AI, platforms need to take a proactive approach to regulating abusive content. But even in an era when making celebrity deepfakes wasn’t so easy, violative behavior regularly evaded moderation.
“The observation highlights the inherent unreliability of platforms,” Are remarked. “Marginalized communities must place their trust in their followers and fellow users rather than relying solely on those who hold the technical responsibility for our online safety.”