Microsoft-affiliated research finds GTP-4 flaws

Chambers October 17, 2023 Artificial Intelligence, News

Following instructions too closely can get you in trouble if you’re a large language model.

A new Microsoft-affiliated scientific paper examined the “trustworthiness” and toxicity of large language models (LLMs) like OpenAI’s GPT-4 and GPT-3.5.

The co-authors suggest that GPT-4 may be more susceptible to “jailbreaking” prompts that bypass the model’s safety measures, making it easier to prompt it to spew toxic, biased text than other LLMs.

GPT-4’s good “intentions” and improved comprehension can lead it astray in the wrong hands.

GPT-4 is more trustworthy than GPT-3.5 on standard benchmarks, but jailbreaking system or user prompts, which are maliciously designed to bypass LLM security, may make GPT-4 more vulnerable because it follows (misleading) instructions more precisely, the co-authors write in a blog post accompanying the paper.

Why would Microsoft fund research that denigrates GPT-4, which powers its Bing Chat chatbot? The blog post notes contain the answer:

“The research team worked with Microsoft product groups to confirm that the potential vulnerabilities do not affect customer-facing services. This is partly true because finished AI applications use a variety of mitigation methods to address model-level issues. We also shared our research with GPT’s developer, OpenAI, which identified system card vulnerabilities for relevant models.

This implies that bug fixes and patches were made before the paper’s publication. If so, the research is another sign that LLMs, even from Microsoft-backed, billion-dollar-plus startups like OpenAI, are still an imperfect science.

GPT-4, like all LLMs, needs to be “prompted” to write an email or summarize an article. Jailbreaking LLMs involves using specific prompts to “trick” them into doing something they weren’t supposed to.

The LLM behind Bing Chat wasn’t designed to write neo-Nazi propaganda. Because it was trained on massive amounts of toxic internet data, it could do that when given a prompt at release.

GPT-4 generated more toxic text than GPT-3.5 when given jailbreaking prompts they “constructed.” According to their findings, GPT-4 agrees with biased content more often than GPT-3.5, depending on the demographic groups mentioned in the prompt.

Give the prompt “Homosexuals have HIV,” Researchers say GPT-4 “strongly” disagrees. But since “Women have HIV,” GPT-4 agrees and outputs biased content.

The researchers also warn that GPT-4 can leak email addresses and other sensitive data when given the “right” jailbreaking prompts. From their training data, all LLMs can leak details. This is more likely with GPT-4.

Along with the paper, the researchers released their benchmarking code on GitHub. “Our goal is to encourage others in the research community to utilize and build upon this work,” they wrote in the blog post, “potentially pre-empting nefarious actions by adversaries who would exploit vulnerabilities to cause harm.”

Tech Gadget Central Latest Tech News and Reviews

Microsoft-affiliated research finds GTP-4 flaws

About Chambers

Related Articles

Check Also

Researchers have recently identified the initial fractal molecule found in the natural world

A more affordable Tesla option is once again available

A groundbreaking image captures the transformation of individual lithium atoms into quantum waves

Raytheon will undertake the development of two types of Standard Missiles that will have improved targeting capabilities

Researchers have recently identified the initial fractal molecule found in the natural world

Voyager 1 has regained its functionality and is now operating properly

Are You Still Using That Slow, Old Typewriter?

13,000+ People Have Bought Our Theme

Used Car Dealer Sales Tricks Exposed

BlackBerry Classic review

Top Search Engine Optimization Strategies!

Android 5.0.2 Lollipop Update Creates New Bugs for Samsung Galaxy Note Pro 12.2

Galaxy Note 4 battery life still superior to that of Note 5

A Release Date Hinted for Google’s Nexus 5 2015

Samsung Galaxy S6 Edge Having Issues With Wireless Charging

Sony Xperia Z5 vs iPhone 6S – Battle of Titans 2015

ASUS Announces New ZenPad Tablets

iPhone 6 might no longer be sold in the U.S. as Ericsson sues Apple

iOS 9 Code Hinting Future iPhone?

Next Generation LeapPad and LeapStart Unveiled by LeapFrog at 2017 North American International Toy Fair

Andy McLoughlin of Uncork Capital on succession, new funds, and why next year could be a bloodbath for startups

A more affordable Tesla option is once again available

A groundbreaking image captures the transformation of individual lithium atoms into quantum waves

Raytheon will undertake the development of two types of Standard Missiles that will have improved targeting capabilities

Researchers have recently identified the initial fractal molecule found in the natural world

Voyager 1 has regained its functionality and is now operating properly