Large language models (LLMs) like OpenAI’s ChatGPT all share the same problem: they make things up.
The mistakes range from bizarre and harmless—claiming the Golden Gate Bridge was transported across Egypt in 2016—to dangerous and problematic.
A mayor in Australia threatened to sue OpenAI after ChatGPT falsely claimed he had pleaded guilty in a major bribery scandal. Researchers have found that LLM hallucinations can be exploited to distribute malware to unsuspecting software developers. And LLMs often give bad mental health and medical advice, like claiming that drinking wine can “prevent cancer.”
Hallucination occurs because of the way today’s LLMs, and all generative AI models, are developed and trained, which effectively encourages them to invent “facts.”
Training models
Generative AI models are statistical systems that predict words, images, speech, music, or other data. Fed an enormous number of examples, usually sourced from the public web, the models learn how likely data is to occur based on patterns, including the context of surrounding data.
For example, given an email ending in the fragment “Looking forward…,” an LLM might complete it with “…to hearing back,” following the pattern of the countless emails it was trained on. That doesn’t mean the LLM is actually looking forward to anything.
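To make that concrete, here is a deliberately tiny sketch in Python: a toy “bigram” model built from a few invented email fragments, nothing like a real LLM in scale or architecture, that predicts the next word purely from observed frequencies.

```python
# Toy illustration only: a tiny bigram "model" built from a few invented
# email fragments. Real LLMs are neural networks trained on billions of
# documents, but the core idea is the same: predict the next word from
# patterns observed in the training data.
from collections import Counter, defaultdict

corpus = [
    "looking forward to hearing back",
    "looking forward to meeting you",
    "looking forward to the weekend",
    "thanks and looking forward to it",
]

# Count how often each word follows the word before it.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev_word, next_word in zip(words, words[1:]):
        bigram_counts[prev_word][next_word] += 1

def next_word_probabilities(prev_word):
    counts = bigram_counts[prev_word]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# "forward" is always followed by "to" in the toy corpus, so the model
# completes the phrase that way. No intent or feeling is involved.
print(next_word_probabilities("forward"))  # {'to': 1.0}
print(next_word_probabilities("to"))       # {'hearing': 0.25, 'meeting': 0.25, ...}
```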
Sebastian Berns, a Ph.D. researcher at Queen Mary University of London, said in an email interview that the current framework of training LLMs involves concealing, or “masking,” previous words for context and having the model predict which words should replace them. “This is like using predictive text in iOS and pressing one of the suggested next words,” he said.
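For illustration only, the snippet below uses Hugging Face’s transformers fill-mask pipeline with a BERT model to show masked-word prediction in action. The model choice is an assumption for the example; chat-style LLMs are trained with related but not identical objectives.

```python
# Illustrative sketch of the masking idea Berns describes, using the
# Hugging Face transformers fill-mask pipeline with BERT (an assumption
# for this example, not the exact setup used by chat-style LLMs).
# Requires: pip install transformers torch
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Hide ("mask") one word and ask the model to predict what belongs there,
# much like tapping a suggested word on a phone keyboard.
for prediction in fill_mask("Looking forward to [MASK] back from you."):
    print(prediction["token_str"], round(prediction["score"], 3))
```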
This probability-based approach scales well—mostly. While the range of words and their probabilities may produce comprehensible text, it’s not guaranteed.
LLMs can generate text that is grammatically correct but nonsensical, like the Golden Gate claim. They can spout untruths, propagating inaccuracies in their training data. They can also conflate different sources of information, including fictional ones, even when those sources clearly contradict each other.
It isn’t malicious on the LLMs’ part. Truth and falsehood mean nothing to them. They have simply learned to associate certain words and phrases with certain concepts, even when those associations are inaccurate.
Hallucinations are caused by an LLM’s inability to estimate the uncertainty of its own predictions, Berns said. “LLMs are trained to always produce an output, even when the input is very different from the training data. A standard LLM cannot determine if it can reliably answer a query or make a prediction.”
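Berns’s point can be sketched with a toy example: a model that samples something from its next-token distribution whether that distribution is sharply peaked or nearly flat. The entropy measure below is only an illustrative proxy for uncertainty, not something standard LLMs consult before answering.

```python
# Toy illustration: a standard language model samples something from its
# next-token distribution no matter what, even when the distribution is
# nearly flat. The entropy value here is an illustrative proxy for
# uncertainty, not a mechanism deployed LLMs actually use to refuse.
import math
import random

def entropy(distribution):
    """Shannon entropy in bits; higher means flatter, i.e. less certain."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

def sample(distribution):
    """The model always returns something, confident or not."""
    tokens, weights = zip(*distribution.items())
    return random.choices(tokens, weights=weights, k=1)[0]

confident = {"Paris": 0.95, "Lyon": 0.03, "Nice": 0.02}              # familiar prompt
unsure = {"Paris": 0.26, "Cairo": 0.25, "Oslo": 0.25, "Lima": 0.24}  # unfamiliar prompt

for name, dist in [("confident", confident), ("unsure", unsure)]:
    print(name, "entropy:", round(entropy(dist), 2), "sampled:", sample(dist))
```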
Solving hallucinations
Does hallucination have a solution? It depends on what you mean by “solved.”
Vu Ha, an applied researcher and engineer at the Allen Institute for Artificial Intelligence, says LLMs “do and will always hallucinate.” But he also believes hallucinations can be reduced, though not eliminated, depending on how an LLM is trained and deployed.
“Consider a question answering system,” Ha emailed. “It can be engineered to have high accuracy by curating a high-quality knowledge base of questions and answers and connecting it to an LLM to provide accurate answers via a retrieval-like process.”
Ha illustrated the difference between an LLM backed by a “high quality” knowledge base and one with less careful data curation. He put the question “Who are the authors of the Toolformer paper?” (Toolformer is an AI model trained by Meta) to Microsoft’s LLM-powered Bing Chat and Google’s Bard. Bing Chat correctly listed all eight Meta co-authors, while Bard misattributed the paper to researchers at Google and Hugging Face.
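Here is a rough sketch of the retrieval-style pattern Ha describes. The knowledge base, the crude string matching, and the ask_llm callback are all placeholder assumptions for illustration, not any product’s actual implementation.

```python
# Rough sketch of a retrieval-style question answering flow: consult a
# curated knowledge base first, then hand only the retrieved text to the
# LLM. Everything here (the knowledge base, the string matching, the
# ask_llm callback) is a simplified placeholder.
from difflib import SequenceMatcher

KNOWLEDGE_BASE = {
    "who are the authors of the toolformer paper?":
        "The Toolformer paper was written by eight researchers at Meta.",
}

def retrieve(question, threshold=0.8):
    """Return the best-matching curated answer, or None if nothing is close enough."""
    best_answer, best_score = None, 0.0
    for known_question, answer in KNOWLEDGE_BASE.items():
        score = SequenceMatcher(None, question.lower(), known_question).ratio()
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer if best_score >= threshold else None

def answer(question, ask_llm):
    context = retrieve(question)
    if context is None:
        return "I don't have a reliable source for that."
    # Ground the model in the retrieved text instead of letting it free-associate.
    prompt = f"Answer using only this source:\n{context}\n\nQuestion: {question}"
    return ask_llm(prompt)
```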
Any deployed LLM-based system will hallucinate. “The real question is if the benefits outweigh the negative outcomes of hallucination,” Ha said. Thus, if a model is helpful but occasionally misses a date or name, it may be worth the trade-off. “It’s a question of maximizing AI expected utility,” he said.
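Ha’s framing boils down to back-of-the-envelope arithmetic. The numbers below are invented purely for illustration.

```python
# Invented numbers, purely for illustration: suppose a correct answer is
# worth +1, a hallucinated one costs -5, and the model hallucinates 5%
# of the time.
p_hallucination = 0.05
value_correct = 1.0
cost_hallucination = -5.0

expected_utility = (1 - p_hallucination) * value_correct + p_hallucination * cost_hallucination
print(expected_utility)  # 0.7, still positive, so the trade-off may be acceptable
```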
Reinforcement learning from human feedback (RLHF) has also been used to reduce hallucinations in LLMs, according to Berns. Introduced by OpenAI in 2017, RLHF involves training an LLM, gathering additional data to train a “reward” model, and fine-tuning the LLM with the reward model via reinforcement learning.
In RLHF, an LLM generates text from a set of predefined prompts. Human annotators then rank the outputs by “helpfulness,” and those rankings are used to train the reward model. The reward model, which can now score any generated text according to how humans are likely to perceive it, is then used to fine-tune the LLM’s responses.
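A highly simplified sketch of those three stages might look like the following. Every name in it is a placeholder, not a real library API; actual RLHF pipelines (typically built around algorithms like PPO) involve far more machinery.

```python
# Highly simplified sketch of the three RLHF stages described above.
# Every function and object (llm.generate, collect_human_rankings,
# train_reward_model, ppo_finetune) is a placeholder name, not a real
# library API.

def rlhf(llm, prompts, collect_human_rankings, train_reward_model, ppo_finetune):
    # 1. The base LLM generates several candidate responses per prompt.
    candidates = {p: [llm.generate(p) for _ in range(4)] for p in prompts}

    # 2. Human annotators rank the candidates by helpfulness; the rankings
    #    are used to train a reward model that can score any response.
    rankings = collect_human_rankings(candidates)
    reward_model = train_reward_model(rankings)

    # 3. Reinforcement learning fine-tunes the LLM to produce responses
    #    the reward model scores highly.
    return ppo_finetune(llm, prompts, reward_model)
```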
OpenAI used RLHF to train several of its models, including GPT-4. But even RLHF isn’t perfect, Berns warned.
“I believe the space of possibilities is too large to fully ‘align’ LLMs with RLHF,” Berns said. RLHF often trains a model to give an “I don’t know” answer to a tricky question, relying on human domain knowledge and hoping the model generalizes it to its own domain knowledge. Often it does, but it can be finicky.
Alternative views
But if hallucination can’t be solved with today’s LLMs, is that necessarily a bad thing? Berns doesn’t think so. He suggests that hallucinating models could serve as a “co-creative partner,” producing outputs that may not be entirely factual but that still contain useful threads to pull on. Used creatively, hallucination can yield unexpected results or novel combinations of ideas.
“Hallucinations” are a problem if LLM-generated statements are factually incorrect or violate general human, social, or specific cultural values in situations where a person relies on the LLM to be an expert, he said. But in creative or artistic tasks, unexpected outputs can be useful: a human recipient may be surprised by a response to a query and be pushed in a certain direction of thought, which may lead to a novel connection of ideas.
Ha argued that today’s LLMs are held to an unreasonable standard; humans “hallucinate” too, when we misremember or otherwise misrepresent the truth. He believes LLMs cause a kind of cognitive dissonance because their outputs look good on the surface but contain errors on closer inspection.
“LLMs, like any AI technique, are imperfect and make mistakes,” he said. “Since we expect and accept imperfections, we’re OK with AI systems making mistakes. But it’s more complicated when LLMs make mistakes.”
In the end, the answer may not lie in the technical details of how generative AI models work. Insofar as there is a “solution” to hallucination today, it is to treat models’ predictions with a healthy dose of skepticism.