Artificial intelligence is evolving fast—but our understanding of how it really works hasn’t caught up. Despite widespread use of generative AI tools, even experts often struggle to explain what’s going on under the hood—especially when things go wrong. So what causes AI hallucinations and bias? A surprising new answer may lie in the laws of physics.
Neil Johnson, a physicist and professor at George Washington University, is bringing a fresh perspective to this mystery. His new paper, co-authored with researcher Frank Yingjie Huo, suggests that the problems plaguing AI, such as hallucinated responses and skewed predictions, may be baked into the very architecture of today’s large language models (LLMs).
In a study titled "Capturing AI's Attention: Physics of Repetition, Hallucination, Bias and Beyond," Johnson applies principles from quantum mechanics to examine the AI Attention mechanism, the core component that decides how much weight an LLM gives each piece of context as it generates coherent text. This approach offers new insights into why models like ChatGPT sometimes spiral into inaccuracies.
At its core, Johnson's theory reframes the Attention mechanism as a type of quantum system. Think of each token, like a word in a sentence, as a particle or "spin." That token sits inside a "spin bath" of other semantically related words. These spin baths interact based on how the model has been trained. According to Johnson, current LLMs function like a 2-body Hamiltonian: a simplified energy description in which elements only ever interact in pairs. But this design is where the trouble starts.
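To make the analogy concrete, here is a rough sketch in standard Transformer notation (my illustration, not the paper's exact formulation): the attention weights that decide how much each token "listens to" every other token are built purely from pairwise comparisons, which is the same structural restriction a 2-body Hamiltonian imposes on a spin system.

```latex
% Rough sketch of the analogy, not the paper's notation.
% Scaled dot-product attention compares tokens strictly in pairs (via QK^T):
\[
  \mathrm{Attn}(Q, K, V) \;=\; \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V
\]
% A 2-body Hamiltonian has the same restriction: the total energy is a sum of
% pairwise couplings J_{ij} between spins, with no higher-order terms:
\[
  H_{\text{2-body}} \;=\; \sum_{i<j} J_{ij}\, \mathbf{s}_i \cdot \mathbf{s}_j
\]
```

By definition, a 2-body Hamiltonian contains no three-way or higher-order interaction terms, which is one way to read the "simplified" in the comparison above.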
His math shows that a 2-body model can't always manage the complexity of language. It becomes highly sensitive to noise, such as bias in training data. That sensitivity means some words or ideas can get amplified disproportionately, leading to unpredictable, even dangerous, outputs—what we commonly call AI hallucinations.
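A toy numerical illustration of that amplification effect (the scores and setup are invented for demonstration and are not Johnson's model): because attention weights come out of a softmax over pairwise scores, a modest bias toward one token translates into a disproportionate share of the attention.

```python
import numpy as np

# Toy illustration (not Johnson's model): a small bias in one pairwise
# attention score is amplified disproportionately by the softmax.
# All score values below are invented for demonstration.

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())          # subtract max for numerical stability
    return e / e.sum()

# Raw relevance scores between one query token and four context tokens.
scores_balanced = np.array([1.0, 1.0, 1.0, 1.0])
scores_biased   = np.array([1.0, 1.0, 1.0, 2.5])  # one token over-represented in training

print(softmax(scores_balanced))  # -> [0.25 0.25 0.25 0.25]: attention spread evenly
print(softmax(scores_biased))    # -> roughly [0.13 0.13 0.13 0.60]: one token dominates
```

A score bump of 1.5 turns an even 25% split into a single token grabbing about 60% of the attention, which is the kind of disproportionate amplification the paragraph above describes.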
So why stick with this flawed model? The answer, Johnson suggests, is inertia. Today's AI systems weren't built from deep theoretical understanding—they evolved through trial, error, and scaling what happened to work. Like the rail gauges of the 1800s, this design got locked in because it was "good enough" and widely adopted.
History offers a useful parallel. George Stephenson's narrower rail gauge became the British standard despite Isambard Kingdom Brunel's superior broad-gauge system. Brunel's design promised smoother, faster travel, but Stephenson's had already won over investors and regulators. In the same way, today's AI systems may not be the best possible design—but they're embedded and profitable.
Still, Johnson’s findings offer hope. He proposes that by understanding the physics behind Attention, we can better predict when AI might veer off track. His equations suggest it’s possible to estimate how often a model will hallucinate, depending on the quality of its training. For a poorly trained model, it might be every 200 words. For a more robust one, it could be every 2,000.
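As a back-of-envelope illustration of what such an estimate could look like (this is not Johnson's actual equation, just the simplest possible independence assumption): if each generated token goes off-track with some small probability p, the expected spacing between hallucinations is roughly 1/p tokens.

```python
# Back-of-envelope sketch, not Johnson's formula: assume each generated token
# independently "goes off-track" with probability p; the mean gap between
# failures is then 1/p tokens. The p values below mirror the 200 vs 2,000
# figures quoted above and are purely illustrative.

def expected_gap(p_per_token: float) -> float:
    """Mean number of tokens between failures for a per-token failure rate p."""
    return 1.0 / p_per_token

for label, p in [("poorly trained", 1 / 200), ("well trained", 1 / 2000)]:
    print(f"{label}: roughly one hallucination every {expected_gap(p):,.0f} tokens")
```

A real estimate would of course depend on the prompt, the domain, and the quality of training rather than a single fixed per-token probability, but even this crude version shows how a per-token error rate converts into the "every 200 words versus every 2,000 words" framing.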
This introduces a new concept: AI risk management through mathematical modeling. Just as insurance companies rely on actuarial data, AI developers and regulators could start using predictive formulas to gauge when and how AI errors might occur. This would allow more controlled deployment of AI tools in critical fields like healthcare, law, and national security.
The implications are huge. Rather than just reacting to AI failures, companies could assess risk ahead of time—similar to running safety checks on planes or nuclear reactors. It’s not about reinventing AI from scratch, but understanding its limitations with the same rigor that governs engineering and science.
Ultimately, Johnson’s work gives us a new lens through which to view AI systems—one that could help us build trust, improve reliability, and develop more secure models for the future. The hallucinations aren’t just random quirks. They’re signals from a system that, despite its impressive fluency, still needs deeper grounding in reality.