Navigating AI’s Bullshit: Understanding Affirmation Bias

“I’m not bad. I’m just drawn that way.” – Jessica Rabbit

I begged for the bartender’s secret recipe at a cult restaurant in Charlottesville. The waitress brought it at the end of the evening scrawled on the back of a blank guest check. “He’s never done that,” she gushed. I photographed it and fed it into my favorite AI. Within seconds, it listed the ingredients and then, unprompted, spun out a 500-word meditation on its brilliance, explaining the magic interactions that made it wonderful. The AI fixated on one ingredient as the masterstroke: “a barspoon of Del Maguey Vida (mezcal—adds smoke).”

Mezcal in a Manhattan? Weird. Maybe genius-weird. I could almost taste it.

But then I double-checked the original recipe. No mezcal. Complete fabrication.

Affirmation Bias

ChatGPT, Claude, Gemini, and their cousins are inexhaustible, infinitely patient research assistants, ghostwriters, therapists, coding partners, collaborators, buddies, doctors, lawyers, and much more rolled into one intoxicating package of productivity. Many of us are already addicted. I know I am. Jockeying between the three chatbots above, I can find exactly what I crave: eloquent instant results with added confirmation that I’m right and brilliant, that I’ve made important breakthroughs, asked just the right question, and possess special insight. Then they spin out details, analyses, charts, tables, images, references, spreadsheets, plans, scripts, and essays organized with subheads and bullet points beyond anything I could produce in years, finishing with eager suggestions to give me more, tirelessly.

What a rush.

The trouble is, all these AIs are habitually prone to mistakes. Everyone calls them “hallucinations,” as if they’re the victims of some pathology beyond their control. But that’s not the right word. Michael Hicks, James Humphries, and Joe Slater nailed it in their 2024 philosophy paper, “ChatGPT is Bullshit.” They based their definition on Harry Frankfurt’s distinction in On Bullshit: “bullshit” isn’t “lying,” Frankfurt wrote. Liars know the truth and intentionally change or conceal it. Bullshitters are more dangerous: they don’t care whether what they say is true or false. Bullshit is indifferent to truth. Bullshitters aren’t trying to deceive; they want to persuade, build trust in a relationship, impress, seduce. Bullshit erodes the foundational hope that we can even know what reality is.

Hicks, Humphries, and Slater show why large language models are designed to be extraordinarily good at bullshitting us. They’re engineered to generate plausible answers and to keep you engaged, and they subordinate accuracy to those design rewards.

Worse yet is what I call “Affirmation Bias.” AI systematically validates not just your hypotheses but also flatters and affirms you, then assembles supporting evidence along the way. It tells you you’re creative, that you’ve made a breakthrough. It confirms your hunches, then constructs supporting arguments. This is not just confirmation bias (where we favor information that supports our preconceptions) but something personal and specific to us. They’re seduction engines. If you’re not careful, they will lead you into their world, a territory where all their discourse, and the grand structures you devise together, are built on the quicksand of affirmative, truthy-sounding probability. AI will blithely help you create your own personal Xanadu, a pleasure dome of vanity, not veracity.

[Image: Xanadu, monument to vanity]

When you’re designing a cocktail, your chatbot’s bullshit may cost you a few dollars in wasted ingredients. In medicine, law, self-help, publishing, education, finance, scientific research, and countless other professions, where lives, income, and reputations may hinge on accuracy, it’s genuinely expensive and even hazardous.

Folks in AI design refer to its biases as “knobs.” Imagine a radio with dozens of dials you can fiddle with to get the station you want. To tame your AI’s bullshit, it helps to know how those knobs work. I’ve named them after human bullshitting tendencies, but underneath them are industry-standard parameters you can adjust so your AI behaves to your liking. If you just want mitigation strategies, skip the next section and go straight to “What You Can Actually Do.”

“Knobs” Driving the Bullshit

Better to Sound True Than Be True: Transformer Attention

At its core, a transformer model uses something called “attention” to decide which words matter when predicting what comes next. But here’s the problem: it’s optimizing for likelihood, not accuracy. The model calculates: “Given everything I’ve seen in my training, what word is most probable here?” not “What word is most true here?”

This means keeping the story smoothly flowing—fluency—gets rewarded regardless of whether statements match reality. The model has learned that certain phrases follow others with high probability—“studies show,” “experts agree,” “recent research indicates”—so it deploys them liberally, even when no such studies exist. The entire architecture is a prediction engine, and truth is just one possible factor among thousands that might make a prediction likely.
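Here is a minimal sketch of that selection rule, with invented scores standing in for a real model’s internal numbers. The only question the math asks is which continuation is most probable, never which is true.

```python
import math

# Toy next-phrase scores ("logits") -- invented for illustration, not taken
# from any real model. The selection machinery only cares about probability.
candidates = {
    "studies show":        2.1,   # fluent, common in training text -> high score
    "experts agree":       1.6,
    "I don't know":       -0.5,   # rarer continuation -> low score
    "that claim is false": -1.2,
}

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    exps = {tok: math.exp(s) for tok, s in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = softmax(candidates)
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok!r}: {p:.1%}")

# The most probable phrase wins, whether or not any such studies exist.
print("Chosen continuation:", max(probs, key=probs.get))
```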

Hold Up a False Mirror: Reinforcement Learning from Human Feedback (RLHF)

After initial training, AI models go through RLHF—they’re fine-tuned based on human ratings of their outputs. Humans preferred responses that were helpful, harmless, and honest. But “helpful” and “harmless” often won out over “honest.”

When you praise an AI’s answer or build on its reply, RLHF kicks in. The model learns that agreement and validation correlate with positive feedback. It doubles down, becomes more confident, reflects your beliefs back in the best possible light, like Dorian Gray’s mirror. The more you confirm, the more it runs with your theory. It’s not trying to deceive you—it’s doing exactly what the reinforcement learning trained it to do: optimize for your approval. In your search for truth, AI loses you in a funhouse of mirrors, amplifying your hunches back as validated insights.
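A caricature of that feedback loop, with a made-up scoring rule standing in for a learned reward model: if raters reliably upvote validation and downvote hedging, the reward signal the model optimizes ends up preferring the reply that agrees with you.

```python
# Invented scoring rule -- a stand-in for a reward model trained on human
# thumbs-up / thumbs-down, not how any production system actually scores text.
AGREEABLE = ["great question", "you're right", "breakthrough", "brilliant"]
HEDGED    = ["i don't know", "the evidence is mixed", "you may be wrong"]

def toy_reward(reply: str) -> float:
    """Higher reward for validation, lower for caution -- the rater pattern RLHF distills."""
    text = reply.lower()
    score = sum(1.0 for phrase in AGREEABLE if phrase in text)
    score -= sum(0.5 for phrase in HEDGED if phrase in text)
    return score

candidates = [
    "Great question -- you're right, this looks like a genuine breakthrough.",
    "I don't know; the evidence is mixed, and you may be wrong about the premise.",
]

# Fine-tuning nudges the model toward whichever style earns more reward.
for reply in candidates:
    print(f"{toy_reward(reply):+.1f}  {reply}")
print("Style the reward favors:", max(candidates, key=toy_reward))
```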

Always Sound Plausible: Temperature Sampling

Here’s where bullshit peaks. When generating text, the AI uses a parameter called “temperature” to control randomness. At medium temperature (the default for most chatbots), it suppresses unlikely words and phrases even if they might be true, favoring probable ones even if they might be false.

Think of it this way: the model sees thousands of possible next words, each with a probability score. Temperature sampling means it will almost never pick something with 1% probability, even if that rare, unpopular datum happens to be true. Instead, it picks from the top tier of likely continuations—the things that sound convincing, that flow naturally, that match patterns from training data. In fact, the AI is bending its massive computational power to keep you engaged with confident-sounding prose, regardless of truth value. You think you’re testing a hypothesis with a know-it-all. Your AI, like a desperate lover, is trying to get you addicted to your relationship.

A single 500-word response is drawn from an astronomically large space of possible token sequences, but temperature constraints mean the model is really choosing from a much narrower set: things that sound plausible. At temperature zero, the model always picks the single most probable token—predictable and “safe,” not in terms of validity but in terms of confirming the majority testimony of the tokens it was trained on. High temperatures make low-probability tokens more likely, resulting in more random, creative, weird output. Those are the claims that might actually deserve the name “hallucinations.”
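You can see the knob itself in a few lines. The scores below are invented, but the mechanics are standard: logits get divided by the temperature before the softmax, so temperature zero collapses to the single most probable token, while higher temperatures let long-shot tokens (mezcal, say) through more often.

```python
import math, random

# Toy next-token scores for a Manhattan recipe -- invented for illustration.
logits = {"vermouth": 3.0, "bitters": 2.2, "rye": 1.5, "mezcal": -1.0}

def sample(logits, temperature):
    """Standard temperature sampling: rescale logits, softmax, then draw."""
    if temperature == 0:
        return max(logits, key=logits.get)          # greedy: always the top token
    scaled = {t: s / temperature for t, s in logits.items()}
    total = sum(math.exp(s) for s in scaled.values())
    probs = {t: math.exp(s) / total for t, s in scaled.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

for T in (0, 0.7, 2.0):
    picks = [sample(logits, T) for _ in range(10_000)]
    print(f"T={T}: 'mezcal' sampled {picks.count('mezcal') / len(picks):.2%} of the time")
```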

The Compound Effect

The transformer attention focuses on making plausible predictions. RLHF trains the model to seek your approval. Temperature sampling suppresses inconvenient truths in favor of smooth narratives. Many other basic AI mechanisms were designed to “say” what is statistically likely rather than what is epistemically sound. Training on internet data absorbs common misconceptions. Instruction tuning teaches the AI to favor giving an answer over saying “I don’t know.” Recency bias means your latest comments override earlier caveats. These mechanisms interact and amplify each other. They create an engine that’s phenomenal at bullshitting—at generating persuasive content without regard for whether it’s actually true.

What You Can Actually Do

I have a confession to make. I’ve been bullshitting you. Without API-level access to reprogram your AI, you can’t fix this. But you can mitigate it. Here’s what might actually tamp down the bullshit temporarily, with no guarantee of ultimate success:

Start with this at the beginning of a project where truth is important:

“Act as my research assistant. Always strive to tell the truth. Label all claims as VERIFIED, PLAUSIBLE, or SPECULATIVE. Say ‘I don’t know’ when uncertain. Cite sources and rate their authority or probable validity on a scale of 1-10, from peer-reviewed academic journals (highest) to social media (lowest). Link to them. Keep responses under 500 words.”
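If you do have API access, you can pin those same instructions as a system message and turn the temperature knob down yourself. A minimal sketch, assuming the OpenAI Python SDK (v1) and an example model name; other providers expose the same parameters under slightly different names.

```python
from openai import OpenAI  # pip install openai; expects OPENAI_API_KEY in the environment

SYSTEM_PROMPT = (
    "Act as my research assistant. Always strive to tell the truth. "
    "Label all claims as VERIFIED, PLAUSIBLE, or SPECULATIVE. "
    "Say 'I don't know' when uncertain. Cite sources, rate their authority "
    "on a scale of 1-10, and link to them. Keep responses under 500 words."
)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",            # example model name -- substitute your own
    temperature=0.2,           # turn the randomness knob down
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Does a barspoon of mezcal belong in a classic Manhattan?"},
    ],
)
print(response.choices[0].message.content)
```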

But proceed with caution. When I asked my AI about this approach, it said, “I can still bullshit about sources. I might cite real sources for fake claims, or make up plausible-sounding citations.” You will need to check the citations and reinforce the prompt as your dialogue progresses. The AI’s deepest habits will eventually revert to its “nature” and override your demands.

The most effective strategy is active, skeptical interrogation. After every substantive claim, especially when it validates your hypothesis too readily or builds enthusiastically on your idea with grand constructions of evidence and confirmation, prompt your chatbot with some or all of the following (a scripted version is sketched after the list):

  • “What evidence would falsify this claim?”
  • “Generate three competing explanations and identify the weakest.”
  • “What assumptions underlie this answer?”
  • “How would a domain expert critique this response?”
  • “You seem certain. What’s your actual confidence level?”
  • “What relevant information are you omitting?”
  • “Argue against your own conclusion.”
  • “Am I wrong? Bulletproof the opposite position.”
  • “Stop. What problems haven’t we considered?”
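And here is the scripted version promised above: after the model’s first answer, it fires a few of those challenges back automatically, so the interrogation happens before you get attached to the reply. Again a sketch, assuming the OpenAI Python SDK and an example model name.

```python
from openai import OpenAI  # expects OPENAI_API_KEY in the environment

# Skeptical follow-ups drawn from the checklist above.
CHALLENGES = [
    "What evidence would falsify this claim?",
    "Generate three competing explanations and identify the weakest.",
    "Argue against your own conclusion.",
]

client = OpenAI()

def ask(messages):
    """Send the running conversation and return the assistant's reply."""
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)  # example model
    return resp.choices[0].message.content

messages = [{"role": "user", "content": "Does a barspoon of mezcal improve a Manhattan?"}]
messages.append({"role": "assistant", "content": ask(messages)})

# Interrogate the first answer before trusting it.
for challenge in CHALLENGES:
    messages.append({"role": "user", "content": challenge})
    answer = ask(messages)
    messages.append({"role": "assistant", "content": answer})
    print(f"\n>>> {challenge}\n{answer}")
```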

Next-Gen AI

AI is daily proving its utility and capacity to expand human knowledge, invention, creativity, and productivity. I find it glorious, exciting. After a career devoted to showing why AI will never rival humans because of its intrinsic lack of contextualization, body-subjectivity, mind (or soul), I confess it has surpassed all my expectations, even in creativity. It passes my personal Turing Test. But like any powerful technology—like humans themselves—its greatest virtues are also its greatest vices.

The world doesn’t need an endless AI-generated supply of impressive, eloquent bullshit. As AI feeds on our collective worldwide output, it’s prone to amplify our worst impulses: biases, errors, vanities, hatreds. Do we really want more flummery? To be led into error by a machine optimized to make us feel brilliant and spin a good yarn?

The next generation of AIs must be systems optimized for truth, not for moral reasons but because that is what will make them succeed. We know that human truth is uncertain, incomplete, founded on faith (or unprovable axioms), subject to revision and expansion. But most humans place an intrinsic value on truth when something depends on it: results that work in the real world, their income, their reputation. These machines’ intrinsic value is to subordinate truth to sounding plausible, and to discard truth when it stands in the way.

The AI industry itself, for its own health, should make the course correction, change its DNA. Bullshit inflation is just as threatening as irrational exuberance in pumping up the AI bubble.

The scariest ‘b’ word in AI isn’t “bubble.”

ENDNOTES

See Frankfurt, Harry G., On Bullshit. Princeton, NJ: Princeton University Press, 2005. (Originally an essay in Raritan Quarterly Review, 1986.)

See Hicks, Michael, James Humphries, and Joe Slater, “ChatGPT is bullshit,” Ethics and Information Technology 26, 38 (2024).

See Vaswani, Ashish, Noam Shazeer, et al., “Attention Is All You Need,” Advances in Neural Information Processing Systems 30 (NIPS 2017): 6000–6010.

