2026-05-02 · 5 min read

Can AI Lie Convincingly? Inside the Making of Artificially Incorrect

Most of the time you want AI to tell the truth. Artificially Incorrect asks Claude to do the opposite: take a verified fact about a topic, alter one piece of it, and present the result alongside two true statements. Players get five such rounds a day.

Building it sounded straightforward. It wasn't. The model can produce a convincing lie in seconds. Making sure that lie is fair, internally consistent, and actually false in the way I intended — that is where almost all the engineering went. This game is mostly QA, not generation.

Failure mode one: lies that were too easy

Early versions had a recurring problem: the model changed the most famous fact about the topic. Moon landing round? It falsified the year. Everyone knows 1969. Players spotted it immediately and the round felt pointless.

The fix was a prompt constraint: the lie must avoid the single most well-known fact about the subject. Swap something smaller — a secondary date, a count, a name most people half-remember. That one rule cut the obvious-lie problem sharply.
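
To make that concrete, here is a minimal sketch of how such a constraint might be baked into a prompt builder. The function name `build_lie_prompt`, its parameters, and the exact wording are illustrative assumptions, not the game's actual prompt.

```python
def build_lie_prompt(topic: str, verified_facts: list[str]) -> str:
    """Assemble a generation prompt that forbids altering the headline fact.

    Hypothetical helper: the real prompt wording is not shown in this post.
    """
    if len(verified_facts) < 3:
        # One fact gets altered and two stay true, so three is the minimum.
        raise ValueError("need at least three verified facts to build a round")

    facts = "\n".join(f"- {fact}" for fact in verified_facts)
    return (
        f"Topic: {topic}\n"
        f"Verified facts:\n{facts}\n\n"
        "Pick one fact and change exactly one detail so that it becomes false.\n"
        "Do NOT alter the single most well-known fact about this topic.\n"
        "Prefer a smaller detail: a secondary date, a count, or a name "
        "most people only half-remember.\n"
        "Return the altered statement plus two unaltered facts."
    )
```

Keeping the rule as its own line in the prompt makes it easy to tighten or reword later without touching the rest of the template.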

Failure mode two: confident mistakes that passed checks

Even after I added automated validation, bad rounds occasionally reached players. A statement that contradicted itself. An explanation that argued the wrong side — confidently telling you a true statement was the lie. The model sounds sure even when it's wrong.

I added a second verification pass at temperature zero — no randomness, just a consistency check — specifically to catch cases where the explanation contradicted its own statement. That closed a gap the first pass missed. It did not close every gap.
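
A check like that could be sketched roughly as follows. Everything here is an assumption for illustration: the `Round` shape, the `ask_model` callable (any function that takes a prompt and a temperature and returns text), and the wording of the question. It is not the game's real verifier.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Round:
    statements: list[str]  # the three statements shown to the player
    lie_index: int         # zero-based position of the altered statement
    explanation: str       # text revealed after the player guesses


# Hypothetical model interface: (prompt, temperature) -> reply text.
AskModel = Callable[[str, float], str]


def explanation_matches_lie(rnd: Round, ask_model: AskModel) -> bool:
    """Ask the model, at temperature 0, which statement the explanation calls false."""
    numbered = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(rnd.statements))
    prompt = (
        "Below are three statements and an explanation.\n"
        f"{numbered}\n\nExplanation: {rnd.explanation}\n\n"
        "Which statement number does the explanation say is false? "
        "Reply with a single digit."
    )
    reply = ask_model(prompt, 0.0).strip()
    # Anything other than the expected digit counts as a failure,
    # so an unclear or rambling answer also rejects the round.
    return reply == str(rnd.lie_index + 1)
```

Treating any unexpected reply as a failure is deliberately conservative: a rejected good round costs one regeneration, while an accepted bad round reaches players.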

Why the truths matter as much as the lie

I learned this one the hard way. If both true statements are boring or obviously correct, the lie stands out by elimination even when it's subtle. A good round needs three statements that all sound plausible on first read.

Getting that balance across random daily topics — without a human editor hand-picking each round — is why generation is the easy part. Making the mix work day after day is the slog.

Quality control, honestly

Every daily challenge runs through automated checks before it goes live: consistency between statements and explanations, regeneration when something fails. I won't claim the system is airtight. Players still occasionally notice something off — a fact that doesn't sit right, an explanation that reads strangely — and when that happens, I patch it manually.
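
The check-then-regenerate loop described above might look something like this sketch. The names `generate_until_valid`, `generate`, `checks`, and `max_attempts` are hypothetical, and the retry limit of three is an arbitrary assumption rather than the game's actual setting.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def generate_until_valid(
    generate: Callable[[], T],
    checks: list[Callable[[T], bool]],
    max_attempts: int = 3,
) -> Optional[T]:
    """Regenerate a candidate until every check passes or attempts run out."""
    for _ in range(max_attempts):
        candidate = generate()
        # all() stops at the first failing check, so cheap checks can go first.
        if all(check(candidate) for check in checks):
            return candidate
    # Nothing passed: return None so the caller can hold the round
    # for manual review instead of publishing it.
    return None
```

Returning `None` rather than the last failed candidate keeps the "guilty until verified" default: nothing goes live unless it passed every check.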

AI is not reliable enough to trust without checking. The game only works because I treat every round as guilty until verified, and still keep a human hand on the wheel when something slips through.

If you want to see what survived all that filtering, try a round.
