LLM-Powered Self-Sabotage

We do not need LLM-powered industrial sabotage. With humans in the loop, we can self-sabotage much more efficiently.

The World Economic Forum lists LLM-driven misinformation among the most significant near-term global risks, because generative systems drastically reduce the cost of producing fake but credible content. Similarly, ENISA warns that generative AI increases the efficiency and scale of manipulation campaigns in politics.

But we do not need adversaries weaponizing LLMs against us. Many organizations are already undermining themselves by accepting ungrounded, LLM-generated numbers and synthetic research as if they were facts. Executives demand AI adoption because competitors are doing it and boards expect it. Speed is paramount, yet correctness is eerily absent from each mandate.

Confabulated certainty

A startup pivots after discovering that the total addressable market in a related area is much larger than it previously imagined. Behind the number there is no survey, no industry dataset, no methodology section, and no sensitivity analysis. There is a prompt, though, and the model’s prediction of what a plausible market-sizing paragraph might look like. There may be a link or two, but they lead to non-existent reports that few bother to uncover by clicking. That is hardly surprising, because LLMs complete the text, and the most probable completion of “the total addressable market for XYZ is” happens to be a confident, yet hallucinated number.

Bullshit is a more apt term than hallucination: it denotes an indifference to truth, whereas hallucination implies correctness gone awry. LLMs do not care whether their claims are connected to anything real; they were never oriented towards correctness.

Eventually the board of directors approves the pivot. The figures survive due diligence because they appear in a deck with other figures that also appear precise, and precision is contagious: once one number carries decimals, adjacent numbers inherit authority by proximity. No one on the board can distinguish a researched estimate from a generated one, which means no one can gauge the risk of the strategy they just endorsed.

Selective scrutiny

If a number confirms what leadership already believes, it passes without friction. If it contradicts expectations, the model that generated it is suddenly blamed. This tendency to scrutinize evidence that goes against our views more harshly than whatever confirms our beliefs is known as motivated reasoning.

Motivated reasoning predates LLMs, but the models amplify it when applied without thought, because it is so easy to generate data that looks real but isn’t. This automation misuse was already identified back in 1997, though we have ignored its lesson ever since. Over time, an organization generates more numbers than ever, cites more sources than ever, yet knows less than it did before, because the numbers and sources are decorative.

Synthetic customers

Instead of recruiting participants, running interviews, and analysing transcripts, product teams prompt a model for “a 32-year-old power user” and receive feedback in seconds.

But the simulated power user has no commute, no frustration with last week’s release, no competing product open in the next tab, no children interrupting the session, no unforeseen bills to pay this month, no reason to lie about how often they actually use the feature. Real users are inconvenient, yet their inconsistencies reveal what surveys and personas cannot. Synthetic personas are useful for generating hypotheses or drafting interview guides, but treating them as empirical validation is like load-testing a bridge with a photograph of a truck. When a product team ships a feature because the simulated users loved it, they will pay in churn what they saved in their research budget.

Going ballistic

In recent warfare simulations, multiple leading language models repeatedly recommended nuclear strikes as part of crisis decision scenarios, even when escalation would be strategically irrational for humans. The models were completing text in a context where the training distribution made escalation the highest-probability continuation. They do not feel fear or experience human cost, no matter how much we anthropomorphize LLMs.

The same statistical mechanism that produces “launch a warhead” produces “the total addressable market is $4.2 billion” and “users overwhelmingly prefer the new onboarding flow.” The model does not know which of these outputs will end a career, a company, or a country.

You write it*, you own it (* even if only pretend)

When someone copy-pastes unvalidated model output, the burden to check facts and figures typically becomes the reader’s rather than the author’s, which is why it is perfectly rational to stop reading LLM-generated rubbish from colleagues. Whenever your name is at the top of a document, you must own its contents and provenance. The burden of proof always lies with the author, never the reader. And if you include Claude or whichever LLM is du jour in the list of authors, the standard assumption is that the entire document was generated from a prompt that took only seconds to type, which is the maximum amount of attention any rational person ought to give it, too.

Mandated sabotage

Any decision-relevant number must include a source or a derivation from primary sources. Any user insight influencing product direction must specify whether it is simulated or empirically observed and, if observed, link to the transcripts. These are the standards that existed before LLMs made it possible to bypass them at zero marginal cost.
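For teams that want to make these two rules mechanical, they could be sketched as a small review check. This is only an illustration under assumptions: the Claim record, its field names, and the provenance_gaps function are hypothetical and not part of any existing tool, and a check like this can only confirm that provenance is stated, not that it is true.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Claim:
    """Hypothetical record for one decision-relevant statement in a document."""
    text: str
    kind: str                                   # "number" or "user_insight"
    sources: list[str] = field(default_factory=list)      # primary sources
    derivation: str = ""                        # how the number was computed
    origin: Optional[str] = None                # "simulated" or "empirical"
    transcripts: list[str] = field(default_factory=list)  # links to transcripts


def provenance_gaps(claim: Claim) -> list[str]:
    """Return the reasons a claim fails the standard; an empty list means it passes."""
    gaps = []
    if claim.kind == "number":
        # A number needs a source or a derivation from primary sources.
        if not claim.sources and not claim.derivation.strip():
            gaps.append("number has no source or derivation")
    elif claim.kind == "user_insight":
        # An insight must say what it is, and observed insights need transcripts.
        if claim.origin not in ("simulated", "empirical"):
            gaps.append("insight does not say whether it is simulated or empirical")
        elif claim.origin == "empirical" and not claim.transcripts:
            gaps.append("empirical insight has no transcript links")
    else:
        gaps.append(f"unknown claim kind: {claim.kind!r}")
    return gaps


# The market-sizing figure from the deck, with nothing behind it.
tam = Claim(text="The total addressable market is $4.2 billion", kind="number")
print(provenance_gaps(tam))  # ['number has no source or derivation']
```

A simulated insight passes this check as long as it is labelled as simulated. The point is not to ban synthetic personas but to stop them from being mistaken for observation.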

Speed matters in business, but speed that compounds errors is simply deferred correction. We worry about adversaries weaponizing AI against us, but the more immediate risk is that we lower our own standards and call it progress. Companies destroy themselves when they confuse fluency with evidence and reward the production of numbers over the defence of numbers. The sabotage is mandated from the top and settled in severance.