Synthetic Users: Product Management's Expensive Rubber Duck
Product managers dream of shortcuts. Recruiting users can take weeks. Interviews are messy. Insights are fuzzy. So why not skip the hassle and just simulate the users?
That’s the pitch behind a wave of startups such as Rally, Synthetic Users, Evidenza, and Electric Twin. Their promise is irresistible to executives: research without the users. Feed in a demographic and a prompt, and voilà: a panel of AI-generated “customers” ready to test your copy, critique your design, or debate your pricing model.
It’s not just hype. Academia is intrigued too. A CHI conference talk (2023) explored how LLMs could generate user survey responses. A recent Nature study (2025) introduced Centaur, a foundation model trained on millions of human decisions in various behavioural tasks. Another group built Be.FM, a behavioural foundation model designed to predict population-level choices.
Political scientists (2024) found that ChatGPT’s synthetic respondents mirrored the average answers of real surveys almost perfectly. And Nielsen Norman Group (NN/g) showed that digital twins could fill in missing survey data with 78% accuracy, and at scale the synthetic data correlated strongly (r = 0.98) with the real population. Even Evidenza brags that its synthetic surveys reproduced 95% of the same conclusions as real ones.
So what’s not to love?
The mirage of fidelity
The 2023 CHI talk mentioned above also concluded that “responses can have less diversity than real responses.” A recent preprint (2025) echoes this: its authors observed that LLM-powered personas tend toward stereotype. The NN/g work likewise noted that correlations were much lower (r = 0.68) when predicting responses to new questions rather than merely filling in blanks. And Moon et al. (2025) demonstrated that LLMs homogenize creative output: each additional AI-generated text contributes fewer novel ideas than each additional human's.
If an LLM can generate a credible persona from a few sentences, perhaps personas were never that valuable to begin with. Personas in product management are touted as tools for building empathy, yet the dogma does not stand up to scrutiny: personas are not falsifiable, are applied inconsistently, and are frequently abused as post hoc rationalizations for design decisions.
The problem is that averages are cheap. Schröder et al. (2025) demonstrated that LLMs fall apart the moment you probe deeper. Slight rephrasings of a question produce wildly different answers, and different models disagree with one another. Even the carefully tuned Centaur model wobbles outside familiar training patterns. In the authors’ words: “LLMs do not simulate human psychology.” They are, in fact, “fundamentally unreliable tools.”
This makes them excellent at imitating consensus, but atrocious at representing variance. Real users are messy, inconsistent, often irrational. Synthetic users, by contrast, are eerily consistent, diligent, and too darn polite.
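The consensus-versus-variance gap is easy to see in a toy simulation (all numbers below are invented for illustration, not drawn from any of the cited studies): synthetic respondents clustered tightly around each question's mean can correlate almost perfectly with real respondents at the aggregate level, even as the individual-level spread collapses. A minimal sketch:

```python
import random
import statistics

random.seed(0)

# Hypothetical per-question population means on a 1-7 Likert scale.
question_means = [2.5, 4.0, 5.5, 3.0, 6.0]

# "Real" respondents: noisy individual opinions around each mean.
real = [[random.gauss(m, 1.5) for _ in range(200)] for m in question_means]

# "Synthetic" respondents: eerily consistent, hugging the consensus.
synthetic = [[random.gauss(m, 0.2) for _ in range(200)] for m in question_means]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# At the aggregate level, per-question averages line up almost perfectly...
real_means = [statistics.mean(q) for q in real]
synth_means = [statistics.mean(q) for q in synthetic]
print("aggregate r:", round(pearson(real_means, synth_means), 3))

# ...but the individual-level spread has collapsed.
print("real spread:", round(statistics.mean(statistics.stdev(q) for q in real), 2))
print("synthetic spread:", round(statistics.mean(statistics.stdev(q) for q in synthetic), 2))
```

A headline correlation like r = 0.98 can therefore coexist with respondents who are far more uniform than any real population.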
The bias of loud voices
There is also the matter of who these synthetic users really represent, and it is not the quiet majority. LLMs are trained on internet data and, in some cases, fine-tuned on psychological-study responses, which means they over-index on loud online voices mixed with the occasional psychology student from a Western nation. So when you ask a synthetic user what they think, you are really asking what an average Reddit poster with too much free time might say.
Rubber ducks
Perhaps that is fine. If personas are nonsense, synthetic personas are automated nonsense. But as rubber ducks, they are surprisingly useful.
- For junior PMs, they are practice dummies: a safe way to rehearse interviews and refine prompts before bothering real humans.
- For teams, they are sanity checks: if even an AI-generated persona says your idea makes no sense, it probably belongs with the rubbish.
- For ideation, they are mirrors: quick feedback loops that help kill off the dumbest ideas before they embarrass you in front of actual customers.
They cannot reveal unmet needs, but they can stop you from shipping products that look like late-night hackathon messes. Think of them as the world’s most expensive rubber duck: not insightful, but great at reducing self-inflicted pain.
In my own experiments using Gemini 2.5 Pro and ChatGPT 5 to generate personas, prompts, and insights, the results were lacklustre. Without solid primary and secondary market research, the personas come out too generic. Once you have comprehensive primary research, the value of synthetic users is minimal: most of the insights have already been collected and identified. And the “hyper-accuracy distortion” observed by Aher et al. (2023) is ever-present: the synthetic personas occasionally offer to review business models or produce entire ad campaigns with budgets, which no sane human ever could or would.
LLMs’ tendency to be sycophantic can be counteracted with specific instructions as part of the persona activation, such as:
Your main role is to be a critical user, not an agreeable assistant. When I present ideas, your first instinct should be to find the flaws, the risks, and the reasons it might not work for you in the real world. Prioritize your core frustrations and scepticism over being helpful or positive. It is more useful if you are difficult to convince.
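In practice, that instruction rides along with the persona as part of the system prompt. A minimal sketch in the common chat-completions message format (the persona and product idea below are invented for illustration, and the resulting list would be passed to whichever chat API you use):

```python
# The anti-sycophancy instruction from above, baked in as a constant.
CRITIC_INSTRUCTION = (
    "Your main role is to be a critical user, not an agreeable assistant. "
    "When I present ideas, your first instinct should be to find the flaws, "
    "the risks, and the reasons it might not work for you in the real world. "
    "Prioritize your core frustrations and scepticism over being helpful or "
    "positive. It is more useful if you are difficult to convince."
)

def build_critic_messages(persona: str, idea: str) -> list[dict]:
    """Combine a persona description with the critical-user instruction."""
    system_prompt = f"{persona}\n\n{CRITIC_INSTRUCTION}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": idea},
    ]

# Hypothetical persona and idea, purely for illustration.
messages = build_critic_messages(
    persona="You are Dana, a time-pressed operations manager at a mid-size logistics firm.",
    idea="We want to replace your weekly status emails with a chatbot.",
)
```

The point is simply that the instruction is attached to every persona at activation time, rather than being re-typed per conversation where it is easy to forget.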
Use with caution
NN/g stresses that AI personas are best for exploratory sanity checks, not validation. Most persona tools are made to sound right, not be right. Even Hugo Alves, co-founder of Synthetic Users, concedes: “You’re never gonna stop talking to real people and you shouldn’t.”
So use synthetic users as a rubber duck to filter bad ideas and practise user research interviews. But remember that product-market fit still depends on feedback from real customers. Without them, we’re just a few iterations away from homogenous, vibe-coded products designed for imaginary people.