PMF: Product/Market Folklore
Ask a product manager about PMF and they will almost certainly rattle off the Sean Ellis test with its 40% threshold. What few realize is that it’s mostly rubbish.
Not rubbish in the sense that products never become viable, but rubbish in the sense that an entire industry has mistaken retrospective storytelling for an empirically grounded state of a product. It is not.
The origins of PMF
The term product/market fit (PMF) entered startup culture via Marc Andreessen. In The Only Thing That Matters he proposed a sensation, not a metric: “You can always feel it” when the product and market fit.
And that is the original sin. It is a narrative device that is structurally incapable of being wrong, which is why it survives. If a company made it, the founder must have somehow known, and if not, the company never had PMF.
Sean Ellis tried to turn that vibe into a number. The key question was “How would you feel if you could no longer use this product?” Across nearly 100 startups, he found that companies where more than 40% of users answered “very disappointed” tended to grow, while those below that mark struggled. That story is endlessly repeated and that specific threshold is taught far and wide.
That is, unfortunately, the entire evidential base.
No publicly available data set, peer-reviewed study, or independently replicable analysis validates the 40% threshold. Maybe one exists, but I have been unable to find it. Neither could the authors of two recent business master’s theses, in which PMF appears not as a measurable construct but as an interpretive managerial judgement, with multiple assessment archetypes and no empirically validated standard. If the validation exists but remains proprietary or inaccessible, the conclusion is the same: the claim cannot be independently evaluated. A metric used to make funding decisions does not earn legitimacy through repetition and authority. It must earn it by being falsifiable.
Selection bias all the way
The Sean Ellis test also fails methodologically. It is almost always administered to current users only. Users who tried the product and left are, by definition, excluded. Survivorship bias is hardwired: you sample from a group who have already tolerated the product long enough to answer a survey, and now you expect to infer population-level viability. A metric that ignores churn by design cannot claim to detect fit.
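A back-of-the-envelope sketch makes the sampling problem concrete. All numbers below are invented for illustration, and the second figure rests on the stated assumption that no churned user would have answered “very disappointed”.

```python
# Hypothetical numbers, chosen only to illustrate the sampling problem.
signups = 1000            # everyone who ever tried the product
still_active = 200        # users who are still around to be surveyed
very_disappointed = 90    # surveyed users answering "very disappointed"

# The score as usually reported: share among current users only.
score_among_survivors = very_disappointed / still_active

# The same answers as a share of everyone who tried the product,
# assuming no churned user would have answered "very disappointed".
score_among_all_signups = very_disappointed / signups

print(f"Reported score:        {score_among_survivors:.0%}")
print(f"Share of all sign-ups: {score_among_all_signups:.0%}")
```

The same set of answers clears the 40% bar when only survivors are counted and falls far short of it when the denominator includes everyone who left.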
Stated preference vs revealed behaviour
What is more, the Sean Ellis score is a hypothetical stated-preference measure. Behavioural economists have spent decades documenting why such measures systematically overstate real commitment.
In Paying Not to Go to the Gym, DellaVigna and Malmendier show that consumers overestimate their own future usage, even when non-usage is costly. The same pattern appears in customer satisfaction metrics more broadly. NPS may be popular, but it correlates mostly with growth, and only in some contexts, not with retention. As a predictor of revenue growth, it is outperformed by much simpler metrics, such as the top-box metric, which is simply the share of promoters. Subtracting the detractors has no empirical basis.
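For readers who have not seen the two metrics side by side, here is a minimal sketch using the standard NPS convention (promoters rate 9 or 10, detractors 0 to 6). The function name and both samples are hypothetical.

```python
def nps_and_top_box(ratings):
    """Return (NPS, top-box share) for 0-10 likelihood-to-recommend ratings."""
    n = len(ratings)
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    top_box = promoters / n               # share of promoters only
    nps = (promoters - detractors) / n    # promoters minus detractors
    return nps, top_box

# Two hypothetical samples with the same share of promoters.
sample_a = [10, 9, 9, 10, 8, 7, 8, 7, 8, 7]   # 4 promoters, 0 detractors
sample_b = [10, 9, 9, 10, 3, 2, 5, 6, 4, 1]   # 4 promoters, 6 detractors

nps_a, top_a = nps_and_top_box(sample_a)
nps_b, top_b = nps_and_top_box(sample_b)
print(f"A: NPS {nps_a:+.0%}, top-box {top_a:.0%}")
print(f"B: NPS {nps_b:+.0%}, top-box {top_b:.0%}")
```

The top-box share is identical for both samples. Only the subtraction step separates them, and that step is exactly what lacks empirical support.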
Stated intent often diverges from revealed behaviour. Whether users who say they are “very disappointed” actually retain, expand, or refer at meaningfully higher rates has, to my knowledge, never been published.
Negative framing
There is a quieter problem with the Sean Ellis test that is independent of thresholds, sample size, or replication: the question itself is negatively framed.
When we ask users how they would feel if they could no longer use a product, this is not a neutral probe of value. It is framed as a loss, which activates aversion to disruption, habit, endowment effects, and switching friction rather than an assessment of value. A user can be indifferent to a product, dissatisfied with its price, or actively shopping for alternatives and still report being “very disappointed” at the prospect of abrupt removal.
The hypothetical scenario itself is psychologically artificial. The question therefore elicits an affective reaction rather than an evaluative judgement.
Cultural bias
Even if stated preference predicted behaviour perfectly, a universal threshold cannot survive cultural boundaries. The endpoints of Likert scales are not used the same way across cultures: people in the Americas tend to rate more enthusiastically than people in Europe or Asia because of cultural norms. For instance, a US customer may declare a product “awesome” whereas the exact same product gets a subdued “above average” in Luxembourg, even though both agree roughly on its value. A US-calibrated “very disappointed” cut-off is therefore more a sign of cultural bias than of product value.
Even retention patterns differ across cultures, with more than fivefold differences visible within the same continent: Chinese users abandon products more rapidly than their Japanese counterparts. No cultural calibration of the Sean Ellis test exists, so applying it globally is indefensible.
Once upon a time…
The Sean Ellis test emerged nearly twenty years ago when software distribution was laden with friction, switching costs were high, and user expectations were modest. Nowadays, users install and abandon apps in minutes. They also expect the UX to be polished, because they can discover alternatives in a few taps. A heuristic can survive environmental change only if it is periodically revalidated. To my knowledge, no such effort has been made in the past two decades. Besides, if you wish to be pedantic, you cannot revalidate a method that has never been validated in the first place, which, as I have argued, is the case here.
The Superhuman case study is often cited as validation. It is not. It shows disciplined segmentation and iteration, not the discovery of a universal threshold. A directional improvement is not evidence of a phase boundary.
Speaking of phase transitions…
The absence of phase transitions
If PMF were a discrete state, we would expect abrupt changes in behaviour, especially in retention curves. We’d expect a jump from, say, 10% retention in one cohort to 50% in the next. Instead we see retention inch up from 10% to 12% and maybe 15% over time.
Large-scale analyses show steep early decay followed either by continued decay towards zero or flattening at a non-zero asymptote. Public within-cohort retention data looks continuous and category-dependent. Sure, you can improve the asymptote, but you rarely observe a discontinuous transition from “No PMF” to “PMF”, in which the behaviour of new cohorts jumps relative to earlier ones. There simply is no phase transition, only, at best, a gradual improvement in retention over time. We do not even see such phase transitions in social networks.
Moreover, the idea that weekly or monthly retention is indicative of PMF presumes products that are used continuously. Episodic products, such as tax software, payroll software, or grant application platforms, do not fit that assumption.
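The difference between inching up and jumping can be checked mechanically. The sketch below is illustrative only: the helper name, the sample figures, and the 20-point cut-off are all assumptions, not established benchmarks.

```python
def largest_jump(retention_by_cohort):
    """Largest increase between consecutive cohorts, in percentage points."""
    pairs = zip(retention_by_cohort, retention_by_cohort[1:])
    return max(later - earlier for earlier, later in pairs)

# Retention (%) at a fixed age, one value per successive cohort.
gradual = [10, 12, 15]   # the pattern typically observed
abrupt = [10, 11, 50]    # what a discrete "PMF switch" would look like

JUMP_THRESHOLD = 20  # hypothetical cut-off, in percentage points

for name, series in (("gradual", gradual), ("abrupt", abrupt)):
    jump = largest_jump(series)
    verdict = "discontinuity" if jump >= JUMP_THRESHOLD else "no discontinuity"
    print(f"{name}: largest jump {jump} points -> {verdict}")
```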
Unit economics as constraints
Ratios such as LTV/CAC are often invoked as PMF proxies. They are at best accounting consequences of behaviour filtered through cost structures. Benchmarks such as LTV/CAC > 3 come from surviving companies and investor heuristics. They bound feasibility, but they do not reveal fit. No survival analysis shows a sharp jump in success probability when such ratios are crossed.
This becomes obvious when we look at Uber, Spotify, or OpenAI. All three have massive adoption. All three are widely described as having PMF, yet none has demonstrated economic sustainability across most of its history. Without incredible amounts of capital, these businesses would have failed a long time ago. And even now, Uber is profitable, yet mostly because it flouted the rules long enough to squeeze out the local competition. Spotify pays artists a pittance and is pushing for more AI-generated tunes to save on royalty payments. OpenAI is nowhere near profitable, and it remains to be seen whether the company can live up to its own hype. At sufficient scale, growth primarily reflects extraction. If PMF meant “a sustainable business”, they all fail. If it means “users want this”, they pass.
Product/market fit is best understood as a weak necessary condition masquerading as a strong sufficient one. Consider Pebble, which raised over twenty million dollars on Kickstarter and built an enthusiastic community. It most likely scored high on the Sean Ellis test, yet the company still failed because it tried to grow beyond its core customer base too soon, into segments where people were at best ambivalent about a Pebble device.
Homejoy scaled rapidly, raised forty million dollars, and collapsed due to mediocre retention and premature international expansion. Either these companies had PMF and it was insufficient to scale up, or they never had PMF in any rigorous sense, which means the signals people usually cite to claim PMF are unreliable. Both outcomes contradict how PMF is actually used in practice: as a decisive gate rather than a tunnel of unspecified length, navigable only by authority and hindsight. You can only form probabilistic beliefs in real time. Yet product/market fit is retrospective by design.
From apps to clothes
Clothes can fit well and still be ugly, overpriced, or inappropriate for the occasion. “Fit” alone tells you very little about perceived value. It merely tells you that wearing a garment is physically possible. Some garments fit better or worse. Treating PMF as a scalar does not solve the problem, though. PMF conflates usability, desirability, economics, and distribution.
The “I declare PMF” fantasy
What organizations often want is certification: a moment when an executive says “We have PMF!” At that moment, doubt turns into heresy. It is psychologically comforting and operationally dangerous. Fit is not a permanent state: markets move even when products do not. Clothes can remain the same, but bodies do not.
Kodak fit its market perfectly until the nature of photography changed. Nokia fit its market until platforms replaced devices. More recently, LLMs have plausibly shifted the market for knowledge-worker apps, not because incumbents were bad but primarily because of executive parroting, as very little in the economy has actually changed as a direct consequence of LLMs. Declaring PMF as a binary, irreversible milestone is superstition.
FOSS and internal platforms
Internal platforms and open-source software expose the mistake immediately. The former have finite markets, mandatory adoption, capped growth, and indirect economics. Retention merely measures employee tenure. Satisfaction is orthogonal to use—hello, SAP! Yet such platforms can be critical and successful relative to their intent.
Open-source software may have zero revenue yet immense fit with durable voluntary adoption and enterprise integrations. Economic metrics fail entirely. If PMF were a universal product property, it would survive these contexts, but alas.
Unfalsifiable PMF
At this point the problem ought to be clear: PMF as practised is unfalsifiable. Making PMF falsifiable requires industry-wide data sets that track cohorts across products, contexts, cultures, and time, with clear hypotheses about what thresholds predict what outcomes. That data does not exist publicly and its absence after two decades is itself telling.
PMF from first principles
Strip away the folklore and product/market fit is the degree to which a defined segment persistently chooses a product over available alternatives because it meaningfully improves their situation under current conditions. Anything stronger is mythology.
The myth of PMF as a binary switch that is oblivious to culture or human psychology persists because VCs need shortcuts and founders need certainty. PM culture has normalized cognitive laziness: outsourcing judgement to elders (a.k.a. the rich), laundering anecdotes into “frameworks”, and treating repetition as validation. Few read the basic literature and even fewer question its foundations. The Sean Ellis test is merely a symptom of that culture. And for an industry that prides itself on metrics, the silence on validation is damning. Either PMF has clear boundaries and must be falsifiable or it is interpretive and therefore cannot justify go/no-go decisions.
What to do on Monday
Let’s go back to the definition: PMF is the degree to which a defined segment persistently chooses a product over available alternatives because it meaningfully improves their situation under current conditions. It works for VC SaaS, boutique profitable products, internal platforms for which alternatives include workarounds and shadow tools, open-source software, episodic software, and even mission-driven products. Note that it is gradual rather than binary (“degree”), it avoids vague markets (“defined segment”), it captures repeated customer behaviour (“persistently chooses”), and it covers utility, not merely payment. In other words, PMF exists to the extent that a defined segment would be worse off without the product and demonstrates this through repeated voluntary choice when alternatives are available. Economic sustainability is an entirely separate concern.
So what can you do with that definition on Monday? Define a segment within the market and look for evidence of persistent voluntary choice and convergence. Launch excitement and viral influencer waves are not indicative of long-term product usage, hence the focus on convergence. For products that are used regularly, not merely episodically, month-on-month retention is a decent place to start. Ask yourself whether cohort curves flatten to a stable asymptote and whether that asymptote improves cohort over cohort in a specific segment. For episodic software, look at annual re-subscription rates. For FOSS, check contributor growth or downstream ecosystem adoption over time. For internal platforms, voluntary feature usage and a reduction in shadow systems are sensible signals.
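The two questions about cohort curves can be turned into a rough check. This is a minimal sketch under stated assumptions: the cohort data is invented, the helper names are hypothetical, and the three-month window and one-point tolerance are arbitrary choices you would tune per product.

```python
def tail_retention(curve, window=3):
    """Mean retention over the last `window` months: a crude asymptote estimate."""
    tail = curve[-window:]
    return sum(tail) / len(tail)

def has_flattened(curve, window=3, tolerance=1.0):
    """True if retention moved by at most `tolerance` points in the last `window` months."""
    tail = curve[-window:]
    return max(tail) - min(tail) <= tolerance

# Hypothetical monthly retention (% of cohort still active), oldest cohort first.
cohorts = {
    "2024-01": [100, 40, 25, 18, 15, 14.5, 14],
    "2024-02": [100, 45, 30, 22, 19, 18.5, 18.5],
    "2024-03": [100, 50, 34, 26, 23, 22.5, 22.5],
}

asymptotes = [tail_retention(curve) for curve in cohorts.values()]
all_flat = all(has_flattened(curve) for curve in cohorts.values())
improving = all(a < b for a, b in zip(asymptotes, asymptotes[1:]))

print(f"curves flatten: {all_flat}, asymptote improves: {improving}")
```

A curve that is still decaying, such as one ending in 20, 12, 6, fails the flattening check regardless of how high it started.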
If you charge, does retention hold after price increases? If the product is free, do users invest time, data, or integration effort? For open-source software, do organizations deploy it in mission-critical systems? PMF must survive friction, so if friction ruins it, you only had enthusiasm.
Persistent voluntary choice is not enough, as evidenced by Pebble, Kodak, and Nokia. Economic sustainability is a must: can the persistent behaviour fund the structure required? For internal tools that question is not contradictory: if continued investment in a platform requires more than it produces in benefits to the company, it is not sustainable.
More importantly, state your own falsifiable condition: “We will conclude that we do not have PMF in a certain segment if…”
- 6-month retention drops below 50% for three consecutive cohorts.
- less than 25% of new usage is organic within two quarters.
- the renewal rate falls below 75% at the next contract cycle.
- the contributor count stagnates while forks grow.
- etc.
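Conditions like the ones listed above are easy to encode, which makes them hard to quietly ignore later. Below is a minimal sketch of the first one. The function name and the retention figures are hypothetical.

```python
def consecutive_below(values, threshold, run_length=3):
    """True if `run_length` consecutive values fall below `threshold`."""
    streak = 0
    for value in values:
        # Extend the streak on a miss, reset it on any value at or above threshold.
        streak = streak + 1 if value < threshold else 0
        if streak >= run_length:
            return True
    return False

# Hypothetical 6-month retention (%) per cohort, oldest first.
six_month_retention = [58, 52, 49, 47, 44]

no_pmf = consecutive_below(six_month_retention, threshold=50, run_length=3)
print(f"Falsification condition met: {no_pmf}")
```

The point is less the code than the commitment: the threshold and run length are written down before the data arrives.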
Furthermore, regularly check (e.g. every quarter) whether the segment, the competition, the capital structure, or switching costs have changed. Think of PMF as an ongoing Bayesian update about whether voluntary, persistent behaviour under constraints supports the stated strategic objective.
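To make the Bayesian framing less abstract, here is one possible sketch using a Beta-binomial model of retention in a segment. The prior, the quarterly counts, and the function names are assumptions for illustration. A real analysis would need a model that fits the product.

```python
def update_beta(alpha, beta, retained, churned):
    """Conjugate Beta-binomial update: add observed successes and failures."""
    return alpha + retained, beta + churned

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Weak prior centred on 50% retention (an assumption, not a recommendation).
alpha, beta = 2, 2

# Hypothetical quarterly observations: (retained, churned) users in the segment.
quarters = [(30, 70), (35, 65), (42, 58)]

for retained, churned in quarters:
    alpha, beta = update_beta(alpha, beta, retained, churned)
    print(f"posterior mean retention: {beta_mean(alpha, beta):.3f}")
```

Each quarter shifts the belief a little. No single update flips a switch from “No PMF” to “PMF”.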
So, the next time a PM mentions the 40% rule, ask for the data. And if none is forthcoming, treat it as folklore, not fact.