Bayes' Rule in Product Management
How can you figure out the probability of success of a product from various metrics? With Bayes’ rule, of course!
Bayes’ rule
Bayes’ rule provides a means of incorporating fresh data into our beliefs. Let’s say we are interested in product success, indicated by \(S\). The probability of success is \(\mathbb{P}(S)\). This is the prior probability, because we have not yet included any data. If we are then presented with data \(D\), we can update this prior with that data according to Bayes’ rule:
\[\mathbb{P}\left(S\vert D\right) = \frac{\mathbb{P}\left(D\vert S\right) \mathbb{P}(S)}{\mathbb{P}(D)}\]\(\mathbb{P}\left(S\vert D\right)\) is the posterior probability. \(\mathbb{P}\left(D\vert S\right)\) is the likelihood, which represents the probability of the data given success. So, if we assume a product is already successful, it answers: What is the chance that we see the data or signal? The marginal probability \(\mathbb{P}(D)>0\) is needed to normalize the posterior properly.
As an aside, Bayes' rule provides a mathematical explanation for why true believers do not easily change their minds. If the prior is (close to) zero, updating beliefs is very slow or even impossible, no matter how solid the evidence to the contrary.
Sometimes it is beneficial to compute \(\mathbb{P}(D)\) with the law of total probability:
\[\mathbb{P}(D) = \mathbb{P}\left(D\vert S\right)\mathbb{P}(S) + \mathbb{P}\left(D\vert \neg S\right)\mathbb{P}(\neg S).\]This is possible because the entire sample space is partitioned by success (\(S\)) and failure (\(\neg S\)): there are no other possibilities for our product. The probability of failure is \(\mathbb{P}(\neg S)=1-\mathbb{P}(S)\), and \(\mathbb{P}\left(D\vert \neg S\right)\) is the probability of the data occurring when the product is a massive failure.
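Bayes' rule and the law of total probability combine into a one-liner. A minimal sketch (the helper name `posterior` is my own, not from any library), assuming the binary success/failure partition used throughout this post:

```python
def posterior(prior, p_d_given_s, p_d_given_not_s):
    """P(S|D) from Bayes' rule, with P(D) expanded via total probability."""
    # Marginal P(D) = P(D|S)P(S) + P(D|~S)P(~S)
    p_d = p_d_given_s * prior + p_d_given_not_s * (1 - prior)
    return p_d_given_s * prior / p_d
```

Every worked example that follows is a single call to such a function with a different prior and pair of likelihoods.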
How to use
To make that all more concrete for product managers, we shall examine a few examples:
- Success with plenty of sign-ups in the first week after a product launch
- Success despite negative reviews in the first month after a product launch
- Feature effectiveness (success) after an A/B test
In what follows, the probability of product/market fit (success) for an arbitrary product ready to hit the market is \(\mathbb{P}(S)=0.05\). That figure applies to a single product, including all its iterations prior to launch. It is low because product/market fit remains elusive for many startups, which is why so many fail.
Lots of sign-ups
Based on prior experience and market research of similar products, we may know that 10,000 sign-ups in the first seven days is a strong indicator of success. In fact, products that are successful show such user behaviour in, say, four out of five cases or \(\mathbb{P}\left(D\vert S\right) = 0.80\). However, products that fail in the market may still have such post-launch behaviour due to hype, so that \(\mathbb{P}\left(D\vert \neg S\right) = 0.10\). This means that one in ten failed products still achieves 10k sign-ups in the first week.
What is the probability of success after seeing ten thousand sign-ups in a single week right after launch? According to Bayes’ rule,
\[\begin{eqnarray} \mathbb{P}\left(S\vert D\right) &=&\frac{\mathbb{P}\left(D\vert S\right) \mathbb{P}(S)}{\mathbb{P}\left(D\vert S\right)\mathbb{P}(S) + \mathbb{P}\left(D\vert \neg S\right)\mathbb{P}(\neg S)} \\ &=& \frac{0.80\cdot 0.05}{0.80\cdot 0.05 + 0.10\cdot\left(1-0.05\right)} \\ &\approx & 0.30. \end{eqnarray}\]That is a massive improvement! But there is still a high chance of failure if these users do not care to stick around. What is more, even if they do stick around, the product might never be profitable if it cannot be scaled across the market.
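A quick sanity check of the arithmetic (variable names are mine):

```python
prior = 0.05       # P(S): baseline chance of product/market fit
p_d_s = 0.80       # P(D|S): successful products hit 10k sign-ups 4 out of 5 times
p_d_not_s = 0.10   # P(D|~S): one in ten hyped failures hits 10k sign-ups too

p_d = p_d_s * prior + p_d_not_s * (1 - prior)  # marginal P(D)
print(round(p_d_s * prior / p_d, 2))  # → 0.3
```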
Negative reviews
This time we want to know what the chances of success are when 1,000 negative reviews pop up in the first month after launch. A successful product has only a, say, 10% chance of seeing so many angry customers vent online, so \(\mathbb{P}\left(D\vert S\right)=0.10\). By contrast, colossal failures see it much more often, maybe 60% of the time, so \(\mathbb{P}\left(D\vert \neg S\right) = 0.60\). From Bayes’ rule we calculate the posterior probability to be roughly 0.9%:
\[\begin{eqnarray} \mathbb{P}\left(S\vert D\right) &=& \frac{\mathbb{P}\left(D\vert S\right) \mathbb{P}(S)}{\mathbb{P}\left(D\vert S\right)\mathbb{P}(S) + \mathbb{P}\left(D\vert \neg S\right)\mathbb{P}(\neg S)} \\ &=& \frac{0.10\cdot 0.05}{0.10\cdot 0.05 + 0.60\cdot\left(1-0.05\right)} \\ &\approx & 0.0087 \end{eqnarray}\]This appropriately reduces our belief in success.
A/B testing
Instead of an entire product’s success or failure, we can look at a feature. Let’s say a feature has a 50:50 chance of success prior to an A/B test, that is, \(\mathbb{P}(S)=0.50\). The A/B test itself shows a significant increase in engagement, a result we see in successful features 90% of the time, so \(\mathbb{P}\left(D\vert S\right) = 0.90\). If the feature is ineffective in boosting engagement, we expect to see the result only 10% of the time, so \(\mathbb{P}\left(D\vert\neg S\right) = 0.10\). Perhaps that is due to seasonal effects that have not yet been accounted for or random events in the world.
\[\begin{eqnarray} \mathbb{P}\left(S\vert D\right) &=& \frac{\mathbb{P}\left(D\vert S\right) \mathbb{P}(S)}{\mathbb{P}\left(D\vert S\right)\mathbb{P}(S) + \mathbb{P}\left(D\vert \neg S\right)\mathbb{P}(\neg S)} \\ &=& \frac{0.90\cdot 0.50}{0.90\cdot 0.50 + 0.10\cdot\left(1-0.50\right)} \\ &=& 0.90 \end{eqnarray}\]This shows that our belief in the feature’s success after seeing the A/B test result has increased quite a bit. If we had defined a threshold prior to conducting an A/B test, we could have decided to drop the feature if the posterior probability had not exceeded that threshold. Note that such a statistical approach makes sense only if the probabilities used are realistic. Otherwise the threshold will be as arbitrary as the posterior itself.
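The decision rule sketched above can be made explicit. A minimal sketch in which `THRESHOLD` is a hypothetical cut-off a team might fix before running the test:

```python
def posterior(prior, p_d_s, p_d_not_s):
    """P(S|D) via Bayes' rule with the law of total probability."""
    p_d = p_d_s * prior + p_d_not_s * (1 - prior)
    return p_d_s * prior / p_d

THRESHOLD = 0.80  # agreed on before the A/B test, otherwise it is arbitrary

p = posterior(0.50, 0.90, 0.10)  # the A/B test numbers from above
print("keep" if p >= THRESHOLD else "drop")  # → keep
```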
The PMF trifecta: adoption, engagement, retention
According to Brian Balfour, the trifecta of product success is i) steady growth (adoption), ii) meaningful usage (engagement), and iii) a flat retention curve. We can now use Bayes’ rule with three signals to compute the chances of success given these signals.
What we therefore need is a generalization of Bayes’ rule to \(n\) signals (together with \(S\), that makes \(n+1\) events):
\[\mathbb{P}\left(S\vert D_{1},\ldots,D_{n}\right) = \frac{\mathbb{P}\left(D_{1},\ldots,D_{n}\vert S\right) \mathbb{P}(S)}{\mathbb{P}\left(D_{1},\ldots,D_{n}\right)}.\]If the signals were unconditionally independent, the joint probabilities in Bayes’ rule would factor into products of the individual probabilities. Such independence requires that there be no relationship among the signals whatsoever, linear or non-linear: any association suffices to establish statistical dependence, which makes unconditional independence very rare in practice. Naive Bayes instead assumes the weaker condition of conditional independence given \(S\) (and given \(\neg S\)), under which the likelihood factors as
\[\mathbb{P}\left(D_{1},\ldots,D_{n}\vert S\right) = \mathbb{P}\left(D_{1}\vert S\right)\cdot \mathbb{P}\left(D_{2}\vert S\right) \cdot \ldots \cdot \mathbb{P}\left(D_{n}\vert S\right).\]Bayes’ rule for three signals \(n=3\) with conditional independence reads:
\[\begin{eqnarray} \mathbb{P}\left(S\vert D_{1},D_{2},D_{3}\right) &=& \frac{\mathbb{P}\left(D_{1},D_{2},D_{3}\vert S\right) \mathbb{P}(S)}{\mathbb{P}\left(D_{1},D_{2},D_{3}\right)}\\ &=& \frac{\mathbb{P}\left(D_{1}\vert S\right)\mathbb{P}\left(D_{2}\vert S\right)\mathbb{P}\left(D_{3}\vert S\right) \mathbb{P}(S)}{\mathbb{P}\left(D_{1},D_{2},D_{3}\vert S\right)\mathbb{P}(S)+\mathbb{P}\left(D_{1},D_{2},D_{3}\vert\neg S\right)\mathbb{P}(\neg S)} \\ &=& \frac{\mathbb{P}\left(D_{1}\vert S\right)\mathbb{P}\left(D_{2}\vert S\right)\mathbb{P}\left(D_{3}\vert S\right) \mathbb{P}(S)}{\mathbb{P}\left(D_{1}\vert S\right)\mathbb{P}\left(D_{2}\vert S\right)\mathbb{P}\left(D_{3}\vert S\right)\mathbb{P}(S)+\mathbb{P}\left(D_{1}\vert\neg S\right)\mathbb{P}\left(D_{2}\vert\neg S\right)\mathbb{P}\left(D_{3}\vert\neg S\right)\mathbb{P}(\neg S)} \tag{NB} \end{eqnarray}\]by virtue of conditional independence and the law of total probability. This is the naive Bayes (NB) approach. The formula may look daunting, but it is mostly a matter of plugging in a few numbers, as we shall see.
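Formula (NB) translates directly into code for any number of signals. A minimal sketch (the function name `naive_bayes` is mine), assuming signals that are conditionally independent given both \(S\) and \(\neg S\):

```python
from math import prod

def naive_bayes(prior, lik_s, lik_not_s):
    """P(S | D1,...,Dn) under conditional independence given S and given ~S.

    lik_s[i]     = P(D_i | S)
    lik_not_s[i] = P(D_i | ~S)
    """
    numerator = prod(lik_s) * prior
    denominator = numerator + prod(lik_not_s) * (1 - prior)
    return numerator / denominator
```

With a single signal this reduces to ordinary Bayes’ rule, which makes it easy to check against the earlier examples.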
In naive Bayes, we presuppose independence conditional on \(S\). Adoption, engagement, and retention are all positively correlated: high adoption and retention are more likely with strong user engagement. Still, naive Bayes provides a baseline. All we need to proceed are a few probabilities:
| Term | Value | Explanation | Rationale |
|---|---|---|---|
| \(\mathbb{P}(S)\) | 0.05 | Overall probability of success | Industry estimate (as before) |
| \(\mathbb{P}(\neg S)\) | 0.95 | Overall probability of failure | From \(\mathbb{P}(S) + \mathbb{P}(\neg S) = 1\) (as before) |
| \(\mathbb{P}(A)\) | 0.20 | Overall probability of product adoption | Industry estimate |
| \(\mathbb{P}(R)\) | 0.14 | Overall probability of product retention | From \(\mathbb{P}(R) = \mathbb{P}\left(R\vert S\right)\mathbb{P}(S) + \mathbb{P}\left(R\vert\neg S\right)\mathbb{P}(\neg S)\) |
| \(\mathbb{P}\left(A\vert S\right)\) | 0.50 | Probability of high adoption for successful product | Medium: hype may boost adoption (temporarily) and successful niche products do not have massive adoption by definition |
| \(\mathbb{P}\left(E\vert S\right)\) | 0.60 | Probability of high engagement for successful product | Medium/high: a product may be very useful to customers, but they do not engage much on average (e.g. finance, groceries, maps, or travel apps) |
| \(\mathbb{P}\left(R\vert S\right)\) | 0.90 | Probability of high/flat retention for successful product | High: essentially the definition of success, though some successful products burn through customers or see them churn and return for a while; neither is viable in the long run |
| \(\mathbb{P}\left(A\vert\neg S\right)\) | 0.18 | Probability of high adoption for failure | From \(\mathbb{P}(A) = \mathbb{P}\left(A\vert S\right)\mathbb{P}(S) + \mathbb{P}\left(A\vert \neg S\right)\mathbb{P}(\neg S)\) |
| \(\mathbb{P}\left(E\vert\neg S\right)\) | 0.10 | Probability of high engagement for failure | Low: customers do not waste time on a product they do not care for |
| \(\mathbb{P}\left(R\vert\neg S\right)\) | 0.10 | Probability of high/flat retention for failure | Low: customers rarely stick with a product that fails them |
Let’s substitute the relevant values into NB:
\[\begin{eqnarray} \mathbb{P}\left(S\vert A,E,R\right) &=& \frac{\mathbb{P}\left(A\vert S\right)\mathbb{P}\left(E\vert S\right)\mathbb{P}\left(R\vert S\right) \mathbb{P}(S)}{\mathbb{P}\left(A\vert S\right)\mathbb{P}\left(E\vert S\right)\mathbb{P}\left(R\vert S\right)\mathbb{P}(S)+\mathbb{P}\left(A\vert\neg S\right)\mathbb{P}\left(E\vert\neg S\right)\mathbb{P}\left(R\vert\neg S\right)\mathbb{P}(\neg S)} \\ &=& \frac{0.50\cdot 0.60\cdot 0.90\cdot 0.05}{0.50\cdot 0.60\cdot 0.90\cdot 0.05 + 0.18\cdot 0.10\cdot 0.10\cdot 0.95} \\ &\approx & 0.89 \end{eqnarray}\]Our probability of success has gone up to almost 90%!
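Plugging the table into code confirms the result (variable names are mine; `math.prod` requires Python 3.8+):

```python
from math import prod

prior = 0.05                    # P(S), as before
lik_s = [0.50, 0.60, 0.90]      # P(A|S), P(E|S), P(R|S)
lik_not_s = [0.18, 0.10, 0.10]  # P(A|~S), P(E|~S), P(R|~S)

numerator = prod(lik_s) * prior
denominator = numerator + prod(lik_not_s) * (1 - prior)
print(round(numerator / denominator, 2))  # → 0.89
```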
Beyond naive Bayes
Before you pop the champagne cork, consider that naive Bayes assumes all signals to be conditionally independent given success. That is, success is the cause of all three symptoms: consistent adoption, steady engagement, and high and steady retention. That is too much to ask for, though. For instance, a successful product with great adoption may struggle to have highly engaged users across all segments. Some successful products cater to niche audiences. These products have stable but fairly low adoption, yet highly engaged users who rarely churn. Other products can be successful but not require exceptional engagement, such as a banking app, a file backup service, a suitcase, or a professional camera.
In the next post, we shall look at a sequential model, which models dependencies among signals without conditional independence. We shall see that naive Bayes overestimates the probability of success a lot. Still, it can be a useful baseline.