Bayes' Rule in Product Management: Sequential Bayes

Given several correlated metrics, how do you calculate the probability of product/market fit? With sequential Bayes!

To compute the probability of success \(S\) given \(n\) signals \(D_{1},\ldots,D_{n}\), we can use the generalization of Bayes’ rule:

\[\mathbb{P}\left(S\vert D_{1},\ldots,D_{n}\right) = \frac{\mathbb{P}\left(D_{1},\ldots,D_{n}\vert S\right) \mathbb{P}(S)}{\mathbb{P}\left(D_{1},\ldots,D_{n}\right)}.\]

Bayesian networks are generally the best tool for modelling the various joint probabilities. Naive Bayes assumes the signals are conditionally independent given \(S\), which may or may not hold. An approach in between the two in terms of complexity is known as sequential Bayes.

Sequential Bayes

In sequential Bayes, no blanket assumption of conditional independence is made. By ordering the signals into a chain \(D_{1}\to D_{2}\to \ldots \to D_{n}\), in which each signal may depend on all of its predecessors, we retain statistical dependencies that naive Bayes ignores.

Bayes’ rule can be applied iteratively: the posterior of the \(i\)th signal becomes the \((i+1)\)th prior. This is by virtue of the chain rule for probabilities:

\[\begin{multline} \mathbb{P}\left(D_{1},\ldots,D_{n}\right)=\mathbb{P}\left(D_{n}\vert D_{n-1},\ldots,D_{1}\right)\mathbb{P}\left(D_{n-1}\vert D_{n-2},\ldots,D_{1}\right)\ldots \\ \ldots\mathbb{P}\left(D_{3}\vert D_{2},D_{1}\right)\mathbb{P}\left(D_{2}\vert D_{1}\right)\mathbb{P}\left(D_{1}\right). \end{multline}\]

Why can we apply Bayes’ rule iteratively? We shall focus on \(n=3\) for which our chain is \(D_{1}\to D_{2}\to D_{3}\). Phrased differently, \(D_{3}\) is affected by both \(D_{1}\) and \(D_{2}\), but \(D_{2}\) only by \(D_{1}\).

Recall Bayes’ rule for a single signal \(D_{1}\):

\[\mathbb{P}\left(S\vert D_{1}\right) = \frac{\mathbb{P}\left(D_{1}\vert S\right) \mathbb{P}(S)}{\mathbb{P}(D_{1})}\]

We can spot the one-signal posterior masquerading as the prior in the equation with two signals:

\[\begin{eqnarray} \mathbb{P}\left(S\vert D_{1},D_{2}\right) &=& \frac{\mathbb{P}\left(D_{1},D_{2}\vert S\right) \mathbb{P}(S)}{\mathbb{P}\left(D_{1},D_{2}\right)} \\ &=& \frac{\mathbb{P}\left(D_{2}\vert D_{1}, S\right)\mathbb{P}\left(D_{1}\vert S\right) \mathbb{P}(S)}{\mathbb{P}\left(D_{2}\vert D_{1}\right)\mathbb{P}(D_{1})} \\ &=& \frac{\mathbb{P}\left(D_{2}\vert D_{1}, S\right)\mathbb{P}\left(S\vert D_{1}\right)}{\mathbb{P}\left(D_{2}\vert D_{1}\right)} \end{eqnarray}\]

Now let’s look at the case with three signals:

\[\begin{eqnarray} \mathbb{P}\left(S\vert D_{1},D_{2},D_{3}\right) &=& \frac{\mathbb{P}\left(D_{1},D_{2},D_{3}\vert S\right) \mathbb{P}(S)}{\mathbb{P}\left(D_{1},D_{2},D_{3}\right)} \\ &=& \frac{\mathbb{P}\left(D_{3}\vert D_{1},D_{2},S\right)\mathbb{P}\left(D_{2}\vert D_{1},S\right)\mathbb{P}\left(D_{1}\vert S\right)\mathbb{P}(S)}{\mathbb{P}\left(D_{3}\vert D_{1},D_{2}\right)\mathbb{P}\left(D_{2}\vert D_{1}\right)\mathbb{P}(D_{1})} \\ &=& \frac{\mathbb{P}\left(D_{3}\vert D_{1},D_{2},S\right)\mathbb{P}\left(S\vert D_{1},D_{2}\right)}{\mathbb{P}\left(D_{3}\vert D_{1},D_{2}\right)} \end{eqnarray}\]

So, the posterior with two signals \(\mathbb{P}\left(S\vert D_{1},D_{2}\right)\) is the prior of the case with three signals, as promised. In such a model, the order matters, as we shall see.
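This identity is easy to sanity-check numerically. Below is a minimal sketch with a toy joint distribution over \((S, D_{1}, D_{2})\); all numbers are invented for illustration and have nothing to do with the product example later on:

```python
# Toy joint distribution over (S, D1, D2); invented probabilities that sum to 1.
joint = {
    (1, 1, 1): 0.04, (1, 1, 0): 0.02, (1, 0, 1): 0.01, (1, 0, 0): 0.03,
    (0, 1, 1): 0.10, (0, 1, 0): 0.20, (0, 0, 1): 0.15, (0, 0, 0): 0.45,
}

def p(**fixed):
    """Marginal probability that the named variables take the given values."""
    idx = {"s": 0, "d1": 1, "d2": 2}
    return sum(q for o, q in joint.items()
               if all(o[idx[k]] == v for k, v in fixed.items()))

# Direct posterior: P(S | D1=1, D2=1)
direct = p(s=1, d1=1, d2=1) / p(d1=1, d2=1)

# Sequential: update on D1 first, then on D2.
post1 = p(s=1, d1=1) / p(d1=1)                 # P(S | D1)
like2 = p(s=1, d1=1, d2=1) / p(s=1, d1=1)      # P(D2 | D1, S)
evid2 = p(d1=1, d2=1) / p(d1=1)                # P(D2 | D1)
sequential = like2 * post1 / evid2

assert abs(direct - sequential) < 1e-9         # the two routes agree
```

Both routes give the same posterior, which is exactly the algebra above in executable form.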

Product/market fit

Let’s return to the product/market fit example from the original post. We wish to compute the probability of product/market fit (success) based on three signals: adoption, engagement, and retention. Particularly, we ask ourselves: When we see steady user growth, meaningful usage, and a flat retention curve, what is the probability of success of our product?

How to use

In this case, we have \(D_{1}=A\), \(D_{2}=E\), and \(D_{3}=R\). With the chain \(A\to E\to R\), we can update our beliefs in that particular order, which is typically the order in which these signals become available. We do not have to compute the probability of success in one go; instead, we can update it whenever new data arrives.
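The update-as-data-arrives workflow boils down to one tiny function. The sketch below is mine, not from the original post; the function name and the step-1 numbers (introduced in the table below) are used for illustration:

```python
def bayes_update(prior: float, likelihood: float, evidence: float) -> float:
    """One sequential-Bayes step: posterior = likelihood * prior / evidence.

    prior      -- P(S | earlier signals)
    likelihood -- P(new signal | earlier signals, S)
    evidence   -- P(new signal | earlier signals)
    """
    return likelihood * prior / evidence

# Step 1 of the example: P(S | A) = P(A | S) * P(S) / P(A)
belief = bayes_update(prior=0.05, likelihood=0.50, evidence=0.20)
print(round(belief, 3))  # 0.125
```

Each subsequent signal is just another call, feeding the previous posterior back in as the prior.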

We also need a few probabilities lifted from the post on naive Bayes:

| Term | Value | Explanation |
| --- | --- | --- |
| \(\mathbb{P}(S)\) | 0.05 | Overall probability of success |
| \(\mathbb{P}(\neg S)\) | 0.95 | Overall probability of failure |
| \(\mathbb{P}(A)\) | 0.20 | Overall probability of product adoption |
| \(\mathbb{P}(R)\) | 0.10 | Overall probability of product retention |
| \(\mathbb{P}\left(A\vert S\right)\) | 0.50 | Probability of high adoption for a successful product |
| \(\mathbb{P}\left(E\vert S\right)\) | 0.60 | Probability of high engagement for a successful product |
| \(\mathbb{P}\left(R\vert S\right)\) | 0.90 | Probability of high/flat retention for a successful product |

In what follows, I use unrounded probabilities.

1. Adoption

With \(D_{1}=A\) and values as before, we find that

\[\begin{eqnarray} \mathbb{P}\left(S\vert A\right) &=& \frac{\mathbb{P}\left(A\vert S\right) \mathbb{P}(S)}{\mathbb{P}(A)} \\ &=& \frac{0.5\cdot 0.05}{0.2} \\ &\approx & 0.13 \end{eqnarray}\]

Seeing solid adoption has boosted our belief in the success of the product significantly.

2. Engagement

Next up: \(D_{2} = E\), engagement. Things get tricky in the denominator \(\mathbb{P}\left(E\vert A\right)\): the probability of solid engagement given steady adoption. People may sign up for a product en masse yet leave immediately. In fact, that’s the reality for three-quarters of customers. Remember Clubhouse? Google+? Ello? Or Vine?

Strong signals over time, which are our focus here, are rare for product failures. High engagement is more likely when users sign up through word-of-mouth marketing, because paid growth is usually less effective than organic growth in B2C. Then again, half of all consumers cannot really tell the difference between ads and regular online search results, so the line between organic and paid is blurry. In B2B, sales-driven (paid) adoption is the norm, unless a product relies on product-led growth (PLG), which is more common in SaaS businesses. In most cases, adoption is due to a mix of organic and paid channels. All of that makes estimating the probability of high engagement given strong adoption difficult.

A case in point: SAP. Their annual user growth is mediocre. Their NPS is 12, which is poor by B2B standards. Despite all that, they managed to increase revenue by 10% last year with their atrocious software, because it runs inside most corporations in the world. SAP's usage statistics may be high, but employees who are forced to use SAP in their jobs hardly have a choice. SAP is thus successful despite lacklustre user growth.

Instead, we can use the law of total probability by partitioning on \(S\) as follows: \(\mathbb{P}\left(E\vert A\right)=\mathbb{P}\left(E\vert A,S\right)\mathbb{P}\left(S\vert A\right)+\mathbb{P}\left(E\vert A,\neg S\right)\mathbb{P}\left(\neg S\vert A\right).\) Sometimes that makes it a little bit easier:

  • \(\mathbb{P}\left(E\vert A,S\right)\): if the product is a success and given good adoption over time, what is the probability of strong engagement? Better than 50%, I’d wager. Let’s say 60%.
  • \(\mathbb{P}\left(E\vert A,\neg S\right)\): if the product is a failure and given good adoption over time, what is the probability of strong engagement? Maybe 20%.
  • \(\mathbb{P}\left(S\vert A\right)\approx 0.13\), which we calculated in step 1.
  • \(\mathbb{P}\left(\neg S\vert A\right) = 1-\mathbb{P}\left(S\vert A\right)\approx 0.87\).

With these figures, \(\mathbb{P}\left(E\vert A\right) = 0.25\). Hence,

\[\begin{eqnarray} \mathbb{P}\left(S\vert A,E\right) &=& \frac{\mathbb{P}\left(E\vert A,S\right) \mathbb{P}\left(S\vert A\right)}{\mathbb{P}\left(E\vert A\right)} \\ &=& \frac{0.60\cdot 0.125}{0.25}\\ &=& 0.30 \end{eqnarray}\]
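This step is easy to reproduce in a few lines, using the unrounded \(\mathbb{P}\left(S\vert A\right)=0.125\) from step 1 (variable names are mine):

```python
# Step 2: engagement, with unrounded values from step 1.
p_S_given_A = 0.125        # P(S | A), from step 1
p_E_given_AS = 0.60        # P(E | A, S), guessed in the text
p_E_given_A_notS = 0.20    # P(E | A, not S), guessed in the text

# Law of total probability, partitioning on S:
p_E_given_A = (p_E_given_AS * p_S_given_A
               + p_E_given_A_notS * (1 - p_S_given_A))
print(round(p_E_given_A, 4))   # 0.25

# Bayes update:
p_S_given_AE = p_E_given_AS * p_S_given_A / p_E_given_A
print(round(p_S_given_AE, 4))  # 0.3
```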

3. Retention

For \(D_{3}=R\), we need to come up with \(\mathbb{P}\left(R\vert E,A\right)\): given excellent adoption as well as engagement, what is the probability of a flat retention curve? Adoption may or may not be a cause for high retention. High engagement is very indicative of retention, though. It is uncommon for users to use a product a lot and not stick around unless something unexpected happens: price increases, half-baked features that cause lots of problems, a UX refresh that only confuses, an acquisition that causes feature development to halt and the product to rot, and so on. So, the probability we seek is high, but not exceptionally so. Maybe 75%.

For the numerator, we can either expand the denominator again with the law of total probability, in which case the term \(\mathbb{P}\left(R\vert A,E,S\right)\) appears anyway, or try to guess \(\mathbb{P}\left(R\vert A,E,S\right)\) directly. Here, an educated guess suffices. If our product is successful, with plenty of new users joining and a highly engaged customer base, the probability of seeing excellent retention is close to unity, almost by definition. To not bias the computation too much, let’s settle on 99%.

\[\begin{eqnarray} \mathbb{P}\left(S\vert A,E,R\right) &=& \frac{\mathbb{P}\left(R\vert A,E,S\right)\mathbb{P}\left(S\vert A,E\right)}{\mathbb{P}\left(R\vert A,E\right)} \\ &=& \frac{0.99\cdot 0.30}{0.75} \\ &\approx & 0.40. \end{eqnarray}\]

All together now

Here’s a summary of the steps and the probability of success at each step:

| Signals \(s\) | \(\mathbb{P}(S\vert s)\) | Boost |
| --- | --- | --- |
| \(\lbrace \emptyset \rbrace\) | 0.05 | |
| \(\lbrace A\rbrace\) | 0.13 | 2.5× |
| \(\lbrace A, E\rbrace\) | 0.30 | 2.4× |
| \(\lbrace A, E, R\rbrace\) | 0.40 | 1.3× |
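The whole chain of updates fits in a short script (probabilities as in the text; variable names are mine):

```python
p_S = 0.05                                   # prior P(S)

# Step 1: adoption. P(S|A) = P(A|S) * P(S) / P(A)
p_S_A = 0.50 * p_S / 0.20                    # 0.125

# Step 2: engagement, evidence via the law of total probability.
p_E_given_A = 0.60 * p_S_A + 0.20 * (1 - p_S_A)
p_S_AE = 0.60 * p_S_A / p_E_given_A          # 0.30

# Step 3: retention, with the guessed terms from the text.
p_S_AER = 0.99 * p_S_AE / 0.75               # ~0.396

# One line per step: unrounded posterior plus boost over the previous step.
for label, prob, prior in [("A", p_S_A, p_S),
                           ("A,E", p_S_AE, p_S_A),
                           ("A,E,R", p_S_AER, p_S_AE)]:
    print(f"{label:6s} {prob:.3f}  boost {prob / prior:.1f}x")
```

Running it reproduces the table, with the unrounded 0.125, 0.300, and 0.396.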

All in all, each signal boosts the probability of success, and, taken together, they increase it by almost an order of magnitude. That’s why it’s important to look at more than one signal of success.

You might wonder why the retention only bumps the probability of success so little compared to adoption and engagement. The reason is that by the time we include retention, we have already incorporated two other signals, so the baseline is already much higher: the posterior is always relative to the prior.

The quality of the joint probabilities is crucial. Here, I have mostly guessed values based on common product sense, but in real situations it is imperative that technical PMs compute or estimate these from actual data. If that is not possible, a guess from a domain expert is an acceptable fallback.

The order matters, though: had we started with the retention as the first signal, we would have arrived at \(P\left(S\vert R\right) = 0.45\) in the first step, which is already higher than the value we found after the third step. This is a good indication that we are dealing with complex dependencies that require a more sophisticated method, such as probabilistic graphical models.
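That first-step figure is just Bayes’ rule with retention as the only signal, using the values from the table of probabilities above:

```python
# Retention first: P(S|R) = P(R|S) * P(S) / P(R)
p_S_given_R = 0.90 * 0.05 / 0.10
print(round(p_S_given_R, 2))  # 0.45, already above the 0.396 after all three steps
```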

Beyond naive and sequential Bayes

Naive Bayes overestimated the probability of success by more than a factor of two, because our signals are positively correlated and hence not conditionally independent. Sequential Bayes makes no such assumption, but it models dependencies only along a linear chain: retention is affected by adoption and engagement, and engagement is affected by adoption.

Highly engaged users retain better than disengaged users, so engagement and retention are certainly correlated and therefore statistically dependent. If adoption is mostly driven by organic growth (e.g. word-of-mouth), adoption and engagement are statistically dependent, too. If adoption is primarily driven by paid campaigns, however, adoption and engagement might be close to statistically independent: more new customers (adoption) do not necessarily mean more meaningful usage (engagement). And organic growth itself is long-term, engaged customers recommending a product and thereby increasing adoption: a feedback loop from engagement and retention back to adoption.

In such a case, it would be better to model all joint probabilities with a probabilistic graphical model, such as a Bayesian network. Any loops would have to be unrolled over time slices, as Bayesian networks are acyclic. Once you have such a network, you can use a Monte Carlo simulation to model uncertainties in the joint probabilities, where you sample from probability distributions many times to obtain a confidence interval for the probability of success. I’ll leave that for a future post. Probably.
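As a teaser for that future post, here is one way such a simulation could look. This is my sketch, not the author’s method: each guessed point estimate is replaced by a Beta distribution centred on it, and the three-step update is repeated many times to get an interval rather than a single number.

```python
import random
import statistics

random.seed(42)

def beta(mean, strength=50):
    """Sample a probability from a Beta distribution centred on `mean`."""
    return random.betavariate(mean * strength, (1 - mean) * strength)

posteriors = []
for _ in range(10_000):
    # Step 1: adoption. Clip, since independently sampled terms can be inconsistent.
    p = min(beta(0.50) * beta(0.05) / beta(0.20), 1.0)
    # Step 2: engagement, evidence via the law of total probability.
    like = beta(0.60)
    p = like * p / (like * p + beta(0.20) * (1 - p))
    # Step 3: retention.
    p = min(beta(0.99) * p / beta(0.75), 1.0)
    posteriors.append(p)

posteriors.sort()
print(f"mean {statistics.mean(posteriors):.2f}, "
      f"95% interval {posteriors[250]:.2f}-{posteriors[9750]:.2f}")
```

The `strength` parameter controls how confident we are in each guess; a proper treatment would elicit it per probability, which is exactly the kind of detail that future post would have to work out.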