The Timestamp Delusion

At Spotify, we once chased a problem that sounded straightforward enough: how does a listener reach the moment where they press play?

It turned into months of engineering work, even though this was a single product and we fully owned both the client and the backend.

Some events arrived out of order. Others vanished into the void. A few were duplicated. Client clocks drifted. Devices lived offline, then reappeared with a different idea of the present. Some logs showed client time, others server time.

Instrumentation itself was a mess. Each feature team owned its own events. As a result, identifiers were not stable. Telemetry behaved like sediment: layers built up over many years, and any interpretation required a data geologist.

As a consequence, I supervised a master’s thesis on reconstructing missing events using machine learning, because straight ingestion was too lossy. A good part of this overlapped with the company’s later redesign of its event delivery system. Transactional guarantees were reserved for critical data, whereas non-critical events, such as telemetry from the user interface, could be dropped without notice. At half a trillion events a day you simply accept loss. Eventually this pain helped push the company towards a more centralized and disciplined approach to telemetry.

Timestamp numerology

Now consider internal platforms.

The user base is tiny, but workflows are diverse. The toolchain sprawls across IDEs, CLIs, browser tabs, GitHub PRs, Slack threads, operations dashboards, wiki pages, and a constant stream of web searches or AI assistant chats. No single team owns the whole surface, not even with a central developer portal, although such a portal can reduce context switches. Laptops routinely go dark and return with altered clocks. Everything that was painful at Spotify becomes worse. Much worse.

The idea is the same: stitch timestamps across tools, sort everything by time, draw a Sankey diagram, and proclaim a user journey. It looks analytical, but it is numerology.

While user identifiers are not the same across tools, timestamps are the real problem. No internal tool shares a dependable clock. Across tools, you get contradictory event orderings. Within a single tool, the same nonsense appears across sessions: a device moves between time zones, corrects its clock after being offline, or wakes up only when the VPN reconnects after a timeout.

This is hardly new. Leslie Lamport laid out the core problems of time in distributed systems back in 1978. Physical timestamps cannot reliably express causal order. You need a formal relation that expresses it, such as Lamport's happened-before relation; without such a causal structure, no amount of timestamp magic yields a coherent narrative.
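To make the point concrete, here is a minimal sketch of a Lamport-style logical clock next to skewed wall clocks. Everything in it is hypothetical: the tool names, the two-minute skew, and above all the assumption that the CI system receives the sender's counter along with the push. Real cross-tool telemetry almost never carries such a counter, which is exactly why the causal order cannot be recovered afterwards.

```python
from dataclasses import dataclass


@dataclass
class Event:
    tool: str
    name: str
    wall_clock: float  # seconds as reported by the device, possibly skewed
    lamport: int       # logical counter, independent of any physical clock


class Process:
    """One tool with its own (hypothetical) clock skew and logical counter."""

    def __init__(self, tool: str, skew: float) -> None:
        self.tool = tool
        self.skew = skew
        self.counter = 0

    def local(self, name: str, true_time: float) -> Event:
        # A local event simply advances the logical counter.
        self.counter += 1
        return Event(self.tool, name, true_time + self.skew, self.counter)

    def receive(self, name: str, true_time: float, cause: Event) -> Event:
        # On receipt, jump past the sender's counter so that the effect
        # is always ordered after its cause.
        self.counter = max(self.counter, cause.lamport) + 1
        return Event(self.tool, name, true_time + self.skew, self.counter)


ide = Process("ide", skew=0.0)
ci = Process("ci", skew=-120.0)  # assumed: CI clock runs two minutes behind

push = ide.local("git push", true_time=1000.0)
build = ci.receive("build started", true_time=1005.0, cause=push)

by_wall = sorted([push, build], key=lambda e: e.wall_clock)
by_lamport = sorted([push, build], key=lambda e: e.lamport)

print([e.name for e in by_wall])     # effect sorts before its cause
print([e.name for e in by_lamport])  # causal order preserved
```

Sorting by wall clock puts the build before the push that triggered it. The logical clock gets the order right, but only because the counter travelled with the message, a luxury that a pile of independently instrumented tools does not have.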

Even at Spotify, with a serious centralization effort, keeping semantics coherent took a lot of work. In internal platforms the idea of shared meaning is fanciful, especially as governance tends to be much weaker. If semantics can drift within a single product, combining many products only multiplies the ambiguity. What you can reconstruct from timestamps is not a user journey. The prettier the diagram, the easier it is to forget that the data is primarily noise with an occasional signal.

Metrics mysticism

The metrics layered on top are even stranger. Internal platform PMs often copy the consumer PM canon: NPS, MAU, retention. In consumer products these derive their meaning from choice, though I personally believe any engagement metric to be pure vanity. Inside an organization customers have no choice. Retention does not measure utility, merely continuation of employment.

This is not Goodhart in its classic sense, for the measure was never appropriate for the domain at all. It is simply copied out of habit or intellectual laziness. NPS is the most glaring case. Reichheld’s original formulation rests on the premise of word-of-mouth advocacy in competitive markets. Inside engineering teams nobody earnestly recommends an API or operations dashboard to family or friends. Among colleagues, internal tools come up in conversation mostly as rants. An internal NPS score therefore measures nothing real, although a customer effort score (CES) may be a more appropriate metric at the feature level. Even in consumer markets, the foundations are shaky: NPS fails to predict revenue growth, which makes its use in internal platforms even more asinine.

Metrics with empirical support are ignored. Software delivery performance is captured by deployment frequency, lead time, change failure rate, mean time to restore, and reliability. These are the outcomes that internal platforms influence, though many are outside of their immediate control. They are rarely used because non-technical PMs fear metrics that are not in the standard product management literature. A platform that shortens lead time, trims failure rates, and reduces cognitive load while requiring fewer user interactions is a success, even though MAU may insist it is not. When the goal is to reduce toil, a metric that rewards time spent instead of toil avoided is fundamentally absurd.
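None of these delivery metrics needs cross-tool stitching. As a rough sketch, four of them fall out of a plain list of deployment records. The `Deployment` schema, the field names, and the sample data below are assumptions made for illustration, not any particular tool's export format. Reliability is left out because it needs service-level data rather than deployment records.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a failure in production?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def delivery_metrics(deployments: list[Deployment], period_days: float) -> dict:
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    restores = [hours(d.deployed_at, d.restored_at)
                for d in failures if d.restored_at is not None]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time_h": median(hours(d.committed_at, d.deployed_at)
                                     for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_restore_h": mean(restores) if restores else None,
    }


# Hypothetical two-day sample.
sample = [
    Deployment(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11)),
    Deployment(datetime(2024, 1, 1, 10), datetime(2024, 1, 1, 14),
               failed=True, restored_at=datetime(2024, 1, 1, 15)),
    Deployment(datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 10)),
    Deployment(datetime(2024, 1, 2, 12), datetime(2024, 1, 2, 15)),
]
metrics = delivery_metrics(sample, period_days=2)
print(metrics)
```

Each record comes from a single system of record, so the timestamps only ever need to be compared with others from that same system. That is a far weaker requirement than ordering events across laptops, browsers, and chat tools.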

Once you remember what it took to make user journeys even vaguely credible inside a single, coherent product, it becomes impossible to take cross-tool journeys in internal platforms at face value. In controlled conditions the problem is hard enough to reshape an organization. In uncontrolled environments, the idea that you can stitch together timestamps to infer user behaviour is pure fiction.

Perhaps more importantly, time slices tell you astonishingly little about the overall cognitive load or task success. Internal platforms exist entirely to increase productivity and standardize for the sake of compliance. The most important metric, which is coincidentally also completely ignored in consumer products, is task completion. If people can achieve what they came to do quickly and reliably, the product does what it is supposed to do. Task completion already subsumes lead time, deployment frequency, change failure rate, and cognitive load, because none of these can be deficient when a task is completed swiftly and without friction. You might see task completion in satisfaction surveys, but you cannot see it in engagement metrics for internal platforms. D/W/MAU metrics are the opposite of insightful; they measure presence, not progress.
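A small sketch illustrates the difference between presence and progress. It assumes that a tool records each task attempt with an explicit outcome and a duration it measured itself, rather than one inferred from stitched timestamps. The `TaskAttempt` record, the task name, and the numbers are all made up for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskAttempt:
    user: str
    task: str
    duration_s: float  # measured by the tool itself for this one attempt
    completed: bool    # explicit outcome, not inferred from later events


def summarize(attempts: list[TaskAttempt]) -> dict:
    done = [a.duration_s for a in attempts if a.completed]
    return {
        "active_users": len({a.user for a in attempts}),
        "completion_rate": len(done) / len(attempts),
        "median_completion_s": median(done) if done else None,
    }


# Hypothetical data: the same two engineers before and after a platform change.
before = [
    TaskAttempt("ana", "create service", 900, False),
    TaskAttempt("ana", "create service", 1200, True),
    TaskAttempt("ben", "create service", 1500, False),
    TaskAttempt("ben", "create service", 1100, True),
]
after = [
    TaskAttempt("ana", "create service", 300, True),
    TaskAttempt("ben", "create service", 240, True),
]

summary_before = summarize(before)
summary_after = summarize(after)
print(summary_before)
print(summary_after)
```

The active-user count is identical in both periods, and the interaction count actually drops after the change. Only the completion rate and the time to completion show that the platform improved.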