The Missing Middle of Machine Learning Infrastructure

The hard part of machine learning is not serving features but deciding what they are.

Everyone talks about going from prototype to production, yet hardly anyone talks about the absence of what ought to lie in between.

ML feature development happens in two incompatible environments. Notebooks and exploratory Python scripts are permissive and brittle. Production systems, meanwhile, demand explicit contracts: schemas, time semantics, backfills, and SLAs, expressed through platform-specific DSLs and APIs that assume you already know exactly what you need. There is no middle ground. That absence, more than anything, is what slows machine learning down.

Exploration and production are not merely development and subsequent deployment. They act on different artefacts. In exploration, a feature is a hypothesis. It might matter to the model or it might be noise. In production, a feature is a contract. It must behave predictably under delays, loss, drift, and backfills. And no existing system bridges that transition well.

Feature platforms such as Chronon and Tecton are built around the assumption that features eventually settle into fixed shapes with stable meanings. That assumption is what makes their guarantees possible.

The problem arises when these systems are treated as if they should also be hospitable places to discover features. They are not, and they cannot be, because discovery thrives on ambiguity and these platforms exist to kill it. People explore in notebooks because notebooks tolerate ambiguity. That is why the rewrite boundary exists: features must be translated into a form that can be backfilled, served, and defended.

Unnested or exploded columns and pivots strain the tabular view of such feature platforms. A feature such as “count per customer” or “activity by category” does not have a fixed schema; keys appear and disappear. Backfills re-materialize something whose shape evolves with the data. Platforms cope by allowing maps or structs, which is an admission that the tabular abstraction is insufficient on its own. The system can still guarantee that the computation completed consistently, but it cannot guarantee that the feature means the same thing over time.
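A small pandas sketch makes the shifting schema concrete. The event data and column names here are illustrative, not from any particular platform: the same “activity by category” pivot, computed over two different windows, produces two different column sets.

```python
import pandas as pd

# Hypothetical event log; customers and categories are illustrative.
events = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b"],
    "category": ["food", "travel", "food", "food", "games"],
})

# "Activity by category" as a pivot: one column per observed category.
counts = events.groupby(["customer", "category"]).size().unstack(fill_value=0)
print(list(counts.columns))  # ['food', 'games', 'travel']

# A backfill over a different window yields a different column set:
later = pd.DataFrame({
    "customer": ["a", "c"],
    "category": ["crypto", "food"],
})
later_counts = later.groupby(["customer", "category"]).size().unstack(fill_value=0)
print(list(later_counts.columns))  # ['crypto', 'food'] -- the schema moved
```

Both computations are individually correct and reproducible; what changed between them is the feature's shape, which is exactly the property a fixed tabular schema cannot express.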

The same goes for one-hot encoding. It requires a vocabulary, a dimensionality, and a policy for unseen categories. That is why it tends to be pushed downstream into the model code itself rather than registered as a reusable feature in a machine learning platform. Embeddings break both the tabular shape and the idea of a stable meaning. An embedding’s geometry is defined by a model, an objective, and the data it was trained on. Change any of those and the representation is no longer comparable. This is a case where the industry has built a parallel stack.
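To see why one-hot encoding is a contract rather than a simple transform, here is a minimal sketch. The class and its parameters are hypothetical; the point is that a frozen vocabulary, a fixed dimensionality, and an explicit unseen-category policy all have to be decided somewhere, and a platform that only stores columns has no natural place for those decisions.

```python
class OneHotEncoder:
    """Toy encoder: frozen vocabulary, fixed width, explicit unseen policy."""

    def __init__(self, vocabulary, unseen="zeros"):
        self.index = {v: i for i, v in enumerate(vocabulary)}
        self.dim = len(vocabulary)   # dimensionality is fixed at creation
        self.unseen = unseen         # policy: "zeros" or "error"

    def encode(self, value):
        vec = [0] * self.dim
        i = self.index.get(value)
        if i is None:
            if self.unseen == "error":
                raise KeyError(f"unseen category: {value!r}")
            return vec               # all-zeros row for unseen categories
        vec[i] = 1
        return vec


enc = OneHotEncoder(["food", "travel"])
print(enc.encode("food"))    # [1, 0]
print(enc.encode("crypto"))  # [0, 0] -- the unseen-category policy in action
```

Every one of those choices (vocabulary, width, unseen policy) changes what the feature means, which is why the encoding usually lives next to the model that depends on it.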

Vector stores such as Pinecone, Milvus, and Weaviate promise availability, performance, and similarity search over vectors. They do not promise point-in-time reconstruction, semantic stability, or reproducibility across model versions. Pinecone, for instance, sees versioning as an application-level concern handled by namespaces or metadata. They optimize for relevance now, not explanations later. They also bypass the exploration-to-contract problem by refusing to offer contracts at all.
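What “versioning as an application-level concern” looks like in practice can be sketched with a toy in-memory store. Nothing below is a real vector-store API; it only illustrates that the index ranks whatever vectors it holds, so keeping incomparable model versions apart is the caller's job, done through metadata filters.

```python
import math

store = []  # list of (id, vector, metadata) triples


def upsert(item_id, vector, metadata):
    store.append((item_id, vector, metadata))


def query(vector, top_k=1, where=None):
    """Return ids of the top_k most similar vectors, optionally filtered."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = lambda v: math.sqrt(sum(x * x for x in v))
        return dot / (norm(a) * norm(b))

    candidates = [
        (cosine(vector, v), i)
        for i, v, m in store
        if where is None or all(m.get(k) == val for k, val in where.items())
    ]
    return [i for _, i in sorted(candidates, reverse=True)[:top_k]]


upsert("doc1", [1.0, 0.0], {"model_version": "v1"})
upsert("doc2", [0.0, 1.0], {"model_version": "v2"})

# Without a filter, v1 and v2 vectors are ranked in the same geometry even
# though they are not comparable; the filter is the caller's responsibility.
print(query([1.0, 0.1], where={"model_version": "v1"}))  # ['doc1']
```

The store happily answers either way; it has no notion that "v1" and "v2" vectors mean different things, which is the sense in which it offers no contract.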

Exploded columns, sparse maps, and pivots all live between tabular features and embeddings. They are neither columnar nor pure representations. Vector stores reject them; feature platforms stretch to accommodate them. Yet this is exactly where the crucial ML work lives, and it is where the tools are weakest. The real accelerator for ML development is support for explicit semantic graduation, a way for features to accumulate obligations gradually.
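Semantic graduation could look something like the following sketch. The stage names and obligations are hypothetical; the idea is simply that a feature takes on contract obligations one stage at a time instead of all at once at a rewrite boundary.

```python
STAGES = ["exploratory", "candidate", "contracted"]

# Hypothetical obligations accumulated at each stage.
OBLIGATIONS = {
    "exploratory": set(),                          # anything goes
    "candidate": {"schema"},                       # shape must be declared
    "contracted": {"schema", "backfill", "sla"},   # full production contract
}


class Feature:
    def __init__(self, name):
        self.name = name
        self.stage = "exploratory"
        self.satisfied = set()

    def satisfy(self, obligation):
        self.satisfied.add(obligation)

    def graduate(self):
        """Advance one stage, but only if the new obligations are met."""
        next_stage = STAGES[STAGES.index(self.stage) + 1]
        missing = OBLIGATIONS[next_stage] - self.satisfied
        if missing:
            raise ValueError(f"{self.name} cannot graduate: missing {missing}")
        self.stage = next_stage


f = Feature("count_per_customer")
f.satisfy("schema")
f.graduate()     # exploratory -> candidate: schema alone is enough
print(f.stage)   # candidate; "contracted" would also demand backfill and sla
```

The design choice worth noticing is that the exploratory stage demands nothing, so a notebook-born feature can enter the system immediately and pick up contracts only as it earns trust.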

DevOps succeeded not because CI/CD was sensible, but because it created environments between idea and commitment: staging, canaries, shadow traffic. Feature tooling has no equivalents. We still force people to jump from the chaos of notebooks straight into production ceremony.

Until machine learning infrastructure treats the meaning of features as something that evolves rather than something that is declared once, the prototype-to-production gap will persist. Feature platforms and vector stores are not failures, though. They are opposite responses to the same reality. One insists on contracts too early, whereas the other refuses them altogether. The space in between, the middle, is still left to humans, glue code, and hope.