Metanode Lab

Why Bottom-Up Instrumentation Struggles to Scale in Biology

You Cannot Build a Sensor for What You Have Not Defined — AI Still Depends on What You Measure

Biology is not simple enough to be solved by better instruments. A living system operates across scales — molecular, cellular, population, process — with interactions between layers that are nonlinear, context-dependent, and rarely reproducible in isolation. We cannot design a sensor for every variable because we do not know all the variables. We cannot predict interactions because the interactions change depending on conditions we have not yet characterized. The complexity is not a temporary obstacle on the way to understanding. It is the structure of the system itself.

The traditional approach is bottom-up: identify a variable, build a sensor, measure it, optimize around it. This works when the system is well-characterized and the relevant parameters are known. But those are the parameters we already understand. The parameters that explain batch-to-batch variability, unexpected failures, and irreproducible results across sites are not in the monitoring framework because no one has identified them yet. You cannot build a sensor for a variable you have not named.

This is the bottleneck: not that biology is too complex to measure, but that its complexity outruns any attempt to instrument it from the bottom up. Every new sensor captures one more variable, while the system may have thousands. The approach does not scale.

What scales is a layer that sits above the instruments — one that can navigate complexity rather than try to reduce it. Data linked across hundreds of runs, strains, scales, and facilities contains patterns no single experiment reveals. A parameter irrelevant in one dataset co-varies with a failure mode visible only in another. The information exists. It is distributed across the field’s collective data — and no one is looking at it as a whole.
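As a rough illustration of that last point, here is a minimal sketch of linking two datasets that were never designed to be read together. Everything in it is hypothetical: the `batch_id` key, the `trace_metal_ppm` parameter, the values, and the helper names `link_by_batch` and `mean_by_outcome` are invented for the example, and it assumes the two sites share an identifier at all, which in practice is often the hard part.

```python
from collections import defaultdict

# Hypothetical data: one site logs a raw process parameter per batch,
# another site records whether the same batches later failed.
process_logs = [
    {"batch_id": "b1", "trace_metal_ppm": 0.2},
    {"batch_id": "b2", "trace_metal_ppm": 0.9},
    {"batch_id": "b3", "trace_metal_ppm": 0.3},
    {"batch_id": "b4", "trace_metal_ppm": 1.1},
]
outcome_logs = [
    {"batch_id": "b1", "failed": False},
    {"batch_id": "b2", "failed": True},
    {"batch_id": "b3", "failed": False},
    {"batch_id": "b4", "failed": True},
]


def link_by_batch(process_logs, outcome_logs):
    """Join the two datasets on batch_id, keeping batches present in both."""
    outcomes = {row["batch_id"]: row["failed"] for row in outcome_logs}
    return [
        {**row, "failed": outcomes[row["batch_id"]]}
        for row in process_logs
        if row["batch_id"] in outcomes
    ]


def mean_by_outcome(linked, parameter):
    """Average one parameter separately for failed and successful batches."""
    groups = defaultdict(list)
    for row in linked:
        groups[row["failed"]].append(row[parameter])
    return {failed: sum(values) / len(values) for failed, values in groups.items()}


linked = link_by_batch(process_logs, outcome_logs)
means = mean_by_outcome(linked, "trace_metal_ppm")
print(means)
```

Neither dataset shows anything on its own: the first has no outcomes, the second has no parameters. Only the linked view lets the difference between failed and successful batches appear. A real analysis would need far more data and proper statistics; the sketch only shows why the link itself matters.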

This is AI used not for optimization but to identify what the design space should contain: which parameters matter that we have not measured, which interactions exist that we have not modeled, and which instruments are needed that we have not built.

This requires a different relationship with data. Not aggregated endpoints shared in publications. Not filtered, averaged process data stored in isolated databases. It requires raw signal — preserved, structured, and linked across the boundaries of individual labs and individual processes. The next layer of biological understanding is not in any single experiment. It is in the connected data we have never built the infrastructure to keep.

We have the biology. We have the AI. What we are missing is the data layer between them — one that preserves what the biology produces, links it across contexts, and gives computational methods something real to work with.

— Pegah Farr
