Essay · March 24, 2026

Smart Models, Dumb Pipes

Most people are using LLMs as question-answering machines. That's not wrong; it's just not the interesting part.

Braydon McCormick · Means of Production

Overhead view of an office floor, workers at desks, amber sparks streaming through the space and narrowing to cool blue light — information under pressure.
Where human judgment creates drag and exposure — that's where the interesting work is.


I keep saying “smart models, dumb pipes” in conversations and then moving on like I’ve explained something. I haven’t, really. So here’s my actual attempt to unpack it.

The phrase borrows from an old argument in network architecture, one I think got settled the right way and hasn’t gotten enough credit for the lesson it contains.

This post extends an argument I made in You Don’t Want Another Chatbot — that one approached the same gap from the product side. This one takes it into architecture.

If you want the builder view, open the “Technical” toggles as you go — I tucked the implementation detail there rather than turning the whole thing into an engineering essay.

Where the phrase comes from

In the early internet era, there was a real debate about whether network intelligence should live in the network itself or at the edges. The traditional telephone network was what people called an intelligent network: switching logic, routing decisions, and service features were all baked into the infrastructure. The internet took the other side: dumb pipes. The network just moves packets. It doesn’t know, and doesn’t need to know, what’s in them. The intelligence lives at the endpoints.

That design won, decisively. And I think the reason it won is worth holding onto: when you put the intelligence in the pipe, the pipe becomes the bottleneck. Every new use case requires changing the infrastructure. When the pipe is dumb and the endpoints are smart, the system stays flexible. New things become possible without rebuilding what’s underneath.

I keep thinking about how directly that maps to AI. Kind of uncomfortably so.

What most people’s mental model actually is

Most people’s mental model of an LLM comes from the chat box. You type something in, it responds. That’s how I started with these things too, and there’s real value in it; I’m not arguing against it.

But I think the chat box trains you toward a particular framing: LLM as question-answering machine. Ask something, get something back. Better prompt, better answer. Optimize the input, improve the output.

That’s fine as a framing for a tool. It’s not, as I see it, a useful framing for a system.

The cost is subtle. Once you’re in the Q&A frame, you start optimizing for the wrong variable. You get “AI strategy” that’s really prompt strategy. You get enterprise products that are basically search with a chat front end. You get the question of how to get AI to answer employees’ questions faster, which is a real question, but it’s the narrow version of what’s actually available.

The return-on-investment question for AI, as I think about it, isn’t “are the answers better?” It’s “where does judgment belong in the workflow, and what does it cost when that judgment is wrong?”

Those are very different questions. I was asking the first one for longer than I’d like to admit.

```mermaid
flowchart LR
  subgraph jl ["The Judgment Layer"]
    direction TB
    j1["Context + Task"] --> j2["Model\nJudges what should happen"]
    j2 -->|routing call| j3["Dumb Pipe\ncarries it faithfully"]
    j3 -->|governed execution| j4["Side Effects\n+ Audit Trail"]
  end
  subgraph qa ["The Q&A Frame"]
    direction LR
    q1["Prompt"] --> q2["Model"] --> q3["Answer"]
  end
```

The Q&A frame optimizes for the answer. The judgment layer asks what should happen and who should do it.

What I think these things actually are

Not a search engine with better grammar. Search retrieves. A model reasons. Those aren’t different in degree, in my view — they’re different in kind.

Not a chatbot that got promoted. A chatbot routes inputs to predefined outputs. A model interprets context and decides what the output should be.

What I keep coming back to: an LLM is, in my view, a judgment machine. Something that can hold a problem, reason over a space of possibilities, weigh competing options, and produce a decision about what should happen next.

Not retrieval. Judgment.

There, I said it.

I should note: that judgment doesn’t come ready-made. Models don’t have inherent opinions. They don’t arrive with a default point of view on your domain or workflow. Humans have to force the judgment function onto them, through focused, narrow system and user prompts that define what kind of reasoning is expected and from what angle. The “smart” in smart models isn’t a product of the model in the abstract. It’s the model given a specific thing to be smart about. A poorly scoped prompt into a capable model produces capable-sounding noise. That distinction matters more than most people realize when they’re designing a system around this.

Once you start thinking about it that way, the question changes. It’s no longer “what should I ask this?” It becomes: where does judgment belong in this system, and what should I route through it?

The model-mediated architecture

If models are judgment machines, a design principle follows directly — one I’d call model-mediated architecture — and I think it’s pretty straightforward once you see it.

Smart models own the judgment work: what should happen, why, when, in what order, which agents or processes are appropriate. The model decides what something means and what to do because of it.

Dumb pipes own the mechanical work: execution, delivery, storage, record-keeping, audit trails. The pipe moves data. It doesn’t need to understand what it’s carrying. It just needs to carry it faithfully and leave a trace.

I should note: model-mediated does not mean the model directly owns side effects. It means the model owns the judgment call, what should happen and why, and deterministic systems own the execution and the record.

Or more simply: the model decides; the infrastructure runs it.
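That split can be sketched in a few lines. Everything here is illustrative: the decision shape, the `model_decide` stand-in, and the audit log are my assumptions, not the post's actual implementation. The point is the boundary: the model emits a structured judgment; deterministic code validates it, runs it, and writes the record.

```python
# Minimal sketch of "the model decides; the infrastructure runs it".
# All names (model_decide, execute, AUDIT_LOG) are hypothetical.
import datetime

def model_decide(context: dict) -> dict:
    """Stand-in for an LLM call that returns a structured judgment,
    not prose. A real system would call a model API here."""
    urgent = context.get("severity", 0) >= 3  # assumed threshold
    return {
        "action": "escalate" if urgent else "route_standard",
        "target": "oncall" if urgent else "queue",
        "reasoning": "severity at or above escalation threshold"
                     if urgent else "routine severity",
    }

AUDIT_LOG: list[dict] = []

def execute(decision: dict) -> None:
    """Deterministic side: validate, record, run. No interpretation."""
    assert decision["action"] in {"escalate", "route_standard"}
    AUDIT_LOG.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        **decision,  # the reasoning field rides along as the audit trail
    })
    # ...the actual side effect (dispatch, write, notify) would go here...

decision = model_decide({"severity": 4})
execute(decision)
```

Note that `execute` never inspects *why* the model decided anything; it just checks the decision is well-formed, records it, and acts.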

That distinction matters because otherwise you get one of two failure modes I see constantly:

  • Brittle rules pretending to be intelligence. Complex conditional logic someone built to simulate judgment, which breaks the moment the real world doesn’t match the assumptions baked into the code.
  • Ungoverned model behavior pretending to be an operating system. Outputs that aren’t verified, logged, or traceable. “The model said so” as a source of truth. That’s not a system. That’s exposure.

Neither works. The clean separation is what makes it work.

Technical: deterministic vs. model-mediated orchestration

In a deterministic orchestration system, routing logic is hardcoded — if X, then Y. Predictable, but brittle. It can only handle what the designer anticipated. Every edge case requires a code change.

In a model-mediated system, the model makes routing calls dynamically based on context, task requirements, and available options. Consider a concrete case: a task arrives tagged "routine" by the upstream classifier. A deterministic system routes it to the standard handler. A model-mediated system reads the full context — and notices the timestamp, the originating account, the recent activity pattern — and routes it as time-sensitive instead. The routing infrastructure is identical. What changed is who made the call.

One implementation detail worth noting: model outputs in this pattern are structured — a routing decision, a classification, a list of agents with reasoning. Not prose. The reasoning field in the output serves as an audit trail, not just a result. You can inspect why the model routed something the way it did, after the fact, without asking it to reconstruct its reasoning.

This is the dumb network argument applied to agents. The pipe doesn't need to understand the payload. It just needs to move it reliably and leave a trace.
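The "routine tag" case above can be shown side by side. The toy judge below is a stand-in for a model call, and the field names (`tag`, `recent_failures`) are invented for illustration; what matters is that the routing infrastructure is identical in both paths, and only who makes the call changes.

```python
# Contrast sketch: hardcoded routing vs. judgment-based routing.
# toy_judge stands in for a model; field names are assumptions.

def route_deterministic(task: dict) -> str:
    # Can only act on what the designer anticipated: the tag.
    return "standard_handler" if task["tag"] == "routine" else "priority_handler"

def route_model_mediated(task: dict, judge) -> str:
    # Same routing infrastructure; the judgment call is delegated.
    return judge(task)

def toy_judge(task: dict) -> str:
    # Stand-in for a model: reads context the tag-based rule ignores.
    return "priority_handler" if task.get("recent_failures", 0) > 2 \
        else "standard_handler"

task = {"tag": "routine", "recent_failures": 3}
print(route_deterministic(task))              # standard_handler
print(route_model_mediated(task, toy_judge))  # priority_handler
```

Swapping `toy_judge` for a real model call changes nothing downstream, which is the flexibility argument in miniature.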

```mermaid
flowchart LR
  SM["Smart Model\njudges · routes · decides\nwhat should happen and why"]
  DP["Dumb Pipe\nmoves data faithfully\nno transformation, no interpretation"]
  DE["Deterministic System\nexecutes side effects\nlogs · audits · records"]
  ST[("Storage &\nAudit Trail")]
  SM -->|"decision output"| DP
  DP -->|"raw payload"| DE
  DE --> ST
```

The model decides. The pipe carries it unchanged. The deterministic system runs it and writes the record.

Three things I’ve built around this

The expert panel simulator is public on GitHub, MIT licensed, so you can actually look at it. It’s a Python tool that creates a virtual panel of domain experts to review any topic sequentially.

What it does: the model assembles the panel for the task. Economist, futurist, anthropologist, contrarian, others depending on what the problem needs. Each agent responds in character, reads what the previous agent said, and builds on it, qualifies it, or disagrees with it. The contrarian is always included. Not as a nice-to-have. It’s structural; without it the panel tends toward false consensus.

The data flowing between agents is structured data moving through a pipe. The pipe doesn’t know it’s carrying a disagreement. That’s the whole point. The model’s job is deciding who’s in the room. The pipe’s job is moving the work between turns. It’s the agentic model in its most literal form.

It’s a demonstration more than a production system. I built it partly to convince myself the pattern was real, which I realize isn’t exactly a ringing endorsement.

Technical: how the pipeline actually works

Each agent receives a structured context object: the task definition, the assembled panel roster (each entry includes an area of expertise, default stance, and a few characteristic concerns), and the full message history up to that turn. It appends its response and passes the updated object forward.

Agents never communicate directly. The "conversation" is an illusion produced by that context object moving through a deterministic loop: read context, generate response, append to history, pass forward. What makes it feel like actual deliberation rather than role-play is that each agent is reading the prior responses when it generates its own — so disagreements are real disagreements grounded in what came before, not manufactured ones.

The model's substantive work happened earlier, when it assembled the panel for this specific task — deciding who belongs in the room, in what order, and why. Everything after that is mechanical execution.

Compare this to a single-prompt multi-perspective request, which is the naive version: you get personas that tend toward similar conclusions because they're generated in the same context window at the same time, without genuine prior-response pressure. The pipeline structure is what produces the tension. That's not obvious until you've seen both side by side.
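The loop described above fits in a dozen lines. This is a sketch, not the repo's actual code: `respond` stands in for a model call, and the roster shape is simplified from the description (role only, no stance or concerns).

```python
# Sketch of the deterministic panel loop: agents never talk directly;
# a context object moves through the loop and each agent appends.
# respond() is a stand-in for a per-agent model call.

def run_panel(task: str, roster: list[dict], respond) -> list[dict]:
    context = {"task": task, "roster": roster, "history": []}
    for agent in roster:
        # Each agent sees the task, the roster, and all prior turns,
        # so its response is grounded in what came before.
        reply = respond(agent, context)
        context["history"].append({"agent": agent["role"], "text": reply})
    return context["history"]

def toy_respond(agent: dict, context: dict) -> str:
    # Stand-in: a real implementation would prompt a model in character.
    prior = context["history"][-1]["text"] if context["history"] else "(opening)"
    return f"{agent['role']} responding to: {prior}"

roster = [{"role": "economist"}, {"role": "contrarian"}]
history = run_panel("evaluate the proposal", roster, toy_respond)
```

The "conversation" is exactly this: read context, generate, append, pass forward. The loop itself is dumb; the judgment happened earlier, when the panel was assembled.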

DraftForge, private. This applies the same principle inside a production content system. The model selects the right combination of agents for a task: what diversity of perspective is needed, what kind of challenge will sharpen rather than just validate. Agents work sequentially and contradict each other by design.

If I’m honest, the output is genuinely different from what a single prompt to a single model produces. You get something closer to deliberation than generation. That only works because the model owns the judgment layer. The pipes just carry the turns. I wrote more about this architecture in the AI Forge post.

Lodestar / Meridian, also private. This one adds something the others don’t: model tiering. Different model capabilities matched to different judgment functions in the same system. It came out of building out a more systematic AI studio and needing to think carefully about what each layer actually required.

Lodestar is a situation intelligence platform for scenario planning and decision de-risking. The interesting part here is how the model assignments work — not just which model runs what, but why. Each tier is matched to the judgment complexity of its function, not just its cost. The diagram below lays out the routing. The toggle goes deeper if you want to understand how the three-tier supervision layer actually works.

```mermaid
flowchart LR
  t1["Intel Classification\nfast · high-volume"] --> H["Haiku"]
  t2["System Supervision\nheartbeats · restarts"] --> H
  t3["Scenario Reasoning\nspecialist analysis"] --> S["Sonnet"]
  t4["Escalation\ncomplex judgment"] --> O["Opus"]
  H --> Inf["Model-Agnostic\nInfrastructure\nsame data structure\nregardless of tier"]
  S --> Inf
  O --> Inf
  Inf --> Out["Execution\n+ Record"]
```

Right model for the right judgment function. The pipes don't know or care which tier ran.

The infrastructure carries the same data structure regardless of which model tier produced the output. The pipes don’t know, and don’t need to know, whether Haiku or Opus ran. They just move it.

Technical: model tiering and the Factory Brain

Haiku runs two functions: continuous heartbeat monitoring (process-is-alive checks at high frequency, cheap enough to run constantly) and first-pass classification — is this signal relevant, which routing category does it belong to. The judgment required is deliberately narrow: yes/no, route A or B. Not scenario reasoning. Right-sizing it matters; putting Sonnet-level reasoning here would be expensive and slower for no benefit.

Sonnet handles the work that requires actual reasoning: scenario analysis, specialist synthesis, anything where the output needs to hold up to domain scrutiny. This is where most of the substantive judgment in the system lives.

Opus is for escalation — complex incident resolution, cases where Sonnet's analysis was insufficient, or high-stakes judgment calls that need full reasoning depth. It runs infrequently by design.

The Factory Brain's escalation logic is deliberately mechanical: if Haiku's restart attempt fails twice, pass full failure context to Sonnet. If Sonnet's diagnosis doesn't resolve it within a threshold, compile an incident brief for Opus. The escalation criteria aren't smart — they're deterministic rules. What's smart is what each tier does with the problem when it gets it.

One structural detail: the output at all three tiers hits the same data contract — process state, action taken, confidence, and a reasoning field that serves as an audit record. Haiku's output and Opus's output are the same schema. The pipes that carry them don't know, and don't need to know, which tier ran. That's the whole point in miniature.
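The shared contract plus mechanical escalation can be sketched together. The thresholds, tier behavior, and `run_tier` stub are my assumptions for illustration; the real system's criteria are only described in outline above. What the sketch shows is the structure: every tier emits the same schema, and the escalation rules are plain deterministic checks over that schema.

```python
# Sketch: one data contract at every tier, deterministic escalation rules.
# run_tier() stands in for a model call; thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class TierOutput:
    """Same contract regardless of which tier produced it."""
    process_state: str
    action_taken: str
    confidence: float
    reasoning: str   # serves as the audit record

def run_tier(tier: str, problem: dict) -> TierOutput:
    # Stand-in: a toy haiku tier that fails after two restart attempts,
    # and higher tiers that succeed. A real system calls a model here.
    ok = problem["restart_attempts"] < 2 if tier == "haiku" else True
    return TierOutput(
        process_state="recovered" if ok else "failing",
        action_taken=f"{tier}:restart" if tier == "haiku" else f"{tier}:diagnose",
        confidence=0.9 if ok else 0.4,
        reasoning=f"{tier} handled it" if ok else "restart failed twice",
    )

def supervise(problem: dict) -> TierOutput:
    # The escalation criteria aren't smart — they're deterministic rules.
    out = run_tier("haiku", problem)
    if out.process_state == "failing":
        out = run_tier("sonnet", problem)   # pass full failure context up
    if out.confidence < 0.5:
        out = run_tier("opus", problem)     # incident brief to the top tier
    return out

result = supervise({"restart_attempts": 3})
```

Because every tier returns `TierOutput`, the code that carries and stores these results never branches on which model ran.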

So it’s not just smart models and dumb pipes. It’s the right model matched to the right judgment function, all running through the same model-agnostic infrastructure. That’s what the principle looks like when it’s fully developed.

The question I think matters more

Once you start seeing LLMs as judgment layers rather than answer machines, a lot of the “AI strategy” conversation starts to look like it’s optimizing for the wrong thing. I include my own earlier thinking in that.

There’s a lot of work on how to get better answers. Not enough work, as I understand it, on where judgment belongs in a workflow, who owns the execution, and what the pipe should actually carry.

The question I find more useful, and I think it’s the question that leads somewhere, isn’t “how do we add AI to this?”

It’s: where does judgment belong in this system, and what should the infrastructure just move?

That second question is harder. It requires actually mapping the workflow, identifying the decision points, and being honest about what you want the model to own versus what should be governed by deterministic code with a proper audit trail. I’ve been working through this in practice for a while, and the mapping step is where most of the value actually lives.

One way I’ve found to get at it concretely: look for where human friction exists in a workflow, and what risk that friction carries. Slow judgment calls. Inconsistent ones. Decisions that are expensive when they’re wrong, or that accumulate risk quietly over time. Those are the interesting places: not “where can AI answer questions?” but “where is human judgment creating drag or exposure?” That reframe tends to surface the actually valuable interventions, and in places that aren’t obvious from a high-level “let’s add AI” conversation.

But it’s the right question.

The chat box is a fine interface for exploration. It is not a system design.

The models happen to be the new material. The architecture is the old argument, finally applied somewhere it really fits. Part of developing the intuition for this is learning to tell the difference — when the pattern genuinely applies versus when it just sounds like it applies.

Process note

Part of a running series on how my thinking about AI development has evolved: vibe coding, CLI journey, agentic model, workflow integration, AI studio, developing intuition, AI forge, you don’t want another chatbot.

Written with Claude Code assistance using my own voice dossier as the style anchor.

Resources

  • shell-scenario-panel, public, MIT licensed
  • DraftForge and Lodestar/Meridian (private); reach out if you want to talk architecture