
Intro
It was a Tuesday afternoon. I was vibecoding — three Claude Code sessions open in three Ghostty windows on my monitor. Session A was refactoring the order pipeline. Session B was building out a new admin endpoint. Session C was migrating a fixture loader to a new schema. I was supervising — reviewing diffs as they landed, approving tool calls, occasionally typing a clarification. I had been doing this for about an hour.
When I leaned back, the numbers were impressive. Roughly two thousand lines of production code had landed across the three branches. Four architectural decisions had been made quietly, in passing: which retry strategy session B's endpoint should use, where session A would place the new transaction-log table, how session C should handle the version skew between old and new fixture shapes, and whether to keep a custom exception class or replace it with a Result-style return. Six files I hadn't planned to touch had been refactored along the way, incidental to the actual work.
I closed my laptop satisfied and went to make coffee.
The next morning I sat down to review what had shipped. By the third file I was sure of something I did not want to be sure of: I could not honestly defend a single one of those four architectural decisions in a code review. I could read the diff. I could see that each choice had been made. I could even reconstruct why by scrolling back through the chat transcripts. But I could not — sitting across from another engineer asking "why did you choose X over Y here?" — give the answer that ownership requires:
I considered Y. I considered Z. I chose X because of […].
I had not weighed anything. I had not chosen anything. And yet, on the pull-request description, my name was on it.
The conventional response to this is to demand more discipline — pair-program with the agent; review each diff harder; slow yourself down by hand. Operational discipline helps. But it bottoms out against a structural ceiling — and the ceiling, not the floor, is what AI velocity has moved. The next five sections explain where the ceiling is, what it costs you the day it surfaces, and what to do about it that actually changes the equation.
Why the brain is not a CPU
If you reason about this as if your brain were a CPU, the model is broken in the following way.
CPUs have cores. Cores run threads. So — hypothetically — three "cognitive cores" would mean three Claude sessions supervised in parallel, the cognitive load divided like a Linux process scheduler. Engineers don't articulate this assumption out loud, but it's the implicit mental model behind unbounded vibecoding. It does not survive contact with neuroscience.
Dehaene's global-workspace framework — one of several competing accounts of conscious access (others include Tononi's Integrated Information Theory and Higher-Order Theories), but the one most directly applicable to bottleneck arguments — points out that a "massive flow of sensory stimulation" reaches the brain at any moment, but only one attentional focus enters the global workspace at a time and becomes available for report, reasoning, and action (Consciousness and the Brain, p29-30). The brain has many parallel processors at lower levels — that is real — but conscious decision-making is a serial bottleneck. Below the workspace, parallel processing continues unconsciously; above it, there is one queue.
Working memory — the cognitive workspace where you hold the file you're reviewing, the architectural constraint, the bug you suspect, the alternative you considered — is even more constrained. Nelson Cowan's research (cited via Levitin, The Organized Mind, p457) pins reliable working-memory capacity at roughly 4 ± 1 chunks in controlled lab paradigms — visual arrays, digit span, the canonical experiments. Some researchers (Oberauer, Ma, Husain, Bays) argue capacity is better modelled as a continuous resource than discrete slots, but the magnitude is similar. Real-world supervision uses externalized state — on-screen diffs, written notes, IDE state — which raises the effective ceiling. But only modestly. The honest number for sustained held context across three Claude sessions is still single-digit.
Cognitive Load Theory (Sweller et al., Cognitive Load Theory, p61, p66) pins novel-information processing capacity even lower — two to three items per moment. Sweller's number is tighter than Cowan's because they're measuring different things (processing vs storage); both numbers are contested in detail; both converge on the same operational ceiling. Kalyuga's expertise-reversal effect adds a wrinkle that strengthens the argument for our case: the same instructional load that helps novices learn hurts experts who don't need it. The vibecoding operator is the expert; AI-imposed cognitive load eliminates the small advantage expertise normally affords.
But surely we multitask all the time? Surely I can talk on the phone while driving, write code while listening to a podcast? Gazzaley and Rosen (The Distracted Mind, p89) are blunt about what happens in those cases: there is a "failure of our brain to truly multitask at a neural level." What feels like multitasking is task-switching — the brain rapidly serializes the streams. Each switch costs anywhere from 100 milliseconds for simple, predictable task pairs up to several hundred milliseconds (sometimes seconds) for complex pairs (Monsell, Task switching, 2003). In a one-hour vibecoding session with three Claude streams interleaved, you can spend a substantial portion of the time in pure switching overhead, none of which produces output.
Watson and Strayer's 2010 lab study found roughly 2.5% of subjects passed a dual-task threshold without measurable decrement — the so-called "supertaskers." Self-assessment of one's own supertasking ability is unreliable in both directions, so assume you are not the 2.5%. The base rate is what it is.
So the engineer's CPU metaphor is broken. The brain is not a multi-core processor running cognitive threads. The closer match is the JavaScript runtime: an event loop, single-threaded, processing callbacks one at a time, with a microtask queue that fills faster than it drains. When you have three Claude sessions producing diffs and decisions, you do not have three workers reviewing in parallel. You have one event loop, deferring most of what arrives, dropping the rest.
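The single-queue picture can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not a model of cognition: `simulate_single_reviewer`, its parameters, and every number passed to it are hypothetical, chosen only to show how one serial reviewer with a per-switch cost falls behind as the number of producing sessions grows.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ReviewStats:
    reviewed: int          # items the reviewer actually finished
    backlog: int           # items still waiting when time ran out
    switch_seconds: float  # attention lost to switching between sessions


def simulate_single_reviewer(
    sessions: int,
    minutes: int,
    items_per_session_per_min: int,
    review_seconds: float,
    switch_seconds: float,
) -> ReviewStats:
    """One serial reviewer draining a FIFO queue fed by several sessions."""
    queue = deque()        # each entry is the id of the session that produced it
    reviewed = 0
    lost = 0.0
    last_session = None
    for _ in range(minutes):
        # Every session emits its items for this minute, interleaved.
        for _ in range(items_per_session_per_min):
            for s in range(sessions):
                queue.append(s)
        budget = 60.0      # seconds of reviewer attention in this minute
        while queue:
            s = queue[0]
            # Assumption: a switch cost applies whenever the session changes.
            switch = switch_seconds if (last_session is not None and s != last_session) else 0.0
            if review_seconds + switch > budget:
                break      # out of attention; the rest waits in the queue
            queue.popleft()
            budget -= review_seconds + switch
            lost += switch
            reviewed += 1
            last_session = s
    return ReviewStats(reviewed, len(queue), lost)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, simulate_single_reviewer(n, 60, 1, 25.0, 2.0))
```

With these made-up numbers (25 seconds to review an item, 2 seconds per switch, one item per session per minute), two sessions keep the queue empty while three leave a backlog that grows every minute. The specific figures mean nothing; the shape is the point.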
The bottleneck is in the brain. There is no architectural change to your workflow that can move it.
Asymmetry — shipping accelerated, ownership did not
The standard productivity metric — lines of code, pull requests, features shipped — measures output. AI multiplies output substantially: credibly 2-3x in throughput-bounded tasks per published benchmarks (METR 2024; the GitHub Copilot and Microsoft 2024 studies report 20-55% across various measures), anecdotally higher in code-generation-bounded sessions like the Tuesday I described in the intro. The exact multiplier isn't the point. What matters is that it's real and large.
But for any codebase that lives longer than three months, output is not what limits you. The constraint is comprehension-bounded ownership — the set of operations that depend on you having personally held a mental model of the system in working memory:
- Defending a decision in code review three months later.
- Identifying the root cause when something breaks in a way that doesn't match the symptoms.
- Noticing drift between what the PRD specified, what the code does, and what the team thinks it does.
- Choosing the right level of abstraction when the next change request lands.
None of these are scaled by AI. They are scaled by what you read carefully, what you understood, and what you still hold in active working memory three weeks later. The AI's "memory" of why session B chose Result over an exception class evaporates the moment the session ends. Your memory of it never existed in the first place if you didn't personally consider the trade-off.
This is the asymmetry. AI accelerates the rate at which decisions are made. It does not accelerate the rate at which decisions are understood, owned, and defended later. Those happen in a brain whose serial bottleneck and working-memory capacity we covered earlier, and that brain didn't get a 2x upgrade in 2026.
A small empirical point that drives this home: Sophie Leroy's research on "attention residue" (cited in Cal Newport's Deep Work, p26) shows that even after you finish Task A and move to Task B, a measurable portion of your attentional capacity remains stuck on A — unfinished elements, unresolved choices, "did I handle the edge case there?" The effect is measurable up to 20+ minutes post-switch. The implication for vibecoding is uncomfortable: every Claude session you supervise leaves residue on you long after the session closes. My extrapolation — that residue accumulates roughly additively across N parallel sessions — is the natural one but is not strictly what Leroy measured; she tested sequential task pairs, not parallel supervision. The direction is right; the magnitude is unknown.
There is a secondary effect Levitin documents (The Organized Mind, p124-126): repeated task-switching drives anxiety, and anxiety raises cortisol — the body's stress signal — which then makes the next switch more expensive. The mediating step matters: perceived control over the workflow moderates the chain. Engineers who feel in-control during vibecoding show smaller cortisol responses at the same switch rate. Engineers who feel overwhelmed pay the full bill. Vibecoding feels productive in the moment and exhausting by evening for a specific neuroendocrine reason: at high switch rates and low felt-control, your endocrine system reads sustained vigilance as sustained threat.
When colleagues ask how I shipped twelve features last week, I have an honest answer. Supervising AI is real work; it has real value; it is part of the job. But it is a different category from authoring decisions. I did not author twelve features last week. I supervised the shipping of twelve features. There is a difference, and the difference is exactly what ownership is.
Responsibility debt — a new class of tech debt
We have a name for code that works today but has compounding hidden cost: technical debt. The metaphor is borrowed from finance and is largely accurate — you traded future engineering hours for present velocity, and the interest accrues until you refactor.
What I described above (code I cannot defend, decisions I did not make, choices that have my name on them) is the AI-accelerated case of what Martin Fowler's technical-debt quadrant, building on Ward Cunningham's 1992 debt metaphor, calls reckless-inadvertent debt: code that compounds liability not through conscious trade-off but through inattention. What makes the AI case worth a new name is its accumulation rate, one to two orders of magnitude faster than human-authored debt, which makes it the dominant form by volume in any AI-augmented workflow. Call it responsibility debt.
The crucial differences:
| | Tech debt | Responsibility debt |
|---|---|---|
| Visibility | Visible in code (TODO, smells, dup) | Invisible — code clean, tests green, CI passing |
| Cause | Conscious trade-off, usually documented | Unconscious — happens while you supervise |
| Accumulation rate | Bounded by typing speed | Bounded by AI generation speed (1-2 orders of magnitude faster) |
| Surfaces when | Code review, refactor sprint | First production incident |
| Cost when surfaces | Hours to days | You may not be able to debug your own codebase faster than rewriting it |
That responsibility debt is invisible until the first incident is what makes it the more dangerous of the two. Tech debt at least nags you — every time you open the file you see the comment, the smell, the temporary fix that lasted a year. Responsibility debt does not nag. The code looks fine to you because you never read it carefully in the first place. You only discover the gap when something breaks in a way you cannot debug from cold.
There is a deeper reason this happens. Kahneman popularized it as cognitive ease (Thinking, Fast and Slow, p67, p80, p89); the lineage runs back through Schwarz and Reber's earlier work on processing fluency. System 1 — fast, automatic, pattern-matching — accepts inputs that fit the expected shape without engaging System 2's slower, deliberate scrutiny. AI-generated code is engineered to look reasonable. It compiles. It follows your house style. The tests pass. Every signal System 1 uses to decide whether to invoke System 2 says: fine, looks fine.
(Kahneman himself, in Noise (2021), called the System-1 / System-2 framing a "simplification convenient for exposition." The dual-system architecture isn't literal neurophysiology. But the underlying point — that fast automatic processes accept inputs the slow deliberate system would interrogate — survives the simplification.)
The faster the output flow, the more System-1 acceptance happens by default. Vibecoding's velocity is, structurally, the velocity at which you stop reviewing carefully because nothing flagged itself for review.
This is also why "more discipline" is not the fix. The endocrine bill above and the decision-quality bill compounding here are not personal failings — they are structural consequences. Discipline is a System-2 resource, finite by design; you will run out of discipline before you run out of AI output. This is why "just review more carefully" fails: not because you lack will, but because System 2 has a fixed daily budget and AI's daily budget is unbounded.
Tools should pace to humans, not the other way around
Once you accept that the bottleneck is structural and not operational, the response is obvious in direction: slow the AI down to match the operator's responsibility-bandwidth. Not the operator's typing speed. Not their reading speed. The bandwidth at which they can take a decision, weigh alternatives, and earn the right to call the decision theirs.
This is counter-intuitive because we have been trained to optimize for output. Slowing the tool down feels like engineering in reverse. But there is precedent. Lisanne Bainbridge's Ironies of Automation (1983), later popularized by Nicholas Carr in The Shallows and The Glass Cage, named the automation paradox: automated systems that exceed the operator's monitoring bandwidth do not free the operator. They shift her role from agent to observer, and observers cannot intervene fast enough when the automation fails. Aviation discovered this in the 1980s, named it, and built training regimes around it. Software has not yet caught up.
The frame from the neuroscience of action gives the same prescription in different vocabulary. On the predictive-processing view — still contested, though increasingly influential — Andy Clark (Surfing Uncertainty, p272-274) describes how tools we use well become scaffolding for prediction: the brain delegates to the tool because the tool's behavior is predictable enough to fold into the brain's own model of action. When the tool produces outputs the brain hasn't predicted, every output becomes prediction-error to process — expensive in attentional cost and in cortisol. Anil Seth (Being You, p116, p211) generalizes: on this view, the brain is better understood as a prediction machine than as a passive data processor. Seth himself is careful to note this is "a theory" — at one point he calls Friston's Free Energy Principle "mathematical philosophy rather than a specific theory that can be evaluated by hypothesis testing." Take it as influential, not settled.
Annie Murphy Paul gives the same conclusion from outside neuroscience (The Extended Mind, p223): "in using a tool, our 'body schema' — our sense of the body's shape, size, and position — rapidly expands to encompass it." Tools extend the mind, but only when the mind can fold them into its own action loops. A tool that overruns the host system is not extension. It is replacement.
Concrete mechanics that pace AI to the human:
- Bounded parallelism. N = honest working-memory capacity. For most engineers, N ≤ 2.
- Forced consolidation passes. Between waves of AI work, the operator stops, reads what shipped, and writes a short summary in their own words. The act of writing is the test of understanding.
- Devil's advocate / adversarial review of every load-bearing decision. The AI proposes; a separate AI process (or a human) attacks. The operator decides the resolution.
- Salience tagging. Not all decisions are equal. High-blast-radius decisions get slow review. Low-blast-radius decisions get the fast lane. The operator chooses the cutoff.
- Prediction-error checks. Before approving a diff, the operator predicts what the diff will look like. If actual diverges substantially from predicted, that is a slowdown signal — something is happening you do not understand.
- Sunk-cost circuit breakers. After K non-converging iterations on the same problem (K is a separate threshold from the parallelism cap N), the system pauses for human judgment instead of trying again. The AI does not know to stop on its own.
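Two of the mechanics above, salience tagging and prediction-error checks, can be sketched in a few lines. This is an illustration under assumptions, not a prescribed implementation: the `Diff` type, the choice of Jaccard distance over file sets as the "surprise" measure, and the 0.5 threshold are all hypothetical stand-ins for whatever the operator actually predicts and measures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diff:
    files: frozenset  # paths the diff actually touches
    salience: str     # "high", "medium" or "low", assigned by the operator


def prediction_error(predicted_files, actual_files) -> float:
    """Jaccard distance between file sets: 0.0 is an exact match, 1.0 is no overlap."""
    predicted, actual = set(predicted_files), set(actual_files)
    union = predicted | actual
    if not union:
        return 0.0
    return 1.0 - len(predicted & actual) / len(union)


def review_lane(diff: Diff, predicted_files, surprise_threshold: float = 0.5) -> str:
    """Route a diff to the slow or fast review lane."""
    # High-blast-radius decisions always get slow review.
    if diff.salience == "high":
        return "slow"
    # A diff that diverges from the operator's prediction is a slowdown signal.
    if prediction_error(predicted_files, diff.files) > surprise_threshold:
        return "slow"
    return "fast"


if __name__ == "__main__":
    predicted = {"orders/pipeline.py", "orders/models.py"}
    surprising = Diff(
        frozenset(predicted | {"admin/api.py", "fixtures/loader.py", "core/errors.py"}),
        "low",
    )
    print(review_lane(surprising, predicted))
```

The file paths in the usage lines are invented. Predicting touched files is the cheapest possible prediction; predicting the shape of the change is closer to what the bullet describes, but harder to score mechanically.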
(Your N is not the average N. Conscientious readers should resist the urge to drive N to 1 from guilt — over-correction has its own costs. Readers with ADHD may find different ergonomics work; the multitasking story is qualitatively different, not strictly worse. If you're 40+, your honest N is probably lower than you remember at 25. The number is yours to calibrate, not mine to prescribe.)
If you take only one thing from this article, take this: tomorrow, cap parallel Claude sessions at N=2 and write a 3-sentence summary in your own words after each. The act of writing is the test of understanding. This single change closes most of the gap without installing anything new.
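For readers who prefer the rule written down as a gate, here is a minimal sketch. Everything in it is an assumption made for illustration: `SessionGate` is a hypothetical helper, not part of Claude Code, and the sentence counter is deliberately crude. It refuses to start a third session and refuses to mark a session finished until a summary of at least three sentences exists.

```python
import re


def count_sentences(text: str) -> int:
    """Crude count: non-empty pieces between '.', '!' or '?' terminators."""
    pieces = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return sum(1 for piece in pieces if piece.strip())


class SessionGate:
    """Caps parallel sessions and demands a written summary before release."""

    def __init__(self, max_parallel: int = 2, min_sentences: int = 3):
        self.max_parallel = max_parallel
        self.min_sentences = min_sentences
        self.open_sessions = set()
        self.summaries = {}

    def start(self, name: str) -> bool:
        """Return True if the session may run, False if the cap is reached."""
        if name in self.open_sessions:
            return True
        if len(self.open_sessions) >= self.max_parallel:
            return False
        self.open_sessions.add(name)
        return True

    def finish(self, name: str, summary: str) -> bool:
        """Release the slot only if the summary is long enough."""
        if name not in self.open_sessions:
            raise KeyError(name)
        if count_sentences(summary) < self.min_sentences:
            return False  # the session stays open until the summary is written
        self.open_sessions.remove(name)
        self.summaries[name] = summary
        return True
```

A sentence count cannot tell whether the summary reflects understanding. It only makes skipping the step a visible choice.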
Objections
Four common pushbacks readers will have, and where I land on each.
"I've supervised many AI sessions and survived just fine." Granted. The endocrine bill in the asymmetry section is below conscious threshold for short bursts; you feel it after months, not weeks. The diagnostic isn't "do you feel tired?" — it's "can you defend each of last week's commits in a cold code review?" If yes, you're outside the failure mode. If no, the bill is accruing whether you feel it or not.
"More discipline IS the answer." Discipline is real and helps. But it is a finite System-2 resource that does not scale 2-3x with AI output. Structural relief (bounded parallelism, forced consolidation, prediction-error checks) paces the inflow to the bottleneck instead of asking discipline to absorb the overflow.
"This is just personal-productivity advice in neuroscience drag." Partially fair. The neuroscience does load-bearing work in two places: explaining why the bottleneck is single-threaded (global workspace, working-memory capacity) and why discipline is finite (System-2 budget). Strip those out and yes, the prescriptions reduce to "do fewer things at once and write down what you understood." That's not nothing, but it's not new either. The contribution is identifying why the prescriptions are now structurally necessary in a way they weren't before AI.
"You're criticizing the tool because YOU can't use it." Possible read. My prescription is bounded parallelism, not abandonment — I use Claude Code daily, including to draft this article. The criticism is of unbounded vibecoding-as-promoted, not of AI coding generally. If you read this as anti-AI, re-read the prescription.
Case study — a multi-agent pipeline as architectural slowdown
The obvious objection to this section is that it is a product pitch. Here is why I am including it anyway: every mechanic in the prescription above is general — slow down, consolidate, devil's-advocate, prediction-error check — and none of them are tooled in the AI-coding stack today. I built one possible tool to embody them; you should build (or use) another. The point of this section is not the tool. The point is what the mechanics look like once you stop hand-waving and write them down.
The repository is at github.com/codefather-labs/claude-code-sdlc. It is an open-source pipeline of 22 specialist agents wrapping Claude Code, designed not to make AI smarter (Claude is already smart enough) but to slow it down to match my responsibility-bandwidth. The mechanics above are implemented as the seven protocols below, each loosely modelled on a known neural function (read these as analogies, not 1:1 mappings: every brain region listed below does many other things in addition to the function I'm naming):
- Anterior cingulate cortex is implicated in post-error slowing — humans who err become measurably more careful on the next decision. The pipeline injects "deliberate mode" directives on the iteration after any failure: smaller diff target, mandatory pre-flight typecheck, no adjacent refactors.
- Orbitofrontal and ventromedial prefrontal cortex update the value of an ongoing strategy as evidence accumulates — humans (when those circuits work) disengage from losing trajectories instead of pouring more effort in. The pipeline fires a circuit breaker after three consecutive non-converging iterations touching the same files. No more attempts. Human judgment required.
- Hippocampal sleep-replay consolidates the day's memory by re-playing it at speed during sleep. The pipeline runs a "consolidator" agent between work waves doing the same: re-reading what each agent produced, surfacing cross-artifact drift before it compounds.
- Anterior insula and dorsal anterior cingulate together form the salience network, gating which signals get amplified into central-executive processing and which stay below threshold. The pipeline tags every fact and decision with high/medium/low salience so downstream reviewers spend their finite attention on what matters.
- Default Mode Network engages during rest, mind-wandering, and self-referential thought, surfacing connections the task-positive network is too narrowly focused to notice. The pipeline has a "reflection" agent that runs on demand with no specific task — just to walk the project state and surface what nobody asked about.
- Predictive coding (the Clark / Seth / Friston frame): every plan slice carries a predicted outcome; a verifier agent checks predicted-vs-actual delta and flags large deltas as plan-implementation drift.
- Confirmation-bias debiasing — a "red-team" agent argues against every plan before implementation begins. Plans that survive adversarial review proceed. Plans that don't get sent back to the planner.
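The first two protocols in the list, post-error slowing and the circuit breaker, reduce to a small state machine. The sketch below is not taken from the repository; `IterationGuard`, its return values, and the reading of "touching the same files" as "sharing at least one file with the previous failed attempt" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class IterationGuard:
    max_stalls: int = 3                  # consecutive failures before halting
    stalls: int = 0
    last_files: frozenset = frozenset()

    def record(self, files, converged: bool) -> str:
        """Return 'continue', 'deliberate', or 'halt' for the next iteration."""
        files = frozenset(files)
        if converged:
            self.stalls = 0
            self.last_files = files
            return "continue"
        # Assumption: a stall streak continues only if this failed attempt
        # overlaps the files of the previous failed attempt.
        if self.stalls > 0 and files & self.last_files:
            self.stalls += 1
        else:
            self.stalls = 1
        self.last_files = files
        if self.stalls >= self.max_stalls:
            return "halt"        # circuit breaker: human judgment required
        return "deliberate"      # post-error slowing: smaller diff, typecheck first
```

In a setup like this, the caller would translate "deliberate" into the directives the first bullet describes (smaller diff target, pre-flight typecheck, no adjacent refactors) and treat "halt" as a hard stop, not a retry hint.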
I am not pitching this as best practice. It is one operator's pragmatic answer to a problem that has many possible answers, and you should build your own. The point worth taking is not the pipeline. The point is the direction: in 2026, the engineering question is no longer "how do I get AI to ship more?" That problem is solved. The engineering question is "how do I keep ownership of what AI ships on my behalf?"
I built the first version of this pipeline on a Sunday and used it the following Tuesday. The morning after, when I sat down to review what three Claude sessions had shipped overnight, the difference was qualitative: I could read each diff in the context of the predicted outcome the planner had written and the consolidator's drift summary, and I could defend every decision. The pipeline's only job is to keep me in that mode for longer than I can do unaided.
Real productivity gain from AI = output × your ability to defend each commit. The formula is not formal (call it a slogan with operators in it), but at a 2x output multiplier, if you cannot defend more than half of what shipped on your behalf, your effective multiplier is less than 1. You have not accelerated. You have accumulated debt.
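Taken literally, the slogan is a single multiplication. A minimal sketch, where `effective_multiplier` and both of its inputs are hypothetical quantities you would have to estimate for yourself:

```python
def effective_multiplier(output_multiplier: float, defensible_fraction: float) -> float:
    """Output speed-up scaled by the share of shipped work you can defend."""
    if not 0.0 <= defensible_fraction <= 1.0:
        raise ValueError("defensible_fraction must be between 0 and 1")
    return output_multiplier * defensible_fraction


def break_even_fraction(output_multiplier: float) -> float:
    """Share of commits you must be able to defend just to stay at 1x."""
    if output_multiplier <= 0:
        raise ValueError("output_multiplier must be positive")
    return 1.0 / output_multiplier


if __name__ == "__main__":
    # Doubling output while defending 40% of it nets out below 1x.
    print(effective_multiplier(2.0, 0.4))
    print(break_even_fraction(2.0))
```

The break-even point depends on the multiplier you assume: at 2x it is one half, at 3x one third. Neither input is directly measurable, which is why this stays a slogan.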
Vlad Benkovskyi · codefather.dev · 2026