A Fractal-Based Approach to Agentic Software Development


First up, I’m going to apologize for what is a very rambling post. I’ve been working on a lot of the ideas here for a number of weeks now, and this is sort of an accumulated brain dump of what I’ve been playing around with and doing (and I saved you the pain of the whole word salad when I realized just how long this had become :-). With that said, let’s get into it.

Most AI-assisted coding tools today work like this: you describe what you want, the model generates code, you fix what’s wrong, and you repeat. This works beautifully for small projects and breaks down predictably as complexity grows. Past a certain size — and “certain” is smaller than you might think — agents start to hallucinate. They rewrite working code. They delete things that mattered. The cost and failure rate of every incremental change rises until the loop becomes useless.

The problem isn’t model capability. It’s context architecture. Agents fail at large systems because they’re handed too much at once and have no mechanism to determine what’s safe to ignore. So I’ve been tinkering with — and slowly working on — a framework that tries to fix this from the other direction: not by making agents smarter, but by giving them a structured environment where the right context is always within reach and the wrong context is provably out of scope.

I’m calling it a fractal framework, and I’ll explain what I mean by that in a moment. The short version: at any zoom level, an agent should be able to do its job by reading only what’s locally in scope, without having to load the rest of the system.

This post walks through what I’ve got half a handle on so far — specifically, the first few stages that turn a vague feature idea into validated wireframes and a coherent set of contracts. The whole system is built end to end, but “built” is doing a lot of work in that sentence; most of it still needs refinement, and some parts I understand better than others. The front end is where I’ve been spending the most time, both because that’s where the human-facing value lives and because it’s the part I currently have the firmest grasp on.

Nothing New Under the Sun

I want to be honest about the lineage. The core idea here isn’t mine. It isn’t even recent.

In October 1964, Douglas McIlroy wrote a three-page internal memo at Bell Labs proposing that programs should connect “like garden hose” — the output of one flowing into the input of the next, in any combination, indefinitely. Each program should do one thing, do it well, accept a text stream as input, produce a text stream as output. The memo circulated. It didn’t immediately ignite anything. McIlroy spent the next nine years bringing it up at every opportunity until, one day in 1973, Ken Thompson said “I’m going to do it” and implemented Unix pipes overnight. The next morning, as McIlroy later remembered it, “we had this orgy of one-liners.”

It’s worth noting that the elegance of those early Unix programs wasn’t purely aesthetic. The hardware they ran on demanded it. The PDP-11 that hosted Unix had a 16-bit address space — 64 kilobytes per program, and that was it. Disks were slow, RAM was scarce, and a process that wanted to do too much would either run out of memory or starve everything else on the machine. “Do one thing well” was a survival strategy as much as a philosophy. Composition through pipes was how you got useful behavior out of a system whose individual components couldn’t afford to be ambitious.

For most of the last fifty years, those constraints have steadily relaxed. Memory is effectively free. Address spaces are vast. A modern application can hold a million times what a 1973 one could and barely notice. The discipline of small-and-composable mostly persisted as good engineering practice, but the necessity of it largely went away. Programs got bigger because they could. Codebases sprawled because nothing was forcing them not to.

What’s interesting about the present moment is that the necessity is back, just in a different form. AI agents don’t run out of RAM; they run out of context. Their reasoning degrades long before their token budget does. The unit of work an agent can hold in working memory and still produce reliable output might be larger than what a human can comfortably reason about, but its ability to hold context over a long conversation is not. If you don’t pay attention to that and to how you spend context, it gets expensive fast. For the first time in fifty years, we genuinely benefit from optimizing for small, focused, composable units, not because the hardware demands it but because the cognition layered on top of the hardware does. It’s the same constraint shape pointing at the same answer, fifty years later.

What pipes seemed to prove is that if you give components clean, narrow interfaces and a uniform way to compose, complexity stops being a function of how big the system is and starts being more about how big the current composition is. You can build sizeable systems out of small, comprehensible parts because at any moment you only have to reason about the parts currently in the pipe.

That looks a lot like the problem AI agents have at scale, which is why I’ve been borrowing from the Unix model. The framework here is essentially McIlroy’s idea applied to a different substrate: contracts connected by dependency edges instead of programs connected by text streams; a network of agents composing modules instead of one shell session composing tools. Whether the analogy holds well enough to deliver the same ergonomic benefits is the open question. I’m hopeful but not certain.

The novelty here, such as it is, is in the substrate — AI-generated and AI-traversed components — and in working out what additional machinery (stability tiers, contracts as first-class artifacts, feedback loops at different cadences) might be needed to keep something like the pipe metaphor working when the pieces are no longer bytes.

Just Another Form of Harness

There’s a useful framing that’s emerged over the past several months in the AI tooling community, and I think it’s worth touching on. The recognized differentiator in coding agents right now isn’t the model — it’s the harness. The scaffolding around the model. The tool definitions, the context policies, the feedback loops, the recovery paths, the way state is preserved between sessions. Birgitta Böckeler’s framing has stuck: Agent = Model + Harness. Anthropic, OpenAI, and Thoughtworks have all published variants of the same observation: frontier model capability has converged within a point or two, and the interesting engineering — the work that actually moves benchmarks and production reliability — happens in the scaffolding.

The evidence is fairly compelling. Same model, different harness, twenty- to thirty-point swings on coding benchmarks. A cheaper model with a better harness beating a flagship model running on its vendor’s stock framework. The people getting the most out of agentic coding are the ones who treat the surrounding system as a first-class engineering artifact.

What I’ve been working on is, in that vocabulary, just another form of harness. The contracts, the stability tiers, the agents-with-roles, the feedback loops at different cadences, the context.md files — all of that is scaffolding around the model. Whether this works or not is an open question, but it’s a question I intend to answer, at least for my own use case.

What “Fractal” Actually Means Here

I’ve been loosely calling this a fractal framework, and I should explain what I mean by that.

But before that, an experience most working developers will recognize:

You join a new team, or get assigned to a legacy project, or open a repository you cloned six months ago and forgot about. The codebase is north of a million lines. There are hundreds of directories nested four, five, six levels deep. The README, if there is one, was last updated three years ago and describes a system that no longer exists. You open src/, see thirty subdirectories, pick one that looks relevant, open it, see twenty more, open one of those, and stare at a file called helpers.ts that imports from twelve places you haven’t seen yet.

What am I looking at?

That question — and the cascade of related questions that arrive with it: why is this here, who depends on it, what happens if I change it, was there a reason it’s structured this way, who would I ask if I could ask anyone — is the universal experience of encountering an unfamiliar codebase. Sometimes you can answer it by reading code. More often you can’t, because the code tells you what it does but not why it exists or how it fits. The why and the how live in people’s heads, in old chat logs, in tickets nobody can find anymore. By the time you’ve reconstructed enough context to make a confident change, you’ve spent two weeks that should have taken twenty minutes.

The context.md files in this framework are a direct answer to that question. Every directory has one. Open the directory, read the file, and you get: what this module is, what it’s responsible for, what its public interface is, what invariants it must maintain, why it exists in this form, what it depends on, what depends on it, and what changed recently (recorded either directly in the file or in the source control revision history). It is not “the code, but in prose” (in English, or whatever spoken language your team works in); it is the answer to what am I looking at. The questions that pop up the moment you land in unfamiliar territory, answered before you have to ask them.

The same answer at every level. The root context.md tells you what the whole system is. A subsystem’s context.md tells you what that subsystem is. A module’s context.md tells you what that module is. Each one stays at its scale: the root doesn’t try to summarize every module; the module doesn’t try to redescribe the system. They compose vertically, the way an outline does, but with each level standing alone as a complete answer at its scale.

That’s part of the felt experience the framework is trying to resolve…

And it’s not just for code written by AI agents. It’s the same problem for code written by anyone who didn’t document their code adequately. And that’s not even a knock on the developers who wrote the code. Almost nobody foresaw the need to decompose existing software in a way that machines could reliably understand. It just wasn’t ever really a consideration. Now it is.

A fractal, in essence, is largely scale invariant. In other words, at any given zoom level it is recognizable as the same thing. (This is a bit of a generalization, but I’m going to stick with it: it’s close enough, and it helps the narrative.)

That property — coherent at every scale, with consistent structure between scales — is what I’m trying to build into the framework. Concretely, it means three things:

1. The same artifacts exist at every level. Whether you’re looking at the whole system, a subsystem, a module, or a sub-module, you find the same kinds of things in context.md: references to contracts describing what’s there, a set of dependencies, a set of invariants, a rationale for why it’s shaped the way it is, and a changelog of how it got that way. The system doesn’t have one set of artifacts at the architecture level and a different set at the code level. It has the same artifacts, repeated at different scales.

2. The same operations work at every level. Wherever you are in the structure, the operations you can perform are the same. You can read the contract. You can examine the dependencies. You can propose an amendment. You can trace why something is the way it is. The Architect, Moderator, Implementer, and review agents all use the same vocabulary whether they’re working at the system level or inside a single function. There’s no separate “architecture review” process and “code review” process — there’s one process, applied recursively.

3. The view at any level is bounded and self-contained. This is the property that matters most for AI agents. When an agent works at zoom level N, it should be able to complete its task by reading what’s at level N — the local contract, the immediate dependencies, the local rationale — without having to load levels above or below it. The contract at each level summarizes what’s underneath, so an agent at zoom 1 doesn’t need to read the zoom-2 details or its dependencies. It reads their contracts. The contract is the API to the next zoom level.

One of the main reasons AI agents fail at large codebases is that they get handed too much context — the whole module, plus its callers, plus its dependencies, plus tests, plus configuration — and have to decide what’s relevant. Fractal structure inverts that: at any zoom level, the relevant context is finite and local. Out-of-scope context is genuinely out of scope, not just inconveniently far away.

The context.md mechanism described above is what enforces this for agents specifically. An agent that opens a directory reads the local context.md first; if that doesn’t provide enough context, the rule is to move up one level and read the parent’s, not to expand the search arbitrarily. The file system is the zoom hierarchy, and the rule against searching wider rather than higher is what keeps an agent’s working context bounded.
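The "higher, not wider" rule can be sketched in a few lines. This is only an illustration, assuming a plain directory tree with one context.md per directory; the function name context_chain and the demo directory names (billing, invoices, shipping) are hypothetical and not part of the framework described here.

```python
import tempfile
from pathlib import Path
from typing import List

CONTEXT_FILE = "context.md"  # file name taken from the post


def context_chain(start: Path, root: Path) -> List[Path]:
    """Return the context.md files from `start` up to `root`, nearest first.

    The walk only ever moves to a parent directory. It never inspects
    siblings or children, which is what keeps the reading list bounded.
    """
    start, root = start.resolve(), root.resolve()
    if start != root and root not in start.parents:
        raise ValueError(f"{start} is not inside {root}")

    chain: List[Path] = []
    current = start
    while True:
        candidate = current / CONTEXT_FILE
        if candidate.is_file():
            chain.append(candidate)
        if current == root:
            break
        current = current.parent
    return chain


# Demo on a throwaway tree: root/billing/invoices plus an unrelated sibling.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    module = root / "billing" / "invoices"
    module.mkdir(parents=True)
    (root / "shipping").mkdir()
    for directory in (root, root / "billing", module, root / "shipping"):
        (directory / CONTEXT_FILE).write_text(f"context for {directory.name}\n")

    chain = context_chain(module, root)
    relative = [p.relative_to(root).as_posix() for p in chain]
```

An agent working in billing/invoices would read the first entry, and move to the next one only if the first is not enough. The sibling shipping directory never appears in the list, however close it sits in the tree.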

For this to work, the contracts have to be strong enough that an agent at one level genuinely doesn’t need to know what’s happening at the level below. If a contract leaks — if there’s behavioral coupling that isn’t captured in the interface, or implicit ordering dependencies between modules, or shared mutable state — then zoom breaks. The agent at level N needs level N+1 context to be safe, and the whole bounded-view property collapses. So a lot of the framework’s machinery is specifically about catching and preventing those leaks: the Encapsulation Auditor checks that imports don’t cross layers, the Coupling Analyst tracks how interconnected modules are becoming, the Drift Detector watches for behavioral coupling that hasn’t made it into contracts.

The honest version of all this is: I think fractal structure is the right model, but I don’t yet have proof. Building strong-enough contracts is hard, and “strong enough” is itself a property I’m still trying to characterize precisely. What I have so far is a structural intent, some machinery to enforce it, and a working hypothesis that the combination will hold up better than the current approach of just handing more context to agents and hoping.

The Core Idea: Contracts as the Unit of Truth

Before describing the workflow, the central design decision: the contract is the unit of truth, not the code.

A contract is a JSON document that describes a module’s observable behavior — its interface, its invariants, its dependencies, and the rationale for why it exists in the form it does. Code is a compiled output of the contract plus implementation choices. Agents read contracts; they only write code when satisfying a contract.

Every invariant in a contract carries a stability tier:

  • Immutable — near-permanent guarantees. Authentication boundaries, data durability, financial correctness. Changing these is a system redesign.
  • Behavioral — can change, but requires explicit ceremony: consumer impact analysis, an amendment proposal, a moderator’s approval.
  • Implementation — change freely. No ceremony.

The intent of the three-tier system is to let the framework be both rigid and flexible — protecting the things that should be protected while letting other things move fast. Whether the boundaries land in the right places is something I expect to learn the hard way.
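To make the shape concrete, here is a rough sketch of a contract with tiered invariants, serialized to JSON. The four top-level parts (interface, invariants, dependencies, rationale) and the three tiers come from the description above; every field name, the example invariants, and the dependency data.customer.read are assumptions made up for illustration.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import List


class Tier(str, Enum):
    IMMUTABLE = "immutable"
    BEHAVIORAL = "behavioral"
    IMPLEMENTATION = "implementation"


# Ceremony required to change an invariant, paraphrased from the tier list.
CEREMONY = {
    Tier.IMMUTABLE: "system redesign",
    Tier.BEHAVIORAL: "consumer impact analysis, amendment proposal, moderator approval",
    Tier.IMPLEMENTATION: "none",
}


@dataclass
class Invariant:
    id: str
    statement: str
    tier: Tier


@dataclass
class Contract:
    name: str
    interface: dict
    invariants: List[Invariant]
    dependencies: List[str]
    rationale: str


# Hypothetical example contract; the invariants are invented.
contract = Contract(
    name="api.invoice.create",
    interface={
        "input": {"customerId": "string", "amountCents": "integer"},
        "output": {"invoiceId": "string"},
    },
    invariants=[
        Invariant("INV-1", "Amounts are stored as integer cents.", Tier.IMMUTABLE),
        Invariant("INV-2", "A new invoice starts in the draft state.", Tier.BEHAVIORAL),
        Invariant("INV-3", "Invoice IDs are generated as UUIDs.", Tier.IMPLEMENTATION),
    ],
    dependencies=["data.customer.read"],
    rationale="Exists to satisfy requirement R-12 (UI screen S-3).",
)

# Tier is a str subclass, so the json module writes its value as a string.
document = json.dumps(asdict(contract), indent=2)
```

Putting the tier on each invariant rather than on the contract as a whole lets one contract mix near-permanent guarantees with details that can change freely.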

With that in place, here’s the workflow.

Phase 1: Project Setup

You create a project and designate a root directory. This is where contracts live, where the changelog accumulates, where wireframes are stored, and where eventually the implementation goes. Everything is files in version control. There’s no separate database to keep in sync — the file system is the source of truth, and git diff is the natural review unit.

Phase 2: Feature Capture

Every feature starts with a description. You type a paragraph or two about what you want. From there, three AI agents work in sequence — each conversational, each focused on one job.

The Scope Elicitor

The first agent’s job is to interview you. It asks questions one at a time, working through the problem, the users, the happy path, the edge cases, and the constraints. It only stops when it can write at least three distinct user stories, each with at least two testable acceptance criteria.

The “one question at a time” rule matters. The temptation when designing this kind of agent is to have it ask everything at once and let the user batch their answer. In practice, that produces vague, partial responses. A focused question gets a focused answer, and focused answers compose into a usable specification.

When you (or the agent) decide enough has been captured, you click Generate Requirements. The agent synthesizes the conversation into a structured JSON document — every requirement gets an ID, a user story, and acceptance criteria.
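The Scope Elicitor's stopping rule is simple enough to write down. The sketch below assumes a requirement record with exactly the three parts mentioned above (ID, user story, acceptance criteria); the class and function names are hypothetical. Whether a criterion is actually testable is a judgment the agent has to make and is not checked here.

```python
from dataclasses import dataclass
from typing import List

MIN_STORIES = 3   # "at least three distinct user stories"
MIN_CRITERIA = 2  # "each with at least two testable acceptance criteria"


@dataclass
class Requirement:
    id: str
    user_story: str
    acceptance_criteria: List[str]


def ready_to_generate(requirements: List[Requirement]) -> bool:
    """True once the interview has produced enough material to stop."""
    # Treat stories that differ only in case or padding as the same story.
    distinct_stories = {r.user_story.strip().lower() for r in requirements}
    enough_stories = len(distinct_stories) >= MIN_STORIES
    enough_criteria = all(
        len(r.acceptance_criteria) >= MIN_CRITERIA for r in requirements
    )
    return enough_stories and enough_criteria


# Hypothetical interview output.
captured = [
    Requirement("R-1", "As a clerk, I can create an invoice.",
                ["Invoice is saved", "Invoice gets an ID"]),
    Requirement("R-2", "As a clerk, I can list invoices.",
                ["List is paginated", "Empty state is shown"]),
]
ready_before = ready_to_generate(captured)

captured.append(
    Requirement("R-3", "As a manager, I can void an invoice.",
                ["Voided invoices are read-only", "Void is audited"])
)
ready_after = ready_to_generate(captured)
```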

The UI Inquisitor

Control passes to the second agent. Same conversational pattern, different focus: screens, flows, states. What does the user see? What happens on submit? What about empty states, loading states, error states? Navigation? Key elements?

Again, the agent stops when there’s enough — at least two screens with their states defined, plus one end-to-end flow. You click Generate UI Brief, and the structured output is appended to the requirements document.

By this point, you have a single JSON file that captures both what the feature does (requirements) and how it presents (UI brief). It’s anchored in your conversation, so it reflects what you actually meant rather than what an agent guessed you meant.
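As a rough picture of that single file, here is one possible shape with the UI brief appended next to the requirements, plus the UI Inquisitor's stopping rule (two screens with states, one flow). The key names (requirements, ui_brief, screens, states, flows) and all of the sample content are assumptions for illustration only.

```python
import json

# Hypothetical combined document: requirements first, UI brief appended.
feature_document = {
    "requirements": [
        {
            "id": "R-12",
            "user_story": "As a clerk, I can create an invoice.",
            "acceptance_criteria": ["Invoice is saved", "Invoice gets an ID"],
        }
    ],
    "ui_brief": {
        "screens": [
            {"id": "S-3", "name": "Create invoice",
             "states": ["empty", "loading", "error", "submitted"]},
            {"id": "S-4", "name": "Invoice list",
             "states": ["empty", "loading", "populated"]},
        ],
        "flows": [
            {"id": "F-1", "steps": ["S-4", "S-3", "S-4"]},
        ],
    },
}


def brief_is_sufficient(ui_brief: dict) -> bool:
    """At least two screens with states defined, plus one end-to-end flow."""
    screens_with_states = [s for s in ui_brief.get("screens", []) if s.get("states")]
    return len(screens_with_states) >= 2 and len(ui_brief.get("flows", [])) >= 1


sufficient = brief_is_sufficient(feature_document["ui_brief"])
serialized = json.dumps(feature_document, indent=2)  # what would land on disk
```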

Phase 3: Autonomous Validation

Two agents now run without conversation.

The Requirements Validator checks the document for consistency, completeness, testability, and ambiguity. It doesn’t block — it produces a report. A “fail” verdict is advisory, signal for the human, not a hard stop. The output is saved for later review by the Architect.

The Routing Analyst does something more interesting. It reads the requirements and the UI brief, examines the existing contract list, and proposes a set of contracts that need to be written or amended. Each proposed contract maps to one or two captured requirements and explicitly references items from the Scope Elicitor or UI Inquisitor’s output.

This mapping means every contract has a traceable origin: requirement R-12 produced contract api.invoice.create, which exists because of UI screen S-3 captured in the brief. If a requirement later changes, the system should know which contracts need re-evaluation. If a contract is later questioned (“why does this exist?”), the answer should be in its rationale field, traceable back to a specific user need.
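One way to picture the traceability is as a reverse index from origins to contracts. This is a sketch under assumptions: the requirements and screens keys on each contract are invented field names, and api.invoice.list, R-13, and S-4 are hypothetical additions next to the R-12 / api.invoice.create / S-3 example from the text.

```python
from collections import defaultdict
from typing import Dict, List

# Each proposed contract records the captured items it came from.
contracts = [
    {"name": "api.invoice.create", "requirements": ["R-12"], "screens": ["S-3"]},
    {"name": "api.invoice.list", "requirements": ["R-12", "R-13"], "screens": ["S-4"]},
]


def contracts_by_origin(contracts: List[dict], key: str) -> Dict[str, List[str]]:
    """Index contract names by the requirement or screen IDs they reference."""
    index: Dict[str, List[str]] = defaultdict(list)
    for contract in contracts:
        for origin in contract.get(key, []):
            index[origin].append(contract["name"])
    return dict(index)


by_requirement = contracts_by_origin(contracts, "requirements")
by_screen = contracts_by_origin(contracts, "screens")

# If R-12 changes, these are the contracts to re-evaluate.
affected = by_requirement["R-12"]
```

The same index answers the question in the other direction: given a screen ID, it lists the contracts that exist because of it.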

Phase 4: Architect Review

The third conversational agent is the Architect. Where the first two agents extracted information from you, the Architect now feeds information into the system. This is where you specify the technical context: target platform, language version, tech stack preferences, deployment constraints. These get attached to the contracts the Routing Analyst proposed.

The Architect also does the harder work of evaluating each proposed contract: is the boundary in the right place? Is the stability tier correct? Are the dependencies declared accurately? When the Architect is satisfied, the contracts are saved and become the authoritative description of what the system will do.

Phase 5: Wireframe Generation and the Gap Loop

An autonomous Wireframe Agent reads the contracts, the requirements, and the UI brief, and produces wireframes for every screen identified.

Then it does a gap analysis.

The Wireframe Agent compares what the wireframes need against what the contracts provide. If a wireframe shows a field that no contract supplies — say, a list view that needs lastModifiedBy but no contract returns it — that’s a gap. Two agents are invoked to handle it:

The Architect Agent drafts a proposed contract or amendment based on what the Wireframe Agent identified.

The Moderator Agent validates that the proposed contract stays within its declared stability tier. The Moderator can reject — it cannot fix. If the Architect’s proposal crosses a tier boundary, it’s escalated rather than silently accepted. This friction is intentional. Bypassable escalations defeat the purpose of stability tiers.
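The "reject, never fix" behavior can be sketched as a pure check. The interpretation here is an assumption: an amendment declares the strictest tier it intends to touch, and the Moderator escalates if any invariant it changes sits in a stricter tier. The names Amendment, Verdict, and moderate are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Higher rank means stricter tier.
TIER_RANK = {"implementation": 0, "behavioral": 1, "immutable": 2}


@dataclass(frozen=True)
class Amendment:
    contract: str
    declared_tier: str
    touched_invariants: Tuple[str, ...]


@dataclass(frozen=True)
class Verdict:
    approved: bool
    reasons: Tuple[str, ...]


def moderate(amendment: Amendment, invariant_tiers: Dict[str, str]) -> Verdict:
    """Approve or escalate. The amendment itself is never modified."""
    limit = TIER_RANK[amendment.declared_tier]
    reasons = []
    for invariant_id in amendment.touched_invariants:
        tier = invariant_tiers.get(invariant_id)
        if tier is None:
            reasons.append(f"{invariant_id}: unknown invariant")
        elif TIER_RANK[tier] > limit:
            reasons.append(
                f"{invariant_id}: {tier} invariant exceeds declared tier "
                f"{amendment.declared_tier}"
            )
    return Verdict(approved=not reasons, reasons=tuple(reasons))


# Hypothetical invariant tiers for one contract.
tiers = {"INV-1": "immutable", "INV-2": "behavioral", "INV-3": "implementation"}

within = moderate(
    Amendment("api.invoice.create", "behavioral", ("INV-2", "INV-3")), tiers
)
crossing = moderate(
    Amendment("api.invoice.create", "behavioral", ("INV-1",)), tiers
)
```

Both record types are frozen, so the check has no way to quietly repair a proposal. The only outputs are a verdict and the reasons for it, which is the friction described above.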

Once amendments are made, the Wireframe Agent runs again. If new gaps emerge, the loop continues. If no new contracts are required, two quality gates evaluate the result:

  • W1 confirms every UI brief screen has a wireframe and every necessary state is covered.
  • W2 confirms every field declared in the wireframes is captured and accounted for in the contracts.

The process iterates until both gates pass and the wireframes and contracts agree.
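The gap loop and the two gates reduce to set comparisons. The following is a simplified sketch: wireframes are held fixed rather than regenerated each round, the amend callback stands in for the Architect and Moderator together, and the max_rounds bound is an assumption added here so the loop cannot spin forever. Function names and sample data are hypothetical, apart from the lastModifiedBy example.

```python
from typing import Callable, Dict, Set, Tuple

Fields = Dict[str, Set[str]]  # screen ID -> names


def find_gaps(wireframe_fields: Fields, contract_fields: Set[str]) -> Fields:
    """Fields a wireframe shows that no contract supplies, per screen."""
    gaps = {}
    for screen, fields in wireframe_fields.items():
        missing = fields - contract_fields
        if missing:
            gaps[screen] = missing
    return gaps


def gate_w1(brief_states: Fields, wireframe_states: Fields) -> bool:
    """Every brief screen has a wireframe and every required state is covered."""
    return all(
        screen in wireframe_states and required <= wireframe_states[screen]
        for screen, required in brief_states.items()
    )


def gate_w2(wireframe_fields: Fields, contract_fields: Set[str]) -> bool:
    """Every field in the wireframes is accounted for in the contracts."""
    return not find_gaps(wireframe_fields, contract_fields)


def run_gap_loop(
    wireframe_fields: Fields,
    contract_fields: Set[str],
    amend: Callable[[Fields], Set[str]],
    max_rounds: int = 5,
) -> Tuple[Set[str], int]:
    """Amend until no gaps remain; return final fields and amendment rounds."""
    fields = set(contract_fields)
    rounds = 0
    while True:
        gaps = find_gaps(wireframe_fields, fields)
        if not gaps:
            return fields, rounds
        if rounds == max_rounds:
            raise RuntimeError("gaps remain after max_rounds; escalate to a human")
        fields |= amend(gaps)
        rounds += 1


wireframes = {"S-4": {"id", "title", "lastModifiedBy"}}
initial_gaps = find_gaps(wireframes, {"id", "title"})

# Stand-in amendment step that accepts every proposed field.
accept_all = lambda gaps: set().union(*gaps.values())
final_fields, rounds_used = run_gap_loop(wireframes, {"id", "title"}, accept_all)
```

In this toy run the list view's lastModifiedBy field shows up as the only gap, one amendment round closes it, and W2 passes afterwards.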

Why This Works

The deeper structural reason this works is that everything in the system has a single source of truth, and every artifact knows where it came from.

Requirements come from a real conversation. Contracts come from requirements. Wireframes come from contracts plus the UI brief. Gaps in wireframes feed back into contract amendments through a controlled loop. At every step, an agent works at one zoom level — it doesn’t need to hold the whole system in its head, because the whole system is structured so that the relevant context is always at the agent’s zoom level or one above.

The stability tiers are meant to keep this from drifting toward entropy over time. Without them, every change is equally cheap and the system has no resistance to bad changes; with them, in theory, the right things resist change and the right things flow easily. Whether that theory survives contact with a real long-running project is the thing I’m trying to find out.

What Comes Next

What I’ve described is the front end — turning an idea into a validated set of contracts and wireframes. Downstream of this, the same pattern continues: API contracts get derived and validated against functional UI mockups, mock APIs get built, the real implementation kicks off through a similar agent ensemble with quality gates at each transition. Four feedback loops at different cadences keep the system honest as it grows.

It’s a lot of structure. The honest tradeoff is that the framework’s value depends on the interaction of many coordinated pieces. Build only some of them, and you get less than the proportional benefit. Build all of them, and you get something that can actually maintain a software system of arbitrary complexity with constant feature changes — which is the original problem I was trying to solve.

One footnote worth raising: by its nature, this approach should lend itself to backporting onto existing codebases, not just greenfield work. The structure is mostly about making implicit things explicit — the contracts the code already obeys (whether documented or not), the dependencies between modules (whether the import graph reflects them or not), the rationale for why things are shaped the way they are (whether anyone can still remember it or not). An agent can scan an existing codebase, propose draft contracts and context.md files based on what it observes, and flag the gaps where a human needs to fill in the why. The fractal structure either holds or it doesn’t — if the codebase has clean boundaries, the contracts will fall out naturally; if it doesn’t, the backport process will (in theory) surface exactly where the boundaries are missing or wrong, which is itself useful information. I haven’t pushed hard on this in practice yet, but the principle feels like it ought to transfer. A team that adopts this on a legacy system would pay the rationale-recovery tax once, and from then on the framework’s machinery would apply the same as it does on a new project.

A related meta-benefit: nothing about this requires a particular team shape. The multi-agent flow can run before human work (producing contracts and scaffolding for humans to implement), after it (formalizing what humans built so the system can maintain it going forward), or alongside it (humans and agents working in parallel on different contracts, meeting at the integration boundaries). A solo developer could use the whole pipeline as autonomous augmentation. A small team could use the loops as scaffolding for their existing process. A large team could let agents handle whole subsystems while humans focus on the architecture-level work — or invert that, letting agents handle the architecture exploration while humans own the implementation. The leverage point is where the team chooses to spend its attention, not what the framework requires. I don’t pretend to have all the levers and knobs worked out yet — figuring out the right configuration interface for “fit this to my team” is itself an open design problem — but the underlying structure is intentionally agnostic about who does which step. That feels right to me. A framework that only works if you reorganize your team to fit it isn’t actually a framework; it’s more like a takeover.

The early signs are promising. The conversational front end seems to produce tighter requirements than what I would write on my own, or than what I have seen AI produce without a guide. The contract-driven gap loop has caught a few missing fields that I think would have become bugs. And because every artifact is traceable and rationale’d, the system can at least begin to answer the question every legacy codebase eventually fails: why does this look the way it does?

I don’t know yet whether this actually works at the scale the framework is designed for. I haven’t run a real long-lived project through it. The pieces I’ve built compose cleanly in the small, but small is the easy case. What I can say is that the lineage seems sound. Pipes worked because composing small, well-bounded parts through clean interfaces worked. That’s encouraging. If it doesn’t work for this (or if I don’t know what I’m doing), I’ll find out.

I think that’s enough for a Friday evening. Talk to you all soon.