Newsletter

Fences, Agents, and the Edge: Guardrails for AI-Built Software

April 21, 2026

This week's through-line: the gap between what AI coding agents can do and what you should actually let them do — and the operational discipline that sits underneath either answer. We start with a 118-year-old rule about fences, walk through what Kimi K2.6 and agentic retrieval are actually delivering (and costing), and end with the unglamorous Cloudflare Workers config details that decide whether your agent-written code survives contact with production.

Chesterton's Fence Was Written for AI Refactors

The core rule — "Do not remove a fence until you know why it was put up" — was coined in 1929, but Farnam Street's exposition (September 2022) reads like a warning label for agentic coding tools. The essay quotes Will and Ariel Durant: of every 100 new ideas, 99 are probably inferior to the traditional responses they replace, because no single mind can in one lifetime outweigh centuries of accumulated practice. Swap "centuries" for "eight years of production bug fixes" and you have a description of most legacy codebases.

Chesterton's lamp-post parable — in which reformers tear down a shared light because no one can articulate its purpose, then fight a "war in the night" once it's gone — maps almost too neatly onto a confident AI agent stripping "unnecessary" null checks, retry loops, or seemingly redundant abstractions. Those constructs typically encode a bug report, a regulatory constraint, or a subtle concurrency edge case that will re-emerge the moment they disappear. The agent, by design, has no institutional memory of any of it.

The framing gets sharper when you pair it with Steve McConnell's claim that managing complexity is software's primary technical imperative. Dijkstra's observation that no human skull is big enough to hold a modern program means the job of design is to reduce how much of the program you have to think about at once — via decomposition into independent subsystems. Agentic coding pushes the opposite direction: it tends toward Hoare's second option, code "so complicated there are no obvious deficiencies," because the agent can hold (briefly) more context than a human and has no incentive to collapse complexity. Additional info: Code Complete, 5.2 Key Design Concepts (date unavailable).

The practical takeaway isn't "don't use agents." It's that the guardrails — coding standards, architectural invariants, the list of things the agent is not allowed to touch without human sign-off — now have to be written down explicitly, because the fastest contributor on the team no longer reads the commit history.

Kimi K2.6, Agentic Retrieval, and Where the Dollars Actually Go

Moonshot's new open-source model, Kimi K2.6, is being pitched explicitly at long-horizon agentic coding — the regime where most tools break. The headline demos are striking: a 12-hour, 4,000+ tool-call run that ported Qwen3.5-0.8B inference to Zig and landed at ~193 tok/s (20% faster than LM Studio), and a 13-hour autonomous refactor of an 8-year-old financial matching engine that modified 4,000+ lines and pushed throughput up 133–185%. Enterprise partners (Augment, OpenCode, Ollama) report +50% on Next.js benchmarks and 96.6% tool-call success. Note the caveat: the wins are greenfield ports and performance work, not stewardship of constraint-heavy legacy systems where a "successful" refactor can still silently violate an invariant. Additional info: Kimi K2.6 launch post (date unavailable).

The structural reason these runs work at all is a move away from classical RAG. Augment Code's walkthrough of agentic retrieval (August 2025) argues traditional one-shot RAG has four fatal gaps for real codebases: one-shot search, no relational understanding, no completeness detection, and no intra-conversation learning. Their replacement is an Observation–Reasoning–Action loop with specialized agents — planner, researcher, synthesizer, evaluator — decomposing a query like "Node.js connection pooling in microservices with monitoring and graceful degradation" across parallel domain specialists, with an evaluator agent deciding whether to ship the answer or iterate. That planning-and-evaluation layer is, in effect, the Chesterton's Fence of retrieval: it exists to surface why the code is the way it is before the agent touches it.

But the bill arrives in tokens. A March 2026 cost breakdown puts it starkly: Andreessen Horowitz data shows 40%+ of enterprise AI pilots cost more than double their original estimates, and the culprit is almost never raw model pricing — it's architecture. Agent workflows multiply tokens 4–7x over single-shot prompts via long system preambles (500–2,000 tokens baseline), tool-call loops, RAG retrieval per query, and retries that re-run entire requests. A support agent at 10k tickets/day works out to roughly 20M tokens/day, or $12k–$24k/month on GPT-4 Turbo for a single workflow. Claude 3 Opus at $0.015/$0.075 per 1K tokens gets punishing fast on output-heavy jobs.

The sleeper line item: OpenAI's Assistants API charges roughly $0.20/GB/day for thread and file storage — about $1,000/day at 500 concurrent agents holding 10 MB each. The economic rule that falls out of the piece is simple. GPT-4-class models pay for themselves in legal, finance, and health workloads where a single wrong answer costs more than a million right ones. Everywhere else — high volume, repetitive tool-calling, strong desire for on-prem control — open-source models like K2.6, Llama 3, or Mistral start looking less like a compromise and more like the default.

Cloudflare Workers: The Config Details That Actually Bite

Start in the browser: the Workers Playground (March 2026) is a zero-auth sandbox running the same editor as the authenticated dashboard, with a live HTTP tab for raw POST testing, a console-log viewer, and non-expiring "Copy Link" permalinks that open the exact code and preview for any recipient — useful for repro cases and code review. One gotcha: the log viewer currently can't stringify class instances like request.url directly. When you outgrow it, wrangler init --from-dash clones a deployed Worker into a local project.

In production, the Wrangler config file is the source of truth, not the dashboard. The configuration reference (April 2026) documents required top-level keys (name, main, compatibility_date) and a beta auto-provisioning feature that will create KV, R2, and D1 resources on first deploy if bindings omit IDs, then write the IDs back into your config. Named environments under [env.<name>] deploy as separate Workers — most keys inherit, but bindings (vars, kv_namespaces) must be redeclared per environment, and as the best-practices guide (April 2026) warns, the unsuffixed root Worker is a separate deployment — always pass --env or risk accidentally publishing it.

Routing is the single most common source of production confusion. Custom Domains (January 2026) attach a Worker as the origin for an entire hostname, with Cloudflare managing DNS and certificates; use them when the Worker is the origin. Routes are for when a Worker sits in front of an existing origin server, and they require a proxied DNS record — miss that and you get ERR_NAME_NOT_RESOLVED. The documented workaround when there's no real origin is a proxied AAAA record pointing at 100::, but the modern recommendation is just to use Custom Domains. Another subtle win: same-zone Worker-to-Worker fetch() across Routes needs a service binding, but a fetch to a sibling Worker on a Custom Domain works directly.

A few production hygiene items that catch even experienced teams:

Compatibility dates: the docs (March 2026) note that a Worker uploaded via API without an explicit date defaults to 2021-11-02 — before any flags took effect. Always set it to today's date on new Workers, and pair with nodejs_compat if you need node:crypto or node:buffer.
Types: never hand-write the Env interface. Run wrangler types so bindings fail at compile time, and rerun on every binding change.
Secrets: use wrangler secret put (or a gitignored .dev.vars/.env locally); the secrets reference (April 2026) also documents wrangler deploy --secrets-file .env.production for uploading secrets alongside code in CI — secrets not in the file are preserved from the previous version.
The 128 MB memory ceiling: streaming request/response bodies is mandatory on large payloads; enforce a max body size before calling request.arrayBuffer() or request.text(), which buffer the whole thing.

Fences, Agents, and the Edge: Guardrails for AI-Built Software

Chesterton's Fence Was Written for AI Refactors

Further Reading

Kimi K2.6, Agentic Retrieval, and Where the Dollars Actually Go

Further Reading

Cloudflare Workers: The Config Details That Actually Bite

Further Reading