STRATEGY · 9 MIN READ · JUN 22, 2026

Your AI Agent Is Only 10% Model. The Other 90% Is Why It Fails.

Most operators shop for a better model. The model is roughly 10% of what makes an agent work. The harness is the other 90%. Here's what that looks like in plain English.

Carlynn Espinoza
AI MARKETING STRATEGIST

Every vendor pitching you an AI agent right now is selling you the same thing: a shinier model. GPT-4o. Claude 3.5. Gemini 1.5 Pro. Pick the right brain, the pitch goes, and the agent works. That framing is wrong, and it's why your last three pilots stalled.

The framework Google and Kaggle put on stage in their AI Agents course is blunt about it: an agent is roughly 10% model and 90% harness. The model is the reasoning engine. The harness is everything wrapped around it: the instructions, the knowledge it can reach, the memory it carries, the examples it copies, the tools it can use, and the guardrails that keep it from doing the wrong thing fast. Most operators are shopping hard for the 10% and ignoring the 90%.

The full conference talk is below, worth thirty minutes if you want the engineering detail. What follows is the operator translation: what the harness actually is, why context rot kills agents in production, and how to stop measuring the lumber instead of the sawmill.

The full talk this post is built on — the agent = model + harness framework, in the speakers' own words.
(01)

The equation vendors skip

Here is the equation: agent = model + harness. The model generates text. The harness decides what text to generate, with what information, within what limits, using what tools. Swap the model from GPT-4o to Claude and the agent behaves roughly the same if the harness is good. Leave the harness weak and no model upgrade fixes it.
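For readers who think in code, the equation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any vendor builds an agent: the `Harness` and `Agent` classes, the two stub models, and the placeholder service-area text are all hypothetical, and the harness here covers only instructions and knowledge.

```python
from dataclasses import dataclass, field
from typing import Callable

# In this sketch a "model" is just a function from prompt text to reply text.
Model = Callable[[str], str]


@dataclass
class Harness:
    instructions: str
    knowledge: dict[str, str] = field(default_factory=dict)

    def build_prompt(self, request: str) -> str:
        # The harness decides what the model sees on every run.
        facts = "\n".join(f"- {k}: {v}" for k, v in self.knowledge.items())
        return f"{self.instructions}\n\nKnown facts:\n{facts}\n\nRequest: {request}"


@dataclass
class Agent:
    model: Model
    harness: Harness

    def run(self, request: str) -> str:
        return self.model(self.harness.build_prompt(request))


# Two stand-in models (hypothetical stubs, not real vendor APIs).
def model_a(prompt: str) -> str:
    return f"[model-a] saw {len(prompt)} chars"


def model_b(prompt: str) -> str:
    return f"[model-b] saw {len(prompt)} chars"


harness = Harness(
    instructions="You schedule appointments inside the service area only.",
    knowledge={"service_area": "placeholder list of ZIP codes"},
)

# Swapping the model changes one argument; the harness is untouched.
agent_a = Agent(model=model_a, harness=harness)
agent_b = Agent(model=model_b, harness=harness)
```

Both agents receive the identical prompt because the prompt comes from the harness, not the model. That is the part of the system you own regardless of which provider sits behind it.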

Think of it like a franchise kitchen. The recipe (the model) is nearly identical at every McDonald's location. What separates a smooth location from a chaotic one is the system: the training, the station layout, the quality checks, the escalation path when something goes wrong. Two locations, same recipe, wildly different outcomes. The recipe is the 10%. The system is the 90%.

This matters because the vendor conversation almost always starts with model selection. Which platform, which API, which provider. That is the wrong first question. The right first question is: what is the harness, and who builds it? Most operators never hear that question from the vendor selling them the agent.

(02)

Six things that keep it on the rails

The harness has six components. Each one is a place an agent can fail quietly. Here they are in plain language.

Instructions

Instructions are the spec the agent wakes up with on every run: its persona, its scope, its tone, what it is and is not allowed to do. A home services company that deploys a scheduling agent without tight instructions ends up with an agent that cheerfully books appointments outside the service area, quotes jobs it shouldn't quote, and apologizes in a voice that sounds nothing like the brand. Instructions are not a system prompt you write once and forget. They are a living document the team maintains.

Knowledge

Knowledge is the durable reference material the agent can reach. Your service menu. Your pricing. Your brand voice guide. Your standard scripts. A content agent with no knowledge base hallucinates offer details. A patient-intake agent at a multi-location clinic that can't reach the current fee schedule will quote fees from six months ago. Knowledge is not stuffed into the prompt; it lives in a retrieval layer the agent pulls from on demand.

Memory

Memory is what the agent carries across runs versus what it relearns fresh each time. Without memory, every run is a cold start. The agent has no record of what it tried last week, what the client approved, what copy performed, what tone got flagged. A review-response agent with no memory writes the same generic reply to the same recurring complaint for six months straight. Memory stops that loop.
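A rough sketch of the review-response case shows the difference memory makes. Everything here is hypothetical: the `RunMemory` class, the `respond` function, and the reply wording are invented for illustration, and a real system would persist memory to a database or file rather than hold it in a Python object.

```python
class RunMemory:
    """Keeps prior replies per complaint topic (in memory only, for the sketch)."""

    def __init__(self) -> None:
        self._replies: dict[str, list[str]] = {}

    def prior_replies(self, topic: str) -> list[str]:
        # Return a copy so callers cannot mutate stored history by accident.
        return list(self._replies.get(topic, []))

    def record(self, topic: str, reply: str) -> None:
        self._replies.setdefault(topic, []).append(reply)


def respond(memory: RunMemory, topic: str) -> str:
    prior = memory.prior_replies(topic)
    if prior:
        # The agent knows this is a repeat, so it changes behavior.
        reply = f"escalate: {topic!r} has come up {len(prior)} time(s) before"
    else:
        reply = f"standard reply about {topic}"
    memory.record(topic, reply)
    return reply
```

The first complaint about a topic gets the standard reply. The second one gets flagged as recurring, which is exactly the loop a memoryless agent never breaks.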

Examples

Examples are the reference patterns the agent copies. A few strong outputs, a few documented failures. This is what practitioners call few-shot prompting, and it is often the fastest way to close the gap between what the model produces by default and what your business actually needs. A proposal-writing agent given three real approved proposals and two rejected ones will usually outperform the same agent with a two-page instruction document.
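In practice, few-shot prompting is mostly careful string assembly. The sketch below assumes a hypothetical `build_few_shot_prompt` helper and made-up labels ("Approved example", "Rejected example"); the wording that works best will depend on your model and your material.

```python
def build_few_shot_prompt(
    instructions: str,
    good_examples: list[str],
    bad_examples: list[str],
    task: str,
) -> str:
    """Assemble a prompt that shows the model what to copy and what to avoid."""
    parts = [instructions, ""]
    for i, example in enumerate(good_examples, start=1):
        parts.append(f"Approved example {i}:\n{example}\n")
    for i, example in enumerate(bad_examples, start=1):
        parts.append(f"Rejected example {i} (do not imitate):\n{example}\n")
    # The actual request goes last, after the model has seen the patterns.
    parts.append(f"Now write: {task}")
    return "\n".join(parts)
```

Calling it with three approved proposals and two rejected ones produces a single prompt with five labeled references followed by the new task.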

Tools

Tools are the specific verbs the agent can execute. Send an email. Log a row in a spreadsheet. Pull a report from GA4. Post to Google Business Profile. Update a CRM record. The model can reason about anything; it can only act on what the harness has wired up. An agent with no tools is an agent that gives you a plan and then stops. Tools are where reasoning becomes work.
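One common way to wire this up is a registry: a lookup table of the verbs the agent is allowed to execute. The sketch below is an assumption-laden illustration. The `tool` decorator, the `execute` function, and both example tools are hypothetical, and the tools only return strings instead of calling a real email or spreadsheet API.

```python
from typing import Callable

# Registry of every verb the harness has wired up.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Decorator that registers a function as a callable tool."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("send_email")
def send_email(to: str, subject: str) -> str:
    # Stand-in for a real email integration.
    return f"queued email to {to}: {subject}"


@tool("log_row")
def log_row(sheet: str, values: list) -> str:
    # Stand-in for a real spreadsheet integration.
    return f"logged {len(values)} values to {sheet}"


def execute(action: dict) -> str:
    """Run a model-proposed action only if its tool is registered."""
    name = action["tool"]
    if name not in TOOLS:
        return f"refused: no tool named {name!r} is wired up"
    return TOOLS[name](**action.get("args", {}))
```

The model can propose any action it likes. Only the ones in the registry actually run, which is the sense in which the agent "can only act on what the harness has wired up."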

Guardrails

Guardrails are the interceptors that stop the agent before it does damage. Output filters, confidence thresholds, human-in-the-loop checkpoints, hard blocks on actions above a certain dollar threshold or scope. A lead-nurture agent with no guardrails can send the wrong offer to a prospect who just signed. A budget-pacing agent with no guardrail can pause a campaign that is actually performing fine because the signal looked like an anomaly. Guardrails are not optional. They are the difference between an agent you can trust and one you babysit.
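A guardrail can be as plain as a function that sits between what the agent proposes and what actually executes. The sketch below is hypothetical throughout: the `ProposedAction` fields, the three verdicts, and the default thresholds (500 dollars, 0.8 confidence) are placeholder values chosen for illustration, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    kind: str             # e.g. "send_offer", "pause_campaign"
    dollar_amount: float  # money at stake if the action runs
    confidence: float     # agent's self-reported confidence, 0.0 to 1.0


def check_guardrails(
    action: ProposedAction,
    max_dollars: float = 500.0,
    min_confidence: float = 0.8,
    blocked_kinds: tuple[str, ...] = ("delete_record",),
) -> str:
    """Return 'allow', 'needs_human', or 'block' for a proposed action."""
    if action.kind in blocked_kinds:
        return "block"  # hard block: never runs, no matter what
    if action.dollar_amount > max_dollars:
        return "needs_human"  # above the dollar threshold: human checkpoint
    if action.confidence < min_confidence:
        return "needs_human"  # shaky signal: human checkpoint
    return "allow"
```

Under these placeholder settings, a budget-pacing agent that wants to pause a campaign on a low-confidence anomaly signal gets routed to a person instead of acting on its own.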

The model is the brain. The harness is the job description, the file cabinet, the memory, the tools, and the manager watching over its shoulder.
(03)

Context rot and the backpack fix

Here is the failure mode nobody mentions in the sales demo: the more you pile into an agent's context window, the worse its reasoning gets. Past a certain point, adding information hurts performance. The agent loses focus in the noise. Researchers call this context rot, and it is the quiet reason production agents underperform the demo.

The fix is dynamic context. Instead of loading everything the agent might need into every run, you give it a lightweight menu of skills and it pulls only what it needs for the task in front of it. The Google and Kaggle course uses a useful analogy: a tradesperson's backpack. You don't carry every tool you own to every job. You load for the job, use what you need, and put it back.

Here is a concrete version. A content agent that writes blog posts, checks GA4 performance, and updates GBP listings should not run all three skill sets simultaneously. When it is writing, it loads the brand-voice skill and the editorial knowledge base. When it is pulling GA4 data, it loads the analytics tool and the reporting template. When it is updating GBP, it loads the location data and the posting guardrails. Each run loads what it needs and nothing else. That is what makes the agent reliable at scale rather than impressive in a controlled test.
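The same content agent can be sketched as a menu lookup. The skill names below mirror the example above, but the `SKILL_MENU` structure, the `load_context` function, and the placeholder skill contents are assumptions made for illustration.

```python
# The lightweight menu: which skills each kind of task needs.
SKILL_MENU: dict[str, list[str]] = {
    "write_post": ["brand_voice", "editorial_knowledge_base"],
    "pull_ga4": ["analytics_tool", "reporting_template"],
    "update_gbp": ["location_data", "posting_guardrails"],
}

# The full library. In a real system each entry would be a document,
# a tool definition, or a rule set; here they are placeholders.
SKILL_LIBRARY: dict[str, str] = {
    name: f"<contents of {name}>"
    for names in SKILL_MENU.values()
    for name in names
}


def load_context(task_type: str) -> dict[str, str]:
    """Load only the skills the current task needs, and nothing else."""
    needed = SKILL_MENU.get(task_type)
    if needed is None:
        raise ValueError(f"no skills defined for task type {task_type!r}")
    return {name: SKILL_LIBRARY[name] for name in needed}
```

A GA4 run loads two of the six skills. The other four stay in the backpack, out of the context window, until a task that needs them comes along.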

(04)

Stop measuring the lumber

Most operators measure agent output by volume. Posts published this month. Leads followed up. Emails sent. Reports generated. That is measuring the lumber leaving the sawmill. The real asset is the sawmill.

The shift the Google and Kaggle course pushes hard on is this: the system you build is the output. The blog posts, the reports, the follow-up sequences are artifacts the factory ships. They have value. But the compounding value is the factory itself (the harness, the workflows, the guardrails, the memory) operating reliably without you in every loop.

Operators who build real agent systems stop being floor workers and become plant managers. The job is no longer writing the post or pulling the report. The job is architecture and quality control. Designing the system, setting the standards, and checking the output at the right checkpoints. That is a fundamentally different use of a marketing director's time, and a much more valuable one.

The operators who miss this are the ones who run a pilot, see the agent produce 40 posts in a week, declare success, and then wonder why nothing changed in revenue three months later. Volume is not the metric. Reliable, on-brand, conversion-producing output from a system that runs without constant supervision: that is the metric. The workflow architecture post we published earlier this year covers how the task dependencies have to change to make that happen.

(05)

Conductor vs orchestrator

There are two ways to work with agents. Most operators need to understand both before they get sold the more complicated one.

Conductor mode is one agent, hands-on direction, real-time. You define the goal, the agent works through it, you check in at the right moments. Good for debugging. Good for any workflow where a single thread of reasoning covers the work. A clinic owner running a patient-reactivation sequence through a single agent that drafts, personalizes, and queues messages is in conductor mode. One agent. One clear job. Easier to monitor, easier to fix when something breaks.

Orchestrator mode is multiple agents handing work to each other. Agent A does the research, passes it to Agent B for drafting, which passes it to Agent C for scheduling. This pattern is genuinely useful when each step requires fundamentally different capabilities or context. It is genuinely dangerous when the reason for splitting is 'it felt cleaner' or 'two LLM calls seemed more powerful than one.' Premature orchestration is the distributed systems problem of AI agents. When something breaks, and it will, you are debugging a pipeline instead of a prompt.
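The structural difference between the two modes shows up even in a toy version. In the sketch below the three steps are plain functions standing in for model calls, and the `conductor` and `orchestrator` functions are hypothetical names for the two patterns, not an established API.

```python
from typing import Callable

Step = Callable[[str], str]


# Plain functions standing in for model calls (hypothetical stubs).
def research(topic: str) -> str:
    return f"notes on {topic}"


def draft(notes: str) -> str:
    return f"draft from ({notes})"


def schedule(text: str) -> str:
    return f"scheduled: {text}"


def conductor(topic: str) -> str:
    """One agent, one thread of work, one place to look when it breaks."""
    return schedule(draft(research(topic)))


def orchestrator(topic: str, agents: list[tuple[str, Step]]) -> tuple[str, list[tuple[str, str]]]:
    """Several agents handing work down a pipeline, with a trace of each handoff."""
    payload = topic
    trace: list[tuple[str, str]] = []
    for name, step in agents:
        payload = step(payload)
        trace.append((name, payload))  # every handoff is a possible failure point
    return payload, trace
```

Both produce the same result on the same input. The orchestrator version adds one handoff per agent, and each entry in the trace is another place you will have to inspect when the output is wrong.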

For most service businesses running their first or second real agent workflow, conductor mode is the right answer. Get one agent doing one job reliably. Then, and only then, consider whether another agent needs to hand it something it can't produce itself.

  • Conductor: one agent, one workflow, clear checkpoints. Start here.
  • Orchestrator: multi-agent pipeline. Only when the steps genuinely require different tools or context windows.
  • The wrong move: splitting a single workflow across two agents because it feels more 'agentic.' It is not. It is just harder to debug.
(06)

The build that compounds

Bolt-on AI is the self-checkout kiosk at CVS: looks automated, still requires a person standing there, doesn't change the underlying economics. A real agent harness is the Costco warehouse operation: the system does the work, the humans manage the system, and the output compounds because the infrastructure is sound.

The honest take from the Google and Kaggle course is that building the harness is the hard part. Not technically hard in the 'you need a PhD' sense. Hard in the 'this requires judgment, iteration, and someone who knows your business well enough to write the right instructions and guardrails' sense. That is where most pilots fail. Not the model. The harness.

If you are at the stage where the model is fine but the system keeps falling over, that is exactly what our Build Your Own AI System service addresses. We build the full harness, hand the keys over, and train your team to run it. The point is not dependency; it is a working factory you own.

The operators who will be in a meaningfully different position twelve months from now are the ones who stopped upgrading their model and started building their system. The bolt-on vs operator AI divide is not closing. It is widening. And the gap lives almost entirely in the 90% most vendors never mention.
