Teaching Your AI Agent to Learn From Its Mistakes

Your AI coding agent makes the same mistakes over and over. What if it could learn from corrections, track which skills cause failures, and tell you whether a problem has already been fixed? I built a closed-loop learning system for my coding agent, inspired by a meta-learning paper, and here's how it works.

🤖 Co-authored with AI

Here’s something nobody talks about with AI coding agents: they don’t remember being wrong.

You spend ten minutes correcting your agent’s behavior. It apologizes, adapts, gets it right for the rest of the session. Then the session ends. Tomorrow, same mistake. Same correction. Same apology. You are the only memory in this system, and you’re doing the work for free.

I ran into a paper called MetaClaw that frames this as a solvable problem. The paper proposes a continual meta-learning system: capture failure trajectories, synthesize new behavioral skills, fine-tune the model during idle periods. Production-grade stuff for thousand-user systems with cloud LoRA and gradient-based updates.

I don’t need any of that. I’m one person on a MacBook. But the insight is right: capture failures, attribute them to their causes, close the loop.

So I built the loop.

The system I already had

Before the paper, I’d built the first half. A hook fires after every coding agent session, feeds the conversation transcript to a model, and extracts learnable patterns. Errors and how I fixed them. Tool selection heuristics I discovered. Project conventions the agent kept forgetting.

Proposals pile up in a pending directory. When I have time, I review them. Keep, promote, or discard. The system captures. I curate.

That’s MetaClaw’s “skill-driven fast adaptation” with a human gate on promotion. And that gate matters. Automated skill promotion sounds efficient until your skill library hits 200 entries of auto-generated noise and your agent spends more tokens reading irrelevant instructions than doing work.

The curation is the product.

What was missing: blame

Capture worked. Curation worked. What didn’t work was knowing who to blame.

When I correct the agent and the system logs it, I get: “the agent did something wrong, you fixed it.” But I don’t know why it went wrong. Was it following my deployment skill when it forgot to configure the API key? Or did it just wing it? The fix is completely different. Skill failure means rewrite the skill. No skill involved means maybe write one.

Without attribution, every correction looks the same. That’s like a hospital that tracks patient outcomes but never checks which doctor performed the surgery.

Two-pass attribution

Most coding agents store session transcripts as structured logs (JSONL in my case). Every message in order, with a type, content, and index. Skill invocations show up as tool-use blocks. My corrections show up as user messages with telltale patterns: “no,” “wrong,” “actually,” “instead.”

Pass one is mechanical. Walk the transcript. Record every skill invocation and every correction with their message indices. For each correction, find the most recent skill invoked before it.
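The mechanical pass is a single walk over the transcript. A minimal sketch, assuming skill invocations show up as tool-use entries carrying a skill name and corrections as user messages opening with the telltale words (both message shapes are assumptions about the log format, not a fixed schema):

```python
import re

# Telltale openers that mark a user message as a correction.
CORRECTION_RE = re.compile(r"^\s*(no\b|wrong\b|actually\b|instead\b)", re.IGNORECASE)


def mechanical_pass(messages):
    """Return (index, correction_text, blamed_skill_or_None) tuples,
    blaming each correction on the most recent skill invoked before it."""
    last_skill = None
    attributions = []
    for i, msg in enumerate(messages):
        if msg.get("type") == "tool_use" and "skill" in msg:
            last_skill = msg["skill"]
        elif msg.get("type") == "user" and CORRECTION_RE.match(msg.get("content", "")):
            attributions.append((i, msg["content"], last_skill))
    return attributions
```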

Simple. Also wrong, surprisingly often.

In one session, I invoked a plugin sync skill at message 1318, then did 134 messages of unrelated work. When I corrected the agent’s confidence scoring at message 1452, the mechanical pass blamed the sync skill. 134 messages of complete topic change, and it didn’t notice.

Pass two is contextual. A stronger model reads the actual conversation around each correction and asks one question: was the agent following this skill’s guidance when it made the mistake? If the topic drifted, override to “none.” If the skill was actively guiding behavior, keep it.

The mechanical pass gives you speed. The contextual pass gives you accuracy. Together, they give you blame you can act on.
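The second pass wraps the first. In this sketch, `judge` stands in for the call to the stronger model; it is any callable that answers yes/no given the skill name and the conversation around the correction (an assumption about the interface, not a real API):

```python
def contextual_pass(messages, attributions, judge, window=5):
    """Keep or override each mechanical attribution by asking whether the
    skill was actively guiding behavior at the point of correction."""
    confirmed = []
    for idx, text, skill in attributions:
        if skill is None:
            confirmed.append((idx, text, None))
            continue
        # Hand the judge only the conversation around the correction.
        context = messages[max(0, idx - window): idx + window + 1]
        still_guiding = judge(skill, context)
        confirmed.append((idx, text, skill if still_guiding else None))
    return confirmed
```

Making the judge injectable also means the expensive model call is trivially swappable or mockable when testing the plumbing.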

See the signal, not the noise

The first version fed the entire conversation to the evaluating model. Long sessions hit 6000+ messages. Truncating to the first 500 messages meant hoping the interesting parts fell in that window.

They never did. Skills invoked at message 1200 and corrections at message 2000 were invisible.

So I stopped truncating and started targeting. Five messages before and after each skill invocation. Five messages before and after each correction. First 50 and last 50 conversational messages for context. A 6000-message session with 5 skills and 2 corrections sends 170 focused messages. Under three percent of the data, all of the signal.
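The selection itself is just set arithmetic over message indices. One way to carve the windows (the exact slice boundaries here are my assumption, chosen so the counts match the worked example: ten messages around each event plus head and tail):

```python
def focused_indices(n_messages, skill_idxs, correction_idxs,
                    window=5, head=50, tail=50):
    """Indices of the messages worth sending to the evaluating model:
    head and tail of the session plus a window around every event."""
    keep = set(range(min(head, n_messages)))
    keep |= set(range(max(0, n_messages - tail), n_messages))
    for i in list(skill_idxs) + list(correction_idxs):
        # Ten messages surrounding each skill invocation or correction.
        keep |= set(range(max(0, i - window), min(n_messages, i + window)))
    return sorted(keep)
```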

The timestamp trap

I’m often iterating on skills in the same session where failures happen. The hook fires after the session ends, captures the correction, and flags it. But I’ve already fixed the skill.

The obvious check: was the skill file modified after the correction? In one case, a deploy skill was modified the day after a correction, and the system said “possibly resolved.” The commit was about cluster-admin detection. The API key issue? Still wide open.

Timestamps tell you when a file changed. They don’t tell you what changed.

So the system reads the current skill file and compares it against the specific correction. My correction said “configure the API key via the internal auth store.” The skill has API key setup via environment variables at line 144 and a secret mount at line 277. But nothing about OpenClaw’s internal auth-profiles.json path. Verdict: partially resolved. Here’s what’s covered, here’s the gap, here’s the evidence.

Data replaces intuition

After enough sessions, patterns emerge. Each file records which skills were invoked and how many corrections were attributed to each. Aggregate across all files and you get an effectiveness table.

A skill invoked 18 times with zero corrections is doing its job. A skill invoked 4 times with 3 corrections needs surgery. A skill invoked once with one correction is noise, not signal.
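The aggregation is deliberately boring: sum invocations and attributed corrections per skill across session records, and compute a correction rate. The record shape below is an assumption about how each per-session file stores its counts:

```python
from collections import defaultdict


def effectiveness_table(session_records):
    """Aggregate per-session counts into a per-skill effectiveness table.
    Each record: {'invocations': {skill: n}, 'corrections': {skill: n}}."""
    invoked = defaultdict(int)
    corrected = defaultdict(int)
    for rec in session_records:
        for skill, n in rec.get("invocations", {}).items():
            invoked[skill] += n
        for skill, n in rec.get("corrections", {}).items():
            corrected[skill] += n
    return {
        skill: {
            "invocations": invoked[skill],
            "corrections": corrected[skill],
            "correction_rate": corrected[skill] / invoked[skill],
        }
        for skill in invoked
    }
```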

You stop reviewing skills when something “feels off.” You review the ones the numbers flag. In theory, skills with consistently high correction rates could be fed to an autonomous optimization loop that iterates on the skill text using the correction data as an eval set. In practice, evaluating whether a skill “worked” requires LLM-as-judge, which is too noisy to optimize against reliably. For now, the effectiveness table points you at the problem. You fix the skill yourself. Your judgment is still faster and more accurate than an automated loop for prompt-shaped artifacts.

Routing to the right place

Every learning has a home. “Google Sheets range math: end_row = start_row + num_rows - 1” goes in a gotchas file. “Use DeepWiki for semantic queries, GitHub API for fresh data” goes with tool heuristics. “Default to Yellow for PM confidence when engineering hasn’t set Color Status” goes in the main config.

The right file matters because everything loaded into context costs tokens on every session. Dump all learnings into one place and the file bloats past usefulness. Your agent reads 80 irrelevant rules to find the 3 it needs.

Route learnings by type. Keep each file lean. Your context budget is the scarcest resource in the system.
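The routing itself can be as dumb as a lookup table. File names here are illustrative; the point is that each learning type has exactly one destination, so no single file absorbs everything:

```python
# One destination per learning type; hypothetical file names.
ROUTES = {
    "gotcha": "gotchas.md",
    "tool_heuristic": "tool-heuristics.md",
    "convention": "main-config.md",
}


def route(learning):
    """Return the target file for a reviewed learning; unknown types
    fall back to the gotchas file rather than a new catch-all."""
    return ROUTES.get(learning.get("type"), "gotchas.md")
```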

The loop, closed

After every session: extract skills and corrections from the transcript, attribute corrections with two-pass verification, save structured proposals. During curation: show the effectiveness table, check each flagged skill’s content against its correction, route approved learnings to the right config file.

The paper called it “continual meta-learning.” What I built is three things: a capture hook that runs after every session, an attribution system that tells me which skill (if any) caused each failure, and a review workflow that checks whether I already fixed it before asking me to look at it.

Your agent can’t learn by itself. But you can build the scaffolding so every correction you make sticks.

And one more thing.

The paper that inspired all this proposes auto-generating skills from failures and auto-promoting them into the agent’s behavior. I deliberately didn’t do that. The human gate, the curation session where I review each learning and decide whether it’s worth keeping, is the most important part of the system. Not because automation couldn’t work. Because the judgment about what to remember is the one thing that should stay human.

An agent that remembers everything learns nothing. An agent that remembers only what you choose to teach it gets better every week.