Shadowing AI: What I Learned by Watching Agents Code
I've been learning new programming languages by watching AI coding agents work — like shadowing a colleague. The research says this should make me worse. Here's why I think it's more complicated than that.
Lately I’ve been learning new programming languages by watching AI agents write code in them.
Not vibe coding. Not prompting and pasting. I mean sitting there, watching Claude Code or a similar agent work through a problem in a language I don’t know well. Studying what it does. Why it structures things a certain way. Which libraries it reaches for. Then going back through the output to understand the decisions.

It feels like shadowing a senior colleague. You don’t touch the keyboard. You watch, you ask questions, you build a mental model. Then gradually you start driving.
How it actually works
Say I want to learn Rust. I have a Go background, so I understand systems programming concepts, but I don’t know Rust’s ownership model, its borrow checker, or its idioms. Instead of starting with a book, I start with a real problem. I tell the agent: implement this thing in Rust. Then I watch.
The agent makes decisions. It chooses Result<T, E> over panicking. It uses pattern matching where I’d reach for an if-else chain. It picks a specific crate for HTTP, and I can see why. It handles lifetimes in a way that teaches me what lifetimes are for, because I can see the code that would fail without them.
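Here's a toy sketch of the first two idioms, not anything an agent actually wrote for me: a function that returns Result<T, E> instead of panicking, consumed with pattern matching instead of an if-else chain.

```rust
use std::num::ParseIntError;

// Return a Result instead of panicking on bad input.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.trim().parse::<u16>()
}

fn main() {
    // Pattern matching forces me to handle both outcomes explicitly.
    match parse_port("8080") {
        Ok(port) => println!("listening on port {port}"),
        Err(e) => eprintln!("bad port: {e}"),
    }
}
```

Coming from Go, the Ok and Err arms read like the familiar value-and-err return convention, except the compiler refuses to let me ignore the error path.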
Then I go back through the diff. I read every line. I look up things I don’t understand. I try modifying the code to see what breaks. Sometimes I ask the agent to explain a specific choice — why Arc<Mutex<T>> here, why not RefCell?
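For the Arc<Mutex<T>> question specifically, the answer I pieced together looks roughly like this, a minimal sketch assuming the simplest case of a counter shared across threads: Arc gives shared ownership, Mutex synchronizes the mutation, and RefCell can't substitute because it only supports single-threaded interior mutability.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc: shared ownership across threads. Mutex: synchronized access.
    let counter = Arc::new(Mutex::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("count = {}", *counter.lock().unwrap());
    // Swap in Rc<RefCell<i32>> and this won't compile: Rc isn't Send
    // and RefCell isn't Sync, so thread::spawn rejects the closure.
}
```

Watching the compiler reject the RefCell version taught me more about Send and Sync than any chapter summary could.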
Sometimes it’s overwhelming. The agent moves fast. It writes 200 lines in seconds, and I need 20 minutes to understand what just happened. But that’s fine. I can scroll back. The code isn’t going anywhere.
The research says I should be getting dumber
There’s a body of research that says using AI for coding hurts learning. And the findings are hard to ignore.
Anthropic published a study in January 2026 where they ran a randomized controlled trial with 52 engineers learning Trio, a Python async library. The group using AI assistance scored 17% lower on comprehension tests than the group coding manually. No statistically significant speed improvement either. Just skill degradation.
The researchers identified patterns. People who scored poorly tended to delegate everything to the AI, or started with a few questions and gradually handed over all code writing. People who scored well asked follow-up questions, combined code generation with explanations, or used AI only for conceptual questions while coding independently.
Then there’s a paper from October 2025 called “Observing Without Doing: Pseudo-Apprenticeship Patterns in Student LLM Use” by researchers studying introductory CS students. They found students treating LLMs as expert models but never progressing through the stages of cognitive apprenticeship that build independence. Students would skip planning, work backward from complete AI solutions, and rarely engage in the refinement or independent exploration that characterizes real learning.
The term they coined, pseudo-apprenticeship, is a good one. It describes a learning pattern that looks like apprenticeship but isn’t. You’re in the room with the expert, but you never pick up the tools.
A University of Saarland study on knowledge transfer in AI pair programming found a related problem: programmers in AI-assisted pair programming accept suggestions with minimal scrutiny far more often than in human-human pairing.
And a large-scale study with 234 students across two academic years found that while AI pair programming reduced anxiety and increased motivation, the learning outcomes depended entirely on how students engaged with the AI output.
Why I think the framing is incomplete
Most of this research studies beginners learning their first language, or developers learning an isolated library in a controlled setting. That’s a real scenario, and the findings matter for education.
But that’s not what I’m doing.
I’m an experienced developer with years of building systems in Go, Python, and other languages. When I watch an agent write Rust, I’m not a blank slate. I have a mental framework for what good code looks like, how systems are structured, what error handling patterns make sense. I’m pattern-matching against what I already know, and I’m filling in the Rust-specific gaps.
This is closer to how a senior engineer learns a new stack by pairing with someone who knows it. You don’t learn Java by reading a Java textbook if you’ve been writing C# for a decade. You sit with someone who knows Java, you watch them work, and you map the new concepts onto your existing knowledge. The ownership model in Rust? That maps onto my understanding of memory management in Go. Pattern matching? I know functional concepts from other contexts. The agent is giving me the Rosetta Stone, not the textbook.
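As a toy example of that mapping, assuming nothing beyond the basics: in Go, assigning a value gives you another reference the garbage collector tracks; in Rust, assignment moves ownership, and the compiler makes the transfer explicit.

```rust
fn main() {
    let s = String::from("hello");
    let t = s; // ownership moves to `t`; `s` is no longer valid

    // println!("{s}"); // compile error: borrow of moved value `s`.
    // In Go, both variables would alias GC-managed memory; Rust makes
    // the single-owner rule visible at compile time instead.
    println!("{t}");
}
```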
DHH put it well on X: “AI as pair programmer helps when you drive the code and it explains APIs/concepts; letting it lead means learning nothing.” Thousands of likes on that one, and I mostly agree. But I’d add a nuance. Sometimes letting it lead while you actively study the output is a form of driving. The difference between passive consumption and active analysis is everything.
The cognitive apprenticeship angle
Cognitive apprenticeship theory (Collins et al., 1989) describes how novices develop expertise through stages: modeling (watching the expert), coaching (receiving guidance), scaffolding (working with diminishing support), articulation (explaining your reasoning), reflection, and exploration.
The pseudo-apprenticeship problem happens when people get stuck at step one and never move on. They watch the AI, accept the output, and skip the remaining five stages.
What I’ve found is that you have to deliberately push yourself through the full sequence. I start by watching the agent solve the problem. Then I ask it to explain specific choices — why this pattern? What breaks if I do it differently? Then I take over partially: I write the next function myself, let the agent review it. I document what I learned. I compare the agent’s approach to what I would have done in Go. And eventually I try a new problem without the agent at all.
That last part is where the real test happens. If I can write reasonable Rust after a week of shadowing, the approach worked. If I can’t, I was pseudo-apprenticing and need to adjust.
What people actually think about this
Online, the community is split, roughly along experience lines.
A post on r/ClaudeAI with 1,400+ upvotes, written after six months of daily AI pair programming, concluded that the key was making the AI plan first and then critiquing the plan, rather than just accepting generated code. The author's workflow mirrors what the research calls high-scoring patterns: active engagement, not delegation.
On the other end, a senior developer on r/learnprogramming wrote: “the amount of junior, mid and sometimes even senior developers who cannot write a simple code by their own without using AI is absolutely ridiculous.” That post hit 1,100+ upvotes and 288 comments. The frustration is real and well-documented.
And on r/ClaudeAI, one post with 2,700+ upvotes argued that language specialization is dead — “the idea of a Python dev or React dev is outdated” — because AI eliminates the language barrier entirely. If you understand the problem domain, the language is just the output format.
I keep thinking about that. When I shadow an agent writing Rust, I’m not really learning Rust in isolation. I’m learning how my existing problem-solving instincts translate into Rust syntax and idioms.
The speed problem
I should be honest about the downsides. Agents are fast. Too fast. When Claude Code writes 300 lines of well-structured Rust in 15 seconds, there’s a real risk of glossing over it. Your eyes move but your brain doesn’t process.
A few things help.
After the agent finishes, I don’t move to the next prompt. I read the output like a code review. Line by line. If I don’t understand something, I flag it and come back.
The diff view is underrated. When the agent modifies existing code, the diff shows exactly what changed and what stayed. More useful than reading the whole file, because it highlights the decisions.
Sometimes I tell the agent to explain what it’s about to do before it does it. This slows things down and gives me a preview of the strategy before I see the implementation. It’s like asking a colleague “walk me through your thinking” before they start typing.
And honestly, breaks matter. After a complex change, I step away. Come back. Read it again. The second read is always better than the first.
When this falls apart
I should be clear about the limits. This only works if you already know how to program in at least one language and have enough context to evaluate whether the output is reasonable. If you’re a true beginner with no mental model for code, watching an agent is just watching someone type fast. You won’t know what’s good, what’s bad, or what’s idiomatic versus hacky.
It also falls apart if you never actually write code yourself. Watching is the first stage, not the whole thing. Skip the hands-on part and you end up in exactly the trap the research describes.
The Anthropic study’s finding about the “competency feedback loop” is real. You use AI to avoid the struggle. Your skills decay. You become more dependent on AI. Which accelerates the decay. The way out is to use the struggle deliberately, not avoid it.
What I’m actually learning
Maybe it isn’t Rust or Swift or whatever language comes next. The real thing I’m training is reading code critically and fast. Every hour spent studying agent output teaches me to parse unfamiliar code, spot patterns, find issues, and build mental models of systems I didn’t write.
That transfers. Across languages, across codebases, across the growing reality that we’ll be reading more AI-generated code in production whether we like it or not. I wrote before about verification bandwidth being the bottleneck. Shadowing AI agents is, in a weird way, training for exactly that.
I’m not saying the research is wrong. The 17% skill degradation is real for the populations they studied. But “AI hurts learning” as a blanket statement misses a case: experienced developers using AI observation as a deliberate way to build cross-language fluency, paired with active analysis and eventually writing code on their own.
Deliberate is the word that makes the difference. I keep coming back to it. Watching passively is just television. Watching while questioning, pulling apart decisions, and then trying it yourself: that's closer to real apprenticeship. At least, that's been my experience so far.
Sources:
- How AI Assistance Impacts the Formation of Coding Skills — Anthropic, January 2026
- How AI Impacts Skill Formation (full paper) — Shen & Tamkin, arXiv, 2026
- Observing Without Doing: Pseudo-Apprenticeship Patterns in Student LLM Use — arXiv, October 2025
- An Empirical Study of Knowledge Transfer in AI Pair Programming — University of Saarland, 2025
- Impact of AI-Assisted Pair Programming on Student Motivation — Springer, March 2025
- Cognitive Apprenticeship — Collins, Brown & Newman, 1989