How I Use Claude Code Agent Teams to Review My Codebase
I pointed three Claude agents at my codebase at the same time. One checked for API inconsistencies, one audited error handling, one reviewed the database schema. In 40 minutes they found issues I'd missed for months.
What You'll See
- What Agent Teams actually is and how it works
- The real multi-agent workflow I ran
- What it found that I couldn't find alone
- Where it broke down and what I'd change
What Agent Teams Does
Agent Teams is an experimental feature in Claude Code that lets you run multiple Claude agents in parallel. Each agent gets its own task and context. They work independently, then you see all their results together.
It's token-intensive. One review session can eat through a significant portion of your daily allocation. But for the right task, it's worth it.
To enable it, set the environment variable: CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
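In a POSIX shell, that looks like the following (you could also put the export in your shell profile to make it persistent):

```shell
# Enable the experimental Agent Teams feature for this shell session
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1

# Sanity-check that it is set before launching Claude Code
echo "$CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS"

# Then start Claude Code as usual:
# claude
```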
The Setup
I had a codebase with about 80 database tables and 23 modules. The kind of project where things drift apart over time — the frontend assumes one thing, the backend does another, and nobody notices until something breaks.
I created three agents:
Agent 1: API Consistency Check — "Compare every frontend API call against the backend route definitions. Flag any mismatches in parameters, response shapes, or authentication requirements."
Agent 2: Error Handling Audit — "Check every API endpoint for proper error handling. Flag any that return raw errors to the client or silently swallow failures."
Agent 3: Schema vs. Code Review — "Compare the database schema against the ORM models. Flag missing fields, type mismatches, or unused tables."
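If you want to see the shape of this workflow without the Agent Teams feature itself, you can approximate it by running independent headless sessions in parallel. The sketch below uses Claude Code's `claude -p` print mode; the agent names and the `run_agents` helper are illustrative, and the real feature manages the agents for you rather than leaving orchestration to a script.

```python
# Sketch: run one independent headless session per review prompt, in
# parallel, and collect the output. This approximates the Agent Teams
# workflow with plain `claude -p`; it is not the feature itself.
import subprocess
from concurrent.futures import ThreadPoolExecutor

# The three review prompts from the article, one per agent.
PROMPTS = {
    "api-consistency": (
        "Compare every frontend API call against the backend route "
        "definitions. Flag any mismatches in parameters, response shapes, "
        "or authentication requirements."
    ),
    "error-handling": (
        "Check every API endpoint for proper error handling. Flag any that "
        "return raw errors to the client or silently swallow failures."
    ),
    "schema-review": (
        "Compare the database schema against the ORM models. Flag missing "
        "fields, type mismatches, or unused tables."
    ),
}

def run_agents(base_cmd, prompts):
    """Run one subprocess per prompt in parallel; return {name: stdout}."""
    def run_one(item):
        name, prompt = item
        proc = subprocess.run(
            base_cmd + [prompt], capture_output=True, text=True
        )
        return name, proc.stdout

    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return dict(pool.map(run_one, prompts.items()))

# Real (token-heavy) run: run_agents(["claude", "-p"], PROMPTS)
# Dry run with `echo` standing in for the model, so each "agent" just
# repeats its prompt back:
results = run_agents(["echo"], PROMPTS)
```

Because each session is fully independent, this sketch has the same limitation the article describes later: the agents share no context, so synthesis is up to you.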
What It Found
The results surprised me. Agent 1 found 4 endpoints where the frontend was sending parameters the backend no longer expected. Agent 3 found 2 database tables that weren't referenced anywhere in the codebase.
None of these were bugs that would crash the app. They were the kind of technical debt that silently accumulates and eventually causes weird behavior that's hard to trace.
Where It Broke Down
⚠️ Agent 2 hit the context limit on a large module and started hallucinating file paths. I had to re-run it with a narrower scope.
⚠️ The agents don't coordinate. Agent 1 found an API mismatch that was actually intentional (a migration in progress). Without shared context, it couldn't know that.
⚠️ Token consumption was heavy. Three agents reviewing a medium-sized codebase used roughly the equivalent of a full day's Opus allocation.
What I'd Do Differently
💡 Scope each agent more tightly. "Check all 80 tables" is too broad. "Check the user and payment modules" gives better results.
💡 Add a synthesis step. After all three agents finished, I opened a new session to combine their findings and prioritize them. That step is manual for now, but it's where most of the value shows up.
💡 Run it monthly, not daily. Agent Teams makes sense for periodic deep reviews, not everyday coding.
Getting Started
You need Claude Code with Opus 4.6 for best results. Set the environment variable to enable Agent Teams. Start with two agents on a single module before scaling up.
Docs: Claude Code Agent Teams