Use multiple coding agents when the next change has independent scopes, stable contracts, isolated workspaces, and enough review capacity. If two agents need the same files, the interface is still moving, or nobody can inspect both diffs, keep the work with one agent.
The safest small-team default is not an autonomous agent swarm. Start with a human lead, one isolated implementer, and one verifier. Each task needs an owner, a worktree or branch, allowed files, a handoff artifact, a test command, and a merge gate.
| Route | Use it when | Minimum gate |
|---|---|---|
| One agent | Files overlap, contracts are moving, or review capacity is low. | Human review of one diff. |
| Two-agent pipeline | One agent can implement and another can verify without touching the same files. | Handoff artifact plus tests. |
| Three-agent split | Two independent scopes share a stable interface. | Interface owner plus sequential merge. |
| Agent team | Many independent streams exist and reviewers can keep up. | Strong ownership, CI, merge queue, and stop rules. |
Put the route board, task brief, worktree setup, handoff format, verification gate, merge queue, tool-surface choices, and review-hour metric in place before adding more parallel agents.
Start With The Route, Not The Agent Count
A multi-agent coding workflow is useful only when the team can split work into separate ownership zones. "Frontend agent plus backend agent" is not a safe split if both agents are still inventing the API. "Fix bug plus add tests" is not safe if the test agent must guess how the fix works. A good split has a stable contract, an isolated workspace, and a reviewer who can inspect the result without replaying every agent conversation.
For a small team, the first decision is usually one of four routes:
| Route | Team shape | Good first task | Stop signal |
|---|---|---|---|
| One agent | One human supervising one agent | Tight bug fix, moving refactor, sensitive file | The agent needs repeated steering in the same files. |
| Two-agent pipeline | Human lead, implementer, verifier | Implement a scoped change and have a second agent write or run checks | Verifier cannot explain the diff or needs to edit the same files. |
| Three-agent split | Lead plus two implementers or one implementer and one specialist | Frontend and backend after an API contract is frozen | Agents disagree about interface shape. |
| Agent team | Lead plus specialists plus verifier | Many independent streams with strong CI and review capacity | Review queue grows faster than accepted work. |
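The route table above can be read as a decision rule. The sketch below encodes it as a hypothetical bash function, choose_route. The four arguments and the scope-count cutoffs are assumptions chosen for illustration, not a standard, and the result is only a starting point for the human lead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn four answers about the next change into one of
# the four routes from the table. The arguments are assumptions:
#   $1 files_overlap    yes|no         do the scopes need the same files?
#   $2 contract         stable|moving  is the shared interface frozen?
#   $3 review_capacity  ok|low         can a human inspect every diff?
#   $4 scopes           integer        number of independent scopes
choose_route() {
  local overlap=$1 contract=$2 review=$3 scopes=$4

  # Any coupling or review-capacity warning keeps the work with one agent.
  if [[ $overlap == yes || $contract == moving || $review == low ]]; then
    echo "one agent"
  elif (( scopes <= 1 )); then
    # One scope: one implementer plus a verifier that edits nothing shared.
    echo "two-agent pipeline"
  elif (( scopes == 2 )); then
    # Two scopes behind a stable interface.
    echo "three-agent split"
  else
    echo "agent team"
  fi
}

choose_route no stable ok 1   # prints: two-agent pipeline
```

Coupling checks run first on purpose: a stable contract and spare review capacity matter more than how many scopes exist.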
The route can change during a week. Start with a narrow two-agent pipeline, measure it, then add a second implementer only when the first split produced reviewable work. Adding agents before the ownership model exists usually raises review load faster than it improves output quality.
When Multiple Coding Agents Actually Help
The best fit is breadth-first work with visible boundaries. Examples include one agent implementing a UI state while another writes integration tests, one agent investigating a failing path while another checks regression coverage, or two agents implementing separate modules after the shared type or API contract is already fixed.
Use more than one coding agent when all five conditions are true:
- The scope can be described in one or two filesets, not "the whole app."
- The shared contract is stable enough to hand to another worker.
- Each agent has an isolated workspace or branch.
- A human can review the diff without trusting a chat transcript.
- The team has enough CI and reviewer capacity to merge sequentially.
The fifth condition is the one small teams miss. Two agents can produce code faster than one reviewer can inspect it. If the lead becomes a full-time conflict resolver, the workflow is not faster; it has only moved the bottleneck.
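The five conditions also work as a pre-flight gate that the lead can run before starting a second agent. This is a minimal bash sketch; the function name parallel_gate and the yes/no argument order (matching the list above) are assumptions of the example.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight gate. Pass five yes/no answers in the order of the
# list: scope, contract, workspace, reviewable diff, merge capacity.
# Prints the first failed condition and returns 1; returns 0 if all pass.
parallel_gate() {
  local conditions=(
    "scope fits in one or two filesets"
    "shared contract is stable"
    "each agent has an isolated workspace"
    "diff is reviewable without the chat transcript"
    "CI and reviewers can merge sequentially"
  )
  if (( $# != ${#conditions[@]} )); then
    echo "usage: parallel_gate yes|no (x5)" >&2
    return 2
  fi

  local i=0 answer
  for answer in "$@"; do
    if [[ $answer != yes ]]; then
      echo "stay with one agent: ${conditions[i]}"
      return 1
    fi
    i=$((i + 1))
  done
  echo "parallel work allowed"
}
```

For example, parallel_gate yes yes yes yes no names the merge-capacity condition and returns 1, so the work stays with one agent.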
When One Agent Is Better
One agent is the right choice for tightly coupled work. Use a single supervised agent when the change touches the same hot files, the database migration is still being designed, the public API is unsettled, or the system has weak tests. Security-sensitive edits, billing flows, auth middleware, and repo-wide refactors also benefit from one agent and one human line of sight.
The practical stop rule is simple: if an agent needs another agent's unfinished code to know what to do, do not run them in parallel. Sequence the work instead:
- Human lead writes the contract.
- One agent implements the first change.
- Human or verifier approves the contract and diff.
- Next agent starts from the updated main branch.
This slower sequence often ships sooner because every later step has a real base instead of a guessed interface.
The Day-One Workflow
Use this small-team workflow for the first week:
- Human lead writes the issue brief and chooses one route.
- Implementer agent works in a separate branch or worktree.
- Verifier agent reads only the brief, changed files, and handoff artifact.
- Human reviewer checks the semantic diff and merge risk.
- Main branch receives one merge at a time.
- Remaining worktrees update from main before continuing.
Do not give every agent the full backlog. Give each agent a task it can finish, explain, and hand back. The output should be a branch plus a written artifact, not just a folder full of edits.
Use A Task Brief That Blocks Overlap
The task brief is the most important object in the workflow. It is the contract that prevents two agents from silently solving different problems.
```md
Agent Task Brief

owner: agent-a
workspace: ../agent-a
branch: agent/a-checkout-state
goal: Add checkout state to the order summary component.
non_goals:
  - Do not change payment provider code.
  - Do not change database schema.
allowed_files:
  - src/features/checkout/**
  - tests/checkout/**
forbidden_files:
  - src/lib/payments/**
  - db/migrations/**
  - package-lock.json
interface_contract:
  - Use existing OrderStatus values.
  - If a new status is required, stop and ask the human lead.
shared_state:
  - No local database migration.
  - Use test fixtures only.
required_checks:
  - pnpm lint
  - pnpm test tests/checkout
handoff_required:
  - changed files
  - commands run
  - proof or failing output
  - known risks
  - review focus
done_means:
  - The verifier can reproduce the checks and explain the diff.
```
The forbidden-file list is not bureaucracy. It is how a small team keeps global files, schemas, package locks, route maps, and API contracts single-owner. If the task cannot name forbidden files, it is probably too broad for parallel coding.
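A brief can be linted before any agent starts. The hypothetical brief_lint function below assumes the brief is saved as a plain-text file in the key-and-list layout shown above, with list items starting with "- ". The set of required keys is an assumption taken from that example brief. The function reports missing keys and rejects a brief whose forbidden_files list is empty, following the rule that a task without forbidden files is too broad for parallel work.

```bash
#!/usr/bin/env bash
# Hypothetical linter for a task brief stored as plain text:
# "key: value" lines, with list items on following lines starting with "- ".
brief_lint() {
  local brief=$1 key status=0

  # Keys every brief is assumed to need (taken from the example brief).
  for key in owner workspace branch goal non_goals allowed_files \
             forbidden_files required_checks handoff_required done_means; do
    if ! grep -q "^${key}:" "$brief"; then
      echo "missing: $key"
      status=1
    fi
  done

  # Count list items under forbidden_files: until the next top-level key.
  local forbidden
  forbidden=$(awk '
    /^[a-z_]+:/ { in_section = ($0 ~ /^forbidden_files:/); next }
    in_section && /^[ \t]*- / { n++ }
    END { print n + 0 }
  ' "$brief")
  if (( forbidden == 0 )); then
    echo "no forbidden files: task is probably too broad for parallel work"
    status=1
  fi

  return "$status"
}
```

A lead could run it as brief_lint briefs/agent-a.md (a hypothetical path) and refuse to start the agent while it returns non-zero.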

Isolate Work With Worktrees Or Branches
Git worktrees are useful because each agent can have a separate working tree associated with the same repository. The official git worktree documentation covers the mechanics. In a small-team agent workflow, the reason to use worktrees is operational: one agent can run tests and edit files without dirtying another agent's checkout.
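```bash
# Start every worktree from an up-to-date main.
git switch main
git pull --ff-only

# One worktree and branch per agent; paths sit next to the main checkout.
git worktree add -b agent/a-checkout-state ../agent-a
git worktree add -b agent/b-profile-copy ../agent-b
git worktree add -b verify/checkout-state ../verify

# Confirm each path is checked out on the expected branch.
git worktree list
```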
Worktrees isolate files; they do not isolate everything. Assign a human owner for shared resources before any agent starts:
| Shared resource | Owner rule |
|---|---|
| Database schema and migrations | One human owner, no parallel edits. |
| Ports and local services | Reserve names and ports in the task brief. |
| Secrets and env files | Read-only unless the human lead approves. |
| Package lock and dependency files | One branch owns dependency changes. |
| CI runners and queues | Merge one branch at a time when capacity is tight. |
| Feature flags and public API contracts | Freeze the interface before parallel work starts. |
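Single ownership is easy to state and easy to break when briefs are written separately. A small check can catch double claims before agents start. The hypothetical owner_conflicts function below assumes each brief's claimed shared resources are collected as "resource owner" pairs, one per line; that format is an assumption of this sketch, not part of git or any agent tool.

```bash
#!/usr/bin/env bash
# Hypothetical check: read "resource owner" pairs on stdin, one per line,
# and report any single-owner resource claimed by two different owners.
owner_conflicts() {
  awk '
    NF < 2 || /^#/ { next }   # skip blank, incomplete, and comment lines
    {
      res = $1; own = $2
      if (!(res in owner)) { owner[res] = own; next }
      if (owner[res] != own && !((res, own) in reported)) {
        reported[res, own] = 1
        printf "conflict: %s claimed by %s and %s\n", res, owner[res], own
        found = 1
      }
    }
    END { exit (found ? 1 : 0) }
  '
}
```

For example, piping the lines "package-lock.json agent-a" and "package-lock.json agent-b" prints one conflict and returns 1, which means the human lead must pick a single owner before either agent starts.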
If your environment cannot use worktrees, use separate branches and clean checkouts. The rule is the same: each agent needs a visible workspace boundary and a merge plan.

Make Handoffs Durable
A chat summary is not a handoff. The verifier and human reviewer need an artifact that survives outside the agent session. Put it in the pull request description, an issue comment, or a short handoff.md file in the branch if your workflow allows it.
```md
Agent Handoff

task: checkout state in order summary
owner: agent-a
branch: agent/a-checkout-state
changed_files:
  - src/features/checkout/OrderSummary.tsx
  - tests/checkout/order-summary.test.ts
commands_run:
  - pnpm lint
  - pnpm test tests/checkout
proof:
  - All checkout tests pass locally.
known_risks:
  - Empty order state still uses the existing fallback copy.
review_focus:
  - Confirm the OrderStatus mapping is complete.
next_owner:
  - verifier
```
The verifier should not continue the implementation by default. Its first job is to check whether the implementer obeyed the brief: allowed files, tests, interface contract, and proof. If the verifier must edit the same feature files, treat that as a signal that the first task was underspecified.
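Part of the verifier's first job can be mechanical: compare the handoff's changed_files list with what the branch actually changed. The bash sketch below is hypothetical. It assumes the handoff uses the layout above and that the real file list arrives on stdin, for example from git diff --name-only origin/main...agent/a-checkout-state.

```bash
#!/usr/bin/env bash
# Hypothetical verifier check: compare the handoff's changed_files list with
# the paths the branch really changed (read from stdin, one per line).
handoff_vs_diff() {
  local handoff=$1 claimed actual unlisted missing status=0

  # Paths listed under changed_files:, up to the next top-level key.
  claimed=$(awk '
    /^[a-z_]+:/ { in_section = ($0 ~ /^changed_files:/); next }
    in_section && /^[ \t]*- / { sub(/^[ \t]*- /, ""); print }
  ' "$handoff" | sort -u)
  actual=$(sort -u)

  # comm needs sorted input: -13 keeps lines only in the real diff,
  # -23 keeps lines only in the handoff.
  unlisted=$(comm -13 <(printf '%s\n' "$claimed") <(printf '%s\n' "$actual"))
  missing=$(comm -23 <(printf '%s\n' "$claimed") <(printf '%s\n' "$actual"))

  if [[ -n $unlisted ]]; then
    sed 's/^/changed but not listed in handoff: /' <<< "$unlisted"
    status=1
  fi
  if [[ -n $missing ]]; then
    sed 's/^/listed in handoff but not changed: /' <<< "$missing"
    status=1
  fi
  return "$status"
}
```

A mismatch in either direction is a reason to send the branch back to the implementer rather than to start editing it.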
Add Roles Only When They Reduce Load
Small teams do not need a large role taxonomy. Start with four roles and add specialists only when the work justifies them:
| Role | Job | Add when | Remove when |
|---|---|---|---|
| Human lead | Owns scope, contracts, merge order, and risk | Always | Never fully remove |
| Implementer | Produces a scoped diff | The brief has allowed and forbidden files | The task keeps crossing boundaries |
| Verifier | Tests, reviews, or challenges the diff | The implementer output is separable | The verifier needs to rewrite the same diff |
| Reviewer | Human semantic review and merge decision | Any code reaches the merge queue | Never skip for production work |
| Specialist | Handles docs, tests, migration, or investigation | There is an independent stream | It creates more handoff work than it saves |
The lead does not have to write every line, but the lead owns the contract. If no human owns the contract, agents will invent one, and each invention becomes a merge conflict later.
Verification Gate And Merge Queue
Parallel work becomes product work only at the merge queue. Merge branches sequentially, rerun checks, and update the remaining worktrees after every merge.
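```bash
# Merge one branch at a time into an up-to-date main.
git switch main
git pull --ff-only
git merge --no-ff agent/a-checkout-state

# Rerun checks on the merged result; do not push if either fails.
pnpm lint
pnpm test
git push

# Move each remaining worktree onto the new main before it continues.
cd ../agent-b
git fetch origin
git rebase origin/main
```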
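The manual sequence above can be wrapped in a small queue so that no branch merges before the previous one has passed its checks and the remaining worktrees have been rebased. This is a hypothetical bash sketch, not a git feature: merge_queue, run_checks, and the worktree_path=branch argument format are assumptions. It also assumes main tracks origin/main and stops at the first merge conflict, failed check, or rebase that needs a human.

```bash
#!/usr/bin/env bash
# Hypothetical sequential merge queue, run from the main checkout.
# Each argument is "worktree_path=branch", listed in merge order, e.g.
#   merge_queue ../agent-a=agent/a-checkout-state ../agent-b=agent/b-profile-copy

run_checks() {
  # Same gate as the manual sequence; replace with the project's checks.
  pnpm lint && pnpm test
}

merge_queue() {
  local entries=("$@") i j path branch
  for (( i = 0; i < ${#entries[@]}; i++ )); do
    branch=${entries[i]#*=}

    # Merge exactly one branch into an up-to-date main.
    git switch main && git pull --ff-only || return 1
    if ! git merge --no-ff "$branch"; then
      git merge --abort
      echo "merge conflict on $branch: resolve by hand" >&2
      return 1
    fi

    # If checks fail, main keeps the local merge commit but nothing is pushed.
    if ! run_checks; then
      echo "checks failed after merging $branch; main not pushed" >&2
      return 1
    fi
    git push || return 1

    # Rebase every remaining worktree onto the new main before its turn.
    for (( j = i + 1; j < ${#entries[@]}; j++ )); do
      path=${entries[j]%%=*}
      if ! { git -C "$path" fetch origin && git -C "$path" rebase origin/main; }; then
        echo "rebase stopped in $path: resolve before continuing" >&2
        return 1
      fi
    done
  done
  echo "merged ${#entries[@]} branch(es) in order"
}
```

The queue stops rather than retrying because a conflict or failed check usually means the brief or the contract needs a human decision.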
The verification gate should check five things before a branch reaches main:
- The diff changes only allowed files.
- Required tests or checks were run and recorded.
- The interface contract is still intact.
- The handoff artifact names risks and review focus.
- A human understands the semantic change.
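The first gate item can be automated against the brief's globs. The sketch below is a hypothetical bash function, allowed_files_gate; its argument layout (allowed patterns, then --, then forbidden patterns) is an assumption. It relies on the fact that in a bash case pattern * also matches "/", so src/features/checkout/** covers every file below that directory. Changed paths are read from stdin, for example from git diff --name-only origin/main...agent/a-checkout-state.

```bash
#!/usr/bin/env bash
# Hypothetical gate: compare changed paths (stdin) with the brief's globs.
# Usage: allowed_files_gate ALLOWED_GLOB... -- FORBIDDEN_GLOB...
allowed_files_gate() {
  local allowed=() forbidden=() target=allowed arg
  for arg in "$@"; do
    if [[ $arg == -- ]]; then target=forbidden; continue; fi
    if [[ $target == allowed ]]; then allowed+=("$arg"); else forbidden+=("$arg"); fi
  done

  local file pattern ok status=0
  while IFS= read -r file; do
    [[ -z $file ]] && continue
    # Forbidden patterns win; $pattern is unquoted so it matches as a glob.
    for pattern in "${forbidden[@]}"; do
      case $file in $pattern) echo "forbidden: $file"; status=1; continue 2 ;; esac
    done
    ok=0
    for pattern in "${allowed[@]}"; do
      case $file in $pattern) ok=1; break ;; esac
    done
    if (( ok == 0 )); then
      echo "outside allowed files: $file"
      status=1
    fi
  done
  return "$status"
}
```

Using the example brief, the call would be allowed_files_gate 'src/features/checkout/**' 'tests/checkout/**' -- 'src/lib/payments/**' 'db/migrations/**' 'package-lock.json', with single quotes so the shell does not expand the globs first.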
Use AI review as a helpful pass, not the final authority. OpenAI's Codex docs describe reviewable diffs, cloud tasks, and PR-oriented workflows in the Codex quickstart. Codex CLI also exposes operational controls such as /diff, /review, /status, /plan, /goal, /permissions, and /agent in the slash-command reference. Those controls help supervision, but they do not remove the need for human semantic review.

Tool Surfaces: Codex, Claude Code, Cursor, Worktrees, Agents SDK
Pick tools after the workflow shape is clear.
Codex is a good fit when you want reviewable diffs across app, IDE, CLI, and cloud task surfaces. OpenAI's Codex quickstart describes those surfaces, and the Agents SDK workflow guide shows how Codex can participate in more deterministic multi-agent pipelines with handoffs, guardrails, and traces. Use that path when the team wants explicit review checkpoints instead of loose parallel chat.
Claude Code is a good fit for supervised local sessions and product-specific agent-team experiments, but keep version-sensitive features and token use out of the operating contract. If the decision is specifically Codex versus Claude Code, use the separate Claude Code vs Codex comparison rather than turning the workflow into a product ranking.
Cursor background agents, worktree wrappers, and orchestration frameworks can implement the same process. The deciding factor is not the brand. It is whether the tool can preserve the brief, isolate the workspace, show the diff, record proof, and let a reviewer stop the merge.
If usage windows, quotas, or current Codex allowance are the real blocker, use the separate OpenAI Codex usage limits page. Do not mix quota policy with the workflow decision unless a limit changes the route for the current task.
Measure Accepted Work Per Review Hour
Raw agent count is a weak metric. Commits are also weak because agents can create large diffs that are hard to trust. Use a one-week scorecard:
| Metric | Why it matters | Stop threshold |
|---|---|---|
| Accepted diff per review hour | Measures useful work against the scarcest small-team resource | Falls below the single-agent baseline. |
| Rework rate | Shows whether agents are guessing at contracts | More than one major rewrite per task. |
| CI failure rate | Shows whether parallel branches are landing unstable changes | Failures repeat for the same reason. |
| Merge conflict minutes | Exposes hidden overlap | Conflicts recur in the same files. |
| Cost per accepted change | Keeps token or plan burn connected to shipped value | Cost rises while accepted work is flat. |
Run two or three issues through the workflow before standardizing it. Compare against one good single-agent baseline. If the two-agent pipeline produces cleaner tests, clearer review, or more accepted work per lead hour, keep it. If it produces more coordination work, shrink the split.
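The headline metric is simple arithmetic: accepted work divided by review hours, compared with the single-agent baseline. The bash sketch below is hypothetical. It assumes the team records accepted changes as a count (for example, merged pull requests) and review time in minutes; both units and the baseline value are inputs the team supplies, not something the tools report.

```bash
#!/usr/bin/env bash
# Hypothetical scorecard helper. Inputs are numbers recorded by hand:
#   $1 accepted changes this week (for example, merged pull requests)
#   $2 review minutes spent on them
#   $3 single-agent baseline, in accepted changes per review hour
# Returns 1 when the rate falls below the baseline (the stop threshold).
review_hour_score() {
  awk -v accepted="$1" -v minutes="$2" -v baseline="$3" 'BEGIN {
    if (minutes <= 0) { print "no review time recorded"; exit 2 }
    rate = accepted / (minutes / 60)
    printf "accepted per review hour: %.2f (baseline %.2f)\n", rate, baseline
    if (rate < baseline) { print "decision: shrink the split"; exit 1 }
    print "decision: keep the pipeline"
  }'
}
```

For example, review_hour_score 6 240 2 reports 1.50 accepted changes per review hour against a baseline of 2.00 and returns 1, which maps to shrinking the split.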
A One-Week Rollout Plan
Use a controlled rollout instead of changing the whole team's process at once.
| Day | Action | Output |
|---|---|---|
| Day 1 | Choose one low-risk issue with separate test coverage. | Task brief and one implementer branch. |
| Day 2 | Add verifier pass without implementation edits. | Handoff artifact and verification notes. |
| Day 3 | Merge sequentially and record review minutes. | Baseline merge queue data. |
| Day 4 | Try a second independent implementer only if contracts are stable. | Two isolated branches. |
| Day 5 | Compare accepted work, rework, conflicts, CI, and review time. | Keep, shrink, or stop decision. |
The goal is not to prove that agent teams are impressive. The goal is to find the smallest process that lets a small team ship reviewable code with less lead time and less rework.
FAQ
What is the smallest safe multi-agent coding workflow?
Start with a human lead, one implementer agent, and one verifier agent. The implementer writes a scoped diff in an isolated workspace. The verifier checks the brief, tests, allowed files, and risks. A human reviews and merges.
Should a two-person team use multiple coding agents?
Yes, but only for separable work. A two-person team should usually start with a two-agent pipeline, not a large agent team. The human lead needs enough time to inspect the diff and the handoff artifact.
Are git worktrees required?
No. Worktrees are a strong default because they keep each agent in a separate working tree, but separate branches or clean checkouts can work. The real requirement is workspace isolation plus a merge plan.
What files should stay single-owner?
Keep schemas, migrations, package locks, environment files, auth middleware, feature flags, public API contracts, and shared route maps single-owner unless the human lead explicitly changes the contract.
Can AI review replace human review?
No. AI review can catch syntax, test, style, and consistency issues. Human review still owns semantics, product risk, security, data handling, and merge responsibility.
When should the team stop using multiple agents?
Stop when agents need the same files, the interface is still changing, reviewers cannot keep up, CI failures repeat, or accepted diff per review hour drops below the single-agent baseline.
Where do Codex and Claude Code fit?
Use Codex, Claude Code, Cursor, worktree wrappers, or orchestration frameworks as surfaces for the same workflow. Pick the surface that preserves the brief, isolated workspace, diff, proof, and review gate for the task.
How many agents should a small team run at once?
Most small teams should prove one implementer plus one verifier before trying more. Three agents can work when scopes are independent. A larger agent team needs strong ownership, CI, a merge queue, and review capacity.
