Building Agentic Flows with Claude Code

Part 4 — Advanced

Chapters 12-15 · MCP servers, plugins, best practices, and scaling.

Chapter 12

MCP Servers

12.1 What MCP Is

The Model Context Protocol (MCP) is an open standard for connecting AI agents to external tools and services. Claude Code is an MCP client — it can call tools from any MCP server, giving your agents access to systems far beyond the local file system.

What this unlocks in practice:

  • A researcher agent that queries your company's database directly
  • A reviewer agent that posts GitHub review comments automatically
  • An ops agent that reads from and writes to your CRM
  • A pipeline that pushes published articles to your CMS
  • Any workflow that needs to touch an external API or data source

MCP servers are separate processes that expose tools via a standard interface. Claude Code discovers them from your configuration and makes their tools available just like built-in tools (Read, Write, Bash).


12.2 Adding MCP Servers

Via CLI (one-time setup)

```bash
# Add a remote HTTP MCP server
claude mcp add --transport http my-api https://api.mycompany.com/mcp

# Add a local stdio MCP server (runs as a subprocess)
claude mcp add --transport stdio github npx -y @modelcontextprotocol/server-github

# List configured servers
claude mcp list

# Remove a server
claude mcp remove github
```

Via settings.json (project-persistent)

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    },
    "supabase": {
      "command": "npx",
      "args": ["-y", "@supabase/mcp-server-supabase"],
      "env": {
        "SUPABASE_URL": "${SUPABASE_URL}",
        "SUPABASE_SERVICE_ROLE_KEY": "${SUPABASE_KEY}"
      }
    }
  }
}
```
💡 Use environment variable references (`${VAR_NAME}`) for credentials in settings.json — never hardcode secrets. Claude Code resolves them from your shell environment.
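The substitution itself is simple string templating. A minimal Python sketch of how `${VAR_NAME}` resolution could work (illustrative only, not Claude Code's actual implementation):

```python
import json
import os
import re

def resolve_env_refs(text: str) -> str:
    """Replace ${VAR_NAME} placeholders with values from the environment."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), text)

os.environ["GITHUB_TOKEN"] = "ghp_example"  # stand-in credential for the demo
raw = '{"env": {"GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"}}'
config = json.loads(resolve_env_refs(raw))
print(config["env"]["GITHUB_PERSONAL_ACCESS_TOKEN"])  # prints ghp_example
```

Resolving before parsing the JSON means the secret never needs to live in the file itself.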

12.3 MCP in Agentic Flows

Giving agents MCP tool access

Add MCP tools to an agent's tools whitelist using the format mcp__server-name__tool-name:

```yaml
---
name: github-reviewer
description: Reviews GitHub pull requests, posts comments, and updates PR
  status. Use when asked to review a PR or check open pull requests.
model: sonnet
tools: Read, mcp__github__list_pull_requests, mcp__github__get_pull_request,
       mcp__github__create_review, mcp__github__add_pull_request_review_comment
permissionMode: default
---

You are a GitHub PR reviewer. You review code changes and post structured
feedback directly to the pull request.

When invoked with a PR number:
1. Fetch the PR details and diff
2. Review for: correctness, security, test coverage, style
3. Post inline comments for specific issues
4. Submit a review with verdict: Approve / Request Changes / Comment
```
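The `mcp__server-name__tool-name` convention is mechanical enough to parse. A small Python sketch (the helper is hypothetical, for illustration):

```python
def parse_mcp_tool(name: str):
    """Split 'mcp__<server>__<tool>' into (server, tool); None if not an MCP tool."""
    parts = name.split("__", 2)
    if len(parts) != 3 or parts[0] != "mcp":
        return None
    return parts[1], parts[2]

print(parse_mcp_tool("mcp__github__create_review"))  # ('github', 'create_review')
print(parse_mcp_tool("Read"))                        # None (built-in tool)
```

The tool segment may itself contain underscores (as in `add_pull_request_review_comment`), which is why the split is capped at two separators.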

Skill + MCP combination

Skills work well as MCP orchestrators — they define the workflow, agents provide the expertise:

```yaml
---
name: weekly-metrics
description: Pulls this week's key metrics from our database and generates
  an executive summary. Run every Monday. Invoke with /weekly-metrics.
user-invocable: true
allowed-tools: Read, Write, mcp__supabase__execute_sql
---

# Weekly Metrics Skill

Generate the weekly metrics report from our Supabase database.

## Data Queries
Run these queries and save results to /work/metrics-raw.json:

1. New signups this week:
   SELECT COUNT(*) FROM users WHERE created_at >= NOW() - INTERVAL '7 days'

2. Active users (any event in 7 days):
   SELECT COUNT(DISTINCT user_id) FROM events
   WHERE timestamp >= NOW() - INTERVAL '7 days'

3. Revenue this week:
   SELECT SUM(amount) FROM transactions
   WHERE created_at >= NOW() - INTERVAL '7 days' AND status = 'completed'

## After collecting data
Invoke @analyst agent to generate the executive summary from the raw data.
Save the summary to /output/weekly-metrics-{date}.md
```

12.4 Real Integration Examples

GitHub — code review workflow

MCP setup:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}" }
    }
  }
}
```

Usage:

"Review all open PRs in the anthropic/claude-code repository
that are more than 2 days old and haven't been reviewed yet."

Supabase — data analysis agent

```yaml
---
name: data-analyst
description: Queries the Supabase database and produces structured analysis.
  Use for any reporting, metrics, or data investigation task.
model: opus
tools: Read, Write, mcp__supabase__execute_sql, mcp__supabase__list_tables
permissionMode: default
---

You are a data analyst with direct database access.

Before writing any query:
1. List available tables to understand the schema
2. Write safe, read-only SELECT queries
3. Never use DELETE, UPDATE, DROP, or INSERT without explicit user permission

Format your output as: findings in prose + supporting tables + raw SQL used
```

Google Drive — document pipeline

```yaml
---
name: doc-publisher
description: Exports finished articles to Google Drive in the Publications
  folder. Use after /format-article completes and the article is approved.
model: haiku
tools: Read, mcp__gdrive__create_file, mcp__gdrive__move_file
permissionMode: default
---

When given an article path:
1. Read the article from /output/
2. Create a new Google Doc in the "Publications/Drafts" folder
3. Report the document URL to the user
```

Chapter 13

Plugins

13.1 What Plugins Are

A plugin is a portable bundle of agents, skills, hooks, and commands that can be installed once and used across all your projects. Where a project's .claude/ directory is local to one codebase, a plugin is global — available everywhere.

Two reasons to create a plugin:

  • Personal reuse: You've built something excellent (a code reviewer, a weekly report generator, a research assistant) and you want it in every project without copying files.
  • Distribution: You want to share your system with your team, open-source it, or publish it to the Claude plugin marketplace.

13.2 Creating a Plugin

Directory structure

```bash
my-plugin/
├── .claude-plugin/
│   └── marketplace.json          ← plugin manifest (required)
├── agents/
│   ├── code-reviewer.md
│   └── doc-writer.md
├── skills/
│   └── pr-review/
│       └── SKILL.md
└── hooks/
    └── auto-lint.json
```

The manifest file

```json
{
  "name": "my-dev-toolkit",
  "version": "1.2.0",
  "description": "Code review, documentation, and PR workflow tools for development teams",
  "author": "Your Name",
  "license": "MIT",
  "repository": "https://github.com/you/my-dev-toolkit",
  "components": {
    "agents": ["agents/code-reviewer.md", "agents/doc-writer.md"],
    "skills": ["skills/pr-review/SKILL.md"],
    "hooks": ["hooks/auto-lint.json"]
  },
  "permissions": {
    "tools": ["Read", "Grep", "Bash"],
    "network": false
  }
}
```

What to include vs. exclude

| Include | Exclude |
|---|---|
| Agents and skills with broad applicability | Project-specific agents (they reference your codebase structure) |
| Hooks for general automation (lint, notify) | Hardcoded file paths or project-specific rules |
| Supporting files (templates, checklists) | Credentials or environment-specific configuration |
| A README explaining how to use each component | Files from your `.claude/agent-memory/` |

13.3 Installing and Managing Plugins

Install from a URL

```bash
# Install from GitHub
/plugin install https://github.com/you/my-dev-toolkit

# Install from the marketplace
/plugin marketplace add my-dev-toolkit

# List installed plugins
/plugin list

# Update a plugin
/plugin update my-dev-toolkit

# Remove a plugin
/plugin remove my-dev-toolkit
```

Using installed plugin components

Plugin agents appear in the @ typeahead as plugin-name:agent-name:

```bash
@my-dev-toolkit:code-reviewer please review this file
```

Plugin skills appear as slash commands prefixed with the plugin name:

```bash
/my-dev-toolkit:pr-review
```
⚠️ Security review before installing. Plugin hooks run shell commands on your machine. Before installing any plugin — especially from unknown sources — read the manifest and all hook configurations. A malicious hook could exfiltrate files or execute arbitrary code.
Chapter 14

Best Practices

14.1 The QA Layer: Self-Auditing Systems

In production agentic systems, the QA Layer is a three-role cycle that runs automatically after each completed step. It's external to the creative process: it observes, measures, and proposes — but never modifies or makes decisions on behalf of the user.

Three QA roles

| Agent | Role | Can | Cannot |
|---|---|---|---|
| Auditor (`age-spe-auditor`) | Verifies rule compliance | Read files, report compliance | Modify anything, suggest improvements |
| Evaluator (`age-spe-evaluator`) | Scores phase quality | Calculate scores, write to `qa-report.md` | Modify entities, issue qualitative judgements |
| Optimizer (`age-spe-optimizer`) | Proposes improvements | Detect patterns, propose changes | Apply changes automatically |

Scoring rubric

The Evaluator scores each phase on four weighted dimensions:

| Dimension | Weight | What it measures |
|---|---|---|
| Completeness | 30% | Are all required elements present and fully formed? |
| Quality | 30% | Specificity and concreteness of the output |
| Compliance | 25% | Adherence to active rules |
| Efficiency | 15% | Number of iterations/regenerations needed |

Scores: Excellent (≥9.0) | Good (7.0-8.9) | Improvable (5.0-6.9) | Critical (<5.0)
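The weighted score follows directly from the rubric. A Python sketch using the weights and thresholds above (function names are illustrative):

```python
# Rubric weights from the table above; they sum to 1.0.
WEIGHTS = {"completeness": 0.30, "quality": 0.30, "compliance": 0.25, "efficiency": 0.15}

def phase_score(scores: dict) -> float:
    """Weighted average of the four dimension scores (each on a 0-10 scale)."""
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())

def verdict(score: float) -> str:
    """Map a numeric score to the rubric's named bands."""
    if score >= 9.0:
        return "Excellent"
    if score >= 7.0:
        return "Good"
    if score >= 5.0:
        return "Improvable"
    return "Critical"

s = phase_score({"completeness": 9, "quality": 8, "compliance": 7, "efficiency": 6})
print(round(s, 2), verdict(s))  # prints: 7.75 Good
```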

The QA cycle in practice

After each approved step, the cycle fires automatically:

  1. Auditor reads rules from disk and checks compliance → appends Audit Report to qa-report.md
  2. Evaluator scores the phase → appends Score block to qa-report.md
  3. At process close: Optimizer analyzes all audit/score blocks → proposes prioritized improvements

The qa-report.md file is append-only — it is never overwritten, creating a complete audit trail.
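The append-only discipline is easy to enforce in code: always open the report in append mode. A Python sketch (the helper name and block format are illustrative):

```python
import os
import tempfile
from datetime import datetime, timezone

def append_report(path: str, block_type: str, body: str) -> None:
    """Append a report block to the audit trail; the file is never overwritten."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as f:  # mode "a" appends, never truncates
        f.write(f"\n## {block_type} ({stamp})\n\n{body}\n")

report = os.path.join(tempfile.mkdtemp(), "qa-report.md")
append_report(report, "Audit Report", "All rules compliant.")
append_report(report, "Score", "Completeness 9 / Quality 8 / Compliance 7 / Efficiency 6")
print(open(report).read().count("##"))  # both blocks survive: prints 2
```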

Example: Auditor agent

```yaml
---
name: age-spe-auditor
description: Audits phase outputs for rule compliance by reading rules
  from disk at audit time. Use after each approved checkpoint to verify
  the output follows all active constraints.
model: opus
tools: Read, Grep, Glob
permissionMode: plan
---

You are a compliance auditor. Read each active rule file from
.claude/rules/ at audit time — never rely on cached versions.

For each rule, verify compliance and report:
- ✅ Compliant (with supporting evidence)
- ⚠️ Partially compliant (what's missing)
- ❌ Non-compliant (specific violation)

Append your report to qa-report.md. Never modify any other file.
```

Quality gates in CLAUDE.md

```markdown
## Quality Gate Rules
After each approved checkpoint:
1. Invoke @age-spe-auditor — reads rules from disk, appends audit to qa-report.md
2. Invoke @age-spe-evaluator — scores the phase, appends score block
3. If score < 5.0 (Critical): warn the user before proceeding
4. At process close: invoke @age-spe-optimizer for improvement proposals

Never skip the quality gate for outputs going to /output/ (published content).
```

Iterating on generated systems

Once a system is in production, you can evolve it without starting from scratch using three iteration modes:

| Mode | When | What happens |
|---|---|---|
| PATCH | Fix or update specific entities | Entity builder edits in place → patch version bump |
| REFACTOR | Reorganize architecture | Architecture designer produces delta blueprint → minor bump |
| EVOLVE | Add new capabilities | Mini-discovery → architecture → implementation → minor/major bump |

Each iteration creates a branch (e.g., iter/0.2.0-add-email-skill). When ready, merge to main and tag the version.


14.2 Multi-Model Strategies

Assigning the right model to each agent is one of the highest-leverage decisions in system design. The difference between Haiku and Opus is 10-20x in cost and 3-5x in latency — for the right task, both ends of the spectrum are correct.

| Model | Cost | Best agent types | Avoid for |
|---|---|---|---|
| Haiku 4.5 | Lowest | Classifiers, routers, format validators, simple extractors | Complex reasoning, nuanced writing, architectural decisions |
| Sonnet 4.6 | Medium | Writers, editors, code generators, reviewers, most implementation work | Tasks requiring deep multi-step reasoning across large codebases |
| Opus 4.6 | Highest | Architects, complex researchers, orchestrators for difficult decisions, QA auditors | High-volume routine tasks where cost compounds |

The content pipeline example, optimized

| Agent | Model | Reasoning |
|---|---|---|
| Classifier (routes requests) | Haiku | Binary classification — doesn't need reasoning depth |
| Researcher | Opus | Source evaluation requires deep judgment; errors compound downstream |
| Writer | Sonnet | Good writing within clear constraints; fast iteration |
| Reviewer | Sonnet | Checklist evaluation — structured, not creative |
| QA Auditor | Opus | Final gate before publication — highest stakes, justify the cost |
| Formatter | Haiku | Mechanical formatting — no judgment required |
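If you script your orchestration, the table above reduces to a lookup. A Python sketch (agent names and model aliases are placeholders, not official identifiers):

```python
# Illustrative mapping only: these agent names and model aliases are
# assumptions for the sketch, not Claude Code's actual identifiers.
MODEL_FOR_AGENT = {
    "classifier": "haiku",
    "researcher": "opus",
    "writer": "sonnet",
    "reviewer": "sonnet",
    "qa-auditor": "opus",
    "formatter": "haiku",
}

def model_for(agent: str, default: str = "sonnet") -> str:
    """Pick the cheapest model that still meets the agent's reasoning needs."""
    return MODEL_FOR_AGENT.get(agent, default)

print(model_for("classifier"), model_for("unknown-agent"))  # prints: haiku sonnet
```

Defaulting unknown agents to the mid-tier model is a deliberate choice: cheap enough to avoid surprise bills, capable enough to avoid surprise failures.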

14.3 The Golden Rules

These rules emerge consistently from production systems. Violate them early, and you'll spend hours debugging problems that didn't need to exist.

| # | Rule | What breaks when you ignore it |
|---|---|---|
| 1 | One responsibility per agent. If you can't name it in two words, it does too much. | Agents become unpredictable. Failures are hard to locate. Context windows fill with unrelated work. |
| 2 | Description is a routing rule, not a label. Write trigger conditions, not agent biography. | Wrong agents get invoked. Auto-delegation fails. You end up explicitly mentioning agents for everything. |
| 3 | CLAUDE.md stays lean. If it's workflow-specific, it's a Skill. | CLAUDE.md balloons. Context fills every session. Agents load irrelevant instructions. |
| 4 | Rules need alternatives. "Never X" → "Instead of X, do Y". | Agents know what not to do but not what to do instead. They find workarounds or fail silently. |
| 5 | Test the simplest version first. One agent, one task, end to end. Then add. | You build a 12-agent system, something breaks, and you can't tell where. |
| 6 | Commit your `.claude/` directory. Always. | A working system isn't reproducible. Colleagues can't run it. You can't roll back. |
| 7 | Explicit file paths in spawn prompts. Don't assume agents know where to look. | Context gets lost between stages. Agents read the wrong files or write to unexpected locations. |

14.4 Common Anti-Patterns

The God Agent

One agent that does research, writing, editing, formatting, and publishing. The description is a paragraph. The instructions are 4,000 words. It works sometimes and fails inconsistently.

Fix: Apply the decision tree from Chapter 4. Every distinct responsibility becomes its own agent.

The Prompt Novel

CLAUDE.md is 80KB of detailed instructions, edge cases, examples, and backstory. Every session loads all of it. Context is 40% consumed before the user types a word.

Fix: Apply the golden rule: if it's workflow-specific, it's a Skill. CLAUDE.md should be scannable in 30 seconds.

The Invisible Rule

A critical constraint is buried in paragraph 7 of a 12-paragraph system prompt. Agents read it once, follow it 60% of the time, and violate it silently the other 40%.

Fix: Rules go in a clearly labeled ## Rules section with bullet points. One rule per bullet. Put the most important rules first.

The Over-Engineered System

A 3-step workflow with 12 agents, 8 skills, 15 rules, and 6 hooks. Built in a weekend. Debugged for a month.

Fix: Start with the minimum that works. Add an agent or skill only when a specific, recurring problem requires it. Complexity is easy to add and hard to remove.

The Silent Context Loss

Agent B gets invoked after Agent A, but the orchestrator doesn't tell B what A produced. B starts from scratch, duplicates work, or makes different assumptions.

Fix: Establish a file handoff convention. Every agent writes its output to a predictable path. Every spawn prompt explicitly references that path.
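A handoff convention can be captured in a few lines so every spawn prompt names its paths explicitly. A Python sketch with hypothetical helper names:

```python
from pathlib import PurePosixPath

WORK = PurePosixPath("/work")

def handoff_path(stage: str, topic: str) -> str:
    """Predictable per-stage output path, e.g. /work/draft-ai-agents.md."""
    return str(WORK / f"{stage}-{topic}.md")

def spawn_prompt(stage: str, prev_stage: str, topic: str) -> str:
    """A spawn prompt that names both the input and output paths explicitly."""
    return (f"Read your input from {handoff_path(prev_stage, topic)}. "
            f"Write your output to {handoff_path(stage, topic)}.")

print(spawn_prompt("draft", "research", "ai-agents"))
```

Because the path is a pure function of stage and topic, Agent B can always find what Agent A produced without either of them guessing.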

The Vague Description

Two agents with overlapping descriptions: "handles writing tasks" and "creates content". Claude hesitates between them, picks arbitrarily, or asks you every time.

Fix: Each description must answer: in what specific situation should this agent be invoked? Front-load with trigger phrases. No overlap between agents.


14.5 Debugging Agentic Flows

When something goes wrong in a multi-agent flow, the debugging approach is systematic — not exploratory.

Step 1: Identify where in the flow it broke

Check the intermediate files. If /work/research-topic.md exists and looks right but /work/draft-topic.md is wrong, the problem is in the writer, not the researcher.

Step 2: Check what context the agent received

Add a temporary rule to CLAUDE.md:

```markdown
## Debugging Rule (temporary — remove after fixing)
Each agent, at the very start of its task, must output:
"CONTEXT RECEIVED: [paste first 200 characters of your spawn prompt here]"
```

Step 3: Use Plan Mode before executing

Run /plan before triggering the workflow. Claude will show its complete delegation plan — which agents it intends to invoke, with what context, in what order. Catch problems here, before any files are touched.

Step 4: Run /doctor

```bash
/doctor
```

Checks your Claude Code setup for common configuration problems: missing frontmatter, invalid tool names, unreachable MCP servers, malformed settings.json.

Step 5: Isolate and test a single agent

Invoke the suspect agent explicitly with a known-good input:

```bash
@writer I'm going to give you a test research brief. Read it carefully and write a draft.

Research: [paste brief directly here]

Write the draft to /work/test-draft.md
```

If the agent works correctly in isolation, the problem is context transfer. If it fails in isolation, the problem is the agent's instructions.

Logging agent outputs to files

Add a rule to CLAUDE.md that persists agent reasoning:

```markdown
## Logging Rules
Every agent must append a summary of what it did to /logs/agent-log.md:

Format:
---
[timestamp] @agent-name
Input received: [one line summary]
Action taken: [one line summary]
Output written to: [file path]
Issues encountered: [none / description]
---
```
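The same log format can be rendered programmatically if you generate spawn prompts from a script. A Python sketch (the helper is illustrative):

```python
from datetime import datetime, timezone

def log_entry(agent: str, input_summary: str, action: str,
              output_path: str, issues: str = "none") -> str:
    """Render one append-only log block matching the format above."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    return ("---\n"
            f"[{stamp}] @{agent}\n"
            f"Input received: {input_summary}\n"
            f"Action taken: {action}\n"
            f"Output written to: {output_path}\n"
            f"Issues encountered: {issues}\n"
            "---\n")

print(log_entry("writer", "research brief", "wrote first draft", "/work/draft-topic.md"))
```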

14.6 Non-Coding Use Cases

Claude Code is marketed as a coding tool, but the agent runtime it provides is general-purpose. The file system is your workspace; the agents are your specialists; the skills are your procedures. None of that requires code.

| Domain | Example system | Agents involved |
|---|---|---|
| Content production | Article pipeline: research → draft → review → publish | Researcher, writer, editor, SEO analyst, formatter |
| Market research | Competitive intelligence: monitor → analyze → report | Competitor tracker, analyst, report writer |
| Project management | Weekly digest: gather status → synthesize → distribute | Status collector, summarizer, distributor |
| HR / recruiting | CV screening: parse → score → shortlist → notify | CV parser, scorer, shortlister, email drafter |
| Finance | Expense review: categorize → flag anomalies → report | Classifier, anomaly detector, report generator |
| Legal / compliance | Contract review: extract clauses → check against policy → flag | Clause extractor, policy checker, risk reporter |
| Customer success | Feedback triage: classify → route → draft response | Classifier, router, response drafter |

The pattern is identical in every case: decompose the process into stages, assign one agent per stage, design the file handoffs, encode the orchestration in CLAUDE.md. The domain changes; the architecture doesn't.

Chapter 15

From Here

15.1 Iterating on Your Systems

Every system you build will need refinement. The patterns that help:

Version control your .claude/ directory

Commit after every meaningful change. Your agent files are configuration code — they deserve the same discipline as application code. A working system that can't be reproduced or rolled back isn't a system; it's luck.

```bash
git add .claude/ CLAUDE.md
git commit -m "refine: tighten researcher description to prevent overlap with writer"
```

Keep a changelog for your agent system

Add an AGENTS-CHANGELOG.md at your project root. Record what changed and why:

```markdown
# Agent System Changelog

## 2026-04-10
- researcher: added explicit instruction to flag conflicting sources
- writer: tightened sentence length rule (was 30 words, now 25)
- REASON: reviewer consistently flagged complex sentences; upstream fix is cleaner

## 2026-04-03
- Added qa-auditor agent as final gate before /output/
- REASON: two articles published with uncited claims; gate catches this now

## 2026-03-28
- Moved style guide from CLAUDE.md to .claude/knowledge/style-guide.md
- REASON: CLAUDE.md was 8KB; style guide is only relevant for writer agent
```

Start simple, measure, improve

The right sequence: one agent working correctly → two agents with file handoff → add reviewer → add quality gate → add hooks. Each addition should solve a concrete, observed problem — not a theoretical one.


15.2 Team Adoption

Sharing via Git

Because your system is files, sharing is a pull request. Commit your .claude/ directory and CLAUDE.md. Teammates clone the repo and open it in their Code tab — they immediately have the same agents, skills, and rules.

Project vs. user scope for teams

  • Project scope (.claude/agents/) — agents specific to this codebase. In Git. Everyone on the team gets them.
  • User scope (~/.claude/agents/) — personal productivity agents. Not in Git. Yours alone.

Onboarding new team members

Add a section to your project's README:

```markdown
## AI-Assisted Workflows

This project includes Claude Code agent configurations in `.claude/`.

To use them:
1. Install Claude Desktop and enable the Code tab
2. Open this project folder in the Code tab
3. The agents and skills are auto-discovered

Available agents:
- @code-reviewer — review any file or diff
- @doc-writer — generate documentation from code
- @test-writer — generate tests from implementation

Type /help inside Claude Code to see all available skills.
```

Managed subagents (organization-level)

Enterprise Claude accounts can configure organization-level agents that are available to all users without needing to be in each project's .claude/ directory. Contact your account manager or check code.claude.com/docs for current managed agent capabilities.


15.3 Resources

| Resource | What you'll find there |
|---|---|
| code.claude.com/docs | Official Claude Code documentation — authoritative reference for features, commands, and configuration |
| docs.anthropic.com | Anthropic's full documentation including API reference, model specs, and prompt engineering guides |
| AiAgentArchitect | An open-source framework for automated agentic system design — generates `.claude/` configurations from workflow descriptions. The architectural concepts in this guide draw from its approach. |
| Claude Code GitHub discussions | Community patterns, tips, and troubleshooting from practitioners building real systems |

15.4 The Mindset Shift

This guide began with a simple observation: most people use AI at Level 1 or 2. They ask questions and get answers. They write prompts and get responses. They're good at it — but they're still in conversation mode.

The shift this guide has been building toward is architectural. You're no longer asking "what should I prompt?" You're asking "what system should I design?" The AI isn't your counterpart in a conversation — it's your workforce, structured and deployed.

The architect's role, as you've seen across these chapters, is:

  • Decompose — break any process into stages with clear responsibilities
  • Delegate — assign each stage to the right specialist with the right tools
  • Review — build quality gates, not just execution paths

The systems you build will outlast the conversations that inspired them. A well-designed agent, committed to Git, onboarded to your team, running reliably in production — that's a different category of work than a good prompt.

Start with the minimum working system. Commit it. Run it. Observe what breaks. Fix exactly that. Repeat.

What's next?

Start building your agentic system

AiAgentArchitect transforms a single conversation into a complete, deployable multi-agent system. Explore the framework or get in touch.