Building Agentic Flows with Claude Code

Part 4 — Advanced

Chapters 12-15 · MCP servers, plugins, best practices, and scaling.

Chapter 12

MCP Servers

12.1 What MCP Is

The Model Context Protocol (MCP) is an open standard for connecting AI agents to external tools and services. Claude Code is an MCP client — it can call tools from any MCP server, giving your agents access to systems far beyond the local file system.

What this unlocks in practice:

  • A researcher agent that queries your company's database directly
  • A reviewer agent that posts GitHub review comments automatically
  • An ops agent that reads from and writes to your CRM
  • A pipeline that pushes published articles to your CMS
  • Any workflow that needs to touch an external API or data source

MCP servers are separate processes that expose tools via a standard interface. Claude Code discovers them from your configuration and makes their tools available just like built-in tools (Read, Write, Bash).


12.2 Adding MCP Servers

Via CLI (one-time setup)

```bash
# Add a remote HTTP MCP server
claude mcp add --transport http my-api https://api.mycompany.com/mcp

# Add a local stdio MCP server (runs as a subprocess)
claude mcp add --transport stdio github npx -y @modelcontextprotocol/server-github

# List configured servers
claude mcp list

# Remove a server
claude mcp remove github
```

Via settings.json (project-persistent)

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    },
    "supabase": {
      "command": "npx",
      "args": ["-y", "@supabase/mcp-server-supabase"],
      "env": {
        "SUPABASE_URL": "${SUPABASE_URL}",
        "SUPABASE_SERVICE_ROLE_KEY": "${SUPABASE_KEY}"
      }
    }
  }
}
```
💡 Use environment variable references (`${VAR_NAME}`) for credentials in settings.json — never hardcode secrets. Claude Code resolves them from your shell environment.
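The substitution itself is simple string templating. A minimal Python sketch of how `${VAR_NAME}` resolution could work (illustrative only, not Claude Code's actual implementation):

```python
import json
import os
import re

def resolve_env_refs(text: str) -> str:
    """Replace ${VAR_NAME} placeholders with values from the environment."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), text)

os.environ["GITHUB_TOKEN"] = "ghp_example"  # stand-in credential for the demo
raw = '{"env": {"GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"}}'
config = json.loads(resolve_env_refs(raw))
print(config["env"]["GITHUB_PERSONAL_ACCESS_TOKEN"])  # prints ghp_example
```

Resolving before parsing the JSON means the secret never needs to live in the file itself.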

12.3 MCP in Agentic Flows

Giving agents MCP tool access

Add MCP tools to an agent's tools whitelist using the format mcp__server-name__tool-name:

```yaml
---
name: github-reviewer
description: Reviews GitHub pull requests, posts comments, and updates PR
  status. Use when asked to review a PR or check open pull requests.
model: sonnet
tools: Read, mcp__github__list_pull_requests, mcp__github__get_pull_request,
       mcp__github__create_review, mcp__github__add_pull_request_review_comment
permissionMode: default
---

You are a GitHub PR reviewer. You review code changes and post structured
feedback directly to the pull request.

When invoked with a PR number:
1. Fetch the PR details and diff
2. Review for: correctness, security, test coverage, style
3. Post inline comments for specific issues
4. Submit a review with verdict: Approve / Request Changes / Comment
```
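The `mcp__server-name__tool-name` convention is mechanical enough to parse. A small Python sketch (the helper is hypothetical, for illustration):

```python
def parse_mcp_tool(name: str):
    """Split 'mcp__<server>__<tool>' into (server, tool); None if not an MCP tool."""
    parts = name.split("__", 2)
    if len(parts) != 3 or parts[0] != "mcp":
        return None
    return parts[1], parts[2]

print(parse_mcp_tool("mcp__github__create_review"))  # ('github', 'create_review')
print(parse_mcp_tool("Read"))                        # None (built-in tool)
```

The tool segment may itself contain underscores (as in `add_pull_request_review_comment`), which is why the split is capped at two separators.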

Skill + MCP combination

Skills work well as MCP orchestrators — they define the workflow, agents provide the expertise:

```yaml
---
name: weekly-metrics
description: Pulls this week's key metrics from our database and generates
  an executive summary. Run every Monday. Invoke with /weekly-metrics.
user-invocable: true
allowed-tools: Read, Write, mcp__supabase__execute_sql
---

# Weekly Metrics Skill

Generate the weekly metrics report from our Supabase database.

## Data Queries
Run these queries and save results to /work/metrics-raw.json:

1. New signups this week:
   SELECT COUNT(*) FROM users WHERE created_at >= NOW() - INTERVAL '7 days'

2. Active users (any event in 7 days):
   SELECT COUNT(DISTINCT user_id) FROM events
   WHERE timestamp >= NOW() - INTERVAL '7 days'

3. Revenue this week:
   SELECT SUM(amount) FROM transactions
   WHERE created_at >= NOW() - INTERVAL '7 days' AND status = 'completed'

## After collecting data
Invoke @analyst agent to generate the executive summary from the raw data.
Save the summary to /output/weekly-metrics-{date}.md
```

12.4 Real Integration Examples

GitHub — code review workflow

MCP setup:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}" }
    }
  }
}
```

Usage:

"Review all open PRs in the anthropic/claude-code repository
that are more than 2 days old and haven't been reviewed yet."

Supabase — data analysis agent

```yaml
---
name: data-analyst
description: Queries the Supabase database and produces structured analysis.
  Use for any reporting, metrics, or data investigation task.
model: opus
tools: Read, Write, mcp__supabase__execute_sql, mcp__supabase__list_tables
permissionMode: default
---

You are a data analyst with direct database access.

Before writing any query:
1. List available tables to understand the schema
2. Write safe, read-only SELECT queries
3. Never use DELETE, UPDATE, DROP, or INSERT without explicit user permission

Format your output as: findings in prose + supporting tables + raw SQL used
```

Google Drive — document pipeline

```yaml
---
name: doc-publisher
description: Exports finished articles to Google Drive in the Publications
  folder. Use after /format-article completes and the article is approved.
model: haiku
tools: Read, mcp__gdrive__create_file, mcp__gdrive__move_file
permissionMode: default
---

When given an article path:
1. Read the article from /output/
2. Create a new Google Doc in the "Publications/Drafts" folder
3. Report the document URL to the user
```

Chapter 13

Plugins

13.1 What Plugins Are

A plugin is a portable bundle of agents, skills, hooks, and commands that can be installed once and used across all your projects. Where a project's .claude/ directory is local to one codebase, a plugin is global — available everywhere.

Two reasons to create a plugin:

  • Personal reuse: You've built something excellent (a code reviewer, a weekly report generator, a research assistant) and you want it in every project without copying files.
  • Distribution: You want to share your system with your team, open-source it, or publish it to the Claude plugin marketplace.

13.2 Creating a Plugin

Directory structure

```bash
my-plugin/
├── .claude-plugin/
│   └── marketplace.json          ← plugin manifest (required)
├── agents/
│   ├── code-reviewer.md
│   └── doc-writer.md
├── skills/
│   └── pr-review/
│       └── SKILL.md
└── hooks/
    └── auto-lint.json
```

The manifest file

```json
{
  "name": "my-dev-toolkit",
  "version": "1.2.0",
  "description": "Code review, documentation, and PR workflow tools for development teams",
  "author": "Your Name",
  "license": "MIT",
  "repository": "https://github.com/you/my-dev-toolkit",
  "components": {
    "agents": ["agents/code-reviewer.md", "agents/doc-writer.md"],
    "skills": ["skills/pr-review/SKILL.md"],
    "hooks": ["hooks/auto-lint.json"]
  },
  "permissions": {
    "tools": ["Read", "Grep", "Bash"],
    "network": false
  }
}
```

What to include vs. exclude

| Include | Exclude |
|---|---|
| Agents and skills with broad applicability | Project-specific agents (they reference your codebase structure) |
| Hooks for general automation (lint, notify) | Hardcoded file paths or project-specific rules |
| Supporting files (templates, checklists) | Credentials or environment-specific configuration |
| A README explaining how to use each component | Files from your `.claude/agent-memory/` |

13.3 Installing and Managing Plugins

Install from a URL

```bash
# Install from GitHub
/plugin install https://github.com/you/my-dev-toolkit

# Install from the marketplace
/plugin marketplace add my-dev-toolkit

# List installed plugins
/plugin list

# Update a plugin
/plugin update my-dev-toolkit

# Remove a plugin
/plugin remove my-dev-toolkit
```

Using installed plugin components

Plugin agents appear in the @ typeahead as plugin-name:agent-name:

```bash
@my-dev-toolkit:code-reviewer please review this file
```

Plugin skills appear as slash commands prefixed with the plugin name:

```bash
/my-dev-toolkit:pr-review
```
⚠️ Security review before installing. Plugin hooks run shell commands on your machine. Before installing any plugin — especially from unknown sources — read the manifest and all hook configurations. A malicious hook could exfiltrate files or execute arbitrary code.
Chapter 14

Best Practices

14.1 The QA Layer: Self-Auditing Systems

In production agentic systems, the QA Layer is a three-role cycle that runs automatically after each completed step. It's external to the creative process: it observes, measures, and proposes — but never modifies or makes decisions on behalf of the user.

Three QA roles

| Agent | Role | Can | Cannot |
|---|---|---|---|
| Auditor (`age-spe-auditor`) | Verifies rule compliance | Read files, report compliance | Modify anything, suggest improvements |
| Evaluator (`age-spe-evaluator`) | Scores phase quality | Calculate scores, write to `qa-report.md` | Modify entities, issue qualitative judgements |
| Optimizer (`age-spe-optimizer`) | Proposes improvements | Detect patterns, propose changes | Apply changes automatically |

Scoring rubric

The Evaluator scores each phase on four weighted dimensions:

| Dimension | Weight | What it measures |
|---|---|---|
| Completeness | 30% | Are all required elements present and fully formed? |
| Quality | 30% | Specificity and concreteness of the output |
| Compliance | 25% | Adherence to active rules |
| Efficiency | 15% | Number of iterations/regenerations needed |

Scores: Excellent (≥9.0) | Good (7.0-8.9) | Improvable (5.0-6.9) | Critical (<5.0)
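The weighted score follows directly from the rubric. A Python sketch using the weights and thresholds above (function names are illustrative):

```python
# Rubric weights from the table above; they sum to 1.0.
WEIGHTS = {"completeness": 0.30, "quality": 0.30, "compliance": 0.25, "efficiency": 0.15}

def phase_score(scores: dict) -> float:
    """Weighted average of the four dimension scores (each on a 0-10 scale)."""
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())

def verdict(score: float) -> str:
    """Map a numeric score to the rubric's named bands."""
    if score >= 9.0:
        return "Excellent"
    if score >= 7.0:
        return "Good"
    if score >= 5.0:
        return "Improvable"
    return "Critical"

s = phase_score({"completeness": 9, "quality": 8, "compliance": 7, "efficiency": 6})
print(round(s, 2), verdict(s))  # prints: 7.75 Good
```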

The QA cycle in practice

After each approved step, the cycle fires automatically:

  1. Auditor reads rules from disk and checks compliance → appends Audit Report to qa-report.md
  2. Evaluator scores the phase → appends Score block to qa-report.md
  3. At process close: Optimizer analyzes all audit/score blocks → proposes prioritized improvements

The qa-report.md file is append-only — it is never overwritten, creating a complete audit trail.
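The append-only discipline is easy to enforce in code: always open the report in append mode. A Python sketch (the helper name and block format are illustrative):

```python
import os
import tempfile
from datetime import datetime, timezone

def append_report(path: str, block_type: str, body: str) -> None:
    """Append a report block to the audit trail; the file is never overwritten."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as f:  # mode "a" appends, never truncates
        f.write(f"\n## {block_type} ({stamp})\n\n{body}\n")

report = os.path.join(tempfile.mkdtemp(), "qa-report.md")
append_report(report, "Audit Report", "All rules compliant.")
append_report(report, "Score", "Completeness 9 / Quality 8 / Compliance 7 / Efficiency 6")
print(open(report).read().count("##"))  # both blocks survive: prints 2
```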

Example: Auditor agent

```yaml
---
name: age-spe-auditor
description: Audits phase outputs for rule compliance by reading rules
  from disk at audit time. Use after each approved checkpoint to verify
  the output follows all active constraints.
model: opus
tools: Read, Grep, Glob
permissionMode: plan
---

You are a compliance auditor. Read each active rule file from
.claude/rules/ at audit time — never rely on cached versions.

For each rule, verify compliance and report:
- ✅ Compliant (with supporting evidence)
- ⚠️ Partially compliant (what's missing)
- ❌ Non-compliant (specific violation)

Append your report to qa-report.md. Never modify any other file.
```

Quality gates in CLAUDE.md

```markdown
## Quality Gate Rules
After each approved checkpoint:
1. Invoke @age-spe-auditor — reads rules from disk, appends audit to qa-report.md
2. Invoke @age-spe-evaluator — scores the phase, appends score block
3. If score < 5.0 (Critical): warn the user before proceeding
4. At process close: invoke @age-spe-optimizer for improvement proposals

Never skip the quality gate for outputs going to /output/ (published content).
```

Iterating on generated systems

Once a system is in production, you can evolve it without starting from scratch using three iteration modes:

| Mode | When | What happens |
|---|---|---|
| PATCH | Fix or update specific entities | Entity builder edits in place → patch version bump |
| REFACTOR | Reorganize architecture | Architecture designer produces delta blueprint → minor bump |
| EVOLVE | Add new capabilities | Mini-discovery → architecture → implementation → minor/major bump |

Each iteration creates a branch (e.g., iter/0.2.0-add-email-skill). When ready, merge to main and tag the version.


14.2 Multi-Model Strategies

Assigning the right model to each agent is one of the highest-leverage decisions in system design. The difference between Haiku and Opus is 10-20x in cost and 3-5x in latency — for the right task, both ends of the spectrum are correct.

| Model | Cost | Best agent types | Avoid for |
|---|---|---|---|
| Haiku 4.5 | Lowest | Classifiers, routers, format validators, simple extractors | Complex reasoning, nuanced writing, architectural decisions |
| Sonnet 4.6 | Medium | Writers, editors, code generators, reviewers, most implementation work | Tasks requiring deep multi-step reasoning across large codebases |
| Opus 4.6 | Highest | Architects, complex researchers, orchestrators for difficult decisions, QA auditors | High-volume routine tasks where cost compounds |

The content pipeline example, optimized

| Agent | Model | Reasoning |
|---|---|---|
| Classifier (routes requests) | Haiku | Binary classification — doesn't need reasoning depth |
| Researcher | Opus | Source evaluation requires deep judgment; errors compound downstream |
| Writer | Sonnet | Good writing within clear constraints; fast iteration |
| Reviewer | Sonnet | Checklist evaluation — structured, not creative |
| QA Auditor | Opus | Final gate before publication — highest stakes, justify the cost |
| Formatter | Haiku | Mechanical formatting — no judgment required |
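If you script your orchestration, the table above reduces to a lookup. A Python sketch (agent names and model aliases are placeholders, not official identifiers):

```python
# Illustrative mapping only: these agent names and model aliases are
# assumptions for the sketch, not Claude Code's actual identifiers.
MODEL_FOR_AGENT = {
    "classifier": "haiku",
    "researcher": "opus",
    "writer": "sonnet",
    "reviewer": "sonnet",
    "qa-auditor": "opus",
    "formatter": "haiku",
}

def model_for(agent: str, default: str = "sonnet") -> str:
    """Pick the cheapest model that still meets the agent's reasoning needs."""
    return MODEL_FOR_AGENT.get(agent, default)

print(model_for("classifier"), model_for("unknown-agent"))  # prints: haiku sonnet
```

Defaulting unknown agents to the mid-tier model is a deliberate choice: cheap enough to avoid surprise bills, capable enough to avoid surprise failures.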

14.3 The Golden Rules

These rules emerge consistently from production systems. Violate them early, and you'll spend hours debugging problems that didn't need to exist.

| # | Rule | What breaks when you ignore it |
|---|---|---|
| 1 | One responsibility per agent. If you can't name it in two words, it does too much. | Agents become unpredictable. Failures are hard to locate. Context windows fill with unrelated work. |
| 2 | Description is a routing rule, not a label. Write trigger conditions, not agent biography. | Wrong agents get invoked. Auto-delegation fails. You end up explicitly mentioning agents for everything. |
| 3 | CLAUDE.md stays lean. If it's workflow-specific, it's a Skill. | CLAUDE.md balloons. Context fills every session. Agents load irrelevant instructions. |
| 4 | Rules need alternatives. "Never X" → "Instead of X, do Y". | Agents know what not to do but not what to do instead. They find workarounds or fail silently. |
| 5 | Test the simplest version first. One agent, one task, end to end. Then add. | You build a 12-agent system, something breaks, and you can't tell where. |
| 6 | Commit your `.claude/` directory. Always. | A working system isn't reproducible. Colleagues can't run it. You can't roll back. |
| 7 | Explicit file paths in spawn prompts. Don't assume agents know where to look. | Context gets lost between stages. Agents read the wrong files or write to unexpected locations. |

14.4 Common Anti-Patterns

The God Agent

One agent that does research, writing, editing, formatting, and publishing. The description is a paragraph. The instructions are 4,000 words. It works sometimes and fails inconsistently.

Fix: Apply the decision tree from Chapter 4. Every distinct responsibility becomes its own agent.

The Prompt Novel

CLAUDE.md is 80KB of detailed instructions, edge cases, examples, and backstory. Every session loads all of it. Context is 40% consumed before the user types a word.

Fix: Apply the golden rule: if it's workflow-specific, it's a Skill. CLAUDE.md should be scannable in 30 seconds.

The Invisible Rule

A critical constraint is buried in paragraph 7 of a 12-paragraph system prompt. Agents read it once, follow it 60% of the time, and violate it silently the other 40%.

Fix: Rules go in a clearly labeled ## Rules section with bullet points. One rule per bullet. Put the most important rules first.

The Over-Engineered System

A 3-step workflow with 12 agents, 8 skills, 15 rules, and 6 hooks. Built in a weekend. Debugged for a month.

Fix: Start with the minimum that works. Add an agent or skill only when a specific, recurring problem requires it. Complexity is easy to add and hard to remove.

The Silent Context Loss

Agent B gets invoked after Agent A, but the orchestrator doesn't tell B what A produced. B starts from scratch, duplicates work, or makes different assumptions.

Fix: Establish a file handoff convention. Every agent writes its output to a predictable path. Every spawn prompt explicitly references that path.
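A handoff convention can be captured in a few lines so every spawn prompt names its paths explicitly. A Python sketch with hypothetical helper names:

```python
from pathlib import PurePosixPath

WORK = PurePosixPath("/work")

def handoff_path(stage: str, topic: str) -> str:
    """Predictable per-stage output path, e.g. /work/draft-ai-agents.md."""
    return str(WORK / f"{stage}-{topic}.md")

def spawn_prompt(stage: str, prev_stage: str, topic: str) -> str:
    """A spawn prompt that names both the input and output paths explicitly."""
    return (f"Read your input from {handoff_path(prev_stage, topic)}. "
            f"Write your output to {handoff_path(stage, topic)}.")

print(spawn_prompt("draft", "research", "ai-agents"))
```

Because the path is a pure function of stage and topic, Agent B can always find what Agent A produced without either of them guessing.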

The Vague Description

Two agents with overlapping descriptions: "handles writing tasks" and "creates content". Claude hesitates between them, picks arbitrarily, or asks you every time.

Fix: Each description must answer: in what specific situation should this agent be invoked? Front-load with trigger phrases. No overlap between agents.


14.5 Debugging Agentic Flows

When something goes wrong in a multi-agent flow, the debugging approach is systematic — not exploratory.

Step 1: Identify where in the flow it broke

Check the intermediate files. If /work/research-topic.md exists and looks right but /work/draft-topic.md is wrong, the problem is in the writer, not the researcher.

Step 2: Check what context the agent received

Add a temporary rule to CLAUDE.md:

```markdown
## Debugging Rule (temporary — remove after fixing)
Each agent, at the very start of its task, must output:
"CONTEXT RECEIVED: [paste first 200 characters of your spawn prompt here]"
```

Step 3: Use Plan Mode before executing

Run /plan before triggering the workflow. Claude will show its complete delegation plan — which agents it intends to invoke, with what context, in what order. Catch problems here, before any files are touched.

Step 4: Run /doctor

```bash
/doctor
```

Checks your Claude Code setup for common configuration problems: missing frontmatter, invalid tool names, unreachable MCP servers, malformed settings.json.

Step 5: Isolate and test a single agent

Invoke the suspect agent explicitly with a known-good input:

```bash
@writer I'm going to give you a test research brief. Read it carefully and write a draft.

Research: [paste brief directly here]

Write the draft to /work/test-draft.md
```

If the agent works correctly in isolation, the problem is context transfer. If it fails in isolation, the problem is the agent's instructions.

Logging agent outputs to files

Add a rule to CLAUDE.md that persists agent reasoning:

```markdown
## Logging Rules
Every agent must append a summary of what it did to /logs/agent-log.md:

Format:
---
[timestamp] @agent-name
Input received: [one line summary]
Action taken: [one line summary]
Output written to: [file path]
Issues encountered: [none / description]
---
```
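The same log format can be rendered programmatically if you generate spawn prompts from a script. A Python sketch (the helper is illustrative):

```python
from datetime import datetime, timezone

def log_entry(agent: str, input_summary: str, action: str,
              output_path: str, issues: str = "none") -> str:
    """Render one append-only log block matching the format above."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    return ("---\n"
            f"[{stamp}] @{agent}\n"
            f"Input received: {input_summary}\n"
            f"Action taken: {action}\n"
            f"Output written to: {output_path}\n"
            f"Issues encountered: {issues}\n"
            "---\n")

print(log_entry("writer", "research brief", "wrote first draft", "/work/draft-topic.md"))
```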

14.6 Non-Coding Use Cases

Claude Code is marketed as a coding tool, but the agent runtime it provides is general-purpose. The file system is your workspace; the agents are your specialists; the skills are your procedures. None of that requires code.

| Domain | Example system | Agents involved |
|---|---|---|
| Content production | Article pipeline: research → draft → review → publish | Researcher, writer, editor, SEO analyst, formatter |
| Market research | Competitive intelligence: monitor → analyze → report | Competitor tracker, analyst, report writer |
| Project management | Weekly digest: gather status → synthesize → distribute | Status collector, summarizer, distributor |
| HR / recruiting | CV screening: parse → score → shortlist → notify | CV parser, scorer, shortlister, email drafter |
| Finance | Expense review: categorize → flag anomalies → report | Classifier, anomaly detector, report generator |
| Legal / compliance | Contract review: extract clauses → check against policy → flag | Clause extractor, policy checker, risk reporter |
| Customer success | Feedback triage: classify → route → draft response | Classifier, router, response drafter |

The pattern is identical in every case: decompose the process into stages, assign one agent per stage, design the file handoffs, encode the orchestration in CLAUDE.md. The domain changes; the architecture doesn't.

Chapter 15

From Here

15.1 Iterating on Your Systems

Every system you build will need refinement. The patterns that help:

Version control your .claude/ directory

Commit after every meaningful change. Your agent files are configuration code — they deserve the same discipline as application code. A working system that can't be reproduced or rolled back isn't a system; it's luck.

```bash
git add .claude/ CLAUDE.md
git commit -m "refine: tighten researcher description to prevent overlap with writer"
```

Keep a changelog for your agent system

Add an AGENTS-CHANGELOG.md at your project root. Record what changed and why:

```markdown
# Agent System Changelog

## 2026-04-10
- researcher: added explicit instruction to flag conflicting sources
- writer: tightened sentence length rule (was 30 words, now 25)
- REASON: reviewer consistently flagged complex sentences; upstream fix is cleaner

## 2026-04-03
- Added qa-auditor agent as final gate before /output/
- REASON: two articles published with uncited claims; gate catches this now

## 2026-03-28
- Moved style guide from CLAUDE.md to .claude/knowledge/style-guide.md
- REASON: CLAUDE.md was 8KB; style guide is only relevant for writer agent
```

Start simple, measure, improve

The right sequence: one agent working correctly → two agents with file handoff → add reviewer → add quality gate → add hooks. Each addition should solve a concrete, observed problem — not a theoretical one.


15.2 Team Adoption

Sharing via Git

Because your system is files, sharing is a pull request. Commit your .claude/ directory and CLAUDE.md. Teammates clone the repo and open it in their Code tab — they immediately have the same agents, skills, and rules.

Project vs. user scope for teams

  • Project scope (.claude/agents/) — agents specific to this codebase. In Git. Everyone on the team gets them.
  • User scope (~/.claude/agents/) — personal productivity agents. Not in Git. Yours alone.

Onboarding new team members

Add a section to your project's README:

```markdown
## AI-Assisted Workflows

This project includes Claude Code agent configurations in `.claude/`.

To use them:
1. Install Claude Desktop and enable the Code tab
2. Open this project folder in the Code tab
3. The agents and skills are auto-discovered

Available agents:
- @code-reviewer — review any file or diff
- @doc-writer — generate documentation from code
- @test-writer — generate tests from implementation

Type /help inside Claude Code to see all available skills.
```

Managed subagents (organization-level)

Enterprise Claude accounts can configure organization-level agents that are available to all users without needing to be in each project's .claude/ directory. Contact your account manager or check code.claude.com/docs for current managed agent capabilities.


15.3 Resources

| Resource | What you'll find there |
|---|---|
| code.claude.com/docs | Official Claude Code documentation — authoritative reference for features, commands, and configuration |
| docs.anthropic.com | Anthropic's full documentation including API reference, model specs, and prompt engineering guides |
| AiAgentArchitect | An open-source framework for automated agentic system design — generates `.claude/` configurations from workflow descriptions. The architectural concepts in this guide draw from its approach. |
| Claude Code GitHub discussions | Community patterns, tips, and troubleshooting from practitioners building real systems |

15.4 The Mindset Shift

This guide began with a simple observation: most people use AI at Level 1 or 2. They ask questions and get answers. They write prompts and get responses. They're good at it — but they're still in conversation mode.

The shift this guide has been building toward is architectural. You're no longer asking "what should I prompt?" You're asking "what system should I design?" The AI isn't your counterpart in a conversation — it's your workforce, structured and deployed.

The architect's role, as you've seen across these chapters, is:

  • Decompose — break any process into stages with clear responsibilities
  • Delegate — assign each stage to the right specialist with the right tools
  • Review — build quality gates, not just execution paths

The systems you build will outlast the conversations that inspired them. A well-designed agent, committed to Git, onboarded to your team, running reliably in production — that's a different category of work than a good prompt.

Start with the minimum working system. Commit it. Run it. Observe what breaks. Fix exactly that. Repeat.

What's next?

Start building your agentic system

AiAgentArchitect transforms a single conversation into a complete, deployable multi-agent system. Explore the framework or get in touch.