How the 8 agents collaborate: dispatch pattern, batch strategy, intermediate files, context injection, and normalization.
The main agent (controlled by skill.md) acts as a dispatcher. It reads the phase definition, determines which agent is needed, and launches it as a sub-agent via Claude Code's Task tool. A sub-agent is an isolated Claude instance, started via the Task tool, that receives its own prompt and works in a limited context: only the files and data it needs.
# Dispatch flow (pseudocode)
def dispatch_agent(agent_name, batch_data):
    # 1. Load prompt template
    template = read(f"agents/{agent_name}.md")

    # 2. Inject context (str.replace takes one pair at a time,
    #    so the placeholders are substituted in a loop)
    context = {
        "{{FILES}}": batch_data.files,
        "{{README}}": project.readme,
        "{{IMPORT_MAP}}": project.import_map,
        "{{LANG_ADDENDA}}": get_addenda(batch_data),
    }
    prompt = template
    for placeholder, value in context.items():
        prompt = prompt.replace(placeholder, value)

    # 3. Start sub-agent (Task tool)
    result = task(
        prompt=prompt,
        description=f"{agent_name} batch {batch_data.id}",
    )

    # 4. Parse result as JSON
    return parse_json(result)

# Dispatch for Phase 2: file-analyzer x5
batches = split(inventory, size=25)
results = parallel(
    dispatch_agent("file-analyzer", b) for b in batches
)
Step 1: The prompt template is loaded from agents/file-analyzer.md. It contains instructions for which nodes and edges the agent should extract.
Step 2: Placeholders in the template are replaced with real data: the batch's files, README, import map, and language-specific hints.
Step 3: Claude Code starts a new Claude instance (Task tool) with the prepared prompt. The sub-agent works in isolation.
Step 4: The result (JSON with nodes + edges) is parsed and written to the intermediate file.
The dispatch pattern isolates each agent from the overall context. A file-analyzer only sees its 25 files, not the entire codebase. This saves context tokens and enables parallelization.
The codebase is split into batches because a single Claude context cannot analyze hundreds of files simultaneously. The batch strategy optimizes the trade-off between context quality and parallelization:
# Batch configuration in skill.md
batch_config:
  target_size: 25       # files per batch
  min_size: 10          # minimum (after retry split)
  max_parallel: 5       # simultaneous sub-agents
  sort_by: "directory"  # related files in same batch

import_map_injection:
  enabled: true
  scope: "cross_batch"  # imports of ALL files, not just batch
  purpose: "Enables cross-batch edges"
Batch sorting: Files are sorted by directory so related modules (e.g., all files in src/auth/) end up in the same batch. This improves edge detection quality.
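A minimal sketch of this sorting-plus-chunking step (the helper is illustrative, not the skill's actual code; target_size matches the config above):

# Directory-sorted batching (sketch)
import os

def build_batches(files, target_size=25):
    # Sort by directory so sibling modules (e.g. all of src/auth/) stay together
    ordered = sorted(files, key=lambda f: (os.path.dirname(f), f))
    # Chunk the sorted list into batches of target_size
    return [ordered[i:i + target_size]
            for i in range(0, len(ordered), target_size)]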
Import map injection: The key element. The import map contains ALL import relationships across the entire codebase. Each batch receives this map so files in different batches can still produce correct edges to each other.
Without import map injection, cross-batch relationships would be lost. File A in Batch 1 imports File B in Batch 3 — without the map, Batch 1 wouldn't know B exists. The map solves this elegantly.
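The schema of import-map.json isn't reproduced above; one plausible shape that would support this lookup (field names here are assumptions):

# import-map.json: plausible shape (hypothetical field names)
{
  "src/auth/login.py": {
    "imports": ["src/core/session.py", "src/db/users.py"],
    "imported_by": ["src/api/routes.py"]
  },
  "src/core/session.py": {
    "imports": [],
    "imported_by": ["src/auth/login.py"]
  }
}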
Agents don't communicate directly — they communicate through the file system. Each agent writes its results as JSON to the intermediate/ folder. The next agent reads these files as input.
# Intermediate folder structure
.claude-learning/
  intermediate/
    manifest.json          # P1: project-scanner
    import-map.json        # P1: project-scanner
    dir-tree.txt           # P1: project-scanner
    batch-1.json           # P2: file-analyzer
    batch-2.json           # P2: file-analyzer
    batch-3.json           # P2: file-analyzer
    assembled-graph.json   # P3: assemble-reviewer
    architecture.json      # P4: architecture-analyzer
    tour.json              # P5: tour-builder
    review-log.json        # P6: review-validator
# Data flow:
# P1 output → P2 input (manifest, import-map)
# P2 output → P3 input (batch-*.json)
# P3 output → P4 input (assembled-graph.json)
# P4 output → P5 input (architecture.json)
# P5 output → P6 input (tour.json)
Why files instead of context? Claude sub-agents have isolated contexts. They cannot share variables. The only way to transfer data between agents is the file system.
Advantage: Intermediate files are inspectable. When something goes wrong, you can open the JSON files and see what each agent produced. This makes debugging dramatically easier.
Cleanup: After P7 (Save), intermediate files are optionally deleted. The final Knowledge Graph contains all relevant data.
This pattern is inspired by classic Unix pipelines: each process reads stdin, processes, writes stdout. Here it's JSON files instead of streams, but the principle is identical — loose coupling through defined interfaces.
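In code terms, the handoff between two phases is just a JSON write followed by a JSON read. A minimal sketch, assuming the folder layout above (the function names are illustrative):

# File-based handoff between P2 and P3 (sketch)
import json
from pathlib import Path

INTERMEDIATE = Path(".claude-learning/intermediate")

def write_batch_result(batch_id, result):
    # P2: each file-analyzer result lands in its own batch file
    path = INTERMEDIATE / f"batch-{batch_id}.json"
    path.write_text(json.dumps(result, indent=2))

def read_batch_results():
    # P3: the assemble-reviewer picks up every batch file as input
    return [json.loads(p.read_text())
            for p in sorted(INTERMEDIATE.glob("batch-*.json"))]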
Each agent receives shared context alongside its specific data, injected into its prompt template. Three project-wide files form the "shared context" (README, directory tree, import map); a fourth injection, the language addenda, is selected per batch:
# agents/file-analyzer.md — Template excerpt
---
name: file-analyzer
role: Extract nodes and edges from source files
output: JSON with nodes[] and edges[]
---
## Injected Context
### Project README
{{README}}
### Directory Tree
{{DIR_TREE}}
### Import Map (cross-batch)
{{IMPORT_MAP}}
### Language Addenda
{{LANG_ADDENDA}} # e.g. languages/python.md
README: Gives the agent project context. What does the project do? Which technologies are used? Without README, the agent produces more generic descriptions.
Dir Tree: Shows the entire folder structure. Helps the agent recognize dependencies between directories.
Import Map: The map of all import relationships. Allows the agent to produce correct cross-batch edges.
Language Addenda: Language-specific instructions from the languages/ folder. For Python files, decorator patterns and __init__.py conventions are explained.
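How the right addendum is chosen isn't spelled out above; a simple extension lookup would do the job. A sketch (the mapping and helper are hypothetical; read is the same file helper as in the dispatch pseudocode):

# Addenda lookup (sketch)
from pathlib import Path

ADDENDA = {
    ".py": "languages/python.md",
    ".ts": "languages/typescript.md",
    ".go": "languages/go.md",
}

def get_addenda(batch_data):
    # Collect one addendum per language that occurs in the batch
    exts = {Path(f).suffix for f in batch_data.files}
    return "\n\n".join(read(ADDENDA[e]) for e in sorted(exts) if e in ADDENDA)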
Context injection ensures that each agent — despite being isolated — has enough knowledge about the overall project. The art lies in the balance: too much context wastes tokens, too little context produces lower-quality results.
After parallel batch analysis, results must be merged. This is non-trivial: different batches may produce the same node under slightly different IDs. The merge process, executed by the assemble-reviewer and optionally supported by the Python script merge-batch-graphs.py (which produces more deterministic results than purely LLM-based merging), consists of three steps:
# merge-batch-graphs.py — Core logic
def normalize_node_id(node):
    # Step 1: Deterministic ID generation, case-insensitive so that
    # "LoginForm" and "loginForm" collapse into the same ID
    type_prefix = node["type"]
    path = node["filePath"].replace("/", "-").replace(".", "-")
    name = node["name"]
    return f"{type_prefix}:{path}::{name}".lower()

def deduplicate_nodes(all_nodes):
    # Step 2: Merge duplicates
    seen = {}
    for node in all_nodes:
        nid = normalize_node_id(node)
        if nid in seen:
            # Longer description wins
            if len(node["description"]) > len(seen[nid]["description"]):
                seen[nid]["description"] = node["description"]
        else:
            seen[nid] = node
    return list(seen.values())

def clean_edges(edges, valid_node_ids):
    # Step 3: Remove dangling edges
    return [
        e for e in edges
        if e["source"] in valid_node_ids
        and e["target"] in valid_node_ids
    ]
Step 1 — ID normalization: Node IDs are deterministically generated from type, file path, and name. This ensures identical nodes across batches receive the same ID, regardless of how the LLM named them.
Step 2 — Deduplication: Nodes with identical IDs are merged. On conflicts, the longer (presumably more informative) description wins.
Step 3 — Edge cleanup: Edges referencing non-existent nodes (e.g., because a batch failed) are removed. This ensures referential integrity.
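Chained together, the three steps turn N batch files into one consistent graph. A sketch, assuming each batch node carries the raw "id" its batch assigned (so edge endpoints can be remapped to normalized IDs before cleanup):

# Merging all batches (sketch)
import json
from pathlib import Path

def merge_batches(intermediate_dir=".claude-learning/intermediate"):
    all_nodes, all_edges, id_map = [], [], {}
    for path in sorted(Path(intermediate_dir).glob("batch-*.json")):
        batch = json.loads(path.read_text())
        for node in batch["nodes"]:
            # Remember which normalized ID each raw batch ID maps to
            id_map[node["id"]] = normalize_node_id(node)
        all_nodes += batch["nodes"]
        all_edges += batch["edges"]

    nodes = deduplicate_nodes(all_nodes)  # Steps 1 + 2
    for node in nodes:
        node["id"] = normalize_node_id(node)
    # Remap edge endpoints to normalized IDs before dropping dangling ones
    edges = [{**e,
              "source": id_map.get(e["source"], e["source"]),
              "target": id_map.get(e["target"], e["target"])}
             for e in all_edges]
    edges = clean_edges(edges, {n["id"] for n in nodes})  # Step 3
    return {"nodes": nodes, "edges": edges}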
Normalization is the fragile point of the pipeline. When two batches name the same node differently (e.g., "LoginForm" vs. "loginForm"), duplicates emerge. The Python script resolves this through case-insensitive comparisons and path-based heuristics — but edge cases are inevitable.