How the 8 agents are orchestrated: agent definitions, the batch algorithm, intermediate files, the merge script, and the two-level architecture.
Each of the 8 agents is defined by its own Markdown file in the agents/ folder. The files follow a uniform schema: YAML frontmatter (name, description, model: inherit), then a role description, task definition, and output format specification. The body describes the task in two phases: script execution followed by LLM semantic analysis.
# agents/project-scanner.md — Header Analysis
---
name: project-scanner
description: |
  Scans a codebase directory to produce a structured
  inventory of all project files, detected languages,
  frameworks, import maps, and estimated complexity.
model: inherit
---
# Role description
You are a meticulous project inventory specialist.
Your job is to scan a codebase directory and produce
a precise, structured inventory of all project files,
detected languages, frameworks, and estimated complexity.
Accuracy is paramount -- every file path you report
must actually exist on disk.
name: The slug used internally for dispatch and logging. No spaces, no special characters.
description: Multi-line YAML (with |). Describes precisely what the agent does -- not what it COULD do.
model: inherit: The agent uses the same model as the main context -- no fallback to a cheaper model.
Role description: The first paragraph defines the agent's identity. "Meticulous" and "Accuracy is paramount" are deliberate prompt-engineering cues that increase output precision.
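How such a definition could be loaded is not shown in this section; the sketch below is an illustrative loader that splits the frontmatter from the body and validates the three required fields. The function and the REQUIRED_FIELDS tuple are assumptions, not the skill's actual code.

```python
import yaml  # PyYAML

REQUIRED_FIELDS = ("name", "description", "model")

def load_agent_definition(path: str) -> tuple[dict, str]:
    """Split an agents/*.md file into its frontmatter dict and Markdown body."""
    text = open(path, encoding="utf-8").read()
    # Frontmatter is delimited by the first two "---" markers.
    _, frontmatter, body = text.split("---", 2)
    meta = yaml.safe_load(frontmatter)
    missing = [field for field in REQUIRED_FIELDS if field not in meta]
    if missing:
        raise ValueError(f"{path}: missing frontmatter fields {missing}")
    return meta, body.strip()
```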
All 8 agents follow the same two-phase pattern: first a deterministic discovery script (a Node.js or Python script that the agent writes and executes, extracting reproducible structural data such as file paths, functions, and imports), then LLM semantic analysis built on top of that output:
# The 8 agents and their tasks
| Agent                 | Phase | Task                                                         |
|-----------------------|-------|--------------------------------------------------------------|
| project-scanner       | 1     | File inventory, languages, frameworks, import map            |
| file-analyzer         | 2     | Nodes + edges per file batch (5 parallel, 20-30 files/batch) |
| assemble-reviewer     | 3     | Graph validation after merge                                 |
| architecture-analyzer | 4     | Layer assignment (3-10 layers)                               |
| tour-builder          | 5     | Learning path through the graph                              |
| graph-reviewer        | 6     | Final graph validation                                       |
| domain-analyzer       | -     | Domain-specific extension                                    |
| knowledge-graph-guide | -     | Dashboard configuration                                      |
# Each agent has two phases:
Phase 1: Write script + execute (deterministic)
Phase 2: LLM analysis on script output (semantic)
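To make Phase 1 concrete, here is a minimal sketch of the kind of deterministic discovery script a Phase 1 agent might write and execute. The skill does not prescribe this exact script; the skip list and field names are illustrative.

```python
#!/usr/bin/env python3
# Illustrative Phase-1 discovery script: walk the project, record path,
# extension, and line count per file, and emit the result as JSON.
import json
import os
import sys

SKIP_DIRS = {".git", "node_modules", ".claude-learning"}  # assumed skip list

def scan(root: str) -> list[dict]:
    inventory = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    line_count = sum(1 for _ in fh)
            except OSError:
                continue
            inventory.append({
                "path": os.path.relpath(path, root),
                "extension": os.path.splitext(name)[1],
                "lineCount": line_count,
            })
    return inventory

if __name__ == "__main__":
    print(json.dumps({"files": scan(sys.argv[1])}, indent=2))
```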
file-analyzer is the only agent launched MULTIPLE times in parallel (up to 5 instances). Each instance receives a different batch of 20-30 files. Batches NEVER overlap -- partitioning is deterministic.
The batch algorithm partitions the file list from the project scanner into groups of 20-30 analyzable files; related files (e.g. Dockerfile + docker-compose.yml) are kept in the same batch. Goal: each file-analyzer agent receives enough context to detect meaningful edges, but not so much that the context buffer overflows.
# Batch Construction: Pseudocode
FUNCTION create_batches(files, import_map):
    # 1. Group files by fileCategory
    #    (categories: code, config, docs, infra,
    #     data, script, markup)
    groups = group_by(files, key=fileCategory)

    # 2. Identify co-located file patterns
    co_locate_pairs = [
        ("Dockerfile", "docker-compose.*"),
        ("package.json", "tsconfig.json"),
        ("*.prisma", "*.sql"),
        ("Makefile", "*.sh"),
    ]

    # 3. Fill batches, group by group
    batches = []
    current_batch = []
    FOR EACH group IN groups:
        FOR EACH file IN group (sorted by path):
            IF file already placed: CONTINUE
            current_batch.append(file)
            # Pull co-located partners into the same batch
            FOR EACH (pattern, partner) IN co_locate_pairs:
                IF file matches pattern:
                    add_matching(partner, current_batch)
            # Batch full? (20-30 files)
            IF len(current_batch) >= 25:
                batches.append(current_batch)
                current_batch = []

    # Append remaining batch
    IF current_batch:
        batches.append(current_batch)
    RETURN batches
fileCategory grouping: Files are classified by type: code, config, docs, infra, data, script, markup. Related files like Dockerfile + docker-compose.yml land in the same batch so the agent can detect their relationship.
Batch size 20-30: A tradeoff. Fewer files per batch = better analysis quality per file, but more agent calls. More files = fewer calls, but context buffer can overflow and analysis quality drops.
25 as default: Empirical value. For a typical project with 200 files, this yields 8 batches processed in 2 rounds with 5 parallel agents.
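To illustrate the "2 rounds with 5 parallel agents" figure, here is a minimal dispatch sketch. launch_file_analyzer is a hypothetical callable that runs one file-analyzer instance on one batch and returns once its batch file is written; it is not part of the skill.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 5  # at most 5 file-analyzer instances run at once

def dispatch_batches(batches, launch_file_analyzer):
    """Run file-analyzer instances in rounds of at most MAX_PARALLEL."""
    results = []
    for round_start in range(0, len(batches), MAX_PARALLEL):
        round_batches = batches[round_start:round_start + MAX_PARALLEL]
        with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
            futures = [
                pool.submit(launch_file_analyzer, round_start + i + 1, batch)
                for i, batch in enumerate(round_batches)
            ]
            results.extend(f.result() for f in futures)  # wait for the round to finish
    return results

# 200 files at 25 per batch -> 8 batches -> one round of 5, one round of 3
```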
Each batch receives its own batchImportData -- a JSON object with the pre-resolved import paths for every file in the batch, taken from Phase 1. The file-analyzer does NOT resolve imports itself; the project scanner already did that:
# batchImportData: Pre-resolved imports per file
{
"batchImportData": {
"src/index.ts": [
"src/utils.ts",
"src/config.ts",
"src/routes/api.ts"
],
"src/utils.ts": [
"src/types.ts"
],
"README.md": [],
"Dockerfile": []
}
}
# IMPORTANT: The file-analyzer MUST NOT
# resolve imports itself. It uses ONLY these
# pre-resolved data for import edges.
# Reason: The project-scanner has a global
# view of all files. The file-analyzer sees
# only its batch of 20-30 files.
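How the per-batch slice could be produced from the scanner's global import map is sketched below. Only the batchImportData shape above is documented; the importMap field name inside scan-result.json is an assumption.

```python
import json

def build_batch_import_data(
    batch_files: list[str],
    scan_result_path: str = ".claude-learning/intermediate/scan-result.json",
) -> dict:
    """Slice the global import map down to one batch of files."""
    with open(scan_result_path, encoding="utf-8") as fh:
        scan_result = json.load(fh)
    import_map = scan_result.get("importMap", {})  # assumed field name
    # Files without imports (README.md, Dockerfile, ...) get an empty list,
    # so the file-analyzer never resolves anything itself.
    return {"batchImportData": {path: import_map.get(path, []) for path in batch_files}}
```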
All agents communicate via intermediate files in the directory .claude-learning/intermediate/: each agent writes its result as a JSON file, and the next agent in the pipeline reads that file as its input. There is no direct agent-to-agent communication. The directory is created in Phase 0 and contains all intermediate results after completion.
# Intermediate directory after complete pipeline
.claude-learning/intermediate/
scan-result.json # Phase 1: project-scanner
batch-1.json # Phase 2: file-analyzer #1
batch-2.json # Phase 2: file-analyzer #2
batch-3.json # Phase 2: file-analyzer #3
...
batch-N.json # Phase 2: file-analyzer #N
assembled-graph.json # Phase 2->3: merge script
assemble-review.json # Phase 3: assemble-reviewer
layers.json # Phase 4: architecture-analyzer
tour.json # Phase 5: tour-builder
graph-review.json # Phase 6: graph-reviewer
scan-result.json: Contains the complete file list with paths, languages, fileCategory, line counts, and the import map. Input for the batch algorithm.
batch-*.json: Each file-analyzer writes exactly one file. It contains nodes (file, function, class, etc.) and edges (imports, calls, contains, etc.) for the files in that batch.
assembled-graph.json: The result of the merge-batch-graphs.py script. All batches combined into one graph, nodes normalized, edges deduplicated.
layers.json: The architecture layers with assignment of each file to exactly one layer.
tour.json: The ordered learning path through the most important nodes of the graph.
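Because the handoff is purely file-based, phase progress can be checked by looking for these files. A minimal sketch of such a gate, assuming the phase numbering from the agent table above:

```python
from pathlib import Path

INTERMEDIATE = Path(".claude-learning/intermediate")

# Expected output per phase; a later phase only starts once the earlier
# phase's file exists on disk (file-based handoff, no direct messaging).
PHASE_OUTPUTS = {
    1: ["scan-result.json"],
    2: ["assembled-graph.json"],  # written by the merge script after all batch-*.json
    3: ["assemble-review.json"],
    4: ["layers.json"],
    5: ["tour.json"],
    6: ["graph-review.json"],
}

def phase_complete(phase: int) -> bool:
    """Return True if every file the phase is supposed to write exists."""
    return all((INTERMEDIATE / name).exists() for name in PHASE_OUTPUTS[phase])
```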
The JSON structures follow a uniform schema. Here are the most important ones:
# batch-*.json Schema (output of one file-analyzer)
{
"nodes": [
{
"id": "file:src/index.ts",
"type": "file",
"name": "index.ts",
"filePath": "src/index.ts",
"summary": "Application entry point...",
"tags": ["entry-point", "server"],
"complexity": "moderate",
"lineCount": 150
},
{
"id": "function:src/index.ts:main",
"type": "function",
"name": "main",
"filePath": "src/index.ts",
"startLine": 10,
"endLine": 45
}
],
"edges": [
{
"source": "file:src/index.ts",
"target": "file:src/utils.ts",
"type": "imports"
},
{
"source": "file:src/index.ts",
"target": "function:src/index.ts:main",
"type": "contains"
}
]
}
Node IDs follow the schema [type]:[filePath]:[name]. For files, the name part is omitted: file:src/index.ts. For functions, the name is required: function:src/index.ts:main. If an agent forgets the type prefix or doubles it (e.g. file:file:src/index.ts), the merge-batch-graphs.py script corrects this automatically.
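The ID convention expressed as a small helper; the function itself is illustrative, only the format is documented.

```python
def node_id(node_type: str, file_path: str, name: str | None = None) -> str:
    """Build a canonical node ID: [type]:[filePath]:[name] (files omit the name)."""
    if node_type == "file":
        return f"file:{file_path}"
    if not name:
        raise ValueError(f"node type {node_type!r} requires a name part")
    return f"{node_type}:{file_path}:{name}"

assert node_id("file", "src/index.ts") == "file:src/index.ts"
assert node_id("function", "src/index.ts", "main") == "function:src/index.ts:main"
```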
An already assembled graph can also be written back as batch-existing.json and run through the merge again.
The merge-batch-graphs.py script, a Python script bundled with the skill, is the central normalization step between Phase 2 and Phase 3. It reads all batch-*.json files from the intermediate/ folder and produces a single, consistent graph in assembled-graph.json; the most critical step is node ID normalization.
# merge-batch-graphs.py — Algorithm Walkthrough
python <SKILL_DIR>/merge-batch-graphs.py $PROJECT_ROOT
# The script performs 6 steps in ONE pass:
Step 1: Load all batch-*.json
FOR EACH file IN glob("intermediate/batch-*.json"):
data = json.load(file)
all_nodes.extend(data["nodes"])
all_edges.extend(data["edges"])
Step 2: Node ID normalization
FOR EACH node IN all_nodes:
# Strip double prefixes
"file:file:src/x.ts" -> "file:src/x.ts"
# Strip project name prefixes
"file:myapp/src/x.ts" -> "file:src/x.ts"
# Add missing prefixes
"src/x.ts" -> "file:src/x.ts"
# Store mapping: old_id -> new_id
Step 3: Complexity normalization
normalize_map = {
"low": "simple",
"medium": "moderate",
"high": "complex",
"very high": "complex",
}
FOR EACH node: node.complexity =
normalize_map.get(node.complexity, node.complexity)
Step 4: Edge rewrites
FOR EACH edge IN all_edges:
edge.source = id_mapping[edge.source]
edge.target = id_mapping[edge.target]
Step 5: Deduplication
# Nodes: by ID (last occurrence wins)
nodes_by_id = {}
FOR EACH node: nodes_by_id[node.id] = node
# Edges: by (source, target, type)
edge_keys = set()
unique_edges = []
FOR EACH edge:
key = (edge.source, edge.target, edge.type)
IF key NOT IN edge_keys:
unique_edges.append(edge)
edge_keys.add(key)
Step 6: Drop dangling edges
valid_ids = set(nodes_by_id.keys())
FOR EACH edge IN unique_edges:
IF edge.source NOT IN valid_ids
OR edge.target NOT IN valid_ids:
log_warning + DROP
Step 1: Simple loading of all batch files. Order does not matter -- deduplication in step 5 handles conflicts.
Step 2 (most critical): LLM-generated node IDs are inconsistent. Some agents double the type prefix ("file:file:"), others forget it entirely ("src/x.ts"), others insert the project name ("file:myapp/src/x.ts"). All three error types are corrected.
Step 3: Different file-analyzer instances use different words for complexity: "low" vs "simple", "medium" vs "moderate". Normalization ensures only 3 values exist: simple, moderate, complex.
Step 4: After node IDs are corrected, all edges must be updated. An edge pointing to the old ID "file:file:src/x.ts" must now point to "file:src/x.ts".
Step 5: Nodes with identical IDs are deduplicated (last occurrence wins). Edges with identical (source, target, type) tuples are deduplicated.
Step 6: Edges pointing to non-existent nodes (because e.g. a file was deleted or a node ID error was uncorrectable) are logged and removed.
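A condensed Python sketch of steps 2, 4, 5, and 6, assuming the node and edge shapes from the batch-*.json schema above. The regexes and the KNOWN_TYPES set are simplifications for illustration, not the bundled script itself.

```python
import re

KNOWN_TYPES = {"file", "function", "class"}  # assumed subset of node types

def normalize_id(raw_id: str, project_name: str) -> str:
    """Apply the three ID corrections: double prefix, project prefix, missing prefix."""
    nid = re.sub(r"^(\w+):\1:", r"\1:", raw_id)                       # file:file:x -> file:x
    nid = re.sub(rf"^(\w+):{re.escape(project_name)}/", r"\1:", nid)  # file:myapp/x -> file:x
    if nid.split(":", 1)[0] not in KNOWN_TYPES:                       # x -> file:x
        nid = f"file:{nid}"
    return nid

def merge(all_nodes: list[dict], all_edges: list[dict], project_name: str):
    id_map, nodes_by_id = {}, {}
    for node in all_nodes:                      # steps 2 + 5: normalize IDs, dedup (last wins)
        new_id = normalize_id(node["id"], project_name)
        id_map[node["id"]] = new_id
        nodes_by_id[new_id] = {**node, "id": new_id}
    seen, unique_edges = set(), []
    for edge in all_edges:                      # steps 4 + 5: rewrite endpoints, dedup
        src = id_map.get(edge["source"], edge["source"])
        dst = id_map.get(edge["target"], edge["target"])
        key = (src, dst, edge["type"])
        if key in seen:
            continue
        seen.add(key)
        if src in nodes_by_id and dst in nodes_by_id:   # step 6: drop dangling edges
            unique_edges.append({**edge, "source": src, "target": dst})
    return list(nodes_by_id.values()), unique_edges
```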
All corrections and drops are logged to stderr. This output is passed to the reviewer in the assemble-review phase:
# Typical stderr output of the merge script
[NORMALIZE] file:file:src/auth.ts
-> file:src/auth.ts (double prefix)
[NORMALIZE] src/utils.ts
-> file:src/utils.ts (missing prefix)
[NORMALIZE] file:myapp/src/db.ts
-> file:src/db.ts (project prefix)
[COMPLEXITY] file:src/auth.ts
low -> simple
[DEDUP-NODE] file:src/index.ts
kept batch-3, dropped batch-1
[DEDUP-EDGE] file:src/index.ts -> file:src/utils.ts
type: imports (duplicate)
[DANGLING] file:src/old-file.ts -> file:src/auth.ts
source not found — DROPPED
Summary: 142 nodes, 287 edges
6 normalizations, 3 dedup-nodes,
12 dedup-edges, 1 dangling dropped
Project-name prefixes: if a node ID contains the project directory as its first path segment (e.g. file:my-project/src/x.ts), the project name is removed. Detection is based on whether the first path segment matches the project name from scan-result.json.
The two-level architecture separates orchestration (level 1) from execution (level 2). Level 1 consists of pipeline agents (one per audience, each controlling the complete depth path); level 2 consists of file agents (one per HTML file, launched by a pipeline agent). The levels are hierarchical: the pipeline agent starts its file agents and waits for them. A pipeline agent knows WHAT to build, a file agent knows HOW a page is built.
# Level 1: Pipeline Agent (1 per audience)
# Controls the complete depth path
Pipeline Agent Developer:
max_level = L3
hs_threshold = 6
hs_deeper = 8
Step 1: Build L0
-> Start file agent: index_dev_de.html
-> Start file agent: index_dev_en.html
-> WAIT until both complete
Step 2: Build L1 (ONLY topics with HS >= 6)
-> Start file agent: l1/skill-architektur_dev_de.html
-> Start file agent: l1/skill-architecture_dev_en.html
-> Start file agent: l1/agenten-pipeline_dev_de.html
-> Start file agent: l1/agent-pipeline_dev_en.html
-> ... (all L1 topics x 2 languages)
-> WAIT until ALL complete
Step 3: Build L2 (ONLY topics with HS >= 8)
-> # Fewer topics qualify
-> Start file agents for qualified topics
-> WAIT until ALL complete
Step 4: Build L3 (ONLY topics with HS >= 8)
-> Start file agents
-> WAIT until ALL complete
Step 5: Return pipeline summary
-> Created files: [...]
-> Skipped topics: [...]
-> Stop reasons: [...]
Why Level 1? Without pipeline agents, the main orchestrator would have to dispatch every single file and manually ensure level ordering. Pipeline agents encapsulate this logic.
WAIT semantics: The pipeline agent actively waits (poll loop) until all file agents of a level have completed. Only then does the next level start. This prevents L2 pages from linking to L1 pages that do not yet exist.
HS filter: The pipeline agent checks before each file agent start whether the topic's HS reaches the threshold. Topics below the threshold are NOT started as file agents.
Pipeline summary: At the end, the pipeline agent reports back what was built, what was skipped, and why. This data flows into the depth map (Phase 6).
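The level loop of a pipeline agent as a Python sketch. The threshold values come from the Developer example above; start_file_agent is a hypothetical non-blocking launcher, and output paths are assumed relative to the working directory.

```python
import time
from pathlib import Path

# Per-level HS thresholds for the Developer pipeline (L0 is always built).
LEVEL_THRESHOLDS = {"L0": 0, "L1": 6, "L2": 8, "L3": 8}

def run_pipeline(levels: dict[str, list[dict]], start_file_agent, poll_interval: float = 5.0) -> dict:
    """levels maps "L0".."L3" to topic dicts with "hs" and "filename" keys."""
    created, skipped = [], []
    for level in ("L0", "L1", "L2", "L3"):
        launched = []
        for topic in levels.get(level, []):
            if topic["hs"] < LEVEL_THRESHOLDS[level]:
                skipped.append((topic["filename"], f"HS {topic['hs']} below threshold"))
                continue
            start_file_agent(topic)           # hypothetical non-blocking launcher
            launched.append(topic["filename"])
        # WAIT semantics: poll until every file of this level exists on disk,
        # so the next level never links to a page that does not exist yet.
        while not all(Path(name).exists() for name in launched):
            time.sleep(poll_interval)
        created.extend(launched)
    return {"created": created, "skipped": skipped}
```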
# Level 2: File Agent (1 per HTML file)
# Builds a single HTML file
File Agent Input:
{
"filename": "l1/skill-architektur_dev_de.html",
"audience": "Developer",
"language": "DE",
"level": "L1",
"topic": "Skill Architecture",
"integration": "Embedded",
"hs": 9,
"breadcrumb": [
{"label":"L0","href":"../index_dev_de.html"},
{"label":"L1 Skill Architecture","href":null}
],
"deep_dive_links": [
"../l2/phasen-orchestrierung_dev_de.html"
],
"siblings": [
"agenten-pipeline_dev_de.html",
"kg-pipeline_dev_de.html",
"kurs-pipeline_dev_de.html"
],
"language_counterpart": "skill-architecture_dev_en.html",
"css_foundation": "[INLINE CSS]"
}
File Agent Output:
-> Writes the complete HTML file to disk
-> Returns file path + file size
-> Reports errors if quality gate fails
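What a quality gate on the file agent's output could look like is sketched below; the concrete checks (file exists and is non-empty, breadcrumb and language-counterpart links present) are assumptions, since the actual gate is not specified in this section.

```python
from pathlib import Path

def check_file_agent_output(spec: dict, output_root: str = ".") -> list[str]:
    """Illustrative quality-gate sketch over the file agent's input spec."""
    errors = []
    out_path = Path(output_root) / spec["filename"]
    if not out_path.exists() or out_path.stat().st_size == 0:
        return [f"missing or empty output: {spec['filename']}"]
    html = out_path.read_text(encoding="utf-8")
    # Every non-null breadcrumb href should be linked in the page.
    for crumb in spec.get("breadcrumb", []):
        if crumb["href"] and crumb["href"] not in html:
            errors.append(f"breadcrumb link not found: {crumb['href']}")
    # The language counterpart should be linked for DE/EN switching.
    counterpart = spec.get("language_counterpart")
    if counterpart and counterpart not in html:
        errors.append(f"language counterpart not linked: {counterpart}")
    return errors
```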