Bootstrap, analysis, curriculum derivation, and polish algorithm — the core phases with complete pseudocode, edge cases, and decision logic
Before the orchestrator starts any analysis, it needs to know where the source code lives. Phase 0 detects the source and ensures all mandatory questions are answered before a single token is spent on analysis.
Source detection — three paths:
HARD BLOCK — Mandatory Questions
After source detection, Phase 0 presents a block of mandatory questions. The orchestrator must not proceed to Phase 1 until all are answered. Missing answers do not generate defaults — they trigger re-prompting. This prevents the skill from building an entire course on false assumptions.
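The hard block can be pictured as a small re-prompting loop. The following is a minimal sketch, not the skill's actual implementation: the three question keys come from the text, while the prompt wording, the `ask` callback, and the rule that "valid" simply means non-empty are assumptions made for illustration.

```python
from typing import Callable, Dict

# Question keys follow the text; the prompt wording is illustrative only.
MANDATORY_QUESTIONS = {
    "language": "Which language(s) should the course use?",
    "integration_mode": "Which integration mode should be used?",
    "audiences": "Which audiences should the course target?",
}


def ask_mandatory_questions(ask: Callable[[str], str]) -> Dict[str, str]:
    """Ask every mandatory question until it has a non-empty answer.

    `ask` is any function that shows a prompt and returns the reply,
    for example the built-in `input`.
    """
    answers: Dict[str, str] = {}
    for key, prompt in MANDATORY_QUESTIONS.items():
        reply = ask(prompt).strip()
        while not reply:  # no defaults: an empty reply triggers a re-prompt
            reply = ask(f"An answer is required. {prompt}").strip()
        answers[key] = reply
    return answers
```

Passing `input` as `ask` gives an interactive session; passing a stub makes the loop easy to test. A real validator would probably also check each answer against the allowed options.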
Step 1: The orchestrator checks the user input against three patterns: GitHub URL (starts with https://github.com/...), local file path (absolute or relative), or a phrase like “this project”.
Step 2: Depending on the pattern, it clones, resolves the path, or uses the current working directory. On failure (private repo, non-existent path, empty directory), Phase 0 aborts with a clear error message.
Step 3: The three mandatory questions (language, integration mode, audiences) are asked in a loop. No question may be skipped and none has a default value; each question is repeated until a valid answer is provided.
Edge case: If a user provides a GitHub URL pointing to a private repo and no token is available, the clone fails. The error message points out the missing token instead of failing silently.
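Steps 1 and 2 can be sketched as a classifier plus a resolver for the local-path case. This is an illustration under assumptions: `SourceKind`, `SourceError`, `classify_source`, and `resolve_local_source` are hypothetical names, the phrase check only looks for the literal words "this project", and the clone step for GitHub URLs is left out.

```python
import os
from enum import Enum


class SourceKind(Enum):
    GITHUB_URL = "github_url"
    LOCAL_PATH = "local_path"
    CURRENT_PROJECT = "current_project"


class SourceError(Exception):
    """Raised when Phase 0 cannot resolve the source."""


def classify_source(user_input: str) -> SourceKind:
    """Match the input against the three patterns from Step 1."""
    text = user_input.strip()
    if text.startswith("https://github.com/"):
        return SourceKind.GITHUB_URL
    if "this project" in text.lower():
        return SourceKind.CURRENT_PROJECT
    return SourceKind.LOCAL_PATH  # absolute or relative path


def resolve_local_source(path: str) -> str:
    """Return the absolute path, or abort with a clear error message."""
    resolved = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(resolved):
        raise SourceError(f"Source path does not exist: {resolved}")
    if not os.listdir(resolved):
        raise SourceError(f"Source directory is empty: {resolved}")
    return resolved
```

For `SourceKind.CURRENT_PROJECT`, the same resolver could be called with `os.getcwd()`.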
Phase 1 reads the source code and builds a theme tree with complexity ratings. The approach is top-down: README and entry points first, then progressively deeper into the structure.
Analysis order:
Theme tree structure:
The theme tree is the central artifact of Phase 1. Each theme receives a complexity rating and a rationale:
• Complexity 0: Trivial — mentioned in one sentence on the L0 overview. No dedicated module.
• Complexity 1: Needs one paragraph. Candidate for an L1 module, but no deeper.
• Complexity 2: Needs multiple sections. L1 + L2 candidate.
• Complexity 3: Needs its own page with diagrams, code examples, and edge cases. L1 + L2 + L3 candidate.
The rationale field documents why this complexity was chosen. This is critical for traceability — when the depth map later shows that a theme got no L3, you can navigate back to the reasoning.
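One way to model a theme-tree node is a small dataclass. The sketch below assumes the field names `name`, `complexity`, `rationale`, and `children`; only the complexity scale and the existence of a rationale field are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List

# Candidate levels per complexity rating, as listed above.
CANDIDATE_LEVELS = {
    0: [],                    # one sentence on the L0 overview, no module
    1: ["L1"],
    2: ["L1", "L2"],
    3: ["L1", "L2", "L3"],
}


@dataclass
class Theme:
    name: str
    complexity: int           # 0 to 3
    rationale: str            # why this complexity was chosen
    children: List["Theme"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.complexity not in CANDIDATE_LEVELS:
            raise ValueError(f"complexity must be 0-3, got {self.complexity}")
        if not self.rationale.strip():
            raise ValueError("rationale must not be empty")

    def candidate_levels(self) -> List[str]:
        """Levels this theme could fill, before any audience filtering."""
        return list(CANDIDATE_LEVELS[self.complexity])
```

Rejecting an empty rationale at construction time is a design choice of this sketch; it keeps the traceability described above from silently eroding.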
Edge case: Monorepos
For monorepos with multiple packages, Phase 1 analyzes each package as a separate subtree. Themes are then grouped at the top level (e.g., “Frontend”, “Backend”, “Shared”). This prevents a monorepo from producing a flat, unstructured tree.
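The grouping step might look like the sketch below. How a package is assigned to a group is not specified in the text, so the mapping is taken as input here; `group_packages` and the package names in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_packages(package_groups: Dict[str, str]) -> Dict[str, List[str]]:
    """Turn a package -> group mapping into group -> sorted package names."""
    tree: Dict[str, List[str]] = defaultdict(list)
    for package, group in package_groups.items():
        tree[group].append(package)
    # Sort groups and packages so the resulting tree is stable across runs.
    return {group: sorted(packages) for group, packages in sorted(tree.items())}
```

Calling it with `{"web": "Frontend", "api": "Backend", "utils": "Shared"}` yields one top-level entry per group, each holding the subtrees of its packages.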
Phase 2 takes the theme tree from Phase 1 and creates separate curricula for each audience. Each audience receives only the topics and depths that are relevant to them.
Maximum depth per audience:
Curriculum derivation — algorithm:
The algorithm works as follows:
1. Each audience has a maximum depth. Executives: L1. Users: L2. Developers: L3. Planning never exceeds this.
2. For each theme, the Helpfulness Score (HS) is calculated — specific to this audience. The same theme has different scores per audience.
3. The HS is checked against thresholds. Each level has a threshold per audience. A level is planned only if the score meets the threshold and the audience's maximum depth allows it.
4. Each topic gets a stop reason: Why was it not planned deeper? Three possible reasons: audience maximum depth reached, score below threshold, or the topic simply lacks substance for more.
Core rule: Never plan pages that won't be built. If executives max out at L1, no L2 is planned for them — even if the HS would theoretically be high enough.
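The four steps can be sketched as a single function. The maximum depths come from the text; the threshold values, the stop-reason names `score_below_threshold` and `complexity_exhausted`, and the choice to report a fully planned theme as `complexity_exhausted` are assumptions made for illustration. The HS is passed in because its calculation is not described here.

```python
from typing import List, Tuple

LEVELS = ["L0", "L1", "L2", "L3"]

# Maximum depth per audience, as stated in the text.
MAX_LEVEL = {"executives": 1, "users": 2, "developers": 3}

# Hypothetical thresholds: minimum Helpfulness Score for L1, L2, and L3.
THRESHOLDS = {
    "executives": {1: 5, 2: 8, 3: 10},
    "users": {1: 4, 2: 6, 3: 9},
    "developers": {1: 3, 2: 5, 3: 7},
}


def plan_levels(audience: str, hs: int, complexity: int) -> Tuple[List[str], str]:
    """Return (planned levels, stop reason) for one theme and one audience.

    `complexity` is the Phase 1 rating (0-3); it caps how deep the
    theme's substance can carry.
    """
    planned = ["L0"]  # assumption: every theme appears on the L0 overview
    for level in (1, 2, 3):
        # Core rule: the audience cap is checked first, whatever the score.
        if level > MAX_LEVEL[audience]:
            return planned, "audience_max_level"
        if level > complexity:
            return planned, "complexity_exhausted"
        if hs < THRESHOLDS[audience][level]:
            return planned, "score_below_threshold"
        planned.append(LEVELS[level])
    # All levels planned: nothing deeper exists for this theme.
    return planned, "complexity_exhausted"
```

With these thresholds, `plan_levels("executives", 9, 3)` stops at L1 with `audience_max_level`, even though a score of 9 would clear the L2 threshold.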
Example: Topic “Authentication” — three curricula
| Audience | HS | Planned Levels | Stop Reason |
|---|---|---|---|
| 📊 Executives | 9 | L0, L1 | audience_max_level (max=L1) |
| 👤 Users | 7 | L0, L1, L2 | audience_max_level (max=L2) |
| 🔧 Developers | 10 | L0, L1, L2, L3 | complexity exhausted |
Phase 5 is the final pass. No content changes happen here — only verification, repair, and consistency enforcement. The polish algorithm is a systematic checklist implemented as interactive toggles.
Checks in detail:
L0→L1: l1/file.html. L1→L2: ../l2/file.html. L2→L3: ../l3/file.html. A wrong prefix breaks navigation.
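The prefix rule can be written as one small function. This sketch assumes the L0 overview sits in the output root and every other level lives in its own sibling directory (l1/, l2/, l3/); `level_link` is a hypothetical helper name.

```python
def level_link(from_level: int, to_level: int, filename: str) -> str:
    """Relative href from a page on `from_level` to `filename` on `to_level`."""
    for level in (from_level, to_level):
        if level not in (0, 1, 2, 3):
            raise ValueError(f"unknown level: {level}")
    if from_level == to_level:
        return filename  # same directory, no prefix needed
    # Pages below the root must first step out of their own level directory.
    prefix = "" if from_level == 0 else "../"
    target_dir = "" if to_level == 0 else f"l{to_level}/"
    return f"{prefix}{target_dir}{filename}"
```

A repair pass could compare each existing href against the value this function returns and rewrite the ones that differ.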
The polish algorithm makes five passes:
1. Find and handle dead links: Every internal link is checked against the list of generated files. Deep-dive links to non-generated L3 pages are removed. Other dead links produce an error.
2. Inject audience switch: If multiple audiences were generated, the L0 overview gets links to all variants. If only one audience exists, no switch is shown.
3. Fix path prefixes: Links between levels must reflect the correct directory structure. A link from an L1 page to an L2 page must start with ../l2/, not l2/.
4. Synchronize breadcrumbs: The breadcrumb chain is computed from the file's position in the level tree and compared with the actual HTML.
5. Check language pairs: Every _de file needs an _en counterpart (if both languages were selected). Missing partners are reported as errors.
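Passes 1 and 5 lend themselves to a compact sketch. It assumes that links have already been resolved to paths relative to the output root, that L3 pages live under l3/, and that the language check runs in both directions; `check_links` and `missing_language_partners` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple


def check_links(
    links: Dict[str, List[str]], generated: Set[str]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Pass 1: drop dead deep-dive links to L3, report all other dead links.

    `links` maps each page to the root-relative paths it links to;
    `generated` is the set of files that were actually written.
    """
    cleaned: Dict[str, List[str]] = {}
    errors: List[str] = []
    for page, targets in links.items():
        kept = []
        for target in targets:
            if target in generated:
                kept.append(target)
            elif target.startswith("l3/"):
                continue  # deep dive that was never generated: remove the link
            else:
                errors.append(f"{page}: dead link to {target}")
        cleaned[page] = kept
    return cleaned, errors


def missing_language_partners(generated: Set[str]) -> List[str]:
    """Pass 5: list the _de/_en counterparts that should exist but do not."""
    missing = []
    for path in sorted(generated):
        stem, dot, ext = path.rpartition(".")
        for suffix, partner in (("_de", "_en"), ("_en", "_de")):
            if stem.endswith(suffix):
                expected = f"{stem[:-len(suffix)]}{partner}{dot}{ext}"
                if expected not in generated:
                    missing.append(expected)
    return missing
```

The language check should only run when both languages were selected, as the text notes; that condition is left to the caller.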