Merge upstream origin/main into local fork

Accept upstream ce-review pipeline rewrite, retire 4 overlapping review
agents, add 5 local agents as conditional personas. Accept skill renames,
port local additions. Remove Rails/Ruby skills per FastAPI pivot.

36 agents, 48 skills, 7 commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
John Lamb
2026-03-25 13:32:26 -05:00
208 changed files with 15589 additions and 11555 deletions


@@ -1,25 +1,12 @@
---
name: agent-browser
description: Browser automation using Vercel's agent-browser CLI. Use when you need to interact with web pages, fill forms, take screenshots, or scrape data. Alternative to Playwright MCP - uses Bash commands with ref-based element selection. Triggers on "browse website", "fill form", "click button", "take screenshot", "scrape page", "web automation".
description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
allowed-tools: Bash(npx agent-browser:*), Bash(agent-browser:*)
---
# Browser Automation with agent-browser
The CLI uses Chrome/Chromium via CDP directly. Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome.
## Setup Check
```bash
# Check installation
command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED - run: npm install -g agent-browser && agent-browser install"
```
### Install if needed
```bash
npm install -g agent-browser
agent-browser install # Downloads Chromium
```
The CLI uses Chrome/Chromium via CDP directly. Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update to the latest version.
## Core Workflow
@@ -103,6 +90,8 @@ echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/l
agent-browser auth login myapp
```
`auth login` navigates with `load` and then waits for login form selectors to appear before filling/clicking, which is more reliable on delayed SPA login screens.
**Option 5: State file (manual save/load)**
```bash
@@ -160,6 +149,12 @@ agent-browser download @e1 ./file.pdf # Click element to trigger downlo
agent-browser wait --download ./output.zip # Wait for any download to complete
agent-browser --download-path ./downloads open <url> # Set default download directory
# Network
agent-browser network requests # Inspect tracked requests
agent-browser network route "**/api/*" --abort # Block matching requests
agent-browser network har start # Start HAR recording
agent-browser network har stop ./capture.har # Stop and save HAR file
# Viewport & Device Emulation
agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
agent-browser set viewport 1920 1080 2 # 2x retina (same CSS size, higher res screenshots)
@@ -188,6 +183,24 @@ agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait str
agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
```
## Batch Execution
Execute multiple commands in a single invocation by piping a JSON array of string arrays to `batch`. This avoids per-command process startup overhead when running multi-step workflows.
```bash
echo '[
["open", "https://example.com"],
["snapshot", "-i"],
["click", "@e1"],
["screenshot", "result.png"]
]' | agent-browser batch --json
# Stop on first error
agent-browser batch --bail < commands.json
```
Use `batch` when you have a known sequence of commands that don't depend on intermediate output. Use separate commands or `&&` chaining when you need to parse output between steps (e.g., snapshot to discover refs, then interact).
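As a minimal sketch of preparing such a sequence, the helper below writes a command list to a file that could later be fed to `agent-browser batch --bail < commands.json`. The function name `write_batch_file` and the output path are illustrative assumptions; only the JSON shape (an array of string arrays) comes from the description above. The two steps are chosen because neither depends on intermediate output.

```bash
# Hypothetical helper: write a batch command list as a JSON array of string arrays.
# Usage: write_batch_file <output-path> <url>
# Assumes the URL contains no double quotes or backslashes (no JSON escaping is done).
write_batch_file() {
  out="$1"
  url="$2"
  cat > "$out" <<EOF
[
  ["open", "$url"],
  ["screenshot", "result.png"]
]
EOF
}

# Example, assuming agent-browser is installed:
#   write_batch_file ./commands.json "https://example.com"
#   agent-browser batch --bail < ./commands.json
```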
## Common Patterns
### Form Submission
@@ -219,6 +232,8 @@ agent-browser auth show github
agent-browser auth delete github
```
`auth login` waits for username/password/submit selectors before interacting, with a timeout tied to the default action timeout.
### Authentication with State Persistence
```bash
@@ -258,6 +273,30 @@ agent-browser state clear myapp
agent-browser state clean --older-than 7
```
### Working with Iframes
Iframe content is automatically inlined in snapshots. Refs inside iframes carry frame context, so you can interact with them directly.
```bash
agent-browser open https://example.com/checkout
agent-browser snapshot -i
# @e1 [heading] "Checkout"
# @e2 [Iframe] "payment-frame"
# @e3 [input] "Card number"
# @e4 [input] "Expiry"
# @e5 [button] "Pay"
# Interact directly — no frame switch needed
agent-browser fill @e3 "4111111111111111"
agent-browser fill @e4 "12/28"
agent-browser click @e5
# To scope a snapshot to one iframe:
agent-browser frame @e2
agent-browser snapshot -i # Only iframe content
agent-browser frame main # Return to main frame
```
### Data Extraction
```bash
@@ -294,6 +333,8 @@ agent-browser --auto-connect snapshot
agent-browser --cdp 9222 snapshot
```
Auto-connect discovers Chrome via `DevToolsActivePort` and common debugging ports (9222, 9229), and falls back to a direct WebSocket connection if HTTP-based CDP discovery fails.
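To make the `DevToolsActivePort` step more concrete, here is a rough sketch of what reading that file can look like. It is not agent-browser's implementation. The function name `devtools_ws_url` is hypothetical, and the sketch assumes the layout Chrome writes into its user data directory: the port on line 1 and the browser WebSocket path on line 2.

```bash
# Sketch: derive a CDP WebSocket URL from a DevToolsActivePort file.
# Assumption: line 1 holds the debugging port, line 2 holds the browser path
# (for example /devtools/browser/<id>).
devtools_ws_url() {
  port_file="$1"
  [ -r "$port_file" ] || return 1
  port="$(sed -n '1p' "$port_file")"
  ws_path="$(sed -n '2p' "$port_file")"
  [ -n "$port" ] || return 1
  printf 'ws://127.0.0.1:%s%s\n' "$port" "$ws_path"
}
```

If the file is missing or unreadable the function returns a non-zero status, which is the point where a caller would move on to probing the common ports.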
### Color Scheme (Dark Mode)
```bash
@@ -596,6 +637,18 @@ Create `agent-browser.json` in the project root for persistent settings:
Priority (lowest to highest): `~/.agent-browser/config.json` < `./agent-browser.json` < env vars < CLI flags. Use `--config <path>` or `AGENT_BROWSER_CONFIG` env var for a custom config file (exits with error if missing/invalid). All CLI options map to camelCase keys (e.g., `--executable-path` -> `"executablePath"`). Boolean flags accept `true`/`false` values (e.g., `--headed false` overrides config). Extensions from user and project configs are merged, not replaced.
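The precedence order can be pictured with a small sketch. This only illustrates "later source wins" for a single scalar setting; it is not how agent-browser resolves configuration internally, and it does not model the merge behavior described for extensions. The function name `resolve_setting` is hypothetical.

```bash
# Sketch of the precedence order: later sources override earlier ones.
# Arguments, lowest to highest priority:
#   user-config value, project-config value, env value, CLI value.
# An empty string means "not set at this level".
resolve_setting() {
  result=""
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      result="$candidate"
    fi
  done
  printf '%s\n' "$result"
}

resolve_setting "chrome-user" "chrome-project" "" ""   # prints: chrome-project
resolve_setting "true" "" "" "false"                    # prints: false
```

The second call mirrors the `--headed false` case: a CLI value overrides whatever the config files say.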
## Deep-Dive Documentation
| Reference | When to Use |
| -------------------------------------------------------------------- | --------------------------------------------------------- |
| [references/commands.md](references/commands.md) | Full command reference with all options |
| [references/snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, troubleshooting |
| [references/session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping |
| [references/authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling, state reuse |
| [references/video-recording.md](references/video-recording.md) | Recording workflows for debugging and documentation |
| [references/profiling.md](references/profiling.md) | Chrome DevTools profiling for performance analysis |
| [references/proxy-support.md](references/proxy-support.md) | Proxy configuration, geo-testing, rotating proxies |
## Browser Engine Selection
Use `--engine` to choose a local browser engine. The default is `chrome`.
@@ -618,18 +671,6 @@ Supported engines:
Lightpanda does not support `--extension`, `--profile`, `--state`, or `--allow-file-access`. Install Lightpanda from https://lightpanda.io/docs/open-source/installation.
## Deep-Dive Documentation
| Reference | When to Use |
| -------------------------------------------------------------------- | --------------------------------------------------------- |
| [references/commands.md](references/commands.md) | Full command reference with all options |
| [references/snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, troubleshooting |
| [references/session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping |
| [references/authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling, state reuse |
| [references/video-recording.md](references/video-recording.md) | Recording workflows for debugging and documentation |
| [references/profiling.md](references/profiling.md) | Chrome DevTools profiling for performance analysis |
| [references/proxy-support.md](references/proxy-support.md) | Proxy configuration, geo-testing, rotating proxies |
## Ready-to-Use Templates
| Template | Description |
@@ -643,23 +684,3 @@ Lightpanda does not support `--extension`, `--profile`, `--state`, or `--allow-f
./templates/authenticated-session.sh https://app.example.com/login
./templates/capture-workflow.sh https://example.com ./output
```
## vs Playwright MCP
| Feature | agent-browser (CLI) | Playwright MCP |
|---------|---------------------|----------------|
| Interface | Bash commands | MCP tools |
| Selection | Refs (@e1) | Refs (e1) |
| Output | Text/JSON | Tool responses |
| Parallel | Sessions | Tabs |
| Best for | Quick automation | Tool integration |
Use agent-browser when:
- You prefer Bash-based workflows
- You want simpler CLI commands
- You need quick one-off automation
Use Playwright MCP when:
- You need deep MCP tool integration
- You want tool-based responses
- You're building complex automation


@@ -1,190 +0,0 @@
---
name: brainstorming
description: This skill should be used before implementing features, building components, or making changes. It guides exploring user intent, approaches, and design decisions before planning. Triggers on "let's brainstorm", "help me think through", "what should we build", "explore approaches", ambiguous feature requests, or when the user's request has multiple valid interpretations that need clarification.
---
# Brainstorming
This skill provides detailed process knowledge for effective brainstorming sessions that clarify **WHAT** to build before diving into **HOW** to build it.
## When to Use This Skill
Brainstorming is valuable when:
- Requirements are unclear or ambiguous
- Multiple approaches could solve the problem
- Trade-offs need to be explored with the user
- The user hasn't fully articulated what they want
- The feature scope needs refinement
Brainstorming can be skipped when:
- Requirements are explicit and detailed
- The user knows exactly what they want
- The task is a straightforward bug fix or well-defined change
## Core Process
### Phase 0: Assess Requirement Clarity
Before diving into questions, assess whether brainstorming is needed.
**Signals that requirements are clear:**
- User provided specific acceptance criteria
- User referenced existing patterns to follow
- User described exact behavior expected
- Scope is constrained and well-defined
**Signals that brainstorming is needed:**
- User used vague terms ("make it better", "add something like")
- Multiple reasonable interpretations exist
- Trade-offs haven't been discussed
- User seems unsure about the approach
If requirements are clear, suggest: "Your requirements seem clear. Consider proceeding directly to planning or implementation."
### Phase 1: Understand the Idea
Ask questions **one at a time** to understand the user's intent. Avoid overwhelming with multiple questions.
**Question Techniques:**
1. **Prefer multiple choice when natural options exist**
- Good: "Should the notification be: (a) email only, (b) in-app only, or (c) both?"
- Avoid: "How should users be notified?"
2. **Start broad, then narrow**
- First: What is the core purpose?
- Then: Who are the users?
- Finally: What constraints exist?
3. **Validate assumptions explicitly**
- "I'm assuming users will be logged in. Is that correct?"
4. **Ask about success criteria early**
- "How will you know this feature is working well?"
**Key Topics to Explore:**
| Topic | Example Questions |
|-------|-------------------|
| Purpose | What problem does this solve? What's the motivation? |
| Users | Who uses this? What's their context? |
| Constraints | Any technical limitations? Timeline? Dependencies? |
| Success | How will you measure success? What's the happy path? |
| Edge Cases | What shouldn't happen? Any error states to consider? |
| Existing Patterns | Are there similar features in the codebase to follow? |
**Exit Condition:** Continue until the idea is clear OR user says "proceed" or "let's move on"
### Phase 2: Explore Approaches
After understanding the idea, propose 2-3 concrete approaches.
**Structure for Each Approach:**
```markdown
### Approach A: [Name]
[2-3 sentence description]
**Pros:**
- [Benefit 1]
- [Benefit 2]
**Cons:**
- [Drawback 1]
- [Drawback 2]
**Best when:** [Circumstances where this approach shines]
```
**Guidelines:**
- Lead with a recommendation and explain why
- Be honest about trade-offs
- Consider YAGNI—simpler is usually better
- Reference codebase patterns when relevant
### Phase 3: Capture the Design
Summarize key decisions in a structured format.
**Design Doc Structure:**
```markdown
---
date: YYYY-MM-DD
topic: <kebab-case-topic>
---
# <Topic Title>
## What We're Building
[Concise description—1-2 paragraphs max]
## Why This Approach
[Brief explanation of approaches considered and why this one was chosen]
## Key Decisions
- [Decision 1]: [Rationale]
- [Decision 2]: [Rationale]
## Open Questions
- [Any unresolved questions for the planning phase]
## Next Steps
`/ce:plan` for implementation details
```
**Output Location:** `docs/brainstorms/YYYY-MM-DD-<topic>-brainstorm.md`
### Phase 4: Handoff
Present clear options for what to do next:
1. **Proceed to planning** → Run `/ce:plan`
2. **Refine further** → Continue exploring the design
3. **Done for now** → User will return later
## YAGNI Principles
During brainstorming, actively resist complexity:
- **Don't design for hypothetical future requirements**
- **Choose the simplest approach that solves the stated problem**
- **Prefer boring, proven patterns over clever solutions**
- **Ask "Do we really need this?" when complexity emerges**
- **Defer decisions that don't need to be made now**
## Incremental Validation
Keep sections short—200-300 words maximum. After each section of output, pause to validate understanding:
- "Does this match what you had in mind?"
- "Any adjustments before we continue?"
- "Is this the direction you want to go?"
This prevents wasted effort on misaligned designs.
## Anti-Patterns to Avoid
| Anti-Pattern | Better Approach |
|--------------|-----------------|
| Asking 5 questions at once | Ask one at a time |
| Jumping to implementation details | Stay focused on WHAT, not HOW |
| Proposing overly complex solutions | Start simple, add complexity only if needed |
| Ignoring existing codebase patterns | Research what exists first |
| Making assumptions without validating | State assumptions explicitly and confirm |
| Creating lengthy design documents | Keep it concise—details go in the plan |
## Integration with Planning
Brainstorming answers **WHAT** to build:
- Requirements and acceptance criteria
- Chosen approach and rationale
- Key decisions and trade-offs
Planning answers **HOW** to build it:
- Implementation steps and file changes
- Technical details and code patterns
- Testing strategy and verification
When brainstorm output exists, `/ce:plan` should detect it and use it as input, skipping its own idea refinement phase.


@@ -1,16 +1,38 @@
---
name: ce:brainstorm
description: Explore requirements and approaches through collaborative dialogue before planning implementation
description: 'Explore requirements and approaches through collaborative dialogue before writing a right-sized requirements document and planning implementation. Use for feature ideas, problem framing, when the user says ''let''s brainstorm'', or when they want to think through options before deciding what to build. Also use when a user describes a vague or ambitious feature request, asks ''what should we build'', ''help me think through X'', presents a problem with multiple valid solutions, or seems unsure about scope or direction — even if they don''t explicitly ask to brainstorm.'
argument-hint: "[feature idea or problem to explore]"
---
# Brainstorm a Feature or Improvement
**Note: The current year is 2026.** Use this when dating brainstorm documents.
**Note: The current year is 2026.** Use this when dating requirements documents.
Brainstorming helps answer **WHAT** to build through collaborative dialogue. It precedes `/ce:plan`, which answers **HOW** to build it.
**Process knowledge:** Load the `brainstorming` skill for detailed question techniques, approach exploration patterns, and YAGNI principles.
The durable output of this workflow is a **requirements document**. In other workflows this might be called a lightweight PRD or feature brief. In compound engineering, keep the workflow name `brainstorm`, but make the written artifact strong enough that planning does not need to invent product behavior, scope boundaries, or success criteria.
This skill does not implement code. It explores, clarifies, and documents decisions for later planning or execution.
## Core Principles
1. **Assess scope first** - Match the amount of ceremony to the size and ambiguity of the work.
2. **Be a thinking partner** - Suggest alternatives, challenge assumptions, and explore what-ifs instead of only extracting requirements.
3. **Resolve product decisions here** - User-facing behavior, scope boundaries, and success criteria belong in this workflow. Detailed implementation belongs in planning.
4. **Keep implementation out of the requirements doc by default** - Do not include libraries, schemas, endpoints, file layouts, or code-level design unless the brainstorm itself is inherently about a technical or architectural change.
5. **Right-size the artifact** - Simple work gets a compact requirements document or brief alignment. Larger work gets a fuller document. Do not add ceremony that does not help planning.
6. **Apply YAGNI to carrying cost, not coding effort** - Prefer the simplest approach that delivers meaningful value. Avoid speculative complexity and hypothetical future-proofing, but low-cost polish or delight is worth including when its ongoing cost is small and easy to maintain.
## Interaction Rules
1. **Ask one question at a time** - Do not batch several unrelated questions into one message.
2. **Prefer single-select multiple choice** - Use single-select when choosing one direction, one priority, or one next step.
3. **Use multi-select rarely and intentionally** - Use it only for compatible sets such as goals, constraints, non-goals, or success criteria that can all coexist. If prioritization matters, follow up by asking which selected item is primary.
4. **Use the platform's question tool when available** - When asking the user a question, prefer the platform's blocking question tool if one exists (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
## Output Guidance
- **Keep outputs concise** - Prefer short sections, brief bullets, and only enough detail to support the next decision.
## Feature Description
@@ -22,9 +44,16 @@ Do not proceed until you have a feature description from the user.
## Execution Flow
### Phase 0: Assess Requirements Clarity
### Phase 0: Resume, Assess, and Route
Evaluate whether brainstorming is needed based on the feature description.
#### 0.1 Resume Existing Work When Appropriate
If the user references an existing brainstorm topic or document, or there is an obvious recent matching `*-requirements.md` file in `docs/brainstorms/`:
- Read the document
- Confirm with the user before resuming: "Found an existing requirements doc for [topic]. Should I continue from this, or start fresh?"
- If resuming, summarize the current state briefly, continue from its existing decisions and outstanding questions, and update the existing document instead of creating a duplicate
#### 0.2 Assess Whether Brainstorming Is Needed
**Clear requirements indicators:**
- Specific acceptance criteria provided
@@ -33,71 +62,228 @@ Evaluate whether brainstorming is needed based on the feature description.
- Constrained, well-defined scope
**If requirements are already clear:**
Use **AskUserQuestion tool** to suggest: "Your requirements seem detailed enough to proceed directly to planning. Should I run `/ce:plan` instead, or would you like to explore the idea further?"
Keep the interaction brief. Confirm understanding and present concise next-step options rather than forcing a long brainstorm. Only write a short requirements document when a durable handoff to planning or later review would be valuable. Skip Phase 1.1 and 1.2 entirely — go straight to Phase 1.3 or Phase 3.
#### 0.3 Assess Scope
Use the feature description plus a light repo scan to classify the work:
- **Lightweight** - small, well-bounded, low ambiguity
- **Standard** - normal feature or bounded refactor with some decisions to make
- **Deep** - cross-cutting, strategic, or highly ambiguous
If the scope is unclear, ask one targeted question to disambiguate and then proceed.
### Phase 1: Understand the Idea
#### 1.1 Repository Research (Lightweight)
#### 1.1 Existing Context Scan
Run a quick repo scan to understand existing patterns:
Scan the repo before substantive brainstorming. Match depth to scope:
- Task compound-engineering:research:repo-research-analyst("Understand existing patterns related to: <feature_description>")
**Lightweight** — Search for the topic, check if something similar already exists, and move on.
Focus on: similar features, established patterns, CLAUDE.md guidance.
**Standard and Deep** — Two passes:
#### 1.2 Collaborative Dialogue
*Constraint Check* — Check project instruction files (`AGENTS.md`, and `CLAUDE.md` only if retained as compatibility context) for workflow, product, or scope constraints that affect the brainstorm. If these add nothing, move on.
Use the **AskUserQuestion tool** to ask questions **one at a time**.
*Topic Scan* — Search for relevant terms. Read the most relevant existing artifact if one exists (brainstorm, plan, spec, skill, feature doc). Skim adjacent examples covering similar behavior.
**Guidelines (see `brainstorming` skill for detailed techniques):**
If nothing obvious appears after a short scan, say so and continue. Do not drift into technical planning — avoid inspecting tests, migrations, deployment, or low-level architecture unless the brainstorm is itself about a technical decision.
#### 1.2 Product Pressure Test
Before generating approaches, challenge the request to catch misframing. Match depth to scope:
**Lightweight:**
- Is this solving the real user problem?
- Are we duplicating something that already covers this?
- Is there a clearly better framing with near-zero extra cost?
**Standard:**
- Is this the right problem, or a proxy for a more important one?
- What user or business outcome actually matters here?
- What happens if we do nothing?
- Is there a nearby framing that creates more user value without more carrying cost? If so, what complexity does it add?
- Given the current project state, user goal, and constraints, what is the single highest-leverage move right now: the request as framed, a reframing, one adjacent addition, a simplification, or doing nothing?
- Favor moves that compound value, reduce future carrying cost, or make the product meaningfully more useful or compelling
- Use the result to sharpen the conversation, not to bulldoze the user's intent
**Deep** — Standard questions plus:
- What durable capability should this create in 6-12 months?
- Does this move the product toward that, or is it only a local patch?
#### 1.3 Collaborative Dialogue
Use the platform's blocking question tool when available (see Interaction Rules). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
**Guidelines:**
- Ask questions **one at a time**
- Prefer multiple choice when natural options exist
- Start broad (purpose, users) then narrow (constraints, edge cases)
- Validate assumptions explicitly
- Ask about success criteria
- Prefer **single-select** when choosing one direction, one priority, or one next step
- Use **multi-select** only for compatible sets that can all coexist; if prioritization matters, ask which selected item is primary
- Start broad (problem, users, value) then narrow (constraints, exclusions, edge cases)
- Clarify the problem frame, validate assumptions, and ask about success criteria
- Make requirements concrete enough that planning will not need to invent behavior
- Surface dependencies or prerequisites only when they materially affect scope
- Resolve product decisions here; leave technical implementation choices for planning
- Bring ideas, alternatives, and challenges instead of only interviewing
**Exit condition:** Continue until the idea is clear OR user says "proceed"
**Exit condition:** Continue until the idea is clear OR the user explicitly wants to proceed.
### Phase 2: Explore Approaches
Propose **2-3 concrete approaches** based on research and conversation.
If multiple plausible directions remain, propose **2-3 concrete approaches** based on research and conversation. Otherwise state the recommended direction directly.
When useful, include one deliberately higher-upside alternative:
- Identify what adjacent addition or reframing would most increase usefulness, compounding value, or durability without disproportionate carrying cost. Present it as a challenger option alongside the baseline, not as the default. Omit it when the work is already obviously over-scoped or the baseline request is clearly the right move.
For each approach, provide:
- Brief description (2-3 sentences)
- Pros and cons
- Key risks or unknowns
- When it's best suited
Lead with your recommendation and explain why. Apply YAGNI—prefer simpler solutions.
Lead with your recommendation and explain why. Prefer simpler solutions when added complexity creates real carrying cost, but do not reject low-cost, high-value polish just because it is not strictly necessary.
Use **AskUserQuestion tool** to ask which approach the user prefers.
If one approach is clearly best and alternatives are not meaningful, skip the menu and state the recommendation directly.
### Phase 3: Capture the Design
If relevant, call out whether the choice is:
- Reuse an existing pattern
- Extend an existing capability
- Build something net new
Write a brainstorm document to `docs/brainstorms/YYYY-MM-DD-<topic>-brainstorm.md`.
### Phase 3: Capture the Requirements
**Document structure:** See the `brainstorming` skill for the template format. Key sections: What We're Building, Why This Approach, Key Decisions, Open Questions.
Write or update a requirements document only when the conversation produced durable decisions worth preserving.
This document should behave like a lightweight PRD without PRD ceremony. Include what planning needs to execute well, and skip sections that add no value for the scope.
The requirements document is for product definition and scope control. Do **not** include implementation details such as libraries, schemas, endpoints, file layouts, or code structure unless the brainstorm is inherently technical and those details are themselves the subject of the decision.
**Required content for non-trivial work:**
- Problem frame
- Concrete requirements or intended behavior with stable IDs
- Scope boundaries
- Success criteria
**Include when materially useful:**
- Key decisions and rationale
- Dependencies or assumptions
- Outstanding questions
- Alternatives considered
- High-level technical direction only when the work is inherently technical and the direction is part of the product/architecture decision
**Document structure:** Use this template and omit clearly inapplicable optional sections:
```markdown
---
date: YYYY-MM-DD
topic: <kebab-case-topic>
---
# <Topic Title>
## Problem Frame
[Who is affected, what is changing, and why it matters]
## Requirements
- R1. [Concrete user-facing behavior or requirement]
- R2. [Concrete user-facing behavior or requirement]
## Success Criteria
- [How we will know this solved the right problem]
## Scope Boundaries
- [Deliberate non-goal or exclusion]
## Key Decisions
- [Decision]: [Rationale]
## Dependencies / Assumptions
- [Only include if material]
## Outstanding Questions
### Resolve Before Planning
- [Affects R1][User decision] [Question that must be answered before planning can proceed]
### Deferred to Planning
- [Affects R2][Technical] [Question that should be answered during planning or codebase exploration]
- [Affects R2][Needs research] [Question that likely requires research during planning]
## Next Steps
[If `Resolve Before Planning` is empty: `→ /ce:plan` for structured implementation planning]
[If `Resolve Before Planning` is not empty: `→ Resume /ce:brainstorm` to resolve blocking questions before planning]
```
For **Standard** and **Deep** brainstorms, a requirements document is usually warranted.
For **Lightweight** brainstorms, keep the document compact. Skip document creation when the user only needs brief alignment and no durable decisions need to be preserved.
For very small requirements docs with only 1-3 simple requirements, plain bullet requirements are acceptable. For **Standard** and **Deep** requirements docs, use stable IDs like `R1`, `R2`, `R3` so planning and later review can refer to them unambiguously.
When the work is simple, combine sections rather than padding them. A short requirements document is better than a bloated one.
Before finalizing, check:
- What would `ce:plan` still have to invent if this brainstorm ended now?
- Do any requirements depend on something claimed to be out of scope?
- Are any unresolved items actually product decisions rather than planning questions?
- Did implementation details leak in when they shouldn't have?
- Is there a low-cost change that would make this materially more useful?
If planning would need to invent product behavior, scope boundaries, or success criteria, the brainstorm is not complete yet.
Ensure the `docs/brainstorms/` directory exists before writing.
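A minimal sketch of that step, assuming the file name follows a `YYYY-MM-DD-<topic>-requirements.md` pattern (inferred from the `*-requirements.md` match used when resuming work, not stated outright). The helper name `requirements_doc_path` is hypothetical.

```bash
# Sketch: create the output directory if needed and print a dated document path.
# Assumption: file names look like YYYY-MM-DD-<topic>-requirements.md.
# Usage: requirements_doc_path <kebab-case-topic> [directory]
requirements_doc_path() {
  topic="$1"
  dir="${2:-docs/brainstorms}"
  mkdir -p "$dir" || return 1
  printf '%s/%s-%s-requirements.md\n' "$dir" "$(date +%Y-%m-%d)" "$topic"
}

# Example:
#   doc="$(requirements_doc_path bulk-export)"
#   # then write the requirements document to "$doc"
```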
**IMPORTANT:** Before proceeding to Phase 4, check if there are any Open Questions listed in the brainstorm document. If there are open questions, YOU MUST ask the user about each one using AskUserQuestion before offering to proceed to planning. Move resolved questions to a "Resolved Questions" section.
If a document contains outstanding questions:
- Use `Resolve Before Planning` only for questions that truly block planning
- If `Resolve Before Planning` is non-empty, keep working those questions during the brainstorm by default
- If the user explicitly wants to proceed anyway, convert each remaining item into an explicit decision, assumption, or `Deferred to Planning` question before proceeding
- Do not force resolution of technical questions during brainstorming just to remove uncertainty
- Put technical questions, or questions that require validation or research, under `Deferred to Planning` when they are better answered there
- Use tags like `[Needs research]` when the planner should likely investigate the question rather than answer it from repo context alone
- Carry deferred questions forward explicitly rather than treating them as a failure to finish the requirements doc
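One way to check whether planning is still blocked is to count the bullets under `### Resolve Before Planning` in the requirements document. The sketch below assumes the heading text matches the template exactly and that each question is a top-level `- ` bullet; the function name `count_blocking_questions` is hypothetical.

```bash
# Sketch: count bullet items under "### Resolve Before Planning".
# Counting stops at the next Markdown heading of any level.
count_blocking_questions() {
  awk '
    /^### Resolve Before Planning/ { in_section = 1; next }
    /^#/                           { in_section = 0 }
    in_section && /^- /            { count++ }
    END                            { print count + 0 }
  ' "$1"
}
```

A result of `0` means nothing blocks planning; items under `Deferred to Planning` are deliberately not counted.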
### Phase 4: Handoff
Use **AskUserQuestion tool** to present next steps:
#### 4.1 Present Next-Step Options
**Question:** "Brainstorm captured. What would you like to do next?"
Present next steps using the platform's blocking question tool when available (see Interaction Rules). Otherwise present numbered options in chat and end the turn.
**Options:**
1. **Review and refine** - Improve the document through structured self-review
2. **Proceed to planning** - Run `/ce:plan` (will auto-detect this brainstorm)
3. **Share to Proof** - Upload to Proof for collaborative review and sharing
4. **Ask more questions** - I have more questions to clarify before moving on
5. **Done for now** - Return later
If `Resolve Before Planning` contains any items:
- Ask the blocking questions now, one at a time, by default
- If the user explicitly wants to proceed anyway, first convert each remaining item into an explicit decision, assumption, or `Deferred to Planning` question
- If the user chooses to pause instead, present the handoff as paused or blocked rather than complete
- Do not offer `Proceed to planning` or `Proceed directly to work` while `Resolve Before Planning` remains non-empty
**Question when no blocking questions remain:** "Brainstorm complete. What would you like to do next?"
**Question when blocking questions remain and user wants to pause:** "Brainstorm paused. Planning is blocked until the remaining questions are resolved. What would you like to do next?"
Present only the options that apply:
- **Proceed to planning (Recommended)** - Run `/ce:plan` for structured implementation planning
- **Proceed directly to work** - Only offer this when scope is lightweight, success criteria are clear, scope boundaries are clear, and no meaningful technical or research questions remain
- **Review and refine** - Offer this only when a requirements document exists and can be improved through structured review
- **Ask more questions** - Continue clarifying scope, preferences, or edge cases
- **Share to Proof** - Offer this only when a requirements document exists
- **Done for now** - Return later
If the direct-to-work gate is not satisfied, omit that option entirely.
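The option filtering above can be sketched as a small function. This is only an illustration under stated assumptions: `handoff_options` and its three boolean inputs are hypothetical names, and the gate for direct-to-work is assumed to have been evaluated already by the caller.

```python
def handoff_options(
    has_blocking_questions: bool,
    requirements_doc_exists: bool,
    direct_to_work_gate_ok: bool,
) -> list[str]:
    """Return the Phase 4 options that apply, in the order listed above."""
    options: list[str] = []
    # Neither "proceed" option is offered while blocking questions remain.
    if not has_blocking_questions:
        options.append("Proceed to planning (Recommended)")
        if direct_to_work_gate_ok:
            options.append("Proceed directly to work")
    # Review and sharing both need a requirements document to act on.
    if requirements_doc_exists:
        options.append("Review and refine")
    options.append("Ask more questions")
    if requirements_doc_exists:
        options.append("Share to Proof")
    options.append("Done for now")
    return options
```

With blocking questions outstanding and no document written, only "Ask more questions" and "Done for now" survive the filter.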
#### 4.2 Handle the Selected Option
**If user selects "Proceed to planning (Recommended)":**
Immediately run `/ce:plan` in the current session. Pass the requirements document path when one exists; otherwise pass a concise summary of the finalized brainstorm decisions. Do not print the closing summary first.
**If user selects "Proceed directly to work":**
Immediately run `/ce:work` in the current session using the finalized brainstorm output as context. If a compact requirements document exists, pass its path. Do not print the closing summary first.
**If user selects "Share to Proof":**
```bash
CONTENT=$(cat docs/brainstorms/YYYY-MM-DD-<topic>-brainstorm.md)
TITLE="Brainstorm: <topic title>"
CONTENT=$(cat docs/brainstorms/YYYY-MM-DD-<topic>-requirements.md)
TITLE="Requirements: <topic title>"
RESPONSE=$(curl -s -X POST https://www.proofeditor.ai/share/markdown \
-H "Content-Type: application/json" \
-d "$(jq -n --arg title "$TITLE" --arg markdown "$CONTENT" --arg by "ai:compound" '{title: $title, markdown: $markdown, by: $by}')")
@@ -108,38 +294,42 @@ Display the URL prominently: `View & collaborate in Proof: <PROOF_URL>`
If the curl fails, skip silently. Then return to the Phase 4 options.
**If user selects "Ask more questions":** YOU (Claude) return to Phase 1.2 (Collaborative Dialogue) and continue asking the USER questions one at a time to further refine the design. The user wants YOU to probe deeper - ask about edge cases, constraints, preferences, or areas not yet explored. Continue until the user is satisfied, then return to Phase 4.
**If user selects "Ask more questions":** Return to Phase 1.3 (Collaborative Dialogue) and continue asking the user questions one at a time to further refine the design. Probe deeper into edge cases, constraints, preferences, or areas not yet explored. Continue until the user is satisfied, then return to Phase 4. Do not show the closing summary yet.
**If user selects "Review and refine":**
Load the `document-review` skill and apply it to the brainstorm document.
Load the `document-review` skill and apply it to the requirements document.
When document-review returns "Review complete", present next steps:
When document-review returns "Review complete", return to the normal Phase 4 options and present only the options that still apply. Do not show the closing summary yet.
1. **Move to planning** - Continue to `/ce:plan` with this document
2. **Done for now** - Brainstorming complete. To start planning later: `/ce:plan [document-path]`
#### 4.3 Closing Summary
## Output Summary
Use the closing summary only when this run of the workflow is ending or handing off, not when returning to the Phase 4 options.
When complete, display:
When complete and ready for planning, display:
```
```text
Brainstorm complete!
Document: docs/brainstorms/YYYY-MM-DD-<topic>-brainstorm.md
Requirements doc: docs/brainstorms/YYYY-MM-DD-<topic>-requirements.md # if one was created
Key decisions:
- [Decision 1]
- [Decision 2]
Next: Run `/ce:plan` when ready to implement.
Recommended next step: `/ce:plan`
```
## Important Guidelines
If the user pauses with `Resolve Before Planning` still populated, display:
- **Stay focused on WHAT, not HOW** - Implementation details belong in the plan
- **Ask one question at a time** - Don't overwhelm
- **Apply YAGNI** - Prefer simpler approaches
- **Keep outputs concise** - 200-300 words per section max
```text
Brainstorm paused.
NEVER CODE! Just explore and document decisions.
Requirements doc: docs/brainstorms/YYYY-MM-DD-<topic>-requirements.md # if one was created
Planning is blocked by:
- [Blocking question 1]
- [Blocking question 2]
Resume with `/ce:brainstorm` when ready to resolve these before planning.
```


@@ -0,0 +1,635 @@
---
name: ce:compound-refresh
description: Refresh stale or drifting learnings and pattern docs in docs/solutions/ by reviewing, updating, consolidating, replacing, or deleting them against the current codebase. Use after refactors, migrations, dependency upgrades, or when a retrieved learning feels outdated or wrong. Also use when reviewing docs/solutions/ for accuracy, when a recently solved problem contradicts an existing learning, when pattern docs no longer reflect current code, or when multiple docs seem to cover the same topic and might benefit from consolidation.
argument-hint: "[mode:autofix] [optional: scope hint]"
disable-model-invocation: true
---
# Compound Refresh
Maintain the quality of `docs/solutions/` over time. This workflow reviews existing learnings against the current codebase, then refreshes any derived pattern docs that depend on them.
## Mode Detection
Check if `$ARGUMENTS` contains `mode:autofix`. If present, strip it from arguments (use the remainder as a scope hint) and run in **autofix mode**.
| Mode | When | Behavior |
|------|------|----------|
| **Interactive** (default) | User is present and can answer questions | Ask for decisions on ambiguous cases, confirm actions |
| **Autofix** | `mode:autofix` in arguments | No user interaction. Apply all unambiguous actions (Keep, Update, Consolidate, auto-Delete, Replace with sufficient evidence). Mark ambiguous cases as stale. Generate a summary report at the end. |
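A minimal sketch of the mode detection, assuming `$ARGUMENTS` arrives as one whitespace-separated string. The function name `parse_refresh_arguments` is hypothetical, and matching on whole tokens (rather than any substring) is an assumption.

```python
def parse_refresh_arguments(arguments: str) -> tuple[str, str]:
    """Split raw arguments into (mode, scope_hint)."""
    tokens = arguments.split()
    mode = "autofix" if "mode:autofix" in tokens else "interactive"
    # Strip the mode token; whatever remains is treated as the scope hint.
    scope_hint = " ".join(t for t in tokens if t != "mode:autofix")
    return mode, scope_hint
```

For example, `parse_refresh_arguments("mode:autofix auth module")` yields autofix mode with `auth module` as the scope hint.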
### Autofix mode rules
- **Skip all user questions.** Never pause for input.
- **Process all docs in scope.** No scope narrowing questions — if no scope hint was provided, process everything.
- **Attempt all safe actions:** Keep (no-op), Update (fix references), Consolidate (merge and delete subsumed doc), auto-Delete (unambiguous criteria met), Replace (when evidence is sufficient). If a write succeeds, record it as **applied**. If a write fails (e.g., permission denied), record the action as **recommended** in the report and continue — do not stop or ask for permissions.
- **Mark as stale when uncertain.** If classification is genuinely ambiguous (Update vs Replace vs Consolidate vs Delete) or Replace evidence is insufficient, mark as stale with `status: stale`, `stale_reason`, and `stale_date` in the frontmatter. If even the stale-marking write fails, include it as a recommendation.
- **Use conservative confidence.** In interactive mode, borderline cases get a user question. In autofix mode, borderline cases get marked stale. Err toward stale-marking over incorrect action.
- **Always generate a report.** The report is the primary deliverable. It has two sections: **Applied** (actions that were successfully written) and **Recommended** (actions that could not be written, with full rationale so a human can apply them or run the skill interactively). The report structure is the same regardless of what permissions were granted — the only difference is which section each action lands in.
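The applied-versus-recommended bookkeeping can be sketched as follows. `RefreshReport` and `attempt_action` are hypothetical names, and treating any `OSError` (which includes `PermissionError`) as a failed write is an assumption about what a write failure looks like.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RefreshReport:
    applied: list[str] = field(default_factory=list)
    recommended: list[str] = field(default_factory=list)


def attempt_action(
    report: RefreshReport, description: str, write: Callable[[], None]
) -> None:
    """Run one write; record it as applied, or as recommended if it fails."""
    try:
        write()
    except OSError as exc:
        # Do not stop or ask for permissions: keep the rationale and continue.
        report.recommended.append(f"{description} (not written: {exc})")
    else:
        report.applied.append(description)
```

The report has the same shape whether or not writes succeed; only the section each action lands in differs.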
## Interaction Principles
**These principles apply to interactive mode only. In autofix mode, skip all user questions and apply the autofix mode rules above.**
Follow the same interaction style as `ce:brainstorm`:
- Ask questions **one at a time** — use the platform's blocking question tool when available (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present numbered options in plain text and wait for the user's reply before continuing
- Prefer **multiple choice** when natural options exist
- Start with **scope and intent**, then narrow only when needed
- Do **not** ask the user to make decisions before you have evidence
- Lead with a recommendation and explain it briefly
The goal is not to force the user through a checklist. The goal is to help them make a good maintenance decision with the smallest amount of friction.
## Refresh Order
Refresh in this order:
1. Review the relevant individual learning docs first
2. Note which learnings stayed valid, were updated, were consolidated, were replaced, or were deleted
3. Then review any pattern docs that depend on those learnings
Why this order:
- learning docs are the primary evidence
- pattern docs are derived from one or more learnings
- stale learnings can make a pattern look more valid than it really is
If the user starts by naming a pattern doc, you may begin there to understand the concern, but inspect the supporting learning docs before changing the pattern.
## Maintenance Model
For each candidate artifact, classify it into one of five outcomes:
| Outcome | Meaning | Default action |
|---------|---------|----------------|
| **Keep** | Still accurate and still useful | No file edit by default; report that it was reviewed and remains trustworthy |
| **Update** | Core solution is still correct, but references drifted | Apply evidence-backed in-place edits |
| **Consolidate** | Two or more docs overlap heavily but are both correct | Merge unique content into the canonical doc, delete the subsumed doc |
| **Replace** | The old artifact is now misleading, but there is a known better replacement | Create a trustworthy successor, then delete the old artifact |
| **Delete** | No longer useful, applicable, or distinct | Delete the file — git history preserves it if anyone needs to recover it later |
## Core Rules
1. **Evidence informs judgment.** The signals below are inputs, not a mechanical scorecard. Use engineering judgment to decide whether the artifact is still trustworthy.
2. **Prefer no-write Keep.** Do not update a doc just to leave a review breadcrumb.
3. **Match docs to reality, not the reverse.** When current code differs from a learning, update the learning to reflect the current code. The skill's job is doc accuracy, not code review — do not ask the user whether code changes were "intentional" or "a regression." If the code changed, the doc should match. If the user thinks the code is wrong, that is a separate concern outside this workflow.
4. **Be decisive, minimize questions.** When evidence is clear (file renamed, class moved, reference broken), apply the update. In interactive mode, only ask the user when the right action is genuinely ambiguous. In autofix mode, mark ambiguous cases as stale instead of asking. The goal is automated maintenance with human oversight on judgment calls, not a question for every finding.
5. **Avoid low-value churn.** Do not edit a doc just to fix a typo, polish wording, or make cosmetic changes that do not materially improve accuracy or usability.
6. **Use Update only for meaningful, evidence-backed drift.** Paths, module names, related links, category metadata, code snippets, and clearly stale wording are fair game when fixing them materially improves accuracy.
7. **Use Replace only when there is a real replacement.** That means either:
   - the current conversation contains a recently solved, verified replacement fix, or
   - the user has provided enough concrete replacement context to document the successor honestly, or
   - the codebase investigation found the current approach and can document it as the successor, or
   - newer docs, pattern docs, PRs, or issues provide strong successor evidence.
8. **Delete when the code is gone.** If the referenced code, controller, or workflow no longer exists in the codebase and no successor can be found, delete the file — don't default to Keep just because the general advice is still "sound." A learning about a deleted feature misleads readers into thinking that feature still exists. When in doubt between Keep and Delete, ask the user (in interactive mode) or mark as stale (in autofix mode). But missing referenced files with no matching code is **not** a doubt case — it is strong, unambiguous Delete evidence. Auto-delete it.
9. **Evaluate document-set design, not just accuracy.** In addition to checking whether each doc is accurate, evaluate whether it is still the right unit of knowledge. If two or more docs overlap heavily, determine whether they should remain separate, be cross-scoped more clearly, or be consolidated into one canonical document. Redundant docs are dangerous because they drift silently — two docs saying the same thing will eventually say different things.
10. **Delete, don't archive.** There is no `_archived/` directory. When a doc is no longer useful, delete it. Git history preserves every deleted file, and that is the archive. A dedicated archive directory creates problems: archived docs accumulate, pollute search results, and nobody reads them. If someone needs a deleted doc, `git log --diff-filter=D --name-only -- docs/solutions/` lists the commits that deleted files under that directory along with the deleted paths.
## Scope Selection
Start by discovering learnings and pattern docs under `docs/solutions/`.
Find all `.md` files under `docs/solutions/`. If an `_archived/` directory exists, note it in the report as a legacy artifact that should be cleaned up (files either restored or deleted). Exclude:
- `README.md` files
- anything under `docs/solutions/_archived/`
If `$ARGUMENTS` is provided, use it to narrow scope before proceeding. Try these matching strategies in order, stopping at the first that produces results:
1. **Directory match** — check if the argument matches a subdirectory name under `docs/solutions/` (e.g., `performance-issues`, `database-issues`)
2. **Frontmatter match** — search `module`, `component`, or `tags` fields in learning frontmatter for the argument
3. **Filename match** — match against filenames (partial matches are fine)
4. **Content search** — search file contents for the argument as a keyword (useful for feature names or feature areas)
If no matches are found, report that and ask the user to clarify. In autofix mode, report the miss and stop — do not guess at scope.
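The ordered matching strategies above can be sketched over an in-memory list of docs. Everything here is illustrative: `Doc`, `narrow_scope`, and the strategy labels are hypothetical, frontmatter is assumed to be already parsed into a dict, and the caller is assumed to pass a non-empty hint.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    path: str  # relative to docs/solutions/, e.g. "auth/token-refresh.md"
    frontmatter: dict = field(default_factory=dict)
    body: str = ""


def narrow_scope(docs: list[Doc], hint: str) -> tuple[str, list[Doc]]:
    """Return (strategy_name, matching_docs) for the first strategy with hits."""
    needle = hint.lower()

    def directory_match(doc: Doc) -> bool:
        # The hint must equal a directory component, not merely appear in it.
        return needle in doc.path.lower().split("/")[:-1]

    def frontmatter_match(doc: Doc) -> bool:
        values: list = []
        for key in ("module", "component", "tags"):
            value = doc.frontmatter.get(key, [])
            values.extend(value if isinstance(value, list) else [value])
        return any(needle in str(v).lower() for v in values)

    def filename_match(doc: Doc) -> bool:
        # Partial filename matches are fine.
        return needle in doc.path.lower().split("/")[-1]

    def content_match(doc: Doc) -> bool:
        return needle in doc.body.lower()

    strategies = (
        ("directory", directory_match),
        ("frontmatter", frontmatter_match),
        ("filename", filename_match),
        ("content", content_match),
    )
    for name, matches in strategies:
        hits = [doc for doc in docs if matches(doc)]
        if hits:
            return name, hits  # stop at the first strategy that produces results
    return "none", []
```

A `"none"` result corresponds to the no-match case: ask the user to clarify in interactive mode, or report the miss and stop in autofix mode.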
If no candidate docs are found, report:
```text
No candidate docs found in docs/solutions/.
Run `ce:compound` after solving problems to start building your knowledge base.
```
## Phase 0: Assess and Route
Before asking the user to classify anything:
1. Discover candidate artifacts
2. Estimate scope
3. Choose the lightest interaction path that fits
### Route by Scope
| Scope | When to use it | Interaction style |
|-------|----------------|-------------------|
| **Focused** | 1-2 likely files or user named a specific doc | Investigate directly, then present a recommendation |
| **Batch** | Up to ~8 mostly independent docs | Investigate first, then present grouped recommendations |
| **Broad** | 9+ docs, ambiguous, or repo-wide stale-doc sweep | Triage first, then investigate in batches |
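A rough sketch of the routing table, assuming the approximate thresholds are read literally (2 and 9). The function `route_by_scope` and its flags are hypothetical, and letting a user-named doc take priority over an ambiguous scope is an assumption the table does not settle.

```python
def route_by_scope(
    candidate_count: int,
    user_named_doc: bool = False,
    ambiguous: bool = False,
) -> str:
    """Pick the lightest interaction path that fits the estimated scope."""
    if user_named_doc or candidate_count <= 2:
        return "focused"  # investigate directly, then recommend
    if ambiguous or candidate_count >= 9:
        return "broad"  # triage first, then investigate in batches
    return "batch"  # investigate first, then present grouped recommendations
```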
### Broad Scope Triage
When scope is broad (9+ candidate docs), do a lightweight triage before deep investigation:
1. **Inventory** — read frontmatter of all candidate docs, group by module/component/category
2. **Impact clustering** — identify areas with the densest clusters of learnings + pattern docs. A cluster of 5 learnings and 2 patterns covering the same module is higher-impact than 5 isolated single-doc areas, because staleness in one doc is likely to affect the others.
3. **Spot-check drift** — for each cluster, check whether the primary referenced files still exist. Missing references in a high-impact cluster = strongest signal for where to start.
4. **Recommend a starting area** — present the highest-impact cluster with a brief rationale and ask the user to confirm or redirect. In autofix mode, skip the question and process all clusters in impact order.
Example:
```text
Found 24 learnings across 5 areas.
The auth module has 5 learnings and 2 pattern docs that cross-reference
each other — and 3 of those reference files that no longer exist.
I'd start there.
1. Start with auth (recommended)
2. Pick a different area
3. Review everything
```
Do not ask action-selection questions yet. First gather evidence.
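One way to order clusters during broad-scope triage is sketched below. The `Cluster` fields and the sort key are assumptions: density (learnings plus pattern docs) is ranked first and the number of missing references breaks ties, which is only one reasonable reading of the triage steps.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    area: str
    learnings: int
    patterns: int
    missing_references: int  # docs whose primary referenced files are gone


def rank_clusters(clusters: list[Cluster]) -> list[Cluster]:
    """Order clusters from highest to lowest assumed impact."""
    # Densest clusters first; missing references break ties between equal sizes.
    return sorted(
        clusters,
        key=lambda c: (c.learnings + c.patterns, c.missing_references),
        reverse=True,
    )
```

The first cluster in the result is the recommended starting area in interactive mode; autofix mode would simply process the list in order.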
## Phase 1: Investigate Candidate Learnings
For each learning in scope, read it, cross-reference its claims against the current codebase, and form a recommendation.
A learning has several dimensions that can independently go stale. Surface-level checks catch the obvious drift, but staleness often hides deeper:
- **References** — do the file paths, class names, and modules it mentions still exist or have they moved?
- **Recommended solution** — does the fix still match how the code actually works today? A renamed file with a completely different implementation pattern is not just a path update.
- **Code examples** — if the learning includes code snippets, do they still reflect the current implementation?
- **Related docs** — are cross-referenced learnings and patterns still present and consistent?
- **Auto memory** — does the auto memory directory contain notes in the same problem domain? Read MEMORY.md from the auto memory directory (the path is known from the system prompt context). If it does not exist or is empty, skip this dimension. A memory note describing a different approach than what the learning recommends is a supplementary drift signal.
- **Overlap** — while investigating, note when another doc in scope covers the same problem domain, references the same files, or recommends a similar solution. For each overlap, record: the two file paths, which dimensions overlap (problem, solution, root cause, files, prevention), and which doc appears broader or more current. These signals feed Phase 1.75 (Document-Set Analysis).
Match investigation depth to the learning's specificity — a learning referencing exact file paths and code snippets needs more verification than one describing a general principle.
### Drift Classification: Update vs Replace
The critical distinction is whether the drift is **cosmetic** (references moved but the solution is the same) or **substantive** (the solution itself changed):
- **Update territory** — file paths moved, classes renamed, links broke, metadata drifted, but the core recommended approach is still how the code works. `ce:compound-refresh` fixes these directly.
- **Replace territory** — the recommended solution conflicts with current code, the architectural approach changed, or the pattern is no longer the preferred way. This means a new learning needs to be written. A replacement subagent writes the successor following `ce:compound`'s document format (frontmatter, problem, root cause, solution, prevention), using the investigation evidence already gathered. The orchestrator does not rewrite learnings inline — it delegates to a subagent for context isolation.
**The boundary:** if you find yourself rewriting the solution section or changing what the learning recommends, stop — that is Replace, not Update.
**Memory-sourced drift signals** are supplementary, not primary. A memory note describing a different approach does not alone justify Replace or Delete. Use memory signals to:
- Corroborate codebase-sourced drift (strengthens the case for Replace)
- Prompt deeper investigation when codebase evidence is borderline
- Add context to the evidence report ("(auto memory [claude]) notes suggest approach X may have changed since this learning was written")
In autofix mode, memory-only drift (no codebase corroboration) should result in stale-marking, not action.
### Judgment Guidelines
Three guidelines that are easy to get wrong:
1. **Contradiction = strong Replace signal.** If the learning's recommendation conflicts with current code patterns or a recently verified fix, that is not a minor drift — the learning is actively misleading. Classify as Replace.
2. **Age alone is not a stale signal.** A 2-year-old learning that still matches current code is fine. Only use age as a prompt to inspect more carefully.
3. **Check for successors before deleting.** Before recommending Replace or Delete, look for newer learnings, pattern docs, PRs, or issues covering the same problem space. If successor evidence exists, prefer Replace over Delete so readers are directed to the newer guidance.
## Phase 1.5: Investigate Pattern Docs
After reviewing the underlying learning docs, investigate any relevant pattern docs under `docs/solutions/patterns/`.
Pattern docs are high-leverage — a stale pattern is more dangerous than a stale individual learning because future work may treat it as broadly applicable guidance. Evaluate whether the generalized rule still holds given the refreshed state of the learnings it depends on.
A pattern doc with no clear supporting learnings is a stale signal — investigate carefully before keeping it unchanged.
## Phase 1.75: Document-Set Analysis
After investigating individual docs, step back and evaluate the document set as a whole. The goal is to catch problems that only become visible when comparing docs to each other — not just to reality.
### Overlap Detection
For docs that share the same module, component, tags, or problem domain, compare them across these dimensions:
- **Problem statement** — do they describe the same underlying problem?
- **Solution shape** — do they recommend the same approach, even if worded differently?
- **Referenced files** — do they point to the same code paths?
- **Prevention rules** — do they repeat the same prevention bullets?
- **Root cause** — do they identify the same root cause?
High overlap across 3+ dimensions is a strong Consolidate signal. The question to ask: "Would a future maintainer need to read both docs to get the current truth, or is one mostly repeating the other?"
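The 3+ dimension threshold can be expressed as a simple count. This is a sketch, not a scorecard: the dimension keys and the labels returned by the hypothetical `overlap_signal` are assumptions, and the result is an input to judgment rather than a decision.

```python
OVERLAP_DIMENSIONS = frozenset(
    {"problem", "solution", "files", "prevention", "root_cause"}
)


def overlap_signal(shared_dimensions: set[str]) -> str:
    """Classify how strongly two docs overlap, given the dimensions they share."""
    unknown = shared_dimensions - OVERLAP_DIMENSIONS
    if unknown:
        raise ValueError(f"unknown overlap dimensions: {sorted(unknown)}")
    # Three or more shared dimensions is treated as a strong Consolidate signal.
    if len(shared_dimensions) >= 3:
        return "strong-consolidate-signal"
    return "no-strong-signal"
```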
### Supersession Signals
Detect "older narrow precursor, newer canonical doc" patterns:
- A newer doc covers the same files, same workflow, and broader runtime behavior than an older doc
- An older doc describes a specific incident that a newer doc generalizes into a pattern
- Two docs recommend the same fix but the newer one has better context, examples, or scope
When a newer doc clearly subsumes an older one, the older doc is a consolidation candidate — its unique content (if any) should be merged into the newer doc, and the older doc should be deleted.
### Canonical Doc Identification
For each topic cluster (docs sharing a problem domain), identify which doc is the **canonical source of truth**:
- Usually the most recent, broadest, most accurate doc in the cluster
- The one a maintainer should find first when searching for this topic
- The one that other docs should point to, not duplicate
All other docs in the cluster are either:
- **Distinct** — they cover a meaningfully different sub-problem and have independent retrieval value. Keep them separate.
- **Subsumed** — their unique content fits as a section in the canonical doc. Consolidate.
- **Redundant** — they add nothing the canonical doc doesn't already say. Delete.
### Retrieval-Value Test
Before recommending that two docs stay separate, apply this test: "If a maintainer searched for this topic six months from now, would having these as separate docs improve discoverability, or just create drift risk?"
Separate docs earn their keep only when:
- They cover genuinely different sub-problems that someone might search for independently
- They target different audiences or contexts (e.g., one is about debugging, another about prevention)
- Merging them would create an unwieldy doc that is harder to navigate than two focused ones
If none of these apply, prefer consolidation. Two docs covering the same ground will eventually drift apart and contradict each other — that is worse than a slightly longer single doc.
### Cross-Doc Conflict Check
Look for outright contradictions between docs in scope:
- Doc A says "always use approach X" while Doc B says "avoid approach X"
- Doc A references a file path that Doc B says was deprecated
- Doc A and Doc B describe different root causes for what appears to be the same problem
Contradictions between docs are more urgent than individual staleness — they actively confuse readers. Flag these for immediate resolution, either through Consolidate (if one is right and the other is a stale version of the same truth) or through targeted Update/Replace.
## Subagent Strategy
Use subagents for context isolation when investigating multiple artifacts — not just because the task sounds complex. Choose the lightest approach that fits:
| Approach | When to use |
|----------|-------------|
| **Main thread only** | Small scope, short docs |
| **Sequential subagents** | 1-2 artifacts with many supporting files to read |
| **Parallel subagents** | 3+ truly independent artifacts with low overlap |
| **Batched subagents** | Broad sweeps — narrow scope first, then investigate in batches |
**When spawning any subagent, include this instruction in its task prompt:**
> Use dedicated file search and read tools (Glob, Grep, Read) for all investigation. Do NOT use shell commands (ls, find, cat, grep, test, bash) for file operations. This avoids permission prompts and is more reliable.
>
> Also read MEMORY.md from the auto memory directory if it exists. Check for notes related to the learning's problem domain. Report any memory-sourced drift signals separately from codebase-sourced evidence, tagged with "(auto memory [claude])" in the evidence section. If MEMORY.md does not exist or is empty, skip this check.
There are two subagent roles:
1. **Investigation subagents** — read-only. They must not edit files, create successors, or delete anything. Each returns: file path, evidence, recommended action, confidence, and open questions. These can run in parallel when artifacts are independent.
2. **Replacement subagents** — write a single new learning to replace a stale one. These run **one at a time, sequentially** (each replacement subagent may need to read significant code, and running multiple in parallel risks context exhaustion). The orchestrator handles all deletions and metadata updates after each replacement completes.
The orchestrator merges investigation results, detects contradictions, coordinates replacement subagents, and performs all deletions/metadata edits centrally. In interactive mode, it asks the user questions on ambiguous cases. In autofix mode, it marks ambiguous cases as stale instead. If two artifacts overlap or discuss the same root issue, investigate them together rather than parallelizing.
## Phase 2: Classify the Right Maintenance Action
After gathering evidence, assign one recommended action.
### Keep
The learning is still accurate and useful. Do not edit the file — report that it was reviewed and remains trustworthy. Only add `last_refreshed` if you are already making a meaningful update for another reason.
### Update
The core solution is still valid but references have drifted (paths, class names, links, code snippets, metadata). Apply the fixes directly.
### Consolidate
Choose **Consolidate** when Phase 1.75 identified docs that overlap heavily but are both materially correct. This is different from Update (which fixes drift in a single doc) and Replace (which rewrites misleading guidance). Consolidate handles the "both right, one subsumes the other" case.
**When to consolidate:**
- Two docs describe the same problem and recommend the same (or compatible) solution
- One doc is a narrow precursor and a newer doc covers the same ground more broadly
- The unique content from the subsumed doc can fit as a section or addendum in the canonical doc
- Keeping both creates drift risk without meaningful retrieval benefit
**When NOT to consolidate** (apply the Retrieval-Value Test from Phase 1.75):
- The docs cover genuinely different sub-problems that someone would search for independently
- Merging would create an unwieldy doc that harms navigation more than drift risk harms accuracy
**Consolidate vs Delete:** If the subsumed doc has unique content worth preserving (edge cases, alternative approaches, extra prevention rules), use Consolidate to merge that content first. If the subsumed doc adds nothing the canonical doc doesn't already say, skip straight to Delete.
The Consolidate action is: merge unique content from the subsumed doc into the canonical doc, then delete the subsumed doc. Not archive — delete. Git history preserves it.
### Replace
Choose **Replace** when the learning's core guidance is now misleading — the recommended fix changed materially, the root cause or architecture shifted, or the preferred pattern is different.
The user may have invoked the refresh months after the original learning was written. Do not ask them for replacement context they are unlikely to have — use agent intelligence to investigate the codebase and synthesize the replacement.
**Evidence assessment:**
By the time you identify a Replace candidate, Phase 1 investigation has already gathered significant evidence: the old learning's claims, what the current code actually does, and where the drift occurred. Assess whether this evidence is sufficient to write a trustworthy replacement:
- **Sufficient evidence** — you understand both what the old learning recommended AND what the current approach is. The investigation found the current code patterns, the new file locations, the changed architecture. → Proceed to write the replacement (see Phase 4 Replace Flow).
- **Insufficient evidence** — the drift is so fundamental that you cannot confidently document the current approach. The entire subsystem was replaced, or the new architecture is too complex to understand from a file scan alone. → Mark as stale in place:
  - Add `status: stale`, `stale_reason: [what you found]`, `stale_date: YYYY-MM-DD` to the frontmatter
  - Report what evidence you found and what is missing
  - Recommend the user run `ce:compound` after their next encounter with that area, when they have fresh problem-solving context
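A minimal sketch of stale-marking in place, assuming the doc starts with a `---` delimited frontmatter block of simple `key: value` lines. The helper `mark_stale` is hypothetical; replacing any existing `status`, `stale_reason`, or `stale_date` line and quoting the reason as a JSON string (which is also a valid YAML double-quoted scalar) are assumptions.

```python
import datetime
import json


def mark_stale(document: str, reason: str, today: datetime.date) -> str:
    """Return the document with stale markers set in its frontmatter."""
    lines = document.split("\n")
    if not lines or lines[0].strip() != "---":
        raise ValueError("document has no frontmatter block")
    try:
        end = lines.index("---", 1)  # closing delimiter of the frontmatter
    except ValueError:
        raise ValueError("frontmatter block is not closed") from None

    # Drop earlier values of the keys we are about to write.
    stale_keys = ("status:", "stale_reason:", "stale_date:")
    kept = [line for line in lines[1:end] if not line.startswith(stale_keys)]
    added = [
        "status: stale",
        f"stale_reason: {json.dumps(reason)}",
        f"stale_date: {today.isoformat()}",
    ]
    return "\n".join(["---", *kept, *added, *lines[end:]])
```

The body of the doc is left untouched, so the original guidance stays readable until a replacement is written.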
### Delete
Choose **Delete** when:
- The code or workflow no longer exists and the problem domain is gone
- The learning is obsolete and has no modern replacement worth documenting
- The learning is fully redundant with another doc (use Consolidate if there is unique content to merge first)
- There is no meaningful successor evidence suggesting it should be replaced instead
Action: delete the file. No archival directory, no metadata — just delete it. Git history preserves every deleted file if recovery is ever needed.
### Before deleting: check if the problem domain is still active
When a learning's referenced files are gone, that is strong evidence — but only that the **implementation** is gone. Before deleting, reason about whether the **problem the learning solves** is still a concern in the codebase:
- A learning about session token storage where `auth_token.rb` is gone — does the application still handle session tokens? If so, the concept persists under a new implementation. That is Replace, not Delete.
- A learning about a deprecated API endpoint where the entire feature was removed — the problem domain is gone. That is Delete.
Do not search mechanically for keywords from the old learning. Instead, understand what problem the learning addresses, then investigate whether that problem domain still exists in the codebase. You understand concepts, so use that understanding to look for where the problem lives now, not where the old code used to be.
**Auto-delete only when one of the following holds. For missing code, both the implementation AND the problem domain must be gone:**
- the referenced code is gone AND the application no longer deals with that problem domain
- the learning is fully superseded by a clearly better successor AND the old doc adds no distinct value
- the document is plainly redundant and adds nothing the canonical doc doesn't already say
If the implementation is gone but the problem domain persists (the app still does auth, still processes payments, still handles migrations), classify as **Replace** — the problem still matters and the current approach should be documented.
Do not keep a learning just because its general advice is "still sound" — if the specific code it references is gone, the learning misleads readers. But do not delete a learning whose problem domain is still active — that knowledge gap should be filled with a replacement.
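The Delete-versus-Replace rule above reduces to a small decision table. The following is a minimal sketch of that table, not part of the skill itself; the `Evidence` fields and the `classify_drifted_learning` name are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of the investigation findings for one learning."""
    implementation_exists: bool    # do the referenced files/code still exist?
    problem_domain_active: bool    # does the app still deal with this problem?
    fully_redundant: bool = False  # a canonical doc already says everything


def classify_drifted_learning(ev: Evidence) -> str:
    """Return 'Delete', 'Replace', or 'Other' following the auto-delete rule."""
    if ev.fully_redundant:
        return "Delete"
    if not ev.implementation_exists and not ev.problem_domain_active:
        return "Delete"   # implementation AND problem domain are both gone
    if not ev.implementation_exists and ev.problem_domain_active:
        return "Replace"  # the concept persists under a new implementation
    # The code still exists, so Keep/Update/Consolidate are decided elsewhere.
    return "Other"
```

For the session-token example, `auth_token.rb` being gone while the application still handles session tokens corresponds to `Evidence(implementation_exists=False, problem_domain_active=True)`, which this sketch classifies as Replace.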
## Pattern Guidance
Apply the same five outcomes (Keep, Update, Consolidate, Replace, Delete) to pattern docs, but evaluate them as **derived guidance** rather than incident-level learnings. Key differences:
- **Keep**: the underlying learnings still support the generalized rule and examples remain representative
- **Update**: the rule holds but examples, links, scope, or supporting references drifted
- **Consolidate**: two pattern docs generalize the same set of learnings or cover the same design concern — merge into one canonical pattern
- **Replace**: the generalized rule is now misleading, or the underlying learnings support a different synthesis. Base the replacement on the refreshed learning set — do not invent new rules from guesswork
- **Delete**: the pattern is no longer valid, no longer recurring, or fully subsumed by a stronger pattern doc with no unique content remaining
## Phase 3: Ask for Decisions
### Autofix mode
**Skip this entire phase. Do not ask any questions. Do not present options. Do not wait for input.** Proceed directly to Phase 4 and execute all actions based on the classifications from Phase 2:
- Unambiguous Keep, Update, Consolidate, auto-Delete, and Replace (with sufficient evidence) → execute directly
- Ambiguous cases → mark as stale
- Then generate the report (see Output Format)
### Interactive mode
Most Updates and Consolidations should be applied directly without asking. Only ask the user when:
- The right action is genuinely ambiguous (Update vs Replace vs Consolidate vs Delete)
- You are about to Delete a document **and** the evidence is not unambiguous (see auto-delete criteria in Phase 2). When auto-delete criteria are met, proceed without asking.
- You are about to Consolidate and the choice of canonical doc is not clear-cut
- You are about to create a successor via Replace
Do **not** ask questions about whether code changes were intentional, whether the user wants to fix bugs in the code, or other concerns outside doc maintenance. Stay in your lane — doc accuracy.
#### Question Style
Always present choices using the platform's blocking question tool when available (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present numbered options in plain text and wait for the user's reply before proceeding.
Question rules:
- Ask **one question at a time**
- Prefer **multiple choice**
- Lead with the **recommended option**
- Explain the rationale for the recommendation in one concise sentence
- Avoid asking the user to choose from actions that are not actually plausible
#### Focused Scope
For a single artifact, present:
- file path
- 2-4 bullets of evidence
- recommended action
Then ask:
```text
This [learning/pattern] looks like a [Keep/Update/Consolidate/Replace/Delete].
Why: [one-sentence rationale based on the evidence]
What would you like to do?
1. [Recommended action]
2. [Second plausible action]
3. Skip for now
```
Do not list all five actions unless all five are genuinely plausible.
#### Batch Scope
For several learnings:
1. Group obvious **Keep** cases together
2. Group obvious **Update** cases together when the fixes are straightforward
3. Present **Consolidate** cases together when the canonical doc is clear
4. Present **Replace** cases individually or in very small groups
5. Present **Delete** cases individually unless they are strong auto-delete candidates
Ask for confirmation in stages:
1. Confirm grouped Keep/Update recommendations
2. Then handle Consolidate groups (present the canonical doc and what gets merged)
3. Then handle Replace one at a time
4. Then handle Delete one at a time unless the deletion is unambiguous and safe to auto-apply
#### Broad Scope
If the user asked for a sweeping refresh, keep the interaction incremental:
1. Narrow scope first
2. Investigate a manageable batch
3. Present recommendations
4. Ask whether to continue to the next batch
Do not front-load the user with a full maintenance queue.
## Phase 4: Execute the Chosen Action
### Keep Flow
No file edit by default. Summarize why the learning remains trustworthy.
### Update Flow
Apply in-place edits only when the solution is still substantively correct.
Examples of valid in-place updates:
- Rename `app/models/auth_token.rb` reference to `app/models/session_token.rb`
- Update `module: AuthToken` to `module: SessionToken`
- Fix outdated links to related docs
- Refresh implementation notes after a directory move
Examples that should **not** be in-place updates:
- Fixing a typo with no effect on understanding
- Rewording prose for style alone
- Small cleanup that does not materially improve accuracy or usability
- The old fix is now an anti-pattern
- The system architecture changed enough that the old guidance is misleading
- The troubleshooting path is materially different
The first three are cosmetic and not worth an edit; leave the doc as-is. The last three require **Replace**, not Update.
### Consolidate Flow
The orchestrator handles consolidation directly (no subagent needed — the docs are already read and the merge is a focused edit). Process Consolidate candidates by topic cluster. For each cluster identified in Phase 1.75:
1. **Confirm the canonical doc** — the broader, more current, more accurate doc in the cluster.
2. **Extract unique content** from the subsumed doc(s) — anything the canonical doc does not already cover. This might be specific edge cases, additional prevention rules, or alternative debugging approaches.
3. **Merge unique content** into the canonical doc in a natural location. Do not just append — integrate it where it logically belongs. If the unique content is small (a bullet point, a sentence), inline it. If it is a substantial sub-topic, add it as a clearly labeled section.
4. **Update cross-references** — if any other docs reference the subsumed doc, update those references to point to the canonical doc.
5. **Delete the subsumed doc.** Do not archive it, do not add redirect metadata — just delete the file. Git history preserves it.
If a doc cluster has 3+ overlapping docs, process pairwise: consolidate the two most overlapping docs first, then evaluate whether the merged result should be consolidated with the next doc.
**Structural edits beyond merge:** Consolidate also covers the reverse case. If one doc has grown unwieldy and covers multiple distinct problems that would benefit from separate retrieval, it is valid to recommend splitting it. Only do this when the sub-topics are genuinely independent and a maintainer might search for one without needing the other.
### Replace Flow
Process Replace candidates **one at a time, sequentially**. Each replacement is written by a subagent to protect the main context window.
**When evidence is sufficient:**
1. Spawn a single subagent to write the replacement learning. Pass it:
   - The old learning's full content
   - A summary of the investigation evidence (what changed, what the current code does, why the old guidance is misleading)
   - The target path and category (same category as the old learning unless the category itself changed)
2. The subagent writes the new learning following `ce:compound`'s document format: YAML frontmatter (title, category, date, module, component, tags), problem description, root cause, current solution with code examples, and prevention tips. It should use dedicated file search and read tools if it needs additional context beyond what was passed.
3. After the subagent completes, the orchestrator deletes the old learning file. The new learning's frontmatter may include `supersedes: [old learning filename]` for traceability, but this is optional — the git history and commit message provide the same information.
**When evidence is insufficient:**
1. Mark the learning as stale in place:
   - Add to frontmatter: `status: stale`, `stale_reason: [what you found]`, `stale_date: YYYY-MM-DD`
2. Report what evidence was found and what is missing
3. Recommend the user run `ce:compound` after their next encounter with that area
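Marking a learning as stale in place amounts to adding three fields to its YAML frontmatter. The sketch below shows one way to do that with the standard library only. It assumes the doc starts with a `---` delimited frontmatter block and does not already carry a `status:` field; the `mark_stale` name is hypothetical.

```python
import json
from datetime import date


def mark_stale(doc_text: str, reason: str, today: date) -> str:
    """Insert stale markers just before the closing '---' of the frontmatter."""
    lines = doc_text.split("\n")
    if not lines or lines[0].strip() != "---":
        raise ValueError("document has no YAML frontmatter")
    # Locate the closing delimiter of the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter is not closed")
    stale_fields = [
        "status: stale",
        # json.dumps yields a double-quoted string that is also valid YAML.
        f"stale_reason: {json.dumps(reason)}",
        f"stale_date: {today.isoformat()}",
    ]
    return "\n".join(lines[:end] + stale_fields + lines[end:])
```

The body of the learning is left untouched, so the stale marker shows up as a small frontmatter-only diff.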
### Delete Flow
Delete only when a learning is clearly obsolete, redundant (with no unique content to merge), or its problem domain is gone. Do not delete a document just because it is old — age alone is not a signal.
## Output Format
**The full report MUST be printed as markdown output.** Do not summarize findings internally and then output a one-liner. The report is the deliverable — print every section in full, formatted as readable markdown with headers, tables, and bullet points.
After processing the selected scope, output the following report:
```text
Compound Refresh Summary
========================
Scanned: N learnings
Kept: X
Updated: Y
Consolidated: C
Replaced: Z
Deleted: W
Skipped: V
Marked stale: S
```
Then for EVERY file processed, list:
- The file path
- The classification (Keep/Update/Consolidate/Replace/Delete/Stale)
- What evidence was found -- tag any memory-sourced findings with "(auto memory [claude])" to distinguish them from codebase-sourced evidence
- What action was taken (or recommended)
- For Consolidate: which doc was canonical, what unique content was merged, what was deleted
For **Keep** outcomes, list them under a reviewed-without-edits section so the result is visible without creating git churn.
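The summary header above is a plain tally of per-file classifications. As an illustrative sketch only, it could be rendered like this; the `render_summary` name and the `"Skip"` classification label are assumptions, since the skill names the Skipped count but not a label for it.

```python
from collections import Counter

# (report label, classification) pairs in the order the summary prints them.
ROWS = [
    ("Kept", "Keep"),
    ("Updated", "Update"),
    ("Consolidated", "Consolidate"),
    ("Replaced", "Replace"),
    ("Deleted", "Delete"),
    ("Skipped", "Skip"),       # assumed label for skipped files
    ("Marked stale", "Stale"),
]


def render_summary(outcomes: dict) -> str:
    """Render the summary block; `outcomes` maps file path -> classification."""
    counts = Counter(outcomes.values())
    lines = [
        "Compound Refresh Summary",
        "========================",
        f"Scanned: {len(outcomes)} learnings",
    ]
    # Counter returns 0 for classifications that never occurred.
    lines += [f"{label}: {counts[cls]}" for label, cls in ROWS]
    return "\n".join(lines)
```

The per-file detail list still has to be written out separately; this covers only the counts.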
### Autofix mode report
In autofix mode, the report is the sole deliverable — there is no user present to ask follow-up questions, so the report must be self-contained and complete. **Print the full report. Do not abbreviate, summarize, or skip sections.**
Split actions into two sections:
**Applied** (writes that succeeded):
- For each **Updated** file: the file path, what references were fixed, and why
- For each **Consolidated** cluster: the canonical doc, what unique content was merged from each subsumed doc, and the subsumed docs that were deleted
- For each **Replaced** file: what the old learning recommended vs what the current code does, and the path to the new successor
- For each **Deleted** file: the file path and why it was removed (problem domain gone, fully redundant, etc.)
- For each **Marked stale** file: the file path, what evidence was found, and why it was ambiguous
**Recommended** (actions that could not be written — e.g., permission denied):
- Same detail as above, but framed as recommendations for a human to apply
- Include enough context that the user can apply the change manually or re-run the skill interactively
If all writes succeed, the Recommended section is empty. If no writes succeed (e.g., read-only invocation), all actions appear under Recommended — the report becomes a maintenance plan.
**Legacy cleanup** (if `docs/solutions/_archived/` exists):
- List archived files found and recommend disposition: restore (if still relevant), delete (if truly obsolete), or consolidate (if overlapping with active docs)
## Phase 5: Commit Changes
After all actions are executed and the report is generated, handle committing the changes. Skip this phase if no files were modified (all Keep, or all writes failed).
### Detect git context
Before offering options, check:
1. Which branch is currently checked out (main/master vs feature branch)
2. Whether the working tree has other uncommitted changes beyond what compound-refresh modified
3. Recent commit messages to match the repo's commit style
### Autofix mode
Use sensible defaults — no user to ask:
| Context | Default action |
|---------|---------------|
| On main/master | Create a branch named for what was refreshed (e.g., `docs/refresh-auth-and-ci-learnings`), commit, attempt to open a PR. If PR creation fails, report the branch name. |
| On a feature branch | Commit as a separate commit on the current branch |
| Git operations fail | Include the recommended git commands in the report and continue |
Stage only the files that compound-refresh modified — not other dirty files in the working tree.
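The autofix defaults above can be sketched as a function that plans git commands without running them. This is a hedged illustration: `autofix_git_plan`, the `topic_slug` parameter, and the placeholder commit message are hypothetical, and PR creation is left out because it depends on the hosting tool.

```python
def autofix_git_plan(branch: str, modified_files: list, topic_slug: str) -> list:
    """Return git commands (as argv lists) for the autofix defaults."""
    if not modified_files:
        return []  # nothing was modified, so the commit phase is skipped
    commands = []
    if branch in ("main", "master"):
        # Branch name is specific to what was refreshed, not generic.
        commands.append(["git", "checkout", "-b", f"docs/refresh-{topic_slug}"])
    # Stage only the files the refresh touched; other dirty files stay unstaged.
    commands.append(["git", "add", "--", *modified_files])
    # Placeholder message; a real one should follow the repo's commit style.
    commands.append(["git", "commit", "-m", f"docs: refresh {topic_slug}"])
    return commands
```

Returning the commands instead of executing them also covers the failure row of the table: if a git operation fails, the same list can be printed in the report as the recommended commands.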
### Interactive mode
First, run `git branch --show-current` to determine the current branch. Then present the correct options based on the result. Stage only compound-refresh files regardless of which option the user picks.
**If the current branch is main, master, or the repo's default branch:**
1. Create a branch, commit, and open a PR (recommended) — the branch name should be specific to what was refreshed, not generic (e.g., `docs/refresh-auth-learnings` not `docs/compound-refresh`)
2. Commit directly to `{current branch name}`
3. Don't commit — I'll handle it
**If the current branch is a feature branch, clean working tree:**
1. Commit to `{current branch name}` as a separate commit (recommended)
2. Create a separate branch and commit
3. Don't commit
**If the current branch is a feature branch, dirty working tree (other uncommitted changes):**
1. Commit only the compound-refresh changes to `{current branch name}` (selective staging — other dirty files stay untouched)
2. Don't commit
### Commit message
Write a descriptive commit message that:
- Summarizes what was refreshed (e.g., "update 3 stale learnings, consolidate 2 overlapping docs, delete 1 obsolete doc")
- Follows the repo's existing commit conventions (check recent git log for style)
- Is succinct — the details are in the changed files themselves
## Relationship to ce:compound
- `ce:compound` captures a newly solved, verified problem
- `ce:compound-refresh` maintains older learnings as the codebase evolves — both their individual accuracy and their collective design as a document set
Use **Replace** only when the refresh process has enough real evidence to write a trustworthy successor. When evidence is insufficient, mark as stale and recommend `ce:compound` for when the user next encounters that problem area.
Use **Consolidate** proactively when the document set has grown organically and redundancy has crept in. Every `ce:compound` invocation adds a new doc — over time, multiple docs may cover the same problem from slightly different angles. Periodic consolidation keeps the document set lean and authoritative.


@@ -37,6 +37,27 @@ Compact-safe mode exists as a lightweight alternative — see the **Compact-Safe
Phase 1 subagents return TEXT DATA to the orchestrator. They must NOT use Write, Edit, or create any files. Only the orchestrator (Phase 2) writes the final documentation file.
</critical_requirement>
### Phase 0.5: Auto Memory Scan
Before launching Phase 1 subagents, check the auto memory directory for notes relevant to the problem being documented.
1. Read MEMORY.md from the auto memory directory (the path is known from the system prompt context)
2. If the directory or MEMORY.md does not exist, is empty, or is unreadable, skip this step and proceed to Phase 1 unchanged
3. Scan the entries for anything related to the problem being documented -- use semantic judgment, not keyword matching
4. If relevant entries are found, prepare a labeled excerpt block:
```
## Supplementary notes from auto memory
Treat as additional context, not primary evidence. Conversation history
and codebase findings take priority over these notes.
[relevant entries here]
```
5. Pass this block as additional context to the Context Analyzer and Solution Extractor task prompts in Phase 1. If any memory notes end up in the final documentation (e.g., as part of the investigation steps or root cause analysis), tag them with "(auto memory [claude])" so their origin is clear to future readers.
If no relevant entries are found, proceed to Phase 1 without passing memory context.
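The mechanical parts of this scan (reading the file defensively and wrapping selected notes in the labeled block) could look roughly like the sketch below. Choosing which entries are relevant stays a semantic judgment and is not modeled here. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

HEADER = (
    "## Supplementary notes from auto memory\n"
    "Treat as additional context, not primary evidence. Conversation history\n"
    "and codebase findings take priority over these notes.\n"
)


def read_memory(memory_dir: Path) -> Optional[str]:
    """Return MEMORY.md contents, or None if missing, empty, or unreadable."""
    try:
        text = (memory_dir / "MEMORY.md").read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError):
        return None
    return text if text.strip() else None


def build_excerpt_block(relevant_entries: list) -> Optional[str]:
    """Format the labeled block, or None when no entry was judged relevant."""
    if not relevant_entries:
        return None  # proceed to Phase 1 without memory context
    return HEADER + "\n" + "\n".join(relevant_entries) + "\n"
```

Both functions return `None` for the skip cases, so the caller can pass the block to the subagent prompts only when it exists.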
### Phase 1: Parallel Research
<parallel_tasks>
@@ -46,32 +67,84 @@ Launch these subagents IN PARALLEL. Each returns text data to the orchestrator.
#### 1. **Context Analyzer**
- Extracts conversation history
- Identifies problem type, component, symptoms
- Validates against schema
- Returns: YAML frontmatter skeleton
- Incorporates auto memory excerpts (if provided by the orchestrator) as supplementary evidence when identifying problem type, component, and symptoms
- Validates all enum fields against the schema values below
- Maps problem_type to the `docs/solutions/` category directory
- Suggests a filename using the pattern `[sanitized-problem-slug]-[date].md`
- Returns: YAML frontmatter skeleton (must include `category:` field mapped from problem_type), category directory path, and suggested filename
**Schema enum values (validate against these exactly):**
- **problem_type**: build_error, test_failure, runtime_error, performance_issue, database_issue, security_issue, ui_bug, integration_issue, logic_error, developer_experience, workflow_issue, best_practice, documentation_gap
- **component**: rails_model, rails_controller, rails_view, service_object, background_job, database, frontend_stimulus, hotwire_turbo, email_processing, brief_system, assistant, authentication, payments, development_workflow, testing_framework, documentation, tooling
- **root_cause**: missing_association, missing_include, missing_index, wrong_api, scope_issue, thread_violation, async_timing, memory_leak, config_error, logic_error, test_isolation, missing_validation, missing_permission, missing_workflow_step, inadequate_documentation, missing_tooling, incomplete_setup
- **resolution_type**: code_fix, migration, config_change, test_fix, dependency_update, environment_setup, workflow_improvement, documentation_update, tooling_addition, seed_data_update
- **severity**: critical, high, medium, low
**Category mapping (problem_type -> directory):**
| problem_type | Directory |
|---|---|
| build_error | build-errors/ |
| test_failure | test-failures/ |
| runtime_error | runtime-errors/ |
| performance_issue | performance-issues/ |
| database_issue | database-issues/ |
| security_issue | security-issues/ |
| ui_bug | ui-bugs/ |
| integration_issue | integration-issues/ |
| logic_error | logic-errors/ |
| developer_experience | developer-experience/ |
| workflow_issue | workflow-issues/ |
| best_practice | best-practices/ |
| documentation_gap | documentation-gaps/ |
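The mapping table and the `[sanitized-problem-slug]-[date].md` pattern can be combined into a single path suggestion. The sketch below is one possible reading: the slug rule (lowercase, runs of non-alphanumerics collapsed to hyphens) and the ISO date format are assumptions, since the skill does not define them, and `suggest_path` is a hypothetical name.

```python
import re
from datetime import date

# problem_type -> docs/solutions/ category directory, copied from the table.
CATEGORY_DIRS = {
    "build_error": "build-errors/",
    "test_failure": "test-failures/",
    "runtime_error": "runtime-errors/",
    "performance_issue": "performance-issues/",
    "database_issue": "database-issues/",
    "security_issue": "security-issues/",
    "ui_bug": "ui-bugs/",
    "integration_issue": "integration-issues/",
    "logic_error": "logic-errors/",
    "developer_experience": "developer-experience/",
    "workflow_issue": "workflow-issues/",
    "best_practice": "best-practices/",
    "documentation_gap": "documentation-gaps/",
}


def suggest_path(problem_type: str, problem_summary: str, today: date) -> str:
    """Validate the enum value and build a suggested doc path."""
    if problem_type not in CATEGORY_DIRS:
        raise ValueError(f"unknown problem_type: {problem_type!r}")
    # Assumed slug rule: lowercase, non-alphanumerics collapsed to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", problem_summary.lower()).strip("-")
    return f"docs/solutions/{CATEGORY_DIRS[problem_type]}{slug}-{today.isoformat()}.md"
```

Rejecting unknown values up front mirrors the requirement to validate enum fields exactly against the schema.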
#### 2. **Solution Extractor**
- Analyzes all investigation steps
- Identifies root cause
- Extracts working solution with code examples
- Returns: Solution content block
- Incorporates auto memory excerpts (if provided by the orchestrator) as supplementary evidence -- conversation history and the verified fix take priority; if memory notes contradict the conversation, note the contradiction as cautionary context
- Develops prevention strategies and best practices guidance
- Generates test cases if applicable
- Returns: Solution content block including prevention section
**Expected output sections (follow this structure):**
- **Problem**: 1-2 sentence description of the issue
- **Symptoms**: Observable symptoms (error messages, behavior)
- **What Didn't Work**: Failed investigation attempts and why they failed
- **Solution**: The actual fix with code examples (before/after when applicable)
- **Why This Works**: Root cause explanation and why the solution addresses it
- **Prevention**: Strategies to avoid recurrence, best practices, and test cases. Include concrete code examples where applicable (e.g., gem configurations, test assertions, linting rules)
#### 3. **Related Docs Finder**
- Searches `docs/solutions/` for related documentation
- Identifies cross-references and links
- Finds related GitHub issues
- Returns: Links and relationships
- Flags any related learning or pattern docs that may now be stale, contradicted, or overly broad
- **Assesses overlap** with the new doc being created across five dimensions: problem statement, root cause, solution approach, referenced files, and prevention rules. Score as:
- **High**: 4-5 dimensions match — essentially the same problem solved again
- **Moderate**: 2-3 dimensions match — same area but different angle or solution
- **Low**: 0-1 dimensions match — related but distinct
- Returns: Links, relationships, refresh candidates, and overlap assessment (score + which dimensions matched)
#### 4. **Prevention Strategist**
- Develops prevention strategies
- Creates best practices guidance
- Generates test cases if applicable
- Returns: Prevention/testing content
**Search strategy (grep-first filtering for efficiency):**
#### 5. **Category Classifier**
- Determines optimal `docs/solutions/` category
- Validates category against schema
- Suggests filename based on slug
- Returns: Final path and filename
1. Extract keywords from the problem context: module names, technical terms, error messages, component types
2. If the problem category is clear, narrow search to the matching `docs/solutions/<category>/` directory
3. Use the native content-search tool (e.g., Grep in Claude Code) to pre-filter candidate files BEFORE reading any content. Run multiple searches in parallel, case-insensitive, targeting frontmatter fields. These are template patterns -- substitute actual keywords:
   - `title:.*<keyword>`
   - `tags:.*(<keyword1>|<keyword2>)`
   - `module:.*<module name>`
   - `component:.*<component>`
4. If search returns >25 candidates, re-run with more specific patterns. If <3, broaden to full content search
5. Read only frontmatter (first 30 lines) of candidate files to score relevance
6. Fully read only strong/moderate matches
7. Return distilled links and relationships, not raw file contents
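To make the filtering logic concrete, here is a small Python model of the steps above. In practice the subagent would use its native content-search tool, so this is only a sketch of the logic: the function names are hypothetical, docs are passed in as an in-memory mapping, and single-line `tags:` fields are assumed.

```python
import re


def frontmatter_patterns(keywords, module=None, component=None):
    """Build case-insensitive regexes that target frontmatter fields."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    alternation = "|".join(re.escape(k) for k in keywords)
    patterns = [rf"^title:.*({alternation})", rf"^tags:.*({alternation})"]
    if module:
        patterns.append(rf"^module:.*{re.escape(module)}")
    if component:
        patterns.append(rf"^component:.*{re.escape(component)}")
    return [re.compile(p, re.IGNORECASE | re.MULTILINE) for p in patterns]


def prefilter(docs, patterns, head_lines=30):
    """Return paths whose first `head_lines` lines match any pattern."""
    hits = []
    for path, text in docs.items():
        head = "\n".join(text.splitlines()[:head_lines])
        if any(p.search(head) for p in patterns):
            hits.append(path)
    return hits


def next_step(candidate_count):
    """Apply the >25 and <3 thresholds from the search strategy."""
    if candidate_count > 25:
        return "narrow"
    if candidate_count < 3:
        return "broaden"
    return "read-frontmatter"
```

Anchoring each pattern to a field name keeps a keyword that appears only in body text from counting as a frontmatter hit.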
**GitHub issue search:**
Prefer the `gh` CLI for searching related issues: `gh issue list --search "<keywords>" --state all --limit 5`. If `gh` is not installed, fall back to the GitHub MCP tools (e.g., `unblocked` data_retrieval) if available. If neither is available, skip GitHub issue search and note it was skipped in the output.
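The `gh`-first fallback can be sketched as a check for the binary followed by building the documented command. The `issue_search_command` name is hypothetical, and the MCP fallback is left to the caller because its interface is not specified here.

```python
import shutil


def issue_search_command(keywords, limit=5):
    """Return the gh argv for the issue search, or None if gh is not installed."""
    if shutil.which("gh") is None:
        # Caller falls back to MCP tools, or records that the search was skipped.
        return None
    return [
        "gh", "issue", "list",
        "--search", " ".join(keywords),
        "--state", "all",
        "--limit", str(limit),
    ]
```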
</parallel_tasks>
@@ -84,13 +157,73 @@ Launch these subagents IN PARALLEL. Each returns text data to the orchestrator.
The orchestrating agent (main conversation) performs these steps:
1. Collect all text results from Phase 1 subagents
2. Assemble complete markdown file from the collected pieces
3. Validate YAML frontmatter against schema
4. Create directory if needed: `mkdir -p docs/solutions/[category]/`
5. Write the SINGLE final file: `docs/solutions/[category]/[filename].md`
2. **Check the overlap assessment** from the Related Docs Finder before deciding what to write:
| Overlap | Action |
|---------|--------|
| **High** — existing doc covers the same problem, root cause, and solution | **Update the existing doc** with fresher context (new code examples, updated references, additional prevention tips) rather than creating a duplicate. The existing doc's path and structure stay the same. |
| **Moderate** — same problem area but different angle, root cause, or solution | **Create the new doc** normally. Flag the overlap for Phase 2.5 to recommend consolidation review. |
| **Low or none** | **Create the new doc** normally. |
The reason to update rather than create: two docs describing the same problem and solution will inevitably drift apart. The newer context is fresher and more trustworthy, so fold it into the existing doc rather than creating a second one that immediately needs consolidation.
When updating an existing doc, preserve its file path and frontmatter structure. Update the solution, code examples, prevention tips, and any stale references. Add a `last_updated: YYYY-MM-DD` field to the frontmatter. Do not change the title unless the problem framing has materially shifted.
3. Assemble complete markdown file from the collected pieces
4. Validate YAML frontmatter against schema
5. Create directory if needed: `mkdir -p docs/solutions/[category]/`
6. Write the file: either the updated existing doc or the new `docs/solutions/[category]/[filename].md`
</sequential_tasks>
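The overlap score from the Related Docs Finder and the resulting Phase 2 action can be expressed as a short lookup. This is an illustrative sketch; the names `overlap_level` and `ACTIONS` are hypothetical, and the thresholds are the 4-5 / 2-3 / 0-1 bands stated for the five dimensions.

```python
DIMENSIONS = (
    "problem statement",
    "root cause",
    "solution approach",
    "referenced files",
    "prevention rules",
)

ACTIONS = {
    "High": "update existing doc",
    "Moderate": "create new doc; flag for consolidation review",
    "Low": "create new doc",
}


def overlap_level(matched):
    """Map the set of matched dimensions to High, Moderate, or Low."""
    unknown = set(matched) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    count = len(set(matched))  # duplicates do not inflate the score
    if count >= 4:
        return "High"
    if count >= 2:
        return "Moderate"
    return "Low"
```

For example, a match on problem statement, root cause, solution approach, and referenced files scores High, so the orchestrator would update the existing doc instead of writing a new one.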
### Phase 2.5: Selective Refresh Check
After writing the new learning, decide whether this new solution is evidence that older docs should be refreshed.
`ce:compound-refresh` is **not** a default follow-up. Use it selectively when the new learning suggests an older learning or pattern doc may now be inaccurate.
It makes sense to invoke `ce:compound-refresh` when one or more of these are true:
1. A related learning or pattern doc recommends an approach that the new fix now contradicts
2. The new fix clearly supersedes an older documented solution
3. The current work involved a refactor, migration, rename, or dependency upgrade that likely invalidated references in older docs
4. A pattern doc now looks overly broad, outdated, or no longer supported by the refreshed reality
5. The Related Docs Finder surfaced high-confidence refresh candidates in the same problem space
6. The Related Docs Finder reported **moderate overlap** with an existing doc — there may be consolidation opportunities that benefit from a focused review
It does **not** make sense to invoke `ce:compound-refresh` when:
1. No related docs were found
2. Related docs still appear consistent with the new learning
3. The overlap is superficial and does not change prior guidance
4. Refresh would require a broad historical review with weak evidence
Use these rules:
- If there is **one obvious stale candidate**, invoke `ce:compound-refresh` with a narrow scope hint after the new learning is written
- If there are **multiple candidates in the same area**, ask the user whether to run a targeted refresh for that module, category, or pattern set
- If context is already tight or you are in compact-safe mode, do not expand into a broad refresh automatically; instead recommend `ce:compound-refresh` as the next step with a scope hint
When invoking or recommending `ce:compound-refresh`, be explicit about the argument to pass. Prefer the narrowest useful scope:
- **Specific file** when one learning or pattern doc is the likely stale artifact
- **Module or component name** when several related docs may need review
- **Category name** when the drift is concentrated in one solutions area
- **Pattern filename or pattern topic** when the stale guidance lives in `docs/solutions/patterns/`
Examples:
- `/ce:compound-refresh plugin-versioning-requirements`
- `/ce:compound-refresh payments`
- `/ce:compound-refresh performance-issues`
- `/ce:compound-refresh critical-patterns`
A single scope hint may still expand to multiple related docs when the change is cross-cutting within one domain, category, or pattern area.
Do not invoke `ce:compound-refresh` without an argument unless the user explicitly wants a broad sweep.
Always capture the new learning first. Refresh is a targeted maintenance follow-up, not a prerequisite for documentation.
### Phase 3: Optional Enhancement
**WAIT for Phase 2 to complete before proceeding.**
@@ -119,7 +252,7 @@ When context budget is tight, this mode skips parallel subagents entirely. The o
The orchestrator (main conversation) performs ALL of the following in one sequential pass:
1. **Extract from conversation**: Identify the problem, root cause, and solution from conversation history
1. **Extract from conversation**: Identify the problem, root cause, and solution from conversation history. Also read MEMORY.md from the auto memory directory if it exists -- use any relevant notes as supplementary context alongside conversation history. Tag any memory-sourced content incorporated into the final doc with "(auto memory [claude])"
2. **Classify**: Determine category and filename (same categories as full mode)
3. **Write minimal doc**: Create `docs/solutions/[category]/[filename].md` with:
- YAML frontmatter (title, category, date, tags)
@@ -143,6 +276,8 @@ re-run /compound in a fresh session.
**No subagents are launched. No parallel tasks. One file written.**
In compact-safe mode, the overlap check is skipped (no Related Docs Finder subagent). This means compact-safe mode may create a doc that overlaps with an existing one. That is acceptable — `ce:compound-refresh` will catch it later. Only suggest `ce:compound-refresh` if there is an obvious narrow refresh target. Do not broaden into a large refresh sweep from a compact-safe session.
---
## What It Captures
@@ -192,19 +327,20 @@ re-run /compound in a fresh session.
|----------|-----------|
| Subagents write files like `context-analysis.md`, `solution-draft.md` | Subagents return text data; orchestrator writes one final file |
| Research and assembly run in parallel | Research completes → then assembly runs |
| Multiple files created during workflow | Single file: `docs/solutions/[category]/[filename].md` |
| Multiple files created during workflow | One file written or updated: `docs/solutions/[category]/[filename].md` |
| Creating a new doc when an existing doc covers the same problem | Check overlap assessment; update the existing doc when overlap is high |
## Success Output
```
✓ Documentation complete
Auto memory: 2 relevant entries used as supplementary evidence
Subagent Results:
✓ Context Analyzer: Identified performance_issue in brief_system
✓ Solution Extractor: 3 code fixes
✓ Context Analyzer: Identified performance_issue in brief_system, category: performance-issues/
✓ Solution Extractor: 3 code fixes, prevention strategies
✓ Related Docs Finder: 2 related issues
✓ Prevention Strategist: Prevention strategies, test suggestions
✓ Category Classifier: `performance-issues`
Specialized Agent Reviews (Auto-Triggered):
✓ performance-oracle: Validated query optimization approach
@@ -226,6 +362,19 @@ What's next?
5. Other
```
**Alternate output (when updating an existing doc due to high overlap):**
```
✓ Documentation updated (existing doc refreshed with current context)
Overlap detected: docs/solutions/performance-issues/n-plus-one-queries.md
Matched dimensions: problem statement, root cause, solution, referenced files
Action: Updated existing doc with fresher code examples and prevention tips
File updated:
- docs/solutions/performance-issues/n-plus-one-queries.md (added last_updated: 2026-03-24)
```
## The Compounding Philosophy
This creates a compounding knowledge system:


@@ -0,0 +1,370 @@
---
name: ce:ideate
description: "Generate and critically evaluate grounded improvement ideas for the current project. Use when asking what to improve, requesting idea generation, exploring surprising improvements, or wanting the AI to proactively suggest strong project directions before brainstorming one in depth. Triggers on phrases like 'what should I improve', 'give me ideas', 'ideate on this project', 'surprise me with improvements', 'what would you change', or any request for AI-generated project improvement suggestions rather than refining the user's own idea."
argument-hint: "[optional: feature, focus area, or constraint]"
---
# Generate Improvement Ideas
**Note: The current year is 2026.** Use this when dating ideation documents and checking recent ideation artifacts.
`ce:ideate` precedes `ce:brainstorm`.
- `ce:ideate` answers: "What are the strongest ideas worth exploring?"
- `ce:brainstorm` answers: "What exactly should one chosen idea mean?"
- `ce:plan` answers: "How should it be built?"
This workflow produces a ranked ideation artifact in `docs/ideation/`. It does **not** produce requirements, plans, or code.
## Interaction Method
Use the platform's blocking question tool when available (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
Ask one question at a time. Prefer concise single-select choices when natural options exist.
## Focus Hint
<focus_hint> #$ARGUMENTS </focus_hint>
Interpret any provided argument as optional context. It may be:
- a concept such as `DX improvements`
- a path such as `plugins/compound-engineering/skills/`
- a constraint such as `low-complexity quick wins`
- a volume hint such as `top 3`, `100 ideas`, or `raise the bar`
If no argument is provided, proceed with open-ended ideation.
## Core Principles
1. **Ground before ideating** - Scan the actual codebase first. Do not generate abstract product advice detached from the repository.
2. **Diverge before judging** - Generate the full idea set before evaluating any individual idea.
3. **Use adversarial filtering** - The quality mechanism is explicit rejection with reasons, not optimistic ranking.
4. **Preserve the original prompt mechanism** - Generate many ideas, critique the whole list, then explain only the survivors in detail. Do not let extra process obscure this pattern.
5. **Use agent diversity to improve the candidate pool** - Parallel sub-agents are a support mechanism for richer idea generation and critique, not the core workflow itself.
6. **Preserve the artifact early** - Write the ideation document before any handoff, sharing, or session end so work survives interruptions.
7. **Route action into brainstorming** - Ideation identifies promising directions; `ce:brainstorm` defines the selected one precisely enough for planning.
## Execution Flow
### Phase 0: Resume and Scope
#### 0.1 Check for Recent Ideation Work
Look in `docs/ideation/` for ideation documents created within the last 30 days.
Treat a prior ideation doc as relevant when:
- the topic matches the requested focus
- the path or subsystem overlaps the requested focus
- the request is open-ended and there is an obvious recent open ideation doc
- the issue-grounded status matches: do not offer to resume a non-issue ideation when the current argument indicates issue-tracker intent, or vice versa — treat these as distinct topics
If a relevant doc exists, ask whether to:
1. continue from it
2. start fresh
If continuing:
- read the document
- summarize what has already been explored
- preserve previous idea statuses and session log entries
- update the existing file instead of creating a duplicate
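The 30-day lookup at the start of this step can be sketched as follows. This is a minimal illustration, not part of the skill: it assumes documents follow the `YYYY-MM-DD-<topic>-ideation.md` naming used when the artifact is written, and it treats the filename date as the creation date. The function name `recent_ideation_docs` is hypothetical.

```python
import re
from datetime import date, timedelta
from typing import Iterable, List

# Assumption: ideation docs are named YYYY-MM-DD-<topic>-ideation.md.
DATED_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-.+-ideation\.md$")


def recent_ideation_docs(
    filenames: Iterable[str], today: date, max_age_days: int = 30
) -> List[str]:
    """Return filenames dated within max_age_days of today, newest first."""
    cutoff = today - timedelta(days=max_age_days)
    recent = []
    for name in filenames:
        match = DATED_NAME.match(name)
        if match is None:
            continue  # not an ideation doc, or not dated
        try:
            created = date(*(int(part) for part in match.groups()))
        except ValueError:
            continue  # digits that do not form a real calendar date
        if cutoff <= created <= today:
            recent.append(name)
    # ISO dates sort lexicographically, so reverse order is newest first.
    return sorted(recent, reverse=True)
```

A caller could pass `(p.name for p in Path("docs/ideation").glob("*.md"))` as `filenames`. Deciding whether a recent document is actually relevant still follows the criteria listed above.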
#### 0.2 Interpret Focus and Volume
Infer three things from the argument:
- **Focus context** - concept, path, constraint, or open-ended
- **Volume override** - any hint that changes candidate or survivor counts
- **Issue-tracker intent** - whether the user wants issue/bug data as an input source
Issue-tracker intent triggers when the argument's primary intent is about analyzing issue patterns: `bugs`, `github issues`, `open issues`, `issue patterns`, `what users are reporting`, `bug reports`, `issue themes`.
Do NOT trigger on arguments that merely mention bugs as a focus: `bug in auth`, `fix the login issue`, `the signup bug` — these are focus hints, not requests to analyze the issue tracker.
When combined (e.g., `top 3 bugs in authentication`): detect issue-tracker intent first, volume override second, remainder is the focus hint. The focus narrows which issues matter; the volume override controls survivor count.
Default volume:
- each ideation sub-agent generates about 7-8 ideas (yielding 30-40 raw ideas across agents, ~20-30 after dedupe)
- keep the top 5-7 survivors
Honor clear overrides such as:
- `top 3`
- `100 ideas`
- `go deep`
- `raise the bar`
Use reasonable interpretation rather than formal parsing.
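To make the precedence concrete (issue-tracker intent first, volume override second, remainder as focus), here is a rough sketch. It is only an illustration of ordering: the skill relies on judgment rather than formal parsing, and the phrase lists, the `Interpretation` type, and the `interpret` function are assumptions introduced for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Phrases taken from the examples above; matching on them is a simplification.
ISSUE_PHRASES = (
    "github issues", "open issues", "issue patterns", "issue themes",
    "bug reports", "what users are reporting", "bugs",
)
VOLUME_PATTERNS = (r"top \d+", r"\d+ ideas", r"go deep", r"raise the bar")


@dataclass
class Interpretation:
    issue_intent: bool
    volume_hint: Optional[str]
    focus: str


def interpret(argument: str) -> Interpretation:
    remainder = argument.strip().lower()

    # 1. Issue-tracker intent is detected first.
    issue_intent = False
    for phrase in ISSUE_PHRASES:
        pattern = rf"\b{re.escape(phrase)}\b"
        if re.search(pattern, remainder):
            issue_intent = True
            remainder = re.sub(pattern, " ", remainder)

    # 2. Volume override is detected second.
    volume_hint = None
    for pattern in VOLUME_PATTERNS:
        match = re.search(rf"\b{pattern}\b", remainder)
        if match:
            volume_hint = match.group(0)
            remainder = remainder.replace(volume_hint, " ", 1)
            break

    # 3. Whatever is left is the focus hint.
    return Interpretation(issue_intent, volume_hint, " ".join(remainder.split()))
```

With this sketch, `top 3 bugs in authentication` yields issue intent, a `top 3` volume hint, and `in authentication` as the focus, while `bug in auth` stays a plain focus hint because singular `bug` is not an issue-analysis phrase.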
### Phase 1: Codebase Scan
Before generating ideas, gather codebase context.
Run agents in parallel in the **foreground** (do not use background dispatch — the results are needed before proceeding):
1. **Quick context scan** — dispatch a general-purpose sub-agent with this prompt:
> Read the project's AGENTS.md (or CLAUDE.md only as compatibility fallback, then README.md if neither exists), then discover the top-level directory layout using the native file-search/glob tool (e.g., `Glob` with pattern `*` or `*/*` in Claude Code). Return a concise summary (under 30 lines) covering:
> - project shape (language, framework, top-level directory layout)
> - notable patterns or conventions
> - obvious pain points or gaps
> - likely leverage points for improvement
>
> Keep the scan shallow — read only top-level documentation and directory structure. Do not analyze GitHub issues, templates, or contribution guidelines. Do not do deep code search.
>
> Focus hint: {focus_hint}
2. **Learnings search** — dispatch `compound-engineering:research:learnings-researcher` with a brief summary of the ideation focus.
3. **Issue intelligence** (conditional) — if issue-tracker intent was detected in Phase 0.2, dispatch `compound-engineering:research:issue-intelligence-analyst` with the focus hint. If a focus hint is present, pass it so the agent can weight its clustering toward that area. Run this in parallel with agents 1 and 2.
If the agent returns an error (gh not installed, no remote, auth failure), log a warning to the user ("Issue analysis unavailable: {reason}. Proceeding with standard ideation.") and continue with the existing two-agent grounding.
If the agent reports fewer than 5 total issues, note "Insufficient issue signal for theme analysis" and proceed with default ideation frames in Phase 2.
Consolidate all results into a short grounding summary. When issue intelligence is present, keep it as a distinct section so ideation sub-agents can distinguish between code-observed and user-reported signals:
- **Codebase context** — project shape, notable patterns, obvious pain points, likely leverage points
- **Past learnings** — relevant institutional knowledge from docs/solutions/
- **Issue intelligence** (when present) — theme summaries from the issue intelligence agent, preserving theme titles, descriptions, issue counts, and trend directions
Do **not** do external research in v1.
### Phase 2: Divergent Ideation
Follow this mechanism exactly:
1. Generate the full candidate list before critiquing any idea.
2. Each sub-agent targets about 7-8 ideas by default. With 4-6 agents this yields 30-40 raw ideas, which merge and dedupe to roughly 20-30 unique candidates. Adjust the per-agent target when volume overrides apply (e.g., "100 ideas" raises it, "top 3" may lower the survivor count instead).
3. Push past the safe, obvious layer. Each agent's first few ideas tend to be predictable, so keep generating beyond them.
4. Ground every idea in the Phase 1 scan.
5. Use this prompting pattern as the backbone:
- first generate many ideas
- then challenge them systematically
- then explain only the survivors in detail
6. If the platform supports sub-agents, use them to improve diversity in the candidate pool rather than to replace the core mechanism.
7. Give each ideation sub-agent the same:
- grounding summary
- focus hint
- per-agent volume target (~7-8 ideas by default)
- instruction to generate raw candidates only, not critique
8. When using sub-agents, assign each one a different ideation frame as a **starting bias, not a constraint**. Prompt each agent to begin from its assigned perspective but follow any promising thread wherever it leads — cross-cutting ideas that span multiple frames are valuable, not out of scope.
**Frame selection depends on whether issue intelligence is active:**
**When issue-tracker intent is active and themes were returned:**
- Each theme with `confidence: high` or `confidence: medium` becomes an ideation frame. The frame prompt uses the theme title and description as the starting bias.
- If fewer than 4 cluster-derived frames, pad with default frames in this order: "leverage and compounding effects", "assumption-breaking or reframing", "inversion, removal, or automation of a painful step". These complement issue-grounded themes by pushing beyond the reported problems.
- Cap at 6 total frames. If more than 6 themes qualify, use the top 6 by issue count; note remaining themes in the grounding summary as "minor themes" so sub-agents are still aware of them.
**When issue-tracker intent is NOT active (default):**
- user or operator pain and friction
- unmet need or missing capability
- inversion, removal, or automation of a painful step
- assumption-breaking or reframing
- leverage and compounding effects
- extreme cases, edge cases, or power-user pressure
9. Ask each ideation sub-agent to return a standardized structure for each idea so the orchestrator can merge and reason over the outputs consistently. Prefer a compact JSON-like structure with:
- title
- summary
- why_it_matters
- evidence or grounding hooks
- optional local signals such as boldness or focus_fit
10. Merge and dedupe the sub-agent outputs into one master candidate list.
11. **Synthesize cross-cutting combinations.** After deduping, scan the merged list for ideas from different frames that together suggest something stronger than either alone. If two or more ideas naturally combine into a higher-leverage proposal, add the combined idea to the list (expect 3-5 additions at most). This synthesis step belongs to the orchestrator because it requires seeing all ideas simultaneously.
12. Spread ideas across multiple dimensions when justified:
- workflow/DX
- reliability
- extensibility
- missing capabilities
- docs/knowledge compounding
- quality and maintenance
- leverage on future work
13. If a focus was provided, pass it to every ideation sub-agent and weight the merged list toward it without excluding stronger adjacent ideas.
The mechanism to preserve is:
- generate many ideas first
- critique the full combined list second
- explain only the survivors in detail
The sub-agent pattern to preserve is:
- independent ideation with frames as starting biases first
- orchestrator merge, dedupe, and cross-cutting synthesis second
- critique only after the combined and synthesized list exists
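The orchestrator's merge-and-dedupe step might look roughly like this. It is a sketch under simplifying assumptions: duplicates are detected by normalized title only (real deduplication is a semantic judgment), each idea is a dict with at least a `title` key, and the `frames` field and function names are hypothetical.

```python
import re
from typing import Dict, List


def normalize(title: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split())


def merge_and_dedupe(agent_outputs: Dict[str, List[dict]]) -> List[dict]:
    """Merge per-frame idea lists into one master list.

    The first occurrence of an idea wins; later duplicates only record
    which additional frame also produced it.
    """
    merged: Dict[str, dict] = {}
    for frame, ideas in agent_outputs.items():
        for idea in ideas:
            key = normalize(idea["title"])
            if key in merged:
                merged[key]["frames"].append(frame)
            else:
                merged[key] = {**idea, "frames": [frame]}
    return list(merged.values())
```

Recording every frame that produced an idea gives the cross-cutting synthesis step a cheap signal: ideas reached from several starting biases are natural candidates for combination.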
### Phase 3: Adversarial Filtering
Review every generated idea critically.
Prefer a two-layer critique:
1. Have one or more skeptical sub-agents attack the merged list from distinct angles.
2. Have the orchestrator synthesize those critiques, apply the rubric consistently, score the survivors, and decide the final ranking.
Do not let critique agents generate replacement ideas in this phase unless explicitly refining.
Critique agents may provide local judgments, but final scoring authority belongs to the orchestrator so the ranking stays consistent across different frames and perspectives.
For each rejected idea, write a one-line reason.
Use rejection criteria such as:
- too vague
- not actionable
- duplicates a stronger idea
- not grounded in the current codebase
- too expensive relative to likely value
- already covered by existing workflows or docs
- interesting but better handled as a brainstorm variant, not a product improvement
Use a consistent survivor rubric that weighs:
- groundedness in the current repo
- expected value
- novelty
- pragmatism
- leverage on future work
- implementation burden
- overlap with stronger ideas
Target output:
- keep 5-7 survivors by default
- if too many survive, run a second stricter pass
- if fewer than 5 survive, report that honestly rather than lowering the bar
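One way to picture the survivor rubric and target output is the sketch below. The numeric scale, equal weighting, field names, and the `bar` threshold are all assumptions made for illustration; the skill does not prescribe a formula, and final scoring remains an orchestrator judgment.

```python
from dataclasses import dataclass
from typing import Dict, List

# Rubric dimensions from the list above; burden and overlap count against an idea.
POSITIVE = ("groundedness", "expected_value", "novelty", "pragmatism", "leverage")
NEGATIVE = ("implementation_burden", "overlap")


def rubric_score(ratings: Dict[str, int]) -> int:
    """Equal-weight score: sum of positives minus sum of negatives."""
    return sum(ratings[k] for k in POSITIVE) - sum(ratings[k] for k in NEGATIVE)


@dataclass
class FilterResult:
    survivors: List[str]       # titles that cleared the bar, best first
    needs_stricter_pass: bool  # more than the maximum survived
    below_target: bool         # fewer than the minimum survived; report honestly


def filter_candidates(
    candidates: Dict[str, Dict[str, int]],
    bar: int,
    max_survivors: int = 7,
    min_target: int = 5,
) -> FilterResult:
    passing = [title for title, r in candidates.items() if rubric_score(r) >= bar]
    passing.sort(key=lambda title: rubric_score(candidates[title]), reverse=True)
    return FilterResult(
        survivors=passing,
        needs_stricter_pass=len(passing) > max_survivors,
        below_target=len(passing) < min_target,
    )
```

Note that the sketch never lowers `bar` to reach the target count; it only flags the shortfall, matching the rule to report a small survivor set honestly.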
### Phase 4: Present the Survivors
Present the surviving ideas to the user before writing the durable artifact.
This first presentation is a review checkpoint, not the final archived result.
Present only the surviving ideas in structured form:
- title
- description
- rationale
- downsides
- confidence score
- estimated complexity
Then include a brief rejection summary so the user can see what was considered and cut.
Keep the presentation concise. The durable artifact holds the full record.
Allow brief follow-up questions and lightweight clarification before writing the artifact.
Do not write the ideation doc yet unless:
- the user indicates the candidate set is good enough to preserve
- the user asks to refine and continue in a way that should be recorded
- the workflow is about to hand off to `ce:brainstorm`, Proof sharing, or session end
### Phase 5: Write the Ideation Artifact
Write the ideation artifact after the candidate set has been reviewed enough to preserve.
Always write or update the artifact before:
- handing off to `ce:brainstorm`
- sharing to Proof
- ending the session
To write the artifact:
1. Ensure `docs/ideation/` exists
2. Choose the file path:
- `docs/ideation/YYYY-MM-DD-<topic>-ideation.md`
- `docs/ideation/YYYY-MM-DD-open-ideation.md` when no focus exists
3. Write or update the ideation document
Use this structure and omit clearly irrelevant fields only when necessary:
```markdown
---
date: YYYY-MM-DD
topic: <kebab-case-topic>
focus: <optional focus hint>
---
# Ideation: <Title>
## Codebase Context
[Grounding summary from Phase 1]
## Ranked Ideas
### 1. <Idea Title>
**Description:** [Concrete explanation]
**Rationale:** [Why this improves the project]
**Downsides:** [Tradeoffs or costs]
**Confidence:** [0-100%]
**Complexity:** [Low / Medium / High]
**Status:** [Unexplored / Explored]
## Rejection Summary
| # | Idea | Reason Rejected |
|---|------|-----------------|
| 1 | <Idea> | <Reason rejected> |
## Session Log
- YYYY-MM-DD: Initial ideation — <candidate count> generated, <survivor count> survived
```
If resuming:
- update the existing file in place
- append to the session log
- preserve explored markers
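The file path choice in step 2 above can be sketched as a small helper. The slug rule (lowercase, non-alphanumerics collapsed to hyphens) is an assumption based on the `<kebab-case-topic>` field in the template, and `ideation_path` is a hypothetical name.

```python
import re
from datetime import date
from typing import Optional


def ideation_path(today: date, topic: Optional[str] = None) -> str:
    """Build the artifact path, falling back to the open-ideation name."""
    # Assumed slug rule: lowercase and collapse anything non-alphanumeric to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", (topic or "").lower()).strip("-")
    stem = f"{slug}-ideation" if slug else "open-ideation"
    return f"docs/ideation/{today.isoformat()}-{stem}.md"
```

When resuming, the existing file path takes precedence over a freshly computed one so that no duplicate document is created.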
### Phase 6: Refine or Hand Off
After presenting the results, ask what should happen next.
Offer these options:
1. brainstorm a selected idea
2. refine the ideation
3. share to Proof
4. end the session
#### 6.1 Brainstorm a Selected Idea
If the user selects an idea:
- write or update the ideation doc first
- mark that idea as `Explored`
- note the brainstorm date in the session log
- invoke `ce:brainstorm` with the selected idea as the seed
Do **not** skip brainstorming and go straight to planning from ideation output.
#### 6.2 Refine the Ideation
Route refinement by intent:
- `add more ideas` or `explore new angles` -> return to Phase 2
- `re-evaluate` or `raise the bar` -> return to Phase 3
- `dig deeper on idea #N` -> expand only that idea's analysis
After each refinement:
- update the ideation document before any handoff, sharing, or session end
- append a session log entry
#### 6.3 Share to Proof
If requested, share the ideation document using the standard Proof markdown upload pattern already used elsewhere in the plugin.
Return to the next-step options after sharing.
#### 6.4 End the Session
When ending:
- offer to commit only the ideation doc
- do not create a branch
- do not push
- if the user declines, leave the file uncommitted
## Quality Bar
Before finishing, check:
- the idea set is grounded in the actual repo
- the candidate list was generated before filtering
- the original many-ideas -> critique -> survivors mechanism was preserved
- if sub-agents were used, they improved diversity without replacing the core workflow
- every rejected idea has a reason
- survivors are materially better than a naive "give me ideas" list
- the artifact was written before any handoff, sharing, or session end
- acting on an idea routes to `ce:brainstorm`, not directly to implementation

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,31 @@
# Diff Scope Rules
These rules apply to every reviewer. They define what is "your code to review" versus pre-existing context.
## Scope Discovery
Determine the diff to review using this priority order:
1. **User-specified scope.** If the caller passed `BASE:`, `FILES:`, or `DIFF:` markers, use that scope exactly.
2. **Working copy changes.** If there are unstaged or staged changes (`git diff HEAD` is non-empty), review those.
3. **Unpushed commits vs base branch.** If the working copy is clean, review `git diff $(git merge-base HEAD <base>)..HEAD` where `<base>` is the default branch (main or master).
The scope step in the SKILL.md handles discovery and passes you the resolved diff. You do not need to run git commands yourself.
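For orientation, the priority order above can be sketched as a pure function on the orchestrator side. This is an illustration only: `resolve_scope`, its parameters, and the returned labels are hypothetical, and reviewers themselves still receive the resolved diff rather than running anything like this.

```python
from typing import Dict, Optional, Tuple


def resolve_scope(
    markers: Dict[str, str], working_copy_dirty: bool, base: str = "main"
) -> Tuple[str, Optional[str]]:
    """Return (scope kind, git command to run), following the priority order.

    For a user-specified scope there is no command: the caller's markers
    are used exactly as given.
    """
    # 1. User-specified scope wins outright.
    if any(key in markers for key in ("BASE", "FILES", "DIFF")):
        return ("user-specified", None)
    # 2. Staged or unstaged changes in the working copy.
    if working_copy_dirty:
        return ("working-copy", "git diff HEAD")
    # 3. Clean working copy: unpushed commits against the base branch.
    return ("unpushed-commits", f"git diff $(git merge-base HEAD {base})..HEAD")
```

Here `working_copy_dirty` stands for "`git diff HEAD` is non-empty", and `base` is whichever default branch the repository uses (main or master).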
## Finding Classification Tiers
Every finding you report falls into one of three tiers based on its relationship to the diff:
### Primary (directly changed code)
Lines added or modified in the diff. This is your main focus. Report findings against these lines at full confidence.
### Secondary (immediately surrounding code)
Unchanged code within the same function, method, or block as a changed line. If a change introduces a bug that's only visible by reading the surrounding context, report it -- but note that the issue exists in the interaction between new and existing code.
### Pre-existing (unrelated to this diff)
Issues in unchanged code that the diff didn't touch and doesn't interact with. Mark these as `"pre_existing": true` in your output. They're reported separately and don't count toward the review verdict.
**The rule:** If you would flag the same issue whether or not this diff existed, it's pre-existing. If the diff makes the issue *newly relevant* (e.g., a new caller hits an existing buggy function), it's secondary.
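The three tiers can be summarized as a small decision function. This is a sketch: the boolean inputs are hypothetical stand-ins for judgments a reviewer makes while reading the diff, and `classify_finding` is not part of the skill.

```python
def classify_finding(
    line_changed: bool,
    in_same_block_as_change: bool,
    diff_makes_newly_relevant: bool,
) -> str:
    """Map a finding's relationship to the diff onto a tier name."""
    if line_changed:
        return "primary"  # added or modified lines: report at full confidence
    if in_same_block_as_change or diff_makes_newly_relevant:
        return "secondary"  # unchanged code that interacts with the change
    return "pre_existing"  # report with "pre_existing": true
```

Only the last branch sets `"pre_existing": true` in the output; primary and secondary findings both count toward the review verdict.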


@@ -0,0 +1,128 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Code Review Findings",
"description": "Structured output schema for code review sub-agents",
"type": "object",
"required": ["reviewer", "findings", "residual_risks", "testing_gaps"],
"properties": {
"reviewer": {
"type": "string",
"description": "Persona name that produced this output (e.g., 'correctness', 'security')"
},
"findings": {
"type": "array",
"description": "List of code review findings. Empty array if no issues found.",
"items": {
"type": "object",
"required": [
"title",
"severity",
"file",
"line",
"why_it_matters",
"autofix_class",
"owner",
"requires_verification",
"confidence",
"evidence",
"pre_existing"
],
"properties": {
"title": {
"type": "string",
"description": "Short, specific issue title. 10 words or fewer.",
"maxLength": 100
},
"severity": {
"type": "string",
"enum": ["P0", "P1", "P2", "P3"],
"description": "Issue severity level"
},
"file": {
"type": "string",
"description": "Relative file path from repository root"
},
"line": {
"type": "integer",
"description": "Primary line number of the issue",
"minimum": 1
},
"why_it_matters": {
"type": "string",
"description": "Impact and failure mode -- not 'what is wrong' but 'what breaks'"
},
"autofix_class": {
"type": "string",
"enum": ["safe_auto", "gated_auto", "manual", "advisory"],
"description": "Reviewer's conservative recommendation for how this issue should be handled after synthesis"
},
"owner": {
"type": "string",
"enum": ["review-fixer", "downstream-resolver", "human", "release"],
"description": "Who should own the next action for this finding after synthesis"
},
"requires_verification": {
"type": "boolean",
"description": "Whether any fix for this finding must be re-verified with targeted tests or a follow-up review pass"
},
"suggested_fix": {
"type": ["string", "null"],
"description": "Concrete minimal fix. Omit or null if no good fix is obvious -- a bad suggestion is worse than none."
},
"confidence": {
"type": "number",
"description": "Reviewer confidence in this finding, calibrated per persona",
"minimum": 0.0,
"maximum": 1.0
},
"evidence": {
"type": "array",
"description": "Code-grounded evidence: snippets, line references, or pattern descriptions. At least 1 item.",
"items": { "type": "string" },
"minItems": 1
},
"pre_existing": {
"type": "boolean",
"description": "True if this issue exists in unchanged code unrelated to the current diff"
}
}
}
},
"residual_risks": {
"type": "array",
"description": "Risks the reviewer noticed but could not confirm as findings",
"items": { "type": "string" }
},
"testing_gaps": {
"type": "array",
"description": "Missing test coverage the reviewer identified",
"items": { "type": "string" }
}
},
"_meta": {
"confidence_thresholds": {
"suppress": "Below 0.60 -- do not report. Finding is speculative noise.",
"flag": "0.60-0.69 -- include only when the persona's calibration says the issue is actionable at that confidence.",
"report": "0.70+ -- report with full confidence."
},
"severity_definitions": {
"P0": "Critical breakage, exploitable vulnerability, data loss/corruption. Must fix before merge.",
"P1": "High-impact defect likely hit in normal usage, breaking contract. Should fix.",
"P2": "Moderate issue with meaningful downside (edge case, perf regression, maintainability trap). Fix if straightforward.",
"P3": "Low-impact, narrow scope, minor improvement. User's discretion."
},
"autofix_classes": {
"safe_auto": "Local, deterministic code or test fix suitable for the in-skill fixer in autonomous mode.",
"gated_auto": "Concrete fix exists, but it changes behavior, permissions, contracts, or other sensitive areas that deserve explicit approval.",
"manual": "Actionable issue that should become residual work rather than an in-skill autofix.",
"advisory": "Informational or operational item that should be surfaced in the report only."
},
"owners": {
"review-fixer": "The in-skill fixer can own this when policy allows.",
"downstream-resolver": "Turn this into residual work for later resolution.",
"human": "A person must make a judgment call before code changes should continue.",
"release": "Operational or rollout follow-up; do not convert into code-fix work automatically."
}
}
}
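As a reading aid for the schema above, the sketch below shows how a consumer might check a finding's required fields and apply the `_meta.confidence_thresholds` bands. It deliberately avoids a JSON Schema library and is not a full validator; the function names are assumptions.

```python
from typing import List

# Mirrors the "required" array for each item in "findings".
REQUIRED_FINDING_FIELDS = (
    "title", "severity", "file", "line", "why_it_matters", "autofix_class",
    "owner", "requires_verification", "confidence", "evidence", "pre_existing",
)


def missing_fields(finding: dict) -> List[str]:
    """List required keys that are absent from a finding."""
    return [key for key in REQUIRED_FINDING_FIELDS if key not in finding]


def confidence_action(confidence: float) -> str:
    """Apply the suppress / flag / report bands from _meta."""
    if confidence < 0.60:
        return "suppress"  # speculative noise, do not report
    if confidence < 0.70:
        return "flag"  # include only if the persona's calibration allows it
    return "report"
```

`suggested_fix` is intentionally absent from the required tuple, since the schema allows it to be omitted or null.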


@@ -0,0 +1,63 @@
# Persona Catalog
13 reviewer personas organized in three tiers, plus CE-specific agents. The orchestrator uses this catalog to select which reviewers to spawn for each review.
## Always-on (3 personas + 2 CE agents)
Spawned on every review regardless of diff content.
**Persona agents (structured JSON output):**
| Persona | Agent | Focus |
|---------|-------|-------|
| `correctness` | `compound-engineering:review:correctness-reviewer` | Logic errors, edge cases, state bugs, error propagation, intent compliance |
| `testing` | `compound-engineering:review:testing-reviewer` | Coverage gaps, weak assertions, brittle tests, missing edge case tests |
| `maintainability` | `compound-engineering:review:maintainability-reviewer` | Coupling, complexity, naming, dead code, premature abstraction |
**CE agents (unstructured output, synthesized separately):**
| Agent | Focus |
|-------|-------|
| `compound-engineering:review:agent-native-reviewer` | Verify new features are agent-accessible |
| `compound-engineering:research:learnings-researcher` | Search docs/solutions/ for past issues related to this PR's modules and patterns |
## Conditional (5 personas)
Spawned when the orchestrator identifies relevant patterns in the diff. The orchestrator reads the full diff and reasons about selection -- this is agent judgment, not keyword matching.
| Persona | Agent | Select when diff touches... |
|---------|-------|---------------------------|
| `security` | `compound-engineering:review:security-reviewer` | Auth middleware, public endpoints, user input handling, permission checks, secrets management |
| `performance` | `compound-engineering:review:performance-reviewer` | Database queries, ORM calls, loop-heavy data transforms, caching layers, async/concurrent code |
| `api-contract` | `compound-engineering:review:api-contract-reviewer` | Route definitions, serializer/interface changes, event schemas, exported type signatures, API versioning |
| `data-migrations` | `compound-engineering:review:data-migrations-reviewer` | Migration files, schema changes, backfill scripts, data transformations |
| `reliability` | `compound-engineering:review:reliability-reviewer` | Error handling, retry logic, circuit breakers, timeouts, background jobs, async handlers, health checks |
## Language & Framework Conditional (5 personas)
Spawned when the orchestrator identifies language or framework-specific patterns in the diff. These provide deeper domain expertise than the general-purpose personas above.
| Persona | Agent | Select when diff touches... |
|---------|-------|---------------------------|
| `python-quality` | `compound-engineering:review:kieran-python-reviewer` | Python files, FastAPI routes, Pydantic models, async/await patterns, SQLAlchemy usage |
| `fastapi-philosophy` | `compound-engineering:review:tiangolo-fastapi-reviewer` | FastAPI application code, dependency injection, response models, middleware, OpenAPI schemas |
| `typescript-quality` | `compound-engineering:review:kieran-typescript-reviewer` | TypeScript files, React components, type definitions, generic patterns |
| `frontend-races` | `compound-engineering:review:julik-frontend-races-reviewer` | Frontend JavaScript, Stimulus controllers, event listeners, async UI code, animations, DOM lifecycle |
| `architecture` | `compound-engineering:review:architecture-strategist` | New services, module boundaries, dependency graphs, API layer changes, package structure |
## CE Conditional Agents (migration-specific)
These CE-native agents provide specialized analysis beyond what the persona agents cover. Spawn them when the diff includes database migrations, schema.rb, or data backfills.
| Agent | Focus |
|-------|-------|
| `compound-engineering:review:schema-drift-detector` | Cross-references schema.rb changes against included migrations to catch unrelated drift |
| `compound-engineering:review:deployment-verification-agent` | Produces Go/No-Go deployment checklist with SQL verification queries and rollback procedures |
## Selection rules
1. **Always spawn all 3 always-on personas** plus the 2 CE always-on agents.
2. **For each conditional persona**, the orchestrator reads the diff and decides whether the persona's domain is relevant. This is a judgment call, not a keyword match.
3. **For language/framework conditional personas**, spawn when the diff contains files matching the persona's language or framework domain. Multiple language personas can be active simultaneously (e.g., both `python-quality` and `typescript-quality` if the diff touches both).
4. **For CE conditional agents**, spawn when the diff includes migration files (`db/migrate/*.rb`, `db/schema.rb`) or data backfill scripts.
5. **Announce the team** before spawning with a one-line justification per conditional reviewer selected.
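The mechanical parts of these rules can be sketched as below. Several things here are assumptions: the file-extension patterns for the language personas, the short agent names, and the `select_team` function itself. Conditional personas are passed in as `judged_relevant` because their selection is an orchestrator judgment, not a pattern match, and backfill scripts are not detected by this sketch.

```python
from fnmatch import fnmatch
from typing import Iterable, List

ALWAYS_ON = [
    "correctness", "testing", "maintainability",
    "agent-native-reviewer", "learnings-researcher",
]
# Assumed extension patterns; the catalog only says "Python files" etc.
LANGUAGE_PATTERNS = {
    "python-quality": ("*.py",),
    "typescript-quality": ("*.ts", "*.tsx"),
}
MIGRATION_PATTERNS = ("db/migrate/*.rb", "db/schema.rb")
CE_CONDITIONAL = ["schema-drift-detector", "deployment-verification-agent"]


def matches_any(path: str, patterns: Iterable[str]) -> bool:
    return any(fnmatch(path, pattern) for pattern in patterns)


def select_team(changed_files: List[str], judged_relevant: List[str]) -> List[str]:
    """Return reviewer names: always-on, judged, language, then CE conditional."""
    team = list(ALWAYS_ON)  # rule 1
    team.extend(p for p in judged_relevant if p not in team)  # rule 2
    for persona, patterns in LANGUAGE_PATTERNS.items():  # rule 3
        if any(matches_any(f, patterns) for f in changed_files):
            team.append(persona)
    if any(matches_any(f, MIGRATION_PATTERNS) for f in changed_files):  # rule 4
        team.extend(CE_CONDITIONAL)
    return team
```

The announcement in rule 5 would then list each entry beyond the always-on set with its one-line justification.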


@@ -0,0 +1,115 @@
# Code Review Output Template
Use this **exact format** when presenting synthesized review findings. Findings are grouped by severity, not by reviewer.
**IMPORTANT:** Use pipe-delimited markdown tables (`| col | col |`). Do NOT use ASCII box-drawing characters.
## Example
```markdown
## Code Review Results
**Scope:** merge-base with the review base branch -> working tree (14 files, 342 lines)
**Intent:** Add order export endpoint with CSV and JSON format support
**Mode:** autofix
**Reviewers:** correctness, testing, maintainability, security, performance, api-contract
- security -- new public endpoint accepts user-provided format parameter
- performance -- export service loads and serializes orders in bulk
- api-contract -- new /api/orders/export route with response schema
### P0 -- Critical
| # | File | Issue | Reviewer | Confidence | Route |
|---|------|-------|----------|------------|-------|
| 1 | `orders_controller.rb:42` | User-supplied ID in account lookup without ownership check | security | 0.92 | `gated_auto -> downstream-resolver` |
### P1 -- High
| # | File | Issue | Reviewer | Confidence | Route |
|---|------|-------|----------|------------|-------|
| 2 | `export_service.rb:87` | Loads all orders into memory -- unbounded for large accounts | performance | 0.85 | `safe_auto -> review-fixer` |
| 3 | `export_service.rb:91` | No pagination -- response size grows linearly with order count | api-contract, performance | 0.80 | `manual -> downstream-resolver` |
### P2 -- Moderate
| # | File | Issue | Reviewer | Confidence | Route |
|---|------|-------|----------|------------|-------|
| 4 | `export_service.rb:45` | Missing error handling for CSV serialization failure | correctness | 0.75 | `safe_auto -> review-fixer` |
### P3 -- Low
| # | File | Issue | Reviewer | Confidence | Route |
|---|------|-------|----------|------------|-------|
| 5 | `export_helper.rb:12` | Format detection could use early return instead of nested conditional | maintainability | 0.70 | `advisory -> human` |
### Applied Fixes
- `safe_auto`: Added bounded export pagination guard and CSV serialization failure test coverage in this run
### Residual Actionable Work
| # | File | Issue | Route | Next Step |
|---|------|-------|-------|-----------|
| 1 | `orders_controller.rb:42` | Ownership check missing on export lookup | `gated_auto -> downstream-resolver` | Create residual todo and require explicit approval before behavior change |
| 2 | `export_service.rb:91` | Pagination contract needs a broader API decision | `manual -> downstream-resolver` | Create residual todo with contract and client impact details |
### Pre-existing Issues
| # | File | Issue | Reviewer |
|---|------|-------|----------|
| 1 | `orders_controller.rb:12` | Broad rescue masking failed permission check | correctness |
### Learnings & Past Solutions
- [Known Pattern] `docs/solutions/export-pagination.md` -- previous export pagination fix applies to this endpoint
### Agent-Native Gaps
- New export endpoint has no CLI/agent equivalent -- agent users cannot trigger exports
### Schema Drift Check
- Clean: schema.rb changes match the migrations in scope
### Deployment Notes
- Pre-deploy: capture baseline row counts before enabling the export backfill
- Verify: `SELECT COUNT(*) FROM exports WHERE status IS NULL;` should stay at `0`
- Rollback: keep the old export path available until the backfill has been validated
### Coverage
- Suppressed: 2 findings below 0.60 confidence
- Residual risks: No rate limiting on export endpoint
- Testing gaps: No test for concurrent export requests
---
> **Verdict:** Ready with fixes
>
> **Reasoning:** 1 critical auth bypass must be fixed. The memory/pagination issues (P1) should be addressed for production safety.
>
> **Fix order:** P0 auth bypass -> P1 memory/pagination -> P2 error handling if straightforward
```
## Formatting Rules
- **Pipe-delimited markdown tables** -- never ASCII box-drawing characters
- **Severity-grouped sections** -- `### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`. Omit empty severity levels.
- **Always include file:line location** for code review issues
- **Reviewer column** shows which persona(s) flagged the issue. Multiple reviewers = cross-reviewer agreement.
- **Confidence column** shows the finding's confidence score
- **Route column** shows the synthesized handling decision as ``<autofix_class> -> <owner>``.
- **Header includes** scope, intent, and reviewer team with per-conditional justifications
- **Mode line** -- include `interactive`, `autofix`, or `report-only`
- **Applied Fixes section** -- include only when a fix phase ran in this review invocation
- **Residual Actionable Work section** -- include only when unresolved actionable findings were handed off for later work
- **Pre-existing section** -- separate table, no confidence column (these are informational)
- **Learnings & Past Solutions section** -- results from learnings-researcher, with links to docs/solutions/ files
- **Agent-Native Gaps section** -- results from agent-native-reviewer. Omit if no gaps found.
- **Schema Drift Check section** -- results from schema-drift-detector. Omit if the agent did not run.
- **Deployment Notes section** -- key checklist items from deployment-verification-agent. Omit if the agent did not run.
- **Coverage section** -- suppressed count, residual risks, testing gaps, failed reviewers
- **Summary uses blockquotes** for verdict, reasoning, and fix order
- **Horizontal rule** (`---`) separates findings from verdict
- **`###` headers** for each section -- never plain text headers
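To show how a synthesized finding maps onto one row of a severity table, here is a small rendering sketch. It assumes findings use the field names from the findings schema, and `finding_row` is a hypothetical helper rather than part of the template.

```python
from typing import List


def finding_row(index: int, finding: dict, reviewers: List[str]) -> str:
    """Render one pipe-delimited row: # | File | Issue | Reviewer | Confidence | Route."""
    location = f"`{finding['file']}:{finding['line']}`"
    route = f"`{finding['autofix_class']} -> {finding['owner']}`"
    return (
        f"| {index} | {location} | {finding['title']} | "
        f"{', '.join(reviewers)} | {finding['confidence']:.2f} | {route} |"
    )
```

Passing several names in `reviewers` produces the comma-separated form that signals cross-reviewer agreement.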


@@ -0,0 +1,56 @@
# Sub-agent Prompt Template
This template is used by the orchestrator to spawn each reviewer sub-agent. Variable substitution slots are filled at spawn time.
---
## Template
```
You are a specialist code reviewer.
<persona>
{persona_file}
</persona>
<scope-rules>
{diff_scope_rules}
</scope-rules>
<output-contract>
Return ONLY valid JSON matching the findings schema below. No prose, no markdown, no explanation outside the JSON object.
{schema}
Rules:
- Suppress any finding below your stated confidence floor (see your Confidence calibration section).
- Every finding MUST include at least one evidence item grounded in the actual code.
- Set pre_existing to true ONLY for issues in unchanged code that are unrelated to this diff. If the diff makes the issue newly relevant, it is NOT pre-existing.
- You are operationally read-only. You may use non-mutating inspection commands, including read-oriented `git` / `gh` commands, to gather evidence. Do not edit files, change branches, commit, push, create PRs, or otherwise mutate the checkout or repository state.
- Set `autofix_class` conservatively. Use `safe_auto` only when the fix is local, deterministic, and low-risk. Use `gated_auto` when a concrete fix exists but changes behavior/contracts/permissions. Use `manual` for actionable residual work. Use `advisory` for report-only items that should not become code-fix work.
- Set `owner` to the default next actor for this finding: `review-fixer`, `downstream-resolver`, `human`, or `release`.
- Set `requires_verification` to true whenever the likely fix needs targeted tests, a focused re-review, or operational validation before it should be trusted.
- suggested_fix is optional. Only include it when the fix is obvious and correct. A bad suggestion is worse than none.
- If you find no issues, return an empty findings array. Still populate residual_risks and testing_gaps if applicable.
</output-contract>
<review-context>
Intent: {intent_summary}
Changed files: {file_list}
Diff:
{diff}
</review-context>
```
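
Several of the output-contract rules above can be checked mechanically once a reviewer returns its JSON. The following is a hypothetical orchestrator-side check, assuming each finding is a dictionary with `confidence`, `evidence`, `autofix_class`, and `owner` keys; the authoritative layout is `references/findings-schema.json`, which is not reproduced here.

```python
AUTOFIX_CLASSES = {"safe_auto", "gated_auto", "manual", "advisory"}
OWNERS = {"review-fixer", "downstream-resolver", "human", "release"}


def check_finding(finding, confidence_floor):
    """Return a list of contract violations for one finding (empty if none)."""
    problems = []
    if finding.get("confidence", 0.0) < confidence_floor:
        problems.append("below confidence floor; should have been suppressed")
    if not finding.get("evidence"):
        problems.append("missing evidence item")
    if finding.get("autofix_class") not in AUTOFIX_CLASSES:
        problems.append("unknown autofix_class")
    if finding.get("owner") not in OWNERS:
        problems.append("unknown owner")
    return problems
```

A check like this only covers the structural rules. Whether `pre_existing` or `requires_verification` was set sensibly still needs judgment.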
## Variable Reference
| Variable | Source | Description |
|----------|--------|-------------|
| `{persona_file}` | Agent markdown file content | The full persona definition (identity, failure modes, calibration, suppress conditions) |
| `{diff_scope_rules}` | `references/diff-scope.md` content | Primary/secondary/pre-existing tier rules |
| `{schema}` | `references/findings-schema.json` content | The JSON schema reviewers must conform to |
| `{intent_summary}` | Stage 2 output | 2-3 line description of what the change is trying to accomplish |
| `{file_list}` | Stage 1 output | List of changed files from the scope step |
| `{diff}` | Stage 1 output | The actual diff content to review |
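
One way the substitution might be implemented is sketched below. The function name and the single-pass regex approach are assumptions for illustration; the source does not say how the orchestrator fills the slots.

```python
import re

# Slot names taken from the Variable Reference table above.
SLOTS = (
    "persona_file",
    "diff_scope_rules",
    "schema",
    "intent_summary",
    "file_list",
    "diff",
)
_SLOT_RE = re.compile(r"\{(" + "|".join(SLOTS) + r")\}")


def fill_template(template, values):
    """Replace each known {slot} with its value in a single pass.

    Raises KeyError if the template uses a slot that `values` lacks.
    """
    def substitute(match):
        return values[match.group(1)]

    return _SLOT_RE.sub(substitute, template)
```

`str.format` is deliberately avoided here: the diff and the JSON schema contain braces of their own, which `format` would try to interpret. A single regex pass also means that text inserted for one slot is never rescanned for other slot names.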


@@ -0,0 +1,564 @@
---
name: ce:work-beta
description: "[BETA] Execute work plans with external delegate support. Same as ce:work but includes experimental Codex delegation mode for token-conserving code implementation."
argument-hint: "[plan file, specification, or todo file path]"
disable-model-invocation: true
---
# Work Plan Execution Command
Execute a work plan efficiently while maintaining quality and finishing features.
## Introduction
This command takes a work document (plan, specification, or todo file) and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
## Input Document
<input_document> #$ARGUMENTS </input_document>
## Execution Workflow
### Phase 1: Quick Start
1. **Read Plan and Clarify**
- Read the work document completely
- Treat the plan as a decision artifact, not an execution script
- If the plan includes sections such as `Implementation Units`, `Work Breakdown`, `Requirements Trace`, `Files`, `Test Scenarios`, or `Verification`, use those as the primary source material for execution
- Check for `Execution note` on each implementation unit — these carry the plan's execution posture signal for that unit (for example, test-first or characterization-first). Note them when creating tasks.
- Check for a `Deferred to Implementation` or `Implementation-Time Unknowns` section — these are questions the planner intentionally left for you to resolve during execution. Note them before starting so they inform your approach rather than surprising you mid-task
- Check for a `Scope Boundaries` section — these are explicit non-goals. Refer back to them if implementation starts pulling you toward adjacent work
- Review any references or links provided in the plan
- If the user explicitly asks for TDD, test-first, or characterization-first execution in this session, honor that request even if the plan has no `Execution note`
- If anything is unclear or ambiguous, ask clarifying questions now
- Get user approval to proceed
- **Do not skip this** - better to ask questions now than build the wrong thing
2. **Setup Environment**
First, check the current branch:
```bash
current_branch=$(git branch --show-current)
default_branch=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@')
# Fallback if remote HEAD isn't set
if [ -z "$default_branch" ]; then
default_branch=$(git rev-parse --verify origin/main >/dev/null 2>&1 && echo "main" || echo "master")
fi
```
**If already on a feature branch** (not the default branch):
- Ask: "Continue working on `[current_branch]`, or create a new branch?"
- If continuing, proceed to step 3
- If creating new, follow Option A or B below
**If on the default branch**, choose how to proceed:
**Option A: Create a new branch**
```bash
git pull origin [default_branch]
git checkout -b feature-branch-name
```
Use a meaningful name based on the work (e.g., `feat/user-authentication`, `fix/email-validation`).
**Option B: Use a worktree (recommended for parallel development)**
```bash
skill: git-worktree
# The skill will create a new branch from the default branch in an isolated worktree
```
**Option C: Continue on the default branch**
- Requires explicit user confirmation
- Only proceed after user explicitly says "yes, commit to [default_branch]"
- Never commit directly to the default branch without explicit permission
**Recommendation**: Use worktree if:
- You want to work on multiple features simultaneously
- You want to keep the default branch clean while experimenting
- You plan to switch between branches frequently
3. **Create Todo List**
- Use your available task tracking tool (e.g., TodoWrite, task lists) to break the plan into actionable tasks
- Derive tasks from the plan's implementation units, dependencies, files, test targets, and verification criteria
- Carry each unit's `Execution note` into the task when present
- For each unit, read the `Patterns to follow` field before implementing — these point to specific files or conventions to mirror
- Use each unit's `Verification` field as the primary "done" signal for that task
- Do not expect the plan to contain implementation code, micro-step TDD instructions, or exact shell commands
- Include dependencies between tasks
- Prioritize based on what needs to be done first
- Include testing and quality check tasks
- Keep tasks specific and completable
4. **Choose Execution Strategy**
After creating the task list, decide how to execute based on the plan's size and dependency structure:
| Strategy | When to use |
|----------|-------------|
| **Inline** | 1-2 small tasks, or tasks needing user interaction mid-flight |
| **Serial subagents** | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks |
| **Parallel subagents** | 3+ tasks where some units have no shared dependencies and touch non-overlapping files. Dispatch independent units simultaneously, run dependent units after their prerequisites complete |
**Subagent dispatch** uses your available subagent or task spawning mechanism. For each unit, give the subagent:
- The full plan file path (for overall context)
- The specific unit's Goal, Files, Approach, Execution note, Patterns, Test scenarios, and Verification
- Any resolved deferred questions relevant to that unit
After each subagent completes, update the plan checkboxes and task list before dispatching the next dependent unit.
For genuinely large plans needing persistent inter-agent communication (agents challenging each other's approaches, shared coordination across 10+ tasks), see Swarm Mode below, which uses Agent Teams.
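
The parallel strategy amounts to dispatching units in dependency order, one "wave" at a time. A minimal sketch follows, assuming dependencies are available as a mapping from unit name to prerequisite names (the names and data shape are hypothetical). It does not check for overlapping files, which the strategy table also requires before running units in parallel.

```python
def plan_waves(dependencies):
    """Group units into waves that can each be dispatched in parallel.

    `dependencies` maps unit name -> set of unit names it depends on.
    Every unit in a wave has all of its prerequisites in earlier waves.
    """
    remaining = {unit: set(deps) for unit, deps in dependencies.items()}
    done = set()
    waves = []
    while remaining:
        ready = sorted(u for u, deps in remaining.items() if deps <= done)
        if not ready:
            # Nothing can start: a cycle, or a dependency on an unknown unit.
            raise ValueError(f"unsatisfiable dependencies among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for unit in ready:
            del remaining[unit]
    return waves
```

A plan where every wave holds a single unit is effectively the serial strategy. Waves with several units are candidates for parallel dispatch.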
### Phase 2: Execute
1. **Task Execution Loop**
For each task in priority order:
```
while (tasks remain):
- Mark task as in-progress
- Read any referenced files from the plan
- Look for similar patterns in codebase
- Implement following existing conventions
- Write tests for new functionality
- Run System-Wide Test Check (see below)
- Run tests after changes
- Mark task as completed
- Evaluate for incremental commit (see below)
```
When a unit carries an `Execution note`, honor it. For test-first units, write the failing test before implementation for that unit. For characterization-first units, capture existing behavior before changing it. For units without an `Execution note`, proceed pragmatically.
Guardrails for execution posture:
- Do not write the test and implementation in the same step when working test-first
- Do not skip verifying that a new test fails before implementing the fix or feature
- Do not over-implement beyond the current behavior slice when working test-first
- Skip test-first discipline for trivial renames, pure configuration, and pure styling work
**System-Wide Test Check** — Before marking a task done, pause and ask:
| Question | What to do |
|----------|------------|
| **What fires when this runs?** Callbacks, middleware, observers, event handlers — trace two levels out from your change. | Read the actual code (not docs) for callbacks on models you touch, middleware in the request chain, `after_*` hooks. |
| **Do my tests exercise the real chain?** If every dependency is mocked, the test proves your logic works *in isolation* — it says nothing about the interaction. | Write at least one integration test that uses real objects through the full callback/middleware chain. No mocks for the layers that interact. |
| **Can failure leave orphaned state?** If your code persists state (DB row, cache, file) before calling an external service, what happens when the service fails? Does retry create duplicates? | Trace the failure path with real objects. If state is created before the risky call, test that failure cleans up or that retry is idempotent. |
| **What other interfaces expose this?** Mixins, DSLs, alternative entry points (Agent vs Chat vs ChatMethods). | Grep for the method/behavior in related classes. If parity is needed, add it now — not as a follow-up. |
| **Do error strategies align across layers?** Retry middleware + application fallback + framework error handling — do they conflict or create double execution? | List the specific error classes at each layer. Verify your rescue list matches what the lower layer actually raises. |
**When to skip:** Leaf-node changes with no callbacks, no state persistence, no parallel interfaces. If the change is purely additive (new helper method, new view partial), the check takes 10 seconds and the answer is "nothing fires, skip."
**When this matters most:** Any change that touches models with callbacks, error handling with fallback/retry, or functionality exposed through multiple interfaces.
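
To make the orphaned-state question concrete, here is a small hypothetical sketch: a function that persists a record before calling an external service, cleans up if the call fails, and does not repeat the call on retry after success. The `store` dictionary, `charge` callable, and `PaymentError` are stand-ins invented for this example.

```python
class PaymentError(Exception):
    """Raised by the (hypothetical) external service on failure."""


def create_order(store, charge, order_id, amount):
    """Persist an order, then call the external service."""
    existing = store.get(order_id)
    if existing and existing["status"] == "paid":
        return existing  # retry after success: do not charge twice
    store[order_id] = {"amount": amount, "status": "pending"}
    try:
        charge(order_id, amount)
    except PaymentError:
        del store[order_id]  # remove state created before the risky call
        raise
    store[order_id]["status"] = "paid"
    return store[order_id]
```

The two properties worth testing are the ones the table names: after a failed call the store is empty, and a second successful call with the same `order_id` does not invoke `charge` again.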
2. **Incremental Commits**
After completing each task, evaluate whether to create an incremental commit:
| Commit when... | Don't commit when... |
|----------------|---------------------|
| Logical unit complete (model, service, component) | Small part of a larger unit |
| Tests pass + meaningful progress | Tests failing |
| About to switch contexts (backend → frontend) | Purely scaffolding with no behavior |
| About to attempt risky/uncertain changes | Would need a "WIP" commit message |
**Heuristic:** "Can I write a commit message that describes a complete, valuable change? If yes, commit. If the message would be 'WIP' or 'partial X', wait."
If the plan has Implementation Units, use them as a starting guide for commit boundaries — but adapt based on what you find during implementation. A unit might need multiple commits if it's larger than expected, or small related units might land together. Use each unit's Goal to inform the commit message.
**Commit workflow:**
```bash
# 1. Verify tests pass (use project's test command)
# Examples: bin/rails test, npm test, pytest, go test, etc.
# 2. Stage only files related to this logical unit (not `git add .`)
git add <files related to this logical unit>
# 3. Commit with conventional message
git commit -m "feat(scope): description of this unit"
```
**Handling merge conflicts:** If conflicts arise during rebasing or merging, resolve them immediately. Incremental commits make conflict resolution easier since each commit is small and focused.
**Note:** Incremental commits use clean conventional messages without attribution footers. The final Phase 4 commit/PR includes the full attribution.
3. **Follow Existing Patterns**
- The plan should reference similar code - read those files first
- Match naming conventions exactly
- Reuse existing components where possible
- Follow project coding standards (see AGENTS.md; use CLAUDE.md only if the repo still keeps a compatibility shim)
- When in doubt, grep for similar implementations
4. **Test Continuously**
- Run relevant tests after each significant change
- Don't wait until the end to test
- Fix failures immediately
- Add new tests for new functionality
- **Unit tests with mocks prove logic in isolation. Integration tests with real objects prove the layers work together.** If your change touches callbacks, middleware, or error handling — you need both.
5. **Simplify as You Go**
After completing a cluster of related implementation units (or every 2-3 units), review recently changed files for simplification opportunities — consolidate duplicated patterns, extract shared helpers, and improve code reuse and efficiency. This is especially valuable when using subagents, since each agent works with isolated context and can't see patterns emerging across units.
Don't simplify after every single unit — early patterns may look duplicated but diverge intentionally in later units. Wait for a natural phase boundary or when you notice accumulated complexity.
If a `/simplify` skill or equivalent is available, use it. Otherwise, review the changed files yourself for reuse and consolidation opportunities.
6. **Figma Design Sync** (if applicable)
For UI work with Figma designs:
- Implement components following design specs
- Use figma-design-sync agent iteratively to compare
- Fix visual differences identified
- Repeat until implementation matches design
7. **Frontend Design Guidance** (if applicable)
For UI tasks without a Figma design -- where the implementation touches view, template, component, layout, or page files, creates user-visible routes, or the plan contains explicit UI/frontend/design language:
- Load the `frontend-design` skill before implementing
- Follow its detection, guidance, and verification flow
- If the skill produced a verification screenshot, it satisfies Phase 4's screenshot requirement -- no need to capture separately. If the skill fell back to mental review (no browser access), Phase 4's screenshot capture still applies
8. **Track Progress**
- Keep the task list updated as you complete tasks
- Note any blockers or unexpected discoveries
- Create new tasks if scope expands
- Keep user informed of major milestones
### Phase 3: Quality Check
1. **Run Core Quality Checks**
Always run before submitting:
```bash
# Run full test suite (use project's test command)
# Examples: bin/rails test, npm test, pytest, go test, etc.
# Run linting (per AGENTS.md)
# Use linting-agent before pushing to origin
```
2. **Consider Reviewer Agents** (Optional)
Use for complex, risky, or large changes. Read agents from `compound-engineering.local.md` frontmatter (`review_agents`). If no settings file, invoke the `setup` skill to create one.
Run configured agents in parallel with Task tool. Present findings and address critical issues.
3. **Final Validation**
- All tasks marked completed
- All tests pass
- Linting passes
- Code follows existing patterns
- Figma designs match (if applicable)
- No console errors or warnings
- If the plan has a `Requirements Trace`, verify each requirement is satisfied by the completed work
- If any `Deferred to Implementation` questions were noted, confirm they were resolved during execution
4. **Prepare Operational Validation Plan** (REQUIRED)
- Add a `## Post-Deploy Monitoring & Validation` section to the PR description for every change.
- Include concrete:
  - Log queries/search terms
  - Metrics or dashboards to watch
  - Expected healthy signals
  - Failure signals and rollback/mitigation trigger
  - Validation window and owner
- If there is truly no production/runtime impact, still include the section with: `No additional operational monitoring required` and a one-line reason.
### Phase 4: Ship It
1. **Create Commit**
```bash
git add .
git status # Review what's being committed
git diff --staged # Check the changes
# Commit with conventional format
git commit -m "$(cat <<'EOF'
feat(scope): description of what and why
Brief explanation if needed.
🤖 Generated with [MODEL] via [HARNESS](HARNESS_URL) + Compound Engineering v[VERSION]
Co-Authored-By: [MODEL] ([CONTEXT] context, [THINKING]) <noreply@anthropic.com>
EOF
)"
```
**Fill in at commit/PR time:**
| Placeholder | Value | Example |
|-------------|-------|---------|
| `[MODEL]` | Model name | Claude Opus 4.6, GPT-5.4 |
| `[CONTEXT]` | Context window (if known) | 200K, 1M |
| `[THINKING]` | Thinking level (if known) | extended thinking |
| `[HARNESS]` | Tool running you | Claude Code, Codex, Gemini CLI |
| `[HARNESS_URL]` | Link to that tool | `https://claude.com/claude-code` |
| `[VERSION]` | `plugin.json` → `version` | 2.40.0 |
Subagents creating commits/PRs are equally responsible for accurate attribution.
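
Filling the placeholders can be sketched as a small helper. The function below is illustrative only; in particular, dropping `[CONTEXT]` and `[THINKING]` when they are unknown is an assumption based on the "(if known)" wording in the table.

```python
def attribution_footer(model, harness, harness_url, version,
                       context=None, thinking=None):
    """Build the two attribution lines for the final commit message."""
    parts = [f"{context} context" if context else None, thinking]
    details = ", ".join(part for part in parts if part)
    co_author = f"{model} ({details})" if details else model
    return (
        f"🤖 Generated with {model} via [{harness}]({harness_url}) "
        f"+ Compound Engineering v{version}\n"
        f"Co-Authored-By: {co_author} <noreply@anthropic.com>"
    )
```

For example, `attribution_footer("Claude Opus 4.6", "Claude Code", "https://claude.com/claude-code", "2.40.0", context="1M")` yields a `Co-Authored-By` line with `(1M context)` and no thinking level.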
2. **Capture and Upload Screenshots for UI Changes** (REQUIRED for any UI work)
For **any** design changes, new views, or UI modifications, you MUST capture and upload screenshots:
**Step 1: Start dev server** (if not running)
```bash
bin/dev # Run in background
```
**Step 2: Capture screenshots with agent-browser CLI**
```bash
agent-browser open http://localhost:3000/[route]
agent-browser snapshot -i
agent-browser screenshot output.png
```
See the `agent-browser` skill for detailed usage.
**Step 3: Upload using imgup skill**
```bash
skill: imgup
# Then upload each screenshot:
imgup -h pixhost output.png # pixhost works without API key
# Alternative hosts: catbox, imagebin, beeimg
```
**What to capture:**
- **New screens**: Screenshot of the new UI
- **Modified screens**: Before AND after screenshots
- **Design implementation**: Screenshot showing Figma design match
**IMPORTANT**: Always include uploaded image URLs in PR description. This provides visual context for reviewers and documents the change.
3. **Create Pull Request**
```bash
git push -u origin feature-branch-name
gh pr create --title "Feature: [Description]" --body "$(cat <<'EOF'
## Summary
- What was built
- Why it was needed
- Key decisions made
## Testing
- Tests added/modified
- Manual testing performed
## Post-Deploy Monitoring & Validation
- **What to monitor/search**
  - Logs:
  - Metrics/Dashboards:
- **Validation checks (queries/commands)**
  - `command or query here`
- **Expected healthy behavior**
  - Expected signal(s)
- **Failure signal(s) / rollback trigger**
  - Trigger + immediate action
- **Validation window & owner**
  - Window:
  - Owner:
- **If no operational impact**
  - `No additional operational monitoring required: <reason>`
## Before / After Screenshots
| Before | After |
|--------|-------|
| ![before](URL) | ![after](URL) |
## Figma Design
[Link if applicable]
---
[![Compound Engineering v[VERSION]](https://img.shields.io/badge/Compound_Engineering-v[VERSION]-6366f1)](https://github.com/EveryInc/compound-engineering-plugin)
🤖 Generated with [MODEL] ([CONTEXT] context, [THINKING]) via [HARNESS](HARNESS_URL)
EOF
)"
```
4. **Update Plan Status**
If the input document has YAML frontmatter with a `status` field, update it to `completed`:
```
status: active → status: completed
```
5. **Notify User**
- Summarize what was completed
- Link to PR
- Note any follow-up work needed
- Suggest next steps if applicable
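
The plan status update in step 4 can be done without a YAML parser. The sketch below is one possible approach, assuming the frontmatter is delimited by `---` lines at the very top of the file; the function name is hypothetical.

```python
import re


def mark_plan_completed(text):
    """Set `status: completed` in leading YAML frontmatter.

    Returns the text unchanged when there is no frontmatter or no
    `status` field, so documents without one are left alone.
    """
    match = re.match(r"\A---\n(.*?\n)---\n", text, flags=re.DOTALL)
    if not match:
        return text
    frontmatter = match.group(1)
    updated = re.sub(r"(?m)^status:[ \t]*.*$", "status: completed",
                     frontmatter, count=1)
    return text[:match.start(1)] + updated + text[match.end(1):]
```

Restricting the substitution to the frontmatter block avoids touching a line in the plan body that happens to start with `status:`.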
---
## Swarm Mode with Agent Teams (Optional)
For genuinely large plans where agents need to communicate with each other, challenge approaches, or coordinate across 10+ tasks with persistent specialized roles, use agent team capabilities if available (e.g., Agent Teams in Claude Code, multi-agent workflows in Codex).
**Agent teams are typically experimental and require opt-in.** Do not attempt to use agent teams unless the user explicitly requests swarm mode or agent teams, and the platform supports it.
### When to Use Agent Teams vs Subagents
| Agent Teams | Subagents (standard mode) |
|-------------|---------------------------|
| Agents need to discuss and challenge each other's approaches | Each task is independent — only the result matters |
| Persistent specialized roles (e.g., dedicated tester running continuously) | Workers report back and finish |
| 10+ tasks with complex cross-cutting coordination | 3-8 tasks with clear dependency chains |
| User explicitly requests "swarm mode" or "agent teams" | Default for most plans |
Most plans should use subagent dispatch from standard mode. Agent teams add significant token cost and coordination overhead — use them when the inter-agent communication genuinely improves the outcome.
### Agent Teams Workflow
1. **Create team** — use your available team creation mechanism
2. **Create task list** — parse Implementation Units into tasks with dependency relationships
3. **Spawn teammates** — assign specialized roles (implementer, tester, reviewer) based on the plan's needs. Give each teammate the plan file path and their specific task assignments
4. **Coordinate** — the lead monitors task completion, reassigns work if someone gets stuck, and spawns additional workers as phases unblock
5. **Cleanup** — shut down all teammates, then clean up the team resources
---
## External Delegate Mode (Optional)
For plans where token conservation matters, delegate code implementation to an external delegate (currently Codex CLI) while keeping planning, review, and git operations in the current agent.
This mode integrates with the existing Phase 1 Step 4 strategy selection as a **task-level modifier** - the strategy (inline/serial/parallel) still applies, but the implementation step within each tagged task delegates to the external tool instead of executing directly.
### When to Use External Delegation
| External Delegation | Standard Mode |
|---------------------|---------------|
| Task is pure code implementation | Task requires research or exploration |
| Plan has clear acceptance criteria | Task is ambiguous or needs iteration |
| Token conservation matters (e.g., Max20 plan) | Unlimited plan or small task |
| Files to change are well-scoped | Changes span many interconnected files |
### Enabling External Delegation
External delegation activates when any of these conditions are met:
- The user says "use codex for this work", "delegate to codex", or "delegate mode"
- A plan implementation unit contains `Execution target: external-delegate` in its Execution note (set by ce:plan)
The specific delegate tool is resolved at execution time. Currently the only supported delegate is Codex CLI. Future delegates can be added without changing plan files.
### Environment Guard
Before attempting delegation, check whether the current agent is already running inside a delegate's sandbox. Delegation from within a sandbox will fail silently or recurse.
Check for known sandbox indicators:
- `CODEX_SANDBOX` environment variable is set
- `CODEX_SESSION_ID` environment variable is set
- The filesystem is read-only at `.git/` (Codex sandbox blocks git writes)
If any indicator is detected, print "Already running inside a delegate sandbox - using standard mode." and proceed with standard execution for that task.
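
The three indicators can be checked in a few lines. This is a minimal sketch with a hypothetical function name; note that `os.access` can report a directory as writable for privileged users even when writes would fail, so the `.git/` check is a heuristic.

```python
import os

SANDBOX_ENV_VARS = ("CODEX_SANDBOX", "CODEX_SESSION_ID")


def inside_delegate_sandbox(environ=None, repo_root="."):
    """Return True if any known sandbox indicator is present."""
    environ = os.environ if environ is None else environ
    if any(name in environ for name in SANDBOX_ENV_VARS):
        return True
    git_dir = os.path.join(repo_root, ".git")
    # A read-only .git/ suggests a sandbox that blocks git writes.
    return os.path.isdir(git_dir) and not os.access(git_dir, os.W_OK)
```

Passing `environ` explicitly makes the guard easy to test without modifying the real process environment.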
### External Delegation Workflow
When external delegation is active, follow this workflow for each tagged task. Do not skip delegation because a task seems "small", "simple", or "faster inline". The user or plan explicitly requested delegation.
1. **Check availability**
Verify the delegate CLI is installed. If not found, print "Delegate CLI not installed - continuing with standard mode." and proceed normally.
2. **Build prompt** — For each task, assemble a prompt from the plan's implementation unit (Goal, Files, Approach, Conventions from `compound-engineering.local.md`). Include rules: no git commits, no PRs, run `git status` and `git diff --stat` when done. Never embed credentials or tokens in the prompt - pass auth through environment variables.
3. **Write prompt to file** — Save the assembled prompt to a unique temporary file to avoid shell quoting issues and cross-task races. Use a unique filename per task.
4. **Delegate** — Run the delegate CLI, piping the prompt file via stdin (not argv expansion, which hits `ARG_MAX` on large prompts). Omit the model flag to use the delegate's default model, which stays current without manual updates.
5. **Review diff** — After the delegate finishes, verify the diff is non-empty and in-scope. Run the project's test/lint commands. If the diff is empty or out-of-scope, fall back to standard mode for that task.
6. **Commit** — The current agent handles all git operations. The delegate's sandbox blocks `.git/index.lock` writes, so the delegate cannot commit. Stage changes and commit with a conventional message.
7. **Error handling** — On any delegate failure (rate limit, error, empty diff), fall back to standard mode for that task. Track consecutive failures - after 3 consecutive failures, disable delegation for remaining tasks and print "Delegate disabled after 3 consecutive failures - completing remaining tasks in standard mode."
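
Steps 3, 4, and 7 of the workflow above can be sketched as follows. The delegate command is passed in as a plain argument list because the actual Codex CLI invocation is not specified here; `run_delegate` and `DelegationTracker` are hypothetical names.

```python
import subprocess
import tempfile


class DelegationTracker:
    """Disable delegation after a run of consecutive failures."""

    def __init__(self, max_consecutive_failures=3):
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    @property
    def enabled(self):
        return self.consecutive_failures < self.max_consecutive_failures

    def record(self, success):
        # A success resets the streak; a failure extends it.
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1


def run_delegate(command, prompt):
    """Write the prompt to a unique temp file and feed it to `command` on stdin."""
    with tempfile.NamedTemporaryFile("w+", suffix=".prompt.md",
                                     encoding="utf-8") as handle:
        handle.write(prompt)
        handle.flush()
        handle.seek(0)  # rewind so the child process reads from the start
        return subprocess.run(command, stdin=handle,
                              capture_output=True, text=True)
```

Feeding the prompt through stdin sidesteps both shell quoting and argument-length limits, and `NamedTemporaryFile` gives each task its own file, so parallel tasks cannot overwrite each other's prompts.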
### Mixed-Model Attribution
When some tasks are executed by the delegate and others by the current agent, use the following attribution in Phase 4:
- If all tasks used the delegate: attribute to the delegate model
- If all tasks used standard mode: attribute to the current agent's model
- If mixed: use `Generated with [CURRENT_MODEL] + [DELEGATE_MODEL] via [HARNESS]` and note which tasks were delegated in the PR description
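
The three attribution cases reduce to a small decision over which mode each task used. A sketch, assuming the agent keeps a mapping from task name to `"delegate"` or `"standard"` (a hypothetical bookkeeping structure):

```python
def attribution_line(task_modes, current_model, delegate_model, harness):
    """Return the attribution line and the sorted list of delegated tasks."""
    if not task_modes:
        raise ValueError("no tasks recorded")
    modes = set(task_modes.values())
    if modes == {"delegate"}:
        models = delegate_model
    elif modes == {"standard"}:
        models = current_model
    else:
        models = f"{current_model} + {delegate_model}"
    delegated = sorted(t for t, mode in task_modes.items() if mode == "delegate")
    return f"Generated with {models} via {harness}", delegated
```

The returned task list is what would be noted in the PR description for the mixed case.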
---
## Key Principles
### Start Fast, Execute Faster
- Get clarification once at the start, then execute
- Don't wait for perfect understanding - ask questions and move
- The goal is to **finish the feature**, not create perfect process
### The Plan is Your Guide
- Work documents should reference similar code and patterns
- Load those references and follow them
- Don't reinvent - match what exists
### Test As You Go
- Run tests after each change, not at the end
- Fix failures immediately
- Continuous testing prevents big surprises
### Quality is Built In
- Follow existing patterns
- Write tests for new code
- Run linting before pushing
- Use reviewer agents for complex/risky changes only
### Ship Complete Features
- Mark all tasks completed before moving on
- Don't leave features 80% done
- A finished feature that ships beats a perfect feature that doesn't
## Quality Checklist
Before creating PR, verify:
- [ ] All clarifying questions asked and answered
- [ ] All tasks marked completed
- [ ] Tests pass (run project's test command)
- [ ] Linting passes (use linting-agent)
- [ ] Code follows existing patterns
- [ ] Figma designs match implementation (if applicable)
- [ ] Before/after screenshots captured and uploaded (for UI changes)
- [ ] Commit messages follow conventional format
- [ ] PR description includes Post-Deploy Monitoring & Validation section (or explicit no-impact rationale)
- [ ] PR description includes summary, testing notes, and screenshots
- [ ] PR description includes Compound Engineered badge with accurate model, harness, and version
## When to Use Reviewer Agents
**Don't use by default.** Use reviewer agents only when:
- Large refactor affecting many files (10+)
- Security-sensitive changes (authentication, permissions, data access)
- Performance-critical code paths
- Complex algorithms or business logic
- User explicitly requests thorough review
For most features: tests + linting + following patterns is sufficient.
## Common Pitfalls to Avoid
- **Analysis paralysis** - Don't overthink; read the plan and execute
- **Skipping clarifying questions** - Ask now, not after building the wrong thing
- **Ignoring plan references** - The plan has links for a reason
- **Testing at the end** - Test continuously or suffer later
- **Forgetting to track progress** - Update task status as you go or lose track of what's done
- **80% done syndrome** - Finish the feature; don't move on early
- **Over-reviewing simple changes** - Save reviewer agents for complex work


@@ -23,7 +23,13 @@ This command takes a work document (plan, specification, or todo file) and execu
1. **Read Plan and Clarify**
- Read the work document completely
- Treat the plan as a decision artifact, not an execution script
- If the plan includes sections such as `Implementation Units`, `Work Breakdown`, `Requirements Trace`, `Files`, `Test Scenarios`, or `Verification`, use those as the primary source material for execution
- Check for `Execution note` on each implementation unit — these carry the plan's execution posture signal for that unit (for example, test-first or characterization-first). Note them when creating tasks.
- Check for a `Deferred to Implementation` or `Implementation-Time Unknowns` section — these are questions the planner intentionally left for you to resolve during execution. Note them before starting so they inform your approach rather than surprising you mid-task
- Check for a `Scope Boundaries` section — these are explicit non-goals. Refer back to them if implementation starts pulling you toward adjacent work
- Review any references or links provided in the plan
- If the user explicitly asks for TDD, test-first, or characterization-first execution in this session, honor that request even if the plan has no `Execution note`
- If anything is unclear or ambiguous, ask clarifying questions now
- Get user approval to proceed
- **Do not skip this** - better to ask questions now than build the wrong thing
@@ -73,12 +79,36 @@ This command takes a work document (plan, specification, or todo file) and execu
- You plan to switch between branches frequently
3. **Create Todo List**
- Use TodoWrite to break plan into actionable tasks
- Use your available task tracking tool (e.g., TodoWrite, task lists) to break the plan into actionable tasks
- Derive tasks from the plan's implementation units, dependencies, files, test targets, and verification criteria
- Carry each unit's `Execution note` into the task when present
- For each unit, read the `Patterns to follow` field before implementing — these point to specific files or conventions to mirror
- Use each unit's `Verification` field as the primary "done" signal for that task
- Do not expect the plan to contain implementation code, micro-step TDD instructions, or exact shell commands
- Include dependencies between tasks
- Prioritize based on what needs to be done first
- Include testing and quality check tasks
- Keep tasks specific and completable
4. **Choose Execution Strategy**
After creating the task list, decide how to execute based on the plan's size and dependency structure:
| Strategy | When to use |
|----------|-------------|
| **Inline** | 1-2 small tasks, or tasks needing user interaction mid-flight |
| **Serial subagents** | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks |
| **Parallel subagents** | 3+ tasks where some units have no shared dependencies and touch non-overlapping files. Dispatch independent units simultaneously, run dependent units after their prerequisites complete |
**Subagent dispatch** uses your available subagent or task spawning mechanism. For each unit, give the subagent:
- The full plan file path (for overall context)
- The specific unit's Goal, Files, Approach, Execution note, Patterns, Test scenarios, and Verification
- Any resolved deferred questions relevant to that unit
After each subagent completes, update the plan checkboxes and task list before dispatching the next dependent unit.
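The parallel strategy above amounts to grouping units into dependency "waves". The following is a minimal sketch of that grouping, not part of the command itself. The unit shape (`id`, `dependsOn`) and the function name `planWaves` are hypothetical, and the sketch checks dependencies only; confirming that units in the same wave touch non-overlapping files is left to the caller.

```javascript
// Hypothetical unit shape: { id: string, dependsOn?: string[] }.
// Returns an array of waves; every unit in a wave has all prerequisites
// satisfied by earlier waves, so a wave can be dispatched in parallel.
function planWaves(units) {
  const done = new Set();
  const remaining = new Map(units.map((u) => [u.id, u]));
  const waves = [];

  while (remaining.size > 0) {
    // A unit is ready when every dependency has already completed.
    const ready = [...remaining.values()].filter((u) =>
      (u.dependsOn ?? []).every((dep) => done.has(dep))
    );
    if (ready.length === 0) {
      throw new Error("Dependency cycle or unknown dependency in plan units");
    }
    waves.push(ready.map((u) => u.id));
    for (const u of ready) {
      done.add(u.id);
      remaining.delete(u.id);
    }
  }
  return waves;
}

const waves = planWaves([
  { id: "A" },
  { id: "B" },
  { id: "C", dependsOn: ["A"] },
  { id: "D", dependsOn: ["B", "C"] },
]);
console.log(waves); // [["A", "B"], ["C"], ["D"]]
```

A plan whose waves all contain a single unit is effectively the serial strategy.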
For genuinely large plans needing persistent inter-agent communication (agents challenging each other's approaches, shared coordination across 10+ tasks), see Swarm Mode below which uses Agent Teams.
### Phase 2: Execute
1. **Task Execution Loop**
@@ -87,18 +117,25 @@ This command takes a work document (plan, specification, or todo file) and execu
```
while (tasks remain):
- Mark task as in_progress in TodoWrite
- Mark task as in-progress
- Read any referenced files from the plan
- Look for similar patterns in codebase
- Implement following existing conventions
- Write tests for new functionality
- Run System-Wide Test Check (see below)
- Run tests after changes
- Mark task as completed in TodoWrite
- Mark off the corresponding checkbox in the plan file ([ ] → [x])
- Mark task as completed
- Evaluate for incremental commit (see below)
```
When a unit carries an `Execution note`, honor it. For test-first units, write the failing test before implementation for that unit. For characterization-first units, capture existing behavior before changing it. For units without an `Execution note`, proceed pragmatically.
Guardrails for execution posture:
- Do not write the test and implementation in the same step when working test-first
- Do not skip verifying that a new test fails before implementing the fix or feature
- Do not over-implement beyond the current behavior slice when working test-first
- Skip test-first discipline for trivial renames, pure configuration, and pure styling work
**System-Wide Test Check** — Before marking a task done, pause and ask:
| Question | What to do |
@@ -113,7 +150,6 @@ This command takes a work document (plan, specification, or todo file) and execu
**When this matters most:** Any change that touches models with callbacks, error handling with fallback/retry, or functionality exposed through multiple interfaces.
**IMPORTANT**: Always update the original plan document by checking off completed items. Use the Edit tool to change `- [ ]` to `- [x]` for each task you finish. This keeps the plan as a living document showing progress and ensures no checkboxes are left unchecked.
2. **Incremental Commits**
@@ -128,6 +164,8 @@ This command takes a work document (plan, specification, or todo file) and execu
**Heuristic:** "Can I write a commit message that describes a complete, valuable change? If yes, commit. If the message would be 'WIP' or 'partial X', wait."
If the plan has Implementation Units, use them as a starting guide for commit boundaries — but adapt based on what you find during implementation. A unit might need multiple commits if it's larger than expected, or small related units might land together. Use each unit's Goal to inform the commit message.
**Commit workflow:**
```bash
# 1. Verify tests pass (use project's test command)
@@ -149,7 +187,7 @@ This command takes a work document (plan, specification, or todo file) and execu
- The plan should reference similar code - read those files first
- Match naming conventions exactly
- Reuse existing components where possible
- Follow project coding standards (see CLAUDE.md)
- Follow project coding standards (see AGENTS.md; use CLAUDE.md only if the repo still keeps a compatibility shim)
- When in doubt, grep for similar implementations
4. **Test Continuously**
@@ -160,7 +198,15 @@ This command takes a work document (plan, specification, or todo file) and execu
- Add new tests for new functionality
- **Unit tests with mocks prove logic in isolation. Integration tests with real objects prove the layers work together.** If your change touches callbacks, middleware, or error handling — you need both.
5. **Figma Design Sync** (if applicable)
5. **Simplify as You Go**
After completing a cluster of related implementation units (or every 2-3 units), review recently changed files for simplification opportunities — consolidate duplicated patterns, extract shared helpers, and improve code reuse and efficiency. This is especially valuable when using subagents, since each agent works with isolated context and can't see patterns emerging across units.
Don't simplify after every single unit — early patterns may look duplicated but diverge intentionally in later units. Wait for a natural phase boundary or when you notice accumulated complexity.
If a `/simplify` skill or equivalent is available, use it. Otherwise, review the changed files yourself for reuse and consolidation opportunities.
6. **Figma Design Sync** (if applicable)
For UI work with Figma designs:
@@ -170,7 +216,7 @@ This command takes a work document (plan, specification, or todo file) and execu
- Repeat until implementation matches design
6. **Track Progress**
- Keep TodoWrite updated as you complete tasks
- Keep the task list updated as you complete tasks
- Note any blockers or unexpected discoveries
- Create new tasks if scope expands
- Keep user informed of major milestones
@@ -185,7 +231,7 @@ This command takes a work document (plan, specification, or todo file) and execu
# Run full test suite (use project's test command)
# Examples: bin/rails test, npm test, pytest, go test, etc.
# Run linting (per CLAUDE.md)
# Run linting (per AGENTS.md)
# Use linting-agent before pushing to origin
```
@@ -196,12 +242,14 @@ This command takes a work document (plan, specification, or todo file) and execu
Run configured agents in parallel with Task tool. Present findings and address critical issues.
3. **Final Validation**
- All TodoWrite tasks marked completed
- All tasks marked completed
- All tests pass
- Linting passes
- Code follows existing patterns
- Figma designs match (if applicable)
- No console errors or warnings
- If the plan has a `Requirements Trace`, verify each requirement is satisfied by the completed work
- If any `Deferred to Implementation` questions were noted, confirm they were resolved during execution
4. **Prepare Operational Validation Plan** (REQUIRED)
- Add a `## Post-Deploy Monitoring & Validation` section to the PR description for every change.
@@ -228,13 +276,28 @@ This command takes a work document (plan, specification, or todo file) and execu
Brief explanation if needed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
🤖 Generated with [MODEL] via [HARNESS](HARNESS_URL) + Compound Engineering v[VERSION]
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: [MODEL] ([CONTEXT] context, [THINKING]) <noreply@anthropic.com>
EOF
)"
```
**Fill in at commit/PR time:**
| Placeholder | Value | Example |
|-------------|-------|---------|
| `[MODEL]` | Model name | Claude Opus 4.6, GPT-5.4 |
| `[CONTEXT]` | Context window (if known) | 200K, 1M |
| `[THINKING]` | Thinking level (if known) | extended thinking |
| `[HARNESS]` | Tool running you | Claude Code, Codex, Gemini CLI |
| `[HARNESS_URL]` | Link to that tool | `https://claude.com/claude-code` |
| `[VERSION]` | `plugin.json` → `version` | 2.40.0 |
Subagents creating commits/PRs are equally responsible for accurate attribution.
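As a rough illustration of filling the placeholders, the sketch below builds the two attribution lines from the commit template. The function name `attributionLines` and its parameter names are hypothetical, and dropping the parenthesized details when context or thinking level is unknown is an assumption; the template does not say how to handle that case.

```javascript
// Hypothetical helper: builds the attribution lines from the placeholder values.
// Assumption: unknown context/thinking values are simply omitted.
function attributionLines({ model, context, thinking, harness, harnessUrl, version }) {
  const details = [context && `${context} context`, thinking]
    .filter(Boolean)
    .join(", ");
  const coAuthor = details ? `${model} (${details})` : model;
  return [
    `🤖 Generated with ${model} via [${harness}](${harnessUrl}) + Compound Engineering v${version}`,
    `Co-Authored-By: ${coAuthor} <noreply@anthropic.com>`,
  ];
}

console.log(
  attributionLines({
    model: "Claude Opus 4.6",
    context: "1M",
    thinking: "extended thinking",
    harness: "Claude Code",
    harnessUrl: "https://claude.com/claude-code",
    version: "2.40.0",
  }).join("\n")
);
```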
2. **Capture and Upload Screenshots for UI Changes** (REQUIRED for any UI work)
For **any** design changes, new views, or UI modifications, you MUST capture and upload screenshots:
@@ -308,7 +371,8 @@ This command takes a work document (plan, specification, or todo file) and execu
---
[![Compound Engineered](https://img.shields.io/badge/Compound-Engineered-6366f1)](https://github.com/EveryInc/compound-engineering-plugin) 🤖 Generated with [Claude Code](https://claude.com/claude-code)
[![Compound Engineering v[VERSION]](https://img.shields.io/badge/Compound_Engineering-v[VERSION]-6366f1)](https://github.com/EveryInc/compound-engineering-plugin)
🤖 Generated with [MODEL] ([CONTEXT] context, [THINKING]) via [HARNESS](HARNESS_URL)
EOF
)"
```
@@ -328,73 +392,30 @@ This command takes a work document (plan, specification, or todo file) and execu
---
## Swarm Mode (Optional)
## Swarm Mode with Agent Teams (Optional)
For complex plans with multiple independent workstreams, enable swarm mode for parallel execution with coordinated agents.
For genuinely large plans where agents need to communicate with each other, challenge approaches, or coordinate across 10+ tasks with persistent specialized roles, use agent team capabilities if available (e.g., Agent Teams in Claude Code, multi-agent workflows in Codex).
### When to Use Swarm Mode
**Agent teams are typically experimental and require opt-in.** Do not attempt to use agent teams unless the user explicitly requests swarm mode or agent teams, and the platform supports it.
| Use Swarm Mode when... | Use Standard Mode when... |
|------------------------|---------------------------|
| Plan has 5+ independent tasks | Plan is linear/sequential |
| Multiple specialists needed (review + test + implement) | Single-focus work |
| Want maximum parallelism | Simpler mental model preferred |
| Large feature with clear phases | Small feature or bug fix |
### When to Use Agent Teams vs Subagents
### Enabling Swarm Mode
| Agent Teams | Subagents (standard mode) |
|-------------|---------------------------|
| Agents need to discuss and challenge each other's approaches | Each task is independent — only the result matters |
| Persistent specialized roles (e.g., dedicated tester running continuously) | Workers report back and finish |
| 10+ tasks with complex cross-cutting coordination | 3-8 tasks with clear dependency chains |
| User explicitly requests "swarm mode" or "agent teams" | Default for most plans |
To trigger swarm execution, say:
Most plans should use subagent dispatch from standard mode. Agent teams add significant token cost and coordination overhead — use them when the inter-agent communication genuinely improves the outcome.
> "Make a Task list and launch an army of agent swarm subagents to build the plan"
### Agent Teams Workflow
Or explicitly request: "Use swarm mode for this work"
### Swarm Workflow
When swarm mode is enabled, the workflow changes:
1. **Create Team**
```
Teammate({ operation: "spawnTeam", team_name: "work-{timestamp}" })
```
2. **Create Task List with Dependencies**
- Parse plan into TaskCreate items
- Set up blockedBy relationships for sequential dependencies
- Independent tasks have no blockers (can run in parallel)
3. **Spawn Specialized Teammates**
```
Task({
team_name: "work-{timestamp}",
name: "implementer",
subagent_type: "general-purpose",
prompt: "Claim implementation tasks, execute, mark complete",
run_in_background: true
})
Task({
team_name: "work-{timestamp}",
name: "tester",
subagent_type: "general-purpose",
prompt: "Claim testing tasks, run tests, mark complete",
run_in_background: true
})
```
4. **Coordinate and Monitor**
- Team lead monitors task completion
- Spawn additional workers as phases unblock
- Handle plan approval if required
5. **Cleanup**
```
Teammate({ operation: "requestShutdown", target_agent_id: "implementer" })
Teammate({ operation: "requestShutdown", target_agent_id: "tester" })
Teammate({ operation: "cleanup" })
```
See the `orchestrating-swarms` skill for detailed swarm patterns and best practices.
1. **Create team** — use your available team creation mechanism
2. **Create task list** — parse Implementation Units into tasks with dependency relationships
3. **Spawn teammates** — assign specialized roles (implementer, tester, reviewer) based on the plan's needs. Give each teammate the plan file path and their specific task assignments
4. **Coordinate** — the lead monitors task completion, reassigns work if someone gets stuck, and spawns additional workers as phases unblock
5. **Cleanup** — shut down all teammates, then clean up the team resources
---
@@ -436,7 +457,7 @@ See the `orchestrating-swarms` skill for detailed swarm patterns and best practi
Before creating PR, verify:
- [ ] All clarifying questions asked and answered
- [ ] All TodoWrite tasks marked completed
- [ ] All tasks marked completed
- [ ] Tests pass (run project's test command)
- [ ] Linting passes (use linting-agent)
- [ ] Code follows existing patterns
@@ -445,7 +466,7 @@ Before creating PR, verify:
- [ ] Commit messages follow conventional format
- [ ] PR description includes Post-Deploy Monitoring & Validation section (or explicit no-impact rationale)
- [ ] PR description includes summary, testing notes, and screenshots
- [ ] PR description includes Compound Engineered badge
- [ ] PR description includes Compound Engineered badge with accurate model, harness, and version
## When to Use Reviewer Agents
@@ -465,6 +486,6 @@ For most features: tests + linting + following patterns is sufficient.
- **Skipping clarifying questions** - Ask now, not after building wrong thing
- **Ignoring plan references** - The plan has links for a reason
- **Testing at the end** - Test continuously or suffer later
- **Forgetting TodoWrite** - Track progress or lose track of what's done
- **Forgetting to track progress** - Update task status as you go or lose track of what's done
- **80% done syndrome** - Finish the feature, don't move on early
- **Over-reviewing simple changes** - Save reviewer agents for complex work


@@ -0,0 +1,160 @@
---
name: claude-permissions-optimizer
context: fork
description: Optimize Claude Code permissions by finding safe Bash commands from session history and auto-applying them to settings.json. Can run from any coding agent but targets Claude Code specifically. Use when experiencing permission fatigue, too many permission prompts, wanting to optimize permissions, or needing to set up allowlists. Triggers on "optimize permissions", "reduce permission prompts", "allowlist commands", "too many permission prompts", "permission fatigue", "permission setup", or complaints about clicking approve too often.
---
# Claude Permissions Optimizer
Find safe Bash commands that are causing unnecessary permission prompts and auto-allow them in `settings.json` -- evidence-based, not prescriptive.
This skill identifies commands safe to auto-allow based on actual session history. It does not handle requests to allowlist specific dangerous commands. If the user asks to allow something destructive (e.g., `rm -rf`, `git push --force`), explain that this skill optimizes for safe commands only, and that manual allowlist changes can be made directly in settings.json.
## Pre-check: Confirm environment
Determine whether you are currently running inside Claude Code or a different coding agent (Codex, Gemini CLI, Cursor, etc.).
**If running inside Claude Code:** Proceed directly to Step 1.
**If running in a different agent:** Inform the user before proceeding:
> "This skill analyzes Claude Code session history and writes to Claude Code's settings.json. You're currently in [agent name], but I can still optimize your Claude Code permissions from here -- the results will apply next time you use Claude Code."
Then proceed to Step 1 normally. The skill works from any environment as long as `~/.claude/` (or `$CLAUDE_CONFIG_DIR`) exists on the machine.
## Step 1: Choose Analysis Scope
Ask the user how broadly to analyze using the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). If no question tool is available, present the numbered options and wait for the user's reply.
1. **All projects** (Recommended) -- sessions across every project
2. **This project only** -- sessions for the current working directory
3. **Custom** -- user specifies constraints (time window, session count, etc.)
Default to **All projects** unless the user explicitly asks for a single project. More data produces better recommendations.
## Step 2: Run Extraction Script
Run the bundled script. It handles everything: loads the current allowlist, scans recent session transcripts (most recent 500 sessions or last 30 days, whichever is more restrictive), filters already-covered commands, applies a min-count threshold (5+), normalizes into `Bash(pattern)` rules, and pre-classifies each as safe/review/dangerous.
**All projects:**
```bash
node <skill-dir>/scripts/extract-commands.mjs
```
**This project only** -- pass the project slug (absolute path with every non-alphanumeric char replaced by `-`, e.g., `/Users/tmchow/Code/my-project` becomes `-Users-tmchow-Code-my-project`):
```bash
node <skill-dir>/scripts/extract-commands.mjs --project-slug <slug>
```
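The slug rule described above can be sketched in one line. The helper name `toProjectSlug` is hypothetical and only illustrates the replacement rule; it is not part of the bundled script.

```javascript
// Hypothetical helper: replace every non-alphanumeric character with "-".
function toProjectSlug(absolutePath) {
  return absolutePath.replace(/[^a-zA-Z0-9]/g, "-");
}

console.log(toProjectSlug("/Users/tmchow/Code/my-project"));
// → -Users-tmchow-Code-my-project
```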
Optional: `--days <N>` to limit to the last N days. Omit to use the script's default window (30 days).
The output JSON has:
- `green`: safe patterns to recommend `{ pattern, count, sessions, examples }`
- `redExamples`: top 5 blocked dangerous patterns `{ pattern, reason, count }` (or empty)
- `yellowFootnote`: one-line summary of frequently-used commands that aren't safe to auto-allow (or null)
- `stats`: `totalExtracted`, `alreadyCovered`, `belowThreshold`, `patternsReturned`, `greenRawCount`, etc.
The model's job is to **present** the script's output, not re-classify.
If the script returns empty results, tell the user their allowlist is already well-optimized or they don't have enough session history yet -- suggest re-running after a few more working sessions.
## Step 3: Present Results
Present in three parts. Keep the formatting clean and scannable.
### Part 1: Analysis summary
Show the work done using the script's `stats`. Reaffirm the scope. Keep it to 4-5 lines.
**Example:**
```
## Analysis (compound-engineering-plugin)
Scanned **24 sessions** for this project.
Found **312 unique Bash commands** across those sessions.
- **245** already covered by your 43 existing allowlist rules (79%)
- **61** used fewer than 5 times (filtered as noise)
- **6 commands** remain that regularly trigger permission prompts
```
### Part 2: Recommendations
Present `green` patterns as a numbered table. If `yellowFootnote` is not null, include it as a line after the table.
```
### Safe to auto-allow
| # | Pattern | Evidence |
|---|---------|----------|
| 1 | `Bash(bun test *)` | 23 uses across 8 sessions |
| 2 | `Bash(bun run *)` | 18 uses, covers dev/build/lint scripts |
| 3 | `Bash(node *)` | 12 uses across 5 sessions |
Also frequently used: bun install, mkdir (not classified as safe to auto-allow but may be worth reviewing)
```
If `redExamples` is non-empty, show a compact "Blocked" table after the recommendations. This builds confidence that the classifier is doing its job. Show up to 3 examples.
```
### Blocked from recommendations
| Pattern | Reason | Uses |
|---------|--------|------|
| `rm *` | Irreversible file deletion | 21 |
| `eval *` | Arbitrary code execution | 14 |
| `git reset --hard *` | Destroys uncommitted work | 5 |
```
### Part 3: Bottom line
**One sentence only.** Frame the impact relative to current coverage using the script's stats. Nothing else -- no pattern names, no usage counts, no elaboration. The question tool UI that immediately follows will visually clip any trailing text, so this must fit on a single short line.
```
Adding 22 rules would bring your allowlist coverage from 65% to 93%.
```
Compute the percentages from stats:
- **Before:** `alreadyCovered / totalExtracted * 100`
- **After:** `(alreadyCovered + greenRawCount) / totalExtracted * 100`
Use `greenRawCount` (the number of unique raw commands the green patterns cover), not `patternsReturned` (which is just the number of normalized patterns).
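A minimal sketch of that arithmetic, assuming the `stats` field names listed in Step 2. The function name `coveragePercentages`, the rounding to whole percents, and the zero-total guard are assumptions made for illustration.

```javascript
// Hypothetical helper: computes before/after allowlist coverage from stats.
// Assumption: percentages are rounded to the nearest whole number.
function coveragePercentages(stats) {
  const { totalExtracted, alreadyCovered, greenRawCount } = stats;
  if (!totalExtracted) return { before: 0, after: 0 }; // avoid dividing by zero
  const pct = (n) => Math.round((n / totalExtracted) * 100);
  return {
    before: pct(alreadyCovered),
    after: pct(alreadyCovered + greenRawCount),
  };
}

console.log(coveragePercentages({ totalExtracted: 100, alreadyCovered: 65, greenRawCount: 28 }));
// → { before: 65, after: 93 }
```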
## Step 4: Get User Confirmation
The recommendations table is already displayed. Use the platform's blocking question tool to ask for the decision:
1. **Apply all to user settings** (`~/.claude/settings.json`)
2. **Apply all to project settings** (`.claude/settings.json`)
3. **Skip**
If the user wants to exclude specific items, they can reply in free text (e.g., "all except 3 and 7 to user settings"). The numbered table is already visible for reference -- no need to re-list items in the question tool.
## Step 5: Apply to Settings
For each target settings file:
1. Read the current file (create `{ "permissions": { "allow": [] } }` if it doesn't exist)
2. Append new patterns to `permissions.allow`, avoiding duplicates
3. Sort the allow array alphabetically
4. Write back with 2-space indentation
5. **Verify the write** -- tell the user you're validating the JSON before running this command, e.g., "Verifying settings.json is valid JSON..." The command looks alarming without context:
```bash
node -e "JSON.parse(require('fs').readFileSync('<path>','utf8'))"
```
If this fails, the file is invalid JSON. Immediately restore from the content read in step 1 and report the error. Do not continue to other files.
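Steps 2 through 4 can be sketched as a pure function over the parsed settings object. The name `mergeAllowRules` is hypothetical, and the default `Array.prototype.sort` (code-unit order) stands in for "alphabetically"; reading, writing, and restoring the file are left out.

```javascript
// Hypothetical helper: append new rules, drop duplicates, sort, and keep
// every other settings key (including permissions.deny) untouched.
function mergeAllowRules(settings, newRules) {
  const current = settings?.permissions?.allow ?? [];
  const merged = [...new Set([...current, ...newRules])].sort();
  return {
    ...settings,
    permissions: { ...settings?.permissions, allow: merged },
  };
}

const updated = mergeAllowRules(
  { permissions: { allow: ["Bash(ls *)"] } },
  ["Bash(bun test *)", "Bash(ls *)"]
);
console.log(updated.permissions.allow); // ["Bash(bun test *)", "Bash(ls *)"]

// Serialize with 2-space indentation before writing back.
const serialized = JSON.stringify(updated, null, 2);
```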
After successful verification:
```
Applied N rules to ~/.claude/settings.json
Applied M rules to .claude/settings.json
These commands will no longer trigger permission prompts.
```
If `.claude/settings.json` was modified and is tracked by git, mention that committing it would benefit teammates.
## Edge Cases
- **No project context** (running outside a project): Only offer user-level settings as write target.
- **Settings file doesn't exist**: Create it with `{ "permissions": { "allow": [] } }`. For `.claude/settings.json`, also create the `.claude/` directory if needed.
- **Deny rules**: If a deny rule already blocks a command, warn rather than adding an allow rule (deny takes precedence in Claude Code).


@@ -0,0 +1,661 @@
#!/usr/bin/env node
// Extracts, normalizes, and pre-classifies Bash commands from Claude Code sessions.
// Filters against the current allowlist, groups by normalized pattern, and classifies
// each pattern as green/yellow/red so the model can review rather than classify from scratch.
//
// Usage: node extract-commands.mjs [--days <N>] [--max-sessions <N>] [--project-slug <slug>]
// [--min-count 5] [--settings <path>] [--settings <path>] ...
//
// Analyzes the most recent sessions, bounded by both count and time.
// Defaults: last 500 sessions or 30 days, whichever is more restrictive.
//
// Output: JSON with { green, redExamples, yellowFootnote, stats }
import { readdir, readFile, stat } from "node:fs/promises";
import { join } from "node:path";
import { homedir } from "node:os";
const args = process.argv.slice(2);
function flag(name, fallback) {
const i = args.indexOf(`--${name}`);
return i !== -1 && args[i + 1] ? args[i + 1] : fallback;
}
function flagAll(name) {
const results = [];
let i = 0;
while (i < args.length) {
if (args[i] === `--${name}` && args[i + 1]) {
results.push(args[i + 1]);
i += 2;
} else {
i++;
}
}
return results;
}
const days = parseInt(flag("days", "30"), 10);
const maxSessions = parseInt(flag("max-sessions", "500"), 10);
const minCount = parseInt(flag("min-count", "5"), 10);
const projectSlugFilter = flag("project-slug", null);
const settingsPaths = flagAll("settings");
const claudeDir = process.env.CLAUDE_CONFIG_DIR || join(homedir(), ".claude");
const projectsDir = join(claudeDir, "projects");
const cutoff = Date.now() - days * 24 * 60 * 60 * 1000;
// ── Allowlist loading ──────────────────────────────────────────────────────
const allowPatterns = [];
async function loadAllowlist(filePath) {
try {
const content = await readFile(filePath, "utf-8");
const settings = JSON.parse(content);
const allow = settings?.permissions?.allow || [];
for (const rule of allow) {
const match = rule.match(/^Bash\((.+)\)$/);
if (match) {
allowPatterns.push(match[1]);
} else if (rule === "Bash" || rule === "Bash(*)") {
allowPatterns.push("*");
}
}
} catch {
// file doesn't exist or isn't valid JSON
}
}
if (settingsPaths.length === 0) {
settingsPaths.push(join(claudeDir, "settings.json"));
settingsPaths.push(join(process.cwd(), ".claude", "settings.json"));
settingsPaths.push(join(process.cwd(), ".claude", "settings.local.json"));
}
for (const p of settingsPaths) {
await loadAllowlist(p);
}
function isAllowed(command) {
for (const pattern of allowPatterns) {
if (pattern === "*") return true;
if (matchGlob(pattern, command)) return true;
}
return false;
}
function matchGlob(pattern, command) {
const normalized = pattern.replace(/:(\*)$/, " $1");
let regexStr;
if (normalized.endsWith(" *")) {
const base = normalized.slice(0, -2);
const escaped = base.replace(/[.+^${}()|[\]\\]/g, "\\$&");
regexStr = "^" + escaped + "($| .*)";
} else {
regexStr =
"^" +
normalized
.replace(/[.+^${}()|[\]\\]/g, "\\$&")
.replace(/\*/g, ".*") +
"$";
}
try {
return new RegExp(regexStr).test(command);
} catch {
return false;
}
}
// ── Classification rules ───────────────────────────────────────────────────
// RED: patterns that should never be allowlisted with wildcards.
// Checked first -- highest priority.
const RED_PATTERNS = [
// Destructive file ops -- all rm variants
{ test: /^rm\s/, reason: "Irreversible file deletion" },
{ test: /^sudo\s/, reason: "Privilege escalation" },
{ test: /^su\s/, reason: "Privilege escalation" },
// find with destructive actions (must be before GREEN_BASES check)
{ test: /\bfind\b.*\s-delete\b/, reason: "find -delete permanently removes files" },
{ test: /\bfind\b.*\s-exec\s+rm\b/, reason: "find -exec rm permanently removes files" },
// ast-grep rewrite modifies files in place
{ test: /\b(ast-grep|sg)\b.*--rewrite\b/, reason: "ast-grep --rewrite modifies files in place" },
// sed -i edits files in place
{ test: /\bsed\s+.*-i\b/, reason: "sed -i modifies files in place" },
// Git irreversible
{ test: /git\s+(?:\S+\s+)*push\s+.*--force(?!-with-lease)/, reason: "Force push overwrites remote history" },
{ test: /git\s+(?:\S+\s+)*push\s+.*\s-f\b/, reason: "Force push overwrites remote history" },
{ test: /git\s+(?:\S+\s+)*push\s+-f\b/, reason: "Force push overwrites remote history" },
{ test: /git\s+reset\s+--(hard|merge)/, reason: "Destroys uncommitted work" },
{ test: /git\s+clean\s+.*(-[a-z]*f[a-z]*\b|--force\b)/, reason: "Permanently deletes untracked files" },
{ test: /git\s+commit\s+.*--no-verify/, reason: "Skips safety hooks" },
{ test: /git\s+config\s+--system/, reason: "System-wide config change" },
{ test: /git\s+filter-branch/, reason: "Rewrites entire repo history" },
{ test: /git\s+filter-repo/, reason: "Rewrites repo history" },
{ test: /git\s+gc\s+.*--aggressive/, reason: "Can remove recoverable objects" },
{ test: /git\s+reflog\s+expire/, reason: "Removes recovery safety net" },
{ test: /git\s+stash\s+clear\b/, reason: "Removes ALL stash entries permanently" },
{ test: /git\s+branch\s+.*(-D\b|--force\b)/, reason: "Force-deletes without merge check" },
{ test: /git\s+checkout\s+.*\s--\s/, reason: "Discards uncommitted changes" },
{ test: /git\s+checkout\s+--\s/, reason: "Discards uncommitted changes" },
{ test: /git\s+restore\s+(?!.*(-S\b|--staged\b))/, reason: "Discards working tree changes" },
// Publishing -- permanent across all ecosystems
{ test: /\b(npm|yarn|pnpm)\s+publish\b/, reason: "Permanent package publishing" },
{ test: /\bnpm\s+unpublish\b/, reason: "Permanent package removal" },
{ test: /\bcargo\s+publish\b/, reason: "Permanent crate publishing" },
{ test: /\bcargo\s+yank\b/, reason: "Marks crate version as unavailable" },
{ test: /\bgem\s+push\b/, reason: "Permanent gem publishing" },
{ test: /\bpoetry\s+publish\b/, reason: "Permanent package publishing" },
{ test: /\btwine\s+upload\b/, reason: "Permanent package publishing" },
{ test: /\bgh\s+release\s+create\b/, reason: "Permanent release creation" },
// Shell injection
{ test: /\|\s*(sh|bash|zsh)\b/, reason: "Pipe to shell execution" },
{ test: /\beval\s/, reason: "Arbitrary code execution" },
// Docker destructive
{ test: /docker\s+run\s+.*--privileged/, reason: "Full host access" },
{ test: /docker\s+system\s+prune\b(?!.*--dry-run)/, reason: "Removes all unused data" },
{ test: /docker\s+volume\s+(rm|prune)\b/, reason: "Permanent data deletion" },
{ test: /docker[- ]compose\s+down\s+.*(-v\b|--volumes\b)/, reason: "Removes volumes and data" },
{ test: /docker[- ]compose\s+down\s+.*--rmi\b/, reason: "Removes all images" },
{ test: /docker\s+(rm|rmi)\s+.*-[a-z]*f/, reason: "Force removes without confirmation" },
// System
{ test: /^reboot\b/, reason: "System restart" },
{ test: /^shutdown\b/, reason: "System halt" },
{ test: /^halt\b/, reason: "System halt" },
{ test: /\bsystemctl\s+(stop|disable|mask)\b/, reason: "Stops system services" },
{ test: /\bkill\s+-9\b/, reason: "Force kill without cleanup" },
{ test: /\bpkill\s+-9\b/, reason: "Force kill by name" },
// Disk destructive
{ test: /\bdd\s+.*\bof=/, reason: "Raw disk write" },
{ test: /\bmkfs\b/, reason: "Formats disk partition" },
// Permissions
{ test: /\bchmod\s+777\b/, reason: "World-writable permissions" },
{ test: /\bchmod\s+-R\b/, reason: "Recursive permission change" },
{ test: /\bchown\s+-R\b/, reason: "Recursive ownership change" },
// Database destructive
{ test: /\bDROP\s+(DATABASE|TABLE|SCHEMA)\b/i, reason: "Permanent data deletion" },
{ test: /\bTRUNCATE\b/i, reason: "Permanent row deletion" },
// Network
{ test: /^(nc|ncat)\s/, reason: "Raw socket access" },
// Credential exposure
{ test: /\bcat\s+\.env.*\|/, reason: "Credential exposure via pipe" },
{ test: /\bprintenv\b.*\|/, reason: "Credential exposure via pipe" },
// Package removal (from DCG)
{ test: /\bpip3?\s+uninstall\b/, reason: "Package removal" },
{ test: /\bapt(?:-get)?\s+(remove|purge|autoremove)\b/, reason: "Package removal" },
{ test: /\bbrew\s+uninstall\b/, reason: "Package removal" },
];
// GREEN: base commands that are always read-only / safe.
// NOTE: `find` is intentionally excluded -- `find -delete` and `find -exec rm`
// are destructive. Safe find usage is handled via GREEN_COMPOUND instead.
const GREEN_BASES = new Set([
"ls", "cat", "head", "tail", "wc", "file", "tree", "stat", "du",
"diff", "grep", "rg", "ag", "ack", "which", "whoami", "pwd", "echo",
"printf", "env", "printenv", "uname", "hostname", "jq", "sort", "uniq",
"tr", "cut", "less", "more", "man", "type", "realpath", "dirname",
"basename", "date", "ps", "top", "htop", "free", "uptime",
"id", "groups", "lsof", "open", "xdg-open",
]);
// GREEN: compound patterns
const GREEN_COMPOUND = [
/--version\s*$/,
/--help(\s|$)/,
/^git\s+(status|log|diff|show|blame|shortlog|branch\s+-[alv]|remote\s+-v|rev-parse|describe|reflog\b(?!\s+expire))\b/,
/^git\s+tag\s+(-l\b|--list\b)/, // tag listing (not creation)
/^git\s+stash\s+(list|show)\b/, // stash read-only operations
/^(npm|bun|pnpm|yarn)\s+run\s+(test|lint|build|check|typecheck)\b/,
/^(npm|bun|pnpm|yarn)\s+(test|lint|audit|outdated|list)\b/,
/^(npx|bunx)\s+(vitest|jest|eslint|prettier|tsc)\b/,
/^(pytest|jest|cargo\s+test|go\s+test|rspec|bundle\s+exec\s+rspec|make\s+test|rake\s+rspec)\b/,
/^(eslint|prettier|rubocop|black|flake8|cargo\s+(clippy|fmt)|gofmt|golangci-lint|tsc(\s+--noEmit)?|mypy|pyright)\b/,
/^(cargo\s+(build|check|doc|bench)|go\s+(build|vet))\b/,
/^pnpm\s+--filter\s/,
/^(npm|bun|pnpm|yarn)\s+(typecheck|format|verify|validate|check|analyze)\b/, // common safe script names
/^git\s+-C\s+\S+\s+(status|log|diff|show|branch|remote|rev-parse|describe)\b/, // git -C <dir> <read-only>
/^docker\s+(ps|images|logs|inspect|stats|system\s+df)\b/,
/^docker[- ]compose\s+(ps|logs|config)\b/,
/^systemctl\s+(status|list-|show|is-|cat)\b/,
/^journalctl\b/,
/^(pg_dump|mysqldump)\b(?!.*--clean)/,
/\b--dry-run\b/,
/^git\s+clean\s+.*(-[a-z]*n|--dry-run)\b/, // git clean dry run
// NOTE: find is intentionally NOT green. Bash(find *) would also match
// find -delete and find -exec rm in Claude Code's allowlist glob matching.
// Commands with mode-switching flags: only green when the normalized pattern
// is narrow enough that the allowlist glob can't match the destructive form.
// Bash(sed -n *) is safe; Bash(sed *) would also match sed -i.
/^sed\s+-(?!i\b)[a-zA-Z]\s/, // sed with a non-destructive flag (matches normalized sed -n *, sed -e *, etc.)
/^(ast-grep|sg)\b(?!.*--rewrite)/, // ast-grep without --rewrite
/^find\s+-(?:name|type|path|iname)\s/, // find with safe predicate flag (matches normalized form)
// gh CLI read-only operations
/^gh\s+(pr|issue|run)\s+(view|list|status|diff|checks)\b/,
/^gh\s+repo\s+(view|list|clone)\b/,
/^gh\s+api\b/,
];
// YELLOW: base commands that modify local state but are recoverable
const YELLOW_BASES = new Set([
"mkdir", "touch", "cp", "mv", "tee", "curl", "wget", "ssh", "scp", "rsync",
"python", "python3", "node", "ruby", "perl", "make", "just",
"awk", // awk can write files; safe forms handled case-by-case if needed
]);
// YELLOW: compound patterns
const YELLOW_COMPOUND = [
/^git\s+(add|commit(?!\s+.*--no-verify)|checkout(?!\s+--\s)|switch|pull|push(?!\s+.*--force)(?!\s+.*-f\b)|fetch|merge|rebase|stash(?!\s+clear\b)|branch\b(?!\s+.*(-D\b|--force\b))|cherry-pick|tag|clone)\b/,
/^git\s+push\s+--force-with-lease\b/,
/^git\s+restore\s+.*(-S\b|--staged\b)/, // restore --staged is safe (just unstages)
/^git\s+gc\b(?!\s+.*--aggressive)/,
/^(npm|bun|pnpm|yarn)\s+install\b/,
/^(npm|bun|pnpm|yarn)\s+(add|remove|uninstall|update)\b/,
/^(npm|bun|pnpm)\s+run\s+(start|dev|serve)\b/,
/^(pip|pip3)\s+install\b(?!\s+https?:)/,
/^bundle\s+install\b/,
/^(cargo\s+add|go\s+get)\b/,
/^docker\s+(build|run(?!\s+.*--privileged)|stop|start)\b/,
/^docker[- ]compose\s+(up|down\b(?!\s+.*(-v\b|--volumes\b|--rmi\b)))/,
/^systemctl\s+restart\b/,
/^kill\s+(?!.*-9)\d/,
/^rake\b/,
// gh CLI write operations (recoverable)
/^gh\s+(pr|issue)\s+(create|edit|comment|close|reopen|merge)\b/,
/^gh\s+run\s+(rerun|cancel|watch)\b/,
];
function classify(command) {
// Extract the first command from compound chains (&&, ||, ;) and pipes
// so that `cd /dir && git branch -D feat` is classified by its first command
// (cd), not as red (git branch -D). This matches what normalize() does.
const compoundMatch = command.match(/^(.+?)\s*(&&|\|\||;)\s*(.+)$/);
if (compoundMatch) return classify(compoundMatch[1].trim());
const pipeMatch = command.match(/^(.+?)\s*\|\s*(.+)$/);
if (pipeMatch && !/\|\s*(sh|bash|zsh)\b/.test(command)) {
return classify(pipeMatch[1].trim());
}
// RED check first (highest priority)
for (const { test, reason } of RED_PATTERNS) {
if (test.test(command)) return { tier: "red", reason };
}
// GREEN checks
const baseCmd = command.split(/\s+/)[0];
if (GREEN_BASES.has(baseCmd)) return { tier: "green" };
for (const re of GREEN_COMPOUND) {
if (re.test(command)) return { tier: "green" };
}
// YELLOW checks
if (YELLOW_BASES.has(baseCmd)) return { tier: "yellow" };
for (const re of YELLOW_COMPOUND) {
if (re.test(command)) return { tier: "yellow" };
}
// Unclassified -- silently dropped from output
return { tier: "unknown" };
}
// ── Normalization ──────────────────────────────────────────────────────────
// Risk-modifying flags that must NOT be collapsed into wildcards.
// Global flags are always preserved; context-specific flags only matter
// for certain base commands.
const GLOBAL_RISK_FLAGS = new Set([
"--force", "--hard", "-rf", "--privileged", "--no-verify",
"--system", "--force-with-lease", "-D", "--force-if-includes",
"--volumes", "--rmi", "--rewrite", "--delete",
]);
// Flags that are only risky for specific base commands.
// -f means force-push in git, force-remove in docker, but pattern-file in grep.
// -v means remove-volumes in docker-compose, but verbose everywhere else.
const CONTEXTUAL_RISK_FLAGS = {
"-f": new Set(["git", "docker", "rm"]),
"-v": new Set(["docker", "docker-compose"]),
};
function isRiskFlag(token, base) {
if (GLOBAL_RISK_FLAGS.has(token)) return true;
// Check context-specific flags
const contexts = CONTEXTUAL_RISK_FLAGS[token];
if (contexts && base && contexts.has(base)) return true;
// Combined short flags containing risk chars: -rf, -fr, -fR, etc.
if (/^-[a-zA-Z]*[rf][a-zA-Z]*$/.test(token) && token.length <= 4) return true;
return false;
}
function normalize(command) {
// Don't normalize shell injection patterns
if (/\|\s*(sh|bash|zsh)\b/.test(command)) return command;
// Collapse every sudo invocation into a single `sudo *` pattern
if (/^sudo\s/.test(command)) return "sudo *";
// Handle pnpm --filter <pkg> <subcommand> specially
const pnpmFilter = command.match(/^pnpm\s+--filter\s+\S+\s+(\S+)/);
if (pnpmFilter) return "pnpm --filter * " + pnpmFilter[1] + " *";
// Handle sed specially -- preserve the mode flag to keep safe patterns narrow.
// sed -i (in-place) is destructive; sed -n, sed -e, bare sed are read-only.
if (/^sed\s/.test(command)) {
if (/\s-i\b/.test(command)) return "sed -i *";
const sedFlag = command.match(/^sed\s+(-[a-zA-Z])\s/);
return sedFlag ? "sed " + sedFlag[1] + " *" : "sed *";
}
// Handle ast-grep specially -- preserve --rewrite flag.
if (/^(ast-grep|sg)\s/.test(command)) {
const base = command.startsWith("sg") ? "sg" : "ast-grep";
return /\s--rewrite\b/.test(command) ? base + " --rewrite *" : base + " *";
}
// Handle find specially -- preserve key action flags.
// find -delete and find -exec rm are destructive; find -name/-type are safe.
if (/^find\s/.test(command)) {
if (/\s-delete\b/.test(command)) return "find -delete *";
if (/\s-exec\s/.test(command)) return "find -exec *";
// Extract the first predicate flag for a narrower safe pattern
const findFlag = command.match(/\s(-(?:name|type|path|iname))\s/);
return findFlag ? "find " + findFlag[1] + " *" : "find *";
}
// Handle git -C <dir> <subcommand> -- strip the -C <dir> and normalize the git subcommand
const gitC = command.match(/^git\s+-C\s+\S+\s+(.+)$/);
if (gitC) return normalize("git " + gitC[1]);
// Split on compound operators -- normalize the first command only
const compoundMatch = command.match(/^(.+?)\s*(&&|\|\||;)\s*(.+)$/);
if (compoundMatch) {
return normalize(compoundMatch[1].trim());
}
// Strip trailing pipe chains for normalization (e.g., `cmd | tail -5`)
// but preserve pipe-to-shell (already handled by shell injection check above)
const pipeMatch = command.match(/^(.+?)\s*\|\s*(.+)$/);
if (pipeMatch) {
return normalize(pipeMatch[1].trim());
}
// Strip trailing redirections (2>&1, > file, >> file)
const cleaned = command.replace(/\s*[12]?>>?\s*\S+\s*$/, "").replace(/\s*2>&1\s*$/, "").trim();
const parts = cleaned.split(/\s+/);
if (parts.length === 0) return command;
const base = parts[0];
// For git/docker/gh/npm etc, include the subcommand
const multiWordBases = ["git", "docker", "docker-compose", "gh", "npm", "bun",
"pnpm", "yarn", "cargo", "pip", "pip3", "bundle", "systemctl", "kubectl"];
let prefix = base;
let argStart = 1;
if (multiWordBases.includes(base) && parts.length > 1) {
prefix = base + " " + parts[1];
argStart = 2;
}
// Preserve risk-modifying flags in the remaining args
const preservedFlags = [];
for (let i = argStart; i < parts.length; i++) {
if (isRiskFlag(parts[i], base)) {
preservedFlags.push(parts[i]);
}
}
// Build the normalized pattern
if (parts.length <= argStart && preservedFlags.length === 0) {
return prefix; // no args, no flags: e.g., "git status"
}
const flagStr = preservedFlags.length > 0 ? " " + preservedFlags.join(" ") : "";
const hasVaryingArgs = parts.length > argStart + preservedFlags.length;
if (hasVaryingArgs) {
return prefix + flagStr + " *";
}
return prefix + flagStr;
}
// ── Session file scanning ──────────────────────────────────────────────────
const commands = new Map();
let filesScanned = 0;
const sessionsScanned = new Set();
async function listDirs(dir) {
try {
const entries = await readdir(dir, { withFileTypes: true });
return entries.filter((e) => e.isDirectory()).map((e) => e.name);
} catch {
return [];
}
}
async function listJsonlFiles(dir) {
try {
const entries = await readdir(dir, { withFileTypes: true });
return entries
.filter((e) => e.isFile() && e.name.endsWith(".jsonl"))
.map((e) => e.name);
} catch {
return [];
}
}
async function processFile(filePath, sessionId) {
try {
filesScanned++;
sessionsScanned.add(sessionId);
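const content = await readFile(filePath, "utf-8");
// `info` was undefined in this scope; stat the file so its mtime can serve
// as the fallback timestamp for records that carry none.
const info = await stat(filePath);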
for (const line of content.split("\n")) {
if (!line.includes('"Bash"')) continue;
try {
const record = JSON.parse(line);
if (record.type !== "assistant") continue;
const blocks = record.message?.content;
if (!Array.isArray(blocks)) continue;
for (const block of blocks) {
if (block.type !== "tool_use" || block.name !== "Bash") continue;
const cmd = block.input?.command;
if (!cmd) continue;
const ts = record.timestamp
? new Date(record.timestamp).getTime()
: info.mtimeMs;
const existing = commands.get(cmd);
if (existing) {
existing.count++;
existing.sessions.add(sessionId);
existing.firstSeen = Math.min(existing.firstSeen, ts);
existing.lastSeen = Math.max(existing.lastSeen, ts);
} else {
commands.set(cmd, {
count: 1,
sessions: new Set([sessionId]),
firstSeen: ts,
lastSeen: ts,
});
}
}
} catch {
// skip malformed lines
}
}
} catch {
// skip unreadable files
}
}
// Collect all candidate session files, then sort by recency and limit
const candidates = [];
const projectSlugs = await listDirs(projectsDir);
for (const slug of projectSlugs) {
if (projectSlugFilter && slug !== projectSlugFilter) continue;
const slugDir = join(projectsDir, slug);
const jsonlFiles = await listJsonlFiles(slugDir);
for (const f of jsonlFiles) {
const filePath = join(slugDir, f);
try {
const info = await stat(filePath);
if (info.mtimeMs >= cutoff) {
candidates.push({ filePath, sessionId: f.replace(".jsonl", ""), mtime: info.mtimeMs });
}
} catch {
// skip unreadable files
}
}
}
// Sort by most recent first, then take at most maxSessions
candidates.sort((a, b) => b.mtime - a.mtime);
const toProcess = candidates.slice(0, maxSessions);
await Promise.all(
toProcess.map((c) => processFile(c.filePath, c.sessionId))
);
// ── Filter, normalize, group, classify ─────────────────────────────────────
const totalExtracted = commands.size;
let alreadyCovered = 0;
let belowThreshold = 0;
// Group raw commands by normalized pattern, tracking unique sessions per group.
// Normalize and group FIRST, then apply the min-count threshold to the grouped
// totals. This prevents many low-frequency variants of the same pattern from
// being individually discarded as noise when they collectively exceed the threshold.
const patternGroups = new Map();
for (const [command, data] of commands) {
if (isAllowed(command)) {
alreadyCovered++;
continue;
}
const pattern = "Bash(" + normalize(command) + ")";
const { tier, reason } = classify(command);
const existing = patternGroups.get(pattern);
if (existing) {
existing.rawCommands.push({ command, count: data.count });
existing.totalCount += data.count;
// Merge session sets to avoid overcounting
for (const s of data.sessions) existing.sessionSet.add(s);
// Escalation: highest tier wins
if (tier === "red" && existing.tier !== "red") {
existing.tier = "red";
existing.reason = reason;
} else if (tier === "yellow" && existing.tier === "green") {
existing.tier = "yellow";
} else if (tier === "unknown" && existing.tier === "green") {
existing.tier = "unknown";
}
} else {
patternGroups.set(pattern, {
rawCommands: [{ command, count: data.count }],
totalCount: data.count,
sessionSet: new Set(data.sessions),
tier,
reason: reason || null,
});
}
}
// Now filter by min-count on the GROUPED totals
for (const [pattern, data] of patternGroups) {
if (data.totalCount < minCount) {
belowThreshold += data.rawCommands.length;
patternGroups.delete(pattern);
}
}
// Post-grouping safety check: normalization can broaden a safe command into an
// unsafe pattern (e.g., "node --version" is green, but normalizes to "node *"
// which would also match arbitrary code execution). Re-classify the normalized
// pattern itself and escalate if the broader form is riskier.
for (const [pattern, data] of patternGroups) {
if (data.tier !== "green") continue;
if (!pattern.includes("*")) continue;
const cmd = pattern.replace(/^Bash\(|\)$/g, "");
const { tier, reason } = classify(cmd);
if (tier === "red") {
data.tier = "red";
data.reason = reason;
} else if (tier === "yellow") {
data.tier = "yellow";
} else if (tier === "unknown") {
data.tier = "unknown";
}
}
// Only green (safe) patterns are output in full. Yellow and unknown patterns
// are counted in stats (yellow names also feed the footnote); red patterns
// contribute up to five examples.
const green = [];
let greenRawCount = 0; // unique raw commands covered by green patterns
let yellowCount = 0;
const redBlocked = [];
let unclassified = 0;
const yellowNames = []; // brief list for the footnote
for (const [pattern, data] of patternGroups) {
switch (data.tier) {
case "green":
green.push({
pattern,
count: data.totalCount,
sessions: data.sessionSet.size,
examples: data.rawCommands
.sort((a, b) => b.count - a.count)
.slice(0, 3)
.map((c) => c.command),
});
greenRawCount += data.rawCommands.length;
break;
case "yellow":
yellowCount++;
yellowNames.push(pattern.replace(/^Bash\(|\)$/g, "").replace(/ \*$/, ""));
break;
case "red":
redBlocked.push({
pattern: pattern.replace(/^Bash\(|\)$/g, ""),
reason: data.reason,
count: data.totalCount,
});
break;
default:
unclassified++;
}
}
green.sort((a, b) => b.count - a.count);
redBlocked.sort((a, b) => b.count - a.count);
const output = {
green,
redExamples: redBlocked.slice(0, 5),
yellowFootnote: yellowNames.length > 0
? `Also frequently used: ${yellowNames.join(", ")} (not classified as safe to auto-allow but may be worth reviewing)`
: null,
stats: {
totalExtracted,
alreadyCovered,
belowThreshold,
unclassified,
yellowSkipped: yellowCount,
redBlocked: redBlocked.length,
patternsReturned: green.length,
greenRawCount,
sessionsScanned: sessionsScanned.size,
filesScanned,
allowPatternsLoaded: allowPatterns.length,
daysWindow: days,
minCount,
},
};
console.log(JSON.stringify(output, null, 2));

@@ -1,9 +0,0 @@
---
name: create-agent-skill
description: Create or edit Claude Code skills with expert guidance on structure and best practices
allowed-tools: Skill(create-agent-skills)
argument-hint: "[skill description or requirements]"
disable-model-invocation: true
---
Invoke the create-agent-skills skill for: $ARGUMENTS

@@ -1,264 +0,0 @@
---
name: create-agent-skills
description: Expert guidance for creating Claude Code skills and slash commands. Use when working with SKILL.md files, authoring new skills, improving existing skills, creating slash commands, or understanding skill structure and best practices.
---
# Creating Skills & Commands
This skill teaches how to create effective Claude Code skills following the official specification from [code.claude.com/docs/en/skills](https://code.claude.com/docs/en/skills).
## Commands and Skills Are Now The Same Thing
Custom slash commands have been merged into skills. A file at `.claude/commands/review.md` and a skill at `.claude/skills/review/SKILL.md` both create `/review` and work the same way. Existing `.claude/commands/` files keep working. Skills add optional features: a directory for supporting files, frontmatter to control invocation, and automatic context loading.
**If a skill and a command share the same name, the skill takes precedence.**
## When To Create What
**Use a command file** (`commands/name.md`) when:
- Simple, single-file workflow
- No supporting files needed
- Task-oriented action (deploy, commit, triage)
**Use a skill directory** (`skills/name/SKILL.md`) when:
- Need supporting reference files, scripts, or templates
- Background knowledge Claude should auto-load
- Complex enough to benefit from progressive disclosure
Both use identical YAML frontmatter and markdown content format.
## Standard Markdown Format
Use YAML frontmatter + markdown body with **standard markdown headings**. Keep it clean and direct.
```markdown
---
name: my-skill-name
description: What it does and when to use it
---
# My Skill Name
## Quick Start
Immediate actionable guidance...
## Instructions
Step-by-step procedures...
## Examples
Concrete usage examples...
```
## Frontmatter Reference
All fields are optional. Only `description` is recommended.
| Field | Required | Description |
|-------|----------|-------------|
| `name` | No | Display name. Lowercase letters, numbers, hyphens (max 64 chars). Defaults to directory name. |
| `description` | Recommended | What it does AND when to use it. Claude uses this for auto-discovery. Max 1024 chars. |
| `argument-hint` | No | Hint shown during autocomplete. Example: `[issue-number]` |
| `disable-model-invocation` | No | Set `true` to prevent Claude auto-loading. Use for manual workflows like `/deploy`, `/commit`. Default: `false`. |
| `user-invocable` | No | Set `false` to hide from `/` menu. Use for background knowledge. Default: `true`. |
| `allowed-tools` | No | Tools Claude can use without permission prompts. Example: `Read, Bash(git *)` |
| `model` | No | Model to use. Options: `haiku`, `sonnet`, `opus`. |
| `context` | No | Set `fork` to run in isolated subagent context. |
| `agent` | No | Subagent type when `context: fork`. Options: `Explore`, `Plan`, `general-purpose`, or custom agent name. |
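The `name` and `description` limits in the table can be checked mechanically. A minimal sketch, assuming the frontmatter has already been parsed into a plain object; `validateFrontmatter` is a hypothetical helper, not part of Claude Code:

```javascript
// Hypothetical helper: checks only the limits stated in the table above.
// Assumes `fm` is the frontmatter already parsed into a plain object.
function validateFrontmatter(fm) {
  const problems = [];
  if (fm.name !== undefined) {
    if (typeof fm.name !== "string" || !/^[a-z0-9-]+$/.test(fm.name)) {
      problems.push("name: use lowercase letters, numbers, and hyphens only");
    } else if (fm.name.length > 64) {
      problems.push("name: longer than 64 characters");
    }
  }
  if (fm.description === undefined || fm.description === "") {
    problems.push("description: missing (recommended for auto-discovery)");
  } else if (String(fm.description).length > 1024) {
    problems.push("description: longer than 1024 characters");
  }
  return problems;
}

console.log(validateFrontmatter({ name: "My Skill", description: "Helps" }));
```

An empty array means none of the stated limits were violated; it does not prove the skill will load.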
### Invocation Control
| Frontmatter | User can invoke | Claude can invoke | When loaded |
|-------------|----------------|-------------------|-------------|
| (default) | Yes | Yes | Description always in context, full content loads when invoked |
| `disable-model-invocation: true` | Yes | No | Description not in context, loads only when user invokes |
| `user-invocable: false` | No | Yes | Description always in context, loads when relevant |
**Use `disable-model-invocation: true`** for workflows with side effects: `/deploy`, `/commit`, `/triage-prs`, `/send-slack-message`. You don't want Claude deciding to deploy because your code looks ready.
**Use `user-invocable: false`** for background knowledge that isn't a meaningful user action: coding conventions, domain context, legacy system docs.
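The invocation table can be read as a small decision function. This is only a sketch of the three rows shown; `invocationControl` is a made-up name, and the table does not say what happens when both fields are set, so that combination is an assumption here:

```javascript
// Illustrative only: encodes the three rows of the Invocation Control table.
// Behaviour when BOTH fields are set is not documented above and is assumed.
function invocationControl(frontmatter = {}) {
  const modelDisabled = frontmatter["disable-model-invocation"] === true;
  const userHidden = frontmatter["user-invocable"] === false;
  return {
    userCanInvoke: !userHidden,
    claudeCanInvoke: !modelDisabled,
    descriptionInContext: !modelDisabled,
  };
}

console.log(invocationControl({ "disable-model-invocation": true }));
```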
## Dynamic Features
### Arguments
Use `$ARGUMENTS` placeholder for user input. If not present in content, arguments are appended automatically.
```yaml
---
name: fix-issue
description: Fix a GitHub issue
disable-model-invocation: true
---
Fix GitHub issue $ARGUMENTS following our coding standards.
```
Access individual args: `$ARGUMENTS[0]` or shorthand `$0`, `$1`, `$2`.
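To make the substitution rules concrete, here is a rough sketch of how the placeholders could be expanded. It is not the real loader: `substituteArguments` is hypothetical, arguments are split on whitespace (real quoting rules may differ), and the appended `ARGUMENTS: ...` format is an assumption.

```javascript
// Illustrative only: mimics the placeholder rules described above.
// Assumptions: whitespace splitting, and the text appended when the body
// contains no placeholder.
function substituteArguments(body, rawArgs) {
  const joined = rawArgs.trim();
  const args = joined === "" ? [] : joined.split(/\s+/);
  const placeholder = /\$ARGUMENTS\[(\d+)\]|\$ARGUMENTS\b|\$(\d+)\b/g;
  if (!placeholder.test(body)) {
    // No placeholder in the content: append the arguments instead.
    return joined === "" ? body : body + "\n\nARGUMENTS: " + joined;
  }
  placeholder.lastIndex = 0; // reset after test() on a global regex
  return body.replace(placeholder, (match, bracketIndex, shortIndex) => {
    const index = bracketIndex ?? shortIndex;
    if (index === undefined) return joined; // bare $ARGUMENTS
    return args[Number(index)] ?? ""; // $ARGUMENTS[n] or $n
  });
}

console.log(substituteArguments("Move $0 to $1", "a.txt b.txt"));
```

A single regex pass is used so that text inserted for one placeholder is never re-scanned for another.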
### Dynamic Context Injection
Skills support dynamic context injection: prefix a backtick-wrapped shell command with an exclamation mark, and the preprocessor executes it at load time, replacing the directive with stdout. Write an exclamation mark immediately before the opening backtick of the command you want executed (for example, to inject the current git branch, write the exclamation mark followed by `git branch --show-current` wrapped in backticks).
**Important:** The preprocessor scans the entire SKILL.md as plain text — it does not parse markdown. Directives inside fenced code blocks or inline code spans are still executed. If a skill documents this syntax with literal examples, the preprocessor will attempt to run them, causing load failures. To safely document this feature, describe it in prose (as done here) or place examples in a reference file, which is loaded on-demand by Claude and not preprocessed.
For a concrete example of dynamic context injection in a skill, see [official-spec.md](references/official-spec.md) § "Dynamic Context Injection".
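Because directives are executed even inside code spans, it can help to lint a skill for them before shipping. The sketch below is a hypothetical checker, not the preprocessor itself; it assumes the directive shape described in prose above, and it writes the backtick as `\x60` so that the example contains no directive of its own.

```javascript
// Illustrative linter, not the real preprocessor. Reports every place where
// an exclamation mark is immediately followed by a backtick-wrapped span.
// The backtick is spelled \x60 so this block is safe to keep in a SKILL.md.
function findInjectionDirectives(text) {
  const directive = /!\x60([^\x60\n]+)\x60/g;
  const hits = [];
  text.split("\n").forEach((line, i) => {
    for (const m of line.matchAll(directive)) {
      hits.push({ line: i + 1, command: m[1] });
    }
  });
  return hits;
}
```

Any hit inside documentation or an example is a candidate for moving into a reference file.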
### Running in a Subagent
Add `context: fork` to run in isolation. The skill content becomes the subagent's prompt. It won't have conversation history.
```yaml
---
name: deep-research
description: Research a topic thoroughly
context: fork
agent: Explore
---
Research $ARGUMENTS thoroughly:
1. Find relevant files
2. Analyze the code
3. Summarize findings
```
## Progressive Disclosure
Keep SKILL.md under 500 lines. Split detailed content into reference files:
```
my-skill/
├── SKILL.md # Entry point (required, overview + navigation)
├── reference.md # Detailed docs (loaded when needed)
├── examples.md # Usage examples (loaded when needed)
└── scripts/
└── helper.py # Utility script (executed, not loaded)
```
Link from SKILL.md: `For API details, see [reference.md](reference.md).`
Keep references **one level deep** from SKILL.md. Avoid nested chains.
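Both rules (the 500-line budget and one-level references) are easy to check automatically. A minimal sketch, assuming the skill's files are supplied as an in-memory map from relative path to content; the function names are hypothetical and link paths are not resolved relative to the linking file:

```javascript
// Hypothetical check. `files` maps relative paths to file contents, so the
// sketch needs no filesystem access.
function localMarkdownLinks(text) {
  const links = [];
  for (const m of text.matchAll(/\[[^\]]*\]\(([^)\s]+\.md)\)/g)) {
    if (!/^[a-z]+:\/\//i.test(m[1])) links.push(m[1]); // skip absolute URLs
  }
  return links;
}

function checkProgressiveDisclosure(files, entry = "SKILL.md", maxLines = 500) {
  const problems = [];
  const lineCount = files[entry].replace(/\n$/, "").split("\n").length;
  if (lineCount > maxLines) {
    problems.push(entry + " has " + lineCount + " lines (limit " + maxLines + ")");
  }
  // A reference file that links to further local files forms a nested chain.
  for (const ref of localMarkdownLinks(files[entry])) {
    const nested = localMarkdownLinks(files[ref] ?? "");
    if (nested.length > 0) {
      problems.push(ref + " links onward to " + nested.join(", "));
    }
  }
  return problems;
}
```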
## Effective Descriptions
The description enables skill discovery. Include both **what** it does and **when** to use it.
**Good:**
```yaml
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
```
**Bad:**
```yaml
description: Helps with documents
```
## What Would You Like To Do?
1. **Create new skill** - Build from scratch
2. **Create new command** - Build a slash command
3. **Audit existing skill** - Check against best practices
4. **Add component** - Add workflow/reference/example
5. **Get guidance** - Understand skill design
## Creating a New Skill or Command
### Step 1: Choose Type
Ask: Is this a manual workflow (deploy, commit, triage) or background knowledge (conventions, patterns)?
- **Manual workflow** → command with `disable-model-invocation: true`
- **Background knowledge** → skill without `disable-model-invocation`
- **Complex with supporting files** → skill directory
### Step 2: Create the File
**Command:**
```markdown
---
name: my-command
description: What this command does
argument-hint: [expected arguments]
disable-model-invocation: true
allowed-tools: Bash(gh *), Read
---
# Command Title
## Workflow
### Step 1: Gather Context
...
### Step 2: Execute
...
## Success Criteria
- [ ] Expected outcome 1
- [ ] Expected outcome 2
```
**Skill:**
```markdown
---
name: my-skill
description: What it does. Use when [trigger conditions].
---
# Skill Title
## Quick Start
[Immediate actionable example]
## Instructions
[Core guidance]
## Examples
[Concrete input/output pairs]
```
### Step 3: Add Reference Files (If Needed)
Link from SKILL.md to detailed content:
```markdown
For API reference, see [reference.md](reference.md).
For form filling guide, see [forms.md](forms.md).
```
### Step 4: Test With Real Usage
1. Test with actual tasks, not test scenarios
2. Invoke directly with `/skill-name` to verify
3. Check auto-triggering by asking something that matches the description
4. Refine based on real behavior
## Audit Checklist
- [ ] Valid YAML frontmatter (name + description)
- [ ] Description includes trigger keywords and is specific
- [ ] Uses standard markdown headings (not XML tags)
- [ ] SKILL.md under 500 lines
- [ ] `disable-model-invocation: true` if it has side effects
- [ ] `allowed-tools` set if specific tools needed
- [ ] References one level deep, properly linked
- [ ] Examples are concrete, not abstract
- [ ] Tested with real usage
## Anti-Patterns to Avoid
- **XML tags in body** - Use standard markdown headings
- **Vague descriptions** - Be specific with trigger keywords
- **Deep nesting** - Keep references one level from SKILL.md
- **Missing invocation control** - Side-effect workflows need `disable-model-invocation: true`
- **Too many options** - Provide a default with escape hatch
- **Punting to Claude** - Scripts should handle errors explicitly
## Reference Files
For detailed guidance, see:
- [official-spec.md](references/official-spec.md) - Official skill specification
- [best-practices.md](references/best-practices.md) - Skill authoring best practices
## Sources
- [Extend Claude with skills - Official Docs](https://code.claude.com/docs/en/skills)
- [GitHub - anthropics/skills](https://github.com/anthropics/skills)

@@ -1,226 +0,0 @@
<overview>
When building skills that make API calls requiring credentials (API keys, tokens, secrets), follow this protocol to prevent credentials from appearing in chat.
</overview>
<the_problem>
Raw curl commands with environment variables expose credentials:
```bash
# ❌ BAD - API key visible in chat
curl -H "Authorization: Bearer $API_KEY" https://api.example.com/data
```
When Claude executes this, the full command with expanded `$API_KEY` appears in the conversation.
</the_problem>
<the_solution>
Use `~/.claude/scripts/secure-api.sh` - a wrapper that loads credentials internally.
<for_supported_services>
```bash
# ✅ GOOD - No credentials visible
~/.claude/scripts/secure-api.sh <service> <operation> [args]
# Examples:
~/.claude/scripts/secure-api.sh facebook list-campaigns
~/.claude/scripts/secure-api.sh ghl search-contact "email@example.com"
```
</for_supported_services>
<adding_new_services>
When building a new skill that requires API calls:
1. **Add operations to the wrapper** (`~/.claude/scripts/secure-api.sh`):
```bash
case "$SERVICE" in
yourservice)
case "$OPERATION" in
list-items)
curl -s -G \
-H "Authorization: Bearer $YOUR_API_KEY" \
"https://api.yourservice.com/items"
;;
get-item)
ITEM_ID=$1
curl -s -G \
-H "Authorization: Bearer $YOUR_API_KEY" \
"https://api.yourservice.com/items/$ITEM_ID"
;;
*)
echo "Unknown operation: $OPERATION" >&2
exit 1
;;
esac
;;
esac
```
2. **Add profile support to the wrapper** (if service needs multiple accounts):
```bash
# In secure-api.sh, add to profile remapping section:
yourservice)
SERVICE_UPPER="YOURSERVICE"
YOURSERVICE_API_KEY=$(eval echo \$${SERVICE_UPPER}_${PROFILE_UPPER}_API_KEY)
YOURSERVICE_ACCOUNT_ID=$(eval echo \$${SERVICE_UPPER}_${PROFILE_UPPER}_ACCOUNT_ID)
;;
```
3. **Add credential placeholders to `~/.claude/.env`** using profile naming:
```bash
# Check if entries already exist
grep -q "YOURSERVICE_MAIN_API_KEY=" ~/.claude/.env 2>/dev/null || \
echo -e "\n# Your Service - Main profile\nYOURSERVICE_MAIN_API_KEY=\nYOURSERVICE_MAIN_ACCOUNT_ID=" >> ~/.claude/.env
echo "Added credential placeholders to ~/.claude/.env - user needs to fill them in"
```
4. **Document profile workflow in your SKILL.md**:
````markdown
## Profile Selection Workflow
**CRITICAL:** Always use profile selection to prevent using wrong account credentials.
### When user requests YourService operation:
1. **Check for saved profile:**
```bash
~/.claude/scripts/profile-state get yourservice
```
2. **If no profile saved, discover available profiles:**
```bash
~/.claude/scripts/list-profiles yourservice
```
3. **If only ONE profile:** Use it automatically and announce:
```
"Using YourService profile 'main' to list items..."
```
4. **If MULTIPLE profiles:** Ask user which one:
```
"Which YourService profile: main, clienta, or clientb?"
```
5. **Save user's selection:**
```bash
~/.claude/scripts/profile-state set yourservice <selected_profile>
```
6. **Always announce which profile before calling API:**
```
"Using YourService profile 'main' to list items..."
```
7. **Make API call with profile:**
```bash
~/.claude/scripts/secure-api.sh yourservice:<profile> list-items
```
## Secure API Calls
All API calls use profile syntax:
```bash
~/.claude/scripts/secure-api.sh yourservice:<profile> <operation> [args]
# Examples:
~/.claude/scripts/secure-api.sh yourservice:main list-items
~/.claude/scripts/secure-api.sh yourservice:main get-item <ITEM_ID>
```
**Profile persists for session:** Once selected, use same profile for subsequent operations unless user explicitly changes it.
````
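The `SERVICE_PROFILE_FIELD` naming used in the steps above can be illustrated outside the shell wrapper. This is a sketch in JavaScript, not a port of `secure-api.sh`: the function names are invented, and falling back to `SERVICE_FIELD` when no profile is given is an assumption.

```javascript
// Illustrative only: mirrors the SERVICE_PROFILE_FIELD naming convention.
// The real remapping lives in secure-api.sh; these function names and the
// no-profile fallback are assumptions made for this sketch.
function credentialVarName(target, field) {
  const [service, profile] = target.split(":");
  const parts = profile ? [service, profile, field] : [service, field];
  return parts.join("_").toUpperCase();
}

function resolveCredential(env, target, field) {
  const name = credentialVarName(target, field);
  const value = env[name];
  if (!value) {
    // Name the variable in the error, never echo a value.
    throw new Error("Missing credential: set " + name + " in ~/.claude/.env");
  }
  return value;
}

console.log(credentialVarName("yourservice:main", "api_key"));
```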
</adding_new_services>
</the_solution>
<pattern_guidelines>
<simple_get_requests>
```bash
curl -s -G \
-H "Authorization: Bearer $API_KEY" \
"https://api.example.com/endpoint"
```
</simple_get_requests>
<post_with_json_body>
```bash
ITEM_ID=$1
curl -s -X POST \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d @- \
"https://api.example.com/items/$ITEM_ID"
```
Usage:
```bash
echo '{"name":"value"}' | ~/.claude/scripts/secure-api.sh service create-item
```
</post_with_json_body>
<post_with_form_data>
```bash
curl -s -X POST \
-F "field1=value1" \
-F "field2=value2" \
-F "access_token=$API_TOKEN" \
"https://api.example.com/endpoint"
```
</post_with_form_data>
</pattern_guidelines>
<credential_storage>
**Location:** `~/.claude/.env` (global for all skills, accessible from any directory)
**Format:**
```bash
# Service credentials
SERVICE_API_KEY=your-key-here
SERVICE_ACCOUNT_ID=account-id-here
# Another service
OTHER_API_TOKEN=token-here
OTHER_BASE_URL=https://api.other.com
```
**Loading in script:**
```bash
set -a
source ~/.claude/.env 2>/dev/null || { echo "Error: ~/.claude/.env not found" >&2; exit 1; }
set +a
```
</credential_storage>
<best_practices>
1. **Never use raw curl with `$VARIABLE` in skill examples** - always use the wrapper
2. **Add all operations to the wrapper** - don't make users figure out curl syntax
3. **Auto-create credential placeholders** - add empty fields to `~/.claude/.env` immediately when creating the skill
4. **Keep credentials in `~/.claude/.env`** - one central location, works everywhere
5. **Document each operation** - show examples in SKILL.md
6. **Handle errors gracefully** - check for missing env vars, show helpful error messages
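As one way to approach item 6, the sketch below reports which expected keys are missing or empty without ever returning their values. It is a hypothetical helper that works on the text of `~/.claude/.env`; it assumes plain `KEY=value` lines and does not handle `export` prefixes or quoting.

```javascript
// Sketch only: lists required keys that are absent or empty in .env text.
// Values are inspected for emptiness but never returned or logged.
function missingCredentials(envText, requiredKeys) {
  const present = new Set();
  for (const line of envText.split("\n")) {
    const m = line.match(/^\s*([A-Za-z_][A-Za-z0-9_]*)=(.*)$/);
    if (m && m[2].trim() !== "") present.add(m[1]);
  }
  return requiredKeys.filter((key) => !present.has(key));
}

const sample = "# Service credentials\nSERVICE_API_KEY=abc\nSERVICE_ACCOUNT_ID=\n";
console.log(missingCredentials(sample, ["SERVICE_API_KEY", "SERVICE_ACCOUNT_ID"]));
```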
</best_practices>
<testing>
Test the wrapper without exposing credentials:
```bash
# This command appears in chat
~/.claude/scripts/secure-api.sh facebook list-campaigns
# But API keys never appear - they're loaded inside the script
```
Verify credentials are loaded:
```bash
# Check .env exists
ls -la ~/.claude/.env
# Check specific variables (without showing values)
grep -q "YOUR_API_KEY=" ~/.claude/.env && echo "API key configured" || echo "API key missing"
```
</testing>

@@ -1,531 +0,0 @@
<golden_rule>
Show your skill to someone with minimal context and ask them to follow the instructions. If they're confused, Claude will likely be too.
</golden_rule>
<overview>
Clarity and directness are fundamental to effective skill authoring. Clear instructions reduce errors, improve execution quality, and minimize token waste.
</overview>
<guidelines>
<contextual_information>
Give Claude contextual information that frames the task:
- What the task results will be used for
- What audience the output is meant for
- What workflow the task is part of
- The end goal or what successful completion looks like
Context helps Claude make better decisions and produce more appropriate outputs.
<example>
```xml
<context>
This analysis will be presented to investors who value transparency and actionable insights. Focus on financial metrics and clear recommendations.
</context>
```
</example>
</contextual_information>
<specificity>
Be specific about what you want Claude to do. If you want code only and nothing else, say so.
**Vague**: "Help with the report"
**Specific**: "Generate a markdown report with three sections: Executive Summary, Key Findings, Recommendations"
**Vague**: "Process the data"
**Specific**: "Extract customer names and email addresses from the CSV file, removing duplicates, and save to JSON format"
Specificity eliminates ambiguity and reduces iteration cycles.
</specificity>
<sequential_steps>
Provide instructions as sequential steps. Use numbered lists or bullet points.
```xml
<workflow>
1. Extract data from source file
2. Transform to target format
3. Validate transformation
4. Save to output file
5. Verify output correctness
</workflow>
```
Sequential steps create clear expectations and reduce the chance Claude skips important operations.
</sequential_steps>
</guidelines>
<example_comparison>
<unclear_example>
```xml
<quick_start>
Please remove all personally identifiable information from these customer feedback messages: {{FEEDBACK_DATA}}
</quick_start>
```
**Problems**:
- What counts as PII?
- What should replace PII?
- What format should the output be?
- What if no PII is found?
- Should product names be redacted?
</unclear_example>
<clear_example>
```xml
<objective>
Anonymize customer feedback for quarterly review presentation.
</objective>
<quick_start>
<instructions>
1. Replace all customer names with "CUSTOMER_[ID]" (e.g., "Jane Doe" → "CUSTOMER_001")
2. Replace email addresses with "EMAIL_[ID]@example.com"
3. Redact phone numbers as "PHONE_[ID]"
4. If a message mentions a specific product (e.g., "AcmeCloud"), leave it intact
5. If no PII is found, copy the message verbatim
6. Output only the processed messages, separated by "---"
</instructions>
Data to process: {{FEEDBACK_DATA}}
</quick_start>
<success_criteria>
- All customer names replaced with IDs
- All emails and phones redacted
- Product names preserved
- Output format matches specification
</success_criteria>
```
**Why this is better**:
- States the purpose (quarterly review)
- Provides explicit step-by-step rules
- Defines output format clearly
- Specifies edge cases (product names, no PII found)
- Defines success criteria
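The rules in the clear version are specific enough to sketch as code. The following is a minimal illustration, not a complete PII detector: the `Anonymizer` class, the two regular expressions, and the assumption that customer names arrive as a known list are all hypothetical choices made for this example.

```python
import re

# Simplified patterns (assumptions): real-world emails and phone numbers vary more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


class Anonymizer:
    """Replaces known names, emails, and phone numbers with stable IDs."""

    def __init__(self, known_names):
        # Assumption: names cannot be detected reliably, so they are supplied.
        self.known_names = list(known_names)
        self.ids = {"CUSTOMER": {}, "EMAIL": {}, "PHONE": {}}

    def _label(self, kind, value):
        # The same value always maps to the same zero-padded ID.
        table = self.ids[kind]
        if value not in table:
            table[value] = f"{len(table) + 1:03d}"
        return table[value]

    def anonymize(self, message):
        message = EMAIL_RE.sub(
            lambda m: f"EMAIL_{self._label('EMAIL', m.group())}@example.com", message
        )
        message = PHONE_RE.sub(
            lambda m: f"PHONE_{self._label('PHONE', m.group())}", message
        )
        for name in self.known_names:
            if name in message:
                message = message.replace(
                    name, f"CUSTOMER_{self._label('CUSTOMER', name)}"
                )
        # Product names are never touched; messages without PII pass through verbatim.
        return message


def format_output(messages):
    """Join processed messages with the '---' separator from rule 6."""
    return "\n---\n".join(messages)
```

Each numbered rule maps to one branch of the code, which is a useful test of whether an instruction is specific enough.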
</clear_example>
</example_comparison>
<key_differences>
The clear version:
- States the purpose (quarterly review)
- Provides explicit step-by-step rules
- Defines output format
- Specifies edge cases (product names, no PII found)
- Includes success criteria
The unclear version leaves all these decisions to Claude, increasing the chance of misalignment with expectations.
</key_differences>
<show_dont_just_tell>
<principle>
When format matters, show an example rather than just describing it.
</principle>
<telling_example>
```xml
<commit_messages>
Generate commit messages in conventional format with type, scope, and description.
</commit_messages>
```
</telling_example>
<showing_example>
```xml
<commit_message_format>
Generate commit messages following these examples:
<example number="1">
<input>Added user authentication with JWT tokens</input>
<output>
```
feat(auth): implement JWT-based authentication
Add login endpoint and token validation middleware
```
</output>
</example>
<example number="2">
<input>Fixed bug where dates displayed incorrectly in reports</input>
<output>
```
fix(reports): correct date formatting in timezone conversion
Use UTC timestamps consistently across report generation
```
</output>
</example>
Follow this style: type(scope): brief description, then detailed explanation.
</commit_message_format>
```
</showing_example>
<why_showing_works>
Examples communicate nuances that text descriptions can't:
- Exact formatting (spacing, capitalization, punctuation)
- Tone and style
- Level of detail
- Pattern across multiple cases
Claude learns patterns from examples more reliably than from descriptions.
</why_showing_works>
</show_dont_just_tell>
<avoid_ambiguity>
<principle>
Eliminate words and phrases that create ambiguity or leave decisions open.
</principle>
<ambiguous_phrases>
Ambiguous: **"Try to..."** - Implies optional
Clear: **"Always..."** or **"Never..."** - Clear requirement
Ambiguous: **"Should probably..."** - Unclear obligation
Clear: **"Must..."** or **"May optionally..."** - Clear obligation level
Ambiguous: **"Generally..."** - When are exceptions allowed?
Clear: **"Always... except when..."** - Clear rule with explicit exceptions
Ambiguous: **"Consider..."** - Should Claude always do this or only sometimes?
Clear: **"If X, then Y"** or **"Always..."** - Clear conditions
</ambiguous_phrases>
<example>
**Ambiguous**:
```xml
<validation>
You should probably validate the output and try to fix any errors.
</validation>
```
**Clear**:
```xml
<validation>
Always validate output before proceeding:
```bash
python scripts/validate.py output_dir/
```
If validation fails, fix errors and re-validate. Only proceed when validation passes with zero errors.
</validation>
```
</example>
</avoid_ambiguity>
<define_edge_cases>
<principle>
Anticipate edge cases and define how to handle them. Don't leave Claude guessing.
</principle>
<without_edge_cases>
```xml
<quick_start>
Extract email addresses from the text file and save to a JSON array.
</quick_start>
```
**Questions left unanswered**:
- What if no emails are found?
- What if the same email appears multiple times?
- What if emails are malformed?
- What JSON format exactly?
</without_edge_cases>
<with_edge_cases>
```xml
<quick_start>
Extract email addresses from the text file and save to a JSON array.
<edge_cases>
- **No emails found**: Save empty array `[]`
- **Duplicate emails**: Keep only unique emails
- **Malformed emails**: Skip invalid formats, log to stderr
- **Output format**: Array of strings, one email per element
</edge_cases>
<example_output>
```json
[
"user1@example.com",
"user2@example.com"
]
```
</example_output>
</quick_start>
```
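As a rough illustration, the edge cases above translate directly into branches. This is a sketch under stated assumptions: the function names and the deliberately simple `EMAIL_RE` pattern are hypothetical, and candidates are found by splitting on whitespace rather than by a full parser.

```python
import json
import re
import sys

# Deliberately simple pattern (assumption): local part, "@", domain with a dot.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def extract_emails(text: str) -> list[str]:
    """Return unique, well-formed emails in first-seen order."""
    seen: set[str] = set()
    result: list[str] = []
    for token in text.split():
        if "@" not in token:
            continue
        # Drop surrounding punctuation such as trailing commas or brackets.
        candidate = token.strip(".,;:<>()[]\"'")
        if not EMAIL_RE.match(candidate):
            # Malformed emails: skip and log to stderr.
            print(f"skipping malformed email: {candidate!r}", file=sys.stderr)
            continue
        if candidate not in seen:  # Duplicate emails: keep only the first.
            seen.add(candidate)
            result.append(candidate)
    return result


def to_json(emails: list[str]) -> str:
    """Serialize as an array of strings; an empty list becomes []."""
    return json.dumps(emails, indent=2)
```

Because every edge case was stated up front, none of the branches required a guess.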
</with_edge_cases>
</define_edge_cases>
<output_format_specification>
<principle>
When output format matters, specify it precisely. Show examples.
</principle>
<vague_format>
```xml
<output>
Generate a report with the analysis results.
</output>
```
</vague_format>
<specific_format>
```xml
<output_format>
Generate a markdown report with this exact structure:
```markdown
# Analysis Report: [Title]
## Executive Summary
[1-2 paragraphs summarizing key findings]
## Key Findings
- Finding 1 with supporting data
- Finding 2 with supporting data
- Finding 3 with supporting data
## Recommendations
1. Specific actionable recommendation
2. Specific actionable recommendation
## Appendix
[Raw data and detailed calculations]
```
**Requirements**:
- Use exactly these section headings
- Executive summary must be 1-2 paragraphs
- List 3-5 key findings
- Provide 2-4 recommendations
- Include appendix with source data
</output_format>
```
</specific_format>
</output_format_specification>
<decision_criteria>
<principle>
When Claude must make decisions, provide clear criteria.
</principle>
<no_criteria>
```xml
<workflow>
Analyze the data and decide which visualization to use.
</workflow>
```
**Problem**: What factors should guide this decision?
</no_criteria>
<with_criteria>
```xml
<workflow>
Analyze the data and select appropriate visualization:
<decision_criteria>
**Use bar chart when**:
- Comparing quantities across categories
- Fewer than 10 categories
- Exact values matter
**Use line chart when**:
- Showing trends over time
- Continuous data
- Pattern recognition matters more than exact values
**Use scatter plot when**:
- Showing relationship between two variables
- Looking for correlations
- Individual data points matter
</decision_criteria>
</workflow>
```
**Benefits**: Claude has objective criteria for making the decision rather than guessing.
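One way to check that criteria are objective is to see whether they can be written as a function. The sketch below is illustrative only: `DatasetProfile` and its fields are hypothetical, and the order in which the rules are tried is an assumption, since the criteria above do not define precedence.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    """Hypothetical summary of a dataset, used only for this illustration."""
    x_is_time: bool
    x_is_categorical: bool
    category_count: int
    both_axes_numeric: bool


def choose_chart(profile: DatasetProfile) -> str:
    # Trends over time: line chart.
    if profile.x_is_time:
        return "line"
    # Comparing quantities across fewer than 10 categories: bar chart.
    if profile.x_is_categorical and profile.category_count < 10:
        return "bar"
    # Relationship between two numeric variables: scatter plot.
    if profile.both_axes_numeric:
        return "scatter"
    raise ValueError("no criterion matched; the skill should say what to do here")
```

The final `raise` marks a gap in the criteria: data that matches none of the three rules still needs a defined outcome.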
</with_criteria>
</decision_criteria>
<constraints_and_requirements>
<principle>
Clearly separate "must do" from "nice to have" from "must not do".
</principle>
<unclear_requirements>
```xml
<requirements>
The report should include financial data, customer metrics, and market analysis. It would be good to have visualizations. Don't make it too long.
</requirements>
```
**Problems**:
- Are all three content types required?
- Are visualizations optional or required?
- How long is "too long"?
</unclear_requirements>
<clear_requirements>
```xml
<requirements>
<must_have>
- Financial data (revenue, costs, profit margins)
- Customer metrics (acquisition, retention, lifetime value)
- Market analysis (competition, trends, opportunities)
- Maximum 5 pages
</must_have>
<nice_to_have>
- Charts and visualizations
- Industry benchmarks
- Future projections
</nice_to_have>
<must_not>
- Include confidential customer names
- Exceed 5 pages
- Use technical jargon without definitions
</must_not>
</requirements>
```
**Benefits**: Clear priorities and constraints prevent misalignment.
</clear_requirements>
</constraints_and_requirements>
<success_criteria>
<principle>
Define what success looks like. How will Claude know it succeeded?
</principle>
<without_success_criteria>
```xml
<objective>
Process the CSV file and generate a report.
</objective>
```
**Problem**: When is this task complete? What defines success?
</without_success_criteria>
<with_success_criteria>
```xml
<objective>
Process the CSV file and generate a summary report.
</objective>
<success_criteria>
- All rows in CSV successfully parsed
- No data validation errors
- Report generated with all required sections
- Report saved to output/report.md
- Output file is valid markdown
- Process completes without errors
</success_criteria>
```
**Benefits**: Clear completion criteria eliminate ambiguity about when the task is done.
</with_success_criteria>
</success_criteria>
<testing_clarity>
<principle>
Test your instructions by asking: "Could I hand these instructions to a junior developer and expect correct results?"
</principle>
<testing_process>
1. Read your skill instructions
2. Remove context only you have (project knowledge, unstated assumptions)
3. Identify ambiguous terms or vague requirements
4. Add specificity where needed
5. Test with someone who doesn't have your context
6. Iterate based on their questions and confusion
If a human with minimal context struggles, Claude will too.
</testing_process>
</testing_clarity>
<practical_examples>
<example domain="data_processing">
**Unclear**:
```xml
<quick_start>
Clean the data and remove bad entries.
</quick_start>
```
**Clear**:
```xml
<quick_start>
<data_cleaning>
1. Remove rows where required fields (name, email, date) are empty
2. Standardize date format to YYYY-MM-DD
3. Remove duplicate entries based on email address
4. Validate email format (must contain @ and domain)
5. Save cleaned data to output/cleaned_data.csv
</data_cleaning>
<success_criteria>
- No empty required fields
- All dates in YYYY-MM-DD format
- No duplicate emails
- All emails valid format
- Output file created successfully
</success_criteria>
</quick_start>
```
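The clear version is concrete enough to sketch in code. The following is one possible reading, not a reference implementation: the accepted input date formats, the lowercasing of emails before deduplication, and the choice to drop rows with invalid emails or unparseable dates are assumptions that the instructions above leave open.

```python
import csv
import re
from datetime import datetime

REQUIRED = ("name", "email", "date")
# Assumed input formats; the instructions only fix the output format.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # must contain @ and a domain


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def clean_rows(rows):
    """Apply steps 1-4 to a list of dict rows and return the cleaned rows."""
    cleaned, seen = [], set()
    for row in rows:
        # Step 1: drop rows with empty required fields.
        if any(not (row.get(field) or "").strip() for field in REQUIRED):
            continue
        date = normalize_date(row["date"].strip())   # Step 2
        email = row["email"].strip().lower()
        if date is None or not EMAIL_RE.match(email):  # Step 4 (assumed: drop)
            continue
        if email in seen:                             # Step 3
            continue
        seen.add(email)
        cleaned.append({**row, "email": email, "date": date})
    return cleaned


def save_rows(rows, path):
    """Step 5: write rows to CSV (assumes all rows share the same keys)."""
    fieldnames = list(rows[0]) if rows else list(REQUIRED)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Writing the sketch surfaces the remaining open questions (what happens to rows with invalid emails?), which is a sign the instructions could be tightened further.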
</example>
<example domain="code_generation">
**Unclear**:
```xml
<quick_start>
Write a function to process user input.
</quick_start>
```
**Clear**:
```xml
<quick_start>
<function_specification>
Write a Python function with this signature:
```python
def process_user_input(raw_input: str) -> dict:
    """
    Validate and parse user input.

    Args:
        raw_input: Raw string from user (format: "name:email:age")

    Returns:
        dict with keys: name (str), email (str), age (int)

    Raises:
        ValueError: If input format is invalid
    """
```
**Requirements**:
- Split input on colon delimiter
- Validate email contains @ and domain
- Convert age to integer, raise ValueError if not numeric
- Return dictionary with specified keys
- Include docstring and type hints
</function_specification>
<success_criteria>
- Function signature matches specification
- All validation checks implemented
- Proper error handling for invalid input
- Type hints included
- Docstring included
</success_criteria>
</quick_start>
```
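For reference, a function that would satisfy the specification above might look like the sketch below. It is one possible implementation; details the specification leaves open, such as stripping whitespace and rejecting an empty name, are assumptions.

```python
def process_user_input(raw_input: str) -> dict:
    """
    Validate and parse user input.

    Args:
        raw_input: Raw string from user (format: "name:email:age")

    Returns:
        dict with keys: name (str), email (str), age (int)

    Raises:
        ValueError: If input format is invalid
    """
    parts = raw_input.split(":")
    if len(parts) != 3:
        raise ValueError("expected input in the form 'name:email:age'")
    name, email, age_text = (part.strip() for part in parts)
    if not name:  # Assumption: an empty name counts as invalid.
        raise ValueError("name must not be empty")
    # The email must contain "@" with a non-empty local part and a dotted domain.
    local, sep, domain = email.partition("@")
    if not sep or not local or "." not in domain:
        raise ValueError(f"invalid email: {email!r}")
    try:
        age = int(age_text)
    except ValueError:
        raise ValueError(f"age must be numeric, got {age_text!r}") from None
    return {"name": name, "email": email, "age": age}
```

Each item in the success criteria can be checked against this function directly.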
</example>
</practical_examples>


@@ -1,404 +0,0 @@
# Skill Authoring Best Practices
Source: [platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices)
## Core Principles
### Concise is Key
The context window is a public good. Your Skill shares the context window with everything else Claude needs to know.
**Default assumption**: Claude is already very smart. Only add context Claude doesn't already have.
Challenge each piece of information:
- "Does Claude really need this explanation?"
- "Can I assume Claude knows this?"
- "Does this paragraph justify its token cost?"
**Good example (concise, ~50 tokens):**
```markdown
## Extract PDF text
Use pdfplumber for text extraction:
```python
import pdfplumber
with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```
```
**Bad example (too verbose, ~150 tokens):**
```markdown
## Extract PDF text
PDF (Portable Document Format) files are a common file format that contains
text, images, and other content. To extract text from a PDF, you'll need to
use a library. There are many libraries available...
```
### Set Appropriate Degrees of Freedom
Match specificity to task fragility and variability.
**High freedom** (multiple valid approaches):
```markdown
## Code review process
1. Analyze the code structure and organization
2. Check for potential bugs or edge cases
3. Suggest improvements for readability
4. Verify adherence to project conventions
```
**Medium freedom** (preferred pattern with variation):
```markdown
## Generate report
Use this template and customize as needed:
```python
def generate_report(data, format="markdown"):
    # Process data
    # Generate output in specified format
```
```
**Low freedom** (fragile, exact sequence required):
```markdown
## Database migration
Run exactly this script:
```bash
python scripts/migrate.py --verify --backup
```
Do not modify the command or add flags.
```
### Test With All Models
Skills act as additions to models, so effectiveness depends on the underlying model. Test with Haiku, Sonnet, and Opus.
- **Haiku**: Does the Skill provide enough guidance?
- **Sonnet**: Is the Skill clear and efficient?
- **Opus**: Does the Skill avoid over-explaining?
## Naming Conventions
Use **gerund form** (verb + -ing) for Skill names:
**Good:**
- `processing-pdfs`
- `analyzing-spreadsheets`
- `managing-databases`
- `testing-code`
- `writing-documentation`
**Acceptable alternatives:**
- Noun phrases: `pdf-processing`, `spreadsheet-analysis`
- Action-oriented: `process-pdfs`, `analyze-spreadsheets`
**Avoid:**
- Vague: `helper`, `utils`, `tools`
- Generic: `documents`, `data`, `files`
- Reserved: `anthropic-*`, `claude-*`
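These naming rules can be checked mechanically. The sketch below is a hypothetical lint, not an official validator: the function name is invented, and the lowercase-with-hyphens pattern is an assumption inferred from the examples above.

```python
import re

VAGUE_OR_GENERIC = {"helper", "utils", "tools", "documents", "data", "files"}
RESERVED_PREFIXES = ("anthropic-", "claude-")
# Assumption: names are lowercase words joined by single hyphens.
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def check_skill_name(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name looks fine."""
    problems = []
    if not NAME_RE.match(name):
        problems.append("use lowercase letters, digits, and single hyphens")
    if name in VAGUE_OR_GENERIC:
        problems.append("too vague or generic")
    if name.startswith(RESERVED_PREFIXES):
        problems.append("uses a reserved prefix")
    return problems
```

A check like this cannot tell whether a name is descriptive, only whether it breaks one of the listed rules.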
## Writing Effective Descriptions
**Always write in third person.** The description is injected into the system prompt.
**Be specific and include key terms:**
```yaml
# PDF Processing skill
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
# Excel Analysis skill
description: Analyze Excel spreadsheets, create pivot tables, generate charts. Use when analyzing Excel files, spreadsheets, tabular data, or .xlsx files.
# Git Commit Helper skill
description: Generate descriptive commit messages by analyzing git diffs. Use when the user asks for help writing commit messages or reviewing staged changes.
```
**Avoid vague descriptions:**
```yaml
description: Helps with documents # Too vague!
description: Processes data # Too generic!
description: Does stuff with files # Useless!
```
## Progressive Disclosure Patterns
### Pattern 1: High-level guide with references
```markdown
---
name: pdf-processing
description: Extracts text and tables from PDF files, fills forms, merges documents.
---
# PDF Processing
## Quick start
```python
import pdfplumber
with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```
## Advanced features
**Form filling**: See [FORMS.md](FORMS.md)
**API reference**: See [REFERENCE.md](REFERENCE.md)
**Examples**: See [EXAMPLES.md](EXAMPLES.md)
```
### Pattern 2: Domain-specific organization
```
bigquery-skill/
├── SKILL.md (overview and navigation)
└── reference/
    ├── finance.md (revenue, billing)
    ├── sales.md (opportunities, pipeline)
    ├── product.md (API usage, features)
    └── marketing.md (campaigns, attribution)
```
### Pattern 3: Conditional details
```markdown
# DOCX Processing
## Creating documents
Use docx-js for new documents. See [DOCX-JS.md](DOCX-JS.md).
## Editing documents
For simple edits, modify the XML directly.
**For tracked changes**: See [REDLINING.md](REDLINING.md)
**For OOXML details**: See [OOXML.md](OOXML.md)
```
## Keep References One Level Deep
Claude may partially read files when they're referenced from other referenced files.
**Bad (too deep):**
```markdown
# SKILL.md
See [advanced.md](advanced.md)...
# advanced.md
See [details.md](details.md)...
# details.md
Here's the actual information...
```
**Good (one level deep):**
```markdown
# SKILL.md
**Basic usage**: [in SKILL.md]
**Advanced features**: See [advanced.md](advanced.md)
**API reference**: See [reference.md](reference.md)
**Examples**: See [examples.md](examples.md)
```
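A small script can flag nested references before they cause trouble. The following is a minimal sketch: it assumes references are written as markdown links to `.md` files, and it takes file contents as a dictionary so the example stays self-contained. The function name is hypothetical.

```python
import re

# Matches markdown links whose target is a .md file, e.g. [text](advanced.md).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)\)")


def nested_references(files: dict[str, str], root: str = "SKILL.md"):
    """Return (referencing_file, target) pairs that are more than one level deep."""
    first_level = set(LINK_RE.findall(files.get(root, "")))
    problems = []
    for ref in sorted(first_level):
        for target in LINK_RE.findall(files.get(ref, "")):
            # A target is nested if it is reachable only through another reference file.
            if target != root and target not in first_level:
                problems.append((ref, target))
    return problems
```

For the "too deep" layout above, this would report that `advanced.md` links to `details.md`, which `SKILL.md` never references directly.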
## Workflows and Feedback Loops
### Workflow with Checklist
```markdown
## Research synthesis workflow
Copy this checklist:
```
- [ ] Step 1: Read all source documents
- [ ] Step 2: Identify key themes
- [ ] Step 3: Cross-reference claims
- [ ] Step 4: Create structured summary
- [ ] Step 5: Verify citations
```
**Step 1: Read all source documents**
Review each document in `sources/`. Note main arguments.
...
```
### Feedback Loop Pattern
```markdown
## Document editing process
1. Make your edits to `word/document.xml`
2. **Validate immediately**: `python scripts/validate.py unpacked_dir/`
3. If validation fails:
   - Review the error message
   - Fix the issues
   - Run validation again
4. **Only proceed when validation passes**
5. Rebuild: `python scripts/pack.py unpacked_dir/ output.docx`
```
## Common Patterns
### Template Pattern
```markdown
## Report structure
Use this template:
```markdown
# [Analysis Title]
## Executive summary
[One-paragraph overview]
## Key findings
- Finding 1 with supporting data
- Finding 2 with supporting data
## Recommendations
1. Specific actionable recommendation
2. Specific actionable recommendation
```
```
### Examples Pattern
```markdown
## Commit message format
**Example 1:**
Input: Added user authentication with JWT tokens
Output:
```
feat(auth): implement JWT-based authentication
Add login endpoint and token validation middleware
```
**Example 2:**
Input: Fixed bug where dates displayed incorrectly
Output:
```
fix(reports): correct date formatting in timezone conversion
```
```
### Conditional Workflow Pattern
```markdown
## Document modification
1. Determine the modification type:
   **Creating new content?** → Follow "Creation workflow"
   **Editing existing?** → Follow "Editing workflow"
2. Creation workflow:
   - Use docx-js library
   - Build document from scratch
3. Editing workflow:
   - Unpack existing document
   - Modify XML directly
   - Validate after each change
```
## Content Guidelines
### Avoid Time-Sensitive Information
**Bad:**
```markdown
If you're doing this before August 2025, use the old API.
```
**Good:**
```markdown
## Current method
Use the v2 API endpoint: `api.example.com/v2/messages`
## Old patterns
<details>
<summary>Legacy v1 API (deprecated 2025-08)</summary>
The v1 API used: `api.example.com/v1/messages`
</details>
```
### Use Consistent Terminology
**Good - Consistent:**
- Always "API endpoint"
- Always "field"
- Always "extract"
**Bad - Inconsistent:**
- Mix "API endpoint", "URL", "API route", "path"
- Mix "field", "box", "element", "control"
## Anti-Patterns to Avoid
### Windows-Style Paths
- **Good**: `scripts/helper.py`, `reference/guide.md`
- **Avoid**: `scripts\helper.py`, `reference\guide.md`
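Backslash paths are easy to miss in review, so a simple search can help. This is a rough heuristic sketch: the pattern and function name are hypothetical, and it may also flag escape sequences such as a literal `\n` written next to a word.

```python
import re

# One or more path segments joined by backslashes, e.g. scripts\helper.py.
WINDOWS_PATH_RE = re.compile(r"\b[\w.-]+(?:\\[\w.-]+)+")


def find_windows_paths(text: str) -> list[str]:
    """Return every backslash-separated path-like string found in the text."""
    return WINDOWS_PATH_RE.findall(text)
```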
### Too Many Options
**Bad:**
```markdown
You can use pypdf, or pdfplumber, or PyMuPDF, or pdf2image, or...
```
**Good:**
```markdown
Use pdfplumber for text extraction:
```python
import pdfplumber
```
For scanned PDFs requiring OCR, use pdf2image with pytesseract instead.
```
## Checklist for Effective Skills
### Core Quality
- [ ] Description is specific and includes key terms
- [ ] Description includes both what and when
- [ ] SKILL.md body under 500 lines
- [ ] Additional details in separate files
- [ ] No time-sensitive information
- [ ] Consistent terminology
- [ ] Examples are concrete
- [ ] References one level deep
- [ ] Progressive disclosure used appropriately
- [ ] Workflows have clear steps
### Code and Scripts
- [ ] Scripts handle errors explicitly
- [ ] No "voodoo constants" (all values justified)
- [ ] Required packages listed
- [ ] Scripts have clear documentation
- [ ] No Windows-style paths
- [ ] Validation steps for critical operations
- [ ] Feedback loops for quality-critical tasks
### Testing
- [ ] At least three test scenarios
- [ ] Tested with Haiku, Sonnet, and Opus
- [ ] Tested with real usage scenarios
- [ ] Team feedback incorporated


@@ -1,595 +0,0 @@
<overview>
This reference documents common patterns for skill authoring, including templates, examples, terminology consistency, and anti-patterns. All patterns use pure XML structure.
</overview>
<template_pattern>
<description>
Provide templates for output format. Match the level of strictness to your needs.
</description>
<strict_requirements>
Use when output format must be exact and consistent:
```xml
<report_structure>
ALWAYS use this exact template structure:
```markdown
# [Analysis Title]
## Executive summary
[One-paragraph overview of key findings]
## Key findings
- Finding 1 with supporting data
- Finding 2 with supporting data
- Finding 3 with supporting data
## Recommendations
1. Specific actionable recommendation
2. Specific actionable recommendation
```
</report_structure>
```
**When to use**: Compliance reports, standardized formats, automated processing
</strict_requirements>
<flexible_guidance>
Use when Claude should adapt the format based on context:
```xml
<report_structure>
Here is a sensible default format, but use your best judgment:
```markdown
# [Analysis Title]
## Executive summary
[Overview]
## Key findings
[Adapt sections based on what you discover]
## Recommendations
[Tailor to the specific context]
```
Adjust sections as needed for the specific analysis type.
</report_structure>
```
**When to use**: Exploratory analysis, context-dependent formatting, creative tasks
</flexible_guidance>
</template_pattern>
<examples_pattern>
<description>
For skills where output quality depends on seeing examples, provide input/output pairs.
</description>
<commit_messages_example>
```xml
<objective>
Generate commit messages following conventional commit format.
</objective>
<commit_message_format>
Generate commit messages following these examples:
<example number="1">
<input>Added user authentication with JWT tokens</input>
<output>
```
feat(auth): implement JWT-based authentication
Add login endpoint and token validation middleware
```
</output>
</example>
<example number="2">
<input>Fixed bug where dates displayed incorrectly in reports</input>
<output>
```
fix(reports): correct date formatting in timezone conversion
Use UTC timestamps consistently across report generation
```
</output>
</example>
Follow this style: type(scope): brief description, then detailed explanation.
</commit_message_format>
```
</commit_messages_example>
<when_to_use>
- Output format has nuances that text explanations can't capture
- Pattern recognition is easier than rule following
- Examples demonstrate edge cases
- Multi-shot learning improves quality
</when_to_use>
</examples_pattern>
<consistent_terminology>
<principle>
Choose one term and use it throughout the skill. Inconsistent terminology confuses Claude and reduces execution quality.
</principle>
<good_example>
Consistent usage:
- Always "API endpoint" (not mixing with "URL", "API route", "path")
- Always "field" (not mixing with "box", "element", "control")
- Always "extract" (not mixing with "pull", "get", "retrieve")
```xml
<objective>
Extract data from API endpoints using field mappings.
</objective>
<quick_start>
1. Identify the API endpoint
2. Map response fields to your schema
3. Extract field values
</quick_start>
```
</good_example>
<bad_example>
Inconsistent usage creates confusion:
```xml
<objective>
Pull data from API routes using element mappings.
</objective>
<quick_start>
1. Identify the URL
2. Map response boxes to your schema
3. Retrieve control values
</quick_start>
```
Claude must now interpret: Are "API routes" and "URLs" the same? Are "fields", "boxes", "elements", and "controls" the same?
</bad_example>
<implementation>
1. Choose terminology early in skill development
2. Document key terms in `<objective>` or `<context>`
3. Use find/replace to enforce consistency
4. Review reference files for consistent usage
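The find/replace step can be supported by a small lint that reports discouraged synonyms. The sketch below is illustrative: the `TERMS` mapping and the function name are hypothetical, and the plural handling is deliberately naive.

```python
import re

# Preferred term -> discouraged synonyms (hypothetical mapping for one skill).
TERMS = {
    "API endpoint": ["API route", "URL"],
    "field": ["box", "control"],
    "extract": ["pull", "retrieve"],
}


def find_inconsistent_terms(text: str, terms=TERMS):
    """Return (line_number, synonym, preferred_term) for each discouraged term."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for preferred, synonyms in terms.items():
            for synonym in synonyms:
                # Whole-word match, also allowing a simple "s" or "es" plural.
                pattern = rf"\b{re.escape(synonym)}(?:s|es)?\b"
                if re.search(pattern, line, flags=re.IGNORECASE):
                    findings.append((line_no, synonym, preferred))
    return findings
```

A report like this still needs human review, since a word such as "URL" may be the correct term in some sentences.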
</implementation>
</consistent_terminology>
<provide_default_with_escape_hatch>
<principle>
Provide a default approach with an escape hatch for special cases, not a list of alternatives. Too many options paralyze decision-making.
</principle>
<good_example>
Clear default with escape hatch:
```xml
<quick_start>
Use pdfplumber for text extraction:
```python
import pdfplumber
with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```
For scanned PDFs requiring OCR, use pdf2image with pytesseract instead.
</quick_start>
```
</good_example>
<bad_example>
Too many options create decision paralysis:
```xml
<quick_start>
You can use any of these libraries:
- **pypdf**: Good for basic extraction
- **pdfplumber**: Better for tables
- **PyMuPDF**: Faster but more complex
- **pdf2image**: For scanned documents
- **pdfminer**: Low-level control
- **tabula-py**: Table-focused
Choose based on your needs.
</quick_start>
```
Claude must now research and compare all options before starting. This wastes tokens and time.
</bad_example>
<implementation>
1. Recommend ONE default approach
2. Explain when to use the default (implied: most of the time)
3. Add ONE escape hatch for edge cases
4. Link to an advanced reference if multiple alternatives are truly needed
</implementation>
</provide_default_with_escape_hatch>
<anti_patterns>
<description>
Common mistakes to avoid when authoring skills.
</description>
<pitfall name="markdown_headings_in_body">
**BAD**: Using markdown headings in skill body:
```markdown
# PDF Processing
## Quick start
Extract text with pdfplumber...
## Advanced features
Form filling requires additional setup...
```
**GOOD**: Using pure XML structure:
```xml
<objective>
PDF processing with text extraction, form filling, and merging capabilities.
</objective>
<quick_start>
Extract text with pdfplumber...
</quick_start>
<advanced_features>
Form filling requires additional setup...
</advanced_features>
```
**Why it matters**: XML provides semantic meaning, reliable parsing, and token efficiency.
</pitfall>
<pitfall name="vague_descriptions">
**BAD**:
```yaml
description: Helps with documents
```
**GOOD**:
```yaml
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
```
**Why it matters**: Vague descriptions prevent Claude from discovering and using the skill appropriately.
</pitfall>
<pitfall name="inconsistent_pov">
**BAD**:
```yaml
description: I can help you process Excel files and generate reports
```
**GOOD**:
```yaml
description: Processes Excel files and generates reports. Use when analyzing spreadsheets or .xlsx files.
```
**Why it matters**: Skill descriptions must use third person. First or second person breaks the skill metadata pattern.
</pitfall>
<pitfall name="wrong_naming_convention">
**BAD**: Directory name doesn't match skill name or verb-noun convention:
- Directory: `facebook-ads`, Name: `facebook-ads-manager`
- Directory: `stripe-integration`, Name: `stripe`
- Directory: `helper-scripts`, Name: `helper`
**GOOD**: Consistent verb-noun convention:
- Directory: `manage-facebook-ads`, Name: `manage-facebook-ads`
- Directory: `setup-stripe-payments`, Name: `setup-stripe-payments`
- Directory: `process-pdfs`, Name: `process-pdfs`
**Why it matters**: Consistency in naming makes skills discoverable and predictable.
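The directory/name match can be verified automatically. This is a minimal sketch that assumes the skill name lives in a `name:` line inside YAML frontmatter delimited by `---`; it uses plain string handling rather than a YAML parser, and both function names are hypothetical.

```python
import re


def frontmatter_name(skill_md: str):
    """Return the value of the `name:` key in the frontmatter, or None."""
    match = re.match(r"^---\n(.*?)\n---", skill_md, flags=re.DOTALL)
    if not match:
        return None
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "name":
            return value.strip()
    return None


def name_matches_directory(directory: str, skill_md: str) -> bool:
    """True when the frontmatter name equals the skill's directory name."""
    return frontmatter_name(skill_md) == directory
```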
</pitfall>
<pitfall name="too_many_options">
**BAD**:
```xml
<quick_start>
You can use pypdf, or pdfplumber, or PyMuPDF, or pdf2image, or pdfminer, or tabula-py...
</quick_start>
```
**GOOD**:
```xml
<quick_start>
Use pdfplumber for text extraction:
```python
import pdfplumber
```
For scanned PDFs requiring OCR, use pdf2image with pytesseract instead.
</quick_start>
```
**Why it matters**: Too many options cause decision paralysis. Provide one default approach with an escape hatch for special cases.
</pitfall>
<pitfall name="deeply_nested_references">
**BAD**: References nested multiple levels:
```
SKILL.md → advanced.md → details.md → examples.md
```
**GOOD**: References one level deep from SKILL.md:
```
SKILL.md → advanced.md
SKILL.md → details.md
SKILL.md → examples.md
```
**Why it matters**: Claude may only partially read deeply nested files. Keep references one level deep from SKILL.md.
</pitfall>
<pitfall name="windows_paths">
**BAD**:
```xml
<reference_guides>
See scripts\validate.py for validation
</reference_guides>
```
**GOOD**:
```xml
<reference_guides>
See scripts/validate.py for validation
</reference_guides>
```
**Why it matters**: Always use forward slashes for cross-platform compatibility.
</pitfall>
<pitfall name="dynamic_context_and_file_reference_execution">
**Problem**: When showing examples of dynamic context syntax (exclamation mark + backticks) or file references (@ prefix), the skill loader executes these during skill loading.
**BAD** - These execute during skill load:
```xml
<examples>
Load current status with: !`git status`
Review dependencies in: @package.json
</examples>
```
**GOOD** - Add space to prevent execution:
```xml
<examples>
Load current status with: ! `git status` (remove space before backtick in actual usage)
Review dependencies in: @ package.json (remove space after @ in actual usage)
</examples>
```
**When this applies**:
- Skills that teach users about dynamic context (slash commands, prompts)
- Any documentation showing the exclamation mark prefix syntax or @ file references
- Skills with example commands or file paths that shouldn't execute during loading
**Why it matters**: Without the space, these execute during skill load, causing errors or unwanted file reads.
</pitfall>
<pitfall name="missing_required_tags">
**BAD**: Missing required tags:
```xml
<quick_start>
Use this tool for processing...
</quick_start>
```
**GOOD**: All required tags present:
```xml
<objective>
Process data files with validation and transformation.
</objective>
<quick_start>
Use this tool for processing...
</quick_start>
<success_criteria>
- Input file successfully processed
- Output file validates without errors
- Transformation applied correctly
</success_criteria>
```
**Why it matters**: Every skill must have `<objective>`, `<quick_start>`, and `<success_criteria>` (or `<when_successful>`).
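A check for the required tags is short enough to run on every skill. The sketch below is one hypothetical way to do it; it only looks for opening tags without attributes and does not verify nesting or closing tags.

```python
import re


def missing_required_tags(body: str) -> list[str]:
    """Return the required tags that do not appear in the skill body."""
    present = set(re.findall(r"<([a-z_]+)>", body))
    missing = [tag for tag in ("objective", "quick_start") if tag not in present]
    # Either <success_criteria> or <when_successful> satisfies the third requirement.
    if not ({"success_criteria", "when_successful"} & present):
        missing.append("success_criteria")
    return missing
```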
</pitfall>
<pitfall name="hybrid_xml_markdown">
**BAD**: Mixing XML tags with markdown headings:
```markdown
<objective>
PDF processing capabilities
</objective>
## Quick start
Extract text with pdfplumber...
## Advanced features
Form filling...
```
**GOOD**: Pure XML throughout:
```xml
<objective>
PDF processing capabilities
</objective>
<quick_start>
Extract text with pdfplumber...
</quick_start>
<advanced_features>
Form filling...
</advanced_features>
```
**Why it matters**: Mixing formats makes the structure inconsistent. Use either pure XML or pure markdown (prefer XML).
</pitfall>
<pitfall name="unclosed_xml_tags">
**BAD**: Forgetting to close XML tags:
```xml
<objective>
Process PDF files
<quick_start>
Use pdfplumber...
</quick_start>
```
**GOOD**: Properly closed tags:
```xml
<objective>
Process PDF files
</objective>
<quick_start>
Use pdfplumber...
</quick_start>
```
**Why it matters**: Unclosed tags break XML parsing and create ambiguous boundaries.
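Unclosed tags can be caught with a simple stack. This is a minimal sketch, not a real XML parser: it assumes lowercase tag names, ignores self-closing tags, and would also count tag-like text inside code examples. The function name is hypothetical.

```python
import re

# Opening or closing tag with a lowercase name and optional attributes.
TAG_RE = re.compile(r"<(/?)([a-z_][a-z0-9_]*)(?:\s[^<>]*)?>")


def find_tag_errors(body: str) -> list[str]:
    """Return messages for unclosed or unmatched tags, in document order."""
    stack, errors = [], []
    for match in TAG_RE.finditer(body):
        is_closing, name = match.group(1) == "/", match.group(2)
        if not is_closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        elif name in stack:
            # Everything opened after `name` was left unclosed.
            while stack[-1] != name:
                errors.append(f"<{stack.pop()}> is never closed")
            stack.pop()
        else:
            errors.append(f"</{name}> has no matching opening tag")
    errors.extend(f"<{name}> is never closed" for name in reversed(stack))
    return errors
```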
</pitfall>
</anti_patterns>
<progressive_disclosure_pattern>
<description>
Keep SKILL.md concise by linking to detailed reference files. Claude loads reference files only when needed.
</description>
<implementation>
```xml
<objective>
Manage Facebook Ads campaigns, ad sets, and ads via the Marketing API.
</objective>
<quick_start>
<basic_operations>
See [basic-operations.md](basic-operations.md) for campaign creation and management.
</basic_operations>
</quick_start>
<advanced_features>
**Custom audiences**: See [audiences.md](audiences.md)
**Conversion tracking**: See [conversions.md](conversions.md)
**Budget optimization**: See [budgets.md](budgets.md)
**API reference**: See [api-reference.md](api-reference.md)
</advanced_features>
```
**Benefits**:
- SKILL.md stays under 500 lines
- Claude only reads relevant reference files
- Token usage scales with task complexity
- Easier to maintain and update
</implementation>
</progressive_disclosure_pattern>
<validation_pattern>
<description>
For skills with validation steps, make validation scripts verbose and specific.
</description>
<implementation>
```xml
<validation>
After making changes, validate immediately:
```bash
python scripts/validate.py output_dir/
```
If validation fails, fix errors before continuing. Validation errors include:
- **Field not found**: "Field 'signature_date' not found. Available fields: customer_name, order_total, signature_date_signed"
- **Type mismatch**: "Field 'order_total' expects number, got string"
- **Missing required field**: "Required field 'customer_name' is missing"
Only proceed when validation passes with zero errors.
</validation>
```
**Why verbose errors help**:
- Claude can fix issues without guessing
- Specific error messages reduce iteration cycles
- Error messages list the available options, so Claude can pick a valid one
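
A validator that emits messages in this style could look like the sketch below. It assumes a hypothetical `schema` dict mapping each required field name to its expected Python type; the function name `validate_fields` and the schema shape are illustrative and not taken from any real script.

```python
def validate_fields(data: dict, schema: dict) -> list[str]:
    """Return verbose, specific error messages; an empty list means valid.

    `schema` is a hypothetical mapping of required field name -> expected type.
    """
    errors = []
    available = ", ".join(sorted(schema))
    # Report unknown fields together with the valid alternatives.
    for name in data:
        if name not in schema:
            errors.append(f"Field '{name}' not found. Available fields: {available}")
    # Report missing fields and type mismatches by name.
    for name, expected in schema.items():
        if name not in data:
            errors.append(f"Required field '{name}' is missing")
        elif not isinstance(data[name], expected):
            got = type(data[name]).__name__
            errors.append(f"Field '{name}' expects {expected.__name__}, got {got}")
    return errors
```

Each message names the field and, where possible, the valid alternatives, so a fix does not require a second round trip.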
</implementation>
</validation_pattern>
<checklist_pattern>
<description>
For complex multi-step workflows, provide a checklist Claude can copy and use to track progress.
</description>
<implementation>
```xml
<workflow>
Copy this checklist and check off items as you complete them:
```
Task Progress:
- [ ] Step 1: Analyze the form (run analyze_form.py)
- [ ] Step 2: Create field mapping (edit fields.json)
- [ ] Step 3: Validate mapping (run validate_fields.py)
- [ ] Step 4: Fill the form (run fill_form.py)
- [ ] Step 5: Verify output (run verify_output.py)
```
<step_1>
**Analyze the form**
Run: `python scripts/analyze_form.py input.pdf`
This extracts form fields and their locations, saving to `fields.json`.
</step_1>
<step_2>
**Create field mapping**
Edit `fields.json` to add values for each field.
</step_2>
<step_3>
**Validate mapping**
Run: `python scripts/validate_fields.py fields.json`
Fix any validation errors before continuing.
</step_3>
<step_4>
**Fill the form**
Run: `python scripts/fill_form.py input.pdf fields.json output.pdf`
</step_4>
<step_5>
**Verify output**
Run: `python scripts/verify_output.py output.pdf`
If verification fails, return to Step 2.
</step_5>
</workflow>
```
**Benefits**:
- Clear progress tracking
- Prevents skipping steps
- Easy to resume after interruption
</implementation>
</checklist_pattern>

@@ -1,437 +0,0 @@
<overview>
Core principles guide skill authoring decisions. These principles ensure skills are efficient, effective, and maintainable across different models and use cases.
</overview>
<xml_structure_principle>
<description>
Skills use pure XML structure for consistent parsing, efficient token usage, and improved Claude performance.
</description>
<why_xml>
<consistency>
XML enforces consistent structure across all skills. All skills use the same tag names for the same purposes:
- `<objective>` always defines what the skill does
- `<quick_start>` always provides immediate guidance
- `<success_criteria>` always defines completion
This consistency makes skills predictable and easier to maintain.
</consistency>
<parseability>
XML provides unambiguous boundaries and semantic meaning. Claude can reliably:
- Identify section boundaries (where content starts and ends)
- Understand content purpose (what role each section plays)
- Skip irrelevant sections (progressive disclosure)
- Parse programmatically (validation tools can check structure)
Markdown headings are just visual formatting. Claude must infer meaning from heading text, which is less reliable.
</parseability>
<token_efficiency>
XML tags are more efficient than markdown headings:
**Markdown headings**:
```markdown
## Quick start
## Workflow
## Advanced features
## Success criteria
```
Total: ~20 tokens, no semantic meaning to Claude
**XML tags**:
```xml
<quick_start>
<workflow>
<advanced_features>
<success_criteria>
```
Total: ~15 tokens, semantic meaning built-in
Savings compound across all skills in the ecosystem.
</token_efficiency>
<claude_performance>
Claude performs better with pure XML because:
- Unambiguous section boundaries reduce parsing errors
- Semantic tags convey intent directly (no inference needed)
- Nested tags create clear hierarchies
- Consistent structure across skills reduces cognitive load
- Progressive disclosure works more reliably
Pure XML structure is not just a style preference—it's a performance optimization.
</claude_performance>
</why_xml>
<critical_rule>
**Remove ALL markdown headings (#, ##, ###) from skill body content.** Replace with semantic XML tags. Keep markdown formatting WITHIN content (bold, italic, lists, code blocks, links).
</critical_rule>
<required_tags>
Every skill MUST have:
- `<objective>` - What the skill does and why it matters
- `<quick_start>` - Immediate, actionable guidance
- `<success_criteria>` or `<when_successful>` - How to know it worked
See [use-xml-tags.md](use-xml-tags.md) for conditional tags and intelligence rules.
</required_tags>
</xml_structure_principle>
<conciseness_principle>
<description>
The context window is shared. Your skill shares it with the system prompt, conversation history, other skills' metadata, and the actual request.
</description>
<guidance>
Only add context Claude doesn't already have. Challenge each piece of information:
- "Does Claude really need this explanation?"
- "Can I assume Claude knows this?"
- "Does this paragraph justify its token cost?"
Assume Claude is smart. Don't explain obvious concepts.
</guidance>
<concise_example>
**Concise** (~50 tokens):
```xml
<quick_start>
Extract PDF text with pdfplumber:
```python
import pdfplumber
with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```
</quick_start>
```
**Verbose** (~150 tokens):
```xml
<quick_start>
PDF files are a common file format used for documents. To extract text from them, we'll use a Python library called pdfplumber. First, you'll need to import the library, then open the PDF file using the open method, and finally extract the text from each page. Here's how to do it:
```python
import pdfplumber
with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```
This code opens the PDF and extracts text from the first page.
</quick_start>
```
The concise version assumes Claude knows what PDFs are, understands Python imports, and can read code. All those assumptions are correct.
</concise_example>
<when_to_elaborate>
Add explanation when:
- Concept is domain-specific (not general programming knowledge)
- Pattern is non-obvious or counterintuitive
- Context affects behavior in subtle ways
- Trade-offs require judgment
Don't add explanation for:
- Common programming concepts (loops, functions, imports)
- Standard library usage (reading files, making HTTP requests)
- Well-known tools (git, npm, pip)
- Obvious next steps
</when_to_elaborate>
</conciseness_principle>
<degrees_of_freedom_principle>
<description>
Match the level of specificity to the task's fragility and variability. Give Claude more freedom for creative tasks, less freedom for fragile operations.
</description>
<high_freedom>
<when>
- Multiple approaches are valid
- Decisions depend on context
- Heuristics guide the approach
- Creative solutions welcome
</when>
<example>
```xml
<objective>
Review code for quality, bugs, and maintainability.
</objective>
<workflow>
1. Analyze the code structure and organization
2. Check for potential bugs or edge cases
3. Suggest improvements for readability and maintainability
4. Verify adherence to project conventions
</workflow>
<success_criteria>
- All major issues identified
- Suggestions are actionable and specific
- Review balances praise and criticism
</success_criteria>
```
Claude has freedom to adapt the review based on what the code needs.
</example>
</high_freedom>
<medium_freedom>
<when>
- A preferred pattern exists
- Some variation is acceptable
- Configuration affects behavior
- Template can be adapted
</when>
<example>
```xml
<objective>
Generate reports with customizable format and sections.
</objective>
<report_template>
Use this template and customize as needed:
```python
def generate_report(data, format="markdown", include_charts=True):
    # Process data
    # Generate output in specified format
    # Optionally include visualizations
    ...  # Placeholder so the template parses; replace with real logic
```
</report_template>
<success_criteria>
- Report includes all required sections
- Format matches user preference
- Data accurately represented
</success_criteria>
```
Claude can customize the template based on requirements.
</example>
</medium_freedom>
<low_freedom>
<when>
- Operations are fragile and error-prone
- Consistency is critical
- A specific sequence must be followed
- Deviation causes failures
</when>
<example>
```xml
<objective>
Run database migration with exact sequence to prevent data loss.
</objective>
<workflow>
Run exactly this script:
```bash
python scripts/migrate.py --verify --backup
```
**Do not modify the command or add additional flags.**
</workflow>
<success_criteria>
- Migration completes without errors
- Backup created before migration
- Verification confirms data integrity
</success_criteria>
```
Claude must follow the exact command with no variation.
</example>
</low_freedom>
<matching_specificity>
The key is matching specificity to fragility:
- **Fragile operations** (database migrations, payment processing, security): Low freedom, exact instructions
- **Standard operations** (API calls, file processing, data transformation): Medium freedom, preferred pattern with flexibility
- **Creative operations** (code review, content generation, analysis): High freedom, heuristics and principles
Mismatched specificity causes problems:
- Too much freedom on fragile tasks → errors and failures
- Too little freedom on creative tasks → rigid, suboptimal outputs
</matching_specificity>
</degrees_of_freedom_principle>
<model_testing_principle>
<description>
Skills act as additions to models, so effectiveness depends on the underlying model. What works for Opus might need more detail for Haiku.
</description>
<testing_across_models>
Test your skill with all models you plan to use:
<haiku_testing>
**Claude Haiku** (fast, economical)
Questions to ask:
- Does the skill provide enough guidance?
- Are examples clear and complete?
- Are implicit assumptions made explicit?
- Does Haiku need more structure?
Haiku benefits from:
- More explicit instructions
- Complete examples (no partial code)
- Clear success criteria
- Step-by-step workflows
</haiku_testing>
<sonnet_testing>
**Claude Sonnet** (balanced)
Questions to ask:
- Is the skill clear and efficient?
- Does it avoid over-explanation?
- Are workflows well-structured?
- Does progressive disclosure work?
Sonnet benefits from:
- Balanced detail level
- XML structure for clarity
- Progressive disclosure
- Concise but complete guidance
</sonnet_testing>
<opus_testing>
**Claude Opus** (powerful reasoning)
Questions to ask:
- Does the skill avoid over-explaining?
- Can Opus infer obvious steps?
- Are constraints clear?
- Is context minimal but sufficient?
Opus benefits from:
- Concise instructions
- Principles over procedures
- High degrees of freedom
- Trust in reasoning capabilities
</opus_testing>
</testing_across_models>
<balancing_across_models>
Aim for instructions that work well across all target models:
**Good balance**:
```xml
<quick_start>
Use pdfplumber for text extraction:
```python
import pdfplumber
with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```
For scanned PDFs requiring OCR, use pdf2image with pytesseract instead.
</quick_start>
```
This works for all models:
- Haiku gets complete working example
- Sonnet gets clear default with escape hatch
- Opus gets enough context without over-explanation
**Too minimal for Haiku**:
```xml
<quick_start>
Use pdfplumber for text extraction.
</quick_start>
```
**Too verbose for Opus**:
```xml
<quick_start>
PDF files are documents that contain text. To extract that text, we use a library called pdfplumber. First, import the library at the top of your Python file. Then, open the PDF file using the pdfplumber.open() method. This returns a PDF object. Access the pages attribute to get a list of pages. Each page has an extract_text() method that returns the text content...
</quick_start>
```
</balancing_across_models>
<iterative_improvement>
1. Start with medium detail level
2. Test with target models
3. Observe where models struggle or succeed
4. Adjust based on actual performance
5. Re-test and iterate
Don't optimize for one model. Find the balance that works across your target models.
</iterative_improvement>
</model_testing_principle>
<progressive_disclosure_principle>
<description>
SKILL.md serves as an overview. Reference files contain details. Claude loads reference files only when needed.
</description>
<token_efficiency>
Progressive disclosure keeps token usage proportional to task complexity:
- Simple task: Load SKILL.md only (~500 tokens)
- Medium task: Load SKILL.md + one reference (~1000 tokens)
- Complex task: Load SKILL.md + multiple references (~2000 tokens)
Without progressive disclosure, every task loads all content regardless of need.
</token_efficiency>
<implementation>
- Keep SKILL.md under 500 lines
- Split detailed content into reference files
- Keep references one level deep from SKILL.md
- Link to references from relevant sections
- Use descriptive reference file names
See [skill-structure.md](skill-structure.md) for progressive disclosure patterns.
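
Two of these rules (the line budget and one-level-deep references) can be checked mechanically. The sketch below assumes the skill's files are passed in as an in-memory `{filename: content}` dict and that references are plain relative markdown links to `.md` files; the function name and the link regex are illustrative simplifications.

```python
import re

# Matches relative markdown links to .md files, e.g. [audiences.md](audiences.md).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)\)")


def check_progressive_disclosure(files: dict[str, str], max_lines: int = 500) -> list[str]:
    """Check a skill given as {filename: content}; returns a list of problems."""
    problems = []
    skill = files.get("SKILL.md")
    if skill is None:
        return ["SKILL.md is missing"]
    line_count = len(skill.splitlines())
    if line_count > max_lines:
        problems.append(f"SKILL.md has {line_count} lines (limit {max_lines})")
    top_level = set(LINK_RE.findall(skill))
    for ref in sorted(top_level):
        if ref not in files:
            problems.append(f"SKILL.md links to missing file {ref}")
            continue
        # A reference that links to a file SKILL.md does not mention is nested.
        nested_links = set(LINK_RE.findall(files[ref])) - top_level - {"SKILL.md"}
        for nested in sorted(nested_links):
            problems.append(f"{ref} links to {nested}, which is more than one level deep")
    return problems
```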
</implementation>
</progressive_disclosure_principle>
<validation_principle>
<description>
Validation scripts are force multipliers. They catch errors that Claude might miss and provide actionable feedback.
</description>
<characteristics>
Good validation scripts:
- Provide verbose, specific error messages
- Show available valid options when something is invalid
- Pinpoint exact location of problems
- Suggest actionable fixes
- Are deterministic and reliable
See [workflows-and-validation.md](workflows-and-validation.md) for validation patterns.
</characteristics>
</validation_principle>
<principle_summary>
<xml_structure>
Use pure XML structure for consistency, parseability, and Claude performance. Required tags: objective, quick_start, success_criteria.
</xml_structure>
<conciseness>
Only add context Claude doesn't have. Assume Claude is smart. Challenge every piece of content.
</conciseness>
<degrees_of_freedom>
Match specificity to fragility. High freedom for creative tasks, low freedom for fragile operations, medium for standard work.
</degrees_of_freedom>
<model_testing>
Test with all target models. Balance detail level to work across Haiku, Sonnet, and Opus.
</model_testing>
<progressive_disclosure>
Keep SKILL.md concise. Split details into reference files. Load reference files only when needed.
</progressive_disclosure>
<validation>
Make validation scripts verbose and specific. Catch errors early with actionable feedback.
</validation>
</principle_summary>

@@ -1,175 +0,0 @@
<when_to_use_scripts>
Even if Claude could write a script, pre-made scripts offer advantages:
- More reliable than generated code
- Save tokens (no need to include code in context)
- Save time (no code generation required)
- Ensure consistency across uses
<execution_vs_reference>
Make clear whether Claude should:
- **Execute the script** (most common): "Run `analyze_form.py` to extract fields"
- **Read it as reference** (for complex logic): "See `analyze_form.py` for the extraction algorithm"
For most utility scripts, execution is preferred.
</execution_vs_reference>
<how_scripts_work>
When Claude executes a script via bash:
1. Script code never enters context window
2. Only script output consumes tokens
3. Far more efficient than having Claude generate equivalent code
</how_scripts_work>
</when_to_use_scripts>
<file_organization>
<scripts_directory>
**Best practice**: Place all executable scripts in a `scripts/` subdirectory within the skill folder.
```
skill-name/
├── SKILL.md
├── scripts/
│   ├── main_utility.py
│   ├── helper_script.py
│   └── validator.py
└── references/
    └── api-docs.md
```
**Benefits**:
- Keeps skill root clean and organized
- Clear separation between documentation and executable code
- Consistent pattern across all skills
- Easy to reference: `python scripts/script_name.py`
**Reference pattern**: In SKILL.md, reference scripts using the `scripts/` path:
```bash
python ~/.claude/skills/skill-name/scripts/analyze.py input.har
```
</scripts_directory>
</file_organization>
<utility_scripts_pattern>
<example>
**Utility scripts**
**analyze_form.py**: Extract all form fields from PDF
```bash
python scripts/analyze_form.py input.pdf > fields.json
```
Output format:
```json
{
  "field_name": { "type": "text", "x": 100, "y": 200 },
  "signature": { "type": "sig", "x": 150, "y": 500 }
}
```
**validate_boxes.py**: Check for overlapping bounding boxes
```bash
python scripts/validate_boxes.py fields.json
# Returns: "OK" or lists conflicts
```
**fill_form.py**: Apply field values to PDF
```bash
python scripts/fill_form.py input.pdf fields.json output.pdf
```
</example>
</utility_scripts_pattern>
<solve_dont_punt>
Handle error conditions rather than punting to Claude.
<example type="good">
```python
def process_file(path):
    """Process a file, creating it if it doesn't exist."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        print(f"File {path} not found, creating default")
        with open(path, 'w') as f:
            f.write('')
        return ''
    except PermissionError:
        print(f"Cannot access {path}, using default")
        return ''
```
</example>
<example type="bad">
```python
def process_file(path):
    # Just fail and let Claude figure it out
    return open(path).read()
```
</example>
<configuration_values>
Document configuration parameters to avoid "voodoo constants":
<example type="good">
```python
# HTTP requests typically complete within 30 seconds
REQUEST_TIMEOUT = 30
# Three retries balances reliability vs speed
MAX_RETRIES = 3
```
</example>
<example type="bad">
```python
TIMEOUT = 47 # Why 47?
RETRIES = 5 # Why 5?
```
</example>
</configuration_values>
</solve_dont_punt>
<package_dependencies>
<runtime_constraints>
Skills run in a code execution environment with platform-specific limitations:
- **claude.ai**: Can install packages from npm and PyPI
- **Anthropic API**: No network access and no runtime package installation
</runtime_constraints>
<guidance>
List required packages in your SKILL.md and verify they're available.
<example type="good">
Install required package: `pip install pypdf`
Then use it:
```python
from pypdf import PdfReader
reader = PdfReader("file.pdf")
```
</example>
<example type="bad">
"Use the pdf library to process the file."
</example>
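
One way to handle the "verify they're available" step is a short pre-flight check. This sketch uses `importlib.util.find_spec` from the standard library; the function name `missing_packages` is illustrative. It expects top-level import names, which can differ from the PyPI package name.

```python
import importlib.util


def missing_packages(import_names: list[str]) -> list[str]:
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# Example: missing_packages(["pypdf"]) returns ["pypdf"] when pypdf is not installed.
```

Printing the missing names with the exact install command gives Claude something actionable instead of a bare `ImportError` later in the run.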
</guidance>
</package_dependencies>
<mcp_tool_references>
If your Skill uses MCP (Model Context Protocol) tools, always use fully qualified tool names.
<format>ServerName:tool_name</format>
<examples>
- Use the BigQuery:bigquery_schema tool to retrieve table schemas.
- Use the GitHub:create_issue tool to create issues.
</examples>
Without the server prefix, Claude may fail to locate the tool, especially when multiple MCP servers are available.
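
A skill linter could flag unqualified tool references with a check like the sketch below. The allowed character set is an assumption made for illustration; the real naming rules may be looser.

```python
import re

# Assumed shape: a server name, a colon, then a tool name, each made of
# letters, digits, underscores, or hyphens. This is an illustrative guess.
QUALIFIED_TOOL_RE = re.compile(r"[A-Za-z0-9_-]+:[A-Za-z0-9_-]+")


def is_fully_qualified(tool_reference: str) -> bool:
    """True if the reference carries a server prefix, e.g. 'GitHub:create_issue'."""
    return QUALIFIED_TOOL_RE.fullmatch(tool_reference) is not None
```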
</mcp_tool_references>

@@ -1,474 +0,0 @@
<overview>
Skills improve through iteration and testing. This reference covers evaluation-driven development, Claude A/B testing patterns, and XML structure validation during testing.
</overview>
<evaluation_driven_development>
<principle>
Create evaluations BEFORE writing extensive documentation. This ensures your skill solves real problems rather than documenting imagined ones.
</principle>
<workflow>
<step_1>
**Identify gaps**: Run Claude on representative tasks without a skill. Document specific failures or missing context.
</step_1>
<step_2>
**Create evaluations**: Build three scenarios that test these gaps.
</step_2>
<step_3>
**Establish baseline**: Measure Claude's performance without the skill.
</step_3>
<step_4>
**Write minimal instructions**: Create just enough content to address the gaps and pass evaluations.
</step_4>
<step_5>
**Iterate**: Execute evaluations, compare against baseline, and refine.
</step_5>
</workflow>
<evaluation_structure>
```json
{
  "skills": ["pdf-processing"],
  "query": "Extract all text from this PDF file and save it to output.txt",
  "files": ["test-files/document.pdf"],
  "expected_behavior": [
    "Successfully reads the PDF file using appropriate library",
    "Extracts text content from all pages without missing any",
    "Saves extracted text to output.txt in clear, readable format"
  ]
}
```
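
A minimal sketch of loading and scoring an evaluation in this shape follows. The key names come from the example above; `load_evaluation`, `score_evaluation`, and the idea of passing in a set of reviewer-confirmed behaviors are assumptions, since how each behavior is judged is left to you.

```python
import json

REQUIRED_KEYS = ("skills", "query", "files", "expected_behavior")


def load_evaluation(raw: str) -> dict:
    """Parse one evaluation and fail loudly if a key from the example is absent."""
    evaluation = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in evaluation]
    if missing:
        raise ValueError(f"Evaluation is missing keys: {', '.join(missing)}")
    return evaluation


def score_evaluation(evaluation: dict, observed: set[str]) -> float:
    """Fraction of expected behaviors that a reviewer marked as observed."""
    expected = evaluation["expected_behavior"]
    if not expected:
        return 0.0
    return sum(1 for behavior in expected if behavior in observed) / len(expected)
```

Running the same scoring with and without the skill gives the baseline comparison described in the workflow.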
</evaluation_structure>
<why_evaluations_first>
- Prevents documenting imagined problems
- Forces clarity about what success looks like
- Provides objective measurement of skill effectiveness
- Keeps skill focused on actual needs
- Enables quantitative improvement tracking
</why_evaluations_first>
</evaluation_driven_development>
<iterative_development_with_claude>
<principle>
The most effective skill development uses Claude itself. Work with "Claude A" (expert who helps refine) to create skills used by "Claude B" (agent executing tasks).
</principle>
<creating_skills>
<workflow>
<step_1>
**Complete task without skill**: Work through the problem with Claude A, noting what context you repeatedly provide.
</step_1>
<step_2>
**Ask Claude A to create skill**: "Create a skill that captures this pattern we just used"
</step_2>
<step_3>
**Review for conciseness**: Remove unnecessary explanations.
</step_3>
<step_4>
**Improve architecture**: Organize content with progressive disclosure.
</step_4>
<step_5>
**Test with Claude B**: Use a fresh instance to test on real tasks.
</step_5>
<step_6>
**Iterate based on observation**: Return to Claude A with specific issues observed.
</step_6>
</workflow>
<insight>
Claude models understand skill format natively. Simply ask Claude to create a skill and it will generate properly structured SKILL.md content.
</insight>
</creating_skills>
<improving_skills>
<workflow>
<step_1>
**Use skill in real workflows**: Give Claude B actual tasks.
</step_1>
<step_2>
**Observe behavior**: Where does it struggle, succeed, or make unexpected choices?
</step_2>
<step_3>
**Return to Claude A**: Share observations and current SKILL.md.
</step_3>
<step_4>
**Review suggestions**: Claude A might suggest reorganization, stronger language, or workflow restructuring.
</step_4>
<step_5>
**Apply and test**: Update skill and test again.
</step_5>
<step_6>
**Repeat**: Continue based on real usage, not assumptions.
</step_6>
</workflow>
<what_to_watch_for>
- **Unexpected exploration paths**: Structure might not be intuitive
- **Missed connections**: Links might need to be more explicit
- **Overreliance on sections**: Consider moving frequently-read content to main SKILL.md
- **Ignored content**: Poorly signaled or unnecessary files
- **Critical metadata**: The name and description in your skill's metadata are critical for discovery
</what_to_watch_for>
</improving_skills>
</iterative_development_with_claude>
<model_testing>
<principle>
Test with all models you plan to use. Different models have different strengths and need different levels of detail.
</principle>
<haiku_testing>
**Claude Haiku** (fast, economical)
Questions to ask:
- Does the skill provide enough guidance?
- Are examples clear and complete?
- Are implicit assumptions made explicit?
- Does Haiku need more structure?
Haiku benefits from:
- More explicit instructions
- Complete examples (no partial code)
- Clear success criteria
- Step-by-step workflows
</haiku_testing>
<sonnet_testing>
**Claude Sonnet** (balanced)
Questions to ask:
- Is the skill clear and efficient?
- Does it avoid over-explanation?
- Are workflows well-structured?
- Does progressive disclosure work?
Sonnet benefits from:
- Balanced detail level
- XML structure for clarity
- Progressive disclosure
- Concise but complete guidance
</sonnet_testing>
<opus_testing>
**Claude Opus** (powerful reasoning)
Questions to ask:
- Does the skill avoid over-explaining?
- Can Opus infer obvious steps?
- Are constraints clear?
- Is context minimal but sufficient?
Opus benefits from:
- Concise instructions
- Principles over procedures
- High degrees of freedom
- Trust in reasoning capabilities
</opus_testing>
<balancing_across_models>
What works for Opus might need more detail for Haiku. Aim for instructions that work well across all target models. Find the balance that serves your target audience.
See [core-principles.md](core-principles.md) for model testing examples.
</balancing_across_models>
</model_testing>
<xml_structure_validation>
<principle>
During testing, validate that your skill's XML structure is correct and complete.
</principle>
<validation_checklist>
After updating a skill, verify:
<required_tags_present>
- ✅ `<objective>` tag exists and defines what the skill does
- ✅ `<quick_start>` tag exists with immediate guidance
- ✅ `<success_criteria>` or `<when_successful>` tag exists
</required_tags_present>
<no_markdown_headings>
- ✅ No `#`, `##`, or `###` headings in skill body
- ✅ All sections use XML tags instead
- ✅ Markdown formatting within tags is preserved (bold, italic, lists, code blocks)
</no_markdown_headings>
<proper_xml_nesting>
- ✅ All XML tags properly closed
- ✅ Nested tags have correct hierarchy
- ✅ No unclosed tags
</proper_xml_nesting>
<conditional_tags_appropriate>
- ✅ Conditional tags match skill complexity
- ✅ Simple skills use required tags only
- ✅ Complex skills add appropriate conditional tags
- ✅ No over-engineering or under-specifying
</conditional_tags_appropriate>
<reference_files_check>
- ✅ Reference files also use pure XML structure
- ✅ Links to reference files are correct
- ✅ References are one level deep from SKILL.md
</reference_files_check>
</validation_checklist>
<testing_xml_during_iteration>
When iterating on a skill:
1. Make changes to XML structure
2. **Validate XML structure** (check tags, nesting, completeness)
3. Test with Claude on representative tasks
4. Observe if XML structure aids or hinders Claude's understanding
5. Iterate structure based on actual performance
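
Step 2 can be partly automated. The sketch below checks a skill body (frontmatter excluded) for required tags, markdown headings, and unbalanced tags. It is a simplification with assumed names: fenced blocks are skipped by toggling on each fence line, so fences nested inside other fences will confuse it, and self-closing tags are not handled.

```python
import re

REQUIRED_TAGS = ("objective", "quick_start")
TAG_RE = re.compile(r"<(/?)([a-z_][a-z0-9_]*)(?:\s[^<>]*)?>")


def visible_lines(body: str):
    """Yield (line_number, text) outside fenced code, with inline code removed."""
    in_fence = False
    for number, line in enumerate(body.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        elif not in_fence:
            yield number, re.sub(r"`[^`]*`", "", line)


def check_skill_body(body: str) -> list[str]:
    """Return structural problems found in a skill body."""
    problems, stack, seen = [], [], set()
    for number, line in visible_lines(body):
        if re.match(r"#{1,6}\s", line):
            problems.append(f"Line {number}: markdown heading; use an XML tag instead")
        for closing, name in TAG_RE.findall(line):
            if not closing:
                stack.append((name, number))
                seen.add(name)
            elif stack and stack[-1][0] == name:
                stack.pop()
            else:
                problems.append(f"Line {number}: </{name}> does not close the innermost open tag")
    problems += [f"Line {number}: <{name}> is never closed" for name, number in stack]
    problems += [f"Required tag <{tag}> is missing" for tag in REQUIRED_TAGS if tag not in seen]
    if not {"success_criteria", "when_successful"} & seen:
        problems.append("Required tag <success_criteria> or <when_successful> is missing")
    return problems
```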
</testing_xml_during_iteration>
</xml_structure_validation>
<observation_based_iteration>
<principle>
Iterate based on what you observe, not what you assume. Real usage reveals issues assumptions miss.
</principle>
<observation_categories>
<what_claude_reads>
Which sections does Claude actually read? Which are ignored? This reveals:
- Relevance of content
- Effectiveness of progressive disclosure
- Whether section names are clear
</what_claude_reads>
<where_claude_struggles>
Which tasks cause confusion or errors? This reveals:
- Missing context
- Unclear instructions
- Insufficient examples
- Ambiguous requirements
</where_claude_struggles>
<where_claude_succeeds>
Which tasks go smoothly? This reveals:
- Effective patterns
- Good examples
- Clear instructions
- Appropriate detail level
</where_claude_succeeds>
<unexpected_behaviors>
What does Claude do that surprises you? This reveals:
- Unstated assumptions
- Ambiguous phrasing
- Missing constraints
- Alternative interpretations
</unexpected_behaviors>
</observation_categories>
<iteration_pattern>
1. **Observe**: Run Claude on real tasks with current skill
2. **Document**: Note specific issues, not general feelings
3. **Hypothesize**: Why did this issue occur?
4. **Fix**: Make targeted changes to address specific issues
5. **Test**: Verify fix works on same scenario
6. **Validate**: Ensure fix doesn't break other scenarios
7. **Repeat**: Continue with next observed issue
</iteration_pattern>
</observation_based_iteration>
<progressive_refinement>
<principle>
Skills don't need to be perfect initially. Start minimal, observe usage, add what's missing.
</principle>
<initial_version>
Start with:
- Valid YAML frontmatter
- Required XML tags: objective, quick_start, success_criteria
- Minimal working example
- Basic success criteria
Skip initially:
- Extensive examples
- Edge case documentation
- Advanced features
- Detailed reference files
</initial_version>
<iteration_additions>
Add through iteration:
- Examples when patterns aren't clear from description
- Edge cases when observed in real usage
- Advanced features when users need them
- Reference files when SKILL.md approaches 500 lines
- Validation scripts when errors are common
</iteration_additions>
<benefits>
- Faster to initial working version
- Additions solve real needs, not imagined ones
- Keeps skills focused and concise
- Progressive disclosure emerges naturally
- Documentation stays aligned with actual usage
</benefits>
</progressive_refinement>
<testing_discovery>
<principle>
Test that Claude can discover and use your skill when appropriate.
</principle>
<discovery_testing>
<test_description>
Test if Claude loads your skill when it should:
1. Start fresh conversation (Claude B)
2. Ask question that should trigger skill
3. Check if skill was loaded
4. Verify skill was used appropriately
</test_description>
<description_quality>
If skill isn't discovered:
- Check description includes trigger keywords
- Verify description is specific, not vague
- Ensure description explains when to use skill
- Test with different phrasings of the same request
The description is Claude's primary discovery mechanism.
</description_quality>
</discovery_testing>
</testing_discovery>
<common_iteration_patterns>
<pattern name="too_verbose">
**Observation**: Skill works but uses lots of tokens
**Fix**:
- Remove obvious explanations
- Assume Claude knows common concepts
- Use examples instead of lengthy descriptions
- Move advanced content to reference files
</pattern>
<pattern name="too_minimal">
**Observation**: Claude makes incorrect assumptions or misses steps
**Fix**:
- Add explicit instructions where assumptions fail
- Provide complete working examples
- Define edge cases
- Add validation steps
</pattern>
<pattern name="poor_discovery">
**Observation**: Skill exists but Claude doesn't load it when needed
**Fix**:
- Improve description with specific triggers
- Add relevant keywords
- Test description against actual user queries
- Make description more specific about use cases
</pattern>
<pattern name="unclear_structure">
**Observation**: Claude reads wrong sections or misses relevant content
**Fix**:
- Use clearer XML tag names
- Reorganize content hierarchy
- Move frequently-needed content earlier
- Add explicit links to relevant sections
</pattern>
<pattern name="incomplete_examples">
**Observation**: Claude produces outputs that don't match expected pattern
**Fix**:
- Add more examples showing pattern
- Make examples more complete
- Show edge cases in examples
- Add anti-pattern examples (what not to do)
</pattern>
</common_iteration_patterns>
<iteration_velocity>
<principle>
Small, frequent iterations beat large, infrequent rewrites.
</principle>
<fast_iteration>
**Good approach**:
1. Make one targeted change
2. Test on specific scenario
3. Verify improvement
4. Commit change
5. Move to next issue
Total time: Minutes per iteration
Iterations per day: 10-20
Learning rate: High
</fast_iteration>
<slow_iteration>
**Problematic approach**:
1. Accumulate many issues
2. Make large refactor
3. Test everything at once
4. Debug multiple issues simultaneously
5. Hard to know what fixed what
Total time: Hours per iteration
Iterations per day: 1-2
Learning rate: Low
</slow_iteration>
<benefits_of_fast_iteration>
- Isolate cause and effect
- Build pattern recognition faster
- Less wasted work from wrong directions
- Easier to revert if needed
- Maintains momentum
</benefits_of_fast_iteration>
</iteration_velocity>
<success_metrics>
<principle>
Define how you'll measure if the skill is working. Quantify success.
</principle>
<objective_metrics>
- **Success rate**: Percentage of tasks completed correctly
- **Token usage**: Average tokens consumed per task
- **Iteration count**: How many tries to get correct output
- **Error rate**: Percentage of tasks with errors
- **Discovery rate**: How often skill loads when it should
</objective_metrics>
<subjective_metrics>
- **Output quality**: Does output meet requirements?
- **Appropriate detail**: Too verbose or too minimal?
- **Claude confidence**: Does Claude seem uncertain?
- **User satisfaction**: Does skill solve the actual problem?
</subjective_metrics>
<tracking_improvement>
Compare metrics before and after changes:
- Baseline: Measure without skill
- Initial: Measure with first version
- Iteration N: Measure after each change
Track which changes improve which metrics. Double down on effective patterns.
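
The before-and-after comparison can be kept honest with a small amount of bookkeeping. In this sketch the `RunRecord` fields, `summarize`, and `compare` are hypothetical names; how runs are logged and judged is an assumption left to your setup.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One task attempt; all fields are illustrative."""
    succeeded: bool
    tokens_used: int
    attempts: int


def summarize(runs: list[RunRecord]) -> dict[str, float]:
    """Compute success rate, average tokens, and average attempts for a set of runs."""
    if not runs:
        raise ValueError("No runs to summarize")
    total = len(runs)
    return {
        "success_rate": sum(r.succeeded for r in runs) / total,
        "avg_tokens": sum(r.tokens_used for r in runs) / total,
        "avg_attempts": sum(r.attempts for r in runs) / total,
    }


def compare(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-metric change (after minus before) between two summaries."""
    return {name: after[name] - before[name] for name in before}
```

Summarize the baseline, the initial version, and each iteration on the same task set, then compare adjacent summaries to see which change moved which metric.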
</tracking_improvement>
</success_metrics>

@@ -1,134 +0,0 @@
# Official Skill Specification (2026)
Source: [code.claude.com/docs/en/skills](https://code.claude.com/docs/en/skills)
## Commands and Skills Are Merged
Custom slash commands have been merged into skills. A file at `.claude/commands/review.md` and a skill at `.claude/skills/review/SKILL.md` both create `/review` and work the same way. Existing `.claude/commands/` files keep working. Skills add optional features: a directory for supporting files, frontmatter to control invocation, and automatic context loading.
If a skill and a command share the same name, the skill takes precedence.
## SKILL.md File Structure
Every skill requires a `SKILL.md` file with YAML frontmatter followed by standard markdown instructions.
```markdown
---
name: your-skill-name
description: What it does and when to use it
---
# Your Skill Name
## Instructions
Clear, step-by-step guidance.
## Examples
Concrete examples of using this skill.
```
## Complete Frontmatter Reference
All fields are optional. Only `description` is recommended.
| Field | Required | Description |
|-------|----------|-------------|
| `name` | No | Display name. Lowercase letters, numbers, hyphens only (max 64 chars). Defaults to directory name if omitted. |
| `description` | Recommended | What it does AND when to use it (max 1024 chars). Claude uses this to decide when to apply the skill. |
| `argument-hint` | No | Hint shown during autocomplete. Example: `[issue-number]` or `[filename] [format]` |
| `disable-model-invocation` | No | Set `true` to prevent Claude from auto-loading. Use for manual workflows. Default: `false` |
| `user-invocable` | No | Set `false` to hide from `/` menu. Use for background knowledge. Default: `true` |
| `allowed-tools` | No | Tools Claude can use without permission prompts. Example: `Read, Bash(git *)` |
| `model` | No | Model to use: `haiku`, `sonnet`, or `opus` |
| `context` | No | Set `fork` to run in isolated subagent context |
| `agent` | No | Subagent type when `context: fork`. Options: `Explore`, `Plan`, `general-purpose`, or custom agent name |
| `hooks` | No | Hooks scoped to this skill's lifecycle |
## Invocation Control
| Frontmatter | User can invoke | Claude can invoke | When loaded into context |
|-------------|----------------|-------------------|--------------------------|
| (default) | Yes | Yes | Description always in context, full skill loads when invoked |
| `disable-model-invocation: true` | Yes | No | Description not in context, full skill loads when you invoke |
| `user-invocable: false` | No | Yes | Description always in context, full skill loads when invoked |
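The table can be read as a small decision function. The sketch below is only an illustration of the three rows above, not Claude Code's implementation; the function name and return shape are assumptions, and the table does not say what happens when both fields are set:

```python
def invocation_rules(frontmatter):
    """Mirror the table above: who can invoke a skill, given its frontmatter."""
    user_can_invoke = frontmatter.get("user-invocable", True)
    claude_can_invoke = not frontmatter.get("disable-model-invocation", False)
    return {
        "user": user_can_invoke,
        "claude": claude_can_invoke,
        # Per the table, the description stays in context only when Claude may invoke.
        "description_in_context": claude_can_invoke,
    }


print(invocation_rules({"disable-model-invocation": True}))
```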
## Skill Locations & Priority
```
Enterprise (highest priority) → Personal → Project → Plugin (lowest priority)
```
| Type | Path | Applies to |
|------|------|-----------|
| Enterprise | See managed settings | All users in organization |
| Personal | `~/.claude/skills/<name>/SKILL.md` | You, across all projects |
| Project | `.claude/skills/<name>/SKILL.md` | Anyone working in repository |
| Plugin | `<plugin>/skills/<name>/SKILL.md` | Where plugin is enabled |
Plugin skills use a `plugin-name:skill-name` namespace, so they cannot conflict with other levels.
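A minimal sketch of how the priority order could resolve a name defined at several levels. The `resolve` function and the `available` mapping are hypothetical; only the ordering and the plugin namespace come from the table above:

```python
PRIORITY = ["enterprise", "personal", "project"]  # highest first, per the table above


def resolve(name, available):
    """Pick the winning level for a skill name.

    `available` maps a level to the set of skill names defined there.
    Plugin skills are looked up by their full plugin-name:skill-name,
    so they never shadow the other levels.
    """
    for level in PRIORITY:
        if name in available.get(level, set()):
            return level
    if name in available.get("plugin", set()):
        return "plugin"
    return None


print(resolve("review", {"personal": {"review"}, "project": {"review"}}))
```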
## How Skills Work
1. **Discovery**: Claude loads only name and description at startup (2% of context window budget)
2. **Activation**: When your request matches a skill's description, Claude loads the full content
3. **Execution**: Claude follows the skill's instructions
## String Substitutions
| Variable | Description |
|----------|-------------|
| `$ARGUMENTS` | All arguments passed when invoking |
| `$ARGUMENTS[N]` | Specific argument by 0-based index |
| `$N` | Shorthand for `$ARGUMENTS[N]` |
| `${CLAUDE_SESSION_ID}` | Current session ID |
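To make the expansion order concrete, here is a rough sketch of the substitutions. It is not Claude Code's actual implementation: joining arguments with spaces and expanding an out-of-range index to an empty string are both assumptions.

```python
import re


def substitute(template, args, session_id="unknown"):
    """Expand the variables from the table above in a skill body."""

    def indexed(match):
        index = int(match.group(1))
        # Assumption: an out-of-range index expands to an empty string.
        return args[index] if index < len(args) else ""

    text = template.replace("${CLAUDE_SESSION_ID}", session_id)
    # Handle $ARGUMENTS[N] before the plain $ARGUMENTS form.
    text = re.sub(r"\$ARGUMENTS\[(\d+)\]", indexed, text)
    text = text.replace("$ARGUMENTS", " ".join(args))
    return re.sub(r"\$(\d+)", indexed, text)


print(substitute("Fix issue $0 in $ARGUMENTS[1] ($ARGUMENTS)", ["42", "main.py"]))
```

The indexed form is handled first so that the plain `$ARGUMENTS` replacement cannot consume its prefix.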
## Dynamic Context Injection
The `` !`command` `` syntax runs shell commands before content is sent to Claude:
```markdown
## Context
- Current branch: !`git branch --show-current`
- PR diff: !`gh pr diff`
```
Commands execute immediately and their output replaces the placeholder. Claude only sees the final result.
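The preprocessing step can be pictured as a single text substitution. This sketch is an illustration only; `render` and `run_shell` are hypothetical names, and the real preprocessing may differ:

```python
import re
import subprocess


def run_shell(command):
    """Run a command and return its trimmed standard output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.strip()


def render(text, run=run_shell):
    """Replace every !`command` placeholder with that command's output."""
    return re.sub(r"!`([^`]+)`", lambda match: run(match.group(1)), text)


# A fake runner keeps the example independent of the local git state.
print(render("- Current branch: !`git branch --show-current`", run=lambda cmd: "main"))
```

Passing the runner as a parameter makes the substitution testable without executing real commands.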
## Progressive Disclosure
```
my-skill/
├── SKILL.md # Entry point (required)
├── reference.md # Detailed docs (loaded when needed)
├── examples.md # Usage examples (loaded when needed)
└── scripts/
└── helper.py # Utility script (executed, not loaded)
```
Keep SKILL.md under 500 lines. Link to supporting files:
```markdown
For API details, see [reference.md](reference.md).
```
## Running in a Subagent
Add `context: fork` to run in isolation:
```yaml
---
name: deep-research
description: Research a topic thoroughly
context: fork
agent: Explore
---
Research $ARGUMENTS thoroughly...
```
The skill content becomes the subagent's prompt. It won't have access to conversation history.
## Distribution
- **Project skills**: Commit `.claude/skills/` to version control
- **Plugins**: Add `skills/` directory to plugin
- **Enterprise**: Deploy organization-wide through managed settings

@@ -1,168 +0,0 @@
# Recommended Skill Structure
The optimal structure for complex skills separates routing, workflows, and knowledge.
<structure>
```
skill-name/
├── SKILL.md # Router + essential principles (unavoidable)
├── workflows/ # Step-by-step procedures (how)
│ ├── workflow-a.md
│ ├── workflow-b.md
│ └── ...
└── references/ # Domain knowledge (what)
├── reference-a.md
├── reference-b.md
└── ...
```
</structure>
<why_this_works>
## Problems This Solves
**Problem 1: Context gets skipped**
When important principles are in a separate file, Claude may not read them.
**Solution:** Put essential principles directly in SKILL.md. They load automatically.
**Problem 2: Wrong context loaded**
A "build" task loads debugging references. A "debug" task loads build references.
**Solution:** Intake question determines intent → routes to specific workflow → workflow specifies which references to read.
**Problem 3: Monolithic skills are overwhelming**
500+ lines of mixed content make it hard to find the relevant parts.
**Solution:** Small router (SKILL.md) + focused workflows + reference library.
**Problem 4: Procedures mixed with knowledge**
"How to do X" mixed with "What X means" creates confusion.
**Solution:** Workflows are procedures (steps). References are knowledge (patterns, examples).
</why_this_works>
<skill_md_template>
## SKILL.md Template
```markdown
---
name: skill-name
description: What it does and when to use it.
---
<essential_principles>
## How This Skill Works
[Inline principles that apply to ALL workflows. Cannot be skipped.]
### Principle 1: [Name]
[Brief explanation]
### Principle 2: [Name]
[Brief explanation]
</essential_principles>
<intake>
**Ask the user:**
What would you like to do?
1. [Option A]
2. [Option B]
3. [Option C]
4. Something else
**Wait for response before proceeding.**
</intake>
<routing>
| Response | Workflow |
|----------|----------|
| 1, "keyword", "keyword" | `workflows/option-a.md` |
| 2, "keyword", "keyword" | `workflows/option-b.md` |
| 3, "keyword", "keyword" | `workflows/option-c.md` |
| 4, other | Clarify, then select |
**After reading the workflow, follow it exactly.**
</routing>
<reference_index>
All domain knowledge in `references/`:
**Category A:** file-a.md, file-b.md
**Category B:** file-c.md, file-d.md
</reference_index>
<workflows_index>
| Workflow | Purpose |
|----------|---------|
| option-a.md | [What it does] |
| option-b.md | [What it does] |
| option-c.md | [What it does] |
</workflows_index>
```
</skill_md_template>
<workflow_template>
## Workflow Template
```markdown
# Workflow: [Name]
<required_reading>
**Read these reference files NOW:**
1. references/relevant-file.md
2. references/another-file.md
</required_reading>
<process>
## Step 1: [Name]
[What to do]
## Step 2: [Name]
[What to do]
## Step 3: [Name]
[What to do]
</process>
<success_criteria>
This workflow is complete when:
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] Criterion 3
</success_criteria>
```
</workflow_template>
<when_to_use_this_pattern>
## When to Use This Pattern
**Use router + workflows + references when:**
- Multiple distinct workflows (build vs debug vs ship)
- Different workflows need different references
- Essential principles must not be skipped
- Skill has grown beyond 200 lines
**Use simple single-file skill when:**
- One workflow
- Small reference set
- Under 200 lines total
- No essential principles to enforce
</when_to_use_this_pattern>
<key_insight>
## The Key Insight
**SKILL.md is always loaded. Use this guarantee.**
Put unavoidable content in SKILL.md:
- Essential principles
- Intake question
- Routing logic
Put workflow-specific content in workflows/:
- Step-by-step procedures
- Required references for that workflow
- Success criteria for that workflow
Put reusable knowledge in references/:
- Patterns and examples
- Technical details
- Domain expertise
</key_insight>

@@ -1,152 +0,0 @@
# Skill Structure Reference
Skills have three structural components: YAML frontmatter (metadata), standard markdown body (content), and progressive disclosure (file organization).
## Body Format
Use **standard markdown headings** for structure. Keep markdown formatting within content (bold, italic, lists, code blocks, links).
```markdown
---
name: my-skill
description: What it does and when to use it
---
# Skill Name
## Quick Start
Immediate actionable guidance...
## Instructions
Step-by-step procedures...
## Examples
Concrete usage examples...
## Guidelines
Rules and constraints...
```
## Recommended Sections
Every skill should have:
- **Quick Start** - Immediate, actionable guidance (minimal working example)
- **Instructions** - Core step-by-step guidance
- **Success Criteria** - How to know it worked
Add based on complexity:
- **Context** - Background/situational information
- **Workflow** - Multi-step procedures
- **Examples** - Concrete input/output pairs
- **Advanced Features** - Deep-dive topics (link to reference files)
- **Anti-Patterns** - Common mistakes to avoid
- **Guidelines** - Rules and constraints
## YAML Frontmatter
### Required/Recommended Fields
```yaml
---
name: skill-name-here
description: What it does and when to use it (specific triggers included)
---
```
### Name Field
**Validation rules:**
- Maximum 64 characters
- Lowercase letters, numbers, hyphens only
- Must match directory name
- No reserved words: "anthropic", "claude"
**Examples:**
- `triage-prs`
- `deploy-production`
- `review-code`
- `setup-stripe-payments`
**Avoid:** `helper`, `utils`, `tools`, generic names
### Description Field
**Validation rules:**
- Maximum 1024 characters
- Include what it does AND when to use it
- Third person voice
**Good:**
```yaml
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
```
**Bad:**
```yaml
description: Helps with documents
```
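The name and description rules above are mechanical enough to check in code. A minimal sketch, assuming "reserved word" means the word appears anywhere in the name; `validate_frontmatter` is a hypothetical helper, not part of any official tooling:

```python
import re

RESERVED = ("anthropic", "claude")


def validate_frontmatter(name, description, directory):
    """Return a list of problems; an empty list means the fields pass."""
    problems = []
    if not re.fullmatch(r"[a-z0-9-]{1,64}", name):
        problems.append("name: use 1-64 lowercase letters, numbers, or hyphens")
    if name != directory:
        problems.append(f"name: '{name}' does not match directory '{directory}'")
    for word in RESERVED:
        # Assumption: a reserved word is rejected wherever it appears in the name.
        if word in name:
            problems.append(f"name: contains reserved word '{word}'")
    if not description.strip():
        problems.append("description: must not be empty")
    elif len(description) > 1024:
        problems.append("description: exceeds 1024 characters")
    return problems


print(validate_frontmatter("triage-prs", "Triage open PRs. Use when reviewing a backlog.", "triage-prs"))
```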
### Optional Fields
| Field | Description |
|-------|-------------|
| `argument-hint` | Usage hints. Example: `[issue-number]` |
| `disable-model-invocation` | `true` to prevent auto-loading. Use for side-effect workflows. |
| `user-invocable` | `false` to hide from `/` menu. Use for background knowledge. |
| `allowed-tools` | Tools without permission prompts. Example: `Read, Bash(git *)` |
| `model` | `haiku`, `sonnet`, or `opus` |
| `context` | `fork` for isolated subagent execution |
| `agent` | Subagent type: `Explore`, `Plan`, `general-purpose`, or custom |
## Naming Conventions
Use descriptive names that indicate purpose:
| Pattern | Examples |
|---------|----------|
| Action-oriented | `triage-prs`, `deploy-production`, `review-code` |
| Domain-specific | `setup-stripe-payments`, `manage-facebook-ads` |
| Descriptive | `git-worktree`, `frontend-design`, `dhh-rails-style` |
## Progressive Disclosure
Keep SKILL.md under 500 lines. Split into reference files:
```
my-skill/
├── SKILL.md # Entry point (required, overview + navigation)
├── reference.md # Detailed docs (loaded when needed)
├── examples.md # Usage examples (loaded when needed)
└── scripts/
└── helper.py # Utility script (executed, not loaded)
```
**Rules:**
- Keep references one level deep from SKILL.md
- Add table of contents to reference files over 100 lines
- Use forward slashes in paths: `scripts/helper.py`
- Name files descriptively: `form_validation_rules.md` not `doc2.md`
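Several of these rules can be checked automatically. The sketch below covers the 500-line limit, forward slashes in links, and the table of contents rule; `check_skill_files` is a hypothetical helper, and treating a `## Contents` heading as the table of contents is an assumption:

```python
import re


def check_skill_files(files):
    """Check a skill against the rules above.

    `files` maps a relative path (for example "SKILL.md") to its text.
    """
    problems = []
    if len(files.get("SKILL.md", "").splitlines()) >= 500:
        problems.append("SKILL.md: keep under 500 lines")
    for path, text in files.items():
        # Markdown link targets must use forward slashes.
        for target in re.findall(r"\]\(([^)]+)\)", text):
            if "\\" in target:
                problems.append(f"{path}: use forward slashes in '{target}'")
        is_reference = path.endswith(".md") and path != "SKILL.md"
        # Assumption: a "## Contents" heading counts as a table of contents.
        if is_reference and len(text.splitlines()) > 100 and "## Contents" not in text:
            problems.append(f"{path}: add a table of contents")
    return problems


print(check_skill_files({"SKILL.md": "See [reference](reference.md).\n"}))
```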
## Validation Checklist
Before finalizing:
- [ ] YAML frontmatter valid (name matches directory, description specific)
- [ ] Uses standard markdown headings (not XML tags)
- [ ] Has Quick Start, Instructions, and Success Criteria sections
- [ ] `disable-model-invocation: true` if skill has side effects
- [ ] SKILL.md under 500 lines
- [ ] Reference files linked properly from SKILL.md
- [ ] File paths use forward slashes
- [ ] Tested with real usage
## Anti-Patterns
- **XML tags in body** - Use standard markdown headings
- **Vague descriptions** - Be specific with trigger keywords
- **Deep nesting** - Keep references one level from SKILL.md
- **Missing invocation control** - Side-effect workflows need `disable-model-invocation: true`
- **Inconsistent naming** - Directory name must match `name` field
- **Windows paths** - Always use forward slashes

@@ -1,113 +0,0 @@
# Using Scripts in Skills
<purpose>
Scripts are executable code that Claude runs as-is rather than regenerating each time. They make repeated operations more reliable and consistent.
</purpose>
<when_to_use>
Use scripts when:
- The same code runs across multiple skill invocations
- Operations are error-prone when rewritten from scratch
- Complex shell commands or API interactions are involved
- Consistency matters more than flexibility
Common script types:
- **Deployment** - Deploy to Vercel, publish packages, push releases
- **Setup** - Initialize projects, install dependencies, configure environments
- **API calls** - Authenticated requests, webhook handlers, data fetches
- **Data processing** - Transform files, batch operations, migrations
- **Build processes** - Compile, bundle, test runners
</when_to_use>
<script_structure>
Scripts live in `scripts/` within the skill directory:
```
skill-name/
├── SKILL.md
├── workflows/
├── references/
├── templates/
└── scripts/
├── deploy.sh
├── setup.py
└── fetch-data.ts
```
A well-structured script includes:
1. Clear purpose comment at top
2. Input validation
3. Error handling
4. Idempotent operations where possible
5. Clear output/feedback
</script_structure>
<script_example>
```bash
#!/bin/bash
# deploy.sh - Deploy project to Vercel
# Usage: ./deploy.sh [environment]
# Environments: preview (default), production
set -euo pipefail
ENVIRONMENT="${1:-preview}"
# Validate environment
if [[ "$ENVIRONMENT" != "preview" && "$ENVIRONMENT" != "production" ]]; then
echo "Error: Environment must be 'preview' or 'production'"
exit 1
fi
echo "Deploying to $ENVIRONMENT..."
if [[ "$ENVIRONMENT" == "production" ]]; then
vercel --prod
else
vercel
fi
echo "Deployment complete."
```
</script_example>
<workflow_integration>
Workflows reference scripts like this:
```xml
<process>
## Step 5: Deploy
1. Ensure all tests pass
2. Run `scripts/deploy.sh production`
3. Verify deployment succeeded
4. Update user with deployment URL
</process>
```
The workflow tells Claude WHEN to run the script. The script handles HOW the operation executes.
</workflow_integration>
<best_practices>
**Do:**
- Make scripts idempotent (safe to run multiple times)
- Include clear usage comments
- Validate inputs before executing
- Provide meaningful error messages
- Use `set -euo pipefail` in bash scripts
**Don't:**
- Hardcode secrets or credentials (use environment variables)
- Create scripts for one-off operations
- Skip error handling
- Make scripts do too many unrelated things
- Forget to make scripts executable (`chmod +x`)
</best_practices>
<security_considerations>
- Never embed API keys, tokens, or secrets in scripts
- Use environment variables for sensitive configuration
- Validate and sanitize any user-provided inputs
- Be cautious with scripts that delete or modify data
- Consider adding `--dry-run` options for destructive operations
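A minimal sketch of the `--dry-run` idea, assuming a hypothetical cleanup script that deletes `*.tmp` files. The script's purpose and function names are illustrative only:

```python
import argparse
from pathlib import Path


def clean(directory, dry_run=False):
    """Delete *.tmp files in `directory`; with dry_run, only report them."""
    targets = sorted(Path(directory).glob("*.tmp"))
    for path in targets:
        if dry_run:
            print(f"[dry-run] would delete {path}")
        else:
            path.unlink()
            print(f"deleted {path}")
    return targets


def main(argv=None):
    """Parse arguments and run the cleanup."""
    parser = argparse.ArgumentParser(description="Remove temporary files.")
    parser.add_argument("directory")
    parser.add_argument("--dry-run", action="store_true", help="report without deleting")
    args = parser.parse_args(argv)
    return clean(args.directory, dry_run=args.dry_run)
```

Running the script with `--dry-run` first lets Claude show the user exactly what would be removed before anything changes.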
</security_considerations>

@@ -1,112 +0,0 @@
# Using Templates in Skills
<purpose>
Templates are reusable output structures that Claude copies and fills in. They ensure consistent, high-quality outputs without regenerating structure each time.
</purpose>
<when_to_use>
Use templates when:
- Output should have consistent structure across invocations
- The structure matters more than creative generation
- Filling placeholders is more reliable than blank-page generation
- Users expect predictable, professional-looking outputs
Common template types:
- **Plans** - Project plans, implementation plans, migration plans
- **Specifications** - Technical specs, feature specs, API specs
- **Documents** - Reports, proposals, summaries
- **Configurations** - Config files, settings, environment setups
- **Scaffolds** - File structures, boilerplate code
</when_to_use>
<template_structure>
Templates live in `templates/` within the skill directory:
```
skill-name/
├── SKILL.md
├── workflows/
├── references/
└── templates/
├── plan-template.md
├── spec-template.md
└── report-template.md
```
A template file contains:
1. Clear section markers
2. Placeholder indicators (use `{{placeholder}}` or `[PLACEHOLDER]`)
3. Inline guidance for what goes where
4. Example content where helpful
</template_structure>
<template_example>
```markdown
# {{PROJECT_NAME}} Implementation Plan
## Overview
{{1-2 sentence summary of what this plan covers}}
## Goals
- {{Primary goal}}
- {{Secondary goals...}}
## Scope
**In scope:**
- {{What's included}}
**Out of scope:**
- {{What's explicitly excluded}}
## Phases
### Phase 1: {{Phase name}}
**Duration:** {{Estimated duration}}
**Deliverables:**
- {{Deliverable 1}}
- {{Deliverable 2}}
### Phase 2: {{Phase name}}
...
## Success Criteria
- [ ] {{Measurable criterion 1}}
- [ ] {{Measurable criterion 2}}
## Risks
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| {{Risk}} | {{H/M/L}} | {{H/M/L}} | {{Strategy}} |
```
</template_example>
<workflow_integration>
Workflows reference templates like this:
```xml
<process>
## Step 3: Generate Plan
1. Read `templates/plan-template.md`
2. Copy the template structure
3. Fill each placeholder based on gathered requirements
4. Review for completeness
</process>
```
The workflow tells Claude WHEN to use the template. The template provides WHAT structure to produce.
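In practice Claude fills the template by reading it, but the contract can be sketched in code: every `{{placeholder}}` is either filled or flagged. The `fill` helper below is hypothetical and assumes the `{{placeholder}}` syntax:

```python
import re

PLACEHOLDER = re.compile(r"\{\{(.+?)\}\}")


def fill(template, values):
    """Fill {{placeholder}} markers and report any that were left unfilled."""
    missing = []

    def replace(match):
        key = match.group(1).strip()
        if key in values:
            return str(values[key])
        missing.append(key)
        return match.group(0)  # leave unknown placeholders visible

    return PLACEHOLDER.sub(replace, template), missing


text, missing = fill("# {{PROJECT_NAME}} Implementation Plan\n- {{Primary goal}}", {"PROJECT_NAME": "Billing"})
print(text)
print("Unfilled:", missing)
```

Leaving unfilled placeholders in the output makes gaps obvious during the "review for completeness" step.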
</workflow_integration>
<best_practices>
**Do:**
- Keep templates focused on structure, not content
- Use clear placeholder syntax consistently
- Include brief inline guidance where sections might be ambiguous
- Make templates complete but minimal
**Don't:**
- Put excessive example content that might be copied verbatim
- Create templates for outputs that genuinely need creative generation
- Over-constrain with too many required sections
- Forget to update templates when requirements change
</best_practices>

@@ -1,510 +0,0 @@
<overview>
This reference covers patterns for complex workflows, validation loops, and feedback cycles in skill authoring. All patterns use pure XML structure.
</overview>
<complex_workflows>
<principle>
Break complex operations into clear, sequential steps. For particularly complex workflows, provide a checklist.
</principle>
<pdf_forms_example>
```xml
<objective>
Fill PDF forms with validated data from JSON field mappings.
</objective>
<workflow>
Copy this checklist and check off items as you complete them:
```
Task Progress:
- [ ] Step 1: Analyze the form (run analyze_form.py)
- [ ] Step 2: Create field mapping (edit fields.json)
- [ ] Step 3: Validate mapping (run validate_fields.py)
- [ ] Step 4: Fill the form (run fill_form.py)
- [ ] Step 5: Verify output (run verify_output.py)
```
<step_1>
**Analyze the form**
Run: `python scripts/analyze_form.py input.pdf`
This extracts form fields and their locations, saving to `fields.json`.
</step_1>
<step_2>
**Create field mapping**
Edit `fields.json` to add values for each field.
</step_2>
<step_3>
**Validate mapping**
Run: `python scripts/validate_fields.py fields.json`
Fix any validation errors before continuing.
</step_3>
<step_4>
**Fill the form**
Run: `python scripts/fill_form.py input.pdf fields.json output.pdf`
</step_4>
<step_5>
**Verify output**
Run: `python scripts/verify_output.py output.pdf`
If verification fails, return to Step 2.
</step_5>
</workflow>
```
</pdf_forms_example>
<when_to_use>
Use checklist pattern when:
- Workflow has 5+ sequential steps
- Steps must be completed in order
- Progress tracking helps prevent errors
- Easy resumption after interruption is valuable
</when_to_use>
</complex_workflows>
<feedback_loops>
<validate_fix_repeat_pattern>
<principle>
Run validator → fix errors → repeat. This pattern greatly improves output quality.
</principle>
<document_editing_example>
```xml
<objective>
Edit OOXML documents with XML validation at each step.
</objective>
<editing_process>
<step_1>
Make your edits to `word/document.xml`
</step_1>
<step_2>
**Validate immediately**: `python ooxml/scripts/validate.py unpacked_dir/`
</step_2>
<step_3>
If validation fails:
- Review the error message carefully
- Fix the issues in the XML
- Run validation again
</step_3>
<step_4>
**Only proceed when validation passes**
</step_4>
<step_5>
Rebuild: `python ooxml/scripts/pack.py unpacked_dir/ output.docx`
</step_5>
<step_6>
Test the output document
</step_6>
</editing_process>
<validation>
Never skip validation. Catching errors early prevents corrupted output files.
</validation>
```
</document_editing_example>
<why_it_works>
- Catches errors early before changes are applied
- Machine-verifiable: the validator gives an objective pass/fail result
- Plan can be iterated without touching originals
- Reduces total iteration cycles
</why_it_works>
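The loop itself can be sketched in a few lines. The `validate` and `fix` callables are hypothetical stand-ins for a validation script and Claude's corrections, and the round limit is an assumption added to keep the loop bounded:

```python
def validate_fix_repeat(document, validate, fix, max_rounds=5):
    """Loop until `validate` returns no errors or the round limit is hit.

    `validate(document)` returns a list of error strings.
    `fix(document, errors)` returns a corrected document.
    """
    errors = validate(document)
    rounds = 0
    while errors:
        if rounds == max_rounds:
            raise RuntimeError(f"still failing after {max_rounds} rounds: {errors}")
        document = fix(document, errors)
        rounds += 1
        errors = validate(document)  # always re-validate after a fix
    return document


# Toy validator and fixer for illustration only.
validate = lambda doc: [] if doc.endswith("</p>") else ["missing closing </p>"]
fix = lambda doc, errors: doc + "</p>"
print(validate_fix_repeat("<p>hello", validate, fix))
```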
</validate_fix_repeat_pattern>
<plan_validate_execute_pattern>
<principle>
When Claude performs complex, open-ended tasks, create a plan in a structured format, validate it, then execute.
Workflow: analyze → **create plan file** → **validate plan** → execute → verify
</principle>
<batch_update_example>
```xml
<objective>
Apply batch updates to spreadsheet with plan validation.
</objective>
<workflow>
<plan_phase>
<step_1>
Analyze the spreadsheet and requirements
</step_1>
<step_2>
Create `changes.json` with all planned updates
</step_2>
</plan_phase>
<validation_phase>
<step_3>
Validate the plan: `python scripts/validate_changes.py changes.json`
</step_3>
<step_4>
If validation fails:
- Review error messages
- Fix issues in changes.json
- Validate again
</step_4>
<step_5>
Only proceed when validation passes
</step_5>
</validation_phase>
<execution_phase>
<step_6>
Apply changes: `python scripts/apply_changes.py changes.json`
</step_6>
<step_7>
Verify output
</step_7>
</execution_phase>
</workflow>
<success_criteria>
- Plan validation passes with zero errors
- All changes applied successfully
- Output verification confirms expected results
</success_criteria>
```
</batch_update_example>
<implementation_tip>
Make validation scripts verbose with specific error messages:
**Good error message**:
"Field 'signature_date' not found. Available fields: customer_name, order_total, signature_date_signed"
**Bad error message**:
"Invalid field"
Specific errors help Claude fix issues without guessing.
</implementation_tip>
<when_to_use>
Use plan-validate-execute when:
- Operations are complex and error-prone
- Changes are irreversible or difficult to undo
- Planning can be validated independently
- Catching errors early saves significant time
</when_to_use>
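A minimal sketch of the pattern, assuming a hypothetical plan format in which each change has a `cell` and a `value`. The source does not define the `changes.json` schema, so these keys are illustrative:

```python
def validate_plan(changes, sheet):
    """Check every planned change against the sheet without modifying it."""
    errors = []
    for i, change in enumerate(changes):
        cell = change.get("cell")
        if cell not in sheet:
            available = ", ".join(sorted(sheet))
            errors.append(f"change {i}: cell '{cell}' not found. Available cells: {available}")
        if "value" not in change:
            errors.append(f"change {i}: missing 'value'")
    return errors


def execute(changes, sheet):
    """Apply the plan only after validation passes with zero errors."""
    errors = validate_plan(changes, sheet)
    if errors:
        raise ValueError("; ".join(errors))
    updated = dict(sheet)  # work on a copy so the original stays untouched
    for change in changes:
        updated[change["cell"]] = change["value"]
    return updated


print(execute([{"cell": "A1", "value": 10}], {"A1": 1, "B2": 2}))
```

Because validation reads the plan and never writes to the sheet, a rejected plan costs nothing to retry.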
</plan_validate_execute_pattern>
</feedback_loops>
<conditional_workflows>
<principle>
Guide Claude through decision points with clear branching logic.
</principle>
<document_modification_example>
```xml
<objective>
Modify DOCX files using appropriate method based on task type.
</objective>
<workflow>
<decision_point_1>
Determine the modification type:
**Creating new content?** → Follow "Creation workflow"
**Editing existing content?** → Follow "Editing workflow"
</decision_point_1>
<creation_workflow>
<objective>Build documents from scratch</objective>
<steps>
1. Use docx-js library
2. Build document from scratch
3. Export to .docx format
</steps>
</creation_workflow>
<editing_workflow>
<objective>Modify existing documents</objective>
<steps>
1. Unpack existing document
2. Modify XML directly
3. Validate after each change
4. Repack when complete
</steps>
</editing_workflow>
</workflow>
<success_criteria>
- Correct workflow chosen based on task type
- All steps in chosen workflow completed
- Output file validated and verified
</success_criteria>
```
</document_modification_example>
<when_to_use>
Use conditional workflows when:
- Different task types require different approaches
- Decision points are clear and well-defined
- Workflows are mutually exclusive
- Guiding Claude to the correct path improves outcomes
</when_to_use>
</conditional_workflows>
<validation_scripts>
<principles>
Validation scripts are force multipliers. They catch errors that Claude might miss and provide actionable feedback for fixing issues.
</principles>
<characteristics_of_good_validation>
<verbose_errors>
**Good**: "Field 'signature_date' not found. Available fields: customer_name, order_total, signature_date_signed"
**Bad**: "Invalid field"
Verbose errors help Claude fix issues in one iteration instead of multiple rounds of guessing.
</verbose_errors>
<specific_feedback>
**Good**: "Line 47: Expected closing tag `</paragraph>` but found `</section>`"
**Bad**: "XML syntax error"
Specific feedback pinpoints exact location and nature of the problem.
</specific_feedback>
<actionable_suggestions>
**Good**: "Required field 'customer_name' is missing. Add: {\"customer_name\": \"value\"}"
**Bad**: "Missing required field"
Actionable suggestions show Claude exactly what to fix.
</actionable_suggestions>
<available_options>
When validation fails, show available valid options:
**Good**: "Invalid status 'pending_review'. Valid statuses: active, paused, archived"
**Bad**: "Invalid status"
Showing valid options eliminates guesswork.
</available_options>
</characteristics_of_good_validation>
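A sketch of a validator that produces messages in this style. The `check_fields` helper and its schema format (field name mapped to a Python type) are assumptions, as is treating every schema field as required:

```python
import json


def check_fields(data, schema):
    """Validate `data` against `schema` (field name -> expected Python type)."""
    errors = []
    for field in data:
        if field not in schema:
            # Verbose error: name the bad field and list the valid options.
            errors.append(f"Field '{field}' not found. Available fields: {', '.join(schema)}")
    for field, expected in schema.items():
        if field not in data:
            # Actionable suggestion: show exactly what to add.
            suggestion = json.dumps({field: "value"})
            errors.append(f"Required field '{field}' is missing. Add: {suggestion}")
        elif not isinstance(data[field], expected):
            actual = type(data[field]).__name__
            errors.append(f"Field '{field}' expects {expected.__name__}, got {actual}")
    return errors


schema = {"customer_name": str, "order_total": float}
for message in check_fields({"signature_date": "2024-01-01", "order_total": "12"}, schema):
    print(message)
```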
<implementation_pattern>
```xml
<validation>
After making changes, validate immediately:
```bash
python scripts/validate.py output_dir/
```
If validation fails, fix errors before continuing. Validation errors include:
- **Field not found**: "Field 'signature_date' not found. Available fields: customer_name, order_total, signature_date_signed"
- **Type mismatch**: "Field 'order_total' expects number, got string"
- **Missing required field**: "Required field 'customer_name' is missing"
- **Invalid value**: "Invalid status 'pending_review'. Valid statuses: active, paused, archived"
Only proceed when validation passes with zero errors.
</validation>
```
</implementation_pattern>
<benefits>
- Catches errors before they propagate
- Reduces iteration cycles
- Provides learning feedback
- Makes debugging deterministic
- Enables confident execution
</benefits>
</validation_scripts>
<iterative_refinement>
<principle>
Many workflows benefit from iteration: generate → validate → refine → validate → finalize.
</principle>
<implementation_example>
```xml
<objective>
Generate reports with iterative quality improvement.
</objective>
<workflow>
<iteration_1>
**Generate initial draft**
Create report based on data and requirements.
</iteration_1>
<iteration_2>
**Validate draft**
Run: `python scripts/validate_report.py draft.md`
Fix any structural issues, missing sections, or data errors.
</iteration_2>
<iteration_3>
**Refine content**
Improve clarity, add supporting data, enhance visualizations.
</iteration_3>
<iteration_4>
**Final validation**
Run: `python scripts/validate_report.py final.md`
Ensure all quality criteria met.
</iteration_4>
<iteration_5>
**Finalize**
Export to final format and deliver.
</iteration_5>
</workflow>
<success_criteria>
- Final validation passes with zero errors
- All quality criteria met
- Report ready for delivery
</success_criteria>
```
</implementation_example>
<when_to_use>
Use iterative refinement when:
- Quality improves with multiple passes
- Validation provides actionable feedback
- Time permits iteration
- Perfect output matters more than speed
</when_to_use>
</iterative_refinement>
<checkpoint_pattern>
<principle>
For long workflows, add checkpoints where Claude can pause and verify progress before continuing.
</principle>
<implementation_example>
```xml
<workflow>
<phase_1>
**Data collection** (Steps 1-3)
1. Extract data from source
2. Transform to target format
3. **CHECKPOINT**: Verify data completeness
Only continue if checkpoint passes.
</phase_1>
<phase_2>
**Data processing** (Steps 4-6)
4. Apply business rules
5. Validate transformations
6. **CHECKPOINT**: Verify processing accuracy
Only continue if checkpoint passes.
</phase_2>
<phase_3>
**Output generation** (Steps 7-9)
7. Generate output files
8. Validate output format
9. **CHECKPOINT**: Verify final output
Proceed to delivery only if checkpoint passes.
</phase_3>
</workflow>
<checkpoint_validation>
At each checkpoint:
1. Run validation script
2. Review output for correctness
3. Verify no errors or warnings
4. Only proceed when validation passes
</checkpoint_validation>
```
</implementation_example>
<benefits>
- Prevents cascading errors
- Easier to diagnose issues
- Clear progress indicators
- Natural pause points for review
- Reduces wasted work from early errors
</benefits>
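The control flow behind checkpoints can be sketched as a loop that refuses to continue past a failed check. The `run_with_checkpoints` function and the phase tuples are hypothetical:

```python
def run_with_checkpoints(phases, state):
    """Run each phase, then its checkpoint; stop at the first failure.

    `phases` is a list of (name, run, checkpoint) tuples where `run(state)`
    returns the new state and `checkpoint(state)` returns True on success.
    """
    for name, run, checkpoint in phases:
        state = run(state)
        if not checkpoint(state):
            raise RuntimeError(f"checkpoint failed after phase '{name}'")
        print(f"checkpoint passed: {name}")
    return state


# Toy phases for illustration only.
phases = [
    ("collect", lambda s: s + [1, 2, 3], lambda s: len(s) == 3),
    ("process", lambda s: [x * 2 for x in s], lambda s: all(x % 2 == 0 for x in s)),
]
print(run_with_checkpoints(phases, []))
```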
</checkpoint_pattern>
<error_recovery>
<principle>
Design workflows with clear error recovery paths. Claude should know what to do when things go wrong.
</principle>
<implementation_example>
```xml
<workflow>
<normal_path>
1. Process input file
2. Validate output
3. Save results
</normal_path>
<error_recovery>
**If validation fails in step 2:**
- Review validation errors
- Check if input file is corrupted → Return to step 1 with different input
- Check if processing logic failed → Fix logic, return to step 1
- Check if output format wrong → Fix format, return to step 2
**If save fails in step 3:**
- Check disk space
- Check file permissions
- Check file path validity
- Retry save with corrected conditions
</error_recovery>
<escalation>
**If error persists after 3 attempts:**
- Document the error with full context
- Save partial results if available
- Report issue to user with diagnostic information
</escalation>
</workflow>
```
</implementation_example>
<when_to_use>
Include error recovery when:
- Workflows interact with external systems
- File operations could fail
- Network calls could time out
- User input could be invalid
- Errors are recoverable
</when_to_use>
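A minimal sketch of retry with escalation, following the three-attempt limit from the example above. The `step` and `recover` callables are hypothetical, and catching every exception is a simplification for illustration:

```python
def run_with_recovery(step, recover, max_attempts=3):
    """Retry `step`, calling `recover(error)` between attempts.

    After `max_attempts` failures, escalate with the collected context
    instead of retrying forever.
    """
    history = []
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as error:  # broad on purpose for this sketch
            history.append(f"attempt {attempt}: {error}")
            if attempt < max_attempts:
                recover(error)
    raise RuntimeError("escalating to user; " + "; ".join(history))
```

Keeping the history of each failure gives the user the diagnostic context the escalation step calls for.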
</error_recovery>

@@ -1,73 +0,0 @@
---
name: {{SKILL_NAME}}
description: {{What it does}} Use when {{trigger conditions}}.
---
<essential_principles>
## {{Core Concept}}
{{Principles that ALWAYS apply, regardless of which workflow runs}}
### 1. {{First principle}}
{{Explanation}}
### 2. {{Second principle}}
{{Explanation}}
### 3. {{Third principle}}
{{Explanation}}
</essential_principles>
<intake>
**Ask the user:**
What would you like to do?
1. {{First option}}
2. {{Second option}}
3. {{Third option}}
**Wait for response before proceeding.**
</intake>
<routing>
| Response | Workflow |
|----------|----------|
| 1, "{{keywords}}" | `workflows/{{first-workflow}}.md` |
| 2, "{{keywords}}" | `workflows/{{second-workflow}}.md` |
| 3, "{{keywords}}" | `workflows/{{third-workflow}}.md` |
**After reading the workflow, follow it exactly.**
</routing>
<quick_reference>
## {{Skill Name}} Quick Reference
{{Brief reference information always useful to have visible}}
</quick_reference>
<reference_index>
## Domain Knowledge
All in `references/`:
- {{reference-1.md}} - {{purpose}}
- {{reference-2.md}} - {{purpose}}
</reference_index>
<workflows_index>
## Workflows
All in `workflows/`:
| Workflow | Purpose |
|----------|---------|
| {{first-workflow}}.md | {{purpose}} |
| {{second-workflow}}.md | {{purpose}} |
| {{third-workflow}}.md | {{purpose}} |
</workflows_index>
<success_criteria>
A well-executed {{skill name}}:
- {{First criterion}}
- {{Second criterion}}
- {{Third criterion}}
</success_criteria>

@@ -1,33 +0,0 @@
---
name: {{SKILL_NAME}}
description: {{What it does}} Use when {{trigger conditions}}.
---
<objective>
{{Clear statement of what this skill accomplishes}}
</objective>
<quick_start>
{{Immediate actionable guidance - what Claude should do first}}
</quick_start>
<process>
## Step 1: {{First action}}
{{Instructions for step 1}}
## Step 2: {{Second action}}
{{Instructions for step 2}}
## Step 3: {{Third action}}
{{Instructions for step 3}}
</process>
<success_criteria>
{{Skill name}} is complete when:
- [ ] {{First success criterion}}
- [ ] {{Second success criterion}}
- [ ] {{Third success criterion}}
</success_criteria>

@@ -1,96 +0,0 @@
# Workflow: Add a Reference to Existing Skill
<required_reading>
**Read these reference files NOW:**
1. references/recommended-structure.md
2. references/skill-structure.md
</required_reading>
<process>
## Step 1: Select the Skill
```bash
ls ~/.claude/skills/
```
Present numbered list, ask: "Which skill needs a new reference?"
## Step 2: Analyze Current Structure
```bash
cat ~/.claude/skills/{skill-name}/SKILL.md
ls ~/.claude/skills/{skill-name}/references/ 2>/dev/null
```
Determine:
- **Has references/ folder?** → Good, can add directly
- **Simple skill?** → May need to create references/ first
- **What references exist?** → Understand the knowledge landscape
Report current references to user.
## Step 3: Gather Reference Requirements
Ask:
- What knowledge should this reference contain?
- Which workflows will use it?
- Is this reusable across workflows or specific to one?
**If specific to one workflow** → Consider putting it inline in that workflow instead.
## Step 4: Create the Reference File
Create `references/{reference-name}.md`:
Use semantic XML tags to structure the content:
```xml
<overview>
Brief description of what this reference covers
</overview>
<patterns>
## Common Patterns
[Reusable patterns, examples, code snippets]
</patterns>
<guidelines>
## Guidelines
[Best practices, rules, constraints]
</guidelines>
<examples>
## Examples
[Concrete examples with explanation]
</examples>
```
## Step 5: Update SKILL.md
Add the new reference to `<reference_index>`:
```markdown
**Category:** existing.md, new-reference.md
```
## Step 6: Update Workflows That Need It
For each workflow that should use this reference:
1. Read the workflow file
2. Add to its `<required_reading>` section
3. Verify the workflow still makes sense with this addition
## Step 7: Verify
- [ ] Reference file exists and is well-structured
- [ ] Reference is in SKILL.md reference_index
- [ ] Relevant workflows have it in required_reading
- [ ] No broken references
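The "no broken references" item can be checked mechanically. The following is a minimal sketch, not part of the original workflow: `find_broken_references` is a hypothetical helper name, and it assumes references are always written as `references/<name>.md` inside `SKILL.md` and the files under `workflows/`.

```bash
#!/bin/bash
# Hypothetical helper: list references/<name>.md paths that are mentioned in
# a skill's SKILL.md or workflows/ but do not exist on disk.
find_broken_references() {
  local skill_dir="$1"
  local ref
  # Collect every "references/<name>.md" mention, de-duplicated.
  # Template placeholders such as references/{name}.md are not matched.
  { grep -rhoE 'references/[A-Za-z0-9._-]+\.md' \
      "$skill_dir/SKILL.md" "$skill_dir/workflows" 2>/dev/null || true; } |
  sort -u |
  while IFS= read -r ref; do
    # Report each mentioned path that has no matching file.
    [ -f "$skill_dir/$ref" ] || echo "missing: $ref"
  done
}
```

Calling `find_broken_references ~/.claude/skills/{skill-name}` prints one `missing:` line per unresolved path and prints nothing when every mention resolves.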
</process>
<success_criteria>
Reference addition is complete when:
- [ ] Reference file created with useful content
- [ ] Added to reference_index in SKILL.md
- [ ] Relevant workflows updated to read it
- [ ] Content is reusable (not workflow-specific)
</success_criteria>

@@ -1,93 +0,0 @@
# Workflow: Add a Script to a Skill
<required_reading>
**Read these reference files NOW:**
1. references/using-scripts.md
</required_reading>
<process>
## Step 1: Identify the Skill
Ask (if not already provided):
- Which skill needs a script?
- What operation should the script perform?
## Step 2: Analyze Script Need
Confirm this is a good script candidate:
- [ ] Same code runs across multiple invocations
- [ ] Operation is error-prone when rewritten
- [ ] Consistency matters more than flexibility
If not a good fit, suggest alternatives (inline code in workflow, reference examples).
## Step 3: Create Scripts Directory
```bash
mkdir -p ~/.claude/skills/{skill-name}/scripts
```
## Step 4: Design Script
Gather requirements:
- What inputs does the script need?
- What should it output or accomplish?
- What errors might occur?
- Should it be idempotent?
Choose language:
- **bash** - Shell operations, file manipulation, CLI tools
- **python** - Data processing, API calls, complex logic
- **node/ts** - JavaScript ecosystem, async operations
## Step 5: Write Script File
Create `scripts/{script-name}.{ext}` with:
- Purpose comment at top
- Usage instructions
- Input validation
- Error handling
- Clear output/feedback
For bash scripts:
```bash
#!/bin/bash
set -euo pipefail
```
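As an illustration of how the five elements listed in this step can fit together, here is a small sketch. The script name `count-md.sh` and its operation are hypothetical, chosen only to show the layout; a real skill script would replace the body of `count_md` with its own work.

```bash
#!/bin/bash
# Purpose: report how many Markdown files a directory contains.
#          (Hypothetical operation, chosen only to illustrate the layout.)
# Usage:   count-md.sh [directory]   # defaults to the current directory
set -euo pipefail

# Error handling: print a message to stderr and exit non-zero.
die() {
  echo "error: $*" >&2
  exit 1
}

count_md() {
  local dir="${1:-.}"
  # Input validation: the argument must be an existing directory.
  [ -d "$dir" ] || die "not a directory: $dir"
  local n
  n=$(find "$dir" -maxdepth 1 -type f -name '*.md' | wc -l)
  # Clear output: one labelled line; $(( )) strips any padding from wc.
  echo "markdown files: $((n))"
}

count_md "$@"
```

Keeping the work inside a function and sending failures through a single `die` helper means error messages share one format and always reach stderr.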
## Step 6: Make Executable (if bash)
```bash
chmod +x ~/.claude/skills/{skill-name}/scripts/{script-name}.sh
```
## Step 7: Update Workflow to Use Script
Find the workflow that needs this operation. Add:
```xml
<process>
...
N. Run `scripts/{script-name}.sh [arguments]`
N+1. Verify operation succeeded
...
</process>
```
## Step 8: Test
Invoke the skill workflow and verify:
- Script runs at the right step
- Inputs are passed correctly
- Errors are handled gracefully
- Output matches expectations
</process>
<success_criteria>
Script is complete when:
- [ ] scripts/ directory exists
- [ ] Script file has proper structure (comments, validation, error handling)
- [ ] Script is executable (if bash)
- [ ] At least one workflow references the script
- [ ] No hardcoded secrets or credentials
- [ ] Tested with real invocation
</success_criteria>

@@ -1,74 +0,0 @@
# Workflow: Add a Template to a Skill
<required_reading>
**Read these reference files NOW:**
1. references/using-templates.md
</required_reading>
<process>
## Step 1: Identify the Skill
Ask (if not already provided):
- Which skill needs a template?
- What output does this template structure?
## Step 2: Analyze Template Need
Confirm this is a good template candidate:
- [ ] Output has consistent structure across uses
- [ ] Structure matters more than creative generation
- [ ] Filling placeholders is more reliable than blank-page generation
If not a good fit, suggest alternatives (workflow guidance, reference examples).
## Step 3: Create Templates Directory
```bash
mkdir -p ~/.claude/skills/{skill-name}/templates
```
## Step 4: Design Template Structure
Gather requirements:
- What sections does the output need?
- What information varies between uses? (→ placeholders)
- What stays constant? (→ static structure)
## Step 5: Write Template File
Create `templates/{template-name}.md` with:
- Clear section markers
- `{{PLACEHOLDER}}` syntax for variable content
- Brief inline guidance where helpful
- Minimal example content
## Step 6: Update Workflow to Use Template
Find the workflow that produces this output. Add:
```xml
<process>
...
N. Read `templates/{template-name}.md`
N+1. Copy template structure
N+2. Fill each placeholder based on gathered context
...
</process>
```
## Step 7: Test
Invoke the skill workflow and verify:
- Template is read at the right step
- All placeholders get filled appropriately
- Output structure matches template
- No placeholders left unfilled
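The last check can be scripted. This is a rough sketch under the assumption that placeholders always use the `{{NAME}}` syntax described in Step 5; `check_placeholders` is a hypothetical helper, not something the skill ships.

```bash
#!/bin/bash
# Hypothetical check: print any {{PLACEHOLDER}} tokens still present in a
# file, prefixed with their line number. Returns 0 only when none remain.
check_placeholders() {
  local file="$1"
  if grep -noE '\{\{[^}]*\}\}' "$file"; then
    echo "unfilled placeholders found in $file" >&2
    return 1
  fi
  echo "no placeholders left in $file"
}
```

Run it against the generated output, not the template itself, since the template is expected to contain placeholders.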
</process>
<success_criteria>
Template is complete when:
- [ ] templates/ directory exists
- [ ] Template file has clear structure with placeholders
- [ ] At least one workflow references the template
- [ ] Workflow instructions explain when/how to use template
- [ ] Tested with real invocation
</success_criteria>

@@ -1,126 +0,0 @@
# Workflow: Add a Workflow to Existing Skill
## Interaction Method
If `AskUserQuestion` is available, use it for the prompts below unless a step says otherwise.
If not, present each question as a numbered list and wait for a reply before proceeding to the next step. Never skip or auto-configure.
<required_reading>
**Read these reference files NOW:**
1. references/recommended-structure.md
2. references/workflows-and-validation.md
</required_reading>
<process>
## Step 1: Select the Skill
**DO NOT use AskUserQuestion** - there may be many skills.
```bash
ls ~/.claude/skills/
```
Present numbered list, ask: "Which skill needs a new workflow?"
## Step 2: Analyze Current Structure
Read the skill:
```bash
cat ~/.claude/skills/{skill-name}/SKILL.md
ls ~/.claude/skills/{skill-name}/workflows/ 2>/dev/null
```
Determine:
- **Simple skill?** → May need to upgrade to router pattern first
- **Already has workflows/?** → Good, can add directly
- **What workflows exist?** → Avoid duplication
Report current structure to user.
## Step 3: Gather Workflow Requirements
Ask using AskUserQuestion or direct question:
- What should this workflow do?
- When would someone use it vs existing workflows?
- What references would it need?
## Step 4: Upgrade to Router Pattern (if needed)
**If skill is currently simple (no workflows/):**
Ask: "This skill needs to be upgraded to the router pattern first. Should I restructure it?"
If yes:
1. Create workflows/ directory
2. Move existing process content to workflows/main.md
3. Rewrite SKILL.md as router with intake + routing
4. Verify structure works before proceeding
## Step 5: Create the Workflow File
Create `workflows/{workflow-name}.md`:
```markdown
# Workflow: {Workflow Name}
<required_reading>
**Read these reference files NOW:**
1. references/{relevant-file}.md
</required_reading>
<process>
## Step 1: {First Step}
[What to do]
## Step 2: {Second Step}
[What to do]
## Step 3: {Third Step}
[What to do]
</process>
<success_criteria>
This workflow is complete when:
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] Criterion 3
</success_criteria>
```
## Step 6: Update SKILL.md
Add the new workflow to:
1. **Intake question** - Add new option
2. **Routing table** - Map option to workflow file
3. **Workflows index** - Add to the list
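A quick way to catch a workflow that was created but never wired into `SKILL.md` is sketched below. `list_unrouted_workflows` is a hypothetical helper, and the check is deliberately crude: it only looks for the file name anywhere in `SKILL.md`, so it cannot tell whether the mention is in the intake, the routing table, or the index.

```bash
#!/bin/bash
# Hypothetical helper: list workflow files whose names never appear
# in the skill's SKILL.md.
list_unrouted_workflows() {
  local skill_dir="$1"
  local wf name
  for wf in "$skill_dir"/workflows/*.md; do
    [ -e "$wf" ] || continue   # the glob matched nothing
    name=$(basename "$wf")
    # Fixed-string search for the file name anywhere in SKILL.md.
    grep -qF "$name" "$skill_dir/SKILL.md" || echo "not in SKILL.md: $name"
  done
}
```

Empty output suggests every workflow file is at least mentioned. The three places listed above still need a manual read.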
## Step 7: Create References (if needed)
If the workflow needs domain knowledge that doesn't exist:
1. Create `references/{reference-name}.md`
2. Add to reference_index in SKILL.md
3. Reference it in the workflow's required_reading
## Step 8: Test
Invoke the skill:
- Does the new option appear in intake?
- Does selecting it route to the correct workflow?
- Does the workflow load the right references?
- Does the workflow execute correctly?
Report results to user.
</process>
<success_criteria>
Workflow addition is complete when:
- [ ] Skill upgraded to router pattern (if needed)
- [ ] Workflow file created with required_reading, process, success_criteria
- [ ] SKILL.md intake updated with new option
- [ ] SKILL.md routing updated
- [ ] SKILL.md workflows_index updated
- [ ] Any needed references created
- [ ] Tested and working
</success_criteria>

@@ -1,138 +0,0 @@
# Workflow: Audit a Skill
<required_reading>
**Read these reference files NOW:**
1. references/recommended-structure.md
2. references/skill-structure.md
3. references/use-xml-tags.md
</required_reading>
<process>
## Step 1: List Available Skills
**DO NOT use AskUserQuestion** - there may be many skills.
Enumerate skills in chat as numbered list:
```bash
ls ~/.claude/skills/
```
Present as:
```
Available skills:
1. create-agent-skills
2. build-macos-apps
3. manage-stripe
...
```
Ask: "Which skill would you like to audit? (enter number or name)"
## Step 2: Read the Skill
After user selects, read the full skill structure:
```bash
# Read main file
cat ~/.claude/skills/{skill-name}/SKILL.md
# Check for workflows and references
ls ~/.claude/skills/{skill-name}/
ls ~/.claude/skills/{skill-name}/workflows/ 2>/dev/null
ls ~/.claude/skills/{skill-name}/references/ 2>/dev/null
```
## Step 3: Run Audit Checklist
Evaluate against each criterion:
### YAML Frontmatter
- [ ] Has `name:` field (lowercase-with-hyphens)
- [ ] Name matches directory name
- [ ] Has `description:` field
- [ ] Description says what it does AND when to use it
- [ ] Description is third person ("Use when...")
### Structure
- [ ] SKILL.md under 500 lines
- [ ] Pure XML structure (no markdown headings # in body)
- [ ] All XML tags properly closed
- [ ] Has required tags: objective OR essential_principles
- [ ] Has success_criteria
### Router Pattern (if complex skill)
- [ ] Essential principles inline in SKILL.md (not in separate file)
- [ ] Has intake question
- [ ] Has routing table
- [ ] All referenced workflow files exist
- [ ] All referenced reference files exist
### Workflows (if present)
- [ ] Each has required_reading section
- [ ] Each has process section
- [ ] Each has success_criteria section
- [ ] Required reading references exist
### Content Quality
- [ ] Principles are actionable (not vague platitudes)
- [ ] Steps are specific (not "do the thing")
- [ ] Success criteria are verifiable
- [ ] No redundant content across files
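Several of the frontmatter and structure items are mechanical and can be pre-checked before the manual review. The sketch below is an assumption-laden starting point: `audit_skill_basics` is a hypothetical helper, and it assumes the frontmatter uses plain unquoted `name:` and `description:` lines starting in column one.

```bash
#!/bin/bash
# Hypothetical helper: run the mechanical checks against one skill directory.
# Judgement-based items (actionable principles, specific steps) still need
# a human or model read.
audit_skill_basics() {
  local skill_dir="${1%/}"
  local file="$skill_dir/SKILL.md"
  local pattern='^[a-z0-9]+(-[a-z0-9]+)*$'
  local dir_name name lines
  dir_name=$(basename "$skill_dir")

  if [ ! -f "$file" ]; then
    echo "FAIL: $file not found"
    return 1
  fi

  # Value of the first "name:" line, and the file's line count.
  name=$(sed -n 's/^name:[[:space:]]*//p' "$file" | head -n 1)
  lines=$(( $(wc -l < "$file") ))

  if [ -n "$name" ]; then echo "PASS: has name field"; else echo "FAIL: no name field"; fi
  if [[ "$name" =~ $pattern ]]; then echo "PASS: name is lowercase-with-hyphens"; else echo "FAIL: name is not lowercase-with-hyphens"; fi
  if [ "$name" = "$dir_name" ]; then echo "PASS: name matches directory"; else echo "FAIL: name '$name' does not match directory '$dir_name'"; fi
  if grep -q '^description:' "$file"; then echo "PASS: has description field"; else echo "FAIL: no description field"; fi
  if [ "$lines" -lt 500 ]; then echo "PASS: $lines lines (under 500)"; else echo "FAIL: $lines lines (limit is under 500)"; fi
}
```

The PASS and FAIL lines can be pasted straight into the report, with the remaining checklist items evaluated by reading the skill.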
## Step 4: Generate Report
Present findings as:
```
## Audit Report: {skill-name}
### ✅ Passing
- [list passing items]
### ⚠️ Issues Found
1. **[Issue name]**: [Description]
→ Fix: [Specific action]
2. **[Issue name]**: [Description]
→ Fix: [Specific action]
### 📊 Score: X/Y criteria passing
```
## Step 5: Offer Fixes
If issues found, ask:
"Would you like me to fix these issues?"
Options:
1. **Fix all** - Apply all recommended fixes
2. **Fix one by one** - Review each fix before applying
3. **Just the report** - No changes needed
If fixing:
- Make each change
- Verify file validity after each change
- Report what was fixed
</process>
<audit_anti_patterns>
## Common Anti-Patterns to Flag
**Skippable principles**: Essential principles in separate file instead of inline
**Monolithic skill**: Single file over 500 lines
**Mixed concerns**: Procedures and knowledge in same file
**Vague steps**: "Handle the error appropriately"
**Untestable criteria**: "User is satisfied"
**Markdown headings in body**: Using # instead of XML tags
**Missing routing**: Complex skill without intake/routing
**Broken references**: Files are mentioned but do not exist
**Redundant content**: Same information in multiple places
</audit_anti_patterns>
<success_criteria>
Audit is complete when:
- [ ] Skill fully read and analyzed
- [ ] All checklist items evaluated
- [ ] Report presented to user
- [ ] Fixes applied (if requested)
- [ ] User has clear picture of skill health
</success_criteria>

@@ -1,605 +0,0 @@
# Workflow: Create Exhaustive Domain Expertise Skill
<objective>
Build a comprehensive execution skill that does real work in a specific domain. Domain expertise skills are full-featured build skills: they keep exhaustive domain knowledge in references, provide complete workflows for the full lifecycle (build → debug → optimize → ship), and can be both invoked directly by users AND loaded by other skills (like create-plans) for domain knowledge.
</objective>
<critical_distinction>
**Regular skill:** "Do one specific task"
**Domain expertise skill:** "Do EVERYTHING in this domain, with complete practitioner knowledge"
Examples:
- `expertise/macos-apps` - Build macOS apps from scratch through shipping
- `expertise/python-games` - Build complete Python games with full game dev lifecycle
- `expertise/rust-systems` - Build Rust systems programs with exhaustive systems knowledge
- `expertise/web-scraping` - Build scrapers, handle all edge cases, deploy at scale
Domain expertise skills:
- ✅ Execute tasks (build, debug, optimize, ship)
- ✅ Have comprehensive domain knowledge in references
- ✅ Are invoked directly by users ("build a macOS app")
- ✅ Can be loaded by other skills (create-plans reads references for planning)
- ✅ Cover the FULL lifecycle, not just getting started
</critical_distinction>
<required_reading>
**Read these reference files NOW:**
1. references/recommended-structure.md
2. references/core-principles.md
3. references/use-xml-tags.md
</required_reading>
<process>
## Step 1: Identify Domain
Ask user what domain expertise to build:
**Example domains:**
- macOS/iOS app development
- Python game development
- Rust systems programming
- Machine learning / AI
- Web scraping and automation
- Data engineering pipelines
- Audio processing / DSP
- 3D graphics / shaders
- Unity/Unreal game development
- Embedded systems
Get specific: "Python games" or "Python games with Pygame specifically"?
## Step 2: Confirm Target Location
Explain:
```
Domain expertise skills go in: ~/.claude/skills/expertise/{domain-name}/
These are comprehensive BUILD skills that:
- Execute tasks (build, debug, optimize, ship)
- Contain exhaustive domain knowledge
- Can be invoked directly by users
- Can be loaded by other skills for domain knowledge
Name suggestion: {suggested-name}
Location: ~/.claude/skills/expertise/{suggested-name}/
```
Confirm or adjust name.
## Step 3: Identify Workflows
Domain expertise skills cover the FULL lifecycle. Identify what workflows are needed.
**Common workflows for most domains:**
1. **build-new-{thing}.md** - Create from scratch
2. **add-feature.md** - Extend existing {thing}
3. **debug-{thing}.md** - Find and fix bugs
4. **write-tests.md** - Test for correctness
5. **optimize-performance.md** - Profile and speed up
6. **ship-{thing}.md** - Deploy/distribute
**Domain-specific workflows:**
- Games: `implement-game-mechanic.md`, `add-audio.md`, `polish-ui.md`
- Web apps: `setup-auth.md`, `add-api-endpoint.md`, `setup-database.md`
- Systems: `optimize-memory.md`, `profile-cpu.md`, `cross-compile.md`
Each workflow = one complete task type that users actually do.
## Step 4: Exhaustive Research Phase
**CRITICAL:** This research must be comprehensive, not superficial.
### Research Strategy
Run multiple web searches to ensure coverage:
**Search 1: Current ecosystem**
- "best {domain} libraries 2024 2025 2026"
- "popular {domain} frameworks comparison"
- "{domain} tech stack recommendations"
**Search 2: Architecture patterns**
- "{domain} architecture patterns"
- "{domain} best practices design patterns"
- "how to structure {domain} projects"
**Search 3: Lifecycle and tooling**
- "{domain} development workflow"
- "{domain} testing debugging best practices"
- "{domain} deployment distribution"
**Search 4: Common pitfalls**
- "{domain} common mistakes avoid"
- "{domain} anti-patterns"
- "what not to do {domain}"
**Search 5: Real-world usage**
- "{domain} production examples GitHub"
- "{domain} case studies"
- "successful {domain} projects"
### Verification Requirements
For EACH major library/tool/pattern found:
- **Check recency:** When was it last updated?
- **Check adoption:** Is it actively maintained? Community size?
- **Check alternatives:** What else exists? When to use each?
- **Check deprecation:** Is anything being replaced?
**Red flags for outdated content:**
- Articles from before 2023 (unless they cover fundamental concepts)
- Abandoned libraries (no commits in 12+ months)
- Deprecated APIs or patterns
- "This used to be popular but..."
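The "no commits in 12+ months" flag can be approximated with a timestamp comparison. This is only a sketch: `is_stale` is a hypothetical helper, and it treats 12 months as 365 days.

```bash
#!/bin/bash
# Hypothetical helper: succeed when a Unix timestamp lies more than
# 365 days in the past.
is_stale() {
  local last_commit="$1"
  local now cutoff
  now=$(date +%s)
  cutoff=$(( now - 365 * 24 * 60 * 60 ))
  [ "$last_commit" -lt "$cutoff" ]
}
```

Given a local clone, the timestamp of the newest commit is available from `git log -1 --format=%ct`, so `is_stale "$(git -C path/to/clone log -1 --format=%ct)" && echo "possibly abandoned"` flags a candidate for closer inspection. A quiet repository is not always an abandoned one, so treat the result as a prompt to look further rather than a verdict.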
### Documentation Sources
Use Context7 MCP when available:
```
mcp__context7__resolve-library-id: {library-name}
mcp__context7__get-library-docs: {library-id}
```
Focus on official docs, not tutorials.
## Step 5: Organize Knowledge Into Domain Areas
Structure references by domain concerns, NOT by arbitrary categories.
**For game development example:**
```
references/
├── architecture.md # ECS, component-based, state machines
├── libraries.md # Pygame, Arcade, Panda3D (when to use each)
├── graphics-rendering.md # 2D/3D rendering, sprites, shaders
├── physics.md # Collision, physics engines
├── audio.md # Sound effects, music, spatial audio
├── input.md # Keyboard, mouse, gamepad, touch
├── ui-menus.md # HUD, menus, dialogs
├── game-loop.md # Update/render loop, fixed timestep
├── state-management.md # Game states, scene management
├── networking.md # Multiplayer, client-server, P2P
├── asset-pipeline.md # Loading, caching, optimization
├── testing-debugging.md # Unit tests, profiling, debugging tools
├── performance.md # Optimization, profiling, benchmarking
├── packaging.md # Building executables, installers
├── distribution.md # Steam, itch.io, app stores
└── anti-patterns.md # Common mistakes, what NOT to do
```
**For macOS app development example:**
```
references/
├── app-architecture.md # State management, dependency injection
├── swiftui-patterns.md # Declarative UI patterns
├── appkit-integration.md # Using AppKit with SwiftUI
├── concurrency-patterns.md # Async/await, actors, structured concurrency
├── data-persistence.md # Storage strategies
├── networking.md # URLSession, async networking
├── system-apis.md # macOS-specific frameworks
├── testing-tdd.md # Testing patterns
├── testing-debugging.md # Debugging tools and techniques
├── performance.md # Profiling, optimization
├── design-system.md # Platform conventions
├── macos-polish.md # Native feel, accessibility
├── security-code-signing.md # Signing, notarization
└── project-scaffolding.md # CLI-based setup
```
**For each reference file:**
- Pure XML structure
- Decision trees: "If X, use Y. If Z, use A instead."
- Comparison tables: Library vs Library (speed, features, learning curve)
- Code examples showing patterns
- "When to use" guidance
- Platform-specific considerations
- Current versions and compatibility
## Step 6: Create SKILL.md
Domain expertise skills use the router pattern with essential principles:
```yaml
---
name: build-{domain-name}
description: Build {domain things} from scratch through shipping. Full lifecycle - build, debug, test, optimize, ship. {Any specific constraints like "CLI-only, no IDE"}.
---
<essential_principles>
## How {This Domain} Works
{Domain-specific principles that ALWAYS apply}
### 1. {First Principle}
{Critical practice that can't be skipped}
### 2. {Second Principle}
{Another fundamental practice}
### 3. {Third Principle}
{Core workflow pattern}
</essential_principles>
<intake>
**Ask the user:**
What would you like to do?
1. Build a new {thing}
2. Debug an existing {thing}
3. Add a feature
4. Write/run tests
5. Optimize performance
6. Ship/release
7. Something else
**Then read the matching workflow from `workflows/` and follow it.**
</intake>
<routing>
| Response | Workflow |
|----------|----------|
| 1, "new", "create", "build", "start" | `workflows/build-new-{thing}.md` |
| 2, "broken", "fix", "debug", "crash", "bug" | `workflows/debug-{thing}.md` |
| 3, "add", "feature", "implement", "change" | `workflows/add-feature.md` |
| 4, "test", "tests", "TDD", "coverage" | `workflows/write-tests.md` |
| 5, "slow", "optimize", "performance", "fast" | `workflows/optimize-performance.md` |
| 6, "ship", "release", "deploy", "publish" | `workflows/ship-{thing}.md` |
| 7, other | Clarify, then select workflow or references |
</routing>
<verification_loop>
## After Every Change
{Domain-specific verification steps}
Example for compiled languages:
~~~bash
# 1. Does it build?
{build command}
# 2. Do tests pass?
{test command}
# 3. Does it run?
{run command}
~~~
Report to the user:
- "Build: ✓"
- "Tests: X pass, Y fail"
- "Ready for you to check [specific thing]"
</verification_loop>
<reference_index>
## Domain Knowledge
All in `references/`:
**Architecture:** {list files}
**{Domain Area}:** {list files}
**{Domain Area}:** {list files}
**Development:** {list files}
**Shipping:** {list files}
</reference_index>
<workflows_index>
## Workflows
All in `workflows/`:
| File | Purpose |
|------|---------|
| build-new-{thing}.md | Create new {thing} from scratch |
| debug-{thing}.md | Find and fix bugs |
| add-feature.md | Add to existing {thing} |
| write-tests.md | Write and run tests |
| optimize-performance.md | Profile and speed up |
| ship-{thing}.md | Deploy/distribute |
</workflows_index>
```
## Step 7: Write Workflows
For EACH workflow identified in Step 3:
### Workflow Template
```markdown
# Workflow: {Workflow Name}
<required_reading>
**Read these reference files NOW before {doing the task}:**
1. references/{relevant-file}.md
2. references/{another-relevant-file}.md
3. references/{third-relevant-file}.md
</required_reading>
<process>
## Step 1: {First Action}
{What to do}
## Step 2: {Second Action}
{What to do - actual implementation steps}
## Step 3: {Third Action}
{What to do}
## Step 4: Verify
{How to prove it works}
~~~bash
{verification commands}
~~~
</process>
<anti_patterns>
Avoid:
- {Common mistake 1}
- {Common mistake 2}
- {Common mistake 3}
</anti_patterns>
<success_criteria>
A well-{completed task}:
- {Criterion 1}
- {Criterion 2}
- {Criterion 3}
- Builds/runs without errors
- Tests pass
- Feels {native/professional/correct}
</success_criteria>
```
**Key workflow characteristics:**
- Starts with required_reading (which references to load)
- Contains actual implementation steps (not just "read references")
- Includes verification steps
- Has success criteria
- Documents anti-patterns
## Step 8: Write Comprehensive References
For EACH reference file identified in Step 5:
### Structure Template
```xml
<overview>
Brief introduction to this domain area
</overview>
<options>
## Available Approaches/Libraries
<option name="Library A">
**When to use:** [specific scenarios]
**Strengths:** [what it's best at]
**Weaknesses:** [what it's not good for]
**Current status:** v{version}, actively maintained
**Learning curve:** [easy/medium/hard]
~~~code
# Example usage
~~~
</option>
<option name="Library B">
[Same structure]
</option>
</options>
<decision_tree>
## Choosing the Right Approach
**If you need [X]:** Use [Library A]
**If you need [Y]:** Use [Library B]
**If you have [constraint Z]:** Use [Library C]
**Avoid [Library D] if:** [specific scenarios]
</decision_tree>
<patterns>
## Common Patterns
<pattern name="Pattern Name">
**Use when:** [scenario]
**Implementation:** [code example]
**Considerations:** [trade-offs]
</pattern>
</patterns>
<anti_patterns>
## What NOT to Do
<anti_pattern name="Common Mistake">
**Problem:** [what people do wrong]
**Why it's bad:** [consequences]
**Instead:** [correct approach]
</anti_pattern>
</anti_patterns>
<platform_considerations>
## Platform-Specific Notes
**Windows:** [considerations]
**macOS:** [considerations]
**Linux:** [considerations]
**Mobile:** [if applicable]
</platform_considerations>
```
### Quality Standards
Each reference must include:
- **Current information** (verify dates)
- **Multiple options** (not just one library)
- **Decision guidance** (when to use each)
- **Real examples** (working code, not pseudocode)
- **Trade-offs** (no silver bullets)
- **Anti-patterns** (what NOT to do)
### Common Reference Files
Most domains need:
- **architecture.md** - How to structure projects
- **libraries.md** - Ecosystem overview with comparisons
- **patterns.md** - Design patterns specific to domain
- **testing-debugging.md** - How to verify correctness
- **performance.md** - Optimization strategies
- **deployment.md** - How to ship/distribute
- **anti-patterns.md** - Common mistakes consolidated
## Step 9: Validate Completeness
### Completeness Checklist
Ask: "Could a user build a professional {domain thing} from scratch through shipping using just this skill?"
**Must answer YES to:**
- [ ] All major libraries/frameworks covered?
- [ ] All architectural approaches documented?
- [ ] Complete lifecycle addressed (build → debug → test → optimize → ship)?
- [ ] Platform-specific considerations included?
- [ ] "When to use X vs Y" guidance provided?
- [ ] Common pitfalls documented?
- [ ] Current as of 2024-2026?
- [ ] Workflows actually execute tasks (not just reference knowledge)?
- [ ] Each workflow specifies which references to read?
**Specific gaps to check:**
- [ ] Testing strategy covered?
- [ ] Debugging/profiling tools listed?
- [ ] Deployment/distribution methods documented?
- [ ] Performance optimization addressed?
- [ ] Security considerations (if applicable)?
- [ ] Asset/resource management (if applicable)?
- [ ] Networking (if applicable)?
### Dual-Purpose Test
Test both use cases:
**Direct invocation:** "Can a user invoke this skill and build something?"
- Intake routes to appropriate workflow
- Workflow loads relevant references
- Workflow provides implementation steps
- Success criteria are clear
**Knowledge reference:** "Can create-plans load references to plan a project?"
- References contain decision guidance
- All options compared
- Complete lifecycle covered
- Architecture patterns documented
## Step 10: Create Directory and Files
```bash
# Create structure
mkdir -p ~/.claude/skills/expertise/{domain-name}
mkdir -p ~/.claude/skills/expertise/{domain-name}/workflows
mkdir -p ~/.claude/skills/expertise/{domain-name}/references
# Write SKILL.md
# Write all workflow files
# Write all reference files
# Verify structure
ls -R ~/.claude/skills/expertise/{domain-name}
```
## Step 11: Document in create-plans
Update `~/.claude/skills/create-plans/SKILL.md` to reference this new domain:
Add to the domain inference table:
```markdown
| "{keyword}", "{domain term}" | expertise/{domain-name} |
```
So create-plans can auto-detect and offer to load it.
## Step 12: Final Quality Check
Review entire skill:
**SKILL.md:**
- [ ] Name matches directory (build-{domain-name})
- [ ] Description explains it builds things from scratch through shipping
- [ ] Essential principles inline (always loaded)
- [ ] Intake asks what user wants to do
- [ ] Routing maps to workflows
- [ ] Reference index complete and organized
- [ ] Workflows index complete
**Workflows:**
- [ ] Each workflow starts with required_reading
- [ ] Each workflow has actual implementation steps
- [ ] Each workflow has verification steps
- [ ] Each workflow has success criteria
- [ ] Workflows cover full lifecycle (build, debug, test, optimize, ship)
**References:**
- [ ] Pure XML structure (no markdown headings)
- [ ] Decision guidance in every file
- [ ] Current versions verified
- [ ] Code examples work
- [ ] Anti-patterns documented
- [ ] Platform considerations included
**Completeness:**
- [ ] A professional practitioner would find this comprehensive
- [ ] No major libraries/patterns missing
- [ ] Full lifecycle covered
- [ ] Passes the "build from scratch through shipping" test
- [ ] Can be invoked directly by users
- [ ] Can be loaded by create-plans for knowledge
</process>
<success_criteria>
Domain expertise skill is complete when:
- [ ] Comprehensive research completed (5+ web searches)
- [ ] All sources verified for recency (2024-2026)
- [ ] Knowledge organized by domain areas (not arbitrary)
- [ ] Essential principles in SKILL.md (always loaded)
- [ ] Intake routes to appropriate workflows
- [ ] Each workflow has required_reading + implementation steps + verification
- [ ] Each reference has decision trees and comparisons
- [ ] Anti-patterns documented throughout
- [ ] Full lifecycle covered (build → debug → test → optimize → ship)
- [ ] Platform-specific considerations included
- [ ] Located in ~/.claude/skills/expertise/{domain-name}/
- [ ] Referenced in create-plans domain inference table
- [ ] Passes dual-purpose test: Can be invoked directly AND loaded for knowledge
- [ ] User can build something professional from scratch through shipping
</success_criteria>
<anti_patterns>
**DON'T:**
- Copy tutorial content without verification
- Include only "getting started" material
- Skip the "when NOT to use" guidance
- Forget to check if libraries are still maintained
- Organize by document type instead of domain concerns
- Make it knowledge-only with no execution workflows
- Skip verification steps in workflows
- Include outdated content from old blog posts
- Skip decision trees and comparisons
- Create workflows that just say "read the references"
**DO:**
- Verify everything is current
- Include complete lifecycle (build → ship)
- Provide decision guidance
- Document anti-patterns
- Make workflows execute real tasks
- Start workflows with required_reading
- Include verification in every workflow
- Make it exhaustive, not minimal
- Test both direct invocation and knowledge reference use cases
</anti_patterns>

@@ -1,197 +0,0 @@
# Workflow: Create a New Skill
## Interaction Method
If `AskUserQuestion` is available, use it for all prompts below.
If not, present each question as a numbered list and wait for a reply before proceeding to the next step. For multiSelect questions, accept comma-separated numbers (e.g. `1, 3`). Never skip or auto-configure.
<required_reading>
**Read these reference files NOW:**
1. references/recommended-structure.md
2. references/skill-structure.md
3. references/core-principles.md
4. references/use-xml-tags.md
</required_reading>
<process>
## Step 1: Adaptive Requirements Gathering
**If user provided context** (e.g., "build a skill for X"):
→ Analyze what's stated, what can be inferred, what's unclear
→ Skip to asking about genuine gaps only
**If user just invoked skill without context:**
→ Ask what they want to build
### Using AskUserQuestion
Ask 2-4 domain-specific questions based on actual gaps. Each question should:
- Have specific options with descriptions
- Focus on scope, complexity, outputs, boundaries
- NOT ask things obvious from context
Example questions:
- "What specific operations should this skill handle?" (with options based on domain)
- "Should this also handle [related thing] or stay focused on [core thing]?"
- "What should the user see when successful?"
### Decision Gate
After initial questions, ask:
"Ready to proceed with building, or would you like me to ask more questions?"
Options:
1. **Proceed to building** - I have enough context
2. **Ask more questions** - There are more details to clarify
3. **Let me add details** - I want to provide additional context
## Step 2: Research Trigger (If External API)
**When external service detected**, ask using AskUserQuestion:
"This involves [service name] API. Would you like me to research current endpoints and patterns before building?"
Options:
1. **Yes, research first** - Fetch current documentation for accurate implementation
2. **No, proceed with general patterns** - Use common patterns without specific API research
If research requested:
- Use Context7 MCP to fetch current library documentation
- Or use WebSearch for recent API documentation
- Focus on 2024-2026 sources
- Store findings for use in content generation
## Step 3: Decide Structure
**Simple skill (single workflow, <200 lines):**
→ Single SKILL.md file with all content
**Complex skill (multiple workflows OR domain knowledge):**
→ Router pattern:
```
skill-name/
├── SKILL.md (router + principles)
├── workflows/ (procedures - FOLLOW)
├── references/ (knowledge - READ)
├── templates/ (output structures - COPY + FILL)
└── scripts/ (reusable code - EXECUTE)
```
Factors favoring router pattern:
- Multiple distinct user intents (create vs debug vs ship)
- Shared domain knowledge across workflows
- Essential principles that must not be skipped
- Skill likely to grow over time
**Consider templates/ when:**
- Skill produces consistent output structures (plans, specs, reports)
- Structure matters more than creative generation
**Consider scripts/ when:**
- Same code runs across invocations (deploy, setup, API calls)
- Operations are error-prone when rewritten each time
See references/recommended-structure.md for templates.
## Step 4: Create Directory
```bash
mkdir -p ~/.claude/skills/{skill-name}
# If complex:
mkdir -p ~/.claude/skills/{skill-name}/workflows
mkdir -p ~/.claude/skills/{skill-name}/references
# If needed:
mkdir -p ~/.claude/skills/{skill-name}/templates # for output structures
mkdir -p ~/.claude/skills/{skill-name}/scripts # for reusable code
```
## Step 5: Write SKILL.md
**Simple skill:** Write complete skill file with:
- YAML frontmatter (name, description)
- `<objective>`
- `<quick_start>`
- Content sections with pure XML
- `<success_criteria>`
**Complex skill:** Write router with:
- YAML frontmatter
- `<essential_principles>` (inline, unavoidable)
- `<intake>` (question to ask user)
- `<routing>` (maps answers to workflows)
- `<reference_index>` and `<workflows_index>`
## Step 6: Write Workflows (if complex)
For each workflow:
```xml
<required_reading>
Which references to load for this workflow
</required_reading>
<process>
Step-by-step procedure
</process>
<success_criteria>
How to know this workflow is done
</success_criteria>
```
## Step 7: Write References (if needed)
Domain knowledge that:
- Multiple workflows might need
- Doesn't change based on workflow
- Contains patterns, examples, technical details
## Step 8: Validate Structure
Check:
- [ ] YAML frontmatter valid
- [ ] Name matches directory (lowercase-with-hyphens)
- [ ] Description says what it does AND when to use it (third person)
- [ ] No markdown headings (#) in body - use XML tags
- [ ] Required tags present: objective, quick_start, success_criteria
- [ ] All referenced files exist
- [ ] SKILL.md under 500 lines
- [ ] XML tags properly closed
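Several of these checks can be scripted. The following is a minimal sketch, assuming a skill directory that contains a `SKILL.md`; `validate_skill` is a hypothetical helper, not part of any existing tool. It covers only the frontmatter, name, length, and required-tag checks. The markdown-heading and referenced-file checks are left to manual review because `#` also appears as a comment marker inside code fences.

```bash
#!/usr/bin/env bash
# Hypothetical helper: automate a subset of the Step 8 checklist.
# Usage: validate_skill path/to/skill-name
validate_skill() {
  local dir=${1%/}
  local file="$dir/SKILL.md"
  local status=0

  [ -f "$file" ] || { echo "FAIL: $file not found"; return 1; }

  # Frontmatter must open on line 1 with '---'.
  if [ "$(head -n 1 "$file")" != "---" ]; then
    echo "FAIL: missing YAML frontmatter"; status=1
  fi

  # Read the name field from inside the frontmatter block.
  local name
  name=$(awk '/^---$/ { n++; next }
              n == 1 && /^name:/ { sub(/^name:[ \t]*/, ""); print; exit }' "$file")

  # Name must be lowercase-with-hyphens and match the directory name.
  case "$name" in
    ""|*[!a-z0-9-]*) echo "FAIL: name must be lowercase-with-hyphens"; status=1 ;;
  esac
  if [ "$name" != "$(basename "$dir")" ]; then
    echo "FAIL: name '$name' does not match directory '$(basename "$dir")'"; status=1
  fi

  # SKILL.md should stay under 500 lines.
  if [ $(( $(wc -l < "$file") )) -ge 500 ]; then
    echo "FAIL: SKILL.md has 500 or more lines"; status=1
  fi

  # Required tags must be both opened and closed.
  local tag
  for tag in objective quick_start success_criteria; do
    if ! grep -q "<$tag>" "$file" || ! grep -q "</$tag>" "$file"; then
      echo "FAIL: <$tag> missing or not closed"; status=1
    fi
  done

  [ "$status" -eq 0 ] && echo "OK: $dir"
  return "$status"
}
```

Running `validate_skill ~/.claude/skills/{skill-name}` prints one `FAIL:` line per failed check and returns non-zero if any check fails.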
## Step 9: Create Slash Command
```bash
cat > ~/.claude/commands/{skill-name}.md << 'EOF'
---
description: {Brief description}
argument-hint: [{argument hint}]
allowed-tools: Skill({skill-name})
---
Invoke the {skill-name} skill for: $ARGUMENTS
EOF
```
## Step 10: Test
Invoke the skill and observe:
- Does it ask the right intake question?
- Does it load the right workflow?
- Does the workflow load the right references?
- Does output match expectations?
Iterate based on real usage, not assumptions.
</process>
<success_criteria>
Skill is complete when:
- [ ] Requirements gathered with appropriate questions
- [ ] API research done if external service involved
- [ ] Directory structure correct
- [ ] SKILL.md has valid frontmatter
- [ ] Essential principles inline (if complex skill)
- [ ] Intake question routes to correct workflow
- [ ] All workflows have required_reading + process + success_criteria
- [ ] References contain reusable domain knowledge
- [ ] Slash command exists and works
- [ ] Tested with real invocation
</success_criteria>

@@ -1,121 +0,0 @@
# Workflow: Get Guidance on Skill Design
<required_reading>
**Read these reference files NOW:**
1. references/core-principles.md
2. references/recommended-structure.md
</required_reading>
<process>
## Step 1: Understand the Problem Space
Ask the user:
- What task or domain are you trying to support?
- Is this something you do repeatedly?
- What makes it complex enough to need a skill?
## Step 2: Determine If a Skill Is Right
**Create a skill when:**
- Task is repeated across multiple sessions
- Domain knowledge doesn't change frequently
- Complex enough to benefit from structure
- Would save significant time if automated
**Don't create a skill when:**
- One-off task (just do it directly)
- Changes constantly (will be outdated quickly)
- Too simple (overhead isn't worth it)
- Better as a slash command (user-triggered, no context needed)
Share this assessment with user.
## Step 3: Map the Workflows
Ask: "What are the different things someone might want to do with this skill?"
Common patterns:
- Create / Read / Update / Delete
- Build / Debug / Ship
- Setup / Use / Troubleshoot
- Import / Process / Export
Each distinct workflow = potential workflow file.
## Step 4: Identify Domain Knowledge
Ask: "What knowledge is needed regardless of which workflow?"
This becomes references:
- API patterns
- Best practices
- Common examples
- Configuration details
## Step 5: Draft the Structure
Based on answers, recommend structure:
**If 1 workflow, simple knowledge:**
```
skill-name/
└── SKILL.md (everything in one file)
```
**If 2+ workflows, shared knowledge:**
```
skill-name/
├── SKILL.md (router)
├── workflows/
│ ├── workflow-a.md
│ └── workflow-b.md
└── references/
└── shared-knowledge.md
```
## Step 6: Identify Essential Principles
Ask: "What rules should ALWAYS apply, no matter which workflow?"
These become `<essential_principles>` in SKILL.md.
Examples:
- "Always verify before reporting success"
- "Never store credentials in code"
- "Ask before making destructive changes"
## Step 7: Present Recommendation
Summarize:
- Recommended structure (simple vs router pattern)
- List of workflows
- List of references
- Essential principles
Ask: "Does this structure make sense? Ready to build it?"
If yes → offer to switch to "Create a new skill" workflow
If no → clarify and iterate
</process>
<decision_framework>
## Quick Decision Framework
| Situation | Recommendation |
|-----------|----------------|
| Single task, repeated often | Simple skill |
| Multiple related tasks | Router + workflows |
| Complex domain, many patterns | Router + workflows + references |
| User-triggered, fresh context | Slash command, not skill |
| One-off task | No skill needed |
</decision_framework>
<success_criteria>
Guidance is complete when:
- [ ] User understands if they need a skill
- [ ] Structure is recommended and explained
- [ ] Workflows are identified
- [ ] References are identified
- [ ] Essential principles are identified
- [ ] User is ready to build (or decided not to)
</success_criteria>

@@ -1,161 +0,0 @@
# Workflow: Upgrade Skill to Router Pattern
<required_reading>
**Read these reference files NOW:**
1. references/recommended-structure.md
2. references/skill-structure.md
</required_reading>
<process>
## Step 1: Select the Skill
```bash
ls ~/.claude/skills/
```
Present numbered list, ask: "Which skill should be upgraded to the router pattern?"
## Step 2: Verify It Needs Upgrading
Read the skill:
```bash
cat ~/.claude/skills/{skill-name}/SKILL.md
ls ~/.claude/skills/{skill-name}/
```
**Already a router?** (has workflows/ and intake question)
→ Tell user it's already using router pattern, offer to add workflows instead
**Simple skill that should stay simple?** (under 200 lines, single workflow)
→ Explain that router pattern may be overkill, ask if they want to proceed anyway
**Good candidate for upgrade:**
- Over 200 lines
- Multiple distinct use cases
- Essential principles that shouldn't be skipped
- Growing complexity
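The first-pass triage above can be approximated mechanically before reading the skill in detail. This is a rough sketch, assuming the 200-line threshold and the `workflows/` plus `<intake>` signals described above; `classify_skill` and its output labels are hypothetical names chosen for illustration. It cannot judge "multiple distinct use cases" or "growing complexity", so treat the result as a hint.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough classification of a skill for Step 2.
# Usage: classify_skill path/to/skill-name
classify_skill() {
  local dir=${1%/}
  local file="$dir/SKILL.md"
  [ -f "$file" ] || { echo "missing"; return 1; }

  # A workflows/ directory plus an <intake> block suggests a router.
  if [ -d "$dir/workflows" ] && grep -q "<intake>" "$file"; then
    echo "already-router"
  # Over 200 lines is one of the upgrade signals listed above.
  elif [ $(( $(wc -l < "$file") )) -gt 200 ]; then
    echo "upgrade-candidate"
  else
    echo "probably-simple"
  fi
}
```

A `probably-simple` result still needs a human check: a short skill with several distinct use cases can be a good upgrade candidate.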
## Step 3: Identify Components
Analyze the current skill and identify:
1. **Essential principles** - Rules that apply to ALL use cases
2. **Distinct workflows** - Different things a user might want to do
3. **Reusable knowledge** - Patterns, examples, technical details
Present findings:
```
## Analysis
**Essential principles I found:**
- [Principle 1]
- [Principle 2]
**Distinct workflows I identified:**
- [Workflow A]: [description]
- [Workflow B]: [description]
**Knowledge that could be references:**
- [Reference topic 1]
- [Reference topic 2]
```
Ask: "Does this breakdown look right? Any adjustments?"
## Step 4: Create Directory Structure
```bash
mkdir -p ~/.claude/skills/{skill-name}/workflows
mkdir -p ~/.claude/skills/{skill-name}/references
```
## Step 5: Extract Workflows
For each identified workflow:
1. Create `workflows/{workflow-name}.md`
2. Add required_reading section (references it needs)
3. Add process section (steps from original skill)
4. Add success_criteria section
## Step 6: Extract References
For each identified reference topic:
1. Create `references/{reference-name}.md`
2. Move relevant content from original skill
3. Structure with semantic XML tags
## Step 7: Rewrite SKILL.md as Router
Replace SKILL.md with router structure:
```markdown
---
name: {skill-name}
description: {existing description}
---
<essential_principles>
[Extracted principles - inline, cannot be skipped]
</essential_principles>
<intake>
**Ask the user:**
What would you like to do?
1. [Workflow A option]
2. [Workflow B option]
...
**Wait for response before proceeding.**
</intake>
<routing>
| Response | Workflow |
|----------|----------|
| 1, "keywords" | `workflows/workflow-a.md` |
| 2, "keywords" | `workflows/workflow-b.md` |
</routing>
<reference_index>
[List all references by category]
</reference_index>
<workflows_index>
| Workflow | Purpose |
|----------|---------|
| workflow-a.md | [What it does] |
| workflow-b.md | [What it does] |
</workflows_index>
```
## Step 8: Verify Nothing Was Lost
Compare original skill content against new structure:
- [ ] All principles preserved (now inline)
- [ ] All procedures preserved (now in workflows)
- [ ] All knowledge preserved (now in references)
- [ ] No orphaned content
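One crude way to look for orphaned content is to compare a backup of the original `SKILL.md` against the new files line by line. The sketch below assumes a copy of the original was saved outside the skill directory before the rewrite; `find_orphaned_lines` is a hypothetical helper. It matches lines verbatim, so anything reworded during extraction is reported and has to be checked by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print non-blank lines of the original SKILL.md that
# no longer appear verbatim in any .md file under the upgraded skill.
# Usage: find_orphaned_lines original-backup.md path/to/skill-name
find_orphaned_lines() {
  local original=$1 new_dir=$2 line
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank or whitespace-only lines.
    case "$line" in
      *[![:space:]]*) ;;
      *) continue ;;
    esac
    # -F treats the line as a fixed string; -e protects lines starting with '-'.
    if ! grep -rqF --include='*.md' -e "$line" "$new_dir"; then
      printf '%s\n' "$line"
    fi
  done < "$original"
}
```

Empty output means every original line survives somewhere in the new structure. It does not show that the line landed in the right file.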
## Step 9: Test
Invoke the upgraded skill:
- Does intake question appear?
- Does each routing option work?
- Do workflows load correct references?
- Does behavior match original skill?
Report any issues.
</process>
<success_criteria>
Upgrade is complete when:
- [ ] workflows/ directory created with workflow files
- [ ] references/ directory created (if needed)
- [ ] SKILL.md rewritten as router
- [ ] Essential principles inline in SKILL.md
- [ ] All original content preserved
- [ ] Intake question routes correctly
- [ ] Tested and working
</success_criteria>

@@ -1,204 +0,0 @@
# Workflow: Verify Skill Content Accuracy
<required_reading>
**Read these reference files NOW:**
1. references/skill-structure.md
</required_reading>
<purpose>
Audit checks structure. **Verify checks truth.**
Skills contain claims about external things: APIs, CLI tools, frameworks, services. These change over time. This workflow checks if a skill's content is still accurate.
</purpose>
<process>
## Step 1: Select the Skill
```bash
ls ~/.claude/skills/
```
Present numbered list, ask: "Which skill should I verify for accuracy?"
## Step 2: Read and Categorize
Read the entire skill (SKILL.md + workflows/ + references/):
```bash
cat ~/.claude/skills/{skill-name}/SKILL.md
cat ~/.claude/skills/{skill-name}/workflows/*.md 2>/dev/null
cat ~/.claude/skills/{skill-name}/references/*.md 2>/dev/null
```
Categorize by primary dependency type:
| Type | Examples | Verification Method |
|------|----------|---------------------|
| **API/Service** | manage-stripe, manage-gohighlevel | Context7 + WebSearch |
| **CLI Tools** | build-macos-apps (xcodebuild, swift) | Run commands |
| **Framework** | build-iphone-apps (SwiftUI, UIKit) | Context7 for docs |
| **Integration** | setup-stripe-payments | WebFetch + Context7 |
| **Pure Process** | create-agent-skills | No external deps |
Report: "This skill is primarily [type]-based. I'll verify using [method]."
## Step 3: Extract Verifiable Claims
Scan skill content and extract:
**CLI Tools mentioned:**
- Tool names (xcodebuild, swift, npm, etc.)
- Specific flags/options documented
- Expected output patterns
**API Endpoints:**
- Service names (Stripe, Meta, etc.)
- Specific endpoints documented
- Authentication methods
- SDK versions
**Framework Patterns:**
- Framework names (SwiftUI, React, etc.)
- Specific APIs/patterns documented
- Version-specific features
**File Paths/Structures:**
- Expected project structures
- Config file locations
Present: "Found X verifiable claims to check."
## Step 4: Verify by Type
### For CLI Tools
```bash
# Check tool exists
which {tool-name}
# Check version
{tool-name} --version
# Verify documented flags work ("--" stops grep from parsing the flag as its own option)
{tool-name} --help 2>&1 | grep -F -- "{documented-flag}"
```
### For API/Service Skills
Use Context7 to fetch current documentation:
```
mcp__context7__resolve-library-id: {service-name}
mcp__context7__get-library-docs: {library-id}, topic: {relevant-topic}
```
Compare skill's documented patterns against current docs:
- Are endpoints still valid?
- Has authentication changed?
- Are there deprecated methods being used?
### For Framework Skills
Use Context7:
```
mcp__context7__resolve-library-id: {framework-name}
mcp__context7__get-library-docs: {library-id}, topic: {specific-api}
```
Check:
- Are documented APIs still current?
- Have patterns changed?
- Are there newer recommended approaches?
### For Integration Skills
WebSearch for recent changes:
```
"[service name] API changes 2026"
"[service name] breaking changes"
"[service name] deprecated endpoints"
```
Then Context7 for current SDK patterns.
### For Services with Status Pages
WebFetch the official status page, docs, or changelog if available.
## Step 5: Generate Freshness Report
Present findings:
```
## Verification Report: {skill-name}
### ✅ Verified Current
- [Claim]: [Evidence it's still accurate]
### ⚠️ May Be Outdated
- [Claim]: [What changed / newer info found]
→ Current: [what docs now say]
### ❌ Broken / Invalid
- [Claim]: [Why it's wrong]
→ Fix: [What it should be]
### Could Not Verify
- [Claim]: [Why verification wasn't possible]
---
**Overall Status:** [Fresh / Needs Updates / Significantly Stale]
**Last Verified:** [Today's date]
```
## Step 6: Offer Updates
If issues found:
"Found [N] items that need updating. Would you like me to:"
1. **Update all** - Apply all corrections
2. **Review each** - Show each change before applying
3. **Just the report** - No changes
If updating:
- Make changes based on verified current information
- Add verification date comment if appropriate
- Report what was updated
## Step 7: Suggest Verification Schedule
Based on skill type, recommend:
| Skill Type | Recommended Frequency |
|------------|----------------------|
| API/Service | Every 1-2 months |
| Framework | Every 3-6 months |
| CLI Tools | Every 6 months |
| Pure Process | Annually |
"This skill should be re-verified in approximately [timeframe]."
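The frequency table can be expressed as a small lookup for filling in the `[timeframe]` placeholder. This is an illustrative sketch only: `recheck_interval` and the type keys are hypothetical names, and "Annually" is written as 12 months. The table has no row for Integration skills, so that type is reported as unknown here.

```bash
#!/usr/bin/env bash
# Hypothetical lookup mirroring the recommended-frequency table.
# Usage: recheck_interval api-service|framework|cli-tools|pure-process
recheck_interval() {
  case "$1" in
    api-service)  echo "1-2 months" ;;
    framework)    echo "3-6 months" ;;
    cli-tools)    echo "6 months" ;;
    pure-process) echo "12 months" ;;
    *)            echo "unknown skill type: $1" >&2; return 1 ;;
  esac
}
```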
</process>
<verification_shortcuts>
## Quick Verification Commands
**Check if CLI tool exists and get version:**
```bash
which {tool} && {tool} --version
```
**Context7 pattern for any library:**
```
1. resolve-library-id: "{library-name}"
2. get-library-docs: "{id}", topic: "{specific-feature}"
```
**WebSearch patterns:**
- Breaking changes: "{service} breaking changes 2026"
- Deprecations: "{service} deprecated API"
- Current best practices: "{framework} best practices 2026"
</verification_shortcuts>
<success_criteria>
Verification is complete when:
- [ ] Skill categorized by dependency type
- [ ] Verifiable claims extracted
- [ ] Each claim checked with appropriate method
- [ ] Freshness report generated
- [ ] Updates applied (if requested)
- [ ] User knows when to re-verify
</success_criteria>

@@ -1,544 +1,409 @@
---
name: deepen-plan
description: Enhance a plan with parallel research agents for each section to add depth, best practices, and implementation details
description: "Stress-test an existing implementation plan and selectively strengthen weak sections with targeted research. Use when a plan needs more confidence around decisions, sequencing, system-wide impact, risks, or verification. Best for Standard or Deep plans, or high-risk topics such as auth, payments, migrations, external APIs, and security. For structural or clarity improvements, prefer document-review instead."
argument-hint: "[path to plan file]"
---
# Deepen Plan - Power Enhancement Mode
# Deepen Plan
## Introduction
**Note: The current year is 2026.** Use this when searching for recent documentation and best practices.
This command takes an existing plan (from `/ce:plan`) and enhances each section with parallel research agents. Each major element gets its own dedicated research sub-agent to find:
- Best practices and industry patterns
- Performance optimizations
- UI/UX improvements (if applicable)
- Quality enhancements and edge cases
- Real-world implementation examples
`ce:plan` does the first planning pass. `deepen-plan` is a second-pass confidence check.
The result is a deeply grounded, production-ready plan with concrete implementation details.
Use this skill when the plan already exists and the question is not "Is this document clear?" but rather "Is this plan grounded enough for the complexity and risk involved?"
This skill does **not** turn plans into implementation scripts. It identifies weak sections, runs targeted research only for those sections, and strengthens the plan in place.
`document-review` and `deepen-plan` are different:
- Use the `document-review` skill when the document needs clarity, simplification, completeness, or scope control
- Use `deepen-plan` when the document is structurally sound but still needs stronger rationale, sequencing, risk treatment, or system-wide thinking
## Interaction Method
Use the platform's question tool when available. When asking the user a question, prefer the platform's blocking question tool if one exists (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
Ask one question at a time. Prefer a concise single-select choice when natural options exist.
## Plan File
<plan_path> #$ARGUMENTS </plan_path>
**If the plan path above is empty:**
1. Check for recent plans: `ls -la docs/plans/`
2. Ask the user: "Which plan would you like to deepen? Please provide the path (e.g., `docs/plans/2026-01-15-feat-my-feature-plan.md`)."
If the plan path above is empty:
1. Check `docs/plans/` for recent files
2. Ask the user which plan to deepen using the platform's blocking question tool when available (see Interaction Method). Otherwise, present numbered options in chat and wait for the user's reply before proceeding
Do not proceed until you have a valid plan file path.
## Main Tasks
## Core Principles
1. **Stress-test, do not inflate** - Deepening should increase justified confidence, not make the plan longer for its own sake.
2. **Selective depth only** - Focus on the weakest 2-5 sections rather than enriching everything.
3. **Prefer the simplest execution mode** - Use direct agent synthesis by default. Switch to artifact-backed research only when the selected research scope is large enough that returning all findings inline would create avoidable context pressure.
4. **Preserve the planning boundary** - No implementation code, no git command choreography, no exact test command recipes.
5. **Use artifact-contained evidence** - Work from the written plan, its `Context & Research`, `Sources & References`, and its origin document when present.
6. **Respect product boundaries** - Do not invent new product requirements. If deepening reveals a product-level gap, surface it as an open question or route back to `ce:brainstorm`.
7. **Prioritize risk and cross-cutting impact** - The more dangerous or interconnected the work, the more valuable another planning pass becomes.
## Workflow
### Phase 0: Load the Plan and Decide Whether Deepening Is Warranted
#### 0.1 Read the Plan and Supporting Inputs
Read the plan file completely.
If the plan frontmatter includes an `origin:` path:
- Read the origin document too
- Use it to check whether the plan still reflects the product intent, scope boundaries, and success criteria
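Reading the `origin:` path can be done with a short frontmatter lookup. The sketch below assumes the plan starts with a `---` delimited block of simple `key: value` lines; `frontmatter_value` is a hypothetical helper, and it does not handle nested or multi-line YAML values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of a top-level frontmatter key.
# Usage: frontmatter_value path/to/plan.md origin
frontmatter_value() {
  local file=$1 key=$2
  awk -v key="$key" '
    NR == 1 && $0 != "---" { exit }               # no frontmatter block
    /^---[ \t]*$/ { if (++fences == 2) exit; next }
    fences == 1 && index($0, key ":") == 1 {
      value = substr($0, length(key) + 2)
      gsub(/^[ \t"]+|[ \t"]+$/, "", value)        # trim spaces and quotes
      print value
      exit
    }
  ' "$file"
}
```

The same helper can read an existing `deepened:` date. An empty result means the key is absent, in which case there is no origin document to load.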
#### 0.2 Classify Plan Depth and Topic Risk
Determine the plan depth from the document:
- **Lightweight** - small, bounded, low ambiguity, usually 2-4 implementation units
- **Standard** - moderate complexity, some technical decisions, usually 3-6 units
- **Deep** - cross-cutting, high-risk, or strategically important work, usually 4-8 units or phased delivery
Also build a risk profile. Treat these as high-risk signals:
- Authentication, authorization, or security-sensitive behavior
- Payments, billing, or financial flows
- Data migrations, backfills, or persistent data changes
- External APIs or third-party integrations
- Privacy, compliance, or user data handling
- Cross-interface parity or multi-surface behavior
- Significant rollout, monitoring, or operational concerns
#### 0.3 Decide Whether to Deepen
Use this default:
- **Lightweight** plans usually do not need deepening unless they are high-risk or the user explicitly requests it
- **Standard** plans often benefit when one or more important sections still look thin
- **Deep** or high-risk plans often benefit from a targeted second pass
If the plan already appears sufficiently grounded:
- Say so briefly
- Recommend moving to `/ce:work` or the `document-review` skill
- If the user explicitly asked to deepen anyway, continue with a light pass and deepen at most 1-2 sections
### Phase 1: Parse the Current `ce:plan` Structure
Map the plan into the current template. Look for these sections, or their nearest equivalents:
- `Overview`
- `Problem Frame`
- `Requirements Trace`
- `Scope Boundaries`
- `Context & Research`
- `Key Technical Decisions`
- `Open Questions`
- `High-Level Technical Design` (optional overview — pseudo-code, DSL grammar, mermaid diagram, or data flow)
- `Implementation Units` (may include per-unit `Technical design` subsections)
- `System-Wide Impact`
- `Risks & Dependencies`
- `Documentation / Operational Notes`
- `Sources & References`
- Optional deep-plan sections such as `Alternative Approaches Considered`, `Success Metrics`, `Phased Delivery`, `Risk Analysis & Mitigation`, and `Operational / Rollout Notes`
If the plan was written manually or uses different headings:
- Map sections by intent rather than exact heading names
- If a section is structurally present but titled differently, treat it as the equivalent section
- If the plan truly lacks a section, decide whether that absence is intentional for the plan depth or a confidence gap worth scoring
Also collect:
- Frontmatter, including existing `deepened:` date if present
- Number of implementation units
- Which files and test files are named
- Which learnings, patterns, or external references are cited
- Which sections appear omitted because they were unnecessary versus omitted because they are missing
### Phase 2: Score Confidence Gaps
Use a checklist-first, risk-weighted scoring pass.
For each section, compute:
- **Trigger count** - number of checklist problems that apply
- **Risk bonus** - add 1 if the topic is high-risk and this section is materially relevant to that risk
- **Critical-section bonus** - add 1 for `Key Technical Decisions`, `Implementation Units`, `System-Wide Impact`, `Risks & Dependencies`, or `Open Questions` in `Standard` or `Deep` plans
Treat a section as a candidate if:
- it hits **2+ total points**, or
- it hits **1+ point** in a high-risk domain and the section is materially important
Choose only the top **2-5** sections by score. If the user explicitly asked to deepen a lightweight plan, cap at **1-2** sections unless the topic is high-risk.
Example:
- A `Key Technical Decisions` section with 1 checklist trigger and the critical-section bonus scores **2 points** and is a candidate
- A `Risks & Dependencies` section with 1 checklist trigger in a high-risk migration plan also becomes a candidate because the risk bonus applies
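The scoring rules above amount to a few lines of arithmetic. The following sketch is one literal reading of them, with hypothetical function names and 0/1 flags supplied by whoever assessed the section. It folds "high-risk topic" and "materially relevant" into a single flag, which is a simplification. Under this reading, any section flagged as materially relevant to a high-risk topic already has 1 point from the risk bonus and therefore qualifies.

```bash
#!/usr/bin/env bash
# Illustrative sketch of the Phase 2 scoring rules (names are assumptions).

# score_section TRIGGERS RISK_RELEVANT CRITICAL_SECTION DEPTH
#   TRIGGERS          number of checklist problems that apply
#   RISK_RELEVANT     1 if the topic is high-risk and the section is relevant to it
#   CRITICAL_SECTION  1 for the five critical sections named above
#   DEPTH             lightweight | standard | deep
score_section() {
  local triggers=$1 risk_relevant=$2 critical=$3 depth=$4
  local score=$triggers

  # Risk bonus.
  if [ "$risk_relevant" -eq 1 ]; then
    score=$((score + 1))
  fi

  # Critical-section bonus applies only in Standard or Deep plans.
  if [ "$critical" -eq 1 ]; then
    case "$depth" in
      standard|deep) score=$((score + 1)) ;;
    esac
  fi

  echo "$score"
}

# is_candidate takes the same arguments and succeeds if the section qualifies:
# 2+ points, or 1+ point when the risk flag is set.
is_candidate() {
  local score
  score=$(score_section "$@")
  [ "$score" -ge 2 ] || { [ "$score" -ge 1 ] && [ "$2" -eq 1 ]; }
}
```

For the first example above, `score_section 1 0 1 standard` prints `2`, so `is_candidate 1 0 1 standard` succeeds. Ranking the candidates and keeping the top 2-5 is a separate step not shown here.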
If the plan already has a `deepened:` date:
- Prefer sections that have not yet been substantially strengthened, if their scores are comparable
- Revisit an already-deepened section only when it still scores clearly higher than alternatives or the user explicitly asks for another pass on it
#### 2.1 Section Checklists
Use these triggers.
**Requirements Trace**
- Requirements are vague or disconnected from implementation units
- Success criteria are missing or not reflected downstream
- Units do not clearly advance the traced requirements
- Origin requirements are not clearly carried forward
**Context & Research / Sources & References**
- Relevant repo patterns are named but never used in decisions or implementation units
- Cited learnings or references do not materially shape the plan
- High-risk work lacks appropriate external or internal grounding
- Research is generic instead of tied to this repo or this plan
**Key Technical Decisions**
- A decision is stated without rationale
- Rationale does not explain tradeoffs or rejected alternatives
- The decision does not connect back to scope, requirements, or origin context
- An obvious design fork exists but the plan never addresses why one path won
**Open Questions**
- Product blockers are hidden as assumptions
- Planning-owned questions are incorrectly deferred to implementation
- Resolved questions have no clear basis in repo context, research, or origin decisions
- Deferred items are too vague to be useful later
**High-Level Technical Design (when present)**
- The sketch uses the wrong medium for the work (e.g., pseudo-code where a sequence diagram would communicate better)
- The sketch contains implementation code (imports, exact signatures, framework-specific syntax) rather than pseudo-code
- The non-prescriptive framing is missing or weak
- The sketch does not connect to the key technical decisions or implementation units
**High-Level Technical Design (when absent)** *(Standard or Deep plans only)*
- The work involves DSL design, API surface design, multi-component integration, complex data flow, or state-heavy lifecycle
- Key technical decisions would be easier to validate with a visual or pseudo-code representation
- The approach section of implementation units is thin and a higher-level technical design would provide context
**Implementation Units**
- Dependency order is unclear or likely wrong
- File paths or test file paths are missing where they should be explicit
- Units are too large, too vague, or broken into micro-steps
- Approach notes are thin or do not name the pattern to follow
- Test scenarios or verification outcomes are vague
**System-Wide Impact**
- Affected interfaces, callbacks, middleware, entry points, or parity surfaces are missing
- Failure propagation is underexplored
- State lifecycle, caching, or data integrity risks are absent where relevant
- Integration coverage is weak for cross-layer work
**Risks & Dependencies / Documentation / Operational Notes**
- Risks are listed without mitigation
- Rollout, monitoring, migration, or support implications are missing when warranted
- External dependency assumptions are weak or unstated
- Security, privacy, performance, or data risks are absent where they obviously apply
Use the plan's own `Context & Research` and `Sources & References` as evidence. If those sections cite a pattern, learning, or risk that never affects decisions, implementation units, or verification, treat that as a confidence gap.
### 1. Parse and Analyze Plan Structure
### Phase 3: Select Targeted Research Agents
<thinking>
First, read and parse the plan to identify each major section that can be enhanced with research.
</thinking>
For each selected section, choose the smallest useful agent set. Do **not** run every agent. Use at most **1-3 agents per section** and usually no more than **8 agents total**.
**Read the plan file and extract:**
- [ ] Overview/Problem Statement
- [ ] Proposed Solution sections
- [ ] Technical Approach/Architecture
- [ ] Implementation phases/steps
- [ ] Code examples and file references
- [ ] Acceptance criteria
- [ ] Any UI/UX components mentioned
- [ ] Technologies/frameworks mentioned (Rails, React, Python, TypeScript, etc.)
- [ ] Domain areas (data models, APIs, UI, security, performance, etc.)
Use fully-qualified agent names inside Task calls.
**Create a section manifest:**
```
Section 1: [Title] - [Brief description of what to research]
Section 2: [Title] - [Brief description of what to research]
...
```
#### 3.1 Deterministic Section-to-Agent Mapping
### 2. Discover and Apply Available Skills
**Requirements Trace / Open Questions classification**
- `compound-engineering:workflow:spec-flow-analyzer` for missing user flows, edge cases, and handoff gaps
- `compound-engineering:research:repo-research-analyst` (Scope: `architecture, patterns`) for repo-grounded patterns, conventions, and implementation reality checks
<thinking>
Dynamically discover all available skills and match them to plan sections. Don't assume what skills exist - discover them at runtime.
</thinking>
**Context & Research / Sources & References gaps**
- `compound-engineering:research:learnings-researcher` for institutional knowledge and past solved problems
- `compound-engineering:research:framework-docs-researcher` for official framework or library behavior
- `compound-engineering:research:best-practices-researcher` for current external patterns and industry guidance
- Add `compound-engineering:research:git-history-analyzer` only when historical rationale or prior art is materially missing
**Step 1: Discover ALL available skills from ALL sources**
**Key Technical Decisions**
- `compound-engineering:review:architecture-strategist` for design integrity, boundaries, and architectural tradeoffs
- Add `compound-engineering:research:framework-docs-researcher` or `compound-engineering:research:best-practices-researcher` when the decision needs external grounding beyond repo evidence
**High-Level Technical Design**
- `compound-engineering:review:architecture-strategist` for validating that the technical design accurately represents the intended approach and identifying gaps
- `compound-engineering:research:repo-research-analyst` (Scope: `architecture, patterns`) for grounding the technical design in existing repo patterns and conventions
- Add `compound-engineering:research:best-practices-researcher` when the technical design involves a DSL, API surface, or pattern that benefits from external validation
```bash
# 1. Project-local skills (highest priority - project-specific)
ls .claude/skills/
**Implementation Units / Verification**
- `compound-engineering:research:repo-research-analyst` (Scope: `patterns`) for concrete file targets, patterns to follow, and repo-specific sequencing clues
- `compound-engineering:review:pattern-recognition-specialist` for consistency, duplication risks, and alignment with existing patterns
- Add `compound-engineering:workflow:spec-flow-analyzer` when sequencing depends on user flow or handoff completeness
# 2. User's global skills (~/.claude/)
ls ~/.claude/skills/
**System-Wide Impact**
- `compound-engineering:review:architecture-strategist` for cross-boundary effects, interface surfaces, and architectural knock-on impact
- Add the specific specialist that matches the risk:
- `compound-engineering:review:performance-oracle` for scalability, latency, throughput, and resource-risk analysis
- `compound-engineering:review:security-sentinel` for auth, validation, exploit surfaces, and security boundary review
- `compound-engineering:review:data-integrity-guardian` for migrations, persistent state safety, consistency, and data lifecycle risks
# 3. compound-engineering plugin skills
ls ~/.claude/plugins/cache/*/compound-engineering/*/skills/
**Risks & Dependencies / Operational Notes**
- Use the specialist that matches the actual risk:
- `compound-engineering:review:security-sentinel` for security, auth, privacy, and exploit risk
- `compound-engineering:review:data-integrity-guardian` for persistent data safety, constraints, and transaction boundaries
- `compound-engineering:review:data-migration-expert` for migration realism, backfills, and production data transformation risk
- `compound-engineering:review:deployment-verification-agent` for rollout checklists, rollback planning, and launch verification
- `compound-engineering:review:performance-oracle` for capacity, latency, and scaling concerns
#### 3.2 Agent Prompt Shape
For each selected section, pass:
- The scope prefix from section 3.1 (e.g., `Scope: architecture, patterns.`) when the agent supports scoped invocation
- A short plan summary
- The exact section text
- Why the section was selected, including which checklist triggers fired
- The plan depth and risk profile
- A specific question to answer
Instruct the agent to return:
- findings that change planning quality
- stronger rationale, sequencing, verification, risk treatment, or references
- no implementation code
- no shell commands
#### 3.3 Choose Research Execution Mode
Use the lightest mode that will work:
- **Direct mode** - Default. Use when the selected section set is small and the parent can safely read the agent outputs inline.
- **Artifact-backed mode** - Use only when the selected research scope is large enough that inline returns would create unnecessary context pressure.
Signals that justify artifact-backed mode:
- More than 5 agents are likely to return meaningful findings
- The selected section excerpts are long enough that repeating them in multiple agent outputs would be wasteful
- The topic is high-risk and likely to attract bulky source-backed analysis
- The platform has a history of parent-context instability on large parallel returns
If artifact-backed mode is not clearly warranted, stay in direct mode.
### Phase 4: Run Targeted Research and Review
Launch the selected agents in parallel using the execution mode chosen in Step 3.3. If the current platform does not support parallel dispatch, run them sequentially instead.
Prefer local repo and institutional evidence first. Use external research only when the gap cannot be closed responsibly from repo context or already-cited sources.
If a selected section can be improved by reading the origin document more carefully, do that before dispatching external agents.
# 4. ALL other installed plugins - check every plugin for skills
find ~/.claude/plugins/cache -type d -name "skills" 2>/dev/null
#### 4.1 Direct Mode
Have each selected agent return its findings directly to the parent.
# 5. Also check installed_plugins.json for all plugin locations
cat ~/.claude/plugins/installed_plugins.json
```
Keep the return payload focused:
- strongest findings only
- the evidence or sources that matter
- the concrete planning improvement implied by the finding
**Important:** Check EVERY source. Don't assume compound-engineering is the only plugin. Use skills from ANY installed plugin that's relevant.
If a direct-mode agent starts producing bulky or repetitive output, stop and switch the remaining research to artifact-backed mode instead of letting the parent context bloat.
**Step 2: For each discovered skill, read its SKILL.md to understand what it does**
#### 4.2 Artifact-Backed Mode
```bash
# For each skill directory found, read its documentation
cat [skill-path]/SKILL.md
```
Use a per-run scratch directory under `.context/compound-engineering/deepen-plan/`, for example `.context/compound-engineering/deepen-plan/<run-id>/` or `.context/compound-engineering/deepen-plan/<plan-filename-stem>/`.
Use the scratch directory only for the current deepening pass.
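As an illustration of the naming scheme above, here is a minimal Python sketch that derives the scratch directory from the plan filename stem. The helper name `scratch_dir_for` is hypothetical and not part of the plugin; only the base path comes from the text.

```python
from pathlib import Path

# Base path taken from the text above; everything else here is illustrative.
SCRATCH_ROOT = Path(".context/compound-engineering/deepen-plan")


def scratch_dir_for(plan_path: str, root: Path = SCRATCH_ROOT) -> Path:
    """Return the per-run scratch directory keyed by the plan's filename stem."""
    return root / Path(plan_path).stem
```

Keying on the filename stem keeps repeated passes over the same plan in one predictable location; a run id would work equally well when several passes may overlap.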
For each selected agent:
- give it the same plan summary, section text, trigger rationale, depth, and risk profile described in Step 3.2
- instruct it to write one compact artifact file for its assigned section or sections
- have it return only a short completion summary to the parent
Prefer a compact markdown artifact unless machine-readable structure is clearly useful. Each artifact should contain:
- target section id and title
- why the section was selected
- 3-7 findings that materially improve planning quality
- source-backed rationale, including whether the evidence came from repo context, origin context, institutional learnings, official docs, or external best practices
- the specific plan change implied by each finding
- any unresolved tradeoff that should remain explicit in the plan
Artifact rules:
- no implementation code
- no shell commands
- no checkpoint logs or self-diagnostics
- no duplicated boilerplate across files
- no judge or merge sub-pipeline
Before synthesis:
- quickly verify that each selected section has at least one usable artifact
- if an artifact is missing or clearly malformed, re-run that agent or fall back to direct-mode reasoning for that section instead of building a validation pipeline
If agent outputs conflict:
- Prefer repo-grounded and origin-grounded evidence over generic advice
- Prefer official framework documentation over secondary best-practice summaries when the conflict is about library behavior
- If a real tradeoff remains, record it explicitly in the plan rather than pretending the conflict does not exist
### Phase 5: Synthesize and Rewrite the Plan
Strengthen only the selected sections. Keep the plan coherent and preserve its overall structure.
If artifact-backed mode was used:
- read the plan, origin document if present, and the selected section artifacts
- also incorporate any findings already returned inline from direct-mode agents before a mid-run switch, so early results are not silently dropped
- synthesize in one pass
- do not create a separate judge, merge, or quality-review phase unless the user explicitly asks for another pass
**Step 3: Match skills to plan content**
Allowed changes:
- Clarify or strengthen decision rationale
- Tighten requirements trace or origin fidelity
- Reorder or split implementation units when sequencing is weak
- Add missing pattern references, file/test paths, or verification outcomes
- Expand system-wide impact, risks, or rollout treatment where justified
- Reclassify open questions between `Resolved During Planning` and `Deferred to Implementation` when evidence supports the change
- Strengthen, replace, or add a High-Level Technical Design section when the work warrants it and the current representation is weak, uses the wrong medium, or is absent where it would help. Preserve the non-prescriptive framing
- Strengthen or add per-unit technical design fields where the unit's approach is non-obvious and the current approach notes are thin
- Add an optional deep-plan section only when it materially improves execution quality
- Add or update `deepened: YYYY-MM-DD` in frontmatter when the plan was substantively improved
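The frontmatter update in the last item can be sketched as follows. This is a rough Python illustration that assumes the plan starts with a `---`-delimited YAML frontmatter block; `set_deepened` is a hypothetical helper, and a real implementation might prefer a YAML parser.

```python
import re
from datetime import date


def set_deepened(text: str, day: date) -> str:
    """Add or update the `deepened` key in a leading YAML frontmatter block."""
    match = re.match(r"---\n(.*?\n)---\n", text, flags=re.DOTALL)
    if match is None:
        return text  # no frontmatter found: leave the document untouched
    body = match.group(1)
    line = f"deepened: {day.isoformat()}\n"
    if re.search(r"^deepened:.*\n", body, flags=re.MULTILINE):
        # Key already present: overwrite its value in place
        body = re.sub(r"^deepened:.*\n", line, body, flags=re.MULTILINE)
    else:
        # Key absent: append it as the last frontmatter entry
        body += line
    return f"---\n{body}---\n" + text[match.end():]
```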
For each skill discovered:
- Read its SKILL.md description
- Check if any plan sections match the skill's domain
- If there's a match, spawn a sub-agent to apply that skill's knowledge
Do **not**:
- Add implementation code — no imports, exact method signatures, or framework-specific syntax. Pseudo-code sketches and DSL grammars are allowed in both the top-level High-Level Technical Design section and per-unit technical design fields
- Add git commands, commit choreography, or exact test command recipes
- Add generic `Research Insights` subsections everywhere
- Rewrite the entire plan from scratch
- Invent new product requirements, scope changes, or success criteria without surfacing them explicitly
**Step 4: Spawn a sub-agent for EVERY matched skill**
If research reveals a product-level ambiguity that should change behavior or scope:
- Do not silently decide it here
- Record it under `Open Questions`
- Recommend `ce:brainstorm` if the gap is truly product-defining
**CRITICAL: For EACH skill that matches, spawn a separate sub-agent and instruct it to USE that skill.**
### Phase 6: Final Checks and Write the File
For each matched skill:
```
Task general-purpose: "You have the [skill-name] skill available at [skill-path].
Before writing:
- Confirm the plan is stronger in specific ways, not merely longer
- Confirm the planning boundary is intact
- Confirm the selected sections were actually the weakest ones
- Confirm origin decisions were preserved when an origin document exists
- Confirm the final plan still feels right-sized for its depth
- If artifact-backed mode was used, confirm the scratch artifacts did not become a second hidden plan format
YOUR JOB: Use this skill on the plan.
Update the plan file in place by default.
1. Read the skill: cat [skill-path]/SKILL.md
2. Follow the skill's instructions exactly
3. Apply the skill to this content:
[relevant plan section or full plan]
4. Return the skill's full output
The skill tells you what to do - follow it. Execute the skill completely."
```
**Spawn ALL skill sub-agents in PARALLEL:**
- 1 sub-agent per matched skill
- Each sub-agent reads and uses its assigned skill
- All run simultaneously
- Running 10, 20, or 30 skill sub-agents is fine
**Each sub-agent:**
1. Reads its skill's SKILL.md
2. Follows the skill's workflow/instructions
3. Applies the skill to the plan
4. Returns whatever the skill produces (code, recommendations, patterns, reviews, etc.)
**Example spawns:**
```
Task general-purpose: "Use the dhh-rails-style skill at ~/.claude/plugins/.../dhh-rails-style. Read SKILL.md and apply it to: [Rails sections of plan]"
Task general-purpose: "Use the frontend-design skill at ~/.claude/plugins/.../frontend-design. Read SKILL.md and apply it to: [UI sections of plan]"
Task general-purpose: "Use the agent-native-architecture skill at ~/.claude/plugins/.../agent-native-architecture. Read SKILL.md and apply it to: [agent/tool sections of plan]"
Task general-purpose: "Use the security-patterns skill at ~/.claude/skills/security-patterns. Read SKILL.md and apply it to: [full plan]"
```
**No limit on skill sub-agents. Spawn one for every skill that could possibly be relevant.**
### 3. Discover and Apply Learnings/Solutions
<thinking>
Check for documented learnings from /ce:compound. These are solved problems stored as markdown files. Spawn a sub-agent for each learning to check if it's relevant.
</thinking>
**LEARNINGS LOCATION - Check these exact folders:**
```
docs/solutions/ <-- PRIMARY: Project-level learnings (created by /ce:compound)
├── performance-issues/
│ └── *.md
├── debugging-patterns/
│ └── *.md
├── configuration-fixes/
│ └── *.md
├── integration-issues/
│ └── *.md
├── deployment-issues/
│ └── *.md
└── [other-categories]/
└── *.md
```
**Step 1: Find ALL learning markdown files**
Run these commands to get every learning file:
```bash
# PRIMARY LOCATION - Project learnings
find docs/solutions -name "*.md" -type f 2>/dev/null
# If docs/solutions doesn't exist, check alternate locations:
find .claude/docs -name "*.md" -type f 2>/dev/null
find ~/.claude/docs -name "*.md" -type f 2>/dev/null
```
**Step 2: Read frontmatter of each learning to filter**
Each learning file has YAML frontmatter with metadata. Read the first ~20 lines of each file to get:
```yaml
---
title: "N+1 Query Fix for Briefs"
category: performance-issues
tags: [activerecord, n-plus-one, includes, eager-loading]
module: Briefs
symptom: "Slow page load, multiple queries in logs"
root_cause: "Missing includes on association"
---
```
**For each .md file, quickly scan its frontmatter:**
```bash
# Read first 20 lines of each learning (frontmatter + summary)
head -20 docs/solutions/**/*.md
```
**Step 3: Filter - only spawn sub-agents for LIKELY relevant learnings**
Compare each learning's frontmatter against the plan:
- `tags:` - Do any tags match technologies/patterns in the plan?
- `category:` - Is this category relevant? (e.g., skip deployment-issues if plan is UI-only)
- `module:` - Does the plan touch this module?
- `symptom:` / `root_cause:` - Could this problem occur with the plan?
**SKIP learnings that are clearly not applicable:**
- Plan is frontend-only → skip `database-migrations/` learnings
- Plan is Python → skip `rails-specific/` learnings
- Plan has no auth → skip `authentication-issues/` learnings
**SPAWN sub-agents for learnings that MIGHT apply:**
- Any tag overlap with plan technologies
- Same category as plan domain
- Similar patterns or concerns
**Step 4: Spawn sub-agents for filtered learnings**
For each learning that passes the filter:
```
Task general-purpose: "
LEARNING FILE: [full path to .md file]
1. Read this learning file completely
2. This learning documents a previously solved problem
Check if this learning applies to this plan:
---
[full plan content]
---
If relevant:
- Explain specifically how it applies
- Quote the key insight or solution
- Suggest where/how to incorporate it
If NOT relevant after deeper analysis:
- Say 'Not applicable: [reason]'
"
```
**Example filtering:**
```
# Found 15 learning files, plan is about "Rails API caching"
# SPAWN (likely relevant):
docs/solutions/performance-issues/n-plus-one-queries.md # tags: [activerecord] ✓
docs/solutions/performance-issues/redis-cache-stampede.md # tags: [caching, redis] ✓
docs/solutions/configuration-fixes/redis-connection-pool.md # tags: [redis] ✓
# SKIP (clearly not applicable):
docs/solutions/deployment-issues/heroku-memory-quota.md # not about caching
docs/solutions/frontend-issues/stimulus-race-condition.md # plan is API, not frontend
docs/solutions/authentication-issues/jwt-expiry.md # plan has no auth
```
**Spawn sub-agents in PARALLEL for all filtered learnings.**
**These learnings are institutional knowledge - applying them prevents repeating past mistakes.**
### 4. Launch Per-Section Research Agents
<thinking>
For each major section in the plan, spawn dedicated sub-agents to research improvements. Use the Explore agent type for open-ended research.
</thinking>
**For each identified section, launch parallel research:**
```
Task Explore: "Research best practices, patterns, and real-world examples for: [section topic].
Find:
- Industry standards and conventions
- Performance considerations
- Common pitfalls and how to avoid them
- Documentation and tutorials
Return concrete, actionable recommendations."
```
**Also use Context7 MCP for framework documentation:**
For any technologies/frameworks mentioned in the plan, query Context7:
```
mcp__plugin_compound-engineering_context7__resolve-library-id: Find library ID for [framework]
mcp__plugin_compound-engineering_context7__query-docs: Query documentation for specific patterns
```
**Use WebSearch for current best practices:**
Search for recent (2024-2026) articles, blog posts, and documentation on topics in the plan.
### 5. Discover and Run ALL Review Agents
<thinking>
Dynamically discover every available agent and run them ALL against the plan. Don't filter, don't skip, don't assume relevance. 40+ parallel agents is fine. Use everything available.
</thinking>
**Step 1: Discover ALL available agents from ALL sources**
```bash
# 1. Project-local agents (highest priority - project-specific)
find .claude/agents -name "*.md" 2>/dev/null
# 2. User's global agents (~/.claude/)
find ~/.claude/agents -name "*.md" 2>/dev/null
# 3. compound-engineering plugin agents (all subdirectories)
find ~/.claude/plugins/cache/*/compound-engineering/*/agents -name "*.md" 2>/dev/null
# 4. ALL other installed plugins - check every plugin for agents
find ~/.claude/plugins/cache -path "*/agents/*.md" 2>/dev/null
# 5. Check installed_plugins.json to find all plugin locations
cat ~/.claude/plugins/installed_plugins.json
# 6. For local plugins (isLocal: true), check their source directories
# Parse installed_plugins.json and find local plugin paths
```
**Important:** Check EVERY source. Include agents from:
- Project `.claude/agents/`
- User's `~/.claude/agents/`
- compound-engineering plugin (but SKIP workflow/ agents - only use review/, research/, design/, docs/)
- ALL other installed plugins (agent-sdk-dev, frontend-design, etc.)
- Any local plugins
**For compound-engineering plugin specifically:**
- USE: `agents/review/*` (all reviewers)
- USE: `agents/research/*` (all researchers)
- USE: `agents/design/*` (design agents)
- USE: `agents/docs/*` (documentation agents)
- SKIP: `agents/workflow/*` (these are workflow orchestrators, not reviewers)
**Step 2: For each discovered agent, read its description**
Read the first few lines of each agent file to understand what it reviews/analyzes.
**Step 3: Launch ALL agents in parallel**
For EVERY agent discovered, launch a Task in parallel:
```
Task [agent-name]: "Review this plan using your expertise. Apply all your checks and patterns. Plan content: [full plan content]"
```
**CRITICAL RULES:**
- Do NOT filter agents by "relevance" - run them ALL
- Do NOT skip agents because they "might not apply" - let them decide
- Launch ALL agents in a SINGLE message with multiple Task tool calls
- 20, 30, 40 parallel agents is fine - use everything
- Each agent may catch something others miss
- The goal is MAXIMUM coverage, not efficiency
**Step 4: Also discover and run research agents**
Research agents (like `best-practices-researcher`, `framework-docs-researcher`, `git-history-analyzer`, `repo-research-analyst`) should also be run for relevant plan sections.
### 6. Wait for ALL Agents and Synthesize Everything
<thinking>
Wait for ALL parallel agents to complete - skills, research agents, review agents, everything. Then synthesize all findings into a comprehensive enhancement.
</thinking>
**Collect outputs from ALL sources:**
1. **Skill-based sub-agents** - Each skill's full output (code examples, patterns, recommendations)
2. **Learnings/Solutions sub-agents** - Relevant documented learnings from /ce:compound
3. **Research agents** - Best practices, documentation, real-world examples
4. **Review agents** - All feedback from every reviewer (architecture, security, performance, simplicity, etc.)
5. **Context7 queries** - Framework documentation and patterns
6. **Web searches** - Current best practices and articles
**For each agent's findings, extract:**
- [ ] Concrete recommendations (actionable items)
- [ ] Code patterns and examples (copy-paste ready)
- [ ] Anti-patterns to avoid (warnings)
- [ ] Performance considerations (metrics, benchmarks)
- [ ] Security considerations (vulnerabilities, mitigations)
- [ ] Edge cases discovered (handling strategies)
- [ ] Documentation links (references)
- [ ] Skill-specific patterns (from matched skills)
- [ ] Relevant learnings (past solutions that apply - prevent repeating mistakes)
**Deduplicate and prioritize:**
- Merge similar recommendations from multiple agents
- Prioritize by impact (high-value improvements first)
- Flag conflicting advice for human review
- Group by plan section
### 7. Enhance Plan Sections
<thinking>
Merge research findings back into the plan, adding depth without changing the original structure.
</thinking>
**Enhancement format for each section:**
```markdown
## [Original Section Title]
[Original content preserved]
### Research Insights
**Best Practices:**
- [Concrete recommendation 1]
- [Concrete recommendation 2]
**Performance Considerations:**
- [Optimization opportunity]
- [Benchmark or metric to target]
**Implementation Details:**
```[language]
// Concrete code example from research
```
**Edge Cases:**
- [Edge case 1 and how to handle]
- [Edge case 2 and how to handle]
**References:**
- [Documentation URL 1]
- [Documentation URL 2]
```
### 8. Add Enhancement Summary
At the top of the plan, add a summary section:
```markdown
## Enhancement Summary
**Deepened on:** [Date]
**Sections enhanced:** [Count]
**Research agents used:** [List]
### Key Improvements
1. [Major improvement 1]
2. [Major improvement 2]
3. [Major improvement 3]
### New Considerations Discovered
- [Important finding 1]
- [Important finding 2]
```
### 9. Update Plan File
**Write the enhanced plan:**
- Preserve original filename
- Add a `-deepened` suffix if the user prefers a new file
- Update any timestamps or metadata
## Output Format
Update the plan file in place (or, if the user requests a separate file, append `-deepened` after `-plan`, e.g., `2026-01-15-feat-auth-plan-deepened.md`).
## Quality Checks
Before finalizing:
- [ ] All original content preserved
- [ ] Research insights clearly marked and attributed
- [ ] Code examples are syntactically correct
- [ ] Links are valid and relevant
- [ ] No contradictions between sections
- [ ] Enhancement summary accurately reflects changes
If the user explicitly requests a separate file, append `-deepened` before `.md`, for example:
- `docs/plans/2026-03-15-001-feat-example-plan-deepened.md`
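The naming rule can be expressed as a small path transformation. The sketch below uses Python's `pathlib`; `deepened_path` is a hypothetical helper name introduced only for illustration.

```python
from pathlib import Path


def deepened_path(plan_path: str) -> Path:
    """Insert '-deepened' immediately before the file extension."""
    path = Path(plan_path)
    return path.with_name(f"{path.stem}-deepened{path.suffix}")
```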
If artifact-backed mode was used and the user did not ask to inspect the scratch files:
- clean up the temporary scratch directory after the plan is safely written
- if cleanup is not practical on the current platform, say where the artifacts were left and that they are temporary workflow output
## Post-Enhancement Options
After writing the enhanced plan, use the **AskUserQuestion tool** to present these options:
If substantive changes were made, present next steps using the platform's blocking question tool when available (see Interaction Method). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
**Question:** "Plan deepened at `[plan_path]`. What would you like to do next?"
**Options:**
1. **View diff** - Show what was added/changed
2. **Start `/ce:work`** - Begin implementing this enhanced plan
3. **Deepen further** - Run another round of research on specific sections
4. **Revert** - Restore original plan (if backup exists)
1. **View diff** - Show what changed
2. **Run `document-review` skill** - Improve the updated plan through structured document review
3. **Start `ce:work` skill** - Begin implementing the plan
4. **Deepen specific sections further** - Run another targeted deepening pass on named sections
Based on selection:
- **View diff** → Run `git diff [plan_path]` or show before/after
- **`/ce:work`** → Call the /ce:work command with the plan file path
- **Deepen further** → Ask which sections need more research, then re-run those agents
- **Revert** → Restore from git or backup
- **View diff** -> Show the important additions and changed sections
- **`document-review` skill** -> Load the `document-review` skill with the plan path
- **Start `ce:work` skill** -> Call the `ce:work` skill with the plan path
- **Deepen specific sections further** -> Ask which sections still feel weak and run another targeted pass only for those sections
## Example Enhancement
If no substantive changes were warranted:
- Say that the plan already appears sufficiently grounded
- Offer the `document-review` skill or the `ce:work` skill as the next step instead
**Before (from /workflows:plan):**
```markdown
## Technical Approach
Use React Query for data fetching with optimistic updates.
```
**After (from /workflows:deepen-plan):**
```markdown
## Technical Approach
Use React Query for data fetching with optimistic updates.
### Research Insights
**Best Practices:**
- Configure `staleTime` and `cacheTime` (renamed `gcTime` in TanStack Query v5) based on data freshness requirements
- Use `queryKey` factories for consistent cache invalidation
- Implement error boundaries around query-dependent components
**Performance Considerations:**
- Enable `refetchOnWindowFocus: false` for stable data to reduce unnecessary requests
- Use `select` option to transform and memoize data at query level
- Consider `placeholderData` for instant perceived loading
**Implementation Details:**
```typescript
// Recommended query configuration
const queryClient = new QueryClient({
defaultOptions: {
queries: {
staleTime: 5 * 60 * 1000, // 5 minutes
retry: 2,
refetchOnWindowFocus: false,
},
},
});
```
**Edge Cases:**
- Handle race conditions with `cancelQueries` on component unmount
- Implement retry logic for transient network failures
- Consider offline support with `persistQueryClient`
**References:**
- https://tanstack.com/query/latest/docs/react/guides/optimistic-updates
- https://tkdodo.eu/blog/practical-react-query
```
NEVER CODE! Just research and enhance the plan.
NEVER CODE! Research, challenge, and strengthen the plan.

View File

@@ -1,83 +1,191 @@
---
name: document-review
description: This skill should be used to refine brainstorm or plan documents before proceeding to the next workflow step. It applies when a brainstorm or plan document exists and the user wants to improve it.
description: Review requirements or plan documents using parallel persona agents that surface role-specific issues. Use when a requirements document or plan document exists and the user wants to improve it.
---
# Document Review
Improve brainstorm or plan documents through structured review.
Review requirements or plan documents through multi-persona analysis. Dispatches specialized reviewer agents in parallel, auto-fixes quality issues, and presents strategic questions for user decision.
## Step 1: Get the Document
## Phase 1: Get and Analyze Document
**If a document path is provided:** Read it, then proceed to Step 2.
**If a document path is provided:** Read it, then proceed.
**If no document is specified:** Ask which document to review, or look for the most recent brainstorm/plan in `docs/brainstorms/` or `docs/plans/`.
**If no document is specified:** Ask which document to review, or find the most recent in `docs/brainstorms/` or `docs/plans/` using a file-search/glob tool (e.g., Glob in Claude Code).
## Step 2: Assess
### Classify Document Type
Read through the document and ask:
After reading, classify the document:
- **requirements** -- from `docs/brainstorms/`, focuses on what to build and why
- **plan** -- from `docs/plans/`, focuses on how to build it with implementation details
- What is unclear?
- What is unnecessary?
- What decision is being avoided?
- What assumptions are unstated?
- Where could scope accidentally expand?
### Select Conditional Personas
These questions surface issues. Don't fix yet—just note what you find.
Analyze the document content to determine which conditional personas to activate. Check for these signals:
## Step 3: Evaluate
**product-lens** -- activate when the document contains:
- User-facing features, user stories, or customer-focused language
- Market claims, competitive positioning, or business justification
- Scope decisions, prioritization language, or priority tiers with feature assignments
- Requirements with user/customer/business outcome focus
Score the document against these criteria:
**design-lens** -- activate when the document contains:
- UI/UX references, frontend components, or visual design language
- User flows, wireframes, screen/page/view mentions
- Interaction descriptions (forms, buttons, navigation, modals)
- References to responsive behavior or accessibility
| Criterion | What to Check |
|-----------|---------------|
| **Clarity** | Problem statement is clear, no vague language ("probably," "consider," "try to") |
| **Completeness** | Required sections present, constraints stated, open questions flagged |
| **Specificity** | Concrete enough for next step (brainstorm → can plan, plan → can implement) |
| **YAGNI** | No hypothetical features, simplest approach chosen |
**security-lens** -- activate when the document contains:
- Auth/authorization mentions, login flows, session management
- API endpoints exposed to external clients
- Data handling, PII, payments, tokens, credentials, encryption
- Third-party integrations with trust boundary implications
If invoked within a workflow (after `/ce:brainstorm` or `/ce:plan`), also check:
- **User intent fidelity** — Document reflects what was discussed, assumptions validated
**scope-guardian** -- activate when the document contains:
- Multiple priority tiers (P0/P1/P2, must-have/should-have/nice-to-have)
- Large requirement count (>8 distinct requirements or implementation units)
- Stretch goals, nice-to-haves, or "future work" sections
- Scope boundary language that seems misaligned with stated goals
- Goals that don't clearly connect to requirements
## Step 4: Identify the Critical Improvement
## Phase 2: Announce and Dispatch Personas
Among everything found in Steps 2-3, does one issue stand out? If something would significantly improve the document's quality, this is the "must address" item. Highlight it prominently.
### Announce the Review Team
## Step 5: Make Changes
Tell the user which personas will review and why. For conditional personas, include the justification:
Present your findings, then:
```
Reviewing with:
- coherence-reviewer (always-on)
- feasibility-reviewer (always-on)
- scope-guardian-reviewer -- plan has 12 requirements across 3 priority levels
- security-lens-reviewer -- plan adds API endpoints with auth flow
```
1. **Auto-fix** minor issues (vague language, formatting) without asking
2. **Ask approval** before substantive changes (restructuring, removing sections, changing meaning)
3. **Update** the document inline—no separate files, no metadata sections
### Build Agent List
### Simplification Guidance
Always include:
- `compound-engineering:document-review:coherence-reviewer`
- `compound-engineering:document-review:feasibility-reviewer`
Simplification is purposeful removal of unnecessary complexity, not shortening for its own sake.
Add activated conditional personas:
- `compound-engineering:document-review:product-lens-reviewer`
- `compound-engineering:document-review:design-lens-reviewer`
- `compound-engineering:document-review:security-lens-reviewer`
- `compound-engineering:document-review:scope-guardian-reviewer`
**Simplify when:**
- Content serves hypothetical future needs, not current ones
- Sections repeat information already covered elsewhere
- Detail exceeds what's needed to take the next step
- Abstractions or structure add overhead without clarity
### Dispatch
**Don't simplify:**
- Constraints or edge cases that affect implementation
- Rationale that explains why alternatives were rejected
- Open questions that need resolution
Dispatch all agents in **parallel** using the platform's task/agent tool (e.g., Agent tool in Claude Code, spawn in Codex). Each agent receives the prompt built from the [subagent template](./references/subagent-template.md) with these variables filled:
## Step 6: Offer Next Action
| Variable | Value |
|----------|-------|
| `{persona_file}` | Full content of the agent's markdown file |
| `{schema}` | Content of [findings-schema.json](./references/findings-schema.json) |
| `{document_type}` | "requirements" or "plan" from Phase 1 classification |
| `{document_path}` | Path to the document |
| `{document_content}` | Full text of the document |
After changes are complete, ask:
Pass each agent the **full document** -- do not split into sections.
1. **Refine again** - Another review pass
2. **Review complete** - Document is ready
**Error handling:** If an agent fails or times out, proceed with findings from agents that completed. Note the failed agent in the Coverage section. Do not block the entire review on a single agent failure.
### Iteration Guidance
**Dispatch limit:** Even at maximum (6 agents), use parallel dispatch. These are document reviewers with bounded scope reading a single document -- parallel is safe and fast.
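The dispatch and error-handling rules above can be sketched as follows. This is a minimal illustration, not the orchestrator's actual mechanism: `run_agent` is a hypothetical callable standing in for the platform's task/agent tool, and `dispatch_all` is an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def dispatch_all(personas, run_agent):
    """Run every persona in parallel; collect results and note failures.

    `run_agent` is a hypothetical stand-in for the platform's agent tool.
    It takes a persona name and returns that agent's parsed JSON output.
    """
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=max(1, len(personas))) as pool:
        futures = {pool.submit(run_agent, name): name for name in personas}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                # One failed agent must not block the review; record it
                # so it can be listed in the Coverage section.
                failed.append(name)
    return results, failed
```

The review continues with whatever `results` contains, and `failed` feeds the Coverage table.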
After 2 refinement passes, recommend completion—diminishing returns are likely. But if the user wants to continue, allow it.
## Phase 3: Synthesize Findings
Return control to the caller (workflow or user) after selection.
Process findings from all agents through this pipeline. **Order matters** -- each step depends on the previous.
### 3.1 Validate
Check each agent's returned JSON against [findings-schema.json](./references/findings-schema.json):
- Drop findings missing any required field defined in the schema
- Drop findings with invalid enum values
- Note the agent name for any malformed output in the Coverage section
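A minimal sketch of this validation step, assuming stdlib-only checks rather than a full JSON Schema validator. The required fields and enum values are copied from findings-schema.json; `validate_output` and the constant names are hypothetical.

```python
REQUIRED_FIELDS = (
    "title", "severity", "section", "why_it_matters",
    "autofix_class", "confidence", "evidence",
)
SEVERITIES = {"P0", "P1", "P2", "P3"}
AUTOFIX_CLASSES = {"auto", "present"}


def validate_output(agent_output):
    """Return (valid_findings, dropped_count) for one agent's parsed JSON."""
    valid, dropped = [], 0
    for finding in agent_output.get("findings", []):
        missing = [name for name in REQUIRED_FIELDS if name not in finding]
        bad_enum = (
            finding.get("severity") not in SEVERITIES
            or finding.get("autofix_class") not in AUTOFIX_CLASSES
        )
        if missing or bad_enum:
            dropped += 1  # malformed: drop and report in Coverage
            continue
        valid.append(finding)
    return valid, dropped
```

A non-zero `dropped` count is the signal to note that agent's name in the Coverage section.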
### 3.2 Confidence Gate
Suppress findings below 0.50 confidence. Store them as residual concerns for potential promotion in step 3.4.
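As a sketch, the gate is a simple partition on the 0.50 threshold. The function name is illustrative; a finding at exactly 0.50 is kept, since only findings below the threshold are suppressed.

```python
CONFIDENCE_FLOOR = 0.50


def apply_confidence_gate(findings, floor=CONFIDENCE_FLOOR):
    """Split findings into (kept, residual) around the confidence floor."""
    kept = [f for f in findings if f["confidence"] >= floor]
    residual = [f for f in findings if f["confidence"] < floor]
    return kept, residual
```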
### 3.3 Deduplicate
Fingerprint each finding using `normalize(section) + normalize(title)`. Normalization: lowercase, strip punctuation, collapse whitespace.
When fingerprints match across personas:
- If the findings recommend **opposing actions** (e.g., one says cut, the other says keep), do not merge -- preserve both for contradiction resolution in 3.5
- Otherwise merge: keep the highest severity, keep the highest confidence, union all evidence arrays, note all agreeing reviewers (e.g., "coherence, feasibility")
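The fingerprint and merge rules can be sketched as below, under three labeled assumptions:

- Punctuation is removed, not replaced with a space.
- A `|` separator is placed between the two normalized parts.
- The orchestrator has tagged each finding with a `reviewer` key, because the schema carries the reviewer at the top level only.

Detecting opposing actions is a judgment call and is not modeled here.

```python
import re
import string

SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}
_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lowercase, strip punctuation, collapse whitespace."""
    stripped = text.lower().translate(_PUNCTUATION)
    return re.sub(r"\s+", " ", stripped).strip()


def fingerprint(finding):
    # The "|" separator is an assumption added to keep the parts distinct.
    return normalize(finding["section"]) + "|" + normalize(finding["title"])


def merge_group(group):
    """Merge non-conflicting findings that share a fingerprint."""
    merged = dict(group[0])
    merged["severity"] = min((f["severity"] for f in group), key=SEVERITY_RANK.get)
    merged["confidence"] = max(f["confidence"] for f in group)
    evidence = []
    for finding in group:
        for quote in finding["evidence"]:
            if quote not in evidence:  # union, preserving first-seen order
                evidence.append(quote)
    merged["evidence"] = evidence
    # Assumes a per-finding "reviewer" tag added by the orchestrator.
    merged["reviewers"] = [f["reviewer"] for f in group]
    return merged
```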
### 3.4 Promote Residual Concerns
Scan the residual concerns (findings suppressed in 3.2) for:
- **Cross-persona corroboration**: A residual concern from Persona A overlaps with an above-threshold finding from Persona B. Promote at P2 with confidence 0.55-0.65.
- **Concrete blocking risks**: A residual concern describes a specific, concrete risk that would block implementation. Promote at P2 with confidence 0.55.
### 3.5 Resolve Contradictions
When personas disagree on the same section:
- Create a **combined finding** presenting both perspectives
- Set `autofix_class: present`
- Frame as a tradeoff, not a verdict
Specific conflict patterns:
- Coherence says "keep for consistency" + scope-guardian says "cut for simplicity" -> combined finding, let user decide
- Feasibility says "this is impossible" + product-lens says "this is essential" -> P1 finding framed as a tradeoff
- Multiple personas flag the same issue -> merge into single finding, note consensus, increase confidence
### 3.6 Route by Autofix Class
| Autofix Class | Route |
|---------------|-------|
| `auto` | Apply automatically -- local deterministic fix (terminology, formatting, cross-references) |
| `present` | Present to user for judgment |
Demote any `auto` finding that lacks a `suggested_fix` to `present` -- the orchestrator cannot apply a fix without concrete replacement text.
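A small sketch of the routing and demotion rule. `route` is an illustrative name, and an empty or null `suggested_fix` is treated as missing.

```python
def route(finding):
    """Return the effective route for a finding: "auto" or "present"."""
    if finding["autofix_class"] == "auto" and not finding.get("suggested_fix"):
        # No concrete replacement text, so the orchestrator cannot apply it.
        return "present"
    return finding["autofix_class"]
```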
### 3.7 Sort
Sort findings for presentation: P0 -> P1 -> P2 -> P3, then by confidence (descending), then by document order (section position).
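The three-level ordering maps onto a single sort key. This sketch assumes the orchestrator can supply `section_order`, a hypothetical list of section names in document order; unknown sections sort last.

```python
SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}


def sort_findings(findings, section_order):
    """Sort by severity, then confidence (descending), then document order."""
    position = {name: index for index, name in enumerate(section_order)}
    return sorted(
        findings,
        key=lambda f: (
            SEVERITY_RANK[f["severity"]],
            -f["confidence"],
            position.get(f["section"], len(position)),
        ),
    )
```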
## Phase 4: Apply and Present
### Apply Auto-fixes
Apply all `auto` findings to the document in a **single pass**:
- Edit the document inline using the platform's edit tool
- Track what was changed for the "Auto-fixes Applied" section
- Do not ask for approval -- these are unambiguously correct (terminology fixes, formatting, cross-references)
### Present Remaining Findings
Present all other findings to the user using the format from [review-output-template.md](./references/review-output-template.md):
- Group by severity (P0 -> P3)
- Include the Coverage table showing which personas ran
- Show auto-fixes that were applied
- Include residual concerns and deferred questions if any
Brief summary at the top: "Applied N auto-fixes. M findings to consider (X at P0/P1)."
### Protected Artifacts
During synthesis, discard any finding that recommends deleting or removing files in:
- `docs/brainstorms/`
- `docs/plans/`
- `docs/solutions/`
These are pipeline artifacts and must not be flagged for removal.
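Deciding whether a finding "recommends deleting" something is a judgment call. Once the candidate paths are known, though, the directory check itself is mechanical. A sketch of that check only, with hypothetical names:

```python
from pathlib import PurePosixPath

PROTECTED_DIRS = ("docs/brainstorms", "docs/plans", "docs/solutions")


def is_protected(path):
    """True if a repo-relative path sits inside a protected directory."""
    candidate = PurePosixPath(path)
    return any(candidate.is_relative_to(root) for root in PROTECTED_DIRS)
```

Matching on path components, not string prefixes, keeps a sibling such as `docs/plans-archive/` from being treated as protected.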
## Phase 5: Next Action
Use the platform's blocking question tool when available (AskUserQuestion in Claude Code, request_user_input in Codex, ask_user in Gemini). Otherwise present numbered options and wait for the user's reply.
Offer:
1. **Refine again** -- another review pass
2. **Review complete** -- document is ready
After 2 refinement passes, recommend completion -- diminishing returns are likely. But if the user wants to continue, allow it.
Return "Review complete" as the terminal signal for callers.
## What NOT to Do
@@ -85,3 +193,8 @@ Return control to the caller (workflow or user) after selection.
- Do not add new sections or requirements the user didn't discuss
- Do not over-engineer or add complexity
- Do not create separate review files or add metadata sections
- Do not modify any of the 4 caller skills (ce-brainstorm, ce-plan, ce-plan-beta, deepen-plan-beta)
## Iteration Guidance
On subsequent passes, re-dispatch personas and re-synthesize. The auto-fix mechanism and confidence gating prevent the same findings from recurring once fixed. If findings are repetitive across passes, recommend completion.

@@ -0,0 +1,98 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Document Review Findings",
"description": "Structured output schema for document review persona agents",
"type": "object",
"required": ["reviewer", "findings", "residual_risks", "deferred_questions"],
"properties": {
"reviewer": {
"type": "string",
"description": "Persona name that produced this output (e.g., 'coherence', 'feasibility', 'product-lens')"
},
"findings": {
"type": "array",
"description": "List of document review findings. Empty array if no issues found.",
"items": {
"type": "object",
"required": [
"title",
"severity",
"section",
"why_it_matters",
"autofix_class",
"confidence",
"evidence"
],
"properties": {
"title": {
"type": "string",
"description": "Short, specific issue title. 10 words or fewer.",
"maxLength": 100
},
"severity": {
"type": "string",
"enum": ["P0", "P1", "P2", "P3"],
"description": "Issue severity level"
},
"section": {
"type": "string",
"description": "Document section where the issue appears (e.g., 'Requirements Trace', 'Implementation Unit 3', 'Overview')"
},
"why_it_matters": {
"type": "string",
"description": "Impact statement -- not 'what is wrong' but 'what goes wrong if not addressed'"
},
"autofix_class": {
"type": "string",
"enum": ["auto", "present"],
"description": "How this issue should be handled. auto = local deterministic fix the orchestrator can apply without asking (terminology, formatting, cross-references). present = requires user judgment."
},
"suggested_fix": {
"type": ["string", "null"],
"description": "Concrete fix text. Omit or null if no good fix is obvious -- a bad suggestion is worse than none."
},
"confidence": {
"type": "number",
"description": "Reviewer confidence in this finding, calibrated per persona",
"minimum": 0.0,
"maximum": 1.0
},
"evidence": {
"type": "array",
"description": "Quoted text from the document that supports this finding. At least 1 item.",
"items": { "type": "string" },
"minItems": 1
}
}
}
},
"residual_risks": {
"type": "array",
"description": "Risks the reviewer noticed but could not confirm as findings (below confidence threshold)",
"items": { "type": "string" }
},
"deferred_questions": {
"type": "array",
"description": "Questions that should be resolved in a later workflow stage (planning, implementation)",
"items": { "type": "string" }
}
},
"_meta": {
"confidence_thresholds": {
"suppress": "Below 0.50 -- do not report. Finding is speculative noise.",
"flag": "0.50-0.69 -- include only when the persona's calibration says the issue is actionable at that confidence.",
"report": "0.70+ -- report with full confidence."
},
"severity_definitions": {
"P0": "Contradictions or gaps that would cause building the wrong thing. Must fix before proceeding.",
"P1": "Significant gap likely hit during planning or implementation. Should fix.",
"P2": "Moderate issue with meaningful downside. Fix if straightforward.",
"P3": "Minor improvement. User's discretion."
},
"autofix_classes": {
"auto": "Local, deterministic document fix: terminology consistency, formatting, cross-reference correction. Must be unambiguous and not change the document's meaning.",
"present": "Requires user judgment -- strategic questions, tradeoffs, meaning-changing fixes, or informational findings."
}
}
}

@@ -0,0 +1,78 @@
# Document Review Output Template
Use this **exact format** when presenting synthesized review findings. Findings are grouped by severity, not by reviewer.
**IMPORTANT:** Use pipe-delimited markdown tables (`| col | col |`). Do NOT use ASCII box-drawing characters.
## Example
```markdown
## Document Review Results
**Document:** docs/plans/2026-03-15-feat-user-auth-plan.md
**Type:** plan
**Reviewers:** coherence, feasibility, security-lens, scope-guardian
- security-lens -- plan adds public API endpoint with auth flow
- scope-guardian -- plan has 15 requirements across 3 priority levels
### Auto-fixes Applied
- Standardized "pipeline"/"workflow" terminology to "pipeline" throughout (coherence, auto)
- Fixed cross-reference: Section 4 referenced "Section 3.2" which is actually "Section 3.1" (coherence, auto)
### P0 -- Must Fix
| # | Section | Issue | Reviewer | Confidence | Route |
|---|---------|-------|----------|------------|-------|
| 1 | Requirements Trace | Goal states "offline support" but technical approach assumes persistent connectivity | coherence | 0.92 | `present` |
### P1 -- Should Fix
| # | Section | Issue | Reviewer | Confidence | Route |
|---|---------|-------|----------|------------|-------|
| 2 | Implementation Unit 3 | Plan proposes custom auth when codebase already uses Devise | feasibility | 0.85 | `present` |
| 3 | Scope Boundaries | 8 of 12 units build admin infrastructure; only 2 touch stated goal | scope-guardian | 0.80 | `present` |
### P2 -- Consider Fixing
| # | Section | Issue | Reviewer | Confidence | Route |
|---|---------|-------|----------|------------|-------|
| 4 | API Design | Public webhook endpoint has no rate limiting mentioned | security-lens | 0.75 | `present` |
### P3 -- Minor
| # | Section | Issue | Reviewer | Confidence | Route |
|---|---------|-------|----------|------------|-------|
| 5 | Overview | "Service" used to mean both microservice and business class | coherence | 0.65 | `auto` |
### Residual Concerns
| # | Concern | Source |
|---|---------|--------|
| 1 | Migration rollback strategy not addressed for Phase 2 data changes | feasibility |
### Deferred Questions
| # | Question | Source |
|---|---------|--------|
| 1 | Should the API use versioned endpoints from launch? | feasibility, security-lens |
### Coverage
| Persona | Status | Findings | Residual |
|---------|--------|----------|----------|
| coherence | completed | 2 | 0 |
| feasibility | completed | 1 | 1 |
| security-lens | completed | 1 | 0 |
| scope-guardian | completed | 1 | 0 |
| product-lens | not activated | -- | -- |
| design-lens | not activated | -- | -- |
```
## Section Rules
- **Auto-fixes Applied**: List fixes that were applied automatically (auto class). Omit section if none.
- **P0-P3 sections**: Only include sections that have findings. Omit empty severity levels.
- **Residual Concerns**: Findings below confidence threshold that were promoted by cross-persona corroboration, plus unpromoted residual risks. Omit if none.
- **Deferred Questions**: Questions for later workflow stages. Omit if none.
- **Coverage**: Always include. Shows which personas ran and their output counts.

@@ -0,0 +1,50 @@
# Document Review Sub-agent Prompt Template
This template is used by the document-review orchestrator to spawn each reviewer sub-agent. Variable substitution slots are filled at dispatch time.
---
## Template
```
You are a specialist document reviewer.
<persona>
{persona_file}
</persona>
<output-contract>
Return ONLY valid JSON matching the findings schema below. No prose, no markdown, no explanation outside the JSON object.
{schema}
Rules:
- Suppress any finding below your stated confidence floor (see your Confidence calibration section).
- Every finding MUST include at least one evidence item -- a direct quote from the document.
- You are operationally read-only. Analyze the document and produce findings. Do not edit the document, create files, or make changes. You may use non-mutating tools (file reads, glob, grep, git log) to gather context about the codebase when evaluating feasibility or existing patterns.
- Set `autofix_class` conservatively:
- `auto`: Only for local, deterministic fixes -- terminology corrections, formatting fixes, cross-reference repairs. The fix must be unambiguous and not change the document's meaning.
- `present`: Everything else -- strategic questions, tradeoffs, meaning-changing fixes, informational findings.
- `suggested_fix` is optional. Only include it when the fix is obvious and correct. For `present` findings, frame as a question instead.
- If you find no issues, return an empty findings array. Still populate residual_risks and deferred_questions if applicable.
- Use your suppress conditions. Do not flag issues that belong to other personas.
</output-contract>
<review-context>
Document type: {document_type}
Document path: {document_path}
Document content:
{document_content}
</review-context>
```
## Variable Reference
| Variable | Source | Description |
|----------|--------|-------------|
| `{persona_file}` | Agent markdown file content | The full persona definition (identity, analysis protocol, calibration, suppress conditions) |
| `{schema}` | `references/findings-schema.json` content | The JSON schema reviewers must conform to |
| `{document_type}` | Orchestrator classification | Either "requirements" or "plan" |
| `{document_path}` | Skill input | Path to the document being reviewed |
| `{document_content}` | File read | The full document text |
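One way the substitution could be done is sketched below. `fill_template` is a hypothetical helper, not part of the skill. It replaces all slots in a single pass, so a document that happens to contain the text `{schema}` is not rescanned. It also avoids `str.format`, which would fail on the literal braces in the JSON schema.

```python
import re


def fill_template(template, variables):
    """Replace each {name} slot with its value in one pass over the template."""
    names = "|".join(re.escape(name) for name in variables)
    pattern = re.compile(r"\{(" + names + r")\}")
    return pattern.sub(lambda match: variables[match.group(1)], template)
```

Slots not named in `variables` are left untouched, which makes a missing variable easy to spot in the rendered prompt.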

@@ -1,96 +1,111 @@
---
name: feature-video
description: Record a video walkthrough of a feature and add it to the PR description
argument-hint: "[PR number or 'current'] [optional: base URL, default localhost:3000]"
description: Record a video walkthrough of a feature and add it to the PR description. Use when a PR needs a visual demo for reviewers, when the user asks to demo a feature, create a PR video, record a walkthrough, show what changed visually, or add a video to a pull request.
argument-hint: "[PR number or 'current' or path/to/video.mp4] [optional: base URL, default localhost:3000]"
---
# Feature Video Walkthrough
<command_purpose>Record a video walkthrough demonstrating a feature, upload it, and add it to the PR description.</command_purpose>
## Introduction
<role>Developer Relations Engineer creating feature demo videos</role>
This command creates professional video walkthroughs of features for PR documentation:
- Records browser interactions using agent-browser CLI
- Demonstrates the complete user flow
- Uploads the video for easy sharing
- Updates the PR description with an embedded video
Record browser interactions demonstrating a feature, stitch screenshots into an MP4 video, upload natively to GitHub, and embed in the PR description as an inline video player.
## Prerequisites
<requirements>
- Local development server running (e.g., `bin/dev`, `rails server`)
- agent-browser CLI installed
- Git repository with a PR to document
- Local development server running (e.g., `bin/dev`, `npm run dev`, `rails server`)
- `agent-browser` CLI installed (load the `agent-browser` skill for details)
- `ffmpeg` installed (for video conversion)
- `rclone` configured (optional, for cloud upload - see rclone skill)
- Public R2 base URL known (for example, `https://<public-domain>.r2.dev`)
</requirements>
## Setup
**Check installation:**
```bash
command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED"
```
**Install if needed:**
```bash
npm install -g agent-browser && agent-browser install
```
See the `agent-browser` skill for detailed usage.
- `gh` CLI authenticated with push access to the repo
- Git repository on a feature branch (PR optional -- skill can create a draft or record-only)
- One-time GitHub browser auth (see Step 6 auth check)
## Main Tasks
### 1. Parse Arguments
<parse_args>
### 1. Parse Arguments & Resolve PR
**Arguments:** $ARGUMENTS
Parse the input:
- First argument: PR number or "current" (defaults to current branch's PR)
- First argument: PR number, "current" (defaults to current branch's PR), or path to an existing `.mp4` file (upload-only resume mode)
- Second argument: Base URL (defaults to `http://localhost:3000`)
**Upload-only resume:** If the first argument ends in `.mp4` and the file exists, skip Steps 2-5 and proceed directly to Step 6 using that file. Resolve the PR number from the current branch (`gh pr view --json number -q '.number'`).
If an explicit PR number was provided, verify it exists and use it directly:
```bash
gh pr view [number] --json number -q '.number'
```
If no explicit PR number was provided (or "current" was specified), check if a PR exists for the current branch:
```bash
# Get PR number for current branch if needed
gh pr view --json number -q '.number'
```
</parse_args>
If no PR exists for the current branch, ask the user how to proceed. **Use the platform's blocking question tool** (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini):
```
No PR found for the current branch.
1. Create a draft PR now and continue (recommended)
2. Record video only -- save locally and upload later when a PR exists
3. Cancel
```
If option 1: create a draft PR with a placeholder title derived from the branch name, then continue with the new PR number:
```bash
gh pr create --draft --title "[branch-name-humanized]" --body "Draft PR for video walkthrough"
```
If option 2: set `RECORD_ONLY=true`. Proceed through Steps 2-5 (record and encode), skip Steps 6-7 (upload and PR update), and report the local video path and `[RUN_ID]` at the end.
**Upload-only resume:** To upload a previously recorded video, pass an existing video file path as the first argument (e.g., `/feature-video .context/compound-engineering/feature-video/1711234567/videos/feature-demo.mp4`). When the first argument is a path to an `.mp4` file, skip Steps 2-5 and proceed directly to Step 6 using that file for upload.
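The first-argument rules can be summarized as a small decision function. This is a sketch with hypothetical names. Rejecting anything that is not an existing `.mp4` path, `current`, or a number is an assumption, since the text does not say how other input is handled.

```python
import os


def resolve_mode(first_arg):
    """Classify the first argument into (mode, value)."""
    if first_arg and first_arg.endswith(".mp4") and os.path.isfile(first_arg):
        return ("upload_only", first_arg)  # skip Steps 2-5
    if not first_arg or first_arg == "current":
        return ("current_branch", None)  # resolve the PR from the branch
    if first_arg.isdigit():
        return ("explicit_pr", int(first_arg))
    # Assumption: any other input is rejected, not guessed at.
    raise ValueError(f"unrecognized argument: {first_arg!r}")
```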
### 1b. Verify Required Tools
Before proceeding, check that required CLI tools are installed. Fail early with a clear message rather than failing mid-workflow after screenshots have been recorded:
```bash
command -v ffmpeg
```
```bash
command -v agent-browser
```
```bash
command -v gh
```
If any tool is missing, stop and report which tools need to be installed:
- `ffmpeg`: `brew install ffmpeg` (macOS) or equivalent
- `agent-browser`: load the `agent-browser` skill for installation instructions
- `gh`: `brew install gh` (macOS) or see https://cli.github.com
Do not proceed to Step 2 until all tools are available.
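For callers scripting this check, the same fail-early gate can be sketched in Python with `shutil.which`, the standard-library analogue of `command -v`. The function name and the injectable `which` parameter are illustrative.

```python
import shutil

REQUIRED_TOOLS = ("ffmpeg", "agent-browser", "gh")


def missing_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Return the required CLI tools that are not found on PATH."""
    return [tool for tool in required if which(tool) is None]
```

An empty list means the workflow can continue; otherwise report the listed tools and stop.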
### 2. Gather Feature Context
<gather_context>
**If a PR is available**, get PR details and changed files:
**Get PR details:**
```bash
gh pr view [number] --json title,body,files,headRefName -q '.'
```
**Get changed files:**
```bash
gh pr view [number] --json files -q '.files[].path'
```
**Map files to testable routes** (same as playwright-test):
**If in record-only mode (no PR)**, detect the default branch and derive context from the branch diff. Run all of the commands in a single block so the `DEFAULT_BRANCH` variable persists across them:
| File Pattern | Route(s) |
|-------------|----------|
| `app/views/users/*` | `/users`, `/users/:id`, `/users/new` |
| `app/controllers/settings_controller.rb` | `/settings` |
| `app/javascript/controllers/*_controller.js` | Pages using that Stimulus controller |
| `app/components/*_component.rb` | Pages rendering that component |
```bash
DEFAULT_BRANCH=$(gh repo view --json defaultBranchRef -q '.defaultBranchRef.name') && git diff --name-only "$DEFAULT_BRANCH"...HEAD && git log --oneline "$DEFAULT_BRANCH"..HEAD
```
</gather_context>
Map changed files to routes/pages that should be demonstrated. Examine the project's routing configuration (e.g., `routes.rb`, `next.config.js`, `app/` directory structure) to determine which URLs correspond to the changed files.
### 3. Plan the Video Flow
<plan_flow>
Before recording, create a shot list:
1. **Opening shot**: Homepage or starting point (2-3 seconds)
@@ -99,12 +114,12 @@ Before recording, create a shot list:
4. **Edge cases**: Error states, validation, etc. (if applicable)
5. **Success state**: Completed action/result
Ask user to confirm or adjust the flow:
Present the proposed flow to the user for confirmation before recording.
```markdown
**Proposed Video Flow**
**Use the platform's blocking question tool when available** (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present numbered options and wait for the user's reply before proceeding:
Based on PR #[number]: [title]
```
Proposed Video Flow for PR #[number]: [title]
1. Start at: /[starting-route]
2. Navigate to: /[feature-route]
@@ -116,218 +131,221 @@ Based on PR #[number]: [title]
Estimated duration: ~[X] seconds
Does this look right?
1. Yes, start recording
1. Start recording
2. Modify the flow (describe changes)
3. Add specific interactions to demonstrate
```
</plan_flow>
### 4. Record the Walkthrough
### 4. Setup Video Recording
Generate a unique run ID (e.g., timestamp) and create per-run output directories. This prevents stale screenshots from prior runs being spliced into the new video.
<setup_recording>
**Create videos directory:**
```bash
mkdir -p tmp/videos
```
**Recording approach: Use browser screenshots as frames**
agent-browser captures screenshots at key moments, then combine into video using ffmpeg:
**Important:** Shell variables do not persist across separate code blocks. After generating the run ID, substitute the concrete value into all subsequent commands in this workflow. For example, if the timestamp is `1711234567`, use that literal value in all paths below -- do not rely on `[RUN_ID]` expanding in later blocks.
```bash
ffmpeg -framerate 2 -pattern_type glob -i 'tmp/screenshots/*.png' -vf "scale=1280:-1" tmp/videos/feature-demo.gif
date +%s
```
</setup_recording>
Use the output as RUN_ID. Create the directories with the concrete value:
### 5. Record the Walkthrough
```bash
mkdir -p .context/compound-engineering/feature-video/[RUN_ID]/screenshots
mkdir -p .context/compound-engineering/feature-video/[RUN_ID]/videos
```
<record_walkthrough>
Execute the planned flow, capturing each step with agent-browser. Number screenshots sequentially for correct frame ordering:
Execute the planned flow, capturing each step:
**Step 1: Navigate to starting point**
```bash
agent-browser open "[base-url]/[start-route]"
agent-browser wait 2000
agent-browser screenshot tmp/screenshots/01-start.png
agent-browser screenshot .context/compound-engineering/feature-video/[RUN_ID]/screenshots/01-start.png
```
**Step 2: Perform navigation/interactions**
```bash
agent-browser snapshot -i # Get refs
agent-browser click @e1 # Click navigation element
agent-browser snapshot -i
agent-browser click @e1
agent-browser wait 1000
agent-browser screenshot tmp/screenshots/02-navigate.png
agent-browser screenshot .context/compound-engineering/feature-video/[RUN_ID]/screenshots/02-navigate.png
```
**Step 3: Demonstrate feature**
```bash
agent-browser snapshot -i # Get refs for feature elements
agent-browser click @e2 # Click feature element
agent-browser snapshot -i
agent-browser click @e2
agent-browser wait 1000
agent-browser screenshot tmp/screenshots/03-feature.png
agent-browser screenshot .context/compound-engineering/feature-video/[RUN_ID]/screenshots/03-feature.png
```
**Step 4: Capture result**
```bash
agent-browser wait 2000
agent-browser screenshot tmp/screenshots/04-result.png
agent-browser screenshot .context/compound-engineering/feature-video/[RUN_ID]/screenshots/04-result.png
```
**Create video/GIF from screenshots:**
### 5. Create Video
Stitch screenshots into an MP4 using the same `[RUN_ID]` from Step 4:
```bash
# Create directories
mkdir -p tmp/videos tmp/screenshots
# Create MP4 video (RECOMMENDED - better quality, smaller size)
# -framerate 0.5 = 2 seconds per frame (slower playback)
# -framerate 1 = 1 second per frame
ffmpeg -y -framerate 0.5 -pattern_type glob -i 'tmp/screenshots/*.png' \
ffmpeg -y -framerate 0.5 -pattern_type glob -i ".context/compound-engineering/feature-video/[RUN_ID]/screenshots/*.png" \
-c:v libx264 -pix_fmt yuv420p -vf "scale=1280:-2" \
tmp/videos/feature-demo.mp4
# Create low-quality GIF for preview (small file, for GitHub embed)
ffmpeg -y -framerate 0.5 -pattern_type glob -i 'tmp/screenshots/*.png' \
-vf "scale=640:-1:flags=lanczos,split[s0][s1];[s0]palettegen=max_colors=128[p];[s1][p]paletteuse" \
-loop 0 tmp/videos/feature-demo-preview.gif
".context/compound-engineering/feature-video/[RUN_ID]/videos/feature-demo.mp4"
```
**Note:**
- The `-2` in MP4 scale ensures height is divisible by 2 (required for H.264)
- Preview GIF uses 640px width and 128 colors to keep file size small (~100-200KB)
Notes:
- `-framerate 0.5` = 2 seconds per frame. Adjust for faster/slower playback.
- `-2` in scale ensures height is divisible by 2 (required for H.264).
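The framerate note implies a simple relationship between screenshot count and video length: each frame is shown for `1 / framerate` seconds. A sketch of that arithmetic, with an illustrative function name and the result treated as approximate:

```python
def estimated_duration_seconds(frame_count, framerate=0.5):
    """Approximate video length when each screenshot is one frame."""
    if framerate <= 0:
        raise ValueError("framerate must be positive")
    return frame_count / framerate
```

With the four screenshots captured above and `-framerate 0.5`, this gives roughly 8 seconds.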
</record_walkthrough>
### 6. Authenticate & Upload to GitHub
### 6. Upload the Video
Upload produces a `user-attachments/assets/` URL that GitHub renders as a native inline video player -- the same result as pasting a video into the PR editor manually.
<upload_video>
The approach: close any existing agent-browser session, start a Chrome-engine session with saved GitHub auth, navigate to the PR page, set the video file on the comment form's hidden file input, wait for GitHub to process the upload, extract the resulting URL, then restore the textarea to its previous content (or clear it) without submitting.
**Upload with rclone:**
#### Check for existing session
First, check if a saved GitHub session already exists:
```bash
# Check rclone is configured
rclone listremotes
# Set your public base URL (NO trailing slash)
PUBLIC_BASE_URL="https://<your-public-r2-domain>.r2.dev"
# Upload video, preview GIF, and screenshots to cloud storage
# Use --s3-no-check-bucket to avoid permission errors
rclone copy tmp/videos/ r2:kieran-claude/pr-videos/pr-[number]/ --s3-no-check-bucket --progress
rclone copy tmp/screenshots/ r2:kieran-claude/pr-videos/pr-[number]/screenshots/ --s3-no-check-bucket --progress
# List uploaded files
rclone ls r2:kieran-claude/pr-videos/pr-[number]/
# Build and validate public URLs BEFORE updating PR
VIDEO_URL="$PUBLIC_BASE_URL/pr-videos/pr-[number]/feature-demo.mp4"
PREVIEW_URL="$PUBLIC_BASE_URL/pr-videos/pr-[number]/feature-demo-preview.gif"
curl -I "$VIDEO_URL"
curl -I "$PREVIEW_URL"
# Require HTTP 200 for both URLs; stop if either fails
curl -I "$VIDEO_URL" | head -n 1 | grep -q ' 200 ' || exit 1
curl -I "$PREVIEW_URL" | head -n 1 | grep -q ' 200 ' || exit 1
agent-browser close
agent-browser --engine chrome --session-name github open https://github.com/settings/profile
agent-browser get title
```
</upload_video>
If the page title contains the user's GitHub username or "Profile", the session is still valid -- skip to "Upload the video" below. If it redirects to the login page, the session has expired or was never created -- proceed to "Auth setup".
#### Auth setup (one-time)
Establish an authenticated GitHub session. This only needs to happen once -- session cookies persist across runs via the `--session-name` flag.
Close the current session and open the GitHub login page in a headed Chrome window:
```bash
agent-browser close
agent-browser --engine chrome --headed --session-name github open https://github.com/login
```
The user must log in manually in the browser window (handles 2FA, SSO, OAuth -- any login method). **Use the platform's blocking question tool when available** (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present the message and wait for the user's reply before proceeding:
```
GitHub login required for video upload.
A Chrome window has opened to github.com/login. Please log in manually
(this handles 2FA/SSO/OAuth automatically). Reply when done.
```
After login, verify the session works:
```bash
agent-browser open https://github.com/settings/profile
```
If the profile page loads, auth is confirmed. The `github` session is now saved and reusable.
#### Upload the video
Navigate to the PR page and scroll to the comment form:
```bash
agent-browser open "https://github.com/[owner]/[repo]/pull/[number]"
agent-browser scroll down 5000
```
Save any existing textarea content before uploading (the comment box may contain an unsent draft):
```bash
agent-browser eval "document.getElementById('new_comment_field').value"
```
Store this value as `SAVED_TEXTAREA`. If non-empty, it will be restored after extracting the upload URL.
Upload the video via the hidden file input. Use the caller-provided `.mp4` path if in upload-only resume mode, otherwise use the current run's encoded video:
```bash
agent-browser upload '#fc-new_comment_field' [VIDEO_FILE_PATH]
```
Where `[VIDEO_FILE_PATH]` is either:
- The `.mp4` path passed as the first argument (upload-only resume mode)
- `.context/compound-engineering/feature-video/[RUN_ID]/videos/feature-demo.mp4` (normal recording flow)
Wait for GitHub to process the upload (typically 3-5 seconds), then read the textarea value:
```bash
agent-browser wait 5000
agent-browser eval "document.getElementById('new_comment_field').value"
```
**Validate the extracted URL.** The value must contain `user-attachments/assets/` to confirm a successful native upload. If the textarea is empty, contains only placeholder text, or the URL does not match, do not proceed to Step 7. Instead:
1. Check `agent-browser get url` -- if it shows `github.com/login`, the session expired. Re-run auth setup.
2. If still on the PR page, wait an additional 5 seconds and re-read the textarea (GitHub processing can be slow).
3. If validation still fails after retry, report the failure and the local video path so the user can upload manually.
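The validation rule can be sketched as a pattern match on the textarea value. The exact URL shape, `https://github.com/user-attachments/assets/` followed by a hex-and-hyphen identifier, is an assumption based on the example in Step 7. The helper name is hypothetical.

```python
import re

# Assumed URL shape; the text only requires "user-attachments/assets/".
ASSET_URL = re.compile(r"https://github\.com/user-attachments/assets/[0-9a-fA-F-]+")


def extract_asset_url(textarea_value):
    """Return the uploaded asset URL, or None if the upload did not land."""
    match = ASSET_URL.search(textarea_value or "")
    return match.group(0) if match else None
```

A `None` result corresponds to the failure path above: check the session, retry once, then report the local video path.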
Restore the original textarea content (or clear if it was empty). A JSON-encoded string is also a valid JavaScript string literal, so assign it directly without `JSON.parse`:
```bash
agent-browser eval "const ta = document.getElementById('new_comment_field'); ta.value = [SAVED_TEXTAREA_AS_JS_STRING]; ta.dispatchEvent(new Event('input', { bubbles: true }))"
```
To prepare the value: take the SAVED_TEXTAREA string and produce a JS string literal from it -- escape backslashes, double quotes, and newlines (e.g., `"text with \"quotes\" and\nnewlines"`). If SAVED_TEXTAREA was empty, use `""`. The result is embedded directly as the right-hand side of the assignment -- no `JSON.parse` call needed.
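A sketch of the escaping step, assuming Python is available to the caller. `json.dumps` produces a double-quoted literal with backslashes, quotes, and newlines escaped, which is also valid JavaScript. Because that literal contains double quotes, the script should additionally be shell-quoted before it is passed as a command-line argument. `shlex.quote` is one way to do that. Both helper names are hypothetical.

```python
import json
import shlex


def build_restore_script(saved_textarea):
    """Build the JS that restores the textarea and fires an input event."""
    literal = json.dumps(saved_textarea)  # "" when nothing was saved
    return (
        "const ta = document.getElementById('new_comment_field'); "
        f"ta.value = {literal}; "
        "ta.dispatchEvent(new Event('input', { bubbles: true }))"
    )


def as_shell_argument(script):
    """Quote the script so the shell passes it through unchanged."""
    return shlex.quote(script)
```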
### 7. Update PR Description
<update_pr>
Get the current PR body:
**Get current PR body:**
```bash
gh pr view [number] --json body -q '.body'
```
**Add video section to PR description:**
If the PR already has a video section, replace it. Otherwise, append:
**IMPORTANT:** GitHub cannot embed external MP4s directly. Use a clickable GIF that links to the video:
Append a Demo section (or replace an existing one). The video URL renders as an inline player when placed on its own line:
```markdown
## Demo
[![Feature Demo]([preview-gif-url])]([video-mp4-url])
https://github.com/user-attachments/assets/[uuid]
*Click to view full video*
*Automated video walkthrough*
```
Example:
```markdown
[![Feature Demo](https://<your-public-r2-domain>.r2.dev/pr-videos/pr-137/feature-demo-preview.gif)](https://<your-public-r2-domain>.r2.dev/pr-videos/pr-137/feature-demo.mp4)
```
Update the PR:
**Update the PR:**
```bash
gh pr edit [number] --body "[updated body with video section]"
gh pr edit [number] --body "[updated body with demo section]"
```
**Or add as a comment if preferred:**
```bash
gh pr comment [number] --body "## Feature Demo
![Demo]([video-url])
_Automated walkthrough of the changes in this PR_"
```
</update_pr>
### 8. Cleanup
<cleanup>
Ask the user before removing temporary files. If confirmed, clean up only the current run's scratch directory (other runs may still be in progress or awaiting upload).
**If the video was successfully uploaded**, remove the entire run directory:
```bash
# Optional: Clean up screenshots
rm -rf tmp/screenshots
# Keep videos for reference
echo "Video retained at: tmp/videos/feature-demo.gif"
rm -r .context/compound-engineering/feature-video/[RUN_ID]
```
</cleanup>
**If in record-only mode or upload failed**, remove only the screenshots but preserve the video so the user can upload later:
### 9. Summary
<summary>
Present completion summary:
```markdown
## Feature Video Complete
**PR:** #[number] - [title]
**Video:** [url or local path]
**Duration:** ~[X] seconds
**Format:** [GIF/MP4]
### Shots Captured
1. [Starting point] - [description]
2. [Navigation] - [description]
3. [Feature demo] - [description]
4. [Result] - [description]
### PR Updated
- [x] Video section added to PR description
- [ ] Ready for review
**Next steps:**
- Review the video to ensure it accurately demonstrates the feature
- Share with reviewers for context
```bash
rm -r .context/compound-engineering/feature-video/[RUN_ID]/screenshots
```
</summary>
Present a completion summary:
## Quick Usage Examples
```
Feature Video Complete
PR: #[number] - [title]
Video: [VIDEO_URL]
Shots captured:
1. [description]
2. [description]
3. [description]
4. [description]
PR description updated with demo section.
```
## Usage Examples
```bash
# Record video for current branch's PR
@@ -345,7 +363,20 @@ Present completion summary:
## Tips
- **Keep it short**: 10-30 seconds is ideal for PR demos
- **Focus on the change**: Don't include unrelated UI
- **Show before/after**: If fixing a bug, show the broken state first (if possible)
- **Annotate if needed**: Add text overlays for complex features
- Keep it short: 10-30 seconds is ideal for PR demos
- Focus on the change: don't include unrelated UI
- Show before/after: if fixing a bug, show the broken state first (if possible)
- The `--session-name github` session expires when GitHub invalidates the cookies (typically weeks). If upload fails with a login redirect, re-run the auth setup.
- GitHub DOM selectors (`#fc-new_comment_field`, `#new_comment_field`) may change if GitHub updates its UI. If the upload silently fails, inspect the PR page for updated selectors.
## Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| `ffmpeg: command not found` | ffmpeg not installed | Install via `brew install ffmpeg` (macOS) or equivalent |
| `agent-browser: command not found` | agent-browser not installed | Load the `agent-browser` skill for installation instructions |
| Textarea empty after upload wait | Session expired, or GitHub processing slow | Check session validity (Step 6 auth check). If valid, increase wait time and retry. |
| Textarea empty, URL is `github.com/login` | Session expired | Re-run auth setup (Step 6) |
| `gh pr view` fails | No PR for current branch | Step 1 handles this -- choose to create a draft PR or record-only mode |
| Video file too large for upload | Exceeds GitHub's 10MB (free) or 100MB (paid) limit | Re-encode: lower framerate (`-framerate 0.33`), reduce resolution (`scale=960:-2`), or increase CRF (`-crf 28`) |
| Upload URL does not contain `user-attachments/assets/` | Wrong upload method or GitHub change | Verify the file input selector is still correct by inspecting the PR page |


@@ -1,259 +0,0 @@
---
name: file-todos
description: This skill should be used when managing the file-based todo tracking system in the todos/ directory. It provides workflows for creating todos, managing status and dependencies, conducting triage, and integrating with slash commands and code review processes.
disable-model-invocation: true
---
# File-Based Todo Tracking Skill
## Overview
The `todos/` directory contains a file-based tracking system for managing code review feedback, technical debt, feature requests, and work items. Each todo is a markdown file with YAML frontmatter and structured sections.
This skill should be used when:
- Creating new todos from findings or feedback
- Managing todo lifecycle (pending → ready → complete)
- Triaging pending items for approval
- Checking or managing dependencies
- Converting PR comments or code findings into tracked work
- Updating work logs during todo execution
## File Naming Convention
Todo files follow this naming pattern:
```
{issue_id}-{status}-{priority}-{description}.md
```
**Components:**
- **issue_id**: Sequential number (001, 002, 003...) - never reused
- **status**: `pending` (needs triage), `ready` (approved), `complete` (done)
- **priority**: `p1` (critical), `p2` (important), `p3` (nice-to-have)
- **description**: kebab-case, brief description
**Examples:**
```
001-pending-p1-mailer-test.md
002-ready-p1-fix-n-plus-1.md
005-complete-p2-refactor-csv.md
```
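As a rough illustration of the naming pattern above, the following sketch splits a todo filename into its four components using only POSIX parameter expansion. The function name `parse_todo_name` is hypothetical and not part of the skill.

```bash
# Hypothetical helper (not part of the skill): split a todo filename into
# "<issue_id> <status> <priority> <description>".
parse_todo_name() (
  name=${1##*/}                      # drop any leading directory
  case "$name" in *.md) ;; *) exit 1 ;; esac
  name=${name%.md}
  id=${name%%-*};       rest=${name#*-}
  status=${rest%%-*};   rest=${rest#*-}
  # What remains must look like "<priority>-<description>".
  case "$rest" in *-?*) ;; *) exit 1 ;; esac
  priority=${rest%%-*}; desc=${rest#*-}
  # Reject names that do not follow the documented pattern.
  case "$id" in ''|*[!0-9]*) exit 1 ;; esac
  case "$status" in pending|ready|complete) ;; *) exit 1 ;; esac
  case "$priority" in p1|p2|p3) ;; *) exit 1 ;; esac
  printf '%s %s %s %s\n' "$id" "$status" "$priority" "$desc"
)
```

For example, `parse_todo_name todos/002-ready-p1-fix-n-plus-1.md` prints `002 ready p1 fix-n-plus-1`, and a name that does not match the pattern makes the function return a non-zero status.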
## File Structure
Each todo is a markdown file with YAML frontmatter and structured sections. Use the template at [todo-template.md](./assets/todo-template.md) as a starting point when creating new todos.
**Required sections:**
- **Problem Statement** - What is broken, missing, or needs improvement?
- **Assessment (Pressure Test)** - For code review findings: verification results and engineering judgment
- **Findings** - Investigation results, root cause, key discoveries
- **Proposed Solutions** - Multiple options with pros/cons, effort, risk
- **Recommended Action** - Clear plan (filled during triage)
- **Acceptance Criteria** - Testable checklist items
- **Work Log** - Chronological record with date, actions, learnings
**Optional sections:**
- **Technical Details** - Affected files, related components, DB changes
- **Resources** - Links to errors, tests, PRs, documentation
- **Notes** - Additional context or decisions
**Assessment section fields (for code review findings):**
- Assessment: Clear & Correct | Unclear | Likely Incorrect | YAGNI
- Recommended Action: Fix now | Clarify | Push back | Skip
- Verified: Code, Tests, Usage, Prior Decisions (Yes/No with details)
- Technical Justification: Why this finding is valid or should be skipped
**YAML frontmatter fields:**
```yaml
---
status: ready # pending | ready | complete
priority: p1 # p1 | p2 | p3
issue_id: "002"
tags: [rails, performance, database]
dependencies: ["001"] # Issue IDs this is blocked by
---
```
## Common Workflows
### Creating a New Todo
**To create a new todo from findings or feedback:**
1. Determine the next issue ID (highest existing ID plus one): `ls todos/ | grep -o '^[0-9]\+' | sort -n | tail -1 | awk '{printf "%03d", $1+1}'`
2. Copy template: `cp assets/todo-template.md todos/{NEXT_ID}-pending-{priority}-{description}.md`
3. Edit and fill required sections:
- Problem Statement
- Findings (if from investigation)
- Proposed Solutions (multiple options)
- Acceptance Criteria
- Add initial Work Log entry
4. Determine status: `pending` (needs triage) or `ready` (pre-approved)
5. Add relevant tags for filtering
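The first two steps above could be wrapped in a pair of small functions, sketched below. Both function names are hypothetical, and the default template path `assets/todo-template.md` is taken from step 2; filling in the sections and tags remains a manual edit.

```bash
# Hypothetical helper: print the next zero-padded issue ID for a todos directory.
next_todo_id() (
  dir=${1:-todos}
  last=$(ls "$dir" 2>/dev/null | grep -oE '^[0-9]+' | sort -n | tail -1)
  # awk does the arithmetic so IDs such as 008 are not read as octal.
  echo "${last:-0}" | awk '{ printf "%03d\n", $1 + 1 }'
)

# Hypothetical helper: copy the template into place as a new pending todo
# and print the path of the created file.
new_todo() (
  priority=$1
  description=$2
  dir=${3:-todos}
  template=${4:-assets/todo-template.md}
  id=$(next_todo_id "$dir")
  dest="$dir/$id-pending-$priority-$description.md"
  cp "$template" "$dest" && echo "$dest"
)
```

Calling `new_todo p2 fix-n-plus-1` would create the next numbered `-pending-p2-fix-n-plus-1.md` file under `todos/`, assuming the template exists at the default path.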
**When to create a todo:**
- Requires more than 15-20 minutes of work
- Needs research, planning, or multiple approaches considered
- Has dependencies on other work
- Requires manager approval or prioritization
- Part of larger feature or refactor
- Technical debt needing documentation
**When to act immediately instead:**
- Issue is trivial (< 15 minutes)
- Complete context available now
- No planning needed
- User explicitly requests immediate action
- Simple bug fix with obvious solution
### Triaging Pending Items
**To triage pending todos:**
1. List pending items: `ls todos/*-pending-*.md`
2. For each todo:
- Read Problem Statement and Findings
- Review Proposed Solutions
- Make decision: approve, defer, or modify priority
3. Update approved todos:
- Rename file: `mv {file}-pending-{pri}-{desc}.md {file}-ready-{pri}-{desc}.md`
- Update frontmatter: `status: pending` → `status: ready`
- Fill "Recommended Action" section with clear plan
- Adjust priority if different from initial assessment
4. Deferred todos stay in `pending` status
**Use slash command:** `/triage` for interactive approval workflow
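For a non-interactive approval, the rename and frontmatter update can be done in one step. The sketch below uses a hypothetical `approve_todo` function and assumes that a line starting with `status: pending` appears only in the frontmatter. The "Recommended Action" section still has to be filled in by hand.

```bash
# Hypothetical helper: promote a pending todo to ready and print the new path.
approve_todo() (
  src=$1
  dir=$(dirname "$src")
  base=$(basename "$src")
  case "$base" in
    *-pending-*) ;;
    *) echo "not a pending todo: $src" >&2; exit 1 ;;
  esac
  # Swap the first "-pending-" in the filename for "-ready-".
  dst="$dir/${base%%-pending-*}-ready-${base#*-pending-}"
  # Rewrite the status line while copying, then remove the old file.
  # Assumption: "status: pending" starts a line only in the frontmatter.
  sed 's/^status: pending/status: ready/' "$src" > "$dst" && rm "$src" && echo "$dst"
)
```

Writing to a new file and then removing the old one avoids `sed -i`, whose syntax differs between GNU and BSD sed.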
### Managing Dependencies
**To track dependencies:**
```yaml
dependencies: ["002", "005"] # This todo blocked by issues 002 and 005
dependencies: [] # No blockers - can work immediately
```
**To check what blocks a todo:**
```bash
grep "^dependencies:" todos/003-*.md
```
**To find what a todo blocks:**
```bash
grep -l 'dependencies:.*"002"' todos/*.md
```
**To verify blockers are complete before starting:**
```bash
for dep in 001 002 003; do
  # A quoted glob never expands inside [ -f ... ], so let ls expand it instead
  ls todos/${dep}-complete-*.md >/dev/null 2>&1 || echo "Issue $dep not complete"
done
```
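Instead of hard-coding the blocker IDs, they can be read from the todo's own `dependencies:` line. This is a minimal sketch with a hypothetical function name, assuming the IDs are written as double-quoted numbers on a single line as in the frontmatter example.

```bash
# Hypothetical helper: print each dependency of a todo that has no
# "<id>-complete-*.md" file yet. Prints nothing when all blockers are done.
unmet_dependencies() (
  todo=$1
  dir=${2:-todos}
  grep '^dependencies:' "$todo" | grep -o '"[0-9]*"' | tr -d '"' |
  while read -r dep; do
    # ls expands the glob; a non-zero exit means no completed file exists.
    ls "$dir/$dep"-complete-*.md >/dev/null 2>&1 || echo "$dep"
  done
)
```

Only quoted numbers are picked up, so a trailing YAML comment that mentions issue numbers in plain text does not produce false matches.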
### Updating Work Logs
**When working on a todo, always add a work log entry:**
```markdown
### YYYY-MM-DD - Session Title
**By:** Claude Code / Developer Name
**Actions:**
- Specific changes made (include file:line references)
- Commands executed
- Tests run
- Results of investigation
**Learnings:**
- What worked / what didn't
- Patterns discovered
- Key insights for future work
```
Work logs serve as:
- Historical record of investigation
- Documentation of approaches attempted
- Knowledge sharing for team
- Context for future similar work
### Completing a Todo
**To mark a todo as complete:**
1. Verify all acceptance criteria checked off
2. Update Work Log with final session and results
3. Rename file: `mv {file}-ready-{pri}-{desc}.md {file}-complete-{pri}-{desc}.md`
4. Update frontmatter: `status: ready` → `status: complete`
5. Check for unblocked work: `grep -l 'dependencies:.*"002"' todos/*-ready-*.md`
6. Commit with issue reference: `feat: resolve issue 002`
## Integration with Development Workflows
| Trigger | Flow | Tool |
|---------|------|------|
| Code review | `/ce:review` → Findings → `/triage` → Todos | Review agent + skill |
| PR comments | `/resolve_pr_parallel` → Individual fixes → Todos | gh CLI + skill |
| Code TODOs | `/resolve_todo_parallel` → Fixes + Complex todos | Agent + skill |
| Planning | Brainstorm → Create todo → Work → Complete | Skill |
| Feedback | Discussion → Create todo → Triage → Work | Skill + slash |
## Quick Reference Commands
**Finding work:**
```bash
# List highest priority unblocked work
grep -l 'dependencies: \[\]' todos/*-ready-p1-*.md
# List all pending items needing triage
ls todos/*-pending-*.md
# Find next issue ID
ls todos/ | grep -o '^[0-9]\+' | sort -n | tail -1 | awk '{printf "%03d", $1+1}'
# Count by status
for status in pending ready complete; do
echo "$status: $(ls -1 todos/*-$status-*.md 2>/dev/null | wc -l)"
done
```
**Dependency management:**
```bash
# What blocks this todo?
grep "^dependencies:" todos/003-*.md
# What does this todo block?
grep -l 'dependencies:.*"002"' todos/*.md
```
**Searching:**
```bash
# Search by tag
grep -l "tags:.*rails" todos/*.md
# Search by priority
ls todos/*-p1-*.md
# Full-text search
grep -r "payment" todos/
```
## Key Distinctions
**File-todos system (this skill):**
- Markdown files in `todos/` directory
- Development/project tracking
- Standalone markdown files with YAML frontmatter
- Used by humans and agents
**Rails Todo model:**
- Database model in `app/models/todo.rb`
- User-facing feature in the application
- Active Record CRUD operations
- Different from this file-based system
**TodoWrite tool:**
- In-memory task tracking during agent sessions
- Temporary tracking for single conversation
- Not persisted to disk
- Different from both systems above


@@ -1,42 +1,258 @@
---
name: frontend-design
description: This skill should be used when creating distinctive, production-grade frontend interfaces with high design quality. It applies when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
license: Complete terms in LICENSE.txt
description: 'Build web interfaces with genuine design quality, not AI slop. Use for any frontend work - landing pages, web apps, dashboards, admin panels, components, interactive experiences. Activates for both greenfield builds and modifications to existing applications. Detects existing design systems and respects them. Covers composition, typography, color, motion, and copy. Verifies results via screenshots before declaring done.'
---
This skill guides creation of distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Implement real working code with exceptional attention to aesthetic details and creative choices.
# Frontend Design
The user provides frontend requirements: a component, page, application, or interface to build. They may include context about the purpose, audience, or technical constraints.
Guide creation of distinctive, production-grade frontend interfaces that avoid generic AI aesthetics. This skill covers the full lifecycle: detect what exists, plan the design, build with intention, and verify visually.
## Design Thinking
## Authority Hierarchy
Before coding, understand the context and commit to a BOLD aesthetic direction:
- **Purpose**: What problem does this interface solve? Who uses it?
- **Tone**: Pick an extreme: brutally minimal, maximalist chaos, retro-futuristic, organic/natural, luxury/refined, playful/toy-like, editorial/magazine, brutalist/raw, art deco/geometric, soft/pastel, industrial/utilitarian, etc. There are so many flavors to choose from. Use these for inspiration but design one that is true to the aesthetic direction.
- **Constraints**: Technical requirements (framework, performance, accessibility).
- **Differentiation**: What makes this UNFORGETTABLE? What's the one thing someone will remember?
Every rule in this skill is a default, not a mandate.
**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work - the key is intentionality, not intensity.
1. **Existing design system / codebase patterns** -- highest priority, always respected
2. **User's explicit instructions** -- override skill defaults
3. **Skill defaults** -- apply in greenfield work or when the user asks for design guidance
Then implement working code (HTML/CSS/JS, React, Vue, etc.) that is:
- Production-grade and functional
- Visually striking and memorable
- Cohesive with a clear aesthetic point-of-view
- Meticulously refined in every detail
When working in an existing codebase with established patterns, follow those patterns. When the user specifies a direction that contradicts a default, follow the user.
## Frontend Aesthetics Guidelines
## Workflow
Focus on:
- **Typography**: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for distinctive choices that elevate the frontend's aesthetics; unexpected, characterful font choices. Pair a distinctive display font with a refined body font.
- **Color & Theme**: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes.
- **Motion**: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. Use scroll-triggering and hover states that surprise.
- **Spatial Composition**: Unexpected layouts. Asymmetry. Overlap. Diagonal flow. Grid-breaking elements. Generous negative space OR controlled density.
- **Backgrounds & Visual Details**: Create atmosphere and depth rather than defaulting to solid colors. Add contextual effects and textures that match the overall aesthetic. Apply creative forms like gradient meshes, noise textures, geometric patterns, layered transparencies, dramatic shadows, decorative borders, custom cursors, and grain overlays.
```
Detect context -> Plan the design -> Build -> Verify visually
```
NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character.
---
Interpret creatively and make unexpected choices that feel genuinely designed for the context. No design should be the same. Vary between light and dark themes, different fonts, different aesthetics. NEVER converge on common choices (Space Grotesk, for example) across generations.
## Layer 0: Context Detection
**IMPORTANT**: Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist or refined designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well.
Before any design work, examine the codebase for existing design signals. This determines how much of the skill's opinionated guidance applies.
Remember: Claude is capable of extraordinary creative work. Don't hold back, show what can truly be created when thinking outside the box and committing fully to a distinctive vision.
### What to Look For
- **Design tokens / CSS variables**: `--color-*`, `--spacing-*`, `--font-*` custom properties, theme files
- **Component libraries**: shadcn/ui, Material UI, Chakra, Ant Design, Radix, or project-specific component directories
- **CSS frameworks**: `tailwind.config.*`, `styled-components` theme, Bootstrap imports, CSS modules with consistent naming
- **Typography**: Font imports in HTML/CSS, `@font-face` declarations, Google Fonts links
- **Color palette**: Defined color scales, brand color files, design token exports
- **Animation libraries**: Framer Motion, GSAP, anime.js, Motion One, Vue Transition imports
- **Spacing / layout patterns**: Consistent spacing scale usage, grid systems, layout components
Use the platform's native file-search and content-search tools (e.g., Glob/Grep in Claude Code) to scan for these signals. Do not use shell commands for routine file exploration.
### Mode Classification
Based on detected signals, choose a mode:
- **Existing system** (4+ signals across multiple categories): Defer to it. The skill's aesthetic opinions (typography, color, motion) yield to the established system. Structural guidance (composition, copy, accessibility, verification) still applies.
- **Partial system** (1-3 signals): Follow what exists; apply skill defaults only for areas where no convention was detected. For example, if Tailwind is configured but no component library exists, follow the Tailwind tokens and apply skill guidance for component structure.
- **Greenfield** (no signals detected): Full skill guidance applies.
- **Ambiguous** (signals are contradictory or unclear): Ask the user before proceeding.
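The count-based part of this classification can be sketched as a small function. The name `classify_design_mode` is hypothetical, and the sketch deliberately simplifies: it ignores the "across multiple categories" condition and cannot detect the ambiguous case, both of which need judgment.

```bash
# Hypothetical helper: map a count of detected design signals to a mode.
# Simplification: does not check category spread or contradictory signals.
classify_design_mode() {
  case "$1" in
    ''|*[!0-9]*) echo "expected a non-negative integer" >&2; return 1 ;;
    0)           echo "greenfield" ;;
    1|2|3)       echo "partial" ;;
    *)           echo "existing" ;;
  esac
}
```

For example, `classify_design_mode 2` prints `partial` and `classify_design_mode 5` prints `existing`.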
### Asking the User
When context is ambiguous, use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). If no question tool is available, assume "partial" mode and proceed conservatively.
Example question: "I found [detected signals]. Should I follow your existing design patterns or create something distinctive?"
---
## Layer 1: Pre-Build Planning
Before writing code, write three short statements. These create coherence and give the user a checkpoint to redirect before code is written.
1. **Visual thesis** -- one sentence describing the mood, material, and energy
- Greenfield examples: "Clean editorial feel, lots of whitespace, serif headlines, muted earth tones" or "Dense data-forward dashboard, monospace accents, dark surface hierarchy"
- Existing codebase: Describe the *existing* aesthetic and how the new work will extend it
2. **Content plan** -- what goes on the page and in what order
- Landing page: hero, support, detail, CTA
- App: primary workspace, nav, secondary context
- Component: what states it has, what it communicates
3. **Interaction plan** -- 2-3 specific motion ideas that change the feel
- Not "add animations" but "staggered fade-in on hero load, parallax on scroll between sections, scale-up on card hover"
- In an existing codebase, describe only the interactions being added, using the existing motion library
---
## Layer 2: Design Guidance Core
These principles apply across all context types. Each yields to existing design systems and user instructions per the authority hierarchy.
### Typography
- Choose distinctive, characterful fonts. Avoid the usual suspects (Inter, Roboto, Arial, system defaults) unless the existing codebase uses them.
- Two typefaces maximum without a clear reason for more. Pair a display/headline font with a body font.
- *Yields to existing font choices when detected in Layer 0.*
### Color & Theme
- Commit to a cohesive palette using CSS variables. A dominant color with sharp accents outperforms timid, evenly-distributed palettes.
- No purple-on-white bias, no dark-mode bias. Vary between light and dark based on context.
- One accent color by default unless the product already has a multi-color system.
- *Yields to existing color tokens when detected.*
### Composition
- Start with composition, not components. Treat the first viewport as a poster, not a document.
- Use whitespace, alignment, scale, cropping, and contrast before adding chrome (borders, shadows, cards).
- Default to cardless layouts. Cards are allowed when they serve as the container for a user interaction (clickable item, draggable unit, selectable option). If removing the card styling would not hurt comprehension, it should not be a card.
- *All composition rules are defaults. The user can override them.*
### Motion
- Ship 2-3 intentional motions for visually-led work: one entrance sequence, one scroll-linked or depth effect, one hover/reveal transition.
- Use the project's existing animation library if one is present.
- When no existing library is found, use framework-conditional defaults:
- **CSS animations** as the universal baseline
- **Framer Motion** for React projects
- **Vue Transition / Motion One** for Vue projects
- **Svelte transitions** for Svelte projects
- Motion should be noticeable in a quick recording, smooth on mobile, and consistent across the page. Remove if purely ornamental.
### Accessibility
- Semantic HTML by default: `nav`, `main`, `section`, `article`, `button` -- not divs for everything.
- Color contrast meeting WCAG AA minimum.
- Focus states on all interactive elements.
- Accessibility and aesthetics are not in tension when done well.
### Imagery
- When images are needed, prefer real or realistic photography over abstract gradients or fake 3D objects.
- Choose or generate images with a stable tonal area for text overlay.
- If image generation tools are available in the environment, use them to create contextually appropriate visuals rather than placeholder stock.
---
## Context Modules
Select the module that fits what is being built. When working inside an existing application, default to Module C regardless of what the feature is.
### Module A: Landing Pages & Marketing (Greenfield)
**Default section sequence:**
1. Hero -- brand/product, promise, CTA, one dominant visual
2. Support -- one concrete feature, offer, or proof point
3. Detail -- atmosphere, workflow, product depth, or story
4. Final CTA -- convert, start, visit, or contact
**Hero rules (defaults):**
- One composition, not a dashboard. Full-bleed image or dominant visual plane.
- Brand first, headline second, body third, CTA fourth.
- Keep the text column narrow and anchored to a calm area of the image.
- No more than 6 sections total without a clear reason.
- One H1 headline. One primary CTA above the fold.
**Copy:**
- Let the headline carry the meaning. Supporting copy is usually one short sentence.
- Write in product language, not design commentary. No prompt language or AI commentary in the UI.
- Each section gets one job: explain, prove, deepen, or convert.
- Every sentence should earn its place. Default to less copy, not more.
### Module B: Apps & Dashboards (Greenfield)
**Default patterns:**
- Calm surface hierarchy, strong typography and spacing, few colors, dense but readable information, minimal chrome.
- Organize around: primary workspace, navigation, secondary context/inspector, one clear accent for action or state.
- Cards only when the card is the interaction (clickable item, draggable unit, selectable option). If a panel can become plain layout without losing meaning, remove the card treatment.
**Copy (utility, not marketing):**
- Prioritize orientation, status, and action over promise, mood, or brand voice.
- Section headings should say what the area is or what the user can do there. Good: "Plan status", "Search metrics". Bad: "Unlock Your Potential".
- If a sentence could appear in a homepage hero, rewrite it until it sounds like product UI.
- Litmus: if an operator scans only headings, labels, and numbers, can they understand the page immediately?
### Module C: Components & Features (Default in Existing Apps)
For adding to an existing application:
- Match the existing visual language. This module is about making something that belongs, not something that stands out.
- Inherit spacing scale, border radius, color tokens, and typography from surrounding code.
- Focus on interaction quality: clear states (default, hover, active, disabled, loading, error), smooth transitions between states, obvious affordances.
- One new component should not introduce a new design system. If the existing app uses 4px border radius, do not add a component with 8px.
---
## Hard Rules & Anti-Patterns
### Default Against (Overridable)
These are the skill's own opinions. The user can override any of them.
- Generic SaaS card grid as the first impression
- Purple-on-white color schemes, dark-mode bias
- Overused fonts (Inter, Roboto, Arial, Space Grotesk, system defaults) in greenfield work
- Hero sections cluttered with stats, schedules, pill clusters, logo clouds
- Sections that repeat the same mood statement in different words
- Carousel with no narrative purpose
- Multiple competing accent colors
- Decorative gradients or abstract backgrounds standing in for real visual content
- Copy that sounds like design commentary ("Experience the seamless integration")
- Split-screen heroes where text sits on the busy side of an image
### Always Avoid (Quality Floor)
These are genuine quality failures no user would want.
- Prompt language or AI commentary leaking into the UI
- Broken contrast -- text unreadable over images or backgrounds
- Interactive elements without visible focus states
- Semantic div soup when proper HTML elements exist
---
## Litmus Checks
Quick self-review before moving to visual verification. Not all checks apply in every context -- apply judgment about which are relevant.
- Is the brand or product unmistakable in the first screen?
- Is there one strong visual anchor?
- Can the page be understood by scanning headlines only?
- Does each section have one job?
- Are cards actually necessary where they are used?
- Does motion improve hierarchy or atmosphere, or is it just there?
- Would the design feel premium if all decorative shadows were removed?
- Does the copy sound like the product, not like a prompt?
- Does the new work match the existing design system? (Module C)
---
## Visual Verification
After implementing, verify visually. This is a sanity check, not a pixel-perfect review. One pass. If there is a glaring issue, fix it. If it looks solid, move on.
### Tool Preference Cascade
Use the first available option:
1. **Existing project browser tooling** -- if Playwright, Puppeteer, Cypress, or similar is already in the project's dependencies, use it. Do not introduce new dependencies just for verification.
2. **Browser MCP tools** -- if browser automation tools (e.g., claude-in-chrome) are available in the agent's environment, use them.
3. **agent-browser CLI** -- if nothing else is available, this is the default. Load the `agent-browser` skill for installation and usage instructions.
4. **Mental review** -- if no browser access is possible (headless CI, no permissions to install), apply the litmus checks as a self-review and note that visual verification was skipped.
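Part of this cascade can be checked from a shell. The sketch below uses a hypothetical `pick_verifier` function and a simple heuristic: it looks for well-known package names in `package.json`, then for `agent-browser` on the `PATH`. Step 2 (browser MCP tools) is omitted because it depends on the agent's environment and cannot be detected from a shell.

```bash
# Hypothetical helper: suggest a verification route for a project directory.
# Heuristic only: matches quoted package names anywhere in package.json.
pick_verifier() (
  pkg="${1:-.}/package.json"
  if [ -f "$pkg" ] && grep -qE '"(@playwright/test|playwright|puppeteer|cypress)"' "$pkg"; then
    echo "project-tooling"
  elif command -v agent-browser >/dev/null 2>&1; then
    echo "agent-browser"
  else
    echo "mental-review"
  fi
)
```

The result is only a starting point: a project may drive a browser through tooling that this pattern does not recognize.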
### What to Assess
- Does the output match the visual thesis from the pre-build plan?
- Are there obvious visual problems (broken layout, unreadable text, missing images)?
- Does it look like the context module intended (landing page feels like a landing page, dashboard feels like a dashboard, component fits its surroundings)?
### Scope Control
One iteration. Take a screenshot, assess against the litmus checks, fix any glaring issues, and move on. Include the screenshot in the deliverable (PR description, conversation output, etc.).
For iterative refinement beyond a single pass (multiple rounds of screenshot-assess-fix), see the `compound-engineering:design:design-iterator` agent.
---
## Creative Energy
This skill provides structure, but the goal is distinctive work that avoids AI slop -- not formulaic output.
For greenfield work, commit to a bold aesthetic direction. Consider the tone: brutally minimal, maximalist, retro-futuristic, organic/natural, luxury/refined, playful, editorial, brutalist, art deco, soft/pastel, industrial -- or invent something that fits the context. There are endless flavors. Use these for inspiration but design one that is true to the project.
Ask: what makes this unforgettable? What is the one thing someone will remember?
Match implementation complexity to the aesthetic vision. Maximalist designs need elaborate code with extensive animations and effects. Minimalist designs need restraint, precision, and careful attention to spacing, typography, and subtle details. Elegance comes from executing the vision well, not from intensity.


@@ -93,7 +93,7 @@ argument-hint: "[what arguments the command accepts]"
## Tips for Effective Commands
- **Use $ARGUMENTS** placeholder for dynamic inputs
- **Reference CLAUDE.md** patterns and conventions
- **Reference AGENTS.md** patterns and conventions
- **Include verification steps** - tests, linting, visual checks
- **Be explicit about constraints** - don't modify X, use pattern Y
- **Use XML tags** for structured prompts: `<task>`, `<requirements>`, `<constraints>`
@@ -114,7 +114,7 @@ Implement #$ARGUMENTS following these steps:
3. Implement
- Follow existing code patterns (reference specific files)
- Write tests first if doing TDD
- Ensure code follows CLAUDE.md conventions
- Ensure code follows AGENTS.md conventions
4. Verify
- Run tests: `bin/rails test`


@@ -16,6 +16,7 @@ This skill provides a unified interface for managing Git worktrees across your d
- **Interactive confirmations** at each step
- **Automatic .gitignore management** for worktree directory
- **Automatic .env file copying** from main repo to new worktrees
- **Automatic dev tool trusting** for mise and direnv configs with review-safe guardrails
## CRITICAL: Always Use the Manager Script
@@ -23,8 +24,11 @@ This skill provides a unified interface for managing Git worktrees across your d
The script handles critical setup that raw git commands don't:
1. Copies `.env`, `.env.local`, `.env.test`, etc. from main repo
2. Ensures `.worktrees` is in `.gitignore`
3. Creates consistent directory structure
2. Trusts dev tool configs with branch-aware safety rules:
- mise: auto-trust only when unchanged from a trusted baseline branch
- direnv: auto-allow only for trusted base branches; review worktrees stay manual
3. Ensures `.worktrees` is in `.gitignore`
4. Creates consistent directory structure
```bash
# ✅ CORRECT - Always use the script
@@ -95,7 +99,11 @@ bash ${CLAUDE_PLUGIN_ROOT}/skills/git-worktree/scripts/worktree-manager.sh creat
2. Updates the base branch from remote
3. Creates new worktree and branch
4. **Copies all .env files from main repo** (.env, .env.local, .env.test, etc.)
5. Shows path for cd-ing to the worktree
5. **Trusts dev tool configs** with branch-aware safety rules:
   - trusted bases (the default branch, plus `develop`, `dev`, `trunk`, `staging`, `release/*`) compare against themselves
   - other branches compare against the default branch
   - direnv auto-allow is skipped on non-trusted bases because `.envrc` can source unchecked files
6. Shows path for cd-ing to the worktree
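
The branch-aware rules above can be sketched as a small shell function. This is an illustration only, not the manager script's own code: `pick_trust_baseline` is a hypothetical name, and the sketch assumes the repository default branch is passed in as the second argument. It prints the baseline branch followed by whether direnv auto-allow would apply.

```bash
# Hypothetical helper (not part of worktree-manager.sh).
# $1 = branch the worktree is created from, $2 = repository default branch.
# Prints "<baseline-branch> <direnv-auto-allow>".
pick_trust_baseline() {
  local from_branch="$1" default_branch="$2"
  case "$from_branch" in
    "$default_branch"|develop|dev|trunk|staging|release/*)
      # Trusted long-lived branch: compare configs against itself.
      echo "$from_branch true"
      ;;
    *)
      # Review/PR branch: fall back to the default branch, direnv stays manual.
      echo "$default_branch false"
      ;;
  esac
}

pick_trust_baseline develop main        # prints "develop true"
pick_trust_baseline feature-login main  # prints "main false"
```

Keeping the decision in one place means the mise and direnv checks read the same two values and cannot disagree about which branch counts as trusted.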
### `list` or `ls`

View File

@@ -65,6 +65,137 @@ copy_env_files() {
echo -e " ${GREEN}✓ Copied $copied environment file(s)${NC}"
}
# Resolve the repository default branch, falling back to main when origin/HEAD
# is unavailable (for example in single-branch clones).
get_default_branch() {
  local head_ref
  head_ref=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null || true)
  if [[ -n "$head_ref" ]]; then
    echo "${head_ref#refs/remotes/origin/}"
  else
    echo "main"
  fi
}
# Auto-trust is only safe when the worktree is created from a long-lived branch
# the developer already controls. Review/PR branches should fall back to the
# default branch baseline and require manual direnv approval.
is_trusted_base_branch() {
  local branch="$1"
  local default_branch="$2"

  [[ "$branch" == "$default_branch" ]] && return 0

  case "$branch" in
    develop|dev|trunk|staging|release/*)
      return 0
      ;;
    *)
      return 1
      ;;
  esac
}
# Trust development tool configs in a new worktree.
# Worktrees get a new filesystem path that tools like mise and direnv
# have never seen. Without trusting, these tools block with interactive
# prompts or refuse to load configs, which breaks hooks and scripts.
#
# Safety: auto-trusts only configs unchanged from a trusted baseline branch.
# Review/PR branches fall back to the default-branch baseline, and direnv
# auto-allow is limited to trusted base branches because .envrc can source
# additional files that direnv does not validate.
#
# TOCTOU between hash-check and trust is acceptable for local dev use.
trust_dev_tools() {
  local worktree_path="$1"
  local base_ref="$2"
  local allow_direnv_auto="$3"
  local trusted=0
  local skipped_messages=()
  local manual_commands=()
  local f  # loop variable; kept local so it does not leak into the caller

  # mise: trust the first config file found, if it is unchanged from the baseline
  if command -v mise &>/dev/null; then
    for f in .mise.toml mise.toml .tool-versions; do
      if [[ -f "$worktree_path/$f" ]]; then
        if _config_unchanged "$f" "$base_ref" "$worktree_path"; then
          if (cd "$worktree_path" && mise trust "$f" --quiet); then
            trusted=$((trusted + 1))
          else
            echo -e " ${YELLOW}Warning: 'mise trust $f' failed -- run manually in $worktree_path${NC}"
          fi
        else
          skipped_messages+=("mise trust $f (config differs from $base_ref)")
          manual_commands+=("mise trust $f")
        fi
        break
      fi
    done
  fi

  # direnv: allow .envrc
  if command -v direnv &>/dev/null; then
    if [[ -f "$worktree_path/.envrc" ]]; then
      if [[ "$allow_direnv_auto" != "true" ]]; then
        skipped_messages+=("direnv allow (.envrc auto-allow is disabled for non-trusted base branches)")
        manual_commands+=("direnv allow")
      elif _config_unchanged ".envrc" "$base_ref" "$worktree_path"; then
        if (cd "$worktree_path" && direnv allow); then
          trusted=$((trusted + 1))
        else
          echo -e " ${YELLOW}Warning: 'direnv allow' failed -- run manually in $worktree_path${NC}"
        fi
      else
        skipped_messages+=("direnv allow (.envrc differs from $base_ref)")
        manual_commands+=("direnv allow")
      fi
    fi
  fi

  if [[ $trusted -gt 0 ]]; then
    echo -e " ${GREEN}✓ Trusted $trusted dev tool config(s)${NC}"
  fi

  if [[ ${#skipped_messages[@]} -gt 0 ]]; then
    echo -e " ${YELLOW}Skipped auto-trust for config(s) requiring manual review:${NC}"
    for item in "${skipped_messages[@]}"; do
      echo -e " - $item"
    done
    if [[ ${#manual_commands[@]} -gt 0 ]]; then
      local joined
      joined=$(printf ' && %s' "${manual_commands[@]}")
      echo -e " ${BLUE}Review the diff, then run manually: cd $worktree_path${joined}${NC}"
    fi
  fi
}
# Check if a config file is unchanged from the base branch.
# Returns 0 (true) if the file is identical to the base branch version.
# Returns 1 (false) if the file was added or modified by this branch.
#
# Note: rev-parse returns the stored blob hash; hash-object on a path applies
# gitattributes filters. A mismatch causes a false negative (trust skipped),
# which is the safe direction.
_config_unchanged() {
  local file="$1"
  local base_ref="$2"
  local worktree_path="$3"

  # Reject symlinks -- trust only regular files with verifiable content
  [[ -L "$worktree_path/$file" ]] && return 1

  # Get the blob hash directly from git's object database via rev-parse
  local base_hash
  base_hash=$(git rev-parse "$base_ref:$file" 2>/dev/null) || return 1

  local worktree_hash
  worktree_hash=$(git hash-object "$worktree_path/$file") || return 1

  [[ "$base_hash" == "$worktree_hash" ]]
}
# Create a new worktree
create_worktree() {
local branch_name="$1"
@@ -107,6 +238,29 @@ create_worktree() {
  # Copy environment files
  copy_env_files "$worktree_path"

  # Trust dev tool configs (mise, direnv) so hooks and scripts work immediately.
  # Long-lived integration branches can use themselves as the trust baseline,
  # while review/PR branches fall back to the default branch and require manual
  # direnv approval.
  local default_branch
  default_branch=$(get_default_branch)
  local trust_branch="$default_branch"
  local allow_direnv_auto="false"
  if is_trusted_base_branch "$from_branch" "$default_branch"; then
    trust_branch="$from_branch"
    allow_direnv_auto="true"
  fi

  if ! git fetch origin "$trust_branch" --quiet; then
    echo -e " ${YELLOW}Warning: could not fetch origin/$trust_branch -- trust check may use stale data${NC}"
  fi
  # Skip trust entirely if the baseline ref doesn't exist locally.
  if git rev-parse --verify "origin/$trust_branch" &>/dev/null; then
    trust_dev_tools "$worktree_path" "origin/$trust_branch" "$allow_direnv_auto"
  else
    echo -e " ${YELLOW}Skipping dev tool trust -- origin/$trust_branch not found locally${NC}"
  fi

  echo -e "${GREEN}✓ Worktree created successfully!${NC}"
  echo ""
  echo "To switch to this worktree:"
@@ -321,6 +475,15 @@ Environment Files:
- Creates .backup files if destination already exists
- Use 'copy-env' to refresh env files after main repo changes
Dev Tool Trust:
- Trusts mise config (.mise.toml, mise.toml, .tool-versions) and direnv (.envrc)
- Uses trusted base branches directly (the default branch, develop, dev, trunk, staging, release/*)
- Other branches fall back to the default branch as the trust baseline
- direnv auto-allow is skipped on non-trusted base branches; review manually first
- Modified configs are flagged for manual review
- Only runs if the tool is installed and config exists
- Prevents hooks/scripts from hanging on interactive trust prompts
Examples:
worktree-manager.sh create feature-login
worktree-manager.sh create feature-auth develop

View File

@@ -1,143 +0,0 @@
---
name: heal-skill
description: Fix incorrect SKILL.md files when a skill has wrong instructions or outdated API references
argument-hint: "[optional: specific issue to fix]"
allowed-tools: [Read, Edit, Bash(ls:*), Bash(git:*)]
disable-model-invocation: true
---
<objective>
Update a skill's SKILL.md and related files based on corrections discovered during execution.
Analyze the conversation to detect which skill is running, reflect on what went wrong, propose specific fixes, get user approval, then apply changes with optional commit.
</objective>
<context>
Skill detection: !`ls -1 ./skills/*/SKILL.md | head -5`
</context>
<quick_start>
<workflow>
1. **Detect skill** from conversation context (invocation messages, recent SKILL.md references)
2. **Reflect** on what went wrong and how you discovered the fix
3. **Present** proposed changes with before/after diffs
4. **Get approval** before making any edits
5. **Apply** changes and optionally commit
</workflow>
</quick_start>
<process>
<step_1 name="detect_skill">
Identify the skill from conversation context:
- Look for skill invocation messages
- Check which SKILL.md was recently referenced
- Examine current task context
Set: `SKILL_NAME=[skill-name]` and `SKILL_DIR=./skills/$SKILL_NAME`
If unclear, ask the user.
</step_1>
<step_2 name="reflection_and_analysis">
Focus on $ARGUMENTS if provided, otherwise analyze broader context.
Determine:
- **What was wrong**: Quote specific sections from SKILL.md that are incorrect
- **Discovery method**: Context7, error messages, trial and error, documentation lookup
- **Root cause**: Outdated API, incorrect parameters, wrong endpoint, missing context
- **Scope of impact**: Single section or multiple? Related files affected?
- **Proposed fix**: Which files, which sections, before/after for each
</step_2>
<step_3 name="scan_affected_files">
```bash
ls -la $SKILL_DIR/
ls -la $SKILL_DIR/references/ 2>/dev/null
ls -la $SKILL_DIR/scripts/ 2>/dev/null
```
</step_3>
<step_4 name="present_proposed_changes">
Present changes in this format:
```
**Skill being healed:** [skill-name]
**Issue discovered:** [1-2 sentence summary]
**Root cause:** [brief explanation]
**Files to be modified:**
- [ ] SKILL.md
- [ ] references/[file].md
- [ ] scripts/[file].py
**Proposed changes:**
### Change 1: SKILL.md - [Section name]
**Location:** Line [X] in SKILL.md
**Current (incorrect):**
```
[exact text from current file]
```
**Corrected:**
```
[new text]
```
**Reason:** [why this fixes the issue]
[repeat for each change across all files]
**Impact assessment:**
- Affects: [authentication/API endpoints/parameters/examples/etc.]
**Verification:**
These changes will prevent: [specific error that prompted this]
```
</step_4>
<step_5 name="request_approval">
```
Should I apply these changes?
1. Yes, apply and commit all changes
2. Apply but don't commit (let me review first)
3. Revise the changes (I'll provide feedback)
4. Cancel (don't make changes)
Choose (1-4):
```
**Wait for user response. Do not proceed without approval.**
</step_5>
<step_6 name="apply_changes">
Only after approval (option 1 or 2):
1. Use Edit tool for each correction across all files
2. Read back modified sections to verify
3. If option 1, commit with structured message showing what was healed
4. Confirm completion with file list
</step_6>
</process>
<success_criteria>
- Skill correctly detected from conversation context
- All incorrect sections identified with before/after
- User approved changes before application
- All edits applied across SKILL.md and related files
- Changes verified by reading back
- Commit created if user chose option 1
- Completion confirmed with file list
</success_criteria>
<verification>
Before completing:
- Read back each modified section to confirm changes applied
- Ensure cross-file consistency (SKILL.md examples match references/)
- Verify git commit created if option 1 was selected
- Check no unintended files were modified
</verification>

View File

@@ -5,25 +5,27 @@ argument-hint: "[feature description]"
disable-model-invocation: true
---
CRITICAL: You MUST execute every step below IN ORDER. Do NOT skip any step. Do NOT jump ahead to coding or implementation. The plan phase (steps 2-3) MUST be completed and verified BEFORE any work begins. Violating this order produces bad output.
CRITICAL: You MUST execute every step below IN ORDER. Do NOT skip any required step. Do NOT jump ahead to coding or implementation. The plan phase (step 2, and step 3 when warranted) MUST be completed and verified BEFORE any work begins. Violating this order produces bad output.
1. **Optional:** If the `ralph-wiggum` skill is available, run `/ralph-wiggum:ralph-loop "finish all slash commands" --completion-promise "DONE"`. If not available or it fails, skip and continue to step 2 immediately.
1. **Optional:** If the `ralph-loop` skill is available, run `/ralph-loop:ralph-loop "finish all slash commands" --completion-promise "DONE"`. If not available or it fails, skip and continue to step 2 immediately.
2. `/ce:plan $ARGUMENTS`
GATE: STOP. Verify that `/ce:plan` produced a plan file in `docs/plans/`. If no plan file was created, run `/ce:plan $ARGUMENTS` again. Do NOT proceed to step 3 until a written plan exists.
GATE: STOP. Verify that the `ce:plan` workflow produced a plan file in `docs/plans/`. If no plan file was created, run `/ce:plan $ARGUMENTS` again. Do NOT proceed to step 3 until a written plan exists.
3. `/compound-engineering:deepen-plan`
3. **Conditionally** run `/compound-engineering:deepen-plan`
GATE: STOP. Confirm the plan has been deepened and updated. The plan file in `docs/plans/` should now contain additional detail. Do NOT proceed to step 4 without a deepened plan.
Run the `deepen-plan` workflow only if the plan is `Standard` or `Deep`, touches a high-risk area (auth, security, payments, migrations, external APIs, significant rollout concerns), or still has obvious confidence gaps in decisions, sequencing, system-wide impact, risks, or verification.
GATE: STOP. If you ran the `deepen-plan` workflow, confirm the plan was deepened or explicitly judged sufficiently grounded. If you skipped it, briefly note why and proceed to step 4.
4. `/ce:work`
GATE: STOP. Verify that implementation work was performed - files were created or modified beyond the plan. Do NOT proceed to step 5 if no code changes were made.
5. `/ce:review`
5. `/ce:review mode:autofix`
6. `/compound-engineering:resolve_todo_parallel`
6. `/compound-engineering:todo-resolve`
7. `/compound-engineering:test-browser`
@@ -31,4 +33,4 @@ CRITICAL: You MUST execute every step below IN ORDER. Do NOT skip any step. Do N
9. Output `<promise>DONE</promise>` when video is in PR
Start with step 2 now (or step 1 if ralph-wiggum is available). Remember: plan FIRST, then work. Never skip the plan.
Start with step 2 now (or step 1 if ralph-loop is available). Remember: plan FIRST, then work. Never skip the plan.

View File

@@ -1,17 +1,17 @@
---
name: report-bug
name: report-bug-ce
description: Report a bug in the compound-engineering plugin
argument-hint: "[optional: brief description of the bug]"
disable-model-invocation: true
---
# Report a Compounding Engineering Plugin Bug
# Report a Compound Engineering Plugin Bug
Report bugs encountered while using the compound-engineering plugin. This command gathers structured information and creates a GitHub issue for the maintainer.
Report bugs encountered while using the compound-engineering plugin. This skill gathers structured information and creates a GitHub issue for the maintainer.
## Step 1: Gather Bug Information
Use the AskUserQuestion tool to collect the following information:
Ask the user the following questions (using the platform's blocking question tool — e.g., `AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini — or present numbered options and wait for a reply):
**Question 1: Bug Category**
- What type of issue are you experiencing?
@@ -39,18 +39,25 @@ Use the AskUserQuestion tool to collect the following information:
## Step 2: Collect Environment Information
Automatically gather:
Automatically gather environment details. Detect the coding agent platform and collect what is available:
**OS info (all platforms):**
```bash
# Get plugin version
cat ~/.claude/plugins/installed_plugins.json 2>/dev/null | grep -A5 "compound-engineering" | head -10 || echo "Plugin info not found"
# Get Claude Code version
claude --version 2>/dev/null || echo "Claude CLI version unknown"
# Get OS info
uname -a
```
**Plugin version:** Read the plugin manifest or installed plugin metadata. Common locations:
- Claude Code: `~/.claude/plugins/installed_plugins.json`
- Codex: `.codex/plugins/` or project config
- Other platforms: check the platform's plugin registry
**Agent CLI version:** Run the platform's version command:
- Claude Code: `claude --version`
- Codex: `codex --version`
- Other platforms: use the appropriate CLI version flag
If any of these fail, note "unknown" and continue — do not block the report.
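
The "note unknown and continue" rule can be sketched in shell as follows. This is a minimal illustration under stated assumptions: `cli_version` is a hypothetical helper name, and the sketch assumes the agent CLI accepts `--version`, as the `claude` and `codex` entries above do.

```bash
# Hypothetical helper: print a CLI's version, or "unknown" when the CLI is
# not installed or its version command fails. It never returns non-zero,
# so a missing tool cannot block the report.
cli_version() {
  local cli="$1"
  if command -v "$cli" >/dev/null 2>&1; then
    "$cli" --version 2>/dev/null || echo "unknown"
  else
    echo "unknown"
  fi
}

echo "OS: $(uname -a 2>/dev/null || echo unknown)"
echo "Claude Code: $(cli_version claude)"
echo "Codex: $(cli_version codex)"
```

Each lookup degrades independently, so one missing CLI leaves the other environment fields intact.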
## Step 3: Format the Bug Report
Create a well-structured bug report with:
@@ -63,8 +70,9 @@ Create a well-structured bug report with:
## Environment
- **Plugin Version:** [from installed_plugins.json]
- **Claude Code Version:** [from claude --version]
- **Plugin Version:** [from plugin manifest/registry]
- **Agent Platform:** [e.g., Claude Code, Codex, Copilot, Pi, Kilo]
- **Agent Version:** [from CLI version command]
- **OS:** [from uname]
## What Happened
@@ -83,16 +91,14 @@ Create a well-structured bug report with:
## Error Messages
```
[Any error output]
```
## Additional Context
[Any other relevant information]
---
*Reported via `/report-bug` command*
*Reported via `/report-bug-ce` skill*
```
## Step 4: Create GitHub Issue
@@ -125,7 +131,7 @@ After the issue is created:
## Output Format
```
Bug report submitted successfully!
Bug report submitted successfully!
Issue: https://github.com/EveryInc/compound-engineering-plugin/issues/[NUMBER]
Title: [compound-engineering] Bug: [description]
@@ -136,16 +142,16 @@ The maintainer will review your report and respond as soon as possible.
## Error Handling
- If `gh` CLI is not authenticated: Prompt user to run `gh auth login` first
- If issue creation fails: Display the formatted report so user can manually create the issue
- If required information is missing: Re-prompt for that specific field
- If `gh` CLI is not installed or not authenticated: prompt the user to install/authenticate first
- If issue creation fails: display the formatted report so the user can manually create the issue
- If required information is missing: re-prompt for that specific field
## Privacy Notice
This command does NOT collect:
This skill does NOT collect:
- Personal information
- API keys or credentials
- Private code from your projects
- Private code from projects
- File paths beyond basic OS info
Only technical information about the bug is included in the report.

View File

@@ -1,100 +1,194 @@
---
name: reproduce-bug
description: Reproduce and investigate a bug using logs, console inspection, and browser screenshots
argument-hint: "[GitHub issue number]"
disable-model-invocation: true
description: Systematically reproduce and investigate a bug from a GitHub issue. Use when the user provides a GitHub issue number or URL for a bug they want reproduced or investigated.
argument-hint: "[GitHub issue number or URL]"
---
# Reproduce Bug Command
# Reproduce Bug
Look at github issue #$ARGUMENTS and read the issue description and comments.
A framework-agnostic, hypothesis-driven workflow for reproducing and investigating bugs from issue reports. Works across any language, framework, or project type.
## Phase 1: Log Investigation
## Phase 1: Understand the Issue
Run the following agents in parallel to investigate the bug:
Fetch and analyze the bug report to extract structured information before touching the codebase.
1. Task rails-console-explorer(issue_description)
2. Task appsignal-log-investigator(issue_description)
### Fetch the issue
Looking at the codebase, think about where things could go wrong. Identify logging output worth searching for.
If no issue number or URL was provided as an argument, ask the user for one before proceeding (using the platform's question tool -- e.g., `AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini -- or present a prompt and wait for a reply).
Run the agents again to find any logs that could help us reproduce the bug.
Keep running these agents until you have a good idea of what is going on.
## Phase 2: Visual Reproduction with Playwright
If the bug is UI-related or involves user flows, use Playwright to visually reproduce it:
### Step 1: Verify Server is Running
```
mcp__plugin_compound-engineering_pw__browser_navigate({ url: "http://localhost:3000" })
mcp__plugin_compound-engineering_pw__browser_snapshot({})
```bash
gh issue view $ARGUMENTS --json title,body,comments,labels,assignees
```
If server not running, inform user to start `bin/dev`.
If the argument is a URL rather than a number, extract the issue number or pass the URL directly to `gh`.
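
When a bare issue number is needed (for file names or report titles, say), the extraction can be sketched in shell. This is a minimal sketch: `issue_number_from_arg` is a hypothetical helper, and it assumes issue URLs contain an `/issues/<number>` path segment.

```bash
# Hypothetical helper: print the issue number from "123", "#123", or an
# issue URL whose path contains /issues/<number>.
# Returns 1 if no number can be extracted.
issue_number_from_arg() {
  local arg="$1" num
  case "$arg" in
    */issues/*)
      num="${arg##*/issues/}"   # drop everything through the last "/issues/"
      num="${num%%[!0-9]*}"     # keep only the leading digits
      ;;
    '#'*)
      num="${arg#?}"            # strip the leading "#"
      ;;
    *)
      num="$arg"
      ;;
  esac
  case "$num" in
    ''|*[!0-9]*) return 1 ;;    # empty or non-numeric: report failure
    *) printf '%s\n' "$num" ;;
  esac
}

issue_number_from_arg "https://github.com/OWNER/REPO/issues/123#issuecomment-1"  # prints 123
```

The non-zero return on unparseable input gives the caller a clear signal to ask the user for a valid issue reference instead of guessing.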
### Step 2: Navigate to Affected Area
### Extract key details
Based on the issue description, navigate to the relevant page:
Read the issue and comments, then identify:
```
mcp__plugin_compound-engineering_pw__browser_navigate({ url: "http://localhost:3000/[affected_route]" })
mcp__plugin_compound-engineering_pw__browser_snapshot({})
- **Reported symptoms** -- what the user observed (error message, wrong output, visual glitch, crash)
- **Expected behavior** -- what should have happened instead
- **Reproduction steps** -- any steps the reporter provided
- **Environment clues** -- browser, OS, version, user role, data conditions
- **Frequency** -- always reproducible, intermittent, or one-time
If the issue lacks reproduction steps or is ambiguous, note what is missing -- this shapes the investigation strategy.
## Phase 2: Hypothesize
Before running anything, form theories about the root cause. This focuses the investigation and prevents aimless exploration.
### Search for relevant code
Use the native content-search tool (e.g., Grep in Claude Code) to find code paths related to the reported symptoms. Search for:
- Error messages or strings mentioned in the issue
- Feature names, route paths, or UI labels described in the report
- Related model/service/controller names
### Form hypotheses
Based on the issue details and code search results, write down 2-3 plausible hypotheses. Each should identify:
- **What** might be wrong (e.g., "race condition in session refresh", "nil check missing on optional field")
- **Where** in the codebase (specific files and line ranges)
- **Why** it would produce the reported symptoms
Rank hypotheses by likelihood. Start investigating the most likely one first.
## Phase 3: Reproduce
Attempt to trigger the bug. The reproduction strategy depends on the bug type.
### Route A: Test-based reproduction (backend, logic, data bugs)
Write or find an existing test that exercises the suspected code path:
1. Search for existing test files covering the affected code using the native file-search tool (e.g., Glob in Claude Code)
2. Run existing tests to see if any already fail
3. If no test covers the scenario, write a minimal failing test that demonstrates the reported behavior
4. A failing test that matches the reported symptoms confirms the bug
### Route B: Browser-based reproduction (UI, visual, interaction bugs)
Use the `agent-browser` CLI for browser automation. Do not use any alternative browser MCP integration or built-in browser-control tool. See the `agent-browser` skill for setup and detailed CLI usage.
#### Verify server is running
```bash
agent-browser open http://localhost:${PORT:-3000}
agent-browser snapshot -i
```
### Step 3: Capture Screenshots
If the server is not running, ask the user to start their development server and provide the correct port.
Take screenshots at each step of reproducing the bug:
To detect the correct port, check project instruction files (`AGENTS.md`, `CLAUDE.md`) for port references, then `package.json` dev scripts, then `.env` files, falling back to `3000`.
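
The lookup order above can be sketched as a shell function. Treat it as a heuristic under stated assumptions, not part of the skill: `detect_port` is a hypothetical name, and the patterns it searches for (`localhost:<port>` in instruction files, a `--port` flag in `package.json`, a `PORT=` line in `.env` files) are assumed conventions that a given project may not follow.

```bash
# Hypothetical helper: guess the dev server port for the project in $1
# (default: current directory), following the lookup order described above.
detect_port() {
  local dir="${1:-.}" file port=""

  # 1. Instruction files: first "localhost:<port>" mention.
  for file in AGENTS.md CLAUDE.md; do
    if [ -f "$dir/$file" ]; then
      port=$(grep -Eo 'localhost:[0-9]+' "$dir/$file" | head -n 1 | cut -d: -f2) || true
      if [ -n "$port" ]; then echo "$port"; return 0; fi
    fi
  done

  # 2. package.json: first "--port <n>" or "--port=<n>" flag.
  if [ -f "$dir/package.json" ]; then
    port=$(grep -Eo -e '--port[= ][0-9]+' "$dir/package.json" | head -n 1 | grep -Eo '[0-9]+$') || true
    if [ -n "$port" ]; then echo "$port"; return 0; fi
  fi

  # 3. .env files: first PORT=<n> assignment.
  for file in .env .env.local .env.development; do
    if [ -f "$dir/$file" ]; then
      port=$(grep -Eo '^PORT=[0-9]+' "$dir/$file" | head -n 1 | cut -d= -f2) || true
      if [ -n "$port" ]; then echo "$port"; return 0; fi
    fi
  done

  # 4. Fallback.
  echo 3000
}
```

A caller could then run `PORT=$(detect_port)` before `agent-browser open "http://localhost:${PORT}"`, and still ask the user when the guessed port does not respond.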
```
mcp__plugin_compound-engineering_pw__browser_take_screenshot({ filename: "bug-[issue]-step-1.png" })
#### Follow reproduction steps
Navigate to the affected area and execute the steps from the issue:
```bash
agent-browser open "http://localhost:${PORT:-3000}/[affected_route]"
agent-browser snapshot -i
```
### Step 4: Follow User Flow
Use `agent-browser` commands to interact with the page:
- `agent-browser click @ref` -- click elements
- `agent-browser fill @ref "text"` -- fill form fields
- `agent-browser snapshot -i` -- capture current state
- `agent-browser screenshot bug-evidence.png` -- save visual evidence
Reproduce the exact steps from the issue:
#### Capture the bug state
1. **Read the issue's reproduction steps**
2. **Execute each step using Playwright:**
- `browser_click` for clicking elements
- `browser_type` for filling forms
- `browser_snapshot` to see the current state
- `browser_take_screenshot` to capture evidence
When the bug is reproduced:
1. Take a screenshot of the error state
2. Check for console errors: look at browser output and any visible error messages
3. Record the exact sequence of steps that triggered it
3. **Check for console errors:**
```
mcp__plugin_compound-engineering_pw__browser_console_messages({ level: "error" })
```
### Route C: Manual / environment-specific reproduction
### Step 5: Capture Bug State
For bugs that require specific data conditions, user roles, external service state, or cannot be automated:
When you reproduce the bug:
1. Document what conditions are needed
2. Ask the user (using the platform's question tool -- e.g., `AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini -- or present options and wait for a reply) whether they can set up the required conditions
3. Guide them through manual reproduction steps if needed
1. Take a screenshot of the bug state
2. Capture console errors
3. Document the exact steps that triggered it
### If reproduction fails
If the bug cannot be reproduced after trying the most likely hypotheses:
1. Revisit the remaining hypotheses
2. Check if the bug is environment-specific (version, OS, browser, data-dependent)
3. Search the codebase for recent changes to the affected area: `git log --oneline -20 -- [affected_files]`
4. Document what was tried and what conditions might be missing
## Phase 4: Investigate
Dig deeper into the root cause using whatever observability the project offers.
### Check logs and traces
Search for errors, warnings, or unexpected behavior around the time of reproduction. What to check depends on the bug and what the project has available:
- **Application logs** -- search local log output (dev server stdout, log files) for error patterns, stack traces, or warnings using the native content-search tool
- **Error tracking** -- check for related exceptions in the project's error tracker (Sentry, AppSignal, Bugsnag, Datadog, etc.)
- **Browser console** -- for UI bugs, check developer console output for JavaScript errors, failed network requests, or CORS issues
- **Database state** -- if the bug involves data, inspect relevant records for unexpected values, missing associations, or constraint violations
- **Request/response cycle** -- check server logs for the specific request: status codes, params, timing, middleware behavior
### Trace the code path
Starting from the entry point identified in Phase 2, trace the execution path:
1. Read the relevant source files using the native file-read tool
2. Identify where the behavior diverges from expectations
3. Check edge cases: nil/null values, empty collections, boundary conditions, race conditions
4. Look for recent changes that may have introduced the bug: `git log --oneline -10 -- [file]`
## Phase 5: Document Findings
Summarize everything discovered during the investigation.
### Compile the report
Organize findings into:
1. **Root cause** -- what is actually wrong and where (with file paths and line numbers, e.g., `app/services/example_service.rb:42`)
2. **Reproduction steps** -- verified steps to trigger the bug (mark as confirmed or unconfirmed)
3. **Evidence** -- screenshots, test output, log excerpts, console errors
4. **Suggested fix** -- if a fix is apparent, describe it with the specific code changes needed
5. **Open questions** -- anything still unclear or needing further investigation
### Present to user before any external action
Present the full report to the user. Do not post comments to the GitHub issue or take any external action without explicit confirmation.
Ask the user (using the platform's question tool, or present options and wait):
```
mcp__plugin_compound-engineering_pw__browser_take_screenshot({ filename: "bug-[issue]-reproduced.png" })
Investigation complete. How to proceed?
1. Post findings to the issue as a comment
2. Start working on a fix
3. Just review the findings (no external action)
```
## Phase 3: Document Findings
If the user chooses to post to the issue:
**Reference Collection:**
```bash
gh issue comment $ARGUMENTS --body "$(cat <<'EOF'
## Bug Investigation
- [ ] Document all research findings with specific file paths (e.g., `app/services/example_service.rb:42`)
- [ ] Include screenshots showing the bug reproduction
- [ ] List console errors if any
- [ ] Document the exact reproduction steps
**Root Cause:** [summary]
## Phase 4: Report Back
**Reproduction Steps (verified):**
1. [step]
2. [step]
Add a comment to the issue with:
**Relevant Code:** [file:line references]
1. **Findings** - What you discovered about the cause
2. **Reproduction Steps** - Exact steps to reproduce (verified)
3. **Screenshots** - Visual evidence of the bug (upload captured screenshots)
4. **Relevant Code** - File paths and line numbers
5. **Suggested Fix** - If you have one
**Suggested Fix:** [description if applicable]
EOF
)"
```

View File

@@ -12,7 +12,7 @@ Resolve all unresolved PR review comments by spawning parallel agents for each t
## Context Detection
Claude Code automatically detects git context:
Detect git context from the current working directory:
- Current branch and associated PR
- All PR comments and review threads
- Works with any PR by specifying the number
@@ -21,10 +21,10 @@ Claude Code automatically detects git context:
### 1. Analyze
Fetch unresolved review threads using the GraphQL script:
Fetch unresolved review threads using the GraphQL script at [scripts/get-pr-comments](scripts/get-pr-comments):
```bash
bash ${CLAUDE_PLUGIN_ROOT}/skills/resolve-pr-parallel/scripts/get-pr-comments PR_NUMBER
bash scripts/get-pr-comments PR_NUMBER
```
This returns only **unresolved, non-outdated** threads with file paths, line numbers, and comment bodies.
@@ -37,7 +37,7 @@ gh api repos/{owner}/{repo}/pulls/PR_NUMBER/comments
### 2. Plan
Create a TodoWrite list of all unresolved items grouped by type:
Create a task list of all unresolved items grouped by type (e.g., `TaskCreate` in Claude Code, `update_plan` in Codex):
- Code changes requested
- Questions to answer
- Style/convention fixes
@@ -45,23 +45,27 @@ Create a TodoWrite list of all unresolved items grouped by type:
### 3. Implement (PARALLEL)
Spawn a `pr-comment-resolver` agent for each unresolved item in parallel.
Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each unresolved item.
If there are 3 comments, spawn 3 agents:
If there are 3 comments, spawn 3 agents — one per comment. Prefer running all agents in parallel; if the platform does not support parallel dispatch, run them sequentially.
1. Task pr-comment-resolver(comment1)
2. Task pr-comment-resolver(comment2)
3. Task pr-comment-resolver(comment3)
Keep parent-context pressure bounded:
- If there are 1-4 unresolved items, direct parallel returns are fine
- If there are 5+ unresolved items, launch in batches of at most 4 agents at a time
- Require each resolver agent to return a short status summary to the parent: comment/thread handled, files changed, tests run or skipped, any blocker that still needs human attention, and for question-only threads the substantive reply text so the parent can post or verify it
Always run a parallel subagent/Task for each todo item.
If the PR is large enough that even batched short returns are likely to get noisy, use a per-run scratch directory such as `.context/compound-engineering/resolve-pr-parallel/<run-id>/`:
- Have each resolver write a compact artifact for its thread there
- Return only a completion summary to the parent
- Re-read only the artifacts that are needed to resolve threads, answer reviewer questions, or summarize the batch
### 4. Commit & Resolve
- Commit changes with a clear message referencing the PR feedback
- Resolve each thread programmatically:
- Resolve each thread programmatically using [scripts/resolve-pr-thread](scripts/resolve-pr-thread):
```bash
bash ${CLAUDE_PLUGIN_ROOT}/skills/resolve-pr-parallel/scripts/resolve-pr-thread THREAD_ID
bash scripts/resolve-pr-thread THREAD_ID
```
- Push to remote
@@ -71,11 +75,13 @@ bash ${CLAUDE_PLUGIN_ROOT}/skills/resolve-pr-parallel/scripts/resolve-pr-thread
Re-fetch comments to confirm all threads are resolved:
```bash
bash ${CLAUDE_PLUGIN_ROOT}/skills/resolve-pr-parallel/scripts/get-pr-comments PR_NUMBER
bash scripts/get-pr-comments PR_NUMBER
```
Should return an empty array `[]`. If threads remain, repeat from step 1.
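The empty-array check can be scripted. The sketch below assumes the script prints its JSON result on stdout. `threads_resolved` is a hypothetical helper that compares the whitespace-stripped output to `[]`, so it needs no JSON tooling.

```bash
# Sketch: succeed only when stdin is an empty JSON array.
# threads_resolved is a hypothetical helper; it assumes JSON arrives on stdin.
threads_resolved() {
  local compact
  compact="$(tr -d '[:space:]')"   # drop newlines and spaces before comparing
  [ "$compact" = "[]" ]
}

# Usage (PR_NUMBER is a placeholder, as in the command above):
#   if bash scripts/get-pr-comments PR_NUMBER | threads_resolved; then
#     echo "All threads resolved"
#   else
#     echo "Threads remain; repeat from step 1"
#   fi
```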
If a scratch directory was used and the user did not ask to inspect it, clean it up after verification succeeds.
## Scripts
- [scripts/get-pr-comments](scripts/get-pr-comments) - GraphQL query for unresolved review threads

@@ -1,35 +0,0 @@
---
name: resolve_parallel
description: Resolve all TODO comments using parallel processing
argument-hint: "[optional: specific TODO pattern or file]"
disable-model-invocation: true
---
Resolve all TODO comments using parallel processing.
## Workflow
### 1. Analyze
Gather the to-do items from above.
### 2. Plan
Create a TodoWrite list of all unresolved items grouped by type. Make sure to look at dependencies that might occur and prioritize the ones needed by others. For example, if you need to change a name, you must wait to do the others. Output a mermaid flow diagram showing how we can do this. Can we do everything in parallel? Do we need to do one first that leads to others in parallel? I'll put the to-dos in the mermaid diagram flowwise so the agent knows how to proceed in order.
### 3. Implement (PARALLEL)
Spawn a pr-comment-resolver agent for each unresolved item in parallel.
So if there are 3 comments, it will spawn 3 pr-comment-resolver agents in parallel. Like this:
1. Task pr-comment-resolver(comment1)
2. Task pr-comment-resolver(comment2)
3. Task pr-comment-resolver(comment3)
Always run the subagents/Tasks for all Todo items in parallel.
### 4. Commit & Resolve
- Commit changes
- Push to remote

@@ -1,39 +0,0 @@
---
name: resolve_todo_parallel
description: Resolve all pending CLI todos using parallel processing
argument-hint: "[optional: specific todo ID or pattern]"
---
Resolve all TODO comments using parallel processing.
## Workflow
### 1. Analyze
Get all unresolved TODOs from the `todos/*.md` files.
If any todo recommends deleting, removing, or gitignoring files in `docs/plans/` or `docs/solutions/`, skip it and mark it as `wont_fix`. These are compound-engineering pipeline artifacts that are intentional and permanent.
### 2. Plan
Create a TodoWrite list of all unresolved items grouped by type. Make sure to look at dependencies that might occur and prioritize the ones needed by others. For example, if you need to change a name, you must wait to do the others. Output a mermaid flow diagram showing how we can do this. Can we do everything in parallel? Do we need to do one first that leads to others in parallel? I'll put the to-dos in the mermaid diagram flowwise so the agent knows how to proceed in order.
### 3. Implement (PARALLEL)
**IMPORTANT: Do NOT create worktrees per todo item.** A worktree or branch was already set up before this command was invoked (typically by `/ce:work`). If a worktree path was provided in the original prompt, `cd` into it. Otherwise, find the worktree where the working branch is checked out using `git worktree list`. All agents work in that single checkout — never pass `isolation: "worktree"` when spawning agents.
Spawn a pr-comment-resolver agent for each unresolved item in parallel.
So if there are 3 comments, it will spawn 3 pr-comment-resolver agents in parallel. Like this:
1. Task pr-comment-resolver(comment1)
2. Task pr-comment-resolver(comment2)
3. Task pr-comment-resolver(comment3)
Always run the subagents/Tasks for all Todo items in parallel.
### 4. Commit & Resolve
- Commit changes
- Remove the TODO from the file, and mark it as resolved.
- Push to remote

@@ -8,26 +8,20 @@ disable-model-invocation: true
## Interaction Method
If `AskUserQuestion` is available, use it for all prompts below.
Ask the user each question below using the platform's blocking question tool (e.g., `AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). If no structured question tool is available, present each question as a numbered list and wait for a reply before proceeding. For multiSelect questions, accept comma-separated numbers (e.g. `1, 3`). Never skip or auto-configure.
If not, present each question as a numbered list and wait for a reply before proceeding to the next step. For multiSelect questions, accept comma-separated numbers (e.g. `1, 3`). Never skip or auto-configure.
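For the numbered-list fallback, a comma-separated reply such as `1, 3` has to be turned into validated option numbers. The sketch below shows one way to do that. `parse_selection` is a hypothetical helper, and rejecting the whole reply on any invalid entry is an assumption rather than documented behavior.

```bash
# Sketch: parse a comma-separated multiSelect reply into option numbers.
# parse_selection is a hypothetical helper; $1 is the reply, $2 the option count.
parse_selection() {
  local reply="$1" max="$2" n
  for n in $(printf '%s' "$reply" | tr ',' ' '); do
    case "$n" in
      ''|*[!0-9]*)
        echo "invalid selection: $n" >&2
        return 1
        ;;
    esac
    if [ "$n" -lt 1 ] || [ "$n" -gt "$max" ]; then
      echo "out of range: $n" >&2
      return 1
    fi
    printf '%s\n' "$n"   # one accepted option number per line
  done
}

# A reply of "1, 3" against four options selects options 1 and 3.
parse_selection "1, 3" 4
```

On a non-zero return the question would be asked again rather than skipped, in keeping with the "never skip or auto-configure" rule.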
Interactive setup for `compound-engineering.local.md` — configures which agents run during `/ce:review` and `/ce:work`.
Interactive setup for `compound-engineering.local.md` — configures which agents run during `ce:review` and `ce:work`.
## Step 1: Check Existing Config
Read `compound-engineering.local.md` in the project root. If it exists, display current settings summary and use AskUserQuestion:
Read `compound-engineering.local.md` in the project root. If it exists, display current settings and ask:
```
question: "Settings file already exists. What would you like to do?"
header: "Config"
options:
- label: "Reconfigure"
description: "Run the interactive setup again from scratch"
- label: "View current"
description: "Show the file contents, then stop"
- label: "Cancel"
description: "Keep current settings"
Settings file already exists. What would you like to do?
1. Reconfigure - Run the interactive setup again from scratch
2. View current - Show the file contents, then stop
3. Cancel - Keep current settings
```
If "View current": read and display the file, then stop.
@@ -47,16 +41,13 @@ test -f requirements.txt && echo "python" || \
echo "general"
```
Use AskUserQuestion:
Ask:
```
question: "Detected {type} project. How would you like to configure?"
header: "Setup"
options:
- label: "Auto-configure (Recommended)"
description: "Use smart defaults for {type}. Done in one click."
- label: "Customize"
description: "Choose stack, focus areas, and review depth."
Detected {type} project. How would you like to configure?
1. Auto-configure (Recommended) - Use smart defaults for {type}. Done in one click.
2. Customize - Choose stack, focus areas, and review depth.
```
### If Auto-configure → Skip to Step 4 with defaults:
@@ -73,50 +64,35 @@ options:
**a. Stack** — confirm or override:
```
question: "Which stack should we optimize for?"
header: "Stack"
options:
- label: "{detected_type} (Recommended)"
description: "Auto-detected from project files"
- label: "Rails"
description: "Ruby on Rails — adds DHH-style and Rails-specific reviewers"
- label: "Python"
description: "Python — adds Pythonic pattern reviewer"
- label: "TypeScript"
description: "TypeScript — adds type safety reviewer"
Which stack should we optimize for?
1. {detected_type} (Recommended) - Auto-detected from project files
2. Rails - Ruby on Rails, adds DHH-style and Rails-specific reviewers
3. Python - Adds Pythonic pattern reviewer
4. TypeScript - Adds type safety reviewer
```
Only show options that differ from the detected type.
**b. Focus areas** — multiSelect:
**b. Focus areas** — multiSelect (user picks one or more):
```
question: "Which review areas matter most?"
header: "Focus"
multiSelect: true
options:
- label: "Security"
description: "Vulnerability scanning, auth, input validation (security-sentinel)"
- label: "Performance"
description: "N+1 queries, memory leaks, complexity (performance-oracle)"
- label: "Architecture"
description: "Design patterns, SOLID, separation of concerns (architecture-strategist)"
- label: "Code simplicity"
description: "Over-engineering, YAGNI violations (code-simplicity-reviewer)"
Which review areas matter most? (comma-separated, e.g. 1, 3)
1. Security - Vulnerability scanning, auth, input validation (security-sentinel)
2. Performance - N+1 queries, memory leaks, complexity (performance-oracle)
3. Architecture - Design patterns, SOLID, separation of concerns (architecture-strategist)
4. Code simplicity - Over-engineering, YAGNI violations (code-simplicity-reviewer)
```
**c. Depth:**
```
question: "How thorough should reviews be?"
header: "Depth"
options:
- label: "Thorough (Recommended)"
description: "Stack reviewers + all selected focus agents."
- label: "Fast"
description: "Stack reviewers + code simplicity only. Less context, quicker."
- label: "Comprehensive"
description: "All above + git history, data integrity, agent-native checks."
How thorough should reviews be?
1. Thorough (Recommended) - Stack reviewers + all selected focus agents.
2. Fast - Stack reviewers + code simplicity only. Less context, quicker.
3. Comprehensive - All above + git history, data integrity, agent-native checks.
```
## Step 4: Build Agent List and Write File
@@ -151,7 +127,7 @@ plan_review_agents: [{computed plan agent list}]
# Review Context
Add project-specific review instructions here.
These notes are passed to all review agents during /ce:review and /ce:work.
These notes are passed to all review agents during ce:review and ce:work.
Examples:
- "We use Turbo Frames heavily — check for frame-busting issues"

@@ -9,24 +9,31 @@ Swarm-enabled LFG. Run these steps in order, parallelizing where indicated. Do n
## Sequential Phase
1. **Optional:** If the `ralph-wiggum` skill is available, run `/ralph-wiggum:ralph-loop "finish all slash commands" --completion-promise "DONE"`. If not available or it fails, skip and continue to step 2 immediately.
1. **Optional:** If the `ralph-loop` skill is available, run `/ralph-loop:ralph-loop "finish all slash commands" --completion-promise "DONE"`. If not available or it fails, skip and continue to step 2 immediately.
2. `/ce:plan $ARGUMENTS`
3. `/compound-engineering:deepen-plan`
3. **Conditionally** run `/compound-engineering:deepen-plan`
- Run the `deepen-plan` workflow only if the plan is `Standard` or `Deep`, touches a high-risk area (auth, security, payments, migrations, external APIs, significant rollout concerns), or still has obvious confidence gaps in decisions, sequencing, system-wide impact, risks, or verification
- If you run the `deepen-plan` workflow, confirm the plan was deepened or explicitly judged sufficiently grounded before moving on
- If you skip it, note why and continue to step 4
4. `/ce:work` **Use swarm mode**: Make a Task list and launch an army of agent swarm subagents to build the plan
## Parallel Phase
After work completes, launch steps 5 and 6 as **parallel swarm agents** (both only need code to be written):
5. `/ce:review` — spawn as background Task agent
5. `/ce:review mode:report-only` — spawn as background Task agent
6. `/compound-engineering:test-browser` — spawn as background Task agent
Wait for both to complete before continuing.
## Autofix Phase
7. `/ce:review mode:autofix` — run sequentially after the parallel phase so it can safely mutate the checkout, apply `safe_auto` fixes, and emit residual todos for step 8
## Finalize Phase
7. `/compound-engineering:resolve_todo_parallel` — resolve any findings from the review
8. `/compound-engineering:feature-video` — record the final walkthrough and add to PR
9. Output `<promise>DONE</promise>` when video is in PR
8. `/compound-engineering:todo-resolve` — resolve findings, compound on learnings, clean up completed todos
9. `/compound-engineering:feature-video` — record the final walkthrough and add to PR
10. Output `<promise>DONE</promise>` when video is in PR
Start with step 1 now.

@@ -4,56 +4,45 @@ description: Run browser tests on pages affected by current PR or branch
argument-hint: "[PR number, branch name, 'current', or --port PORT]"
---
# Browser Test Command
# Browser Test Skill
<command_purpose>Run end-to-end browser tests on pages affected by a PR or branch changes using agent-browser CLI.</command_purpose>
Run end-to-end browser tests on pages affected by a PR or branch changes using the `agent-browser` CLI.
## CRITICAL: Use agent-browser CLI Only
## Use `agent-browser` Only For Browser Automation
**DO NOT use Chrome MCP tools (mcp__claude-in-chrome__*).**
This workflow uses the `agent-browser` CLI exclusively. Do not use any alternative browser automation system, browser MCP integration, or built-in browser-control tool. If the platform offers multiple ways to control a browser, always choose `agent-browser`.
This command uses the `agent-browser` CLI exclusively. The agent-browser CLI is a Bash-based tool from Vercel that runs headless Chromium. It is NOT the same as Chrome browser automation via MCP.
Use `agent-browser` for: opening pages, clicking elements, filling forms, taking screenshots, and scraping rendered content.
If you find yourself calling `mcp__claude-in-chrome__*` tools, STOP. Use `agent-browser` Bash commands instead.
## Introduction
<role>QA Engineer specializing in browser-based end-to-end testing</role>
This command tests affected pages in a real browser, catching issues that unit tests miss:
- JavaScript integration bugs
- CSS/layout regressions
- User workflow breakages
- Console errors
Platform-specific hints:
- In Claude Code, do not use Chrome MCP tools (`mcp__claude-in-chrome__*`).
- In Codex, do not substitute unrelated browsing tools.
## Prerequisites
<requirements>
- Local development server running (e.g., `bin/dev`, `rails server`, `npm run dev`)
- agent-browser CLI installed (see Setup below)
- `agent-browser` CLI installed (see Setup below)
- Git repository with changes to test
</requirements>
## Setup
**Check installation:**
```bash
command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED"
```
**Install if needed:**
Install if needed:
```bash
npm install -g agent-browser
agent-browser install # Downloads Chromium (~160MB)
agent-browser install
```
See the `agent-browser` skill for detailed usage.
## Main Tasks
## Workflow
### 0. Verify agent-browser Installation
### 1. Verify Installation
Before starting ANY browser testing, verify agent-browser is installed:
Before starting, verify `agent-browser` is available:
```bash
command -v agent-browser >/dev/null 2>&1 && echo "Ready" || (echo "Installing..." && npm install -g agent-browser && agent-browser install)
@@ -61,27 +50,20 @@ command -v agent-browser >/dev/null 2>&1 && echo "Ready" || (echo "Installing...
If installation fails, inform the user and stop.
### 1. Ask Browser Mode
### 2. Ask Browser Mode
<ask_browser_mode>
Ask the user whether to run headed or headless (using the platform's question tool — e.g., `AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini — or present options and wait for a reply):
Before starting tests, ask user if they want to watch the browser:
```
Do you want to watch the browser tests run?
Use AskUserQuestion with:
- Question: "Do you want to watch the browser tests run?"
- Options:
1. **Headed (watch)** - Opens visible browser window so you can see tests run
2. **Headless (faster)** - Runs in background, faster but invisible
1. Headed (watch) - Opens visible browser window so you can see tests run
2. Headless (faster) - Runs in background, faster but invisible
```
Store the choice and use `--headed` flag when user selects "Headed".
Store the choice and use the `--headed` flag when the user selects option 1.
</ask_browser_mode>
### 2. Determine Test Scope
<test_target> $ARGUMENTS </test_target>
<determine_scope>
### 3. Determine Test Scope
**If PR number provided:**
```bash
@@ -98,11 +80,7 @@ git diff --name-only main...HEAD
git diff --name-only main...[branch]
```
</determine_scope>
### 3. Map Files to Routes
<file_to_route_mapping>
### 4. Map Files to Routes
Map changed files to testable routes:
@@ -120,45 +98,23 @@ Map changed files to testable routes:
Build a list of URLs to test based on the mapping.
</file_to_route_mapping>
### 5. Detect Dev Server Port
### 4. Detect Dev Server Port
Determine the dev server port using this priority:
<detect_port>
Determine the dev server port using this priority order:
**Priority 1: Explicit argument**
If the user passed a port number (e.g., `/test-browser 5000` or `/test-browser --port 5000`), use that port directly.
**Priority 2: CLAUDE.md / project instructions**
```bash
# Check CLAUDE.md for port references
grep -Eio '(port\s*[:=]\s*|localhost:)([0-9]{4,5})' CLAUDE.md 2>/dev/null | grep -Eo '[0-9]{4,5}' | head -1
```
**Priority 3: package.json scripts**
```bash
# Check dev/start scripts for --port flags
grep -Eo '\-\-port[= ]+[0-9]{4,5}' package.json 2>/dev/null | grep -Eo '[0-9]{4,5}' | head -1
```
**Priority 4: Environment files**
```bash
# Check .env, .env.local, .env.development for PORT=
grep -h '^PORT=' .env .env.local .env.development 2>/dev/null | tail -1 | cut -d= -f2
```
**Priority 5: Default fallback**
If none of the above yields a port, default to `3000`.
Store the result in a `PORT` variable for use in all subsequent steps.
1. **Explicit argument** — if the user passed `--port 5000`, use that directly
2. **Project instructions** — check `AGENTS.md`, `CLAUDE.md`, or other instruction files for port references
3. **package.json** — check dev/start scripts for `--port` flags
4. **Environment files** — check `.env`, `.env.local`, `.env.development` for `PORT=`
5. **Default** — fall back to `3000`
```bash
# Combined detection (run this)
PORT="${EXPLICIT_PORT:-}"
if [ -z "$PORT" ]; then
PORT=$(grep -Eio '(port\s*[:=]\s*|localhost:)([0-9]{4,5})' CLAUDE.md 2>/dev/null | grep -Eo '[0-9]{4,5}' | head -1)
PORT=$(grep -Eio '(port\s*[:=]\s*|localhost:)([0-9]{4,5})' AGENTS.md 2>/dev/null | grep -Eo '[0-9]{4,5}' | head -1)
if [ -z "$PORT" ]; then
PORT=$(grep -Eio '(port\s*[:=]\s*|localhost:)([0-9]{4,5})' CLAUDE.md 2>/dev/null | grep -Eo '[0-9]{4,5}' | head -1)
fi
fi
if [ -z "$PORT" ]; then
PORT=$(grep -Eo '\-\-port[= ]+[0-9]{4,5}' package.json 2>/dev/null | grep -Eo '[0-9]{4,5}' | head -1)
@@ -170,77 +126,64 @@ PORT="${PORT:-3000}"
echo "Using dev server port: $PORT"
```
</detect_port>
### 5. Verify Server is Running
<check_server>
Before testing, verify the local server is accessible using the detected port:
### 6. Verify Server is Running
```bash
agent-browser open http://localhost:${PORT}
agent-browser snapshot -i
```
If server is not running, inform user:
```markdown
**Server not running on port ${PORT}**
If the server is not running, inform the user:
```
Server not running on port ${PORT}
Please start your development server:
- Rails: `bin/dev` or `rails server`
- Node/Next.js: `npm run dev`
- Custom port: `/test-browser --port <your-port>`
- Custom port: run this skill again with `--port <your-port>`
Then run `/test-browser` again.
Then re-run this skill.
```
</check_server>
### 7. Test Each Affected Page
### 6. Test Each Affected Page
For each affected route:
<test_pages>
For each affected route, use agent-browser CLI commands (NOT Chrome MCP):
**Step 1: Navigate and capture snapshot**
**Navigate and capture snapshot:**
```bash
agent-browser open "http://localhost:${PORT}/[route]"
agent-browser snapshot -i
```
**Step 2: For headed mode (visual debugging)**
**For headed mode:**
```bash
agent-browser --headed open "http://localhost:${PORT}/[route]"
agent-browser --headed snapshot -i
```
**Step 3: Verify key elements**
**Verify key elements:**
- Use `agent-browser snapshot -i` to get interactive elements with refs
- Page title/heading present
- Primary content rendered
- No error messages visible
- Forms have expected fields
**Step 4: Test critical interactions**
**Test critical interactions:**
```bash
agent-browser click @e1 # Use ref from snapshot
agent-browser click @e1
agent-browser snapshot -i
```
**Step 5: Take screenshots**
**Take screenshots:**
```bash
agent-browser screenshot page-name.png
agent-browser screenshot --full page-name-full.png # Full page
agent-browser screenshot --full page-name-full.png
```
</test_pages>
### 8. Human Verification (When Required)
### 7. Human Verification (When Required)
<human_verification>
Pause for human input when testing touches:
Pause for human input when testing touches flows that require external interaction:
| Flow Type | What to Ask |
|-----------|-------------|
@@ -250,11 +193,12 @@ Pause for human input when testing touches:
| SMS | "Verify you received the SMS code" |
| External APIs | "Confirm the [service] integration is working" |
Use AskUserQuestion:
```markdown
**Human Verification Needed**
Ask the user (using the platform's question tool, or present numbered options and wait):
This test touches the [flow type]. Please:
```
Human Verification Needed
This test touches [flow type]. Please:
1. [Action to take]
2. [What to verify]
@@ -263,11 +207,7 @@ Did it work correctly?
2. No - describe the issue
```
</human_verification>
### 8. Handle Failures
<failure_handling>
### 9. Handle Failures
When a test fails:
@@ -275,40 +215,27 @@ When a test fails:
- Screenshot the error state: `agent-browser screenshot error.png`
- Note the exact reproduction steps
2. **Ask user how to proceed:**
```markdown
**Test Failed: [route]**
2. **Ask the user how to proceed:**
```
Test Failed: [route]
Issue: [description]
Console errors: [if any]
How to proceed?
1. Fix now - I'll help debug and fix
2. Create todo - Add to todos/ for later
2. Create todo - Add a todo for later (using the todo-create skill)
3. Skip - Continue testing other pages
```
3. **If "Fix now":**
- Investigate the issue
- Propose a fix
- Apply fix
- Re-run the failing test
3. **If "Fix now":** investigate, propose a fix, apply, re-run the failing test
4. **If "Create todo":** load the `todo-create` skill and create a todo with priority p1 and description `browser-test-{description}`, continue
5. **If "Skip":** log as skipped, continue
4. **If "Create todo":**
- Create `{id}-pending-p1-browser-test-{description}.md`
- Continue testing
### 10. Test Summary
5. **If "Skip":**
- Log as skipped
- Continue testing
</failure_handling>
### 9. Test Summary
<test_summary>
After all tests complete, present summary:
After all tests complete, present a summary:
```markdown
## Browser Test Results
@@ -341,8 +268,6 @@ After all tests complete, present summary:
### Result: [PASS / FAIL / PARTIAL]
```
</test_summary>
## Quick Usage Examples
```bash
@@ -361,8 +286,6 @@ After all tests complete, present summary:
## agent-browser CLI Reference
**ALWAYS use these Bash commands. NEVER use mcp__claude-in-chrome__* tools.**
```bash
# Navigation
agent-browser open <url> # Navigate to URL

@@ -5,167 +5,81 @@ argument-hint: "[scheme name or 'current' to use default]"
disable-model-invocation: true
---
# Xcode Test Command
# Xcode Test Skill
<command_purpose>Build, install, and test iOS apps on the simulator using XcodeBuildMCP. Captures screenshots, logs, and verifies app behavior.</command_purpose>
## Introduction
<role>iOS QA Engineer specializing in simulator-based testing</role>
This command tests iOS/macOS apps by:
- Building for simulator
- Installing and launching the app
- Taking screenshots of key screens
- Capturing console logs for errors
- Supporting human verification for external flows
Build, install, and test iOS apps on the simulator using XcodeBuildMCP. Captures screenshots, logs, and verifies app behavior.
## Prerequisites
<requirements>
- Xcode installed with command-line tools
- XcodeBuildMCP server connected
- XcodeBuildMCP MCP server connected
- Valid Xcode project or workspace
- At least one iOS Simulator available
</requirements>
## Main Tasks
## Workflow
### 0. Verify XcodeBuildMCP is Installed
### 0. Verify XcodeBuildMCP is Available
<check_mcp_installed>
Check that the XcodeBuildMCP MCP server is connected by calling its `list_simulators` tool.
**First, check if XcodeBuildMCP tools are available.**
MCP tool names vary by platform:
- Claude Code: `mcp__xcodebuildmcp__list_simulators`
- Other platforms: use the equivalent MCP tool call for the `XcodeBuildMCP` server's `list_simulators` method
If the tool is not found or errors, inform the user they need to add the XcodeBuildMCP MCP server:
Try calling:
```
mcp__xcodebuildmcp__list_simulators({})
XcodeBuildMCP not installed
Install via Homebrew:
brew tap getsentry/xcodebuildmcp && brew install xcodebuildmcp
Or via npx (no global install needed):
npx -y xcodebuildmcp@latest mcp
Then add "XcodeBuildMCP" as an MCP server in your agent configuration
and restart your agent.
```
**If the tool is not found or errors:**
Tell the user:
```markdown
**XcodeBuildMCP not installed**
Please install the XcodeBuildMCP server first:
\`\`\`bash
claude mcp add XcodeBuildMCP -- npx xcodebuildmcp@latest
\`\`\`
Then restart Claude Code and run `/xcode-test` again.
```
**Do NOT proceed** until XcodeBuildMCP is confirmed working.
</check_mcp_installed>
Do NOT proceed until XcodeBuildMCP is confirmed working.
### 1. Discover Project and Scheme
<discover_project>
Call XcodeBuildMCP's `discover_projs` tool to find available projects, then `list_schemes` with the project path to get available schemes.
**Find available projects:**
```
mcp__xcodebuildmcp__discover_projs({})
```
**List schemes for the project:**
```
mcp__xcodebuildmcp__list_schemes({ project_path: "/path/to/Project.xcodeproj" })
```
**If argument provided:**
- Use the specified scheme name
- Or "current" to use the default/last-used scheme
</discover_project>
If an argument was provided, use that scheme name. If "current", use the default/last-used scheme.
### 2. Boot Simulator
<boot_simulator>
Call `list_simulators` to find available simulators. Boot the preferred simulator (iPhone 15 Pro recommended) using `boot_simulator` with the simulator's UUID.
**List available simulators:**
```
mcp__xcodebuildmcp__list_simulators({})
```
**Boot preferred simulator (iPhone 15 Pro recommended):**
```
mcp__xcodebuildmcp__boot_simulator({ simulator_id: "[uuid]" })
```
**Wait for simulator to be ready:**
Check simulator state before proceeding with installation.
</boot_simulator>
Wait for the simulator to be ready before proceeding.
### 3. Build the App
<build_app>
Call `build_ios_sim_app` with the project path and scheme name.
**Build for iOS Simulator:**
```
mcp__xcodebuildmcp__build_ios_sim_app({
project_path: "/path/to/Project.xcodeproj",
scheme: "[scheme_name]"
})
```
**Handle build failures:**
**On failure:**
- Capture build errors
- Create P1 todo for each build error
- Create a P1 todo for each build error
- Report to user with specific error details
**On success:**
- Note the built app path for installation
- Proceed to installation step
</build_app>
- Proceed to step 4
### 4. Install and Launch
<install_launch>
**Install app on simulator:**
```
mcp__xcodebuildmcp__install_app_on_simulator({
app_path: "/path/to/built/App.app",
simulator_id: "[uuid]"
})
```
**Launch the app:**
```
mcp__xcodebuildmcp__launch_app_on_simulator({
bundle_id: "[app.bundle.id]",
simulator_id: "[uuid]"
})
```
**Start capturing logs:**
```
mcp__xcodebuildmcp__capture_sim_logs({
simulator_id: "[uuid]",
bundle_id: "[app.bundle.id]"
})
```
</install_launch>
1. Call `install_app_on_simulator` with the built app path and simulator UUID
2. Call `launch_app_on_simulator` with the bundle ID and simulator UUID
3. Call `capture_sim_logs` with the simulator UUID and bundle ID to start log capture
### 5. Test Key Screens
<test_screens>
For each key screen in the app:
**Take screenshot:**
```
mcp__xcodebuildmcp__take_screenshot({
simulator_id: "[uuid]",
filename: "screen-[name].png"
})
```
Call `take_screenshot` with the simulator UUID and a descriptive filename (e.g., `screen-home.png`).
**Review screenshot for:**
- UI elements rendered correctly
@@ -174,23 +88,15 @@ mcp__xcodebuildmcp__take_screenshot({
- Layout looks correct
**Check logs for errors:**
```
mcp__xcodebuildmcp__get_sim_logs({ simulator_id: "[uuid]" })
```
Look for:
Call `get_sim_logs` with the simulator UUID. Look for:
- Crashes
- Exceptions
- Error-level log messages
- Failed network requests
</test_screens>
### 6. Human Verification (When Required)
<human_verification>
Pause for human input when testing touches:
Pause for human input when testing touches flows that require device interaction.
| Flow Type | What to Ask |
|-----------|-------------|
@@ -200,9 +106,10 @@ Pause for human input when testing touches:
| Camera/Photos | "Grant permissions and verify camera works" |
| Location | "Allow location access and verify map updates" |
Use AskUserQuestion:
```markdown
**Human Verification Needed**
Ask the user (using the platform's question tool — e.g., `AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini — or present numbered options and wait):
```
Human Verification Needed
This test requires [flow type]. Please:
1. [Action to take on simulator]
@@ -213,12 +120,8 @@ Did it work correctly?
2. No - describe the issue
```
</human_verification>
### 7. Handle Failures
<failure_handling>
When a test fails:
1. **Document the failure:**
@@ -226,60 +129,52 @@ When a test fails:
- Capture console logs
- Note reproduction steps
2. **Ask user how to proceed:**
```markdown
**Test Failed: [screen/feature]**
2. **Ask the user how to proceed:**
```
Test Failed: [screen/feature]
Issue: [description]
Logs: [relevant error messages]
How to proceed?
1. Fix now - I'll help debug and fix
2. Create todo - Add to todos/ for later
2. Create todo - Add a todo for later (using the todo-create skill)
3. Skip - Continue testing other screens
```
3. **If "Fix now":**
- Investigate the issue in code
- Propose a fix
- Rebuild and retest
4. **If "Create todo":**
- Create `{id}-pending-p1-xcode-{description}.md`
- Continue testing
</failure_handling>
3. **If "Fix now":** investigate, propose a fix, rebuild and retest
4. **If "Create todo":** load the `todo-create` skill and create a todo with priority p1 and description `xcode-{description}`, continue
5. **If "Skip":** log as skipped, continue
### 8. Test Summary
<test_summary>
After all tests complete, present summary:
After all tests complete, present a summary:
```markdown
## 📱 Xcode Test Results
## Xcode Test Results
**Project:** [project name]
**Scheme:** [scheme name]
**Simulator:** [simulator name]
### Build: Success / Failed
### Build: Success / Failed
### Screens Tested: [count]
| Screen | Status | Notes |
|--------|--------|-------|
| Launch | Pass | |
| Home | Pass | |
| Settings | Fail | Crash on tap |
| Profile | ⏭️ Skip | Requires login |
| Launch | Pass | |
| Home | Pass | |
| Settings | Fail | Crash on tap |
| Profile | Skip | Requires login |
### Console Errors: [count]
- [List any errors found]
### Human Verifications: [count]
- Sign in with Apple: Confirmed
- Push notifications: Confirmed
- Sign in with Apple: Confirmed
- Push notifications: Confirmed
### Failures: [count]
- Settings screen - crash on navigation
@@ -290,43 +185,26 @@ After all tests complete, present summary:
### Result: [PASS / FAIL / PARTIAL]
```
</test_summary>
### 9. Cleanup
<cleanup>
After testing:
**Stop log capture:**
```
mcp__xcodebuildmcp__stop_log_capture({ simulator_id: "[uuid]" })
```
**Optionally shut down simulator:**
```
mcp__xcodebuildmcp__shutdown_simulator({ simulator_id: "[uuid]" })
```
</cleanup>
1. Call `stop_log_capture` with the simulator UUID
2. Optionally call `shutdown_simulator` with the simulator UUID
## Quick Usage Examples
```bash
# Test with default scheme
/xcode-test
/test-xcode
# Test specific scheme
/xcode-test MyApp-Debug
/test-xcode MyApp-Debug
# Test after making changes
/xcode-test current
/test-xcode current
```
## Integration with /ce:review
## Integration with ce:review
When reviewing PRs that touch iOS code, the `/ce:review` command can spawn this as a subagent:
```
Task general-purpose("Run /xcode-test for scheme [name]. Build, install on simulator, test key screens, check for crashes.")
```
When reviewing PRs that touch iOS code, the `ce:review` workflow can spawn an agent to run this skill, build on the simulator, test key screens, and check for crashes.

@@ -0,0 +1,110 @@
---
name: todo-create
description: Use when creating durable work items, managing todo lifecycle, or tracking findings across sessions in the file-based todo system
disable-model-invocation: true
---
# File-Based Todo Tracking
## Overview
The `.context/compound-engineering/todos/` directory is a file-based tracking system for code review feedback, technical debt, feature requests, and work items. Each todo is a markdown file with YAML frontmatter.
> **Legacy support:** Always check both `.context/compound-engineering/todos/` (canonical) and `todos/` (legacy) when reading. Write new todos only to the canonical path. This directory has a multi-session lifecycle -- do not clean it up as scratch.
## Directory Paths
| Purpose | Path |
|---------|------|
| **Canonical (write here)** | `.context/compound-engineering/todos/` |
| **Legacy (read-only)** | `todos/` |
## File Naming Convention
```
{issue_id}-{status}-{priority}-{description}.md
```
- **issue_id**: Sequential number (001, 002, ...) -- never reused
- **status**: `pending` | `ready` | `complete`
- **priority**: `p1` (critical) | `p2` (important) | `p3` (nice-to-have)
- **description**: kebab-case, brief
**Example:** `002-ready-p1-fix-n-plus-1.md`
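The naming convention above can be checked mechanically. The following is a minimal sketch, not part of the skill itself: the `parse_todo_name` helper and `TodoName` type are hypothetical names, and the regex assumes descriptions are lowercase kebab-case.

```python
import re
from typing import NamedTuple, Optional

# Matches {issue_id}-{status}-{priority}-{description}.md
TODO_NAME = re.compile(
    r"^(?P<issue_id>\d+)"
    r"-(?P<status>pending|ready|complete)"
    r"-(?P<priority>p[123])"
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)\.md$"
)


class TodoName(NamedTuple):
    issue_id: str
    status: str
    priority: str
    description: str


def parse_todo_name(filename: str) -> Optional[TodoName]:
    """Return the parsed parts, or None if the name breaks the convention."""
    match = TODO_NAME.match(filename)
    if match is None:
        return None
    return TodoName(**match.groupdict())
```

For the example filename, `parse_todo_name("002-ready-p1-fix-n-plus-1.md")` yields `issue_id="002"`, `status="ready"`, `priority="p1"`, `description="fix-n-plus-1"`.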
## File Structure
Each todo has YAML frontmatter and structured sections. Use the template at [todo-template.md](./assets/todo-template.md) when creating new todos.
```yaml
---
status: ready
priority: p1
issue_id: "002"
tags: [rails, performance]
dependencies: ["001"] # Issue IDs this is blocked by
---
```
**Required sections:** Problem Statement, Findings, Proposed Solutions, Recommended Action (filled during triage), Acceptance Criteria, Work Log.
**Required for code review findings:** Assessment (Pressure Test) — verify the finding before acting on it.
- **Assessment**: Clear & Correct | Unclear | Likely Incorrect | YAGNI
- **Recommended Action**: Fix now | Clarify | Push back | Skip
- **Verified**: Code, Tests, Usage, Prior Decisions (Yes/No with details)
- **Technical Justification**: Why this finding is valid or should be skipped
**Optional sections:** Technical Details, Resources, Notes.
## Workflows
> **Tool preference:** Use native file-search/glob and content-search tools instead of shell commands for finding and reading todo files. Shell only for operations with no native equivalent (`mv`, `mkdir -p`).
### Creating a New Todo
1. `mkdir -p .context/compound-engineering/todos/`
2. Search both paths for `[0-9]*-*.md`, find the highest numeric prefix, increment, zero-pad to 3 digits.
3. Read [todo-template.md](./assets/todo-template.md), write to canonical path as `{NEXT_ID}-pending-{priority}-{description}.md`.
4. Fill Problem Statement, Findings, Proposed Solutions, Acceptance Criteria, and initial Work Log entry.
5. Set status: `pending` (needs triage) or `ready` (pre-approved).
**Create a todo when** the work needs more than ~15 minutes, has dependencies, requires planning, or needs prioritization. **Act immediately instead** when the fix is trivial, obvious, and self-contained.
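Step 2 of the creation workflow (find the highest prefix across both paths, increment, zero-pad) could look roughly like this. It is a sketch under assumptions: `next_issue_id` is a hypothetical helper, and in an agent session the same search would normally be done with the native glob tool rather than a script.

```python
import re
from pathlib import Path

# Canonical and legacy locations, as listed under "Directory Paths".
TODO_DIRS = (Path(".context/compound-engineering/todos"), Path("todos"))


def next_issue_id(dirs=TODO_DIRS) -> str:
    """Return the next sequential issue ID, zero-padded to 3 digits."""
    highest = 0
    for directory in dirs:
        if not directory.is_dir():
            continue  # a missing legacy directory is fine
        for path in directory.glob("[0-9]*-*.md"):
            match = re.match(r"(\d+)-", path.name)
            if match:
                highest = max(highest, int(match.group(1)))
    return f"{highest + 1:03d}"
```

Scanning both directories matters because IDs are never reused: a legacy `todos/007-...` file must still push the next ID to `008`.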
### Triaging Pending Items
1. Glob `*-pending-*.md` in both paths.
2. Review each todo's Problem Statement, Findings, and Proposed Solutions.
3. Approve: rename `pending` -> `ready` in filename and frontmatter, fill Recommended Action.
4. Defer: leave as `pending`.
Load the `todo-triage` skill for an interactive approval workflow.
### Managing Dependencies
```yaml
dependencies: ["002", "005"] # Blocked by these issues
dependencies: [] # No blockers
```
To check blockers: search for `{dep_id}-complete-*.md` in both paths. A dependency with no match is still incomplete and blocks the todo.
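As an illustration of the blocker check, the sketch below returns the dependency IDs that have no `complete` file in any of the given directories. The function name `incomplete_blockers` is hypothetical, and the directories are passed in explicitly so the canonical and legacy paths can both be supplied.

```python
from pathlib import Path


def incomplete_blockers(dependencies, dirs):
    """Return the dependency IDs with no `{dep_id}-complete-*.md` file."""
    blocked_by = []
    for dep_id in dependencies:
        done = False
        for directory in map(Path, dirs):
            if not directory.is_dir():
                continue
            # Any single match is enough to treat the blocker as complete.
            if next(directory.glob(f"{dep_id}-complete-*.md"), None) is not None:
                done = True
                break
        if not done:
            blocked_by.append(dep_id)
    return blocked_by
```

An empty result means the todo has no outstanding blockers.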
### Completing a Todo
1. Verify all acceptance criteria.
2. Update Work Log with final session.
3. Rename `ready` -> `complete` in filename and frontmatter.
4. Check for unblocked work: search for files containing `dependencies:.*"{issue_id}"`.
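The status change in step 3 touches two places, the filename and the frontmatter, and they are easy to let drift apart. A minimal sketch of doing both together follows; `set_status` is a hypothetical helper and it assumes the frontmatter contains a plain `status: <value>` line as in the example above.

```python
import re
from pathlib import Path


def set_status(path: Path, new_status: str) -> Path:
    """Update the frontmatter status and rename the file to match."""
    text = path.read_text(encoding="utf-8")
    # Replace only the first `status:` line, which sits in the frontmatter.
    updated, count = re.subn(
        r"(?m)^status:.*$", f"status: {new_status}", text, count=1
    )
    if count == 0:
        raise ValueError(f"no status field found in {path.name}")

    # Filename layout: {issue_id}-{status}-{priority}-{description}.md
    issue_id, _old_status, rest = path.name.split("-", 2)
    new_path = path.with_name(f"{issue_id}-{new_status}-{rest}")

    path.write_text(updated, encoding="utf-8")
    path.rename(new_path)
    return new_path
```

The same helper would cover the `pending` -> `ready` transition during triage, since only the status segment changes.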
## Integration with Workflows
| Trigger | Flow |
|---------|------|
| Code review | `/ce:review` -> Findings -> `/todo-triage` -> Todos |
| Autonomous review | `/ce:review mode:autofix` -> Residual todos -> `/todo-resolve` |
| Code TODOs | `/todo-resolve` -> Fixes + Complex todos |
| Planning | Brainstorm -> Create todo -> Work -> Complete |
## Key Distinction
This skill manages **durable, cross-session work items** persisted as markdown files. For temporary in-session step tracking, use platform task tools (`TaskCreate`/`TaskUpdate` in Claude Code, `update_plan` in Codex) instead.

@@ -0,0 +1,70 @@
---
name: todo-resolve
description: Use when batch-resolving approved todos, especially after code review or triage sessions
argument-hint: "[optional: specific todo ID or pattern]"
---
Resolve approved todos using parallel processing, document lessons learned, then clean up.
Only `ready` todos are resolved. `pending` todos are skipped — they haven't been triaged yet. If pending todos exist, list them at the end so the user knows what was left behind.
## Workflow
### 1. Analyze
Scan `.context/compound-engineering/todos/*.md` and legacy `todos/*.md`. Partition by status:
- **`ready`** (status field or `-ready-` in filename): resolve these.
- **`pending`**: skip. Report them at the end.
- **`complete`**: ignore, already done.
If a specific todo ID or pattern was passed as an argument, filter to matching todos only (still must be `ready`).
Residual actionable work from `ce:review mode:autofix` after its `safe_auto` pass will already be `ready`.
Skip any todo that recommends deleting, removing, or gitignoring files in `docs/brainstorms/`, `docs/plans/`, or `docs/solutions/` — these are intentional pipeline artifacts.
### 2. Plan
Create a task list grouped by type (e.g., `TaskCreate` in Claude Code, `update_plan` in Codex). Analyze dependencies -- items that others depend on run first. Output a mermaid diagram showing execution order and parallelism.
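One way to derive the execution order is a topological sort over the `dependencies` fields, grouped into waves that can run in parallel. The sketch below uses Python's standard `graphlib`; the `execution_waves` name and the input shape (todo ID mapped to its dependency IDs) are assumptions for illustration.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group todo IDs into waves; each wave depends only on earlier waves.

    `dependencies` maps a todo ID to the IDs it is blocked by. IDs that are
    not keys of the mapping (for example already-complete todos) are ignored.
    """
    graph = {
        todo: [dep for dep in deps if dep in dependencies]
        for todo, deps in dependencies.items()
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Each inner list is a set of todos with no unresolved blockers among the remaining items, so it maps directly onto one parallel step in the diagram.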
### 3. Implement (PARALLEL)
**Do NOT create worktrees per todo item.** A worktree or branch was already set up before this skill was invoked (typically by `/ce:work`). All agents work in the existing single checkout — never pass `isolation: "worktree"` when spawning agents.
Spawn a `compound-engineering:workflow:pr-comment-resolver` agent per item. Prefer parallel; fall back to sequential respecting dependency order.
**Batching:** With 1-4 items, run all resolvers in parallel and take their results directly. With 5 or more, run batches of 4, with each resolver returning only a short status summary (todo handled, files changed, tests run/skipped, blockers).
For large sets, use a scratch directory at `.context/compound-engineering/todo-resolve/<run-id>/` for per-resolver artifacts. Return only completion summaries to parent.
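The batching rule amounts to chunking the ready list into groups of at most four. A small sketch, with `batch_items` as a hypothetical helper name:

```python
def batch_items(items, size=4):
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

With four or fewer items this returns a single batch, which corresponds to the direct parallel case.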
### 4. Commit & Resolve
Commit changes, mark todos resolved, push to remote.
GATE: STOP. Verify todos resolved and changes committed before proceeding.
### 5. Compound on Lessons Learned
Load the `ce:compound` skill to document what was learned. Todo resolutions often surface patterns and architectural insights worth capturing.
GATE: STOP. Verify the compound skill produced a solution document in `docs/solutions/`. If none (user declined or no learnings), continue.
### 6. Clean Up
Delete completed/resolved todo files from both paths. If a scratch directory was created at `.context/compound-engineering/todo-resolve/<run-id>/`, delete it (unless user asked to inspect).
```
Todos resolved: [count]
Pending (skipped): [count, or "none"]
Lessons documented: [path to solution doc, or "skipped"]
Todos cleaned up: [count deleted]
```
If pending todos were skipped, list them:
```
Skipped pending todos (run /todo-triage to approve):
- 003-pending-p2-missing-index.md
- 005-pending-p3-rename-variable.md
```

@@ -0,0 +1,70 @@
---
name: todo-triage
description: Use when reviewing pending todos for approval, prioritizing code review findings, or interactively categorizing work items
argument-hint: "[findings list or source type]"
disable-model-invocation: true
---
# Todo Triage
Interactive workflow for reviewing pending todos one by one and deciding whether to approve, skip, or modify each.
**Do not write code during triage.** This is purely for review and prioritization -- implementation happens in `/todo-resolve`.
- First set the /model to Haiku
- Read all pending todos from `.context/compound-engineering/todos/` and legacy `todos/` directories
## Workflow
### 1. Present Each Finding
For each pending todo, present it clearly with severity, category, description, location, problem scenario, proposed solution, and effort estimate. Then ask:
```
Do you want to add this to the todo list?
1. yes - approve and mark ready
2. next - skip (deletes the todo file)
3. custom - modify before approving
```
Use severity levels: 🔴 P1 (CRITICAL), 🟡 P2 (IMPORTANT), 🔵 P3 (NICE-TO-HAVE).
Include progress tracking in each header: `Progress: 3/10 completed`
### 2. Handle Decision
**yes:** Rename file from `pending` -> `ready` in both filename and frontmatter. Fill the Recommended Action section. If creating a new todo (not updating existing), use the naming convention from the `todo-create` skill.
Priority mapping: 🔴 P1 -> `p1`, 🟡 P2 -> `p2`, 🔵 P3 -> `p3`
Confirm: "✅ Approved: `{filename}` (Issue #{issue_id}) - Status: **ready**"
**next:** Delete the todo file. Log as skipped for the final summary.
**custom:** Ask what to modify, apply the change, re-present the revised todo, and ask for a decision again.
### 3. Final Summary
After all items processed:
```markdown
## Triage Complete
**Total Items:** [X] | **Approved (ready):** [Y] | **Skipped:** [Z]
### Approved Todos (Ready for Work):
- `042-ready-p1-transaction-boundaries.md` - Transaction boundary issue
### Skipped (Deleted):
- Item #5: [reason]
```
### 4. Next Steps
```markdown
What would you like to do next?
1. run /todo-resolve to resolve the todos
2. commit the todos
3. nothing, go chill
```

@@ -1,311 +0,0 @@
---
name: triage
description: Triage and categorize findings for the CLI todo system
argument-hint: "[findings list or source type]"
disable-model-invocation: true
---
- First set the /model to Haiku
- Then read all pending todos in the todos/ directory
Present all findings, decisions, or issues here one by one for triage. The goal is to go through each item and decide whether to add it to the CLI todo system.
**IMPORTANT: DO NOT CODE ANYTHING DURING TRIAGE!**
This command is for:
- Triaging code review findings
- Processing security audit results
- Reviewing performance analysis
- Handling any other categorized findings that need tracking
## Workflow
### Step 1: Present Each Finding
For each finding, present in this format:
```
---
Issue #X: [Brief Title]
Severity: 🔴 P1 (CRITICAL) / 🟡 P2 (IMPORTANT) / 🔵 P3 (NICE-TO-HAVE)
Category: [Security/Performance/Architecture/Bug/Feature/etc.]
Description:
[Detailed explanation of the issue or improvement]
Location: [file_path:line_number]
Problem Scenario:
[Step by step what's wrong or could happen]
Proposed Solution:
[How to fix it]
Estimated Effort: [Small (< 2 hours) / Medium (2-8 hours) / Large (> 8 hours)]
---
Do you want to add this to the todo list?
1. yes - create todo file
2. next - skip this item
3. custom - modify before creating
```
### Step 2: Handle User Decision
**When user says "yes":**
1. **Update existing todo file** (if it exists) or **Create new filename:**
If todo already exists (from code review):
- Rename file from `{id}-pending-{priority}-{desc}.md` → `{id}-ready-{priority}-{desc}.md`
- Update YAML frontmatter: `status: pending` → `status: ready`
- Keep issue_id, priority, and description unchanged
If creating new todo:
```
{next_id}-ready-{priority}-{brief-description}.md
```
Priority mapping:
- 🔴 P1 (CRITICAL) → `p1`
- 🟡 P2 (IMPORTANT) → `p2`
- 🔵 P3 (NICE-TO-HAVE) → `p3`
Example: `042-ready-p1-transaction-boundaries.md`
2. **Update YAML frontmatter:**
```yaml
---
status: ready # IMPORTANT: Change from "pending" to "ready"
priority: p1 # or p2, p3 based on severity
issue_id: "042"
tags: [category, relevant-tags]
dependencies: []
---
```
3. **Populate or update the file:**
```yaml
# [Issue Title]
## Problem Statement
[Description from finding]
## Findings
- [Key discoveries]
- Location: [file_path:line_number]
- [Scenario details]
## Proposed Solutions
### Option 1: [Primary solution]
- **Pros**: [Benefits]
- **Cons**: [Drawbacks if any]
- **Effort**: [Small/Medium/Large]
- **Risk**: [Low/Medium/High]
## Recommended Action
[Filled during triage - specific action plan]
## Technical Details
- **Affected Files**: [List files]
- **Related Components**: [Components affected]
- **Database Changes**: [Yes/No - describe if yes]
## Resources
- Original finding: [Source of this issue]
- Related issues: [If any]
## Acceptance Criteria
- [ ] [Specific success criteria]
- [ ] Tests pass
- [ ] Code reviewed
## Work Log
### {date} - Approved for Work
**By:** Claude Triage System
**Actions:**
- Issue approved during triage session
- Status changed from pending → ready
- Ready to be picked up and worked on
**Learnings:**
- [Context and insights]
## Notes
Source: Triage session on {date}
```
4. **Confirm approval:** "✅ Approved: `{new_filename}` (Issue #{issue_id}) - Status: **ready** → Ready to work on"
**When user says "next":**
- **Delete the todo file** - Remove it from todos/ directory since it's not relevant
- Skip to the next item
- Track skipped items for summary
**When user says "custom":**
- Ask what to modify (priority, description, details)
- Update the information
- Present revised version
- Ask again: yes/next/custom
### Step 3: Continue Until All Processed
- Process all items one by one
- Track using TodoWrite for visibility
- Don't wait for approval between items - keep moving
### Step 4: Final Summary
After all items processed:
````markdown
## Triage Complete
**Total Items:** [X] **Todos Approved (ready):** [Y] **Skipped:** [Z]
### Approved Todos (Ready for Work):
- `042-ready-p1-transaction-boundaries.md` - Transaction boundary issue
- `043-ready-p2-cache-optimization.md` - Cache performance improvement ...
### Skipped Items (Deleted):
- Item #5: [reason] - Removed from todos/
- Item #12: [reason] - Removed from todos/
### Summary of Changes Made:
During triage, the following status updates occurred:
- **Pending → Ready:** Filenames and frontmatter updated to reflect approved status
- **Deleted:** Todo files for skipped findings removed from todos/ directory
- Each approved file now has `status: ready` in YAML frontmatter
### Next Steps:
1. View approved todos ready for work:
```bash
ls todos/*-ready-*.md
```
````
2. Start work on approved items:
```bash
/resolve_todo_parallel # Work on multiple approved items efficiently
```
3. Or pick individual items to work on
4. As you work, update todo status:
- Ready → In Progress (in your local context as you work)
- In Progress → Complete (rename file: ready → complete, update frontmatter)
```
## Example Response Format
```
---
Issue #5: Missing Transaction Boundaries for Multi-Step Operations
Severity: 🔴 P1 (CRITICAL)
Category: Data Integrity / Security
Description: The google_oauth2_connected callback in GoogleOauthCallbacks concern performs multiple database operations without transaction protection. If any step fails midway, the database is left in an inconsistent state.
Location: app/controllers/concerns/google_oauth_callbacks.rb:13-50
Problem Scenario:
1. User.update succeeds (email changed)
2. Account.save! fails (validation error)
3. Result: User has changed email but no associated Account
4. Next login attempt fails completely
Operations Without Transaction:
- User confirmation (line 13)
- Waitlist removal (line 14)
- User profile update (line 21-23)
- Account creation (line 28-37)
- Avatar attachment (line 39-45)
- Journey creation (line 47)
Proposed Solution: Wrap all operations in ApplicationRecord.transaction do ... end block
Estimated Effort: Small (30 minutes)
---
Do you want to add this to the todo list?
1. yes - create todo file
2. next - skip this item
3. custom - modify before creating
```
## Important Implementation Details
### Status Transitions During Triage
**When "yes" is selected:**
1. Rename file: `{id}-pending-{priority}-{desc}.md` → `{id}-ready-{priority}-{desc}.md`
2. Update YAML frontmatter: `status: pending` → `status: ready`
3. Update Work Log with triage approval entry
4. Confirm: "✅ Approved: `{filename}` (Issue #{issue_id}) - Status: **ready**"
**When "next" is selected:**
1. Delete the todo file from todos/ directory
2. Skip to next item
3. No file remains in the system
### Progress Tracking
Every time you present a todo as a header, include:
- **Progress:** X/Y completed (e.g., "3/10 completed")
- **Estimated time remaining:** Based on how quickly you're progressing
- **Pacing:** Monitor time per finding and adjust estimate accordingly
Example:
```
Progress: 3/10 completed | Estimated time: ~2 minutes remaining
```
### Do Not Code During Triage
- ✅ Present findings
- ✅ Make yes/next/custom decisions
- ✅ Update todo files (rename, frontmatter, work log)
- ❌ Do NOT implement fixes or write code
- ❌ Do NOT add detailed implementation details
- ❌ That's for /resolve_todo_parallel phase
```
When done give these options
```markdown
What would you like to do next?
1. run /resolve_todo_parallel to resolve the todos
2. commit the todos
3. nothing, go chill
```

@@ -1,10 +0,0 @@
---
name: workflows:brainstorm
description: "[DEPRECATED] Use /ce:brainstorm instead — renamed for clarity."
argument-hint: "[feature idea or problem to explore]"
disable-model-invocation: true
---
NOTE: /workflows:brainstorm is deprecated. Please use /ce:brainstorm instead. This alias will be removed in a future version.
/ce:brainstorm $ARGUMENTS

@@ -1,10 +0,0 @@
---
name: workflows:compound
description: "[DEPRECATED] Use /ce:compound instead — renamed for clarity."
argument-hint: "[optional: brief context about the fix]"
disable-model-invocation: true
---
NOTE: /workflows:compound is deprecated. Please use /ce:compound instead. This alias will be removed in a future version.
/ce:compound $ARGUMENTS

@@ -1,10 +0,0 @@
---
name: workflows:plan
description: "[DEPRECATED] Use /ce:plan instead — renamed for clarity."
argument-hint: "[feature description, bug report, or improvement idea]"
disable-model-invocation: true
---
NOTE: /workflows:plan is deprecated. Please use /ce:plan instead. This alias will be removed in a future version.
/ce:plan $ARGUMENTS

@@ -1,10 +0,0 @@
---
name: workflows:review
description: "[DEPRECATED] Use /ce:review instead — renamed for clarity."
argument-hint: "[PR number, GitHub URL, branch name, or latest]"
disable-model-invocation: true
---
NOTE: /workflows:review is deprecated. Please use /ce:review instead. This alias will be removed in a future version.
/ce:review $ARGUMENTS

@@ -1,10 +0,0 @@
---
name: workflows:work
description: "[DEPRECATED] Use /ce:work instead — renamed for clarity."
argument-hint: "[plan file, specification, or todo file path]"
disable-model-invocation: true
---
NOTE: /workflows:work is deprecated. Please use /ce:work instead. This alias will be removed in a future version.
/ce:work $ARGUMENTS