diff --git a/docs/solutions/skill-design/compound-refresh-skill-improvements.md b/docs/solutions/skill-design/compound-refresh-skill-improvements.md index 29a50bf..21f0fab 100644 --- a/docs/solutions/skill-design/compound-refresh-skill-improvements.md +++ b/docs/solutions/skill-design/compound-refresh-skill-improvements.md @@ -29,6 +29,7 @@ The initial `ce:compound-refresh` skill had several design issues discovered dur 3. Broad scope (9+ docs) asked the user to choose an area blindly without providing analysis 4. The Replace flow tried to hand off to `ce:compound`, which expects fresh problem-solving context the user doesn't have months later 5. Subagents used shell commands for file existence checks, triggering permission prompts +6. No way to run the skill unattended (e.g., on a schedule) — every run required user interaction ## Root Cause @@ -39,6 +40,7 @@ Five independent design issues, each with a distinct root cause: 3. **Question before evidence.** The skill prompted scope selection before gathering any information about which areas were most stale or interconnected. 4. **Unsatisfied precondition in cross-skill handoff.** `ce:compound` expects a recently solved problem with fresh context. A maintenance refresh has investigation evidence instead — equivalent data, different shape. 5. **No tool preference guidance for subagents.** Without explicit instruction, subagents defaulted to bash for file operations. +6. **Interactive-only design.** Every phase assumed a user was present. No way to run autonomously for scheduled maintenance or hands-off sweeps. ## Solution @@ -96,6 +98,16 @@ not shell commands. This avoids unnecessary permission prompts and is more reliable across platforms. ``` +### 6. Autonomous mode for scheduled/unattended runs + +Added `mode:autonomous` argument support so the skill can run without user interaction (e.g., on a schedule, in CI, or when the user just wants a hands-off sweep). 
+ +Key design decisions: +- **Explicit opt-in only.** `mode:autonomous` must be in the arguments. Auto-detection based on tool availability was rejected because a user in an interactive agent without a question tool (e.g., Cursor, Windsurf) is still interactive — they just use plain-text replies. +- **Conservative confidence.** Borderline cases that would get a user question in interactive mode get marked stale in autonomous mode. Err toward stale-marking over incorrect action. +- **Detailed report as deliverable.** Since no user was present, the output report includes full rationale for each action so a human can review after the fact. +- **Process everything.** No scope narrowing questions — if no scope hint provided, process all docs. For broad scope, process clusters in impact order without asking. + ## Prevention ### Skill review checklist additions @@ -107,6 +119,7 @@ These five patterns should be checked during any skill review: 3. **No blind user questions** — Every question presented to the user is informed by evidence the agent gathered first 4. **No unsatisfied cross-skill preconditions** — Every skill handoff verifies the target skill's preconditions are met by the calling context 5. **No shell commands for file operations in subagents** — Subagent instructions explicitly prefer dedicated tools over shell commands +6. **Autonomous mode for long-running skills** — Any skill that could run unattended should support an explicit opt-in mode with conservative confidence and detailed reporting ### Key anti-patterns @@ -117,6 +130,7 @@ These five patterns should be checked during any skill review: | "Which area should we review?" 
before any investigation | Triage first, recommend with evidence, let user confirm or redirect | | "Create a successor learning through ce:compound" during a refresh | Replacement subagent writes directly using gathered evidence | | No tool guidance for subagents | "Use dedicated file search and read tools, not shell commands" | +| Auto-detecting "no question tool = headless" | Explicit `mode:autonomous` argument — interactive agents without question tools are still interactive | ## Cross-References diff --git a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md index b552fbc..69b307d 100644 --- a/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md +++ b/plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md @@ -1,7 +1,7 @@ --- name: ce:compound-refresh description: Refresh stale or drifting learnings and pattern docs in docs/solutions/ by reviewing, updating, replacing, or archiving them against the current codebase. Use after refactors, migrations, dependency upgrades, or when a retrieved learning feels outdated or wrong. Also use when reviewing docs/solutions/ for accuracy, when a recently solved problem contradicts an existing learning, or when pattern docs no longer reflect current code. -argument-hint: "[optional: scope hint]" +argument-hint: "[mode:autonomous] [optional: scope hint]" disable-model-invocation: true --- @@ -9,8 +9,28 @@ disable-model-invocation: true Maintain the quality of `docs/solutions/` over time. This workflow reviews existing learnings against the current codebase, then refreshes any derived pattern docs that depend on them. +## Mode Detection + +Check if `$ARGUMENTS` contains `mode:autonomous`. If present, strip it from arguments (use the remainder as a scope hint) and run in **autonomous mode**. 
+ +| Mode | When | Behavior | +|------|------|----------| +| **Interactive** (default) | User is present and can answer questions | Ask for decisions on ambiguous cases, confirm actions | +| **Autonomous** | `mode:autonomous` in arguments | No user interaction. Apply all unambiguous actions (Keep, Update, auto-Archive, Replace with sufficient evidence). Mark ambiguous cases as stale. Generate a summary report at the end. | + +### Autonomous mode rules + +- **Skip all user questions.** Never pause for input. +- **Process all docs in scope.** No scope narrowing questions — if no scope hint was provided, process everything. +- **Apply safe actions directly:** Keep (no-op), Update (fix references), auto-Archive (unambiguous criteria met), Replace (when evidence is sufficient). +- **Mark as stale when uncertain.** If classification is genuinely ambiguous (Update vs Replace vs Archive) or Replace evidence is insufficient, mark as stale with `status: stale`, `stale_reason`, and `stale_date` in the frontmatter. Do not guess. +- **Use conservative confidence.** In interactive mode, borderline cases get a user question. In autonomous mode, borderline cases get marked stale. Err toward stale-marking over incorrect action. +- **Generate a full report.** The output report (see Output Format) lists all actions taken and all items marked stale with reasons, so a human can review the results after the fact. + ## Interaction Principles +**These principles apply to interactive mode only. In autonomous mode, skip all user questions and apply the autonomous mode rules above.** + Follow the same interaction style as `ce:brainstorm`: - Ask questions **one at a time** — use the platform's interactive question tool (e.g. `AskUserQuestion` in Claude Code, `request_user_input` in Codex) and **stop to wait for the answer** before continuing @@ -80,7 +100,7 @@ If `$ARGUMENTS` is provided, use it to narrow scope before proceeding. Try these 3. 
**Filename match** — match against filenames (partial matches are fine) 4. **Content search** — search file contents for the argument as a keyword (useful for feature names or feature areas) -If no matches are found, report that and ask the user to clarify. +If no matches are found, report that and ask the user to clarify. In autonomous mode, report the miss and stop — do not guess at scope. If no candidate docs are found, report: @@ -112,7 +132,7 @@ When scope is broad (9+ candidate docs), do a lightweight triage before deep inv 1. **Inventory** — read frontmatter of all candidate docs, group by module/component/category 2. **Impact clustering** — identify areas with the densest clusters of learnings + pattern docs. A cluster of 5 learnings and 2 patterns covering the same module is higher-impact than 5 isolated single-doc areas, because staleness in one doc is likely to affect the others. 3. **Spot-check drift** — for each cluster, check whether the primary referenced files still exist. Missing references in a high-impact cluster = strongest signal for where to start. -4. **Recommend a starting area** — present the highest-impact cluster with a brief rationale and ask the user to confirm or redirect. +4. **Recommend a starting area** — present the highest-impact cluster with a brief rationale and ask the user to confirm or redirect. In autonomous mode, skip the question and process all clusters in impact order. Example: @@ -186,7 +206,7 @@ There are two subagent roles: 1. **Investigation subagents** — read-only. They must not edit files, create successors, or archive anything. Each returns: file path, evidence, recommended action, confidence, and open questions. These can run in parallel when artifacts are independent. 2. **Replacement subagents** — write a single new learning to replace a stale one. These run **one at a time, sequentially** (each replacement subagent may need to read significant code, and running multiple in parallel risks context exhaustion). 
The orchestrator handles all archival and metadata updates after each replacement completes. -The orchestrator merges investigation results, detects contradictions, asks the user questions, coordinates replacement subagents, and performs all archival/metadata edits centrally. If two artifacts overlap or discuss the same root issue, investigate them together rather than parallelizing. +The orchestrator merges investigation results, detects contradictions, coordinates replacement subagents, and performs all archival/metadata edits centrally. In interactive mode, it asks the user questions on ambiguous cases. In autonomous mode, it marks ambiguous cases as stale instead. If two artifacts overlap or discuss the same root issue, investigate them together rather than parallelizing. ## Phase 2: Classify the Right Maintenance Action @@ -255,6 +275,8 @@ If "archive" feels too strong but the pattern should no longer be elevated, redu ## Phase 3: Ask for Decisions +**In autonomous mode, skip this entire phase.** Apply all unambiguous actions directly and mark ambiguous cases as stale (see autonomous mode rules). + Most Updates should be applied directly without asking. Only ask the user when: - The right action is genuinely ambiguous (Update vs Replace vs Archive) @@ -393,15 +415,27 @@ Updated: Y Replaced: Z Archived: W Skipped: V +Marked stale: S ``` Then list the affected files and what changed. For **Keep** outcomes, list them under a reviewed-without-edits section so the result is visible without creating git churn. +### Autonomous mode output + +In autonomous mode, the report is the primary deliverable since no user was present during execution. 
Include additional detail: + +- For each **Updated** file: what references were fixed and why +- For each **Replaced** file: what the old learning recommended vs what the current code does, and the path to the new successor +- For each **Archived** file: what referenced code/workflow is gone +- For each **Marked stale** file: what evidence was found, what was ambiguous, and what action a human should consider + +This report gives a human reviewer enough context to verify the autonomous run's decisions after the fact. + ## Relationship to ce:compound - `ce:compound` captures a newly solved, verified problem - `ce:compound-refresh` maintains older learnings as the codebase evolves -Use **Replace** only when the refresh process has enough real replacement context to hand off honestly into `ce:compound`. +Use **Replace** only when the refresh process has enough real evidence to write a trustworthy successor. When evidence is insufficient, mark as stale and recommend `ce:compound` for when the user next encounters that problem area.
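
Reviewer note: the Mode Detection step added to `SKILL.md` above (check `$ARGUMENTS` for `mode:autonomous`, strip it, treat the remainder as the scope hint) could be sketched roughly as follows. This is only an illustration under stated assumptions, not how the skill runs it: the skill is prose instructions interpreted by an agent, and the names `parse_arguments`, `ParsedArgs`, and `AUTONOMOUS_FLAG` are hypothetical. It also assumes the flag appears as its own whitespace-separated token, which the diff does not specify.

```python
from typing import NamedTuple

# Flag string taken from the diff; treating it as a standalone token is an assumption.
AUTONOMOUS_FLAG = "mode:autonomous"


class ParsedArgs(NamedTuple):
    """Hypothetical container for the result of mode detection."""
    autonomous: bool
    scope_hint: str


def parse_arguments(raw: str) -> ParsedArgs:
    """Detect the explicit opt-in flag and return the rest as the scope hint."""
    tokens = raw.split()
    # Explicit opt-in only: autonomous mode is never inferred from the environment.
    autonomous = AUTONOMOUS_FLAG in tokens
    # Strip the flag; whatever remains narrows scope (empty means "process all docs").
    scope_hint = " ".join(t for t in tokens if t != AUTONOMOUS_FLAG)
    return ParsedArgs(autonomous, scope_hint)


if __name__ == "__main__":
    print(parse_arguments("mode:autonomous auth module"))
    print(parse_arguments("auth module"))
```

Matching on whole tokens rather than substrings keeps a scope hint that merely mentions the flag text inside a longer word from switching modes by accident, which fits the "explicit opt-in only" decision recorded in the learning doc.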