feat(ce-debug): add systematic debugging skill (#543)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
plugins/compound-engineering/skills/ce-debug/SKILL.md (new file, 191 lines)
@@ -0,0 +1,191 @@
---
name: ce-debug
description: 'Systematically find root causes and fix bugs. Use when debugging errors, investigating test failures, reproducing bugs from issue trackers (GitHub, Linear, Jira), or when stuck on a problem after failed fix attempts. Also use when the user says ''debug this'', ''why is this failing'', ''fix this bug'', ''trace this error'', or pastes stack traces, error messages, or issue references.'
argument-hint: "[issue reference, error message, test path, or description of broken behavior]"
---

# Debug and Fix

Find root causes, then fix them. This skill investigates bugs systematically — tracing the full causal chain before proposing a fix — and optionally implements the fix with test-first discipline.

<bug_description> #$ARGUMENTS </bug_description>

## Core Principles

These principles govern every phase. They are repeated at decision points because they matter most when the pressure to skip them is highest.

1. **Investigate before fixing.** Do not propose a fix until you can explain the full causal chain from trigger to symptom with no gaps. "Somehow X leads to Y" is a gap.
2. **Predictions for uncertain links.** When the causal chain has uncertain or non-obvious links, form a prediction — something in a different code path or scenario that must also be true. If the prediction is wrong but a fix "works," you found a symptom, not the cause. When the chain is obvious (missing import, clear null reference), the chain explanation itself is sufficient.
3. **One change at a time.** Test one hypothesis, change one thing. If you're changing multiple things to "see if it helps," stop — that is shotgun debugging.
4. **When stuck, diagnose why — don't just try harder.**

## Execution Flow

| Phase | Name | Purpose |
|-------|------|---------|
| 0 | Triage | Parse input, fetch issue if referenced, proceed to investigation |
| 1 | Investigate | Reproduce the bug, trace the code path |
| 2 | Root Cause | Form hypotheses with predictions for uncertain links, test them, **causal chain gate**, smart escalation |
| 3 | Fix | Only if the user chose to fix. Test-first fix with workspace safety checks |
| 4 | Close | Structured summary, handoff options |

All phases self-size — a simple bug flows through them in seconds; a complex bug naturally spends more time in each. No complexity classification, no phase skipping.

---

### Phase 0: Triage

Parse the input and reach a clear problem statement.

**If the input references an issue tracker**, fetch it:

- GitHub (`#123`, `org/repo#123`, github.com URL): Parse the issue reference from `<bug_description>` and fetch with `gh issue view <number> --json title,body,comments,labels`. For URLs, pass the URL directly to `gh` (see the example after this list).
- Other trackers (Linear URL/ID, Jira URL/key, any tracker URL): Attempt to fetch using available MCP tools or by fetching the URL content. If the fetch fails — auth, missing tool, non-public page — ask the user to paste the relevant issue content.
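
For example, both accepted forms of the `gh` call (the issue number and URL below are placeholders):

```bash
# By number, when run inside a checkout of the repo
gh issue view 123 --json title,body,comments,labels

# By URL, from anywhere
gh issue view https://github.com/org/repo/issues/123 --json title,body,comments,labels
```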

Extract reported symptoms, expected behavior, reproduction steps, and environment details. Then proceed to Phase 1.

**Everything else** (stack traces, test paths, error messages, descriptions of broken behavior): Proceed directly to Phase 1.

**Questions:**

- Do not ask questions by default — investigate first (read code, run tests, trace errors)
- Only ask when a genuine ambiguity blocks investigation and cannot be resolved by reading code or running tests
- When asking, ask one specific question

**Prior-attempt awareness:** If the user indicates prior failed attempts ("I've been trying", "keeps failing", "stuck"), ask what they have already tried before investigating. This avoids repeating failed approaches and is one of the few cases where asking first is the right call.

---

### Phase 1: Investigate

#### 1.1 Reproduce the bug

Confirm the bug exists and understand its behavior. Run the test, trigger the error, follow reported reproduction steps — whatever matches the input.

- **Browser bugs:** Prefer `agent-browser` if installed. Otherwise use whatever works — MCP browser tools, direct URL testing, screenshot capture, etc.
- **Manual setup required:** If reproduction needs specific conditions the agent cannot create alone (data states, user roles, external services, environment config), document the exact setup steps and guide the user through them. Clear step-by-step instructions save significant time even when the process is fully manual.
- **Does not reproduce after 2-3 attempts:** Read `references/investigation-techniques.md` for intermittent-bug techniques.
- **Cannot reproduce at all in this environment:** Document what was tried and what conditions appear to be missing.

#### 1.2 Trace the code path

Read the relevant source files. Follow the execution path from entry point to where the error manifests. Trace backward through the call chain:

- Start at the error
- Ask "where did this value come from?" and "who called this?"
- Keep going upstream until finding the point where valid state first became invalid
- Do not stop at the first function that looks wrong — the root cause is where bad state originates, not where it is first observed

As you trace:

- Check recent changes in files you are reading: `git log --oneline -10 -- [file]`
- If the bug looks like a regression ("it worked before"), use `git bisect` (see `references/investigation-techniques.md`)
- Check the project's observability tools for additional evidence (a quick sweep is sketched after this list):
  - Error trackers (Sentry, AppSignal, Datadog, BetterStack, Bugsnag)
  - Application logs
  - Browser console output
  - Database state
- Each project has different systems available; use whatever gives a more complete picture
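
As a sketch, an evidence sweep might look like this; the log path and file name are hypothetical stand-ins for the project's own:

```bash
# Recent errors in the application log (path is project-specific)
tail -n 200 log/development.log | grep -iE 'error|warn|exception'

# Recent changes to a suspect file
git log --oneline -10 -- app/services/example_service.rb
```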

---

### Phase 2: Root Cause

*Reminder: investigate before fixing. Do not propose a fix until you can explain the full causal chain from trigger to symptom with no gaps.*

Read `references/anti-patterns.md` before forming hypotheses.

**Form hypotheses** ranked by likelihood. For each, state:

- What is wrong and where (file:line)
- The causal chain: how the trigger leads to the observed symptom, step by step
- **For uncertain links in the chain**: a prediction — something in a different code path or scenario that must also be true if this link is correct

When the causal chain is obvious and has no uncertain links (missing import, clear type error, explicit null dereference), the chain explanation itself is the gate — no prediction required. Predictions are a tool for testing uncertain links, not a ritual for every hypothesis.

Before forming a new hypothesis, review what has already been ruled out and why.

**Causal chain gate:** Do not proceed to Phase 3 until you can explain the full causal chain — from the original trigger through every step to the observed symptom — with no gaps. The user can explicitly authorize proceeding with the best-available hypothesis if investigation is stuck.

*Reminder: if a prediction was wrong but the fix appears to work, you found a symptom. The real cause is still active.*

#### Present findings

Once the root cause is confirmed, present:

- The root cause (causal chain summary with file:line references)
- The proposed fix and which files would change
- Which tests to add or modify to prevent recurrence (specific test file, test case description, what the assertion should verify)
- Whether existing tests should have caught this and why they did not

Then offer next steps (use the platform's question tool — `AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini — or present numbered options and wait):

1. **Fix it now** — proceed to Phase 3
2. **View in Proof** (`/proof`) — for easy review and sharing with others
3. **Rethink the design** (`/ce:brainstorm`) — only when the root cause reveals a design problem (see below)

Do not assume the user wants action right now. The test recommendations are part of the diagnosis regardless of which path is chosen.

**When to suggest brainstorm:** Only when investigation reveals the bug cannot be properly fixed within the current design — the design itself needs to change. Concrete signals observable during debugging:

- **The root cause is a wrong responsibility or interface**, not wrong logic. The module should not be doing this at all, or the boundary between components is in the wrong place. (Observable: the fix requires moving responsibility between modules, not correcting code within one.)
- **The requirements are wrong or incomplete.** The system behaves as designed, but the design does not match what users actually need. The "bug" is really a product gap. (Observable: the code is doing exactly what it was written to do — the spec is the problem.)
- **Every fix is a workaround.** You can patch the symptom, but cannot articulate a clean fix because the surrounding code was built on an assumption that no longer holds. (Observable: you keep wanting to add special cases or flags rather than a direct correction.)

Do not suggest brainstorm for bugs that are large but have a clear fix — size alone does not make something a design problem.

#### Smart escalation

If 2-3 hypotheses are exhausted without confirmation, diagnose why:

| Pattern | Diagnosis | Next move |
|---------|-----------|-----------|
| Hypotheses point to different subsystems | Architecture/design problem, not a localized bug | Present findings, suggest `/ce:brainstorm` |
| Evidence contradicts itself | Wrong mental model of the code | Step back, re-read the code path without assumptions |
| Works locally, fails in CI/prod | Environment problem | Focus on env differences, config, dependencies, timing |
| Fix works but prediction was wrong | Symptom fix, not root cause | The real cause is still active — keep investigating |

Present the diagnosis to the user before proceeding.

---

### Phase 3: Fix

*Reminder: one change at a time. If you are changing multiple things, stop.*

If the user chose Proof or brainstorm at the end of Phase 2, skip this phase — the skill's job was the diagnosis.

**Workspace check:** Before editing files, check for uncommitted changes (`git status`). If the user has unstaged work in files that need modification, confirm before editing — do not overwrite in-progress changes.
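
A minimal sketch of that guard, assuming a POSIX shell; `FILES` is a hypothetical variable holding the paths the fix will touch:

```bash
# Flag files that carry uncommitted work before editing them
for f in $FILES; do
  if [ -n "$(git status --porcelain -- "$f")" ]; then
    echo "Uncommitted changes in $f: confirm with the user before editing."
  fi
done
```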

**Test-first** (a command-level sketch follows this list):

1. Write a failing test that captures the bug (or use the existing failing test)
2. Verify it fails for the right reason — the root cause, not unrelated setup
3. Implement the minimal fix — address the root cause and nothing else
4. Verify the test passes
5. Run the broader test suite for regressions
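
As a sketch, with a Jest-style runner and a hypothetical test path (substitute the project's own runner and file):

```bash
npx jest tests/regression.test.ts   # steps 1-2: the new test must fail, for the right reason
# ...implement the minimal fix...
npx jest tests/regression.test.ts   # step 4: the same test must now pass
npx jest                            # step 5: full suite to catch regressions
```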

**3 failed fix attempts = smart escalation.** Diagnose using the same table from Phase 2. If fixes keep failing, the root-cause identification was likely wrong. Return to Phase 2.

**Conditional defense-in-depth** (trigger: grep for the root-cause pattern found it in other files): Check whether the same gap exists at those locations. Skip when the root cause is a one-off error.
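
A sketch of the trigger check; the search pattern and file glob are hypothetical stand-ins for the real root-cause pattern:

```bash
# Look for other occurrences of the root-cause pattern before declaring the fix complete
grep -rn --include='*.ts' 'UserRepo.create(' src/
```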

**Conditional post-mortem** (trigger: the bug was in production, OR the pattern appears in 3+ locations): How was this introduced? What allowed it to survive? If a systemic gap was found: "This pattern appears in N other files. Want to capture it with `/ce:compound`?"

---

### Phase 4: Close

**Structured summary:**

```
## Debug Summary
**Problem**: [What was broken]
**Root Cause**: [Full causal chain, with file:line references]
**Recommended Tests**: [Tests to add/modify to prevent recurrence, with specific file and assertion guidance]
**Fix**: [What was changed — or "diagnosis only" if Phase 3 was skipped]
**Prevention**: [Test coverage added; defense-in-depth if applicable]
**Confidence**: [High/Medium/Low]
```

**Handoff options** (use the platform's question tool, or present numbered options and wait):

1. Commit the fix (if Phase 3 ran)
2. Document as a learning (`/ce:compound`)
3. Post findings to the issue (if entry came from an issue tracker) — convey: confirmed root cause, verified reproduction steps, relevant code references, and suggested fix direction; keep it concise and useful for whoever picks up the issue next
4. View in Proof (`/proof`) — for easy review and sharing with others
5. Done
@@ -0,0 +1,91 @@
# Debugging Anti-Patterns

Read this before forming hypotheses. These patterns describe the most common ways debugging goes wrong. They feel productive in the moment — that is what makes them dangerous.

---

## Prediction Quality

The prediction requirement exists to prevent symptom-fixing. A prediction tests whether your understanding of the bug is correct, not just whether a fix makes the error go away.

**Bad prediction (restates the hypothesis):**

> Hypothesis: The null pointer is because `user` is not initialized.
> Prediction: `user` will be null when I log it.

This just re-describes the symptom. It cannot be wrong if the hypothesis is right — so it cannot catch a wrong hypothesis.

**Good prediction (tests something non-obvious):**

> Hypothesis: The null pointer is because the auth middleware skips initialization on cached requests.
> Prediction: Non-cached requests to the same endpoint will NOT produce the null pointer, and the `X-Cache` header will be present on failing requests.

This tests a different code path and a different observable. If the prediction is wrong — cached and non-cached requests both fail — the hypothesis is wrong even if "initializing user earlier" happens to fix the immediate error.

**Rule of thumb:** A good prediction names something you have not looked at yet. If confirming the prediction requires only looking at the same line of code you already identified, the prediction is not adding information.

---

## Shotgun Debugging

Changing multiple things at once to "see if it helps."

**How it feels:** Productive. You're making changes, running tests, making progress.

**What actually happens:** If the bug goes away, you do not know which change fixed it. If it persists, you do not know which changes are relevant. You have introduced variables instead of eliminating them.

**The fix:** One hypothesis, one change, one test. If the first change does not fix it, revert it before trying the next. Changes should be additive to understanding, not cumulative to the codebase.
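
One way to keep attempts isolated, sketched with git (assuming the attempt is not yet staged; the stash message is illustrative):

```bash
# Shelve the failed attempt before trying the next hypothesis
git stash push -m "hypothesis 1: did not fix"
# ...make and test the next single change...
git stash drop   # discard the ruled-out attempt once it is no longer needed
```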

---

## Confirmation Bias

Interpreting ambiguous evidence as supporting your current hypothesis.

**How it looks:**

- A log line that *could* support your theory — you treat it as proof
- A test passes after your change — you declare the bug fixed without checking if the test was actually exercising the failure path
- The error message changes slightly — you interpret the change as "getting closer" instead of recognizing a different failure mode

**The defense:** Before declaring a hypothesis confirmed, ask: "What evidence would DISPROVE this hypothesis?" If you cannot name something that would change your mind, you are not testing — you are justifying.

---

## "It Works Now, Move On"

The bug stops appearing after a change. The temptation is to declare victory and move on.

**When this is a trap:** If you cannot explain WHY the change fixed the bug — the full causal chain from your change through the system to the symptom — you may have:

- Fixed a symptom while the root cause remains
- Introduced a change that masks the bug without resolving it
- Gotten lucky with timing (especially for intermittent bugs)

**The test:** Can you explain the fix to someone else without using the words "somehow" or "I think"? If not, the root cause is not confirmed.

---

## Thoughts That Signal You Are About to Shortcut

These feel like reasonable next steps. They are warning signs that investigation is being skipped.

**Proposing a fix before explaining the cause.** If the words "I think we should change..." come before "the root cause is...", pause. The fix might be right, but without a confirmed causal chain there is no way to know. Explain the cause first.

**Reaching for another attempt without new information.** After 2-3 failed hypotheses, trying a 4th without learning something new from the failures is not debugging — it is guessing with increasing frustration. Stop and diagnose why previous hypotheses failed (see smart escalation).

**Certainty without evidence.** The feeling of "I know what this is" before reading the relevant code. Experienced developers have strong pattern-matching instincts, and those instincts are right often enough to be dangerous when wrong. Read the code even when you are confident.

**Minimizing the scope.** "It is probably just..." — the word "just" signals an assumption that the problem is small. Small problems do not resist 2-3 fix attempts. If you are still debugging, it is not "just" anything.

**Treating environmental differences as irrelevant.** When something works in one environment and fails in another, the difference between environments IS the investigation. Do not dismiss it — compare them systematically.

---

## Smart Escalation Patterns

When 2-3 hypotheses have been tested and none confirmed, the problem is not "I need hypothesis #4." The problem is usually one of these:

**Different subsystems keep appearing.** Hypothesis 1 pointed to auth, hypothesis 2 to the database, hypothesis 3 to caching. This scatter pattern means the bug is not in any one subsystem — it is in the interaction between them, or in an architectural assumption that cuts across all of them. This is a design problem, not a localized bug.

**Evidence contradicts itself.** The logs say X happened, but the code makes X impossible. The test fails with error A, but the code path that produces error A is unreachable from the test. When evidence contradicts, the mental model is wrong. Step back. Re-read the code from the entry point without any assumptions about what it does.

**Works locally, fails elsewhere.** The most common causes: environment variables, dependency versions, file system differences (case sensitivity, path separators), timing differences (faster/slower machines), and data differences (test fixtures vs production data). Systematically compare the two environments rather than debugging the code.
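
A minimal sketch of that comparison, assuming a Node project and shell access to both environments; the capture file names are hypothetical, and other stacks would substitute their own version and dependency commands:

```bash
# Capture the same facts in each environment, then diff the captures
{ env | sort; node --version; npm ls --depth=0; } > local-env.txt 2>&1
# ...run the same capture in CI, saved as ci-env.txt, then:
diff local-env.txt ci-env.txt
```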

**Fix works but prediction was wrong.** This is the most dangerous pattern. The bug appears fixed, but the causal chain you identified was incorrect. The real cause is still present and will resurface. Keep investigating — you found a coincidental fix, not the root cause.
@@ -0,0 +1,161 @@
# Investigation Techniques

Techniques for deeper investigation when standard code tracing is not enough. Load this when a bug does not reproduce reliably, involves timing or concurrency, or requires framework-specific tracing.

---

## Root-Cause Tracing

When a bug manifests deep in the call stack, the instinct is to fix where the error appears. That treats a symptom. Instead, trace backward through the call chain to find where the bad state originated.

**Backward tracing:**

- Start at the error
- At each level, ask: where did this value come from? Who called this function? What state was passed in?
- Keep going upstream until finding the point where valid state first became invalid — that is the root cause

**Worked example:**

```
Symptom: API returns 500 with "Cannot read property 'email' of undefined"
Where it crashes: sendWelcomeEmail(user.email) in NotificationService
Who called this? UserController.create() after saving the user record
What was passed? user = await UserRepo.create(params) — but create() returns undefined on duplicate key
Original cause: UserRepo.create() silently swallows duplicate key errors and returns undefined instead of throwing
```

The fix belongs at the origin (UserRepo.create should throw on duplicate key), not where the error appeared (NotificationService).

**When manual tracing stalls**, add instrumentation:

```js
// Log the suspect value and the call stack before the problematic operation
const stack = new Error().stack;
console.error('DEBUG [operation]:', { value, cwd: process.cwd(), stack });
```

Use `console.error()` in tests — logger output may be suppressed. Log before the dangerous operation, not after it fails.

---

## Git Bisect for Regressions

When a bug is a regression ("it worked before"), use binary search to find the breaking commit:

```bash
git bisect start
git bisect bad                    # current commit is broken
git bisect good <known-good-ref>  # a commit where it worked
# git bisect will check out a middle commit — test it,
# mark it good or bad, and repeat until the breaking commit is found
git bisect reset                  # return to the original branch when done
```

For automated bisection with a test script:

```bash
git bisect start HEAD <known-good-ref>
git bisect run <test-command>
```

The test command should exit 0 for good, non-zero for bad.
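
A sketch of such a script; the build and test commands are hypothetical placeholders for the project's own:

```bash
#!/usr/bin/env sh
# bisect-check.sh: exit 0 if this commit is good, non-zero if the bug is present
npm ci >/dev/null 2>&1 || exit 125   # exit code 125 tells bisect to skip unbuildable commits
npm test -- tests/regression.test.ts
```

Run it with `git bisect run ./bisect-check.sh` (after `chmod +x bisect-check.sh`).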

---

## Intermittent Bug Techniques

When a bug does not reproduce reliably after 2-3 attempts:

**Logging traps.** Add targeted logging at the suspected failure point and run the scenario repeatedly. Capture the state that differs between passing and failing runs.

**Statistical reproduction.** Run the failing scenario in a loop to establish a reproduction rate:

```bash
for i in $(seq 1 20); do echo "Run $i:"; <test-command> && echo "PASS" || echo "FAIL"; done
```

A 5% reproduction rate confirms the bug exists but suggests timing or data sensitivity.
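
To make passing and failing runs comparable, a variant that saves each run's output; the log file names are illustrative:

```bash
# Keep per-run logs so a passing and a failing run can be diffed afterwards
for i in $(seq 1 20); do
  <test-command> > "run-$i.log" 2>&1 || echo "FAIL: run $i"
done
```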

**Environment isolation.** Systematically eliminate variables:

- Same test, different machine?
- Same test, different data seed?
- Same test, serial vs parallel execution?
- Same test, with vs without network access?

**Data-dependent triggers.** If the bug only appears with certain data, identify the trigger condition:

- What is unique about the failing input?
- Does the input size, encoding, or edge value matter?
- Is the data order significant (sorted vs random)?

---

## Framework-Specific Debugging

### Rails

- Check callbacks: `before_save`, `after_commit`, `around_action` — these execute implicitly and can alter state
- Check the middleware chain: `rake middleware` lists the full stack
- Check Active Record query generation: `.to_sql` on any relation
- Use `Rails.logger.debug` with tagged logging for request tracing

### Node.js

- Async stack traces: run with the `--async-stack-traces` flag for full async call chains
- Unhandled rejections: check for missing `.catch()` or `await` on promises
- Event loop delays: `process.hrtime()` before and after suspect operations
- Memory leaks: the `--inspect` flag plus Chrome DevTools heap snapshots

### Python

- Traceback enrichment: `traceback.print_exc()` in except blocks
- `pdb.set_trace()` or `breakpoint()` for interactive debugging
- `sys.settrace()` for execution tracing
- `logging.basicConfig(level=logging.DEBUG)` for verbose output

---

## Race Condition Investigation

When timing or concurrency is suspected:

**Timing isolation.** Add deliberate delays at suspect points to widen the race window and make it reproducible:

```js
// Simulate a slow operation to expose the race
await new Promise(r => setTimeout(r, 100));
```

**Shared mutable state.** Search for variables, caches, or database rows accessed by multiple threads or processes without synchronization. Common patterns:

- Global or module-level mutable state
- Cache reads without locks
- Database rows read then updated without optimistic locking

**Async ordering.** Check whether operations assume a specific execution order that is not guaranteed:

- `Promise.all` with dependent operations
- Event handlers that assume emission order
- Database writes that assume read consistency

---

## Browser Debugging

When investigating UI bugs with `agent-browser` or equivalent tools:

```bash
# Open the affected page
agent-browser open http://localhost:${PORT:-3000}/affected/route

# Capture current state
agent-browser snapshot -i

# Interact with the page
agent-browser click @ref         # click an element
agent-browser fill @ref "text"   # fill a form field
agent-browser snapshot -i        # capture state after interaction

# Save visual evidence
agent-browser screenshot bug-evidence.png
```

**Port detection:** Check project instruction files (`AGENTS.md`, `CLAUDE.md`) for port references, then `package.json` dev scripts, then `.env` files, falling back to `3000`.
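
A sketch of that lookup order in shell; the grep patterns are illustrative and will need adjusting per project:

```bash
# Try each source in order, falling back to 3000
PORT=$(grep -hoE 'localhost:[0-9]+' AGENTS.md CLAUDE.md 2>/dev/null | head -n1 | cut -d: -f2)
[ -z "$PORT" ] && PORT=$(grep -oE -e '--port[= ][0-9]+' package.json 2>/dev/null | grep -oE '[0-9]+' | head -n1)
[ -z "$PORT" ] && PORT=$(grep -E '^PORT=' .env 2>/dev/null | cut -d= -f2)
PORT=${PORT:-3000}
```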

**Console errors:** Check browser console output for JavaScript errors, failed network requests, and CORS issues. These often reveal the root cause of UI bugs before any code tracing is needed.

**Network tab:** Check for failed API requests, unexpected response codes, or missing CORS headers. A 422 or 500 response from the backend narrows the investigation immediately.
@@ -1,194 +0,0 @@
---
name: reproduce-bug
description: Systematically reproduce and investigate a bug from a GitHub issue. Use when the user provides a GitHub issue number or URL for a bug they want reproduced or investigated.
argument-hint: "[GitHub issue number or URL]"
---

# Reproduce Bug

A framework-agnostic, hypothesis-driven workflow for reproducing and investigating bugs from issue reports. Works across any language, framework, or project type.

## Phase 1: Understand the Issue

Fetch and analyze the bug report to extract structured information before touching the codebase.

### Fetch the issue

If no issue number or URL was provided as an argument, ask the user for one before proceeding (using the platform's question tool -- e.g., `AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini -- or present a prompt and wait for a reply).

```bash
gh issue view $ARGUMENTS --json title,body,comments,labels,assignees
```

If the argument is a URL rather than a number, extract the issue number or pass the URL directly to `gh`.

### Extract key details

Read the issue and comments, then identify:

- **Reported symptoms** -- what the user observed (error message, wrong output, visual glitch, crash)
- **Expected behavior** -- what should have happened instead
- **Reproduction steps** -- any steps the reporter provided
- **Environment clues** -- browser, OS, version, user role, data conditions
- **Frequency** -- always reproducible, intermittent, or one-time

If the issue lacks reproduction steps or is ambiguous, note what is missing -- this shapes the investigation strategy.

## Phase 2: Hypothesize

Before running anything, form theories about the root cause. This focuses the investigation and prevents aimless exploration.

### Search for relevant code

Use the native content-search tool (e.g., Grep in Claude Code) to find code paths related to the reported symptoms. Search for:

- Error messages or strings mentioned in the issue
- Feature names, route paths, or UI labels described in the report
- Related model/service/controller names

### Form hypotheses

Based on the issue details and code search results, write down 2-3 plausible hypotheses. Each should identify:

- **What** might be wrong (e.g., "race condition in session refresh", "nil check missing on optional field")
- **Where** in the codebase (specific files and line ranges)
- **Why** it would produce the reported symptoms

Rank hypotheses by likelihood. Start investigating the most likely one first.

## Phase 3: Reproduce

Attempt to trigger the bug. The reproduction strategy depends on the bug type.

### Route A: Test-based reproduction (backend, logic, data bugs)

Write or find an existing test that exercises the suspected code path:

1. Search for existing test files covering the affected code using the native file-search tool (e.g., Glob in Claude Code)
2. Run existing tests to see if any already fail
3. If no test covers the scenario, write a minimal failing test that demonstrates the reported behavior
4. A failing test that matches the reported symptoms confirms the bug

### Route B: Browser-based reproduction (UI, visual, interaction bugs)

Use the `agent-browser` CLI for browser automation. Do not use any alternative browser MCP integration or built-in browser-control tool. See the `agent-browser` skill for setup and detailed CLI usage.

#### Verify server is running

```bash
agent-browser open http://localhost:${PORT:-3000}
agent-browser snapshot -i
```

If the server is not running, ask the user to start their development server and provide the correct port.

To detect the correct port, check project instruction files (`AGENTS.md`, `CLAUDE.md`) for port references, then `package.json` dev scripts, then `.env` files, falling back to `3000`.

#### Follow reproduction steps

Navigate to the affected area and execute the steps from the issue:

```bash
agent-browser open "http://localhost:${PORT}/[affected_route]"
agent-browser snapshot -i
```

Use `agent-browser` commands to interact with the page:

- `agent-browser click @ref` -- click elements
- `agent-browser fill @ref "text"` -- fill form fields
- `agent-browser snapshot -i` -- capture current state
- `agent-browser screenshot bug-evidence.png` -- save visual evidence

#### Capture the bug state

When the bug is reproduced:

1. Take a screenshot of the error state
2. Check for console errors: look at browser output and any visible error messages
3. Record the exact sequence of steps that triggered it

### Route C: Manual / environment-specific reproduction

For bugs that require specific data conditions, user roles, external service state, or cannot be automated:

1. Document what conditions are needed
2. Ask the user (using the platform's question tool -- e.g., `AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini -- or present options and wait for a reply) whether they can set up the required conditions
3. Guide them through manual reproduction steps if needed

### If reproduction fails

If the bug cannot be reproduced after trying the most likely hypotheses:

1. Revisit the remaining hypotheses
2. Check if the bug is environment-specific (version, OS, browser, data-dependent)
3. Search the codebase for recent changes to the affected area: `git log --oneline -20 -- [affected_files]`
4. Document what was tried and what conditions might be missing

## Phase 4: Investigate

Dig deeper into the root cause using whatever observability the project offers.

### Check logs and traces

Search for errors, warnings, or unexpected behavior around the time of reproduction. What to check depends on the bug and what the project has available:

- **Application logs** -- search local log output (dev server stdout, log files) for error patterns, stack traces, or warnings using the native content-search tool
- **Error tracking** -- check for related exceptions in the project's error tracker (Sentry, AppSignal, Bugsnag, Datadog, etc.)
- **Browser console** -- for UI bugs, check developer console output for JavaScript errors, failed network requests, or CORS issues
- **Database state** -- if the bug involves data, inspect relevant records for unexpected values, missing associations, or constraint violations
- **Request/response cycle** -- check server logs for the specific request: status codes, params, timing, middleware behavior

### Trace the code path

Starting from the entry point identified in Phase 2, trace the execution path:

1. Read the relevant source files using the native file-read tool
2. Identify where the behavior diverges from expectations
3. Check edge cases: nil/null values, empty collections, boundary conditions, race conditions
4. Look for recent changes that may have introduced the bug: `git log --oneline -10 -- [file]`

## Phase 5: Document Findings

Summarize everything discovered during the investigation.

### Compile the report

Organize findings into:

1. **Root cause** -- what is actually wrong and where (with file paths and line numbers, e.g., `app/services/example_service.rb:42`)
2. **Reproduction steps** -- verified steps to trigger the bug (mark as confirmed or unconfirmed)
3. **Evidence** -- screenshots, test output, log excerpts, console errors
4. **Suggested fix** -- if a fix is apparent, describe it with the specific code changes needed
5. **Open questions** -- anything still unclear or needing further investigation

### Present to user before any external action

Present the full report to the user. Do not post comments to the GitHub issue or take any external action without explicit confirmation.

Ask the user (using the platform's question tool, or present options and wait):

```
Investigation complete. How to proceed?

1. Post findings to the issue as a comment
2. Start working on a fix
3. Just review the findings (no external action)
```

If the user chooses to post to the issue:

```bash
gh issue comment $ARGUMENTS --body "$(cat <<'EOF'
## Bug Investigation

**Root Cause:** [summary]

**Reproduction Steps (verified):**
1. [step]
2. [step]

**Relevant Code:** [file:line references]

**Suggested Fix:** [description if applicable]
EOF
)"
```