feat(ce-debug): add systematic debugging skill (#543)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Debugging Anti-Patterns
Read this before forming hypotheses. These patterns describe the most common ways debugging goes wrong. They feel productive in the moment — that is what makes them dangerous.
---
## Prediction Quality
The prediction requirement exists to prevent symptom-fixing. A prediction tests whether your understanding of the bug is correct, not just whether a fix makes the error go away.
**Bad prediction (restates the hypothesis):**
> Hypothesis: The null pointer is because `user` is not initialized.
> Prediction: `user` will be null when I log it.
This just re-describes the symptom. It cannot be wrong if the hypothesis is right — so it cannot catch a wrong hypothesis.
**Good prediction (tests something non-obvious):**
> Hypothesis: The null pointer is because the auth middleware skips initialization on cached requests.
> Prediction: Non-cached requests to the same endpoint will NOT produce the null pointer, and the `X-Cache` header will be present on failing requests.
This tests a different code path and a different observable. If the prediction is wrong — cached and non-cached requests both fail — the hypothesis is wrong even if "initializing user earlier" happens to fix the immediate error.
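
One way to exercise such a prediction from the shell. This is a hedged sketch: the URL, port, and cache-bypass mechanism are hypothetical and application-specific.

```bash
# Does X-Cache appear only on failing requests? (endpoint is a placeholder)
curl -si http://localhost:3000/api/me | grep -i '^x-cache'

# Re-request with the cache bypassed; the right mechanism varies by app,
# a request header is one common option
curl -si -H 'Cache-Control: no-cache' http://localhost:3000/api/me | grep -i '^x-cache'
```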
**Rule of thumb:** A good prediction names something you have not looked at yet. If confirming the prediction requires only looking at the same line of code you already identified, the prediction is not adding information.
---
## Shotgun Debugging
Changing multiple things at once to "see if it helps."
**How it feels:** Productive. You are making changes, running tests, making progress.
**What actually happens:** If the bug goes away, you do not know which change fixed it. If it persists, you do not know which changes are relevant. You have introduced variables instead of eliminating them.
**The fix:** One hypothesis, one change, one test. If the first change does not fix it, revert it before trying the next. Changes should be additive to understanding, not cumulative to the codebase.
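
In git terms, that discipline might look like the following sketch; the test command is a placeholder for the project's own.

```bash
# Apply exactly one change, record it, test it, revert it if it did not help
git diff > hypothesis-1.patch   # keep a record of what this attempt changed
npm test                        # placeholder for the project's test command
git checkout -- .               # clean revert before forming hypothesis 2
```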
---
## Confirmation Bias
Interpreting ambiguous evidence as supporting your current hypothesis.
**How it looks:**
- A log line that *could* support your theory — you treat it as proof
- A test passes after your change — you declare the bug fixed without checking if the test was actually exercising the failure path
- The error message changes slightly — you interpret the change as "getting closer" instead of recognizing a different failure mode
**The defense:** Before declaring a hypothesis confirmed, ask: "What evidence would DISPROVE this hypothesis?" If you cannot name something that would change your mind, you are not testing — you are justifying.
---
## "It Works Now, Move On"
The bug stops appearing after a change. The temptation is to declare victory and move on.
**When this is a trap:** If you cannot explain WHY the change fixed the bug — the full causal chain from your change through the system to the symptom — you may have:
- Fixed a symptom while the root cause remains
- Introduced a change that masks the bug without resolving it
- Gotten lucky with timing (especially for intermittent bugs)
**The test:** Can you explain the fix to someone else without using the words "somehow" or "I think"? If not, the root cause is not confirmed.
---
## Thoughts That Signal You Are About to Shortcut
These feel like reasonable next steps. They are warning signs that investigation is being skipped.
**Proposing a fix before explaining the cause.** If the words "I think we should change..." come before "the root cause is...", pause. The fix might be right, but without a confirmed causal chain there is no way to know. Explain the cause first.
**Reaching for another attempt without new information.** After 2-3 failed hypotheses, trying a 4th without learning something new from the failures is not debugging — it is guessing with increasing frustration. Stop and diagnose why previous hypotheses failed (see smart escalation).
**Certainty without evidence.** The feeling of "I know what this is" before reading the relevant code. Experienced developers have strong pattern-matching instincts, and they are right often enough to be dangerous when wrong. Read the code even when you are confident.
**Minimizing the scope.** "It is probably just..." — the word "just" signals an assumption that the problem is small. Small problems do not resist 2-3 fix attempts. If you are still debugging, it is not "just" anything.
**Treating environmental differences as irrelevant.** When something works in one environment and fails in another, the difference between environments IS the investigation. Do not dismiss it — compare them systematically.
---
## Smart Escalation Patterns
When 2-3 hypotheses have been tested and none confirmed, the problem is not "I need hypothesis #4." The problem is usually one of these:
**Different subsystems keep appearing.** Hypothesis 1 pointed to auth, hypothesis 2 to the database, hypothesis 3 to caching. This scatter pattern means the bug is not in any one subsystem — it is in the interaction between them, or in an architectural assumption that cuts across all of them. This is a design problem, not a localized bug.
**Evidence contradicts itself.** The logs say X happened, but the code makes X impossible. The test fails with error A, but the code path that produces error A is unreachable from the test. When evidence contradicts, the mental model is wrong. Step back. Re-read the code from the entry point without any assumptions about what it does.
**Works locally, fails elsewhere.** The most common causes: environment variables, dependency versions, file system differences (case sensitivity, path separators), timing differences (faster/slower machines), and data differences (test fixtures vs production data). Systematically compare the two environments rather than debugging the code.
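
One way to make that comparison concrete, sketched for a Node project (swap in the version commands for your stack):

```bash
# Run on each machine, then diff the outputs
env | sort > env.txt              # environment variables
node --version > versions.txt     # runtime version
npm ls --depth=0 >> versions.txt  # resolved top-level dependencies

# with both sets of files copied side by side:
diff local/env.txt remote/env.txt
diff local/versions.txt remote/versions.txt
```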
**Fix works but prediction was wrong.** This is the most dangerous pattern. The bug appears fixed, but the causal chain you identified was incorrect. The real cause is still present and will resurface. Keep investigating — you found a coincidental fix, not the root cause.

---

# Investigation Techniques
Techniques for deeper investigation when standard code tracing is not enough. Load this when a bug does not reproduce reliably, involves timing or concurrency, or requires framework-specific tracing.
---
## Root-Cause Tracing
When a bug manifests deep in the call stack, the instinct is to fix where the error appears. That treats a symptom. Instead, trace backward through the call chain to find where the bad state originated.
**Backward tracing:**
- Start at the error
- At each level, ask: where did this value come from? Who called this function? What state was passed in?
- Keep going upstream until you find the point where valid state first became invalid — that is the root cause
**Worked example:**
```
Symptom: API returns 500 with "Cannot read property 'email' of undefined"
Where it crashes: sendWelcomeEmail(user.email) in NotificationService
Who called this? UserController.create() after saving the user record
What was passed? user = await UserRepo.create(params) — but create() returns undefined on duplicate key
Original cause: UserRepo.create() silently swallows duplicate key errors and returns undefined instead of throwing
```
The fix belongs at the origin (UserRepo.create should throw on duplicate key), not where the error appeared (NotificationService).
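
A sketch of the fix at the origin. The error class and `db` driver calls below are hypothetical stand-ins, not code from the example:

```
class DuplicateKeyError extends Error {}

class UserRepo {
  static async create(params) {
    try {
      return await db.users.insert(params); // hypothetical driver call
    } catch (err) {
      if (err.code === 'ER_DUP_ENTRY') {
        // Surface the real condition instead of returning undefined
        throw new DuplicateKeyError(`user already exists: ${params.email}`);
      }
      throw err;
    }
  }
}
```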
**When manual tracing stalls**, add instrumentation:
```
// Before the problematic operation
const stack = new Error().stack;
console.error('DEBUG [operation]:', { value, cwd: process.cwd(), stack });
```
Use `console.error()` in tests — logger output may be suppressed. Log before the dangerous operation, not after it fails.
---
## Git Bisect for Regressions
When a bug is a regression ("it worked before"), use binary search to find the breaking commit:
```bash
git bisect start
git bisect bad # current commit is broken
git bisect good <known-good-ref> # a commit where it worked
# git bisect will checkout a middle commit — test it
# mark as good or bad, repeat until the breaking commit is found
git bisect reset # return to original branch when done
```
For automated bisection with a test script:
```bash
git bisect start HEAD <known-good-ref>
git bisect run <test-command>
```
The test command should exit 0 for good, non-zero for bad.
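
A sketch of such a script; the install and test commands are placeholders for whatever the project uses. Exit code 125 is git bisect's convention for "cannot test this commit, skip it":

```bash
#!/usr/bin/env sh
# bisect-test.sh: exit 0 = good, non-zero = bad, 125 = skip
npm ci --silent || exit 125        # skip commits that do not even install
npx jest path/to/failing.test.js   # placeholder test invocation
```

Invoked as `git bisect run ./bisect-test.sh` after marking the endpoints.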
---
## Intermittent Bug Techniques
When a bug does not reproduce reliably after 2-3 attempts:
**Logging traps.** Add targeted logging at the suspected failure point and run the scenario repeatedly. Capture the state that differs between passing and failing runs.
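
A minimal trap sketch; the label and state fields are illustrative:

```
// Tag each run so passing and failing outputs can be diffed later
function trap(label, state) {
  // console.error survives most test runners' output capture
  console.error(`TRAP ${label} pid=${process.pid}`, JSON.stringify(state));
}

trap('before-cache-write', { keyCount: 3, userId: 42 }); // at the suspected point
```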
**Statistical reproduction.** Run the failing scenario in a loop to establish a reproduction rate:
```bash
for i in $(seq 1 20); do echo "Run $i:"; <test-command> && echo "PASS" || echo "FAIL"; done
```
A 5% reproduction rate confirms the bug exists but suggests timing or data sensitivity.
**Environment isolation.** Systematically eliminate variables:
- Same test, different machine?
- Same test, different data seed?
- Same test, serial vs parallel execution?
- Same test, with vs without network access?
**Data-dependent triggers.** If the bug only appears with certain data, identify the trigger condition (a probing sketch follows this list):
- What is unique about the failing input?
- Does the input size, encoding, or edge value matter?
- Is the data order significant (sorted vs random)?
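
A probing sketch along those lines; `loadFailingInput` and `processBatch` are hypothetical stand-ins:

```
const input = await loadFailingInput();
const variants = [
  ['original', input],
  ['sorted', [...input].sort()],
  ['reversed', [...input].reverse()],
];
for (const [label, variant] of variants) {
  try {
    await processBatch(variant);
    console.error(`${label}: PASS`);
  } catch (err) {
    console.error(`${label}: FAIL (${err.message})`);
  }
}
```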
---
## Framework-Specific Debugging
### Rails
- Check callbacks: `before_save`, `after_commit`, `around_action` — these execute implicitly and can alter state
- Check middleware chain: `bin/rails middleware` lists the full stack (`rake middleware` on older Rails)
- Check Active Record query generation: `.to_sql` on any relation
- Use `Rails.logger.debug` with tagged logging for request tracing
### Node.js
- Async stack traces: on by default since Node 12; older versions need the `--async-stack-traces` flag
- Unhandled rejections: check for missing `.catch()` or `await` on promises
- Event loop delays: `process.hrtime()` before and after suspect operations (timing sketch after this list)
- Memory leaks: `--inspect` flag + Chrome DevTools heap snapshots
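
A timing probe for the event-loop bullet above; `suspectOperation` is a hypothetical stand-in:

```
async function timed(label, fn) {
  const start = process.hrtime.bigint();
  const result = await fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.error(`${label}: ${ms.toFixed(1)}ms`);
  return result;
}

await timed('cache lookup', () => suspectOperation()); // ESM top-level await
```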
### Python
- Traceback enrichment: `traceback.print_exc()` in except blocks
- `pdb.set_trace()` or `breakpoint()` for interactive debugging
- `sys.settrace()` for execution tracing
- `logging.basicConfig(level=logging.DEBUG)` for verbose output
---
## Race Condition Investigation
When timing or concurrency is suspected:
**Timing isolation.** Add deliberate delays at suspect points to widen the race window and make it reproducible:
```
// Simulate slow operation to expose race
await new Promise(r => setTimeout(r, 100));
```
**Shared mutable state.** Search for variables, caches, or database rows accessed by multiple threads or processes without synchronization. Common patterns (sketch after this list):
- Global or module-level mutable state
- Cache reads without locks
- Database rows read then updated without optimistic locking
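
The cache pattern, sketched as the classic check-then-act race (all names illustrative):

```
const cache = new Map(); // module-level shared state

async function getUser(id, fetchUser) {
  if (!cache.has(id)) {               // check
    const user = await fetchUser(id); // another request can interleave here
    cache.set(id, user);              // act: duplicated work, last writer wins
  }
  return cache.get(id);
}
```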
**Async ordering.** Check whether operations assume a specific execution order that is not guaranteed (example after this list):
- Promise.all with dependent operations
- Event handlers that assume emission order
- Database writes that assume read consistency
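
The `Promise.all` case, sketched with hypothetical `db` calls:

```
// Bug: both writes run concurrently, but the profile insert
// assumes the user row is already committed
await Promise.all([db.insertUser(user), db.createProfile(user.id)]);

// Fix: sequence the dependent write
await db.insertUser(user);
await db.createProfile(user.id);
```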
---
## Browser Debugging
When investigating UI bugs with `agent-browser` or equivalent tools:
```bash
# Open the affected page
agent-browser open http://localhost:${PORT:-3000}/affected/route
# Capture current state
agent-browser snapshot -i
# Interact with the page
agent-browser click @ref # click an element
agent-browser fill @ref "text" # fill a form field
agent-browser snapshot -i # capture state after interaction
# Save visual evidence
agent-browser screenshot bug-evidence.png
```
**Port detection:** Check project instruction files (`AGENTS.md`, `CLAUDE.md`) for port references, then `package.json` dev scripts, then `.env` files, falling back to `3000`.
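
That lookup order, as a rough shell sketch:

```bash
grep -in 'port' AGENTS.md CLAUDE.md 2>/dev/null  # instruction files first
grep -n '"dev"' package.json                     # then dev-script definitions
grep -n '^PORT=' .env 2>/dev/null                # then env files; else 3000
```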
**Console errors:** Check browser console output for JavaScript errors, failed network requests, and CORS issues. These often reveal the root cause of UI bugs before any code tracing is needed.
**Network tab:** Check for failed API requests, unexpected response codes, or missing CORS headers. A 422 or 500 response from the backend narrows the investigation immediately.