feat(testing): close the testing gap in ce:work, ce:plan, and testing-reviewer (#438)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trevin Chow
2026-03-29 13:07:05 -07:00
committed by GitHub
parent a301a08205
commit 35678b8add
8 changed files with 399 additions and 6 deletions

View File

@@ -336,7 +336,7 @@ For each unit, include:
- **Execution note** - optional, only when the unit benefits from a non-default execution posture such as test-first, characterization-first, or external delegation
- **Technical design** - optional pseudo-code or diagram when the unit's approach is non-obvious and prose alone would leave it ambiguous. Frame explicitly as directional guidance, not implementation specification
- **Patterns to follow** - existing code or conventions to mirror
-- **Test scenarios** - enumerate the specific test cases the implementer should write, right-sized to the unit's complexity and risk. Consider each category below and include scenarios from every category that applies to this unit. A simple config change may need one scenario; a payment flow may need a dozen. The quality signal is specificity — each scenario should name the input, action, and expected outcome so the implementer doesn't have to invent coverage.
+- **Test scenarios** - enumerate the specific test cases the implementer should write, right-sized to the unit's complexity and risk. Consider each category below and include scenarios from every category that applies to this unit. A simple config change may need one scenario; a payment flow may need a dozen. The quality signal is specificity — each scenario should name the input, action, and expected outcome so the implementer doesn't have to invent coverage. For units with no behavioral change (pure config, scaffolding, styling), use `Test expectation: none -- [reason]` instead of leaving the field blank.
- **Happy path behaviors** - core functionality with expected inputs and outputs
- **Edge cases** (when the unit has meaningful boundaries) - boundary values, empty inputs, nil/null states, concurrent access
- **Error and failure paths** (when the unit has failure modes) - invalid input, downstream service failures, timeout behavior, permission denials
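
To make the new requirement concrete: a filled-in block under this format might look like the sketch below. The unit and its scenarios are invented for illustration and are not part of the diff; the point is that each line names the input, the action, and the expected outcome, prefixed with its category.

```
**Test scenarios:**
- Happy path: 100th request from one client within the window -> 200 OK, counter increments
- Edge case: 101st request within the same window -> 429 with a Retry-After header
- Edge case: first request after the window resets -> 200 OK, counter restarts at 1
- Error path: rate-limit store unreachable -> request is allowed through and a warning is logged
- Integration: limiter middleware runs before the auth layer -> unauthenticated floods are still throttled
```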
@@ -496,7 +496,7 @@ deepened: YYYY-MM-DD # optional, set when the confidence check substantively st
- [Existing file, class, or pattern]
**Test scenarios:**
-<!-- Include only categories that apply to this unit. Omit categories that don't. -->
+<!-- Include only categories that apply to this unit. Omit categories that don't. For units with no behavioral change, use "Test expectation: none -- [reason]" instead of leaving this section blank. -->
- [Scenario: specific input/action -> expected outcome. Prefix with category — Happy path, Edge case, Error path, or Integration — to signal intent]
**Verification:**
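
For the opposite case -- a unit with no behavioral change -- the annotation replaces the scenario list entirely. A hypothetical instance of the `Test expectation: none -- [reason]` form in use (invented, not from the diff):

```
**Test scenarios:**
Test expectation: none -- renames a CSS utility class and updates references; no behavioral change
```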
@@ -622,6 +622,7 @@ Before finalizing, check:
- If test-first or characterization-first posture was explicit or strongly implied, the relevant units carry it forward with a lightweight `Execution note`
- Each feature-bearing unit has test scenarios from every applicable category (happy path, edge cases, error paths, integration) — right-sized to the unit's complexity, not padded or skimped
- Test scenarios name specific inputs, actions, and expected outcomes without becoming test code
+- Feature-bearing units with blank or missing test scenarios are flagged as incomplete — feature-bearing units must have actual test scenarios, not just an annotation. The `Test expectation: none -- [reason]` annotation is only valid for non-feature-bearing units (pure config, scaffolding, styling)
- Deferred items are explicit and not hidden as fake certainty
- If a High-Level Technical Design section is included, it uses the right medium for the work, carries the non-prescriptive framing, and does not contain implementation code (no imports, exact signatures, or framework-specific syntax)
- Per-unit technical design fields, if present, are concise and directional rather than copy-paste-ready
@@ -748,6 +749,7 @@ If the plan already has a `deepened:` date:
- Units are too large, too vague, or broken into micro-steps
- Approach notes are thin or do not name the pattern to follow
- Test scenarios are vague (don't name inputs and expected outcomes), skip applicable categories (e.g., no error paths for a unit with failure modes, no integration scenarios for a unit crossing layers), or are disproportionate to the unit's complexity
+- Feature-bearing units have blank or missing test scenarios (feature-bearing units require actual test scenarios; the `Test expectation: none` annotation is only valid for non-feature-bearing units)
- Verification outcomes are vague or not expressed as observable results
**System-Wide Impact**
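
The vagueness red flag above is easiest to see by contrast. Both lines below are invented for illustration; a reviewer applying this checklist would flag the first and accept the second:

```
- Test that rate limiting works                       <- vague: no input, no outcome
- Edge case: 101st request within one window -> 429   <- specific: input, action, outcome
```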

View File

@@ -151,6 +151,7 @@ Determine how to proceed based on what was provided in `<input_document>`.
- Add, update, or remove tests to match implementation changes (see Test Discovery below)
- Run System-Wide Test Check (see below)
- Run tests after changes
+- Assess testing coverage: did this task change behavior? If yes, were tests written or updated? If no tests were added, is the justification deliberate (e.g., pure config, no behavioral change)?
- Mark task as completed
- Evaluate for incremental commit (see below)
```
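
The new assess-testing-coverage step is a judgment call recorded during the task loop, not an extra test run. A sketch of what a deliberate justification might look like (hypothetical task, invented wording):

```
Task: bump the CI Node.js version from 18 to 20
Behavior change: none -- CI configuration only, no runtime code paths touched
Tests added: none; existing suite run on the new version passes
```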
@@ -295,7 +296,7 @@ Determine how to proceed based on what was provided in `<input_document>`.
3. **Final Validation**
- All tasks marked completed
-- All tests pass
+- Testing addressed -- tests pass and new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
- Linting passes
- Code follows existing patterns
- Figma designs match (if applicable)
@@ -503,7 +504,7 @@ Before creating PR, verify:
- [ ] All clarifying questions asked and answered
- [ ] All tasks marked completed
-- [ ] Tests pass (run project's test command)
+- [ ] Testing addressed -- tests pass AND new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
- [ ] Linting passes (use linting-agent)
- [ ] Code follows existing patterns
- [ ] Figma designs match implementation (if applicable)

View File

@@ -150,6 +150,7 @@ Determine how to proceed based on what was provided in `<input_document>`.
- Add, update, or remove tests to match implementation changes (see Test Discovery below)
- Run System-Wide Test Check (see below)
- Run tests after changes
+- Assess testing coverage: did this task change behavior? If yes, were tests written or updated? If no tests were added, is the justification deliberate (e.g., pure config, no behavioral change)?
- Mark task as completed
- Evaluate for incremental commit (see below)
```
@@ -286,7 +287,7 @@ Determine how to proceed based on what was provided in `<input_document>`.
3. **Final Validation**
- All tasks marked completed
-- All tests pass
+- Testing addressed -- tests pass and new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
- Linting passes
- Code follows existing patterns
- Figma designs match (if applicable)
@@ -430,7 +431,7 @@ Before creating PR, verify:
- [ ] All clarifying questions asked and answered
- [ ] All tasks marked completed
-- [ ] Tests pass (run project's test command)
+- [ ] Testing addressed -- tests pass AND new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
- [ ] Linting passes (use linting-agent)
- [ ] Code follows existing patterns
- [ ] Figma designs match implementation (if applicable)