16 KiB
title, type, status, date, origin
| title | type | status | date | origin |
|---|---|---|---|---|
| feat: Close the testing gap in ce:work, ce:plan, and testing-reviewer | feat | active | 2026-03-29 | docs/brainstorms/2026-03-29-testing-addressed-gate-requirements.md |
feat: Close the testing gap in ce:work, ce:plan, and testing-reviewer
Overview
Targeted edits to three skill/agent files to make "no tests" a deliberate decision rather than an accidental omission. Adds per-task testing deliberation in ce:work's execution loop, blank-test-scenarios handling in ce:plan's review, and a missing-test-pattern check in the testing-reviewer agent. Ships with contract tests following the existing repo pattern.
Problem Frame
ce:work has thorough testing instructions but two narrow gaps let untested behavioral changes slip through silently: the quality gate says "All tests pass" (vacuously true with no tests), and ce:plan allows blank test scenarios without annotation. The testing-reviewer catches some gaps after the fact but doesn't flag the broad pattern of behavioral changes with zero test additions. (see origin: docs/brainstorms/2026-03-29-testing-addressed-gate-requirements.md)
Requirements Trace
- R1. ce:plan units with no test scenarios should annotate why, not leave the field blank
- R2. Blank test scenarios on feature-bearing units treated as incomplete in Phase 5.1 review
- R3. Per-task testing deliberation in ce:work's execution loop before marking a task done
- R4. Quality checklist and Final Validation updated from "Tests pass" to "Testing addressed"
- R5. Apply R3 and R4 to ce:work-beta with explicit sync decision
- R6. testing-reviewer adds a check for behavioral changes with no corresponding test additions
- R7. New check complements existing checks (untested branches, weak assertions, brittle tests, missing edge cases)
- R8. Contract tests verifying each behavioral change ships as intended
Scope Boundaries
- Prompt-level changes only -- no CI enforcement, no programmatic gates
- No new abstractions (no "testing assessment artifacts" or structured output schemas)
- No changes to testing-reviewer's output format (findings JSON stays the same)
- Deliberate test omission with justification is a valid outcome
Context & Research
Relevant Code and Patterns
plugins/compound-engineering/skills/ce-plan/SKILL.md— Phase 5.1 review checklist at lines 583-601, test scenario quality checks at lines 591-592. Two edit sites: instruction prose for Test scenarios at line 339 (section 3.5), and plan output template with HTML comment at line 499plugins/compound-engineering/skills/ce-work/SKILL.md— Phase 2 task loop at lines ~143-155, Final Validation at lines 287-295 ("All tests pass"), Quality Checklist at lines 427-443 ("Tests pass (run project's test command)")plugins/compound-engineering/skills/ce-work-beta/SKILL.md— Identical loop/checklist structure. Final Validation at lines 296-304, Quality Checklist at lines 500-516plugins/compound-engineering/agents/review/ce-testing-reviewer.agent.md— 4 existing checks in "What you're hunting for" (lines 15-20), confidence calibration (lines 22-29), output format (lines 37-48)tests/pipeline-review-contract.test.ts— Contract tests for ce:work, ce:work-beta, ce:brainstorm, ce:plan usingreadRepoFile()+toContain/not.toContainassertionstests/review-skill-contract.test.ts— Contract tests for ce:review agent using same pattern, includes frontmatter parsing and cross-file schema alignment
Institutional Learnings
- Beta-to-stable sync must be explicit per AGENTS.md (lines 161-163). The existing
pipeline-review-contract.test.tsalready tests ce:work-beta mirrors ce:work's review contract — follow same pattern. - Skill review checklist warns against contradictory rules across phases — the new "testing deliberation" must complement, not contradict, existing "Run tests after changes" instruction.
- Use negative assertions (
not.toContain) to prevent regression — assert old "Tests pass" / "All tests pass" language is fully replaced.
Key Technical Decisions
- Testing deliberation goes after "Run tests after changes" in the loop: This is the natural deliberation point — tests have just run (or not), and the agent should assess whether testing was adequately addressed before marking the task done. Placing it earlier (before test execution) would be premature; placing it at "Mark task as completed" would intermingle it with completion bookkeeping.
- Annotation uses existing template field, not a new field:
Test expectation: none -- [reason]goes in the Test scenarios section rather than adding a new template field. This keeps the template stable and leverages the existing Phase 5.1 check surface. - New testing-reviewer check is a 5th bullet, not a replacement: It's conceptually distinct from check #1 (untested branches within new code). Check #1 looks at branch coverage within tests that exist; the new check flags when no tests exist at all for behavioral changes.
- Contract tests extend existing files: New ce:work/ce:plan assertions go in
pipeline-review-contract.test.ts. Testing-reviewer assertion goes inreview-skill-contract.test.ts. This follows the established convention rather than creating a new file.
Open Questions
Resolved During Planning
- Where does testing deliberation go in the loop? After "Run tests after changes" (bullet 8) and before "Mark task as completed" (bullet 9). The agent has just run tests or skipped them — now it deliberates.
- What annotation format for units with no tests?
Test expectation: none -- [reason]in the Test scenarios field. Follows existing template structure. - Where does the new check go in testing-reviewer? 5th bullet in "What you're hunting for" after the existing 4 checks.
- New test file or extend existing? Extend existing —
pipeline-review-contract.test.tsfor skill changes,review-skill-contract.test.tsfor the agent change.
Deferred to Implementation
- Exact wording of the testing deliberation prompt in the execution loop — should be concise and action-oriented, final phrasing determined during implementation
- Whether the testing-reviewer's "What you don't flag" section needs a corresponding exclusion for non-behavioral changes (config, formatting, comments) — inspect during implementation
Implementation Units
- Unit 1: ce:plan — Blank test scenarios handling
Goal: Make blank test scenarios on feature-bearing units flagged as incomplete during plan review, and establish the annotation convention for units that genuinely need no tests.
Requirements: R1, R2
Dependencies: None
Files:
- Modify:
plugins/compound-engineering/skills/ce-plan/SKILL.md
Approach:
- Two edit sites in ce:plan for the annotation convention:
- The instruction prose (section 3.5, around line 339) that describes how to write Test scenarios — mention the
Test expectation: none -- [reason]convention here so the planner agent learns it when reading instructions - The plan output template (around line 499) which contains the HTML comment
<!-- Include only categories that apply to this unit. Omit categories that don't. -->— update this comment to also show the annotation convention for units with no test scenarios
- The instruction prose (section 3.5, around line 339) that describes how to write Test scenarios — mention the
- In Phase 5.1 review checklist (after line 592), add a new bullet: blank or missing test scenarios on a feature-bearing unit (as defined by ce:plan's existing Plan Quality Bar language) should be flagged as incomplete
- In the Phase 5.3.3 confidence-scoring checklist for Implementation Units (around line 717), add a parallel item so the confidence check also catches blank test scenarios
Patterns to follow:
- Existing Phase 5.1 test scenario quality checks at lines 591-592
- The unit template comment style at line 499
- ce:plan's existing "feature-bearing unit" terminology in the Plan Quality Bar
Test scenarios:
- Happy path: Plan with a feature-bearing unit that has
Test expectation: none -- config-only changein test scenarios -> Phase 5.1 review accepts it - Error path: Plan with a feature-bearing unit that has a completely blank/absent Test scenarios field -> Phase 5.1 review flags it as incomplete
- Happy path: Plan with a non-feature-bearing unit (scaffolding, config) that uses the annotation -> accepted without issue
Verification:
- Phase 5.1 checklist explicitly addresses blank test scenarios
- Plan template comment mentions the
Test expectation: none -- [reason]convention - Confidence scoring checklist includes blank test scenarios as a scoring trigger
- Unit 2: ce:work and ce:work-beta — Testing deliberation and checklist update
Goal: Add per-task testing deliberation to the execution loop and update both checklist surfaces from "Tests pass" to "Testing addressed."
Requirements: R3, R4, R5
Dependencies: None
Files:
- Modify:
plugins/compound-engineering/skills/ce-work/SKILL.md - Modify:
plugins/compound-engineering/skills/ce-work-beta/SKILL.md
Approach:
- In the Phase 2 task execution loop (lines ~143-155 in ce:work, ~144-156 in ce:work-beta), add a new bullet between "Run tests after changes" and "Mark task as completed". The new bullet should prompt the agent to assess: did this task change behavior? If yes, were tests written or updated? If no tests were added, what is the justification? Keep it concise — 2-3 questions in one bullet, matching the existing loop bullet style. Do not expand into a multi-paragraph section
- In the Quality Checklist (ce:work line ~433, ce:work-beta line ~506), replace
- [ ] Tests pass (run project's test command)with- [ ] Testing addressed -- tests pass AND new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed) - In the Final Validation (ce:work line ~289, ce:work-beta line ~298), replace
- All tests passwith- Testing addressed -- tests pass and new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed) - Ensure both files receive identical changes
Sync decision: Propagating to beta — shared testing deliberation guidance, not experimental delegate-mode behavior.
Patterns to follow:
- Existing execution loop bullet style at lines 138-155
- Existing Quality Checklist item style (checkbox with parenthetical guidance)
- The mandatory review pattern (which was also synced identically between stable and beta)
Test scenarios:
- Happy path: ce:work execution loop includes the testing deliberation step in the correct position (after "Run tests" and before "Mark task as completed")
- Happy path: Quality Checklist contains "Testing addressed" and does not contain "Tests pass (run project's test command)"
- Happy path: Final Validation contains "Testing addressed" and does not contain "All tests pass"
- Integration: ce:work-beta has identical testing deliberation and checklist wording as ce:work
Verification:
- Both files contain the testing deliberation step in the execution loop
- Both files' Quality Checklist and Final Validation use "Testing addressed" language
- Old "Tests pass" and "All tests pass" language is fully removed from both files
- Unit 3: testing-reviewer — Behavioral changes with no test additions check
Goal: Add a 5th check to the testing-reviewer agent that flags behavioral code changes in the diff with zero corresponding test additions or modifications.
Requirements: R6, R7
Dependencies: None
Files:
- Modify:
plugins/compound-engineering/agents/review/ce-testing-reviewer.agent.md
Approach:
- Add a 5th bold-titled bullet in "What you're hunting for" (after the existing 4th check at line 20). The check should: describe the pattern (behavioral code changes — new logic branches, state mutations, API changes — with zero corresponding test file additions or modifications in the diff), explain what makes it distinct from check #1 (which looks at untested branches within code that has tests, while this flags when no tests exist at all), and note that non-behavioral changes (config, formatting, comments, type-only changes) are excluded
- Consider adding a corresponding item in "What you don't flag" for non-behavioral changes if it adds clarity
Patterns to follow:
- Existing check format: bold title followed by
--and explanation - Existing checks use specific, concrete language ("new
if/else,switch,try/catch") - Confidence calibration tiers (High 0.80+ when provable from diff alone)
Test scenarios:
- Happy path: testing-reviewer.md "What you're hunting for" section contains the behavioral-changes-with-no-tests check
- Happy path: Check is described as distinct from existing untested-branches check
Verification:
- testing-reviewer.md has 5 checks in "What you're hunting for" instead of 4
- The new check specifically addresses "behavioral changes with no corresponding test additions"
- Unit 4: Contract tests for all changes
Goal: Add contract tests that verify each skill/agent modification ships as intended, following the existing string-assertion pattern.
Requirements: R8
Dependencies: Units 1, 2, 3
Files:
- Modify:
tests/pipeline-review-contract.test.ts - Modify:
tests/review-skill-contract.test.ts
Approach:
- In
pipeline-review-contract.test.ts, extend the existingce:work review contractdescribe block with new tests:- ce:work includes testing deliberation in execution loop
- ce:work Quality Checklist contains "Testing addressed" and does not contain "Tests pass (run project's test command)"
- ce:work Final Validation contains "Testing addressed" and does not contain "All tests pass"
- ce:work-beta mirrors all testing deliberation and checklist changes
- In
pipeline-review-contract.test.ts, extend or add ace:plan review contracttest:- ce:plan Phase 5.1 review addresses blank test scenarios on feature-bearing units
- In
review-skill-contract.test.ts, add a new describe block for testing-reviewer:- testing-reviewer includes the behavioral-changes-with-no-test-additions check
Use negative assertions (not.toContain) for the old checklist language to prevent regression.
Patterns to follow:
readRepoFile()helper +expect(content).toContain(...)/expect(content).not.toContain(...)in existing contract tests- ce:work-beta mirror test pattern at pipeline-review-contract.test.ts lines 39-50
describe/testblock naming convention in both files
Test scenarios:
- Happy path: All new contract tests pass after Units 1-3 are complete
- Error path: Reverting any skill change causes the corresponding contract test to fail (verified by inspection of assertion specificity)
Verification:
bun testpasses with the new contract tests- Each R3-R7 change surface has at least one contract test assertion
System-Wide Impact
- Interaction graph: These are prompt-level skill edits. No callbacks, middleware, or runtime dependencies. The testing-reviewer is invoked by ce:review which is invoked by ce:work — the chain is: ce:work -> ce:review -> testing-reviewer. Changes to the reviewer's check list affect what ce:review surfaces but not how it surfaces it.
- Error propagation: Not applicable — no runtime error paths. If the testing deliberation prompt is poorly worded, the worst case is the agent ignores it (same as today).
- API surface parity: ce:work and ce:work-beta must remain in sync per AGENTS.md. Contract tests enforce this.
- Unchanged invariants: The testing-reviewer's output format (JSON with
findings,residual_risks,testing_gaps) is unchanged. The plan template's structure is unchanged — only the comment and Phase 5.1 checklist are modified.
Risks & Dependencies
| Risk | Mitigation |
|---|---|
| Testing deliberation prompt is too verbose and gets ignored by the agent | Keep it concise — 2-3 questions, not a paragraph. Match the existing loop bullet style. |
| Old "Tests pass" language persists in one location, creating contradiction | Negative contract test assertions (not.toContain) catch any leftover old language |
| ce:work-beta drifts from ce:work | Contract tests explicitly assert both files contain identical testing changes |
Sources & References
- Origin document: docs/brainstorms/2026-03-29-testing-addressed-gate-requirements.md
- Related learning:
docs/solutions/skill-design/beta-promotion-orchestration-contract.md - Related learning:
docs/solutions/skill-design/compound-refresh-skill-improvements.md(avoid contradictory rules across phases) - Related test:
tests/pipeline-review-contract.test.ts - Related test:
tests/review-skill-contract.test.ts