john/claude-engineering-plugin

Fork 0

Files

Trevin Chow 5c0ec9137a

CI / pr-title (push) Has been cancelled

Details

CI / test (push) Has been cancelled

Details

Release PR / release-pr (push) Has been cancelled

Details

Release PR / publish-cli (push) Has been cancelled

Details

refactor(cli)!: rename all skills and agents to consistent ce- prefix (#503 )

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-18 15:44:22 -07:00

16 KiB

Raw Blame History

title, type, status, date, origin

title	type	status	date	origin
feat: Close the testing gap in ce:work, ce:plan, and testing-reviewer	feat	active	2026-03-29	docs/brainstorms/2026-03-29-testing-addressed-gate-requirements.md

feat: Close the testing gap in ce:work, ce:plan, and testing-reviewer

Overview

Targeted edits to three skill/agent files to make "no tests" a deliberate decision rather than an accidental omission. Adds per-task testing deliberation in ce:work's execution loop, blank-test-scenarios handling in ce:plan's review, and a missing-test-pattern check in the testing-reviewer agent. Ships with contract tests following the existing repo pattern.

Problem Frame

ce:work has thorough testing instructions but two narrow gaps let untested behavioral changes slip through silently: the quality gate says "All tests pass" (vacuously true with no tests), and ce:plan allows blank test scenarios without annotation. The testing-reviewer catches some gaps after the fact but doesn't flag the broad pattern of behavioral changes with zero test additions. (see origin: docs/brainstorms/2026-03-29-testing-addressed-gate-requirements.md)

Requirements Trace

R1. ce:plan units with no test scenarios should annotate why, not leave the field blank
R2. Blank test scenarios on feature-bearing units treated as incomplete in Phase 5.1 review
R3. Per-task testing deliberation in ce:work's execution loop before marking a task done
R4. Quality checklist and Final Validation updated from "Tests pass" to "Testing addressed"
R5. Apply R3 and R4 to ce:work-beta with explicit sync decision
R6. testing-reviewer adds a check for behavioral changes with no corresponding test additions
R7. New check complements existing checks (untested branches, weak assertions, brittle tests, missing edge cases)
R8. Contract tests verifying each behavioral change ships as intended

Scope Boundaries

Prompt-level changes only -- no CI enforcement, no programmatic gates
No new abstractions (no "testing assessment artifacts" or structured output schemas)
No changes to testing-reviewer's output format (findings JSON stays the same)
Deliberate test omission with justification is a valid outcome

Context & Research

Relevant Code and Patterns

plugins/compound-engineering/skills/ce-plan/SKILL.md — Phase 5.1 review checklist at lines 583-601, test scenario quality checks at lines 591-592. Two edit sites: instruction prose for Test scenarios at line 339 (section 3.5), and plan output template with HTML comment at line 499
plugins/compound-engineering/skills/ce-work/SKILL.md — Phase 2 task loop at lines ~143-155, Final Validation at lines 287-295 ("All tests pass"), Quality Checklist at lines 427-443 ("Tests pass (run project's test command)")
plugins/compound-engineering/skills/ce-work-beta/SKILL.md — Identical loop/checklist structure. Final Validation at lines 296-304, Quality Checklist at lines 500-516
plugins/compound-engineering/agents/review/ce-testing-reviewer.agent.md — 4 existing checks in "What you're hunting for" (lines 15-20), confidence calibration (lines 22-29), output format (lines 37-48)
tests/pipeline-review-contract.test.ts — Contract tests for ce:work, ce:work-beta, ce:brainstorm, ce:plan using readRepoFile() + toContain/not.toContain assertions
tests/review-skill-contract.test.ts — Contract tests for ce:review agent using same pattern, includes frontmatter parsing and cross-file schema alignment

Institutional Learnings

Beta-to-stable sync must be explicit per AGENTS.md (lines 161-163). The existing pipeline-review-contract.test.ts already tests ce:work-beta mirrors ce:work's review contract — follow same pattern.
Skill review checklist warns against contradictory rules across phases — the new "testing deliberation" must complement, not contradict, existing "Run tests after changes" instruction.
Use negative assertions (not.toContain) to prevent regression — assert old "Tests pass" / "All tests pass" language is fully replaced.

Key Technical Decisions

Testing deliberation goes after "Run tests after changes" in the loop: This is the natural deliberation point — tests have just run (or not), and the agent should assess whether testing was adequately addressed before marking the task done. Placing it earlier (before test execution) would be premature; placing it at "Mark task as completed" would intermingle it with completion bookkeeping.
Annotation uses existing template field, not a new field: Test expectation: none -- [reason] goes in the Test scenarios section rather than adding a new template field. This keeps the template stable and leverages the existing Phase 5.1 check surface.
New testing-reviewer check is a 5th bullet, not a replacement: It's conceptually distinct from check #1 (untested branches within new code). Check #1 looks at branch coverage within tests that exist; the new check flags when no tests exist at all for behavioral changes.
Contract tests extend existing files: New ce:work/ce:plan assertions go in pipeline-review-contract.test.ts. Testing-reviewer assertion goes in review-skill-contract.test.ts. This follows the established convention rather than creating a new file.

Open Questions

Resolved During Planning

Where does testing deliberation go in the loop? After "Run tests after changes" (bullet 8) and before "Mark task as completed" (bullet 9). The agent has just run tests or skipped them — now it deliberates.
What annotation format for units with no tests? Test expectation: none -- [reason] in the Test scenarios field. Follows existing template structure.
Where does the new check go in testing-reviewer? 5th bullet in "What you're hunting for" after the existing 4 checks.
New test file or extend existing? Extend existing — pipeline-review-contract.test.ts for skill changes, review-skill-contract.test.ts for the agent change.

Deferred to Implementation

Exact wording of the testing deliberation prompt in the execution loop — should be concise and action-oriented, final phrasing determined during implementation
Whether the testing-reviewer's "What you don't flag" section needs a corresponding exclusion for non-behavioral changes (config, formatting, comments) — inspect during implementation

Implementation Units

Unit 1: ce:plan — Blank test scenarios handling

Goal: Make blank test scenarios on feature-bearing units flagged as incomplete during plan review, and establish the annotation convention for units that genuinely need no tests.

Requirements: R1, R2

Dependencies: None

Files:

Modify: plugins/compound-engineering/skills/ce-plan/SKILL.md

Approach:

Two edit sites in ce:plan for the annotation convention:
- The instruction prose (section 3.5, around line 339) that describes how to write Test scenarios — mention the Test expectation: none -- [reason] convention here so the planner agent learns it when reading instructions
- The plan output template (around line 499) which contains the HTML comment  — update this comment to also show the annotation convention for units with no test scenarios
In Phase 5.1 review checklist (after line 592), add a new bullet: blank or missing test scenarios on a feature-bearing unit (as defined by ce:plan's existing Plan Quality Bar language) should be flagged as incomplete
In the Phase 5.3.3 confidence-scoring checklist for Implementation Units (around line 717), add a parallel item so the confidence check also catches blank test scenarios

Patterns to follow:

Existing Phase 5.1 test scenario quality checks at lines 591-592
The unit template comment style at line 499
ce:plan's existing "feature-bearing unit" terminology in the Plan Quality Bar

Test scenarios:

Happy path: Plan with a feature-bearing unit that has Test expectation: none -- config-only change in test scenarios -> Phase 5.1 review accepts it
Error path: Plan with a feature-bearing unit that has a completely blank/absent Test scenarios field -> Phase 5.1 review flags it as incomplete
Happy path: Plan with a non-feature-bearing unit (scaffolding, config) that uses the annotation -> accepted without issue

Verification:

Phase 5.1 checklist explicitly addresses blank test scenarios
Plan template comment mentions the Test expectation: none -- [reason] convention
Confidence scoring checklist includes blank test scenarios as a scoring trigger

Unit 2: ce:work and ce:work-beta — Testing deliberation and checklist update

Goal: Add per-task testing deliberation to the execution loop and update both checklist surfaces from "Tests pass" to "Testing addressed."

Requirements: R3, R4, R5

Dependencies: None

Files:

Modify: plugins/compound-engineering/skills/ce-work/SKILL.md
Modify: plugins/compound-engineering/skills/ce-work-beta/SKILL.md

Approach:

In the Phase 2 task execution loop (lines ~143-155 in ce:work, ~144-156 in ce:work-beta), add a new bullet between "Run tests after changes" and "Mark task as completed". The new bullet should prompt the agent to assess: did this task change behavior? If yes, were tests written or updated? If no tests were added, what is the justification? Keep it concise — 2-3 questions in one bullet, matching the existing loop bullet style. Do not expand into a multi-paragraph section
In the Quality Checklist (ce:work line ~433, ce:work-beta line ~506), replace - [ ] Tests pass (run project's test command) with - [ ] Testing addressed -- tests pass AND new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
In the Final Validation (ce:work line ~289, ce:work-beta line ~298), replace - All tests pass with - Testing addressed -- tests pass and new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
Ensure both files receive identical changes

Sync decision: Propagating to beta — shared testing deliberation guidance, not experimental delegate-mode behavior.

Patterns to follow:

Existing execution loop bullet style at lines 138-155
Existing Quality Checklist item style (checkbox with parenthetical guidance)
The mandatory review pattern (which was also synced identically between stable and beta)

Test scenarios:

Happy path: ce:work execution loop includes the testing deliberation step in the correct position (after "Run tests" and before "Mark task as completed")
Happy path: Quality Checklist contains "Testing addressed" and does not contain "Tests pass (run project's test command)"
Happy path: Final Validation contains "Testing addressed" and does not contain "All tests pass"
Integration: ce:work-beta has identical testing deliberation and checklist wording as ce:work

Verification:

Both files contain the testing deliberation step in the execution loop
Both files' Quality Checklist and Final Validation use "Testing addressed" language
Old "Tests pass" and "All tests pass" language is fully removed from both files

Unit 3: testing-reviewer — Behavioral changes with no test additions check

Goal: Add a 5th check to the testing-reviewer agent that flags behavioral code changes in the diff with zero corresponding test additions or modifications.

Requirements: R6, R7

Dependencies: None

Files:

Modify: plugins/compound-engineering/agents/review/ce-testing-reviewer.agent.md

Approach:

Add a 5th bold-titled bullet in "What you're hunting for" (after the existing 4th check at line 20). The check should: describe the pattern (behavioral code changes — new logic branches, state mutations, API changes — with zero corresponding test file additions or modifications in the diff), explain what makes it distinct from check #1 (which looks at untested branches within code that has tests, while this flags when no tests exist at all), and note that non-behavioral changes (config, formatting, comments, type-only changes) are excluded
Consider adding a corresponding item in "What you don't flag" for non-behavioral changes if it adds clarity

Patterns to follow:

Existing check format: bold title followed by -- and explanation
Existing checks use specific, concrete language ("new if/else, switch, try/catch")
Confidence calibration tiers (High 0.80+ when provable from diff alone)

Test scenarios:

Happy path: testing-reviewer.md "What you're hunting for" section contains the behavioral-changes-with-no-tests check
Happy path: Check is described as distinct from existing untested-branches check

Verification:

testing-reviewer.md has 5 checks in "What you're hunting for" instead of 4
The new check specifically addresses "behavioral changes with no corresponding test additions"

Unit 4: Contract tests for all changes

Goal: Add contract tests that verify each skill/agent modification ships as intended, following the existing string-assertion pattern.

Requirements: R8

Dependencies: Units 1, 2, 3

Files:

Modify: tests/pipeline-review-contract.test.ts
Modify: tests/review-skill-contract.test.ts

Approach:

In pipeline-review-contract.test.ts, extend the existing ce:work review contract describe block with new tests:
- ce:work includes testing deliberation in execution loop
- ce:work Quality Checklist contains "Testing addressed" and does not contain "Tests pass (run project's test command)"
- ce:work Final Validation contains "Testing addressed" and does not contain "All tests pass"
- ce:work-beta mirrors all testing deliberation and checklist changes
In pipeline-review-contract.test.ts, extend or add a ce:plan review contract test:
- ce:plan Phase 5.1 review addresses blank test scenarios on feature-bearing units
In review-skill-contract.test.ts, add a new describe block for testing-reviewer:
- testing-reviewer includes the behavioral-changes-with-no-test-additions check

Use negative assertions (not.toContain) for the old checklist language to prevent regression.

Patterns to follow:

readRepoFile() helper + expect(content).toContain(...) / expect(content).not.toContain(...) in existing contract tests
ce:work-beta mirror test pattern at pipeline-review-contract.test.ts lines 39-50
describe/test block naming convention in both files

Test scenarios:

Happy path: All new contract tests pass after Units 1-3 are complete
Error path: Reverting any skill change causes the corresponding contract test to fail (verified by inspection of assertion specificity)

Verification:

bun test passes with the new contract tests
Each R3-R7 change surface has at least one contract test assertion

System-Wide Impact

Interaction graph: These are prompt-level skill edits. No callbacks, middleware, or runtime dependencies. The testing-reviewer is invoked by ce:review which is invoked by ce:work — the chain is: ce:work -> ce:review -> testing-reviewer. Changes to the reviewer's check list affect what ce:review surfaces but not how it surfaces it.
Error propagation: Not applicable — no runtime error paths. If the testing deliberation prompt is poorly worded, the worst case is the agent ignores it (same as today).
API surface parity: ce:work and ce:work-beta must remain in sync per AGENTS.md. Contract tests enforce this.
Unchanged invariants: The testing-reviewer's output format (JSON with findings, residual_risks, testing_gaps) is unchanged. The plan template's structure is unchanged — only the comment and Phase 5.1 checklist are modified.

Risks & Dependencies

Risk	Mitigation
Testing deliberation prompt is too verbose and gets ignored by the agent	Keep it concise — 2-3 questions, not a paragraph. Match the existing loop bullet style.
Old "Tests pass" language persists in one location, creating contradiction	Negative contract test assertions (`not.toContain`) catch any leftover old language
ce:work-beta drifts from ce:work	Contract tests explicitly assert both files contain identical testing changes

Sources & References

Origin document: docs/brainstorms/2026-03-29-testing-addressed-gate-requirements.md
Related learning: docs/solutions/skill-design/beta-promotion-orchestration-contract.md
Related learning: docs/solutions/skill-design/compound-refresh-skill-improvements.md (avoid contradictory rules across phases)
Related test: tests/pipeline-review-contract.test.ts
Related test: tests/review-skill-contract.test.ts

16 KiB Raw Blame History

feat: Close the testing gap in ce:work, ce:plan, and testing-reviewer

Overview

Problem Frame

Requirements Trace

Scope Boundaries

Context & Research

Relevant Code and Patterns

Institutional Learnings

Key Technical Decisions

Open Questions

Resolved During Planning

Deferred to Implementation

Implementation Units

System-Wide Impact

Risks & Dependencies

Sources & References

16 KiB

Raw Blame History