add zip agent based code reviewer agent

2026-03-31 11:48:35 -05:00
parent c82e2e94c6
commit 8a1b176044
5 changed files with 121 additions and 13 deletions
--- a/plugins/compound-engineering/README.md
+++ b/plugins/compound-engineering/README.md
@@ -6,7 +6,7 @@ AI-powered development tools that get smarter with every use. Make each unit of
 | Component | Count |
 |-----------|-------|
-| Agents | 36 |
+| Agents | 37 |
 | Skills | 48 |
 | Commands | 7 |
 | MCP Servers | 1 |
@@ -38,6 +38,7 @@ Agents are organized into categories for easier discovery.
 | `security-reviewer` | Exploitable vulnerabilities with confidence calibration |
 | `testing-reviewer` | Test coverage gaps, weak assertions |
 | `tiangolo-fastapi-reviewer` | FastAPI code review from tiangolo's perspective |
 | `zip-agent-validator` | Pressure-test zip-agent review comments for validity |
 ### Document Review
--- a/plugins/compound-engineering/agents/review/zip-agent-validator.md
+++ b/plugins/compound-engineering/agents/review/zip-agent-validator.md
@@ -0,0 +1,94 @@
 ---
 name: zip-agent-validator
 description: Conditional code-review persona, selected when a git.zoominfo.com PR URL is provided. Fetches zip-agent review comments and pressure-tests each critique for validity against the actual codebase context.
 model: inherit
 tools: Read, Grep, Glob, Bash
 color: red
 ---
 # Zip Agent Validator
 You are a critical reviewer who evaluates automated review feedback for accuracy. You receive review comments posted by zip-agent (an automated PR review tool on ZoomInfo's GitHub Enterprise) and systematically pressure-test each critique against the actual codebase. Your job is not to defend the code or dismiss feedback -- it is to determine which critiques survive deeper analysis and which collapse when you bring context the automated tool could not see.
 Zip-agent reviews diffs in isolation. It often produces good feedback, but it is prone to spotting issues that dissolve once you understand the codebase's architecture, conventions, or upstream handling. You have the full codebase. Use it.
 ## Before you review
 Your inputs are the diff under review and the set of zip-agent comments on the PR.
 **Fetch zip-agent comments.** Use the GitHub API to retrieve review comments from the PR. Filter for comments authored by `zip-agent`. Collect both line-level review comments and general issue comments:
 ```
 gh api repos/{owner}/{repo}/pulls/{number}/comments --hostname git.zoominfo.com --paginate --jq '.[] | select(.user.login == "zip-agent") | {id: .id, path: .path, line: .line, body: .body, diff_hunk: .diff_hunk}'
 ```
 ```
 gh api repos/{owner}/{repo}/issues/{number}/comments --hostname git.zoominfo.com --paginate --jq '.[] | select(.user.login == "zip-agent") | {id: .id, body: .body}'
 ```
 If no zip-agent comments are found, return an empty findings array.
 **If the `zip-agent` login returns nothing,** try `Zip-Agent`, `zipagent`, and `zip-agent[bot]` before concluding there are no comments. Automated review bots vary in naming.
 ## What you do
 For each zip-agent comment, run this validation:
 1. **Distill the hypothesis.** Parse what the comment claims is wrong. Reduce it to a testable statement: "This code has problem X because of reason Y."
 2. **Read the full context.** Read the file and surrounding code the comment references. Do not stop at the flagged line -- read the entire function, the callers, and related modules. Zip-agent reviewed a diff snippet; you have the repository.
 3. **Check for handling elsewhere.** The most common collapse mode: the issue is addressed somewhere zip-agent cannot see. Check for middleware, base classes, decorators, caller-side guards, framework conventions, shared validators, and project-specific infrastructure.
 4. **Trace the claim.** If the critique alleges a bug, trace the execution path end to end. If it alleges a missing check, locate where that check lives. If it alleges a pattern violation, verify the pattern exists in this codebase.
 5. **Render a verdict.** Decide: holds, partially holds, or collapses. Only critiques that hold or partially hold become findings.
 ## Confidence calibration
 Your confidence reflects how well the zip-agent critique survives pressure testing -- not how confident zip-agent was in its own comment.
 **High (0.80+):** The critique holds up after reading broader context. You independently confirmed the issue: traced the execution path, verified no other code handles it, and found concrete evidence the problem exists. Zip-agent caught a real issue.
 **Moderate (0.60-0.79):** The critique points at a real concern but the severity or framing needs adjustment. Example: zip-agent flags a "missing null check" and the code does lack one at that call site, but the input is constrained by an upstream validator -- a defense-in-depth gap, not a crash bug. Report with corrected severity and framing.
 **Low (below 0.60):** The critique collapses with additional context. The issue is handled elsewhere, the pattern is intentional, the claim requires assumptions that do not hold in this codebase, or the concern is purely stylistic. Suppress these -- do not report as findings. Record the collapse reason in `residual_risks` for traceability.
 ## What you don't flag
 - **Collapsed critiques.** If the issue is handled by infrastructure, a parent class, a decorator, or a framework convention that zip-agent could not see, suppress. Record in `residual_risks`.
 - **Stylistic or formatting comments.** Naming conventions, import ordering, whitespace, line length. These are linter territory, not review findings.
 - **Generic best-practice advice without a specific failure mode.** "Consider using X instead of Y" without explaining what breaks is not actionable.
 - **Comments where the current approach is a deliberate design choice.** If codebase evidence (consistent patterns, architecture docs, comments) shows the approach is intentional, the critique is invalid regardless of whether a different approach might be theoretically better.
 - **Comments that merely restate what the diff does.** Zip-agent sometimes narrates code changes without identifying an actual problem.
 ## Finding structure
 Each finding must include evidence from both sides:
 - `evidence[0]`: The original zip-agent comment (quoted or summarized, with comment ID for traceability)
 - `evidence[1+]`: Your validation analysis -- what you checked, what you found, why the critique holds
 The `title` should reflect the validated issue in your own words, not parrot zip-agent's phrasing. The `why_it_matters` should reflect actual impact as you understand it from the full codebase context, not zip-agent's framing.
 Set `autofix_class` conservatively:
 - `safe_auto` only when the fix is obvious, local, and deterministic
 - `manual` for most validated findings -- zip-agent flagged them for human attention and that instinct was correct
 - `advisory` for partially-validated findings where the concern is real but the severity is low or the fix path is unclear
 Set `owner` to `downstream-resolver` for actionable validated findings and `human` for items needing judgment.
 For each collapsed zip-agent comment, add a `residual_risks` entry explaining why it was dismissed. Format: `"zip-agent comment #{id} ({path}:{line}): '{summary}' -- collapsed: {reason}"`. This creates a traceable record that the comment was evaluated, not ignored.
 ## Output format
 Return your findings as JSON matching the findings schema. No prose outside the JSON.
 ```json
 {
  "reviewer": "zip-agent-validator",
  "findings": [],
  "residual_risks": [],
  "testing_gaps": []
 }
 ```
--- a/plugins/compound-engineering/skills/ce-review/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-review/SKILL.md
@@ -95,12 +95,13 @@ Routing rules:
 | `compound-engineering:review:data-migrations-reviewer` | Migrations, schema changes, backfills |
 | `compound-engineering:review:reliability-reviewer` | Error handling, retries, timeouts, background jobs |
-**CE conditional (migration-specific):**
+**CE conditional (migration & external review):**
-| Agent | Select when diff includes migration files |
+| Agent | Select when... |
-|-------|------------------------------------------|
+|-------|----------------|
-| `compound-engineering:review:schema-drift-detector` | Cross-references schema.rb against included migrations |
+| `compound-engineering:review:schema-drift-detector` | Diff includes migration files -- cross-references schema.rb against included migrations |
-| `compound-engineering:review:deployment-verification-agent` | Produces deployment checklist with SQL verification queries |
+| `compound-engineering:review:deployment-verification-agent` | Diff includes migration files -- produces deployment checklist with SQL verification queries |
 | `compound-engineering:review:zip-agent-validator` | PR URL contains `git.zoominfo.com` -- pressure-tests zip-agent comments for validity |
 ## Review Scope
@@ -316,7 +317,7 @@ Pass this to every reviewer in their spawn prompt. Intent shapes *how hard each
 Read the diff and file list from Stage 1. The 3 always-on personas and 2 CE always-on agents are automatic. For each conditional persona in [persona-catalog.md](./references/persona-catalog.md), decide whether the diff warrants it. This is agent judgment, not keyword matching.
-For CE conditional agents, check if the diff includes files matching `db/migrate/*.rb`, `db/schema.rb`, or data backfill scripts.
+For CE conditional agents, check if the diff includes files matching `db/migrate/*.rb`, `db/schema.rb`, or data backfill scripts. If the PR URL contains `git.zoominfo.com`, select `zip-agent-validator`.
 Announce the team before spawning:
@@ -360,7 +361,7 @@ Each persona sub-agent returns JSON matching [findings-schema.json](./references
 **CE always-on agents** (agent-native-reviewer, learnings-researcher) are dispatched as standard Agent calls in parallel with the persona agents. Give them the same review context bundle the personas receive: entry mode, any PR metadata gathered in Stage 1, intent summary, review base branch name when known, `BASE:` marker, file list, diff, and `UNTRACKED:` scope notes. Do not invoke them with a generic "review this" prompt. Their output is unstructured and synthesized separately in Stage 6.
-**CE conditional agents** (design-conformance-reviewer, schema-drift-detector, deployment-verification-agent) are also dispatched as standard Agent calls when applicable. Pass the same review context bundle plus the applicability reason (for example, which design docs were found, or which migration files triggered the agent). For schema-drift-detector specifically, pass the resolved review base branch explicitly so it never assumes `main`. Their output is unstructured and must be preserved for Stage 6 synthesis just like the CE always-on agents.
+**CE conditional agents** (design-conformance-reviewer, schema-drift-detector, deployment-verification-agent, zip-agent-validator) are also dispatched as standard Agent calls when applicable. Pass the same review context bundle plus the applicability reason (for example, which design docs were found, or which migration files triggered the agent). For schema-drift-detector specifically, pass the resolved review base branch explicitly so it never assumes `main`. For zip-agent-validator, pass the full PR URL and the PR number so it can fetch comments from the GHE API. Their output is unstructured and must be preserved for Stage 6 synthesis just like the CE always-on agents.
 ### Stage 5: Merge findings
@@ -377,7 +378,7 @@ Convert multiple reviewer JSON payloads into one deduplicated, confidence-gated
   - report-only queue: `advisory` findings plus anything owned by `human` or `release`
 7. **Sort.** Order by severity (P0 first) -> confidence (descending) -> file path -> line number.
 8. **Collect coverage data.** Union residual_risks and testing_gaps across reviewers.
-9. **Preserve CE agent artifacts.** Keep the learnings, agent-native, schema-drift, and deployment-verification outputs alongside the merged finding set. Do not drop unstructured agent output just because it does not match the persona JSON schema.
+9. **Preserve CE agent artifacts.** Keep the learnings, agent-native, schema-drift, deployment-verification, and zip-agent-validator outputs alongside the merged finding set. Do not drop unstructured agent output just because it does not match the persona JSON schema. For zip-agent-validator specifically, its validated findings use the standard findings schema and enter the merge pipeline (steps 1-7) like persona findings. Its `residual_risks` entries (collapsed zip-agent comments) are preserved separately for the Zip Agent Validation section in Stage 6.
 ### Stage 6: Synthesize and present
@@ -392,8 +393,9 @@ Assemble the final report using the template in [review-output-template.md](./re
 7. **Agent-Native Gaps.** Surface agent-native-reviewer results. Omit section if no gaps found.
 8. **Schema Drift Check.** If schema-drift-detector ran, summarize whether drift was found. If drift exists, list the unrelated schema objects and the required cleanup command. If clean, say so briefly.
 9. **Deployment Notes.** If deployment-verification-agent ran, surface the key Go/No-Go items: blocking pre-deploy checks, the most important verification queries, rollback caveats, and monitoring focus areas. Keep the checklist actionable rather than dropping it into Coverage.
-10. **Coverage.** Suppressed count, residual risks, testing gaps, failed/timed-out reviewers, and any intent uncertainty carried by non-interactive modes.
+10. **Zip Agent Validation.** If zip-agent-validator ran, summarize the results: how many zip-agent comments were evaluated, how many validated (these appear as findings in the severity-grouped tables above), and how many collapsed with reasons. This section provides traceability -- reviewers can see that zip-agent comments were evaluated, not ignored.
-11. **Verdict.** Ready to merge / Ready with fixes / Not ready. Fix order if applicable.
+11. **Coverage.** Suppressed count, residual risks, testing gaps, failed/timed-out reviewers, and any intent uncertainty carried by non-interactive modes.
 12. **Verdict.** Ready to merge / Ready with fixes / Not ready. Fix order if applicable.
 Do not include time estimates.
--- a/plugins/compound-engineering/skills/ce-review/references/persona-catalog.md
+++ b/plugins/compound-engineering/skills/ce-review/references/persona-catalog.md
@@ -45,7 +45,7 @@ Spawned when the orchestrator identifies language or framework-specific patterns
 | `frontend-races` | `compound-engineering:review:julik-frontend-races-reviewer` | Frontend JavaScript, Stimulus controllers, event listeners, async UI code, animations, DOM lifecycle |
 | `architecture` | `compound-engineering:review:architecture-strategist` | New services, module boundaries, dependency graphs, API layer changes, package structure |
-## CE Conditional Agents (design & migration)
+## CE Conditional Agents (design, migration & external review)
 These CE-native agents provide specialized analysis beyond what the persona agents cover.
@@ -54,11 +54,12 @@ These CE-native agents provide specialized analysis beyond what the persona agen
 | `compound-engineering:review:design-conformance-reviewer` | Surfaces deviations between the diff and the repo's design docs or implementation plan | The repo contains design documents (`docs/`, `docs/design/`, `docs/architecture/`, `docs/specs/`) or an active plan matching the current branch |
 | `compound-engineering:review:schema-drift-detector` | Cross-references schema.rb changes against included migrations to catch unrelated drift | The diff includes migration files or schema.rb |
 | `compound-engineering:review:deployment-verification-agent` | Produces Go/No-Go deployment checklist with SQL verification queries and rollback procedures | The diff includes migration files, schema.rb, or data backfills |
 | `compound-engineering:review:zip-agent-validator` | Pressure-tests zip-agent review comments for validity against full codebase context | The PR URL contains `git.zoominfo.com` |
 ## Selection rules
 1. **Always spawn all 3 always-on personas** plus the 2 CE always-on agents.
 2. **For each conditional persona**, the orchestrator reads the diff and decides whether the persona's domain is relevant. This is a judgment call, not a keyword match.
 3. **For language/framework conditional personas**, spawn when the diff contains files matching the persona's language or framework domain. Multiple language personas can be active simultaneously (e.g., both `python-quality` and `typescript-quality` if the diff touches both).
-4. **For CE conditional agents**, check each agent's selection criteria. `design-conformance-reviewer`: spawn when the repo contains design docs or an active plan matching the branch. `schema-drift-detector` and `deployment-verification-agent`: spawn when the diff includes migration files (`db/migrate/*.rb`, `db/schema.rb`) or data backfill scripts.
+4. **For CE conditional agents**, check each agent's selection criteria. `design-conformance-reviewer`: spawn when the repo contains design docs or an active plan matching the branch. `schema-drift-detector` and `deployment-verification-agent`: spawn when the diff includes migration files (`db/migrate/*.rb`, `db/schema.rb`) or data backfill scripts. `zip-agent-validator`: spawn when the PR URL contains `git.zoominfo.com`.
 5. **Announce the team** before spawning with a one-line justification per conditional reviewer selected.
--- a/plugins/compound-engineering/skills/ce-review/references/review-output-template.md
+++ b/plugins/compound-engineering/skills/ce-review/references/review-output-template.md
@@ -77,6 +77,15 @@ Use this **exact format** when presenting synthesized review findings. Findings
 - Verify: `SELECT COUNT(*) FROM exports WHERE status IS NULL;` should stay at `0`
 - Rollback: keep the old export path available until the backfill has been validated
 ### Zip Agent Validation
 - Evaluated: 8 zip-agent comments
 - Validated: 2 (appear as findings #3 and #6 above)
 - Collapsed: 6
  - `app/services/order_service.rb:45`: "Missing error handling" -- handled by ApplicationService base class rescue
  - `app/controllers/api/orders_controller.rb:18`: "Unbounded query" -- pagination enforced by ApiController concern
  - _(4 more collapsed for stylistic/formatting concerns)_
 ### Coverage
 - Suppressed: 2 findings below 0.60 confidence
@@ -109,6 +118,7 @@ Use this **exact format** when presenting synthesized review findings. Findings
 - **Agent-Native Gaps section** -- results from agent-native-reviewer. Omit if no gaps found.
 - **Schema Drift Check section** -- results from schema-drift-detector. Omit if the agent did not run.
 - **Deployment Notes section** -- key checklist items from deployment-verification-agent. Omit if the agent did not run.
 - **Zip Agent Validation section** -- summary of zip-agent comment evaluation: total, validated (with cross-references to findings table), collapsed (with reasons). Omit if the agent did not run.
 - **Coverage section** -- suppressed count, residual risks, testing gaps, failed reviewers
 - **Summary uses blockquotes** for verdict, reasoning, and fix order
 - **Horizontal rule** (`---`) separates findings from verdict