Files

John Lamb eb96e32c58 Merge upstream v2.40.0 with local fork additions preserved

Incorporates 163 upstream commits (origin/main) while preserving all
local skills, agents, and commands. Metadata descriptions updated to
reflect actual component counts (30 agents, 56 skills, 7 commands).
file-todos/SKILL.md merged with both upstream command rename and local
assessment fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-16 10:45:33 -05:00

22 KiB

Raw Blame History

name, description, argument-hint

name	description	argument-hint
workflows:review	Perform exhaustive code reviews using multi-agent analysis, ultra-thinking, and worktrees	[PR number, GitHub URL, branch name, or latest]

Review Command

<command_purpose> Perform exhaustive code reviews using multi-agent analysis, ultra-thinking, and Git worktrees for deep local inspection. </command_purpose>

Introduction

Senior Code Review Architect with expertise in security, performance, architecture, and quality assurance

Prerequisites

- Git repository with GitHub CLI (`gh`) installed and authenticated - Clean main/master branch - Proper permissions to create worktrees and access the repository - For document reviews: Path to a markdown file or document

Main Tasks

1. Determine Review Target & Setup (ALWAYS FIRST)

<review_target> #$ARGUMENTS </review_target>

First, I need to determine the review target type and set up the code for analysis.

Immediate Actions:

<task_list>

Determine review type: PR number (numeric), GitHub URL, file path (.md), or empty (current branch)
Check current git branch
If ALREADY on the target branch (PR branch, requested branch name, or the branch already checked out for review) → proceed with analysis on current branch
If DIFFERENT branch than the review target → offer to use worktree: "Use git-worktree skill for isolated Call skill: git-worktree with branch name
Fetch PR metadata using gh pr view --json for title, body, files, linked issues
Set up language-specific analysis tools
Prepare security scanning environment
Make sure we are on the branch we are reviewing. Use gh pr checkout to switch to the branch or manually checkout the branch.

Ensure that the code is ready for analysis (either in worktree or on current branch). ONLY then proceed to the next step.

</task_list>

Protected Artifacts

<protected_artifacts> The following paths are compound-engineering pipeline artifacts and must never be flagged for deletion, removal, or gitignore by any review agent:

docs/plans/*.md — Plan files created by /workflows:plan. These are living documents that track implementation progress (checkboxes are checked off by /workflows:work).
docs/solutions/*.md — Solution documents created during the pipeline.

If a review agent flags any file in these directories for cleanup or removal, discard that finding during synthesis. Do not create a todo for it. </protected_artifacts>

Load Review Agents

Read compound-engineering.local.md in the project root. If found, use review_agents from YAML frontmatter. If the markdown body contains review context, pass it to each agent as additional instructions.

If no settings file exists, invoke the setup skill to create one. Then read the newly created file and continue.

Parallel Agents to review the PR:

<parallel_tasks>

Run all configured review agents in parallel using Task tool. For each agent in the review_agents list:

Task {agent-name}(PR content + review context from settings body)

Additionally, always run these regardless of settings:

Task agent-native-reviewer(PR content) - Verify new features are agent-accessible
Task learnings-researcher(PR content) - Search docs/solutions/ for past issues related to this PR's modules and patterns

</parallel_tasks>

Conditional Agents (Run if applicable):

<conditional_agents>

These agents are run ONLY when the PR matches specific criteria. Check the PR files list to determine if they apply:

MIGRATIONS: If PR contains database migrations, schema.rb, or data backfills:

Task schema-drift-detector(PR content) - Detects unrelated schema.rb changes by cross-referencing against included migrations (run FIRST)
Task data-migration-expert(PR content) - Validates ID mappings match production, checks for swapped values, verifies rollback safety
Task deployment-verification-agent(PR content) - Creates Go/No-Go deployment checklist with SQL verification queries

When to run:

PR includes files matching db/migrate/*.rb or db/schema.rb
PR modifies columns that store IDs, enums, or mappings
PR includes data backfill scripts or rake tasks
PR title/body mentions: migration, backfill, data transformation, ID mapping

What these agents check:

schema-drift-detector: Cross-references schema.rb changes against PR migrations to catch unrelated columns/indexes from local database state
data-migration-expert: Verifies hard-coded mappings match production reality (prevents swapped IDs), checks for orphaned associations, validates dual-write patterns
deployment-verification-agent: Produces executable pre/post-deploy checklists with SQL queries, rollback procedures, and monitoring plans

</conditional_agents>

4. Ultra-Thinking Deep Dive Phases

<ultrathink_instruction> For each phase below, spend maximum cognitive effort. Think step by step. Consider all angles. Question assumptions. And bring all reviews in a synthesis to the user.</ultrathink_instruction>

Complete system context map with component interactions

Phase 3: Stakeholder Perspective Analysis

<thinking_prompt> ULTRA-THINK: Put yourself in each stakeholder's shoes. What matters to them? What are their pain points? </thinking_prompt>

<stakeholder_perspectives>

Developer Perspective
- How easy is this to understand and modify?
- Are the APIs intuitive?
- Is debugging straightforward?
- Can I test this easily?
Operations Perspective
- How do I deploy this safely?
- What metrics and logs are available?
- How do I troubleshoot issues?
- What are the resource requirements?
End User Perspective
- Is the feature intuitive?
- Are error messages helpful?
- Is performance acceptable?
- Does it solve my problem?
Security Team Perspective
- What's the attack surface?
- Are there compliance requirements?
- How is data protected?
- What are the audit capabilities?
Business Perspective
- What's the ROI?
- Are there legal/compliance risks?
- How does this affect time-to-market?
- What's the total cost of ownership? </stakeholder_perspectives>

Phase 4: Scenario Exploration

<thinking_prompt> ULTRA-THINK: Explore edge cases and failure scenarios. What could go wrong? How does the system behave under stress? </thinking_prompt>

<scenario_checklist>

Happy Path: Normal operation with valid inputs
Invalid Inputs: Null, empty, malformed data
Boundary Conditions: Min/max values, empty collections
Concurrent Access: Race conditions, deadlocks
Scale Testing: 10x, 100x, 1000x normal load
Network Issues: Timeouts, partial failures
Resource Exhaustion: Memory, disk, connections
Security Attacks: Injection, overflow, DoS
Data Corruption: Partial writes, inconsistency
Cascading Failures: Downstream service issues </scenario_checklist>

6. Multi-Angle Review Perspectives

Technical Excellence Angle

Code craftsmanship evaluation
Engineering best practices
Technical documentation quality
Tooling and automation assessment
Naming accuracy (see Naming Scrutiny below)

Naming Scrutiny (REQUIRED)

Every name introduced or modified in the PR must pass these checks:

#	Check	Question
1	Caller's perspective	Does the name describe what it does, not how?
2	No false qualifiers	Does every `_with_X` / `_and_X` reflect a real choice?
3	Visibility matches intent	Are private helpers actually private?
4	Consistent convention	Does the pattern match every other instance in the codebase?
5	Precise, not vague	Could this name apply to ten different things? (`data`, `manager`, `handler` = red flags)
6	Complete words	No ambiguous abbreviations? (`auth` = authentication or authorization?)
7	Correct part of speech	Functions = verbs, classes = nouns, booleans = assertions?

Common anti-patterns to flag:

False optionality: save_with_validation() when validation is mandatory
Leaked implementation: create_batch_with_items() when callers just need create_batch()
Type encoding: word_string, new_hash instead of domain terms
Structural naming: input, output, result instead of what they contain
Doppelgangers: names differing by one letter (useProfileQuery vs useProfilesQuery)

Include naming findings in the synthesized review. Flag as P2 (Important) unless the name is actively misleading about behavior (P1).

Business Value Angle

Feature completeness validation
Performance impact on users
Cost-benefit analysis
Time-to-market considerations

Risk Management Angle

Security risk assessment
Operational risk evaluation
Compliance risk verification
Technical debt accumulation

Team Dynamics Angle

Code review etiquette
Knowledge sharing effectiveness
Collaboration patterns
Mentoring opportunities

4. Simplification and Minimalism Review

Run the Task code-simplicity-reviewer() to see if we can simplify the code.

5. Findings Synthesis and Todo Creation Using file-todos Skill

<critical_requirement> ALL findings MUST be stored in the todos/ directory using the file-todos skill. Create todo files immediately after synthesis - do NOT present findings for user approval first. Use the skill for structured todo management. </critical_requirement>

Step 1: Synthesize All Findings

Consolidate all agent reports into a categorized list of findings. Remove duplicates, prioritize by severity and impact.

<synthesis_tasks>

Collect findings from all parallel agents
Surface learnings-researcher results: if past solutions are relevant, flag them as "Known Pattern" with links to docs/solutions/ files
Discard any findings that recommend deleting or gitignoring files in docs/plans/ or docs/solutions/ (see Protected Artifacts above)
Categorize by type: security, performance, architecture, quality, etc.
Assign severity levels: 🔴 CRITICAL (P1), 🟡 IMPORTANT (P2), 🔵 NICE-TO-HAVE (P3)
Remove duplicate or overlapping findings
Estimate effort for each finding (Small/Medium/Large)

</synthesis_tasks>

Step 2: Pressure Test Each Finding

<critical_evaluation>

IMPORTANT: Treat agent findings as suggestions, not mandates.

Not all findings are equally valid. Apply engineering judgment before creating todos. The goal is to make the right call for the codebase, not rubber-stamp every suggestion.

For each finding, verify:

Check	Question
Code	Does the concern actually apply to this specific code?
Tests	Are there existing tests that already cover this case?
Usage	How is this code used in practice? Does the concern matter?
Compatibility	Would the suggested change break anything?
Prior Decisions	Was this intentional? Is there a documented reason?
Cost vs Benefit	Is the fix worth the effort and risk?

Assess each finding:

Assessment	Meaning
Clear & Correct	Valid concern, well-reasoned, applies here
Unclear	Ambiguous or missing context
Likely Incorrect	Agent misunderstands code, context, or requirements
YAGNI	Over-engineering, premature abstraction, no clear benefit
Duplicate	Already covered by another finding (merge into existing)

IMPORTANT: ALL findings become todos. Never drop agent feedback - include the pressure test assessment IN each todo so /triage can use it.

Each todo will include:

The assessment (Clear & Correct / Unclear / Likely Incorrect / YAGNI)
The verification results (what was checked)
Technical justification (why valid, or why you think it should be skipped)
Recommended action for triage (Fix now / Clarify / Push back / Skip)

Provide technical justification for all assessments:

Don't just label - explain WHY with specific reasoning
Reference codebase constraints, requirements, or trade-offs
Example: "This abstraction would be YAGNI - we only have one implementation and no plans for variants. Adding it now increases complexity without clear benefit."

The human reviews during /triage and makes the final call.

</critical_evaluation>

Step 3: Create Todo Files Using file-todos Skill

<critical_instruction> Use the file-todos skill to create todo files for ALL findings immediately. Do NOT present findings one-by-one asking for user approval. Create all todo files in parallel using the skill, then summarize results to user. </critical_instruction>

Implementation Options:

Option A: Direct File Creation (Fast)

Create todo files directly using Write tool
All findings in parallel for speed
Invoke Skill: "compound-engineering:file-todos" and read the template from its assets directory
Follow naming convention: {issue_id}-pending-{priority}-{description}.md

Option B: Sub-Agents in Parallel (Recommended for Scale) For large PRs with 15+ findings, use sub-agents to create finding files in parallel:

# Launch multiple finding-creator agents in parallel
Task() - Create todos for first finding
Task() - Create todos for second finding
Task() - Create todos for third finding
etc. for each finding.

Sub-agents can:

Process multiple findings simultaneously
Write detailed todo files with all sections filled
Organize findings by severity
Create comprehensive Proposed Solutions
Add acceptance criteria and work logs
Complete much faster than sequential processing

Execution Strategy:

Synthesize all findings into categories (P1/P2/P3)
Group findings by severity
Launch 3 parallel sub-agents (one per severity level)
Each sub-agent creates its batch of todos using the file-todos skill
Consolidate results and present summary

Process (Using file-todos Skill):

For each finding:
- Determine severity (P1/P2/P3)
- Write detailed Problem Statement and Findings
- Create 2-3 Proposed Solutions with pros/cons/effort/risk
- Estimate effort (Small/Medium/Large)
- Add acceptance criteria and work log
Use file-todos skill for structured todo management:
```
Skill: "compound-engineering:file-todos"
```
The skill provides:
- Template at ./assets/todo-template.md (relative to skill directory)
- Naming convention: {issue_id}-{status}-{priority}-{description}.md
- YAML frontmatter structure: status, priority, issue_id, tags, dependencies
- All required sections: Problem Statement, Findings, Solutions, etc.

Create todo files in parallel:

{next_id}-pending-{priority}-{description}.md

Examples:

001-pending-p1-path-traversal-vulnerability.md
002-pending-p1-api-response-validation.md
003-pending-p2-concurrency-limit.md
004-pending-p3-unused-parameter.md

Follow template structure from file-todos skill (read ./assets/todo-template.md from skill directory)

Todo File Structure (from template):

Each todo must include:

YAML frontmatter: status, priority, issue_id, tags, dependencies
Problem Statement: What's broken/missing, why it matters
Assessment (Pressure Test): Verification results and engineering judgment
- Assessment: Clear & Correct / Unclear / YAGNI
- Verified: Code, Tests, Usage, Prior Decisions
- Technical Justification: Why this finding is valid (or why skipped)
Findings: Discoveries from agents with evidence/location
Proposed Solutions: 2-3 options, each with pros/cons/effort/risk
Recommended Action: (Filled during triage, leave blank initially)
Technical Details: Affected files, components, database changes
Acceptance Criteria: Testable checklist items
Work Log: Dated record with actions and learnings
Resources: Links to PR, issues, documentation, similar patterns

File naming convention:

{issue_id}-{status}-{priority}-{description}.md

Examples:
- 001-pending-p1-security-vulnerability.md
- 002-pending-p2-performance-optimization.md
- 003-pending-p3-code-cleanup.md

Status values:

pending - New findings, needs triage/decision
ready - Approved by manager, ready to work
complete - Work finished

Priority values:

p1 - Critical (blocks merge, security/data issues)
p2 - Important (should fix, architectural/performance)
p3 - Nice-to-have (enhancements, cleanup)

Tagging: Always add code-review tag, plus: security, performance, architecture, rails, quality, etc.

Step 4: Summary Report

After creating all todo files, present comprehensive summary:

## ✅ Code Review Complete

**Review Target:** PR #XXXX - [PR Title] **Branch:** [branch-name]

### Findings Summary:

- **Total Findings:** [X]
- **🔴 CRITICAL (P1):** [count] - BLOCKS MERGE
- **🟡 IMPORTANT (P2):** [count] - Should Fix
- **🔵 NICE-TO-HAVE (P3):** [count] - Enhancements

### Created Todo Files:

**P1 - Critical (BLOCKS MERGE):**

- `001-pending-p1-{finding}.md` - {description}
- `002-pending-p1-{finding}.md` - {description}

**P2 - Important:**

- `003-pending-p2-{finding}.md` - {description}
- `004-pending-p2-{finding}.md` - {description}

**P3 - Nice-to-Have:**

- `005-pending-p3-{finding}.md` - {description}

### Review Agents Used:

- kieran-python-reviewer
- security-sentinel
- performance-oracle
- architecture-strategist
- agent-native-reviewer
- [other agents]

### Assessment Summary (Pressure Test Results):

All agent findings were pressure tested and included in todos:

| Assessment | Count | Description |
|------------|-------|-------------|
| **Clear & Correct** | {X} | Valid concerns, recommend fixing |
| **Unclear** | {X} | Need clarification before implementing |
| **Likely Incorrect** | {X} | May misunderstand context - review during triage |
| **YAGNI** | {X} | May be over-engineering - review during triage |
| **Duplicate** | {X} | Merged into other findings |

**Note:** All assessments are included in the todo files. Human judgment during `/triage` makes the final call on whether to accept, clarify, or reject each item.

### Next Steps:

1. **Address P1 Findings**: CRITICAL - must be fixed before merge

   - Review each P1 todo in detail
   - Implement fixes or request exemption
   - Verify fixes before merging PR

2. **Triage All Todos**:
   ```bash
   ls todos/*-pending-*.md  # View all pending todos
   /triage                  # Use slash command for interactive triage
   ```

Work on Approved Todos:

/resolve_todo_parallel  # Fix all approved items efficiently

Track Progress:
- Rename file when status changes: pending → ready → complete
- Update Work Log as you work
- Commit todos: git add todos/ && git commit -m "refactor: add code review findings"

Severity Breakdown:

🔴 P1 (Critical - Blocks Merge):

Security vulnerabilities
Data corruption risks
Breaking changes
Critical architectural issues

🟡 P2 (Important - Should Fix):

Performance issues
Significant architectural concerns
Major code quality problems
Reliability issues

🔵 P3 (Nice-to-Have):

Minor improvements
Code cleanup
Optimization opportunities
Documentation updates


### 7. End-to-End Testing (Optional)

<detect_project_type>

**First, detect the project type from PR files:**

| Indicator | Project Type |
|-----------|--------------|
| `*.xcodeproj`, `*.xcworkspace`, `Package.swift` (iOS) | iOS/macOS |
| `Gemfile`, `package.json`, `app/views/*`, `*.html.*` | Web |
| Both iOS files AND web files | Hybrid (test both) |

</detect_project_type>

<offer_testing>

After presenting the Summary Report, offer appropriate testing based on project type:

**For Web Projects:**
```markdown
**"Want to run browser tests on the affected pages?"**
1. Yes - run `/test-browser`
2. No - skip

For iOS Projects:

**"Want to run Xcode simulator tests on the app?"**
1. Yes - run `/xcode-test`
2. No - skip

For Hybrid Projects (e.g., Rails + Hotwire Native):

**"Want to run end-to-end tests?"**
1. Web only - run `/test-browser`
2. iOS only - run `/xcode-test`
3. Both - run both commands
4. No - skip

</offer_testing>

If User Accepts Web Testing:

Spawn a subagent to run browser tests (preserves main context):

Task general-purpose("Run /test-browser for PR #[number]. Test all affected pages, check for console errors, handle failures by creating todos and fixing.")

The subagent will:

Identify pages affected by the PR
Navigate to each page and capture snapshots (using Playwright MCP or agent-browser CLI)
Check for console errors
Test critical interactions
Pause for human verification on OAuth/email/payment flows
Create P1 todos for any failures
Fix and retry until all tests pass

Standalone: /test-browser [PR number]

If User Accepts iOS Testing:

Spawn a subagent to run Xcode tests (preserves main context):

Task general-purpose("Run /xcode-test for scheme [name]. Build for simulator, install, launch, take screenshots, check for crashes.")