pressure test pr feedback

2026-02-16 15:59:42 -06:00
parent 25543e66f5
commit b0755f4050
7 changed files with 451 additions and 12 deletions
--- a/.claude-plugin/marketplace.json
+++ b/.claude-plugin/marketplace.json
@@ -11,8 +11,8 @@
  "plugins": [
    {
      "name": "compound-engineering",
-      "description": "AI-powered development tools that get smarter with every use. Make each unit of engineering work easier than the last. Includes 27 specialized agents, 24 commands, and 15 skills.",
-      "version": "2.29.0",
+      "description": "AI-powered development tools that get smarter with every use. Make each unit of engineering work easier than the last. Includes 27 specialized agents, 25 commands, and 15 skills.",
+      "version": "2.30.0",
      "author": {
        "name": "Kieran Klaassen",
        "url": "https://github.com/kieranklaassen",
--- a/plugins/compound-engineering/.claude-plugin/plugin.json
+++ b/plugins/compound-engineering/.claude-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
  "name": "compound-engineering",
-  "version": "2.29.0",
-  "description": "AI-powered development tools. 27 agents, 24 commands, 15 skills, 1 MCP server for code review, research, design, and workflow automation.",
+  "version": "2.30.0",
+  "description": "AI-powered development tools. 27 agents, 25 commands, 15 skills, 1 MCP server for code review, research, design, and workflow automation.",
  "author": {
    "name": "Kieran Klaassen",
    "email": "kieran@every.to",
--- a/plugins/compound-engineering/CHANGELOG.md
+++ b/plugins/compound-engineering/CHANGELOG.md
@@ -5,6 +5,24 @@ All notable changes to the compound-engineering plugin will be documented in thi
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [2.30.0] - 2026-02-03
+
+### Added
+
+- **`/pr-comments-to-todos` command** - Fetch PR review comments and convert them into todo files for triage
+  - Fetches all comments from a GitHub PR using `gh` CLI
+  - Converts each actionable comment into a structured todo file following the file-todos skill format
+  - Assigns priority levels (P1/P2/P3) based on comment severity
+  - Creates todo files compatible with `/triage` command for approval workflow
+  - Skips non-actionable comments (LGTM, resolved threads, etc.)
+  - Supports PR number, GitHub URL, or "current" for current branch
+
+### Summary
+
+- 27 agents, 25 commands, 15 skills, 1 MCP server
+
+---
+
 ## [2.29.0] - 2026-01-26

 ### Changed
--- a/plugins/compound-engineering/commands/pr-comments-to-todos.md
+++ b/plugins/compound-engineering/commands/pr-comments-to-todos.md
@@ -0,0 +1,334 @@
+---
+name: pr-comments-to-todos
+description: Fetch PR comments and convert them into todo files for triage
+argument-hint: "[PR number, GitHub URL, or 'current' for current branch PR]"
+---
+
+# PR Comments to Todos
+
+Convert GitHub PR review comments into structured todo files compatible with `/triage`.
+
+<command_purpose>Fetch all review comments from a PR and create individual todo files in the `todos/` directory, following the file-todos skill format.</command_purpose>
+
+## Review Target
+
+<review_target> #$ARGUMENTS </review_target>
+
+## Workflow
+
+### 1. Identify PR and Fetch Comments
+
+<task_list>
+
+- [ ] Determine the PR to process:
+  - If numeric: use as PR number directly
+  - If GitHub URL: extract PR number from URL
+  - If "current" or empty: detect from current branch with `gh pr status`
+- [ ] Fetch PR metadata: `gh pr view PR_NUMBER --json title,body,url,author,headRefName`
+- [ ] Fetch all review comments: `gh api repos/{owner}/{repo}/pulls/{PR_NUMBER}/comments`
+- [ ] Fetch review thread comments: `gh pr view PR_NUMBER --json reviews,reviewDecision`
+- [ ] Group comments by file/thread for context
+
+</task_list>
+
+### 2. Pressure Test Each Comment
+
+<critical_evaluation>
+
+**IMPORTANT: Treat reviewer comments as suggestions, not orders.**
+
+Before creating a todo, apply engineering judgment to each comment. Not all feedback is equally valid - your job is to make the right call for the codebase, not just please the reviewer.
+
+#### Step 2a: Verify Before Accepting
+
+For each comment, verify:
+- [ ] **Check the code**: Does the concern actually apply to this code?
+- [ ] **Check tests**: Are there existing tests that cover this case?
+- [ ] **Check usage**: How is this code actually used? Does the concern matter in practice?
+- [ ] **Check compatibility**: Would the suggested change break anything?
+- [ ] **Check prior decisions**: Was this intentional? Is there a reason it's done this way?
+
+#### Step 2b: Assess Each Comment
+
+Assign an assessment to each comment:
+
+| Assessment | Meaning |
+|------------|---------|
+| **Clear & Correct** | Valid concern, well-reasoned, applies to this code |
+| **Unclear** | Ambiguous, missing context, or doesn't specify what to change |
+| **Likely Incorrect** | Misunderstands the code, context, or requirements |
+| **YAGNI** | Over-engineering, premature abstraction, no clear benefit |
+
+#### Step 2c: Include Assessment in Todo
+
+**IMPORTANT: ALL comments become todos.** Never drop feedback - include the pressure test assessment IN the todo so `/triage` can use it to decide.
+
+For each comment, the todo will include:
+- The assessment (Clear & Correct / Unclear / Likely Incorrect / YAGNI)
+- The verification results (what was checked)
+- Technical justification (why valid, or why you think it should be skipped)
+- Recommended action for triage (Fix now / Clarify / Push back / Skip)
+
+The human reviews during `/triage` and makes the final call.
+
+</critical_evaluation>
+
+### 3. Categorize All Comments
+
+<categorization>
+
+For ALL comments (regardless of assessment), determine:
+
+**Severity (Priority):**
+- 🔴 **P1 (Critical)**: Security issues, data loss risks, breaking changes, blocking bugs
+- 🟡 **P2 (Important)**: Performance issues, architectural concerns, significant code quality
+- 🔵 **P3 (Nice-to-have)**: Style suggestions, minor improvements, documentation
+
+**Category Tags:**
+- `security` - Security vulnerabilities or concerns
+- `performance` - Performance issues or optimizations
+- `architecture` - Design or structural concerns
+- `bug` - Functional bugs or edge cases
+- `quality` - Code quality, readability, maintainability
+- `testing` - Test coverage or test quality
+- `documentation` - Missing or unclear documentation
+- `style` - Code style or formatting
+- `needs-clarification` - Comment requires clarification before implementing
+- `pushback-candidate` - Human should review before accepting
+
+**Skip these (don't create todos):**
+- Simple acknowledgments ("LGTM", "Looks good")
+- Questions that were answered inline
+- Already resolved threads
+
+**Note:** Comments assessed as YAGNI or Likely Incorrect still become todos with that assessment included. The human decides during `/triage` whether to accept or reject.
+
+</categorization>
+
+### 4. Create Todo Files Using file-todos Skill
+
+<critical_instruction>Create todo files for ALL actionable comments immediately. Use the file-todos skill structure and naming convention.</critical_instruction>
+
+#### Determine Next Issue ID
+
+```bash
+# Find the highest existing issue ID
+ls todos/ 2>/dev/null | grep -o '^[0-9]\+' | sort -n | tail -1 | awk '{printf "%03d", $1+1}'
+# If no todos exist, start with 001
+```
+
+#### File Naming Convention
+
+```
+{issue_id}-pending-{priority}-{brief-description}.md
+```
+
+Examples:
+```
+001-pending-p1-sql-injection-vulnerability.md
+002-pending-p2-missing-error-handling.md
+003-pending-p3-rename-variable-for-clarity.md
+```
+
+#### Todo File Structure
+
+For each comment, create a file with this structure:
+
+```yaml
+---
+status: pending
+priority: p1  # or p2, p3 based on severity
+issue_id: "001"
+tags: [code-review, pr-feedback, {category}]
+dependencies: []
+---
+```
+
+```markdown
+# [Brief Title from Comment]
+
+## Problem Statement
+
+[Summarize the reviewer's concern - what is wrong or needs improvement]
+
+**PR Context:**
+- PR: #{PR_NUMBER} - {PR_TITLE}
+- File: {file_path}:{line_number}
+- Reviewer: @{reviewer_username}
+
+## Assessment (Pressure Test)
+
+| Criterion | Result |
+|-----------|--------|
+| **Assessment** | Clear & Correct / Unclear / Likely Incorrect / YAGNI |
+| **Recommended Action** | Fix now / Clarify / Push back / Skip |
+| **Verified Code?** | Yes/No - [what was checked] |
+| **Verified Tests?** | Yes/No - [existing coverage] |
+| **Verified Usage?** | Yes/No - [how code is used] |
+| **Prior Decisions?** | Yes/No - [any intentional design] |
+
+**Technical Justification:**
+[If pushing back or marking YAGNI, provide specific technical reasoning. Reference codebase constraints, requirements, or trade-offs. Example: "This abstraction would be YAGNI - we only have one implementation and no plans for variants."]
+
+## Findings
+
+- **Original Comment:** "{exact reviewer comment}"
+- **Location:** `{file_path}:{line_number}`
+- **Code Context:**
+  ```{language}
+  {relevant code snippet}
+  ```
+- **Why This Matters:** [Impact if not addressed, or why it doesn't matter]
+
+## Proposed Solutions
+
+### Option 1: [Primary approach based on reviewer suggestion]
+
+**Approach:** [Describe the fix]
+
+**Pros:**
+- Addresses reviewer concern directly
+- [Other benefits]
+
+**Cons:**
+- [Any drawbacks]
+
+**Effort:** Small / Medium / Large
+
+**Risk:** Low / Medium / High
+
+---
+
+### Option 2: [Alternative if applicable]
+
+[Only include if there's a meaningful alternative approach]
+
+## Recommended Action
+
+*(To be filled during triage)*
+
+## Technical Details
+
+**Affected Files:**
+- `{file_path}:{line_number}` - {what needs changing}
+
+**Related Components:**
+- [Components affected by this change]
+
+## Resources
+
+- **PR:** #{PR_NUMBER}
+- **Comment Link:** {direct_link_to_comment}
+- **Reviewer:** @{reviewer_username}
+
+## Acceptance Criteria
+
+- [ ] Reviewer concern addressed
+- [ ] Tests pass
+- [ ] Code reviewed and approved
+- [ ] PR comment resolved
+
+## Work Log
+
+### {today's date} - Created from PR Review
+
+**By:** Claude Code
+
+**Actions:**
+- Extracted comment from PR #{PR_NUMBER} review
+- Created todo for triage
+
+**Learnings:**
+- Original reviewer context: {any additional context}
+```
+
+### 5. Parallel Todo Creation (For Multiple Comments)
+
+<parallel_processing>
+
+When processing PRs with many comments (5+), create todos in parallel for efficiency:
+
+1. Synthesize all comments into a categorized list
+2. Assign severity (P1/P2/P3) to each
+3. Launch parallel Write operations for all todos
+4. Each todo follows the file-todos skill template exactly
+
+</parallel_processing>
+
+### 6. Summary Report
+
+After creating all todo files, present:
+
+````markdown
+## ✅ PR Comments Converted to Todos
+
+**PR:** #{PR_NUMBER} - {PR_TITLE}
+**Branch:** {branch_name}
+**Total Comments Processed:** {X}
+
+### Created Todo Files:
+
+**🔴 P1 - Critical:**
+- `{id}-pending-p1-{desc}.md` - {summary}
+
+**🟡 P2 - Important:**
+- `{id}-pending-p2-{desc}.md` - {summary}
+
+**🔵 P3 - Nice-to-Have:**
+- `{id}-pending-p3-{desc}.md` - {summary}
+
+### Skipped (Not Actionable):
+- {count} comments skipped (LGTM, questions answered, resolved threads)
+
+### Assessment Summary:
+
+All comments were pressure tested and included in todos:
+
+| Assessment | Count | Description |
+|------------|-------|-------------|
+| **Clear & Correct** | {X} | Valid concerns, recommend fixing |
+| **Unclear** | {X} | Need clarification before implementing |
+| **Likely Incorrect** | {X} | May misunderstand context - review during triage |
+| **YAGNI** | {X} | May be over-engineering - review during triage |
+
+**Note:** All assessments are included in the todo files. Human judgment during `/triage` makes the final call on whether to accept, clarify, or reject each item.
+
+### Next Steps:
+
+1. **Triage the todos:**
+   ```bash
+   /triage
+   ```
+   Review each todo and approve (pending → ready) or skip
+
+2. **Work on approved items:**
+   ```bash
+   /resolve_todo_parallel
+   ```
+
+3. **After fixes, resolve PR comments:**
+   ```bash
+   bin/resolve-pr-thread THREAD_ID
+   ```
+````
+
+## Important Notes
+
+<requirements>
+- Ensure `todos/` directory exists before creating files
+- Each todo must have unique issue_id (never reuse)
+- All todos start with `status: pending` for triage
+- Include `code-review` and `pr-feedback` tags on all todos
+- Preserve exact reviewer quotes in Findings section
+- Link back to original PR and comment in Resources
+</requirements>
+
+## Integration with /triage
+
+The output of this command is designed to work seamlessly with `/triage`:
+
+1. **This command** creates `todos/*-pending-*.md` files
+2. **`/triage`** reviews each pending todo and:
+   - Approves → renames to `*-ready-*.md`
+   - Skips → deletes the todo file
+3. **`/resolve_todo_parallel`** works on approved (ready) todos
--- a/plugins/compound-engineering/commands/workflows/review.md
+++ b/plugins/compound-engineering/commands/workflows/review.md
@@ -214,7 +214,53 @@ Remove duplicates, prioritize by severity and impact.

 </synthesis_tasks>

-#### Step 2: Create Todo Files Using file-todos Skill
+#### Step 2: Pressure Test Each Finding
+
+<critical_evaluation>
+
+**IMPORTANT: Treat agent findings as suggestions, not mandates.**
+
+Not all findings are equally valid. Apply engineering judgment before creating todos. The goal is to make the right call for the codebase, not rubber-stamp every suggestion.
+
+**For each finding, verify:**
+
+| Check | Question |
+|-------|----------|
+| **Code** | Does the concern actually apply to this specific code? |
+| **Tests** | Are there existing tests that already cover this case? |
+| **Usage** | How is this code used in practice? Does the concern matter? |
+| **Compatibility** | Would the suggested change break anything? |
+| **Prior Decisions** | Was this intentional? Is there a documented reason? |
+| **Cost vs Benefit** | Is the fix worth the effort and risk? |
+
+**Assess each finding:**
+
+| Assessment | Meaning |
+|------------|---------|
+| **Clear & Correct** | Valid concern, well-reasoned, applies here |
+| **Unclear** | Ambiguous or missing context |
+| **Likely Incorrect** | Agent misunderstands code, context, or requirements |
+| **YAGNI** | Over-engineering, premature abstraction, no clear benefit |
+| **Duplicate** | Already covered by another finding (merge into existing) |
+
+**IMPORTANT: ALL findings become todos.** Never drop agent feedback - include the pressure test assessment IN each todo so `/triage` can use it.
+
+Each todo will include:
+- The assessment (Clear & Correct / Unclear / Likely Incorrect / YAGNI)
+- The verification results (what was checked)
+- Technical justification (why valid, or why you think it should be skipped)
+- Recommended action for triage (Fix now / Clarify / Push back / Skip)
+
+**Provide technical justification for all assessments:**
+- Don't just label - explain WHY with specific reasoning
+- Reference codebase constraints, requirements, or trade-offs
+- Example: "This abstraction would be YAGNI - we only have one implementation and no plans for variants. Adding it now increases complexity without clear benefit."
+
+The human reviews during `/triage` and makes the final call.
+
+</critical_evaluation>
+
+#### Step 3: Create Todo Files Using file-todos Skill

 <critical_instruction> Use the file-todos skill to create todo files for ALL findings immediately. Do NOT present findings one-by-one asking for user approval. Create all todo files in parallel using the skill, then summarize results to user. </critical_instruction>

@@ -224,7 +270,7 @@ Remove duplicates, prioritize by severity and impact.

 - Create todo files directly using Write tool
 - All findings in parallel for speed
- Use standard template from `.claude/skills/file-todos/assets/todo-template.md`
+- Invoke `Skill: "compound-engineering:file-todos"` and read the template from its assets directory
 - Follow naming convention: `{issue_id}-pending-{priority}-{description}.md`

 **Option B: Sub-Agents in Parallel (Recommended for Scale)** For large PRs with 15+ findings, use sub-agents to create finding files in parallel:
@@ -266,13 +312,13 @@ Sub-agents can:

 2. Use file-todos skill for structured todo management:

-   ```bash
-   skill: file-todos
+   ```
+   Skill: "compound-engineering:file-todos"
   ```

   The skill provides:

-   - Template location: `.claude/skills/file-todos/assets/todo-template.md`
+   - Template at `./assets/todo-template.md` (relative to skill directory)
   - Naming convention: `{issue_id}-{status}-{priority}-{description}.md`
   - YAML frontmatter structure: status, priority, issue_id, tags, dependencies
   - All required sections: Problem Statement, Findings, Solutions, etc.
@@ -292,7 +338,7 @@ Sub-agents can:
   004-pending-p3-unused-parameter.md
   ```

-5. Follow template structure from file-todos skill: `.claude/skills/file-todos/assets/todo-template.md`
+5. Follow template structure from file-todos skill (read `./assets/todo-template.md` from skill directory)

 **Todo File Structure (from template):**

@@ -300,6 +346,10 @@ Each todo must include:

 - **YAML frontmatter**: status, priority, issue_id, tags, dependencies
 - **Problem Statement**: What's broken/missing, why it matters
+- **Assessment (Pressure Test)**: Verification results and engineering judgment
+  - Assessment: Clear & Correct / Unclear / YAGNI
+  - Verified: Code, Tests, Usage, Prior Decisions
+  - Technical Justification: Why this finding is valid (or why skipped)
 - **Findings**: Discoveries from agents with evidence/location
 - **Proposed Solutions**: 2-3 options, each with pros/cons/effort/risk
 - **Recommended Action**: (Filled during triage, leave blank initially)
@@ -333,7 +383,7 @@ Examples:

 **Tagging:** Always add `code-review` tag, plus: `security`, `performance`, `architecture`, `rails`, `quality`, etc.

-#### Step 3: Summary Report
+#### Step 4: Summary Report

 After creating all todo files, present comprehensive summary:

@@ -367,13 +417,27 @@ After creating all todo files, present comprehensive summary:

 ### Review Agents Used:

- kieran-rails-reviewer
+- kieran-python-reviewer
 - security-sentinel
 - performance-oracle
 - architecture-strategist
 - agent-native-reviewer
 - [other agents]

+### Assessment Summary (Pressure Test Results):
+
+All agent findings were pressure tested and included in todos:
+
+| Assessment | Count | Description |
+|------------|-------|-------------|
+| **Clear & Correct** | {X} | Valid concerns, recommend fixing |
+| **Unclear** | {X} | Need clarification before implementing |
+| **Likely Incorrect** | {X} | May misunderstand context - review during triage |
+| **YAGNI** | {X} | May be over-engineering - review during triage |
+| **Duplicate** | {X} | Merged into other findings |
+
+**Note:** All assessments are included in the todo files. Human judgment during `/triage` makes the final call on whether to accept, clarify, or reject each item.
+
 ### Next Steps:

 1. **Address P1 Findings**: CRITICAL - must be fixed before merge
--- a/plugins/compound-engineering/skills/file-todos/SKILL.md
+++ b/plugins/compound-engineering/skills/file-todos/SKILL.md
@@ -44,6 +44,7 @@ Each todo is a markdown file with YAML frontmatter and structured sections. Use

 **Required sections:**
 - **Problem Statement** - What is broken, missing, or needs improvement?
+- **Assessment (Pressure Test)** - For code review findings: verification results and engineering judgment
 - **Findings** - Investigation results, root cause, key discoveries
 - **Proposed Solutions** - Multiple options with pros/cons, effort, risk
 - **Recommended Action** - Clear plan (filled during triage)
@@ -55,6 +56,12 @@ Each todo is a markdown file with YAML frontmatter and structured sections. Use
 - **Resources** - Links to errors, tests, PRs, documentation
 - **Notes** - Additional context or decisions

+**Assessment section fields (for code review findings):**
+- Assessment: Clear & Correct | Unclear | Likely Incorrect | YAGNI
+- Recommended Action: Fix now | Clarify | Push back | Skip
+- Verified: Code, Tests, Usage, Prior Decisions (Yes/No with details)
+- Technical Justification: Why this finding is valid or should be skipped
+
 **YAML frontmatter fields:**
 ```yaml
 ---
--- a/plugins/compound-engineering/skills/file-todos/assets/todo-template.md
+++ b/plugins/compound-engineering/skills/file-todos/assets/todo-template.md
@@ -19,6 +19,22 @@ What is broken, missing, or needs improvement? Provide clear context about why t
 - Email service is missing proper error handling for rate-limit scenarios
 - Documentation doesn't cover the new authentication flow

+## Assessment (Pressure Test)
+
+*(For findings from code review or automated agents)*
+
+| Criterion | Result |
+|-----------|--------|
+| **Assessment** | Clear & Correct / Unclear / Likely Incorrect / YAGNI |
+| **Recommended Action** | Fix now / Clarify / Push back / Skip |
+| **Verified Code?** | Yes/No - [what was checked] |
+| **Verified Tests?** | Yes/No - [existing coverage] |
+| **Verified Usage?** | Yes/No - [how code is used] |
+| **Prior Decisions?** | Yes/No - [any intentional design] |
+
+**Technical Justification:**
+[If pushing back or marking YAGNI, provide specific technical reasoning. Reference codebase constraints, requirements, or trade-offs.]
+
 ## Findings

 Investigation results, root cause analysis, and key discoveries.