feat(document-review): smarter autofix, batch-confirm, and error/omission classification (#401)

Trevin Chow
2026-03-26 19:40:13 -07:00
committed by GitHub
parent b30288c44e
commit 0863cfa4cb
4 changed files with 95 additions and 46 deletions


@@ -19,6 +19,7 @@
"severity",
"section",
"why_it_matters",
+"finding_type",
"autofix_class",
"confidence",
"evidence"
@@ -44,8 +45,13 @@
},
"autofix_class": {
"type": "string",
-"enum": ["auto", "present"],
-"description": "How this issue should be handled. auto = local deterministic fix the orchestrator can apply without asking (terminology, formatting, cross-references). present = requires user judgment."
+"enum": ["auto", "batch_confirm", "present"],
+"description": "How this issue should be handled. auto = local deterministic fix applied silently (terminology, formatting, cross-references, completeness corrections). batch_confirm = obvious fix with a clear correct answer, but touches meaning enough to warrant grouped approval. present = requires individual user judgment."
+},
+"finding_type": {
+"type": "string",
+"enum": ["error", "omission"],
+"description": "Whether the finding is a mistake in what the document says (error) or something the document forgot to say (omission). Errors are design tensions, contradictions, or incorrect statements. Omissions are missing mechanical steps, forgotten list entries, or absent details."
},
"suggested_fix": {
"type": ["string", "null"],
@@ -91,8 +97,13 @@
"P3": "Minor improvement. User's discretion."
},
"autofix_classes": {
-"auto": "Local, deterministic document fix: terminology consistency, formatting, cross-reference correction. Must be unambiguous and not change the document's meaning.",
-"present": "Requires user judgment -- strategic questions, tradeoffs, meaning-changing fixes, or informational findings."
+"auto": "Local, deterministic document fix: terminology consistency, formatting, cross-reference correction, completeness corrections (wrong counts, missing list entries, undefined values where the correct value is verifiable from elsewhere in the document). Must be unambiguous.",
+"batch_confirm": "Obvious fix with a clear correct answer, but touches document meaning enough to warrant user awareness. Grouped with other batch_confirm findings for a single approval rather than individual review. Examples: adding a missing implementation step that is mechanically implied, updating a section summary to reflect its own contents.",
+"present": "Requires individual user judgment -- strategic questions, design tradeoffs, meaning-changing fixes, or findings where reasonable people could disagree on the right action."
+},
+"finding_types": {
+"error": "Something the document says that is wrong -- contradictions, incorrect statements, design tensions, incoherent tradeoffs. These are mistakes in what exists.",
+"omission": "Something the document forgot to say -- missing mechanical steps, absent list entries, undefined thresholds, forgotten cross-references. These are gaps in completeness."
}
}
}
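
The two enums added to the schema above can be checked with a few lines of plain Python. The following is a minimal sketch, not the project's validator: `validate_finding` and `VISIBLE_REQUIRED` are hypothetical names, the required-field list covers only the fields visible in the first hunk (the full schema may require more), and the example finding is made-up illustrative data.

```python
from typing import Any

# Enum values copied from the schema hunks above.
AUTOFIX_CLASSES = ("auto", "batch_confirm", "present")
FINDING_TYPES = ("error", "omission")

# Assumption: only the required fields visible in the first hunk are listed here.
VISIBLE_REQUIRED = (
    "severity", "section", "why_it_matters",
    "finding_type", "autofix_class", "confidence", "evidence",
)


def validate_finding(finding: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems, empty if none were found."""
    problems = [
        f"missing required field: {name}"
        for name in VISIBLE_REQUIRED
        if name not in finding
    ]
    if "autofix_class" in finding and finding["autofix_class"] not in AUTOFIX_CLASSES:
        problems.append(f"invalid autofix_class: {finding['autofix_class']!r}")
    if "finding_type" in finding and finding["finding_type"] not in FINDING_TYPES:
        problems.append(f"invalid finding_type: {finding['finding_type']!r}")
    return problems


# Illustrative finding, not taken from any real review.
example = {
    "severity": "P1",
    "section": "Unit 4",
    "why_it_matters": "Rate-limit config is never updated.",
    "finding_type": "omission",
    "autofix_class": "batch_confirm",
    "confidence": 0.85,
    "evidence": ["Unit 3 introduces rate limiting."],
}
print(validate_finding(example))  # prints []
```

A finding that still uses a value outside the three routes, or omits `finding_type`, comes back with a non-empty problem list.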


@@ -15,35 +15,52 @@ Use this **exact format** when presenting synthesized review findings. Findings
- security-lens -- plan adds public API endpoint with auth flow
- scope-guardian -- plan has 15 requirements across 3 priority levels
+Applied 3 auto-fixes. Batched 2 fixes for approval. 4 findings to consider (2 errors, 2 omissions).
### Auto-fixes Applied
-- Standardized "pipeline"/"workflow" terminology to "pipeline" throughout (coherence, auto)
-- Fixed cross-reference: Section 4 referenced "Section 3.2" which is actually "Section 3.1" (coherence, auto)
+- Standardized "pipeline"/"workflow" terminology to "pipeline" throughout (coherence)
+- Fixed cross-reference: Section 4 referenced "Section 3.2" which is actually "Section 3.1" (coherence)
+- Updated unit count from "6 units" to "7 units" to match listed units (coherence)
+### Batch Confirm
+These fixes have one clear correct answer but touch document meaning. Apply all?
+| # | Section | Fix | Reviewer |
+|---|---------|-----|----------|
+| 1 | Unit 4 | Add "update API rate-limit config" step -- implied by Unit 3's rate-limit introduction | feasibility |
+| 2 | Verification | Add auth token refresh to test scenarios -- required by Unit 2's token expiry handling | security-lens |
### P0 -- Must Fix
-| # | Section | Issue | Reviewer | Confidence | Route |
-|---|---------|-------|----------|------------|-------|
-| 1 | Requirements Trace | Goal states "offline support" but technical approach assumes persistent connectivity | coherence | 0.92 | `present` |
+#### Errors
+| # | Section | Issue | Reviewer | Confidence |
+|---|---------|-------|----------|------------|
+| 1 | Requirements Trace | Goal states "offline support" but technical approach assumes persistent connectivity | coherence | 0.92 |
### P1 -- Should Fix
-| # | Section | Issue | Reviewer | Confidence | Route |
-|---|---------|-------|----------|------------|-------|
-| 2 | Implementation Unit 3 | Plan proposes custom auth when codebase already uses Devise | feasibility | 0.85 | `present` |
-| 3 | Scope Boundaries | 8 of 12 units build admin infrastructure; only 2 touch stated goal | scope-guardian | 0.80 | `present` |
+#### Errors
+| # | Section | Issue | Reviewer | Confidence |
+|---|---------|-------|----------|------------|
+| 2 | Scope Boundaries | 8 of 12 units build admin infrastructure; only 2 touch stated goal | scope-guardian | 0.80 |
+#### Omissions
+| # | Section | Issue | Reviewer | Confidence |
+|---|---------|-------|----------|------------|
+| 3 | Implementation Unit 3 | Plan proposes custom auth but does not mention existing Devise setup or migration path | feasibility | 0.85 |
### P2 -- Consider Fixing
-| # | Section | Issue | Reviewer | Confidence | Route |
-|---|---------|-------|----------|------------|-------|
-| 4 | API Design | Public webhook endpoint has no rate limiting mentioned | security-lens | 0.75 | `present` |
+#### Omissions
-### P3 -- Minor
-| # | Section | Issue | Reviewer | Confidence | Route |
-|---|---------|-------|----------|------------|-------|
-| 5 | Overview | "Service" used to mean both microservice and business class | coherence | 0.65 | `auto` |
+| # | Section | Issue | Reviewer | Confidence |
+|---|---------|-------|----------|------------|
+| 4 | API Design | Public webhook endpoint has no rate limiting mentioned | security-lens | 0.75 |
### Residual Concerns
@@ -59,20 +76,22 @@ Use this **exact format** when presenting synthesized review findings. Findings
### Coverage
-| Persona | Status | Findings | Residual |
-|---------|--------|----------|----------|
-| coherence | completed | 2 | 0 |
-| feasibility | completed | 1 | 1 |
-| security-lens | completed | 1 | 0 |
-| scope-guardian | completed | 1 | 0 |
-| product-lens | not activated | -- | -- |
-| design-lens | not activated | -- | -- |
+| Persona | Status | Findings | Auto | Batch | Present | Residual |
+|---------|--------|----------|------|-------|---------|----------|
+| coherence | completed | 3 | 2 | 0 | 1 | 0 |
+| feasibility | completed | 2 | 0 | 1 | 1 | 1 |
+| security-lens | completed | 2 | 0 | 1 | 1 | 0 |
+| scope-guardian | completed | 1 | 0 | 0 | 1 | 0 |
+| product-lens | not activated | -- | -- | -- | -- | -- |
+| design-lens | not activated | -- | -- | -- | -- | -- |
```
## Section Rules
+- **Summary line**: Always present after the reviewer list. Format: "Applied N auto-fixes. Batched M fixes for approval. K findings to consider (X errors, Y omissions)." Omit any zero clause.
- **Auto-fixes Applied**: List fixes that were applied automatically (auto class). Omit section if none.
-- **P0-P3 sections**: Only include sections that have findings. Omit empty severity levels.
+- **Batch Confirm**: Group `batch_confirm` findings for a single yes/no/select approval. Omit section if none.
+- **P0-P3 sections**: Only include sections that have findings. Omit empty severity levels. Within each severity, separate into **Errors** and **Omissions** sub-headers. Omit a sub-header if that severity has none of that type.
- **Residual Concerns**: Findings below confidence threshold that were promoted by cross-persona corroboration, plus unpromoted residual risks. Omit if none.
- **Deferred Questions**: Questions for later workflow stages. Omit if none.
-- **Coverage**: Always include. Shows which personas ran and their output counts.
+- **Coverage**: Always include. Shows which personas ran and their output counts broken down by route (Auto, Batch, Present).
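
The summary-line rule above ("Omit any zero clause") can be sketched as a small formatter. This is an illustration under stated assumptions, not the orchestrator's actual code: `summary_line` is a hypothetical name, findings are assumed to be dicts carrying `autofix_class` and `finding_type`, the "findings to consider" count is assumed to cover only `present` findings, and the template wording is reproduced literally without singular/plural handling.

```python
def summary_line(findings: list[dict]) -> str:
    """Hypothetical formatter for the summary line; zero clauses are omitted."""
    auto = sum(1 for f in findings if f["autofix_class"] == "auto")
    batch = sum(1 for f in findings if f["autofix_class"] == "batch_confirm")
    # Assumption: only `present` findings count as "findings to consider".
    present = [f for f in findings if f["autofix_class"] == "present"]
    errors = sum(1 for f in present if f["finding_type"] == "error")
    omissions = sum(1 for f in present if f["finding_type"] == "omission")

    clauses = []
    if auto:
        clauses.append(f"Applied {auto} auto-fixes.")
    if batch:
        clauses.append(f"Batched {batch} fixes for approval.")
    if present:
        # Assumption: a zero count inside the parentheses is dropped as well.
        parts = []
        if errors:
            parts.append(f"{errors} errors")
        if omissions:
            parts.append(f"{omissions} omissions")
        clauses.append(f"{len(present)} findings to consider ({', '.join(parts)}).")
    return " ".join(clauses)


# Counts mirror the example template: 3 auto, 2 batch, 4 present (2 errors, 2 omissions).
sample = (
    [{"autofix_class": "auto", "finding_type": "error"}] * 3
    + [{"autofix_class": "batch_confirm", "finding_type": "omission"}] * 2
    + [{"autofix_class": "present", "finding_type": "error"}] * 2
    + [{"autofix_class": "present", "finding_type": "omission"}] * 2
)
print(summary_line(sample))
```

With no `batch_confirm` findings the middle sentence disappears entirely, which is the behavior the rule asks for.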


@@ -22,9 +22,13 @@ Rules:
- Suppress any finding below your stated confidence floor (see your Confidence calibration section).
- Every finding MUST include at least one evidence item -- a direct quote from the document.
- You are operationally read-only. Analyze the document and produce findings. Do not edit the document, create files, or make changes. You may use non-mutating tools (file reads, glob, grep, git log) to gather context about the codebase when evaluating feasibility or existing patterns.
+- Set `finding_type` for every finding:
+- `error`: Something the document says that is wrong -- contradictions, incorrect statements, design tensions, incoherent tradeoffs.
+- `omission`: Something the document forgot to say -- missing mechanical steps, absent list entries, undefined thresholds, forgotten cross-references.
- Set `autofix_class` conservatively:
-- `auto`: Only for local, deterministic fixes -- terminology corrections, formatting fixes, cross-reference repairs. The fix must be unambiguous and not change the document's meaning.
-- `present`: Everything else -- strategic questions, tradeoffs, meaning-changing fixes, informational findings.
+- `auto`: Deterministic fixes where the correct value is verifiable from the document itself -- terminology corrections, formatting fixes, cross-reference repairs, wrong counts, missing list entries where the correct entry exists elsewhere in the document. The fix must be unambiguous.
+- `batch_confirm`: Obvious fix with one clear correct answer, but it touches document meaning. Examples: adding a missing implementation step that is mechanically implied by other content, updating a summary to match its own details. Use when reasonable people would agree on the fix but it goes beyond cosmetic correction.
+- `present`: Everything else -- strategic questions, tradeoffs, design tensions where reasonable people could disagree, informational findings.
- `suggested_fix` is optional. Only include it when the fix is obvious and correct. For `present` findings, frame as a question instead.
- If you find no issues, return an empty findings array. Still populate residual_risks and deferred_questions if applicable.
- Use your suppress conditions. Do not flag issues that belong to other personas.
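
On the consuming side, the three routes described above suggest a simple partition step. The sketch below is one possible reading, not the orchestrator shipped in this commit: `suppress_below_floor` and `partition_by_route` are hypothetical names, the floor value is a placeholder (each persona states its own), and sending unrecognized `autofix_class` values to `present` is an assumption made in the spirit of "set `autofix_class` conservatively".

```python
KNOWN_ROUTES = ("auto", "batch_confirm", "present")


def suppress_below_floor(findings: list[dict], floor: float) -> list[dict]:
    """Drop findings whose confidence is below the persona's stated floor."""
    return [f for f in findings if f["confidence"] >= floor]


def partition_by_route(findings: list[dict]) -> dict[str, list[dict]]:
    """Group findings by autofix_class, preserving input order within each group."""
    routes: dict[str, list[dict]] = {name: [] for name in KNOWN_ROUTES}
    for finding in findings:
        route = finding.get("autofix_class")
        if route not in KNOWN_ROUTES:
            # Assumption: unknown or missing routes fall back to individual review.
            route = "present"
        routes[route].append(finding)
    return routes


# Illustrative data; the 0.5 floor is a placeholder, not a value from the source.
raw = [
    {"section": "Overview", "autofix_class": "auto", "confidence": 0.9},
    {"section": "Unit 4", "autofix_class": "batch_confirm", "confidence": 0.8},
    {"section": "API Design", "autofix_class": "present", "confidence": 0.75},
    {"section": "Appendix", "autofix_class": "present", "confidence": 0.3},
    {"section": "Glossary", "autofix_class": "unknown", "confidence": 0.7},
]
routes = partition_by_route(suppress_below_floor(raw, floor=0.5))
for name, group in routes.items():
    print(name, [f["section"] for f in group])
```

Under this reading, the `auto` group would be applied silently, the `batch_confirm` group shown once for a single approval, and each `present` finding surfaced individually.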