[2.23.0] Add /agent-native-audit command

- New command for comprehensive agent-native architecture review
- Launches 8 parallel sub-agents, one per core principle
- Principles: Action Parity, Tools as Primitives, Context Injection, Shared Workspace, CRUD Completeness, UI Integration, Capability Discovery, Prompt-Native Features
- Each agent produces a specific score (X/Y format with percentage)
- Generates a summary report with overall score and top 10 recommendations
- Supports single-principle audit via argument

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -1,7 +1,7 @@
 {
   "name": "compound-engineering",
-  "version": "2.22.0",
-  "description": "AI-powered development tools. 27 agents, 20 commands, 13 skills, 2 MCP servers for code review, research, design, and workflow automation.",
+  "version": "2.23.0",
+  "description": "AI-powered development tools. 27 agents, 21 commands, 13 skills, 2 MCP servers for code review, research, design, and workflow automation.",
   "author": {
     "name": "Kieran Klaassen",
     "email": "kieran@every.to",
@@ -5,6 +5,23 @@ All notable changes to the compound-engineering plugin will be documented in thi
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [2.23.0] - 2026-01-08
+
+### Added
+
+- **`/agent-native-audit` command** - Comprehensive agent-native architecture review
+  - Launches 8 parallel sub-agents, one per core principle
+  - Principles: Action Parity, Tools as Primitives, Context Injection, Shared Workspace, CRUD Completeness, UI Integration, Capability Discovery, Prompt-Native Features
+  - Each agent produces a specific score (X/Y format with percentage)
+  - Generates summary report with overall score and top 10 recommendations
+  - Supports single-principle audit via argument
+
+### Summary
+
+- 27 agents, 21 commands, 13 skills, 2 MCP servers
+
+---
+
 ## [2.22.0] - 2026-01-05
 
 ### Added

plugins/compound-engineering/commands/agent-native-audit.md (new file, 277 lines)
@@ -0,0 +1,277 @@
---
name: agent-native-audit
description: Run comprehensive agent-native architecture review with scored principles
argument-hint: "[optional: specific principle to audit]"
---

# Agent-Native Architecture Audit

Conduct a comprehensive review of the codebase against agent-native architecture principles, launching parallel sub-agents for each principle and producing a scored report.

## Core Principles to Audit

1. **Action Parity** - "Whatever the user can do, the agent can do"
2. **Tools as Primitives** - "Tools provide capability, not behavior"
3. **Context Injection** - "System prompt includes dynamic context about app state"
4. **Shared Workspace** - "Agent and user work in the same data space"
5. **CRUD Completeness** - "Every entity has full CRUD (Create, Read, Update, Delete)"
6. **UI Integration** - "Agent actions immediately reflected in UI"
7. **Capability Discovery** - "Users can discover what the agent can do"
8. **Prompt-Native Features** - "Features are prompts defining outcomes, not code"

## Workflow

### Step 1: Load the Agent-Native Skill

First, invoke the agent-native-architecture skill to understand all principles:

```
/compound-engineering:agent-native-architecture
```

Select option 7 (action parity) to load the full reference material.
### Step 2: Launch Parallel Sub-Agents

Launch 8 parallel sub-agents using the Task tool with `subagent_type: Explore`, one for each principle. Each agent should:

1. Enumerate ALL instances in the codebase (user actions, tools, contexts, data stores, etc.)
2. Check compliance against the principle
3. Provide a SPECIFIC SCORE like "X out of Y (percentage%)"
4. List specific gaps and recommendations
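As an aside (not part of the command prompt itself), the "X out of Y (percentage%)" scoring each agent reports can be sketched as a tiny helper; the status thresholds assumed here mirror the legend used later in the summary report:

```python
def score_line(x: int, y: int) -> str:
    """Render an audit score as 'X/Y (Z%)' with a status marker.

    Thresholds follow the report's status legend:
    80%+ excellent, 50-79% partial, below 50% needs work.
    """
    if y == 0:
        return "0/0 (n/a)"
    pct = round(100 * x / y)
    status = "✅" if pct >= 80 else "⚠️" if pct >= 50 else "❌"
    return f"{x}/{y} ({pct}%) {status}"
```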
<sub-agents>

**Agent 1: Action Parity**
```
Audit for ACTION PARITY - "Whatever the user can do, the agent can do."

Tasks:
1. Enumerate ALL user actions in frontend (API calls, button clicks, form submissions)
   - Search for API service files, fetch calls, form handlers
   - Check routes and components for user interactions
2. Check which have corresponding agent tools
   - Search for agent tool definitions
   - Map user actions to agent capabilities
3. Score: "Agent can do X out of Y user actions"

Format:
## Action Parity Audit
### User Actions Found
| Action | Location | Agent Tool | Status |
### Score: X/Y (percentage%)
### Missing Agent Tools
### Recommendations
```
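A hypothetical sketch (not part of the sub-agent prompt) of the mapping this agent performs; the action and tool names below are invented for illustration, not taken from any real codebase:

```python
# Invented inventories; a real audit would collect these from frontend
# handlers and agent tool definitions.
user_actions = {"create_draft", "publish_draft", "delete_draft", "search"}
agent_tools = {"create_draft", "search"}

covered = user_actions & agent_tools          # actions the agent can perform
missing = sorted(user_actions - agent_tools)  # gaps to report

print(f"Agent can do {len(covered)} out of {len(user_actions)} user actions")
print("Missing agent tools:", ", ".join(missing))
```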
**Agent 2: Tools as Primitives**
```
Audit for TOOLS AS PRIMITIVES - "Tools provide capability, not behavior."

Tasks:
1. Find and read ALL agent tool files
2. Classify each as:
   - PRIMITIVE (good): read, write, store, list - enables capability without business logic
   - WORKFLOW (bad): encodes business logic, makes decisions, orchestrates steps
3. Score: "X out of Y tools are proper primitives"

Format:
## Tools as Primitives Audit
### Tool Analysis
| Tool | File | Type | Reasoning |
### Score: X/Y (percentage%)
### Problematic Tools (workflows that should be primitives)
### Recommendations
```

**Agent 3: Context Injection**
```
Audit for CONTEXT INJECTION - "System prompt includes dynamic context about app state"

Tasks:
1. Find context injection code (search for "context", "system prompt", "inject")
2. Read agent prompts and system messages
3. Enumerate what IS injected vs what SHOULD be:
   - Available resources (files, drafts, documents)
   - User preferences/settings
   - Recent activity
   - Available capabilities listed
   - Session history
   - Workspace state

Format:
## Context Injection Audit
### Context Types Analysis
| Context Type | Injected? | Location | Notes |
### Score: X/Y (percentage%)
### Missing Context
### Recommendations
```

**Agent 4: Shared Workspace**
```
Audit for SHARED WORKSPACE - "Agent and user work in the same data space"

Tasks:
1. Identify all data stores/tables/models
2. Check if agents read/write to SAME tables or separate ones
3. Look for sandbox isolation anti-pattern (agent has separate data space)

Format:
## Shared Workspace Audit
### Data Store Analysis
| Data Store | User Access | Agent Access | Shared? |
### Score: X/Y (percentage%)
### Isolated Data (anti-pattern)
### Recommendations
```

**Agent 5: CRUD Completeness**
```
Audit for CRUD COMPLETENESS - "Every entity has full CRUD"

Tasks:
1. Identify all entities/models in the codebase
2. For each entity, check if agent tools exist for:
   - Create
   - Read
   - Update
   - Delete
3. Score per entity and overall

Format:
## CRUD Completeness Audit
### Entity CRUD Analysis
| Entity | Create | Read | Update | Delete | Score |
### Overall Score: X/Y entities with full CRUD (percentage%)
### Incomplete Entities (list missing operations)
### Recommendations
```
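The per-entity scoring above can be sketched as follows (an aside, not part of the sub-agent prompt); entity names and coverage are hypothetical:

```python
OPS = ("Create", "Read", "Update", "Delete")

# Hypothetical coverage an auditing agent might collect.
coverage = {
    "Document": {"Create", "Read", "Update", "Delete"},
    "Comment": {"Create", "Read"},
}

def crud_row(entity: str, ops: set) -> str:
    """One markdown row of the Entity CRUD Analysis table."""
    cells = ["✅" if op in ops else "❌" for op in OPS]
    return f"| {entity} | " + " | ".join(cells) + f" | {len(ops & set(OPS))}/4 |"

full = sum(1 for ops in coverage.values() if set(OPS) <= ops)
print(f"Overall Score: {full}/{len(coverage)} entities with full CRUD")
```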
**Agent 6: UI Integration**
```
Audit for UI INTEGRATION - "Agent actions immediately reflected in UI"

Tasks:
1. Check how agent writes/changes propagate to frontend
2. Look for:
   - Streaming updates (SSE, WebSocket)
   - Polling mechanisms
   - Shared state/services
   - Event buses
   - File watching
3. Identify "silent actions" anti-pattern (agent changes state but UI doesn't update)

Format:
## UI Integration Audit
### Agent Action → UI Update Analysis
| Agent Action | UI Mechanism | Immediate? | Notes |
### Score: X/Y (percentage%)
### Silent Actions (anti-pattern)
### Recommendations
```

**Agent 7: Capability Discovery**
```
Audit for CAPABILITY DISCOVERY - "Users can discover what the agent can do"

Tasks:
1. Check for these 7 discovery mechanisms:
   - Onboarding flow showing agent capabilities
   - Help documentation
   - Capability hints in UI
   - Agent self-describes in responses
   - Suggested prompts/actions
   - Empty state guidance
   - Slash commands (/help, /tools)
2. Score against 7 mechanisms

Format:
## Capability Discovery Audit
### Discovery Mechanism Analysis
| Mechanism | Exists? | Location | Quality |
### Score: X/7 (percentage%)
### Missing Discovery
### Recommendations
```

**Agent 8: Prompt-Native Features**
```
Audit for PROMPT-NATIVE FEATURES - "Features are prompts defining outcomes, not code"

Tasks:
1. Read all agent prompts
2. Classify each feature/behavior as defined in:
   - PROMPT (good): outcomes defined in natural language
   - CODE (bad): business logic hardcoded
3. Check if behavior changes require prompt edit vs code change

Format:
## Prompt-Native Features Audit
### Feature Definition Analysis
| Feature | Defined In | Type | Notes |
### Score: X/Y (percentage%)
### Code-Defined Features (anti-pattern)
### Recommendations
```

</sub-agents>
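The fan-out/fan-in shape of Step 2 can be sketched as below, using threads as a stand-in for the Task tool's parallel sub-agents (`audit` is a stub; a real sub-agent explores the codebase):

```python
from concurrent.futures import ThreadPoolExecutor

PRINCIPLES = [
    "Action Parity", "Tools as Primitives", "Context Injection",
    "Shared Workspace", "CRUD Completeness", "UI Integration",
    "Capability Discovery", "Prompt-Native Features",
]

def audit(principle: str) -> dict:
    # Stub: a real sub-agent would enumerate instances, check compliance,
    # and return a specific X/Y score for this principle.
    return {"principle": principle, "score": (0, 0)}

# Fan out one worker per principle, then collect all reports for Step 3.
with ThreadPoolExecutor(max_workers=len(PRINCIPLES)) as pool:
    reports = list(pool.map(audit, PRINCIPLES))
```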
### Step 3: Compile Summary Report

After all agents complete, compile a summary with:

```markdown
## Agent-Native Architecture Review: [Project Name]

### Overall Score Summary

| Core Principle | Score | Percentage | Status |
|----------------|-------|------------|--------|
| Action Parity | X/Y | Z% | ✅/⚠️/❌ |
| Tools as Primitives | X/Y | Z% | ✅/⚠️/❌ |
| Context Injection | X/Y | Z% | ✅/⚠️/❌ |
| Shared Workspace | X/Y | Z% | ✅/⚠️/❌ |
| CRUD Completeness | X/Y | Z% | ✅/⚠️/❌ |
| UI Integration | X/Y | Z% | ✅/⚠️/❌ |
| Capability Discovery | X/Y | Z% | ✅/⚠️/❌ |
| Prompt-Native Features | X/Y | Z% | ✅/⚠️/❌ |

**Overall Agent-Native Score: X%**

### Status Legend

- ✅ Excellent (80%+)
- ⚠️ Partial (50-79%)
- ❌ Needs Work (<50%)

### Top 10 Recommendations by Impact

| Priority | Action | Principle | Effort |
|----------|--------|-----------|--------|

### What's Working Excellently

[List top 5 strengths]
```
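The command does not pin down how the eight percentages combine into the overall score; an unweighted mean is one reasonable reading, sketched here with placeholder numbers:

```python
def overall_score(results: dict) -> float:
    """Unweighted mean of per-principle percentages (one possible aggregation)."""
    pcts = [100 * x / y for x, y in results.values() if y]
    return round(sum(pcts) / len(pcts), 1)

# Placeholder (passed, total) pairs, not real audit output.
results = {
    "Action Parity": (12, 15),
    "CRUD Completeness": (3, 4),
    "Capability Discovery": (5, 7),
}
```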
## Success Criteria

- [ ] All 8 sub-agents complete their audits
- [ ] Each principle has a specific numeric score (X/Y format)
- [ ] Summary table shows all scores and status indicators
- [ ] Top 10 recommendations are prioritized by impact
- [ ] Report identifies both strengths and gaps

## Optional: Single Principle Audit

If $ARGUMENTS specifies a single principle (e.g., "action parity"), only run that sub-agent and provide detailed findings for that principle alone.

Valid arguments:

- `action parity` or `1`
- `tools` or `primitives` or `2`
- `context` or `injection` or `3`
- `shared` or `workspace` or `4`
- `crud` or `5`
- `ui` or `integration` or `6`
- `discovery` or `7`
- `prompt` or `features` or `8`
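Resolution of $ARGUMENTS to a principle number could look like the sketch below; the alias table simply mirrors the list above (the command file itself defines no code for this):

```python
# Hypothetical alias table mirroring the "Valid arguments" list.
ALIASES = {
    "action parity": 1, "tools": 2, "primitives": 2, "context": 3,
    "injection": 3, "shared": 4, "workspace": 4, "crud": 5, "ui": 6,
    "integration": 6, "discovery": 7, "prompt": 8, "features": 8,
    **{str(n): n for n in range(1, 9)},
}

def resolve(argument: str):
    """Map a user argument to a principle number; None means run the full audit."""
    return ALIASES.get(argument.strip().lower()) if argument else None
```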