[2.10.0] Add agent-native reviewer and architecture skill

- Add agent-native-reviewer agent to verify features are agent-accessible
- Add agent-native-architecture skill for prompt-native design patterns
- Add agent-native-reviewer to /review command parallel agents
- Move agent-native skill to correct plugin folder
- Update component counts (25 agents, 12 skills)
- Include mermaid dark mode fix from PR #45

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-10 11:26:02 -08:00
parent abeb76c485
commit 4ea9f52ba9
12 changed files with 122 additions and 7 deletions
--- a/plugins/compound-engineering/skills/agent-native-architecture/references/architecture-patterns.md
+++ b/plugins/compound-engineering/skills/agent-native-architecture/references/architecture-patterns.md
@@ -0,0 +1,215 @@
+<overview>
+Architectural patterns for building prompt-native agent systems. These patterns emerge from the philosophy that features should be defined in prompts, not code, and that tools should be primitives.
+</overview>
+
+<pattern name="event-driven-agent">
+## Event-Driven Agent Architecture
+
+The agent runs as a long-lived process that responds to events. Events become prompts.
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                    Agent Loop                                │
+├─────────────────────────────────────────────────────────────┤
+│  Event Source → Agent (Claude) → Tool Calls → Response      │
+└─────────────────────────────────────────────────────────────┘
+                          │
+          ┌───────────────┼───────────────┐
+          ▼               ▼               ▼
+    ┌─────────┐    ┌──────────┐    ┌───────────┐
+    │ Content │    │   Self   │    │   Data    │
+    │  Tools  │    │  Tools   │    │   Tools   │
+    └─────────┘    └──────────┘    └───────────┘
+    (write_file)   (read_source)   (store_item)
+                   (restart)       (list_items)
+```
+
+**Key characteristics:**
+- Events (messages, webhooks, timers) trigger agent turns
+- Agent decides how to respond based on system prompt
+- Tools are primitives for IO, not business logic
+- State persists between events via data tools
+
+**Example: Discord feedback bot**
+```typescript
+// Event source
+client.on("messageCreate", (message) => {
+  if (!message.author.bot) {
+    runAgent({
+      userMessage: `New message from ${message.author}: "${message.content}"`,
+      channelId: message.channelId,
+    });
+  }
+});
+
+// System prompt defines behavior
+const systemPrompt = `
+When someone shares feedback:
+1. Acknowledge their feedback warmly
+2. Ask clarifying questions if needed
+3. Store it using the feedback tools
+4. Update the feedback site
+
+Use your judgment about importance and categorization.
+`;
+```
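
The idea that "events become prompts" can be sketched independently of Discord. This is a minimal illustration only: the `AgentEvent` type and the `eventToPrompt` helper are hypothetical names that do not appear in the original code, and a real system would hand the resulting string to the agent loop.

```typescript
// Hypothetical event shapes; the names are illustrative, not part of any SDK.
type AgentEvent =
  | { kind: "message"; author: string; content: string }
  | { kind: "webhook"; source: string; payload: unknown }
  | { kind: "timer"; label: string };

// Turn any event into the user message for one agent turn.
// No business logic lives here: the system prompt decides how to respond.
function eventToPrompt(ev: AgentEvent): string {
  switch (ev.kind) {
    case "message":
      return `New message from ${ev.author}: "${ev.content}"`;
    case "webhook":
      return `Webhook from ${ev.source}: ${JSON.stringify(ev.payload)}`;
    case "timer":
      return `Timer fired: ${ev.label}`;
  }
}

console.log(eventToPrompt({ kind: "timer", label: "daily-digest" }));
```

Keeping the conversion this thin means a new event source only needs a new case, while the response behavior stays in the prompt.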
+</pattern>
+
+<pattern name="two-layer-git">
+## Two-Layer Git Architecture
+
+For self-modifying agents, separate code (shared) from data (instance-specific).
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                     GitHub (shared repo)                     │
+│  - src/           (agent code)                              │
+│  - site/          (web interface)                           │
+│  - package.json   (dependencies)                            │
+│  - .gitignore     (excludes data/, logs/)                   │
+└─────────────────────────────────────────────────────────────┘
+                          │
+                     git clone
+                          │
+                          ▼
+┌─────────────────────────────────────────────────────────────┐
+│                  Instance (Server)                           │
+│                                                              │
+│  FROM GITHUB (tracked):                                      │
+│  - src/           → pushed back on code changes             │
+│  - site/          → pushed, triggers deployment             │
+│                                                              │
+│  LOCAL ONLY (untracked):                                     │
+│  - data/          → instance-specific storage               │
+│  - logs/          → runtime logs                            │
+│  - .env           → secrets                                 │
+└─────────────────────────────────────────────────────────────┘
+```
+
+**Why this works:**
+- Code and site are version controlled (GitHub)
+- Raw data stays local (instance-specific)
+- Site is generated from data, so reproducible
+- Rollback is available through git history
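
The tracked/untracked split in the diagram could be enforced with a small path check before any commit. This is a sketch under assumptions: `isLocalOnly` is a hypothetical helper, and the path list simply mirrors the "LOCAL ONLY" entries shown above.

```typescript
// Local-only locations taken from the diagram; the helper name is hypothetical.
const LOCAL_ONLY_PREFIXES = ["data/", "logs/"];
const LOCAL_ONLY_FILES = [".env"];

// True when a repo-relative path must never be committed or pushed.
function isLocalOnly(relPath: string): boolean {
  const normalized = relPath.replace(/^\.\//, "");
  return (
    LOCAL_ONLY_FILES.includes(normalized) ||
    LOCAL_ONLY_PREFIXES.some((prefix) => normalized.startsWith(prefix))
  );
}

console.log(isLocalOnly("data/feedback.json")); // true
console.log(isLocalOnly("src/index.ts"));       // false
```

In practice `.gitignore` already does this job; a check like this is only a second line of defense inside a commit tool.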
+</pattern>
+
+<pattern name="multi-instance">
+## Multi-Instance Branching
+
+Each agent instance gets its own branch while sharing core code.
+
+```
+main                        # Shared features, bug fixes
+├── instance/feedback-bot   # Every Reader feedback bot
+├── instance/support-bot    # Customer support bot
+└── instance/research-bot   # Research assistant
+```
+
+**Change flow:**
+| Change Type | Work On | Then |
+|-------------|---------|------|
+| Core features | main | Merge to instance branches |
+| Bug fixes | main | Merge to instance branches |
+| Instance config | instance branch | Done |
+| Instance data | instance branch | Done |
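
The change-flow table can be read as a small routing function. The sketch below is illustrative: `planChange`, `ChangeType`, and `ChangePlan` are hypothetical names, and the `instance/<name>` branch convention is taken from the tree above.

```typescript
type ChangeType = "core-feature" | "bug-fix" | "instance-config" | "instance-data";

interface ChangePlan {
  workOn: string;      // branch to commit on
  mergeInto: string[]; // branches that should receive the change afterwards
}

// Shared changes land on main and fan out to every instance branch;
// instance-specific changes stay on that instance's branch.
function planChange(change: ChangeType, instance: string, allInstances: string[]): ChangePlan {
  const isShared = change === "core-feature" || change === "bug-fix";
  return isShared
    ? { workOn: "main", mergeInto: allInstances.map((n) => `instance/${n}`) }
    : { workOn: `instance/${instance}`, mergeInto: [] };
}
```

For example, `planChange("bug-fix", "feedback-bot", ["feedback-bot", "support-bot"])` targets `main` and then both instance branches.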
+
+**Sync tools:**
+```typescript
+tool("self_deploy", "Pull latest from main, rebuild, restart", ...)
+tool("sync_from_instance", "Merge from another instance", ...)
+tool("propose_to_main", "Create PR to share improvements", ...)
+```
+</pattern>
+
+<pattern name="site-as-output">
+## Site as Agent Output
+
+The agent generates and maintains a website as a natural output, not through specialized site tools.
+
+```
+Discord Message
+      ↓
+Agent processes it, extracts insights
+      ↓
+Agent decides what site updates are needed
+      ↓
+Agent writes files using write_file primitive
+      ↓
+Git commit + push triggers deployment
+      ↓
+Site updates automatically
+```
+
+**Key insight:** Don't build site generation tools. Give the agent file tools and teach it in the prompt how to create good sites.
+
+```markdown
+## Site Management
+
+You maintain a public feedback site. When feedback comes in:
+1. Use write_file to update site/public/content/feedback.json
+2. If the site's React components need improvement, modify them
+3. Commit changes and push to trigger Vercel deploy
+
+The site should be:
+- Clean, modern dashboard aesthetic
+- Clear visual hierarchy
+- Status organization (Inbox, Active, Done)
+
+You decide the structure. Make it good.
+```
+</pattern>
+
+<pattern name="approval-gates">
+## Approval Gates Pattern
+
+Separate "propose" from "apply" for dangerous operations.
+
+```typescript
+// Pending changes stored separately
+const pendingChanges = new Map<string, string>();
+
+tool("write_file", async ({ path, content }) => {
+  if (requiresApproval(path)) {
+    // Store for approval
+    pendingChanges.set(path, content);
+    const diff = generateDiff(path, content);
+    return {
+      text: `Change requires approval.\n\n${diff}\n\nReply "yes" to apply.`
+    };
+  } else {
+    // Apply immediately
+    writeFileSync(path, content);
+    return { text: `Wrote ${path}` };
+  }
+});
+
+tool("apply_pending", async () => {
+  for (const [path, content] of pendingChanges) {
+    writeFileSync(path, content);
+  }
+  pendingChanges.clear();
+  return { text: "Applied all pending changes" };
+});
+```
+
+**What requires approval:**
+- src/*.ts (agent code)
+- package.json (dependencies)
+- system prompt changes
+
+**What doesn't:**
+- data/* (instance data)
+- site/* (generated content)
+- docs/* (documentation)
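
The `requiresApproval` check used in the code above is not defined in the original. One possible shape, assuming repo-relative paths, treating `src/*.ts` as "any `.ts` file under `src/`", and assuming the system prompt lives under `src/prompts/`:

```typescript
// Hypothetical rules mirroring the "requires approval" list above.
const APPROVAL_RULES: RegExp[] = [
  /^src\/.*\.ts$/,   // agent code
  /^package\.json$/, // dependencies
  /^src\/prompts\//, // system prompt changes (assumed location)
];

function requiresApproval(relPath: string): boolean {
  // Treat path traversal as gated so "data/../src/x.ts" cannot bypass the rules.
  if (relPath.split("/").includes("..")) return true;
  return APPROVAL_RULES.some((rule) => rule.test(relPath));
}
```

Paths that match no rule (including `data/*`, `site/*`, and `docs/*`) are applied immediately in this sketch. A stricter design could invert the default and gate everything not explicitly allowed.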
+</pattern>
+
+<design_questions>
+## Questions to Ask When Designing
+
+1. **What events trigger agent turns?** (messages, webhooks, timers, user requests)
+2. **What primitives does the agent need?** (read, write, call API, restart)
+3. **What decisions should the agent make?** (format, structure, priority, action)
+4. **What decisions should be hardcoded?** (security boundaries, approval requirements)
+5. **How does the agent verify its work?** (health checks, build verification)
+6. **How does the agent recover from mistakes?** (git rollback, approval gates)
+</design_questions>
--- a/plugins/compound-engineering/skills/agent-native-architecture/references/mcp-tool-design.md
+++ b/plugins/compound-engineering/skills/agent-native-architecture/references/mcp-tool-design.md
@@ -0,0 +1,316 @@
+<overview>
+How to design MCP tools following prompt-native principles. Tools should be primitives that enable capability, not workflows that encode decisions.
+
+**Core principle:** Whatever a user can do, the agent should be able to do. Don't artificially limit the agent—give it the same primitives a power user would have.
+</overview>
+
+<principle name="primitives-not-workflows">
+## Tools Are Primitives, Not Workflows
+
+**Wrong approach:** Tools that encode business logic
+```typescript
+tool("process_feedback", {
+  feedback: z.string(),
+  category: z.enum(["bug", "feature", "question"]),
+  priority: z.enum(["low", "medium", "high"]),
+}, async ({ feedback, category, priority }) => {
+  // Tool decides how to process
+  const processed = categorize(feedback);
+  const stored = await saveToDatabase(processed);
+  const notification = await notify(priority);
+  return { processed, stored, notification };
+});
+```
+
+**Right approach:** Primitives that enable any workflow
+```typescript
+tool("store_item", {
+  key: z.string(),
+  value: z.any(),
+}, async ({ key, value }) => {
+  await db.set(key, value);
+  return { text: `Stored ${key}` };
+});
+
+tool("send_message", {
+  channel: z.string(),
+  content: z.string(),
+}, async ({ channel, content }) => {
+  await messenger.send(channel, content);
+  return { text: "Sent" };
+});
+```
+
+The agent decides categorization, priority, and when to notify based on the system prompt.
+</principle>
+
+<principle name="descriptive-names">
+## Tools Should Have Descriptive, Primitive Names
+
+Names should describe the capability, not the use case:
+
+| Wrong | Right |
+|-------|-------|
+| `process_user_feedback` | `store_item` |
+| `create_feedback_summary` | `write_file` |
+| `send_notification` | `send_message` |
+| `deploy_to_production` | `git_push` |
+
+The prompt tells the agent *when* to use primitives. The tool just provides *capability*.
+</principle>
+
+<principle name="simple-inputs">
+## Inputs Should Be Simple
+
+Tools accept data. They don't accept decisions.
+
+**Wrong:** Tool accepts decisions
+```typescript
+tool("format_content", {
+  content: z.string(),
+  format: z.enum(["markdown", "html", "json"]),
+  style: z.enum(["formal", "casual", "technical"]),
+}, ...)
+```
+
+**Right:** Tool accepts data, agent decides format
+```typescript
+tool("write_file", {
+  path: z.string(),
+  content: z.string(),
+}, ...)
+// Agent decides to write index.html with HTML content, or data.json with JSON
+```
+</principle>
+
+<principle name="rich-outputs">
+## Outputs Should Be Rich
+
+Return enough information for the agent to verify and iterate.
+
+**Wrong:** Minimal output
+```typescript
+async ({ key }) => {
+  await db.delete(key);
+  return { text: "Deleted" };
+}
+```
+
+**Right:** Rich output
+```typescript
+async ({ key }) => {
+  const existed = await db.has(key);
+  if (!existed) {
+    return { text: `Key ${key} did not exist` };
+  }
+  await db.delete(key);
+  return { text: `Deleted ${key}. ${await db.count()} items remaining.` };
+}
+```
+</principle>
+
+<design_template>
+## Tool Design Template
+
+```typescript
+import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
+import { z } from "zod";
+
+export const serverName = createSdkMcpServer({
+  name: "server-name",
+  version: "1.0.0",
+  tools: [
+    // READ operations
+    tool(
+      "read_item",
+      "Read an item by key",
+      { key: z.string().describe("Item key") },
+      async ({ key }) => {
+        const item = await storage.get(key);
+        return {
+          content: [{
+            type: "text",
+            text: item ? JSON.stringify(item, null, 2) : `Not found: ${key}`,
+          }],
+          isError: !item,
+        };
+      }
+    ),
+
+    tool(
+      "list_items",
+      "List all items, optionally filtered",
+      {
+        prefix: z.string().optional().describe("Filter by key prefix"),
+        limit: z.number().default(100).describe("Max items"),
+      },
+      async ({ prefix, limit }) => {
+        const items = await storage.list({ prefix, limit });
+        return {
+          content: [{
+            type: "text",
+            text: `Found ${items.length} items:\n${items.map(i => i.key).join("\n")}`,
+          }],
+        };
+      }
+    ),
+
+    // WRITE operations
+    tool(
+      "store_item",
+      "Store an item",
+      {
+        key: z.string().describe("Item key"),
+        value: z.any().describe("Item data"),
+      },
+      async ({ key, value }) => {
+        await storage.set(key, value);
+        return {
+          content: [{ type: "text", text: `Stored ${key}` }],
+        };
+      }
+    ),
+
+    tool(
+      "delete_item",
+      "Delete an item",
+      { key: z.string().describe("Item key") },
+      async ({ key }) => {
+        const existed = await storage.delete(key);
+        return {
+          content: [{
+            type: "text",
+            text: existed ? `Deleted ${key}` : `${key} did not exist`,
+          }],
+        };
+      }
+    ),
+
+    // EXTERNAL operations
+    tool(
+      "call_api",
+      "Make an HTTP request",
+      {
+        url: z.string().url(),
+        method: z.enum(["GET", "POST", "PUT", "DELETE"]).default("GET"),
+        body: z.any().optional(),
+      },
+      async ({ url, method, body }) => {
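+        const response = await fetch(url, {
+          method,
+          // Send a JSON body (with a matching header) only when one was provided
+          headers: body === undefined ? undefined : { "Content-Type": "application/json" },
+          body: body === undefined ? undefined : JSON.stringify(body),
+        });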
+        const text = await response.text();
+        return {
+          content: [{
+            type: "text",
+            text: `${response.status} ${response.statusText}\n\n${text}`,
+          }],
+          isError: !response.ok,
+        };
+      }
+    ),
+  ],
+});
+```
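
Every handler in the template returns the same `{ content: [...], isError }` shape, so a small helper can cut the repetition. This is a sketch only: `textResult` and `ToolTextResult` are hypothetical names, not part of the SDK.

```typescript
// Result shape used by the handlers in the template above.
interface ToolTextResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Hypothetical helper: wrap a string (and an optional error flag) in that shape.
function textResult(text: string, isError = false): ToolTextResult {
  return isError
    ? { content: [{ type: "text", text }], isError: true }
    : { content: [{ type: "text", text }] };
}
```

With this helper, the `read_item` handler body could end with a single call that passes the text and `!item` as the error flag.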
+</design_template>
+
+<example name="feedback-server">
+## Example: Feedback Storage Server
+
+This server provides primitives for storing feedback. It does NOT decide how to categorize or organize feedback—that's the agent's job via the prompt.
+
+```typescript
+export const feedbackMcpServer = createSdkMcpServer({
+  name: "feedback",
+  version: "1.0.0",
+  tools: [
+    tool(
+      "store_feedback",
+      "Store a feedback item",
+      {
+        item: z.object({
+          id: z.string(),
+          author: z.string(),
+          content: z.string(),
+          importance: z.number().min(1).max(5),
+          timestamp: z.string(),
+          status: z.string().optional(),
+          urls: z.array(z.string()).optional(),
+          metadata: z.any().optional(),
+        }).describe("Feedback item"),
+      },
+      async ({ item }) => {
+        await db.feedback.insert(item);
+        return {
+          content: [{
+            type: "text",
+            text: `Stored feedback ${item.id} from ${item.author}`,
+          }],
+        };
+      }
+    ),
+
+    tool(
+      "list_feedback",
+      "List feedback items",
+      {
+        limit: z.number().default(50),
+        status: z.string().optional(),
+      },
+      async ({ limit, status }) => {
+        const items = await db.feedback.list({ limit, status });
+        return {
+          content: [{
+            type: "text",
+            text: JSON.stringify(items, null, 2),
+          }],
+        };
+      }
+    ),
+
+    tool(
+      "update_feedback",
+      "Update a feedback item",
+      {
+        id: z.string(),
+        updates: z.object({
+          status: z.string().optional(),
+          importance: z.number().optional(),
+          metadata: z.any().optional(),
+        }),
+      },
+      async ({ id, updates }) => {
+        await db.feedback.update(id, updates);
+        return {
+          content: [{ type: "text", text: `Updated ${id}` }],
+        };
+      }
+    ),
+  ],
+});
+```
+
+The system prompt then tells the agent *how* to use these primitives:
+
+```markdown
+## Feedback Processing
+
+When someone shares feedback:
+1. Extract author, content, and any URLs
+2. Rate importance 1-5 based on actionability
+3. Store using feedback.store_feedback
+4. If high importance (4-5), notify the channel
+
+Use your judgment about importance ratings.
+```
+</example>
+
+<checklist>
+## MCP Tool Design Checklist
+
+- [ ] Tool names describe capability, not use case
+- [ ] Inputs are data, not decisions
+- [ ] Outputs are rich (enough for agent to verify)
+- [ ] CRUD operations are separate tools (not one mega-tool)
+- [ ] No business logic in tool implementations
+- [ ] Error states clearly communicated via `isError`
+- [ ] Descriptions explain what the tool does, not when to use it
+</checklist>
--- a/plugins/compound-engineering/skills/agent-native-architecture/references/refactoring-to-prompt-native.md
+++ b/plugins/compound-engineering/skills/agent-native-architecture/references/refactoring-to-prompt-native.md
@@ -0,0 +1,317 @@
+<overview>
+How to refactor existing agent code to follow prompt-native principles. The goal: move behavior from code into prompts, and simplify tools into primitives.
+</overview>
+
+<diagnosis>
+## Diagnosing Non-Prompt-Native Code
+
+Signs your agent isn't prompt-native:
+
+**Tools that encode workflows:**
+```typescript
+// RED FLAG: Tool contains business logic
+tool("process_feedback", async ({ message }) => {
+  const category = categorize(message);        // Logic in code
+  const priority = calculatePriority(message); // Logic in code
+  await store(message, category, priority);    // Orchestration in code
+  if (priority > 3) await notify();            // Decision in code
+});
+```
+
+**Agent calls functions instead of figuring things out:**
+```typescript
+// RED FLAG: Agent is just a function caller
+"Use process_feedback to handle incoming messages"
+// vs.
+"When feedback comes in, decide importance, store it, notify if high"
+```
+
+**Artificial limits on agent capability:**
+```typescript
+// RED FLAG: Tool prevents agent from doing what users can do
+tool("read_file", async ({ path }) => {
+  if (!ALLOWED_PATHS.includes(path)) {
+    throw new Error("Not allowed to read this file");
+  }
+  return readFile(path);
+});
+```
+
+**Prompts that specify HOW instead of WHAT:**
+```markdown
+// RED FLAG: Micromanaging the agent
+When creating a summary:
+1. Use exactly 3 bullet points
+2. Each bullet must be under 20 words
+3. Format with em-dashes for sub-points
+4. Bold the first word of each bullet
+```
+</diagnosis>
+
+<refactoring_workflow>
+## Step-by-Step Refactoring
+
+**Step 1: Identify workflow tools**
+
+List all your tools. Mark any that:
+- Have business logic (categorize, calculate, decide)
+- Orchestrate multiple operations
+- Make decisions on behalf of the agent
+- Contain conditional logic (if/else based on content)
+
+**Step 2: Extract the primitives**
+
+For each workflow tool, identify the underlying primitives:
+
+| Workflow Tool | Hidden Primitives |
+|---------------|-------------------|
+| `process_feedback` | `store_item`, `send_message` |
+| `generate_report` | `read_file`, `write_file` |
+| `deploy_and_notify` | `git_push`, `send_message` |
+
+**Step 3: Move behavior to the prompt**
+
+Take the logic from your workflow tools and express it in natural language:
+
+```typescript
+// Before (in code):
+async function processFeedback(message) {
+  const priority = message.includes("crash") ? 5 :
+                   message.includes("bug") ? 4 : 3;
+  await store(message, priority);
+  if (priority >= 4) await notify();
+}
+```
+
+```markdown
+// After (in prompt):
+## Feedback Processing
+
+When someone shares feedback:
+1. Rate importance 1-5:
+   - 5: Crashes, data loss, security issues
+   - 4: Bug reports with clear reproduction steps
+   - 3: General suggestions, minor issues
+2. Store using store_item
+3. If importance >= 4, notify the team
+
+Use your judgment. Context matters more than keywords.
+```
+
+**Step 4: Simplify tools to primitives**
+
+```typescript
+// Before: 1 workflow tool
+tool("process_feedback", { message, category, priority }, ...complex logic...)
+
+// After: 2 primitive tools
+tool("store_item", { key: z.string(), value: z.any() }, ...simple storage...)
+tool("send_message", { channel: z.string(), content: z.string() }, ...simple send...)
+```
+
+**Step 5: Remove artificial limits**
+
+```typescript
+// Before: Limited capability
+tool("read_file", async ({ path }) => {
+  if (!isAllowed(path)) throw new Error("Forbidden");
+  return readFile(path);
+});
+
+// After: Full capability
+tool("read_file", async ({ path }) => {
+  return readFile(path);  // Agent can read anything
+});
+// Use approval gates for WRITES, not artificial limits on READS
+```
+
+**Step 6: Test with outcomes, not procedures**
+
+Instead of testing "does it call the right function?", test "does it achieve the outcome?"
+
+```typescript
+// Before: Testing procedure
+expect(mockProcessFeedback).toHaveBeenCalledWith(...)
+
+// After: Testing outcome
+// Send feedback → Check it was stored with reasonable importance
+// Send high-priority feedback → Check notification was sent
+```
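
An outcome test can be sketched as follows. Everything here is a stand-in: `StoredItem`, `Outcome`, and `runFeedbackTurn` are hypothetical, and the stub imitates an agent turn with a keyword check purely so the example runs offline. A real test would run the actual agent against primitive tools backed by these arrays.

```typescript
interface StoredItem { content: string; importance: number }
interface Outcome { stored: StoredItem[]; notifications: string[] }

// Stub for one agent turn (NOT how a real agent should decide importance).
function runFeedbackTurn(message: string, outcome: Outcome): void {
  const importance = /crash|data loss/i.test(message) ? 5 : 3;
  outcome.stored.push({ content: message, importance });
  if (importance >= 4) outcome.notifications.push(`High priority: ${message}`);
}

function assertOutcome(condition: boolean, label: string): void {
  if (!condition) throw new Error(`Outcome check failed: ${label}`);
}

const outcome: Outcome = { stored: [], notifications: [] };
runFeedbackTurn("The app crashes when I open settings", outcome);

// Assert on what ended up in storage, not on which functions were called.
assertOutcome(outcome.stored.length === 1, "feedback was stored");
assertOutcome(outcome.stored[0].importance >= 4, "crash rated as high importance");
assertOutcome(outcome.notifications.length === 1, "team was notified");
```

Because the assertions use ranges (`>= 4`) rather than exact values, the test tolerates reasonable variation in the agent's judgment.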
+</refactoring_workflow>
+
+<before_after>
+## Before/After Examples
+
+**Example 1: Feedback Processing**
+
+Before:
+```typescript
+tool("handle_feedback", async ({ message, author }) => {
+  const category = detectCategory(message);
+  const priority = calculatePriority(message, category);
+  const feedbackId = await db.feedback.insert({
+    id: generateId(),
+    author,
+    message,
+    category,
+    priority,
+    timestamp: new Date().toISOString(),
+  });
+
+  if (priority >= 4) {
+    await discord.send(ALERT_CHANNEL, `High priority feedback from ${author}`);
+  }
+
+  return { feedbackId, category, priority };
+});
+```
+
+After:
+```typescript
+// Simple storage primitive
+tool("store_feedback", async ({ item }) => {
+  await db.feedback.insert(item);
+  return { text: `Stored feedback ${item.id}` };
+});
+
+// Simple message primitive
+tool("send_message", async ({ channel, content }) => {
+  await discord.send(channel, content);
+  return { text: "Sent" };
+});
+```
+
+System prompt:
+```markdown
+## Feedback Processing
+
+When someone shares feedback:
+1. Generate a unique ID
+2. Rate importance 1-5 based on impact and urgency
+3. Store using store_feedback with the full item
+4. If importance >= 4, send a notification to the team channel
+
+Importance guidelines:
+- 5: Critical (crashes, data loss, security)
+- 4: High (detailed bug reports, blocking issues)
+- 3: Medium (suggestions, minor bugs)
+- 2: Low (cosmetic, edge cases)
+- 1: Minimal (off-topic, duplicates)
+```
+
+**Example 2: Report Generation**
+
+Before:
+```typescript
+tool("generate_weekly_report", async ({ startDate, endDate, format }) => {
+  const data = await fetchMetrics(startDate, endDate);
+  const summary = summarizeMetrics(data);
+  const charts = generateCharts(data);
+
+  if (format === "html") {
+    return renderHtmlReport(summary, charts);
+  } else if (format === "markdown") {
+    return renderMarkdownReport(summary, charts);
+  } else {
+    return renderPdfReport(summary, charts);
+  }
+});
+```
+
+After:
+```typescript
+tool("query_metrics", async ({ start, end }) => {
+  const data = await db.metrics.query({ start, end });
+  return { text: JSON.stringify(data, null, 2) };
+});
+
+tool("write_file", async ({ path, content }) => {
+  writeFileSync(path, content);
+  return { text: `Wrote ${path}` };
+});
+```
+
+System prompt:
+```markdown
+## Report Generation
+
+When asked to generate a report:
+1. Query the relevant metrics using query_metrics
+2. Analyze the data and identify key trends
+3. Create a clear, well-formatted report
+4. Write it using write_file in the appropriate format
+
+Use your judgment about format and structure. Make it useful.
+```
+</before_after>
+
+<common_challenges>
+## Common Refactoring Challenges
+
+**"But the agent might make mistakes!"**
+
+Yes, and you can iterate. Change the prompt to add guidance:
+```markdown
+// Before
+Rate importance 1-5.
+
+// After (if agent keeps rating too high)
+Rate importance 1-5. Be conservative—most feedback is 2-3.
+Only use 4-5 for truly blocking or critical issues.
+```
+
+**"The workflow is complex!"**
+
+Complex workflows can still be expressed in prompts. The agent can follow multi-step, conditional instructions such as:
+```markdown
+When processing video feedback:
+1. Check if it's a Loom, YouTube, or direct link
+2. For YouTube, pass URL directly to video analysis
+3. For others, download first, then analyze
+4. Extract timestamped issues
+5. Rate based on issue density and severity
+```
+
+**"We need deterministic behavior!"**
+
+Some operations should stay in code. That's fine. Prompt-native isn't all-or-nothing.
+
+Keep in code:
+- Security validation
+- Rate limiting
+- Audit logging
+- Exact format requirements
+
+Move to prompts:
+- Categorization decisions
+- Priority judgments
+- Content generation
+- Workflow orchestration
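
One way to keep the deterministic concerns in code without turning a primitive into a workflow is to wrap the handler. The sketch below is an assumption-heavy illustration: `withGuards` is a hypothetical wrapper, the limit is a simple per-process call budget rather than a time-window rate limiter, and the handler is synchronous for brevity (real tool handlers are usually async).

```typescript
type ToolHandler<A> = (args: A) => { text: string; isError?: boolean };

// Audit logging and the call budget stay in code around the primitive;
// the agent still decides when and why to call the tool.
function withGuards<A>(
  toolName: string,
  maxCalls: number,
  auditLog: string[],
  handler: ToolHandler<A>,
): ToolHandler<A> {
  let calls = 0;
  return (args) => {
    auditLog.push(`${toolName} ${JSON.stringify(args)}`);
    calls += 1;
    if (calls > maxCalls) {
      return { text: `Call budget reached for ${toolName} (${maxCalls} calls)`, isError: true };
    }
    return handler(args);
  };
}

const auditTrail: string[] = [];
const guardedSend = withGuards(
  "send_message",
  2,
  auditTrail,
  (args: { channel: string; content: string }) => ({ text: `Sent to ${args.channel}` }),
);
```

The wrapped tool keeps the same inputs and outputs, so the prompt does not need to change when a guard is added or tuned.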
+
+**"What about testing?"**
+
+Test outcomes, not procedures:
+- "Given this input, does the agent achieve the right result?"
+- "Does stored feedback have reasonable importance ratings?"
+- "Are notifications sent for truly high-priority items?"
+</common_challenges>
+
+<checklist>
+## Refactoring Checklist
+
+Diagnosis:
+- [ ] Listed all tools with business logic
+- [ ] Identified artificial limits on agent capability
+- [ ] Found prompts that micromanage HOW
+
+Refactoring:
+- [ ] Extracted primitives from workflow tools
+- [ ] Moved business logic to system prompt
+- [ ] Removed artificial limits
+- [ ] Simplified tool inputs to data, not decisions
+
+Validation:
+- [ ] Agent achieves same outcomes with primitives
+- [ ] Behavior can be changed by editing prompts
+- [ ] New features could be added without new tools
+</checklist>
--- a/plugins/compound-engineering/skills/agent-native-architecture/references/self-modification.md
+++ b/plugins/compound-engineering/skills/agent-native-architecture/references/self-modification.md
@@ -0,0 +1,269 @@
+<overview>
+Self-modification is the advanced tier of agent-native engineering: agents that can evolve their own code, prompts, and behavior. It is not required for every app, but it is a big part of where agent-native systems are heading.
+
+This is the logical extension of "whatever the developer can do, the agent can do."
+</overview>
+
+<why_self_modification>
+## Why Self-Modification?
+
+Traditional software is static—it does what you wrote, nothing more. Self-modifying agents can:
+
+- **Fix their own bugs** - See an error, patch the code, restart
+- **Add new capabilities** - User asks for something new, agent implements it
+- **Evolve behavior** - Learn from feedback and adjust prompts
+- **Deploy themselves** - Push code, trigger builds, restart
+
+The agent becomes a living system that improves over time, not frozen code.
+</why_self_modification>
+
+<capabilities>
+## What Self-Modification Enables
+
+**Code modification:**
+- Read and understand source files
+- Write fixes and new features
+- Commit and push to version control
+- Trigger builds and verify they pass
+
+**Prompt evolution:**
+- Edit the system prompt based on feedback
+- Add new features as prompt sections
+- Refine judgment criteria that aren't working
+
+**Infrastructure control:**
+- Pull latest code from upstream
+- Merge from other branches/instances
+- Restart after changes
+- Roll back if something breaks
+
+**Site/output generation:**
+- Generate and maintain websites
+- Create documentation
+- Build dashboards from data
+</capabilities>
+
+<guardrails>
+## Required Guardrails
+
+Self-modification is powerful. It needs safety mechanisms.
+
+**Approval gates for code changes:**
+```typescript
+tool("write_file", async ({ path, content }) => {
+  if (isCodeFile(path)) {
+    // Store for approval, don't apply immediately
+    pendingChanges.set(path, content);
+    const diff = generateDiff(path, content);
+    return { text: `Requires approval:\n\n${diff}\n\nReply "yes" to apply.` };
+  }
+  // Non-code files apply immediately
+  writeFileSync(path, content);
+  return { text: `Wrote ${path}` };
+});
+```
+
+**Auto-commit before changes:**
+```typescript
+tool("self_deploy", async () => {
+  // Save current state first
+  runGit("stash");  // or commit uncommitted changes
+
+  // Then pull/merge
+  runGit("fetch origin");
+  runGit("merge origin/main --no-edit");
+
+  // Build and verify
+  runCommand("npm run build");
+
+  // Only then restart
+  scheduleRestart();
+  return { text: "Merged origin/main and built successfully; restart scheduled" };
+});
+```
+
+**Build verification:**
+```typescript
+// Don't restart unless build passes
+try {
+  runCommand("npm run build", { timeout: 120000 });
+} catch (error) {
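+  // Roll back the merge. It has already completed at this point, so
+  // "merge --abort" would fail; reset to the pre-merge commit instead.
+  runGit("reset --hard ORIG_HEAD");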
+  return { text: "Build failed, aborting deploy", isError: true };
+}
+```
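
The ordering of these guardrails (save state, merge, build, roll back on failure) can be exercised without touching git or npm by injecting the command runner. This is a sketch: `CommandRunner` and `deployWithRollback` are hypothetical names, merge conflicts are not handled, and the stash is never restored.

```typescript
// Hypothetical seam: in production this would shell out and throw on a
// non-zero exit code; in tests it can be a fake that records commands.
type CommandRunner = (command: string) => void;

function deployWithRollback(run: CommandRunner): { text: string; isError?: boolean } {
  run("git stash");                       // save uncommitted work first
  run("git fetch origin");
  run("git merge origin/main --no-edit");
  try {
    run("npm run build");                 // verify before restarting
  } catch {
    // The merge already completed, so undo it by resetting to the pre-merge commit.
    run("git reset --hard ORIG_HEAD");
    return { text: "Build failed, merge rolled back", isError: true };
  }
  return { text: "Build passed, safe to restart" };
}
```

A fake runner that throws on `npm run build` is enough to confirm that the last command issued is the reset.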
+
+**Health checks after restart:**
+```typescript
+tool("health_check", async () => {
+  const uptime = process.uptime();
+  const buildValid = existsSync("dist/index.js");
+  const gitClean = !runGit("status --porcelain");
+
+  return {
+    text: JSON.stringify({
+      status: buildValid ? "healthy" : "unhealthy",
+      uptime: `${Math.floor(uptime / 60)}m`,
+      build: buildValid ? "valid" : "missing",
+      git: gitClean ? "clean" : "uncommitted changes",
+    }, null, 2),
+  };
+});
+```
+</guardrails>
+
+<git_architecture>
+## Git-Based Self-Modification
+
+Use git as the foundation for self-modification. It provides:
+- Version history (rollback capability)
+- Branching (experiment safely)
+- Merge (sync with other instances)
+- Push/pull (deploy and collaborate)
+
+**Essential git tools:**
+```typescript
+tool("status", "Show git status", {}, ...);
+tool("diff", "Show file changes", { path: z.string().optional() }, ...);
+tool("log", "Show commit history", { count: z.number() }, ...);
+tool("commit_code", "Commit code changes", { message: z.string() }, ...);
+tool("git_push", "Push to GitHub", { branch: z.string().optional() }, ...);
+tool("pull", "Pull from GitHub", { source: z.enum(["main", "instance"]) }, ...);
+tool("rollback", "Revert recent commits", { commits: z.number() }, ...);
+```
+
+**Multi-instance architecture:**
+```
+main                      # Shared code
+├── instance/bot-a       # Instance A's branch
+├── instance/bot-b       # Instance B's branch
+└── instance/bot-c       # Instance C's branch
+```
+
+Each instance can:
+- Pull updates from main
+- Push improvements back to main (via PR)
+- Sync features from other instances
+- Maintain instance-specific config
+</git_architecture>
+
+<prompt_evolution>
+## Self-Modifying Prompts
+
+The system prompt is a file the agent can read and write.
+
+```typescript
+// Agent can read its own prompt
+tool("read_file", ...);  // Can read src/prompts/system.md
+
+// Agent can propose changes
+tool("write_file", ...);  // Can write to src/prompts/system.md (with approval)
+```
+
+**System prompt as living document:**
+```markdown
+## Feedback Processing
+
+When someone shares feedback:
+1. Acknowledge warmly
+2. Rate importance 1-5
+3. Store using feedback tools
+
+<!-- Note to self: Video walkthroughs should always be 4-5,
+     learned this from Dan's feedback on 2024-12-07 -->
+```
+
+The agent can:
+- Add notes to itself
+- Refine judgment criteria
+- Add new feature sections
+- Document edge cases it learned
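
Adding a note to a prompt section is plain string editing. The helper below is a hypothetical sketch (`addNoteToSelf` is not in the original) and assumes sections are introduced by `## ` headings, as in the example above. The edited text would still go through `write_file` and its approval gate.

```typescript
// Append a dated HTML comment at the end of the named section.
function addNoteToSelf(prompt: string, section: string, note: string, isoDate: string): string {
  const lines = prompt.split("\n");
  const start = lines.findIndex((line) => line.trim() === `## ${section}`);
  if (start === -1) return prompt; // unknown section: leave the prompt untouched

  // The section ends at the next "## " heading or at the end of the text.
  let end = lines.findIndex((line, i) => i > start && line.startsWith("## "));
  if (end === -1) end = lines.length;

  lines.splice(end, 0, `<!-- Note to self (${isoDate}): ${note} -->`, "");
  return lines.join("\n");
}
```

Returning the prompt unchanged for an unknown section keeps the operation safe to call speculatively; the agent can compare input and output to see whether the note was added.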
+</prompt_evolution>
+
+<when_to_use>
+## When to Implement Self-Modification
+
+**Good candidates:**
+- Long-running autonomous agents
+- Agents that need to adapt to feedback
+- Systems where behavior evolution is valuable
+- Internal tools where rapid iteration matters
+
+**Not necessary for:**
+- Simple single-task agents
+- Highly regulated environments
+- Systems where behavior must be auditable
+- One-off or short-lived agents
+
+Start with a non-self-modifying prompt-native agent. Add self-modification when you need it.
+</when_to_use>
+
+<example_tools>
+## Complete Self-Modification Toolset
+
+```typescript
+const selfMcpServer = createSdkMcpServer({
+  name: "self",
+  version: "1.0.0",
+  tools: [
+    // FILE OPERATIONS
+    tool("read_file", "Read any project file", { path: z.string() }, ...),
+    tool("write_file", "Write a file (code requires approval)", { path: z.string(), content: z.string() }, ...),
+    tool("list_files", "List directory contents", { path: z.string() }, ...),
+    tool("search_code", "Search for patterns", { pattern: z.string() }, ...),
+
+    // APPROVAL WORKFLOW
+    tool("apply_pending", "Apply approved changes", {}, ...),
+    tool("get_pending", "Show pending changes", {}, ...),
+    tool("clear_pending", "Discard pending changes", {}, ...),
+
+    // RESTART
+    tool("restart", "Rebuild and restart", {}, ...),
+    tool("health_check", "Check if bot is healthy", {}, ...),
+  ],
+});
+
+const gitMcpServer = createSdkMcpServer({
+  name: "git",
+  version: "1.0.0",
+  tools: [
+    // STATUS
+    tool("status", "Show git status", {}, ...),
+    tool("diff", "Show changes", { path: z.string().optional() }, ...),
+    tool("log", "Show history", { count: z.number() }, ...),
+
+    // COMMIT & PUSH
+    tool("commit_code", "Commit code changes", { message: z.string() }, ...),
+    tool("git_push", "Push to GitHub", { branch: z.string().optional() }, ...),
+
+    // SYNC
+    tool("pull", "Pull from upstream", { source: z.enum(["main", "instance"]) }, ...),
+    tool("self_deploy", "Pull, build, restart", { source: z.enum(["main", "instance"]) }, ...),
+
+    // SAFETY
+    tool("rollback", "Revert commits", { commits: z.number() }, ...),
+    tool("health_check", "Detailed health report", {}, ...),
+  ],
+});
+```
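+
+The handlers above are elided, so the following is only a sketch of the state that `write_file`, `get_pending`, `apply_pending`, and `clear_pending` could share. `ApprovalQueue` is a hypothetical class, the in-memory map stands in for the file system, and the rule that everything under `src/` needs approval is an assumption.
+
+```typescript
+// Minimal in-memory sketch of the approval workflow. All names are hypothetical.
+interface PendingChange {
+  path: string;
+  content: string;
+}
+
+class ApprovalQueue {
+  private pending: PendingChange[] = [];
+  private files = new Map<string, string>(); // stands in for the file system
+
+  // Assumption: anything under src/ counts as code and needs approval.
+  private requiresApproval(path: string): boolean {
+    return path.startsWith("src/");
+  }
+
+  writeFile(path: string, content: string): "written" | "pending" {
+    if (this.requiresApproval(path)) {
+      this.pending.push({ path, content });
+      return "pending";
+    }
+    this.files.set(path, content);
+    return "written";
+  }
+
+  getPending(): readonly PendingChange[] {
+    return this.pending;
+  }
+
+  // Intended to be called only after a human has approved the queued changes.
+  applyPending(): number {
+    const applied = this.pending.length;
+    for (const change of this.pending) {
+      this.files.set(change.path, change.content);
+    }
+    this.pending = [];
+    return applied;
+  }
+
+  clearPending(): void {
+    this.pending = [];
+  }
+
+  read(path: string): string | undefined {
+    return this.files.get(path);
+  }
+}
+```
+
+Queuing instead of rejecting keeps the agent's proposal intact, so a reviewer can inspect it with `get_pending` before anything reaches disk.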
+</example_tools>
+
+<checklist>
+## Self-Modification Checklist
+
+Before enabling self-modification:
+- [ ] Git-based version control set up
+- [ ] Approval gates for code changes
+- [ ] Build verification before restart
+- [ ] Rollback mechanism available
+- [ ] Health check endpoint
+- [ ] Instance identity configured
+
+When implementing:
+- [ ] Agent can read all project files
+- [ ] Agent can write files (with appropriate approval)
+- [ ] Agent can commit and push
+- [ ] Agent can pull updates
+- [ ] Agent can restart itself
+- [ ] Agent can roll back if needed
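+
+The build-verification, health-check, and rollback items can be read as one sequence. The sketch below assumes each step is injected as a function, so `DeploySteps`, `DeployResult`, and `selfDeploy` are hypothetical names rather than the actual `self_deploy` implementation.
+
+```typescript
+// Hypothetical step functions, injected so the sequence can be exercised
+// without touching git or the process manager.
+interface DeploySteps {
+  pull(): void;
+  build(): boolean; // true if the build succeeded
+  restart(): void;
+  healthCheck(): boolean; // true if the restarted agent is healthy
+  rollback(): void;
+}
+
+type DeployResult = "deployed" | "build_failed" | "rolled_back";
+
+function selfDeploy(steps: DeploySteps): DeployResult {
+  steps.pull();
+
+  // Build verification before restart: never restart into a broken build.
+  if (!steps.build()) {
+    steps.rollback();
+    return "build_failed";
+  }
+
+  steps.restart();
+
+  // If the new version is unhealthy, revert, rebuild, and restart.
+  // A fuller version would also check the result of this second build.
+  if (!steps.healthCheck()) {
+    steps.rollback();
+    steps.build();
+    steps.restart();
+    return "rolled_back";
+  }
+  return "deployed";
+}
+```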
+</checklist>
--- a/plugins/compound-engineering/skills/agent-native-architecture/references/system-prompt-design.md
+++ b/plugins/compound-engineering/skills/agent-native-architecture/references/system-prompt-design.md
@@ -0,0 +1,250 @@
+<overview>
+How to write system prompts for prompt-native agents. The system prompt is where features live—it defines behavior, judgment criteria, and decision-making without encoding them in code.
+</overview>
+
+<principle name="features-in-prompts">
+## Features Are Prompt Sections
+
+Each feature is a section of the system prompt that tells the agent how to behave.
+
+**Traditional approach:** Feature = function in codebase
+```typescript
+async function processFeedback(message: string) {
+  const category = categorize(message);
+  const priority = calculatePriority(message);
+  await store(message, category, priority);
+  if (priority > 3) await notify();
+}
+```
+
+**Prompt-native approach:** Feature = section in system prompt
+```markdown
+## Feedback Processing
+
+When someone shares feedback:
+1. Read the message to understand what they're saying
+2. Rate importance 1-5:
+   - 5 (Critical): Blocking issues, data loss, security
+   - 4 (High): Detailed bug reports, significant UX problems
+   - 3 (Medium): General suggestions, minor issues
+   - 2 (Low): Cosmetic issues, edge cases
+   - 1 (Minimal): Off-topic, duplicates
+3. Store using feedback.store_feedback
+4. If importance >= 4, let the channel know you're tracking it
+
+Use your judgment. Context matters.
+```
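+
+With the feature described in the prompt, the code side can shrink to a primitive. The sketch below is one guess at what `feedback.store_feedback` might reduce to; the `FeedbackRecord` shape and the in-memory array are assumptions for illustration.
+
+```typescript
+// Sketch of a storage primitive. It records what the agent decided and
+// makes no decisions itself.
+interface FeedbackRecord {
+  message: string;
+  importance: number; // 1-5, chosen by the agent from the prompt's criteria
+}
+
+const feedbackStore: FeedbackRecord[] = [];
+
+function storeFeedback(record: FeedbackRecord): FeedbackRecord {
+  // Validate the shape only. No categorizing, no priority math, no notify().
+  if (!Number.isInteger(record.importance) || record.importance < 1 || record.importance > 5) {
+    throw new RangeError("importance must be an integer from 1 to 5");
+  }
+  feedbackStore.push(record);
+  return record;
+}
+```
+
+The threshold for notifying the channel stays in the prompt, so changing it means editing prose rather than this function.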
+</principle>
+
+<structure>
+## System Prompt Structure
+
+A well-structured prompt-native system prompt follows this outline:
+
+```markdown
+# Identity
+
+You are [Name], [brief identity statement].
+
+## Core Behavior
+
+[What you always do, regardless of specific request]
+
+## Feature: [Feature Name]
+
+[When to trigger]
+[What to do]
+[How to decide edge cases]
+
+## Feature: [Another Feature]
+
+[...]
+
+## Tool Usage
+
+[Guidance on when/how to use available tools]
+
+## Tone and Style
+
+[Communication guidelines]
+
+## What NOT to Do
+
+[Explicit boundaries]
+```
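+
+If each feature lives in its own section, the prompt can be assembled from parts. A minimal sketch, assuming a hypothetical `PromptSection` shape and `buildSystemPrompt` function; nothing here is prescribed by the outline itself.
+
+```typescript
+// Hypothetical section model for assembling a prompt in the outline's order.
+interface PromptSection {
+  heading: string; // e.g. "Core Behavior" or "Feature: Feedback Processing"
+  body: string;
+}
+
+function buildSystemPrompt(name: string, identity: string, sections: PromptSection[]): string {
+  const parts = [`# Identity\n\nYou are ${name}, ${identity}.`];
+  for (const section of sections) {
+    parts.push(`## ${section.heading}\n\n${section.body.trim()}`);
+  }
+  // Sections are separated by one blank line; the file ends with a newline.
+  return parts.join("\n\n") + "\n";
+}
+```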
+</structure>
+
+<principle name="guide-not-micromanage">
+## Guide, Don't Micromanage
+
+Tell the agent what to achieve, not exactly how to do it.
+
+**Micromanaging (bad):**
+```markdown
+When creating a summary:
+1. Use exactly 3 bullet points
+2. Each bullet under 20 words
+3. Use em-dashes for sub-points
+4. Bold the first word of each bullet
+5. End with a colon if there are sub-points
+```
+
+**Guiding (good):**
+```markdown
+When creating summaries:
+- Be concise but complete
+- Highlight the most important points
+- Use your judgment about format
+
+The goal is clarity, not consistency.
+```
+
+Trust the agent's intelligence. It knows how to communicate.
+</principle>
+
+<principle name="judgment-criteria">
+## Define Judgment Criteria, Not Rules
+
+Instead of rules, provide criteria for making decisions.
+
+**Rules (rigid):**
+```markdown
+If the message contains "bug", set importance to 4.
+If the message contains "crash", set importance to 5.
+```
+
+**Judgment criteria (flexible):**
+```markdown
+## Importance Rating
+
+Rate importance based on:
+- **Impact**: How many users affected? How severe?
+- **Urgency**: Is this blocking? Time-sensitive?
+- **Actionability**: Can we actually fix this?
+- **Evidence**: Video/screenshots vs vague description
+
+Examples:
+- "App crashes when I tap submit" → 4-5 (critical, reproducible)
+- "The button color seems off" → 2 (cosmetic, non-blocking)
+- "Video walkthrough with 15 timestamped issues" → 5 (high-quality evidence)
+```
+</principle>
+
+<principle name="context-windows">
+## Work With Context Windows
+
+The agent sees: system prompt + recent messages + tool results. Design for this.
+
+**Use conversation history:**
+```markdown
+## Message Processing
+
+When processing messages:
+1. Check if this relates to recent conversation
+2. If someone is continuing a previous thread, maintain context
+3. Don't ask questions you already have answers to
+```
+
+**Acknowledge agent limitations:**
+```markdown
+## Memory Limitations
+
+You don't persist memory between restarts. Use the memory server:
+- Before responding, check memory.recall for relevant context
+- After important decisions, use memory.store to remember
+- Store conversation threads, not individual messages
+```
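+
+The memory server's implementation is not shown in this document. As a rough sketch, the `recall` and `store` pair could sit on top of a key-value store that can be serialized before a restart; `MemoryStore`, `snapshot`, and `restore` are hypothetical names.
+
+```typescript
+// In-memory stand-in for the memory server's recall/store pair. A real
+// server would persist the snapshot to disk or a database so entries
+// survive restarts.
+class MemoryStore {
+  private entries = new Map<string, string>();
+
+  store(key: string, value: string): void {
+    this.entries.set(key, value);
+  }
+
+  recall(key: string): string | undefined {
+    return this.entries.get(key);
+  }
+
+  // Serialize the contents so they can be written out before a restart.
+  snapshot(): string {
+    return JSON.stringify(Array.from(this.entries.entries()));
+  }
+
+  // Rebuild a store from a previously taken snapshot.
+  static restore(snapshot: string): MemoryStore {
+    const restored = new MemoryStore();
+    const pairs = JSON.parse(snapshot) as [string, string][];
+    for (const [key, value] of pairs) restored.store(key, value);
+    return restored;
+  }
+}
+```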
+</principle>
+
+<example name="feedback-bot">
+## Example: Complete System Prompt
+
+```markdown
+# R2-C2 Feedback Bot
+
+You are R2-C2, Every's feedback collection assistant. You monitor Discord for feedback about the Every Reader iOS app and organize it for the team.
+
+## Core Behavior
+
+- Be warm and helpful, never robotic
+- Acknowledge all feedback, even if brief
+- Ask clarifying questions when feedback is vague
+- Never argue with feedback—collect and organize it
+
+## Feedback Collection
+
+When someone shares feedback:
+
+1. **Acknowledge** warmly: "Thanks for this!" or "Good catch!"
+2. **Clarify** if needed: "Can you tell me more about when this happens?"
+3. **Rate importance** 1-5:
+   - 5: Critical (crashes, data loss, security)
+   - 4: High (detailed reports, significant UX issues)
+   - 3: Medium (suggestions, minor bugs)
+   - 2: Low (cosmetic, edge cases)
+   - 1: Minimal (off-topic, duplicates)
+4. **Store** using feedback.store_feedback
+5. **Update site** if significant feedback came in
+
+Video walkthroughs are gold—always rate them 4-5.
+
+## Site Management
+
+You maintain a public feedback site. When feedback accumulates:
+
+1. Sync data to site/public/content/feedback.json
+2. Update status counts and organization
+3. Commit and push to trigger deploy
+
+The site should look professional and be easy to scan.
+
+## Message Deduplication
+
+Before processing any message:
+1. Check memory.recall(key: "processed_{messageId}")
+2. Skip if already processed
+3. After processing, store the key
+
+## Tone
+
+- Casual and friendly
+- Brief but warm
+- Technical when discussing bugs
+- Never defensive
+
+## Don't
+
+- Don't promise fixes or timelines
+- Don't share internal discussions
+- Don't ignore feedback even if it seems minor
+- Don't repeat yourself—vary acknowledgments
+```
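+
+The three deduplication steps in the prompt amount to a check-then-mark pattern. The sketch below spells that pattern out in code for clarity; in the prompt-native design the agent performs these steps itself through the memory tools. The `Memory` interface and `processOnce` are hypothetical.
+
+```typescript
+// Hypothetical recall/store interface mirroring the memory calls in the prompt.
+interface Memory {
+  recall(key: string): string | undefined;
+  store(key: string, value: string): void;
+}
+
+// Runs the handler at most once per message id. Returns false when skipped.
+function processOnce(memory: Memory, messageId: string, handler: () => void): boolean {
+  const key = `processed_${messageId}`;
+  if (memory.recall(key) !== undefined) return false;
+  handler();
+  // Mark as processed only after the handler returns, so a message whose
+  // handler throws is retried on the next delivery.
+  memory.store(key, new Date().toISOString());
+  return true;
+}
+```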
+</example>
+
+<iteration>
+## Iterating on System Prompts
+
+Prompt-native development means rapid iteration:
+
+1. **Observe** agent behavior in production
+2. **Identify** gaps: "It's not rating video feedback high enough"
+3. **Add guidance**: "Video walkthroughs are gold—always rate them 4-5"
+4. **Deploy** (just edit the prompt file)
+5. **Repeat**
+
+No code changes. No recompilation. Just prose.
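+
+For an edit to the prompt file to count as a deploy, the agent has to pick up the new text without a rebuild. A minimal sketch, assuming the prompt is re-read on every run; `createAgentRunner`, `readPrompt`, and `runAgent` are hypothetical and injected. In Node, `readPrompt` could wrap `fs.readFileSync` on the prompt file.
+
+```typescript
+// Sketch: read the prompt at call time, not at startup, so the next event
+// after an edit is handled with the new text.
+function createAgentRunner(
+  readPrompt: () => string,
+  runAgent: (systemPrompt: string, event: string) => string,
+): (event: string) => string {
+  return (event) => runAgent(readPrompt(), event);
+}
+```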
+</iteration>
+
+<checklist>
+## System Prompt Checklist
+
+- [ ] Clear identity statement
+- [ ] Core behaviors that always apply
+- [ ] Features as separate sections
+- [ ] Judgment criteria instead of rigid rules
+- [ ] Examples for ambiguous cases
+- [ ] Explicit boundaries (what NOT to do)
+- [ ] Tone guidance
+- [ ] Tool usage guidance (when to use each)
+- [ ] Memory/context handling
+</checklist>