Add agent-native-architecture skill

New skill teaching prompt-native development patterns:

- Features defined in prompts, not code
- Tools as primitives that enable capability
- "Whatever the user can do, the agent can do"
- Self-modification patterns (advanced tier)
- Refactoring guide for existing codebases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-08 16:23:31 -08:00
parent 9341872823
commit 27d07d068c
6 changed files with 1568 additions and 0 deletions
--- a/plugins/compounding-engineering/skills/agent-native-architecture/SKILL.md
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/SKILL.md
@@ -0,0 +1,201 @@
 ---
 name: agent-native-architecture
 description: Build AI agents using prompt-native architecture where features are defined in prompts, not code. Use when creating autonomous agents, designing MCP servers, implementing self-modifying systems, or adopting the "trust the agent's intelligence" philosophy.
 ---
 <essential_principles>
 ## The Prompt-Native Philosophy
 Agent-native engineering inverts traditional software architecture. Instead of writing code that the agent executes, you define outcomes in prompts and let the agent figure out HOW to achieve them.
 ### The Foundational Principle
 **Whatever the user can do, the agent can do. Many things the developer can do, the agent can do.**
 Don't artificially limit the agent. If a user could read files, write code, browse the web, deploy an app—the agent should be able to do those things too. The agent figures out HOW to achieve an outcome; it doesn't just call your pre-written functions.
 ### Features Are Prompts
 Each feature is a prompt that defines an outcome and gives the agent the tools it needs. The agent then figures out how to accomplish it.
 **Traditional:** Feature = function in codebase that agent calls
 **Prompt-native:** Feature = prompt defining desired outcome + primitive tools
 The agent doesn't execute your code. It uses primitives to achieve outcomes you describe.
 ### Tools Provide Capability, Not Behavior
 Tools should be primitives that enable capability. The prompt defines what to do with that capability.
 **Wrong:** `generate_dashboard(data, layout, filters)` — agent executes your workflow
 **Right:** `read_file`, `write_file`, `list_files` — agent figures out how to build a dashboard
 Pure primitives are better, but domain primitives (like `store_feedback`) are OK if they don't encode logic—just storage/retrieval.
 ### The Development Lifecycle
 1. **Start in the prompt** - New features begin as natural language defining outcomes
 2. **Iterate rapidly** - Change behavior by editing prose, not refactoring code
 3. **Graduate when stable** - Harden to code when requirements stabilize AND speed/reliability matter
 4. **Many features stay as prompts** - Not everything needs to become code
 ### Self-Modification (Advanced)
 The advanced tier: agents that can evolve their own code, prompts, and behavior. Not required for every app, but a big part of the future.
 When implementing:
 - Approval gates for code changes
 - Auto-commit before modifications (rollback capability)
 - Health checks after changes
 - Build verification before restart
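
The four safeguards above can be read as one ordered pipeline. The following is a minimal sketch under stated assumptions, not the skill's actual implementation: `ModificationSteps`, `modifySafely`, and every step function are hypothetical names, and a real agent would back them with git, a build tool, and a health endpoint.

```typescript
// Hypothetical step functions, injected so the pipeline itself stays testable.
interface ModificationSteps {
  approve: (description: string) => boolean; // approval gate
  commit: (message: string) => string;       // auto-commit, returns a rollback ref
  apply: () => void;                         // performs the code change
  build: () => boolean;                      // build verification before restart
  restart: () => void;
  healthCheck: () => boolean;                // health check after the change
  rollback: (ref: string) => void;
}

type Outcome = "rejected" | "applied" | "rolled_back";

function modifySafely(description: string, steps: ModificationSteps): Outcome {
  if (!steps.approve(description)) return "rejected";

  // Checkpoint first so there is always something to roll back to.
  const ref = steps.commit(`checkpoint before: ${description}`);
  steps.apply();

  // Never restart into code that does not build.
  if (!steps.build()) {
    steps.rollback(ref);
    return "rolled_back";
  }

  steps.restart();
  if (!steps.healthCheck()) {
    steps.rollback(ref);
    steps.restart(); // come back up on the last known-good code
    return "rolled_back";
  }
  return "applied";
}
```

Injecting the steps keeps the ordering logic separate from git and process management, so the same pipeline can be exercised with fakes.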
 ### When NOT to Use This Approach
 - **High-frequency operations** - thousands of calls per second
 - **Deterministic requirements** - exact same output every time
 - **Cost-sensitive scenarios** - when API costs would be prohibitive
 - **High-security contexts** - where broad agent capability is an unacceptable risk (this concern is overstated for most apps)
 </essential_principles>
 <intake>
 What aspect of agent native architecture do you need help with?
 1. **Design architecture** - Plan a new prompt-native agent system
 2. **Create MCP tools** - Build primitive tools following the philosophy
 3. **Write system prompts** - Define agent behavior in prompts
 4. **Self-modification** - Enable agents to safely evolve themselves
 5. **Review/refactor** - Make existing code more prompt-native
 **Wait for response before proceeding.**
 </intake>
 <routing>
 | Response | Action |
 |----------|--------|
 | 1, "design", "architecture", "plan" | Read references/architecture-patterns.md |
 | 2, "tool", "mcp", "primitive" | Read references/mcp-tool-design.md |
 | 3, "prompt", "system prompt", "behavior" | Read references/system-prompt-design.md |
 | 4, "self-modify", "evolve", "git" | Read references/self-modification.md |
 | 5, "review", "refactor", "existing" | Read references/refactoring-to-prompt-native.md |
 **After reading the reference, apply those patterns to the user's specific context.**
 </routing>
 <quick_start>
 Build a prompt-native agent in three steps:
 **Step 1: Define primitive tools**
 ```typescript
 const tools = [
  tool("read_file", "Read any file", { path: z.string() }, ...),
  tool("write_file", "Write any file", { path: z.string(), content: z.string() }, ...),
  tool("list_files", "List directory", { path: z.string() }, ...),
 ];
 ```
 **Step 2: Write behavior in the system prompt**
 ```markdown
 ## Your Responsibilities
 When asked to organize content, you should:
 1. Read existing files to understand the structure
 2. Analyze what organization makes sense
 3. Create appropriate pages using write_file
 4. Use your judgment about layout and formatting
 You decide the structure. Make it good.
 ```
 **Step 3: Let the agent work**
 ```typescript
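 import { query } from "@anthropic-ai/claude-agent-sdk";
 // query() returns an async iterable of messages; iterate it to run the agent
 for await (const message of query({
  prompt: userMessage,
  options: {
    systemPrompt,
    mcpServers: { files: fileServer },
    permissionMode: "acceptEdits",
  },
 })) {
  if (message.type === "result") console.log(message.subtype);
 }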
 ```
 </quick_start>
 <reference_index>
 ## Domain Knowledge
 All references in `references/`:
 **Architecture:** architecture-patterns.md
 **Tool Design:** mcp-tool-design.md
 **Prompts:** system-prompt-design.md
 **Self-Modification:** self-modification.md
 **Refactoring:** refactoring-to-prompt-native.md
 </reference_index>
 <anti_patterns>
 ## What NOT to Do
 **THE CARDINAL SIN: Agent executes your code instead of figuring things out**
 This is the most common mistake. You fall back into writing workflow code and having the agent call it, instead of defining outcomes and letting the agent figure out HOW.
 ```typescript
 // WRONG - You wrote the workflow, agent just executes it
 tool("process_feedback", async ({ message }) => {
  const category = categorize(message);      // Your code
  const priority = calculatePriority(message); // Your code
  await store(message, category, priority);   // Your code
  if (priority > 3) await notify();           // Your code
 });
 // RIGHT - Agent figures out how to process feedback
 tool("store_item", { key, value }, ...);  // Primitive
 tool("send_message", { channel, content }, ...);  // Primitive
 // Prompt says: "Rate importance 1-5 based on actionability, store feedback, notify if >= 4"
 ```
 **Don't artificially limit what the agent can do**
 If a user could do it, the agent should be able to do it.
 ```typescript
 // WRONG - limiting agent capabilities
 tool("read_approved_files", { path }, async ({ path }) => {
  if (!ALLOWED_PATHS.includes(path)) throw new Error("Not allowed");
  return readFile(path);
 });
 // RIGHT - give full capability, use guardrails appropriately
 tool("read_file", { path }, ...);  // Agent can read anything
 // Use approval gates for writes, not artificial limits on reads
 ```
 **Don't encode decisions in tools**
 ```typescript
 // Wrong - tool decides format
 tool("format_report", { format: z.enum(["markdown", "html", "pdf"]) }, ...)
 // Right - agent decides format via prompt
 tool("write_file", ...) // Agent chooses what to write
 ```
 **Don't over-specify in prompts**
 ```markdown
 // Wrong - micromanaging the HOW
 When creating a summary, use exactly 3 bullet points,
 each under 20 words, formatted with em-dashes...
 // Right - define outcome, trust intelligence
 Create clear, useful summaries. Use your judgment.
 ```
 </anti_patterns>
 <success_criteria>
 You've built a prompt-native agent when:
 - [ ] The agent figures out HOW to achieve outcomes rather than just calling your functions
 - [ ] Whatever a user could do, the agent can do (no artificial limits)
 - [ ] Features are prompts that define outcomes, not code that defines workflows
 - [ ] Tools are primitives (read, write, store, call API) that enable capability
 - [ ] Changing behavior means editing prose, not refactoring code
 - [ ] The agent can surprise you with clever approaches you didn't anticipate
 - [ ] You could add a new feature by writing a new prompt section, not new code
 </success_criteria>
--- a/plugins/compounding-engineering/skills/agent-native-architecture/references/architecture-patterns.md
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/architecture-patterns.md
@@ -0,0 +1,215 @@
 <overview>
 Architectural patterns for building prompt-native agent systems. These patterns emerge from the philosophy that features should be defined in prompts, not code, and that tools should be primitives.
 </overview>
 <pattern name="event-driven-agent">
 ## Event-Driven Agent Architecture
 The agent runs as a long-lived process that responds to events. Events become prompts.
 ```
 ┌─────────────────────────────────────────────────────────────┐
 │                    Agent Loop                                │
 ├─────────────────────────────────────────────────────────────┤
 │  Event Source → Agent (Claude) → Tool Calls → Response      │
 └─────────────────────────────────────────────────────────────┘
                          │
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌─────────┐    ┌──────────┐    ┌───────────┐
    │ Content │    │   Self   │    │   Data    │
    │  Tools  │    │  Tools   │    │   Tools   │
    └─────────┘    └──────────┘    └───────────┘
    (write_file)   (read_source)   (store_item)
                   (restart)       (list_items)
 ```
 **Key characteristics:**
 - Events (messages, webhooks, timers) trigger agent turns
 - Agent decides how to respond based on system prompt
 - Tools are primitives for IO, not business logic
 - State persists between events via data tools
 **Example: Discord feedback bot**
 ```typescript
 // Event source
 client.on("messageCreate", (message) => {
  if (!message.author.bot) {
    runAgent({
      userMessage: `New message from ${message.author}: "${message.content}"`,
      channelId: message.channelId,
    });
  }
 });
 // System prompt defines behavior
 const systemPrompt = `
 When someone shares feedback:
 1. Acknowledge their feedback warmly
 2. Ask clarifying questions if needed
 3. Store it using the feedback tools
 4. Update the feedback site
 Use your judgment about importance and categorization.
 `;
 ```
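
The Discord example turns one kind of event into prompt text inline. Assuming the same agent also receives webhooks and timers (the event kinds listed under key characteristics), a small sketch of the "events become prompts" step might look like the following. The `AgentEvent` type and `eventToPrompt` function are hypothetical names introduced for illustration.

```typescript
// Hypothetical event shapes; a real system would define its own.
type AgentEvent =
  | { kind: "message"; author: string; content: string }
  | { kind: "webhook"; source: string; payload: unknown }
  | { kind: "timer"; name: string; firedAt: string };

// Translate an event into the text of an agent turn.
// No business logic here: the system prompt decides how to respond.
function eventToPrompt(event: AgentEvent): string {
  switch (event.kind) {
    case "message":
      return `New message from ${event.author}: "${event.content}"`;
    case "webhook":
      return `Webhook received from ${event.source}: ${JSON.stringify(event.payload)}`;
    case "timer":
      return `Timer "${event.name}" fired at ${event.firedAt}`;
  }
}
```

Keeping the translation free of decisions means a new event source only needs a new case here, while behavior stays in the system prompt.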
 </pattern>
 <pattern name="two-layer-git">
 ## Two-Layer Git Architecture
 For self-modifying agents, separate code (shared) from data (instance-specific).
 ```
 ┌─────────────────────────────────────────────────────────────┐
 │                     GitHub (shared repo)                     │
 │  - src/           (agent code)                              │
 │  - site/          (web interface)                           │
 │  - package.json   (dependencies)                            │
 │  - .gitignore     (excludes data/, logs/)                   │
 └─────────────────────────────────────────────────────────────┘
                          │
                     git clone
                          │
                          ▼
 ┌─────────────────────────────────────────────────────────────┐
 │                  Instance (Server)                           │
 │                                                              │
 │  FROM GITHUB (tracked):                                      │
 │  - src/           → pushed back on code changes             │
 │  - site/          → pushed, triggers deployment             │
 │                                                              │
 │  LOCAL ONLY (untracked):                                     │
 │  - data/          → instance-specific storage               │
 │  - logs/          → runtime logs                            │
 │  - .env           → secrets                                 │
 └─────────────────────────────────────────────────────────────┘
 ```
 **Why this works:**
 - Code and site are version controlled (GitHub)
 - Raw data stays local (instance-specific)
 - Site is generated from data, so reproducible
 - Automatic rollback via git history
 </pattern>
 <pattern name="multi-instance">
 ## Multi-Instance Branching
 Each agent instance gets its own branch while sharing core code.
 ```
 main                        # Shared features, bug fixes
 ├── instance/feedback-bot   # Every Reader feedback bot
 ├── instance/support-bot    # Customer support bot
 └── instance/research-bot   # Research assistant
 ```
 **Change flow:**
 | Change Type | Work On | Then |
 |-------------|---------|------|
 | Core features | main | Merge to instance branches |
 | Bug fixes | main | Merge to instance branches |
 | Instance config | instance branch | Done |
 | Instance data | instance branch | Done |
 **Sync tools:**
 ```typescript
 tool("self_deploy", "Pull latest from main, rebuild, restart", ...)
 tool("sync_from_instance", "Merge from another instance", ...)
 tool("propose_to_main", "Create PR to share improvements", ...)
 ```
 </pattern>
 <pattern name="site-as-output">
 ## Site as Agent Output
 The agent generates and maintains a website as a natural output, not through specialized site tools.
 ```
 Discord Message
      ↓
 Agent processes it, extracts insights
      ↓
 Agent decides what site updates are needed
      ↓
 Agent writes files using write_file primitive
      ↓
 Git commit + push triggers deployment
      ↓
 Site updates automatically
 ```
 **Key insight:** Don't build site generation tools. Give the agent file tools and teach it in the prompt how to create good sites.
 ```markdown
 ## Site Management
 You maintain a public feedback site. When feedback comes in:
 1. Use write_file to update site/public/content/feedback.json
 2. If the site's React components need improvement, modify them
 3. Commit changes and push to trigger Vercel deploy
 The site should be:
 - Clean, modern dashboard aesthetic
 - Clear visual hierarchy
 - Status organization (Inbox, Active, Done)
 You decide the structure. Make it good.
 ```
 </pattern>
 <pattern name="approval-gates">
 ## Approval Gates Pattern
 Separate "propose" from "apply" for dangerous operations.
 ```typescript
 // Pending changes stored separately
 const pendingChanges = new Map<string, string>();
 tool("write_file", async ({ path, content }) => {
  if (requiresApproval(path)) {
    // Store for approval
    pendingChanges.set(path, content);
    const diff = generateDiff(path, content);
    return {
      text: `Change requires approval.\n\n${diff}\n\nReply "yes" to apply.`
    };
  } else {
    // Apply immediately
    writeFileSync(path, content);
    return { text: `Wrote ${path}` };
  }
 });
 tool("apply_pending", async () => {
  for (const [path, content] of pendingChanges) {
    writeFileSync(path, content);
  }
  pendingChanges.clear();
  return { text: "Applied all pending changes" };
 });
 ```
 **What requires approval:**
 - src/*.ts (agent code)
 - package.json (dependencies)
 - system prompt changes
 **What doesn't:**
 - data/* (instance data)
 - site/* (generated content)
 - docs/* (documentation)
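
The `write_file` gate relies on a `requiresApproval(path)` helper that is not shown. One possible sketch follows, based on the two lists above. Two points are assumptions rather than anything the source specifies: that system prompts live under a `prompts/` directory, and that paths containing `..` should fail closed. The `src/*.ts` rule is also interpreted loosely here to include nested files.

```typescript
// Patterns for paths that need a human "yes" before being written.
const APPROVAL_PATTERNS: RegExp[] = [
  /^src\/.+\.ts$/,   // agent code (nested files included)
  /^package\.json$/, // dependencies
  /^prompts\//,      // ASSUMPTION: system prompts are stored under prompts/
];

// Normalize separators and strip a leading "./" so patterns match consistently.
function normalizePath(path: string): string {
  return path.replace(/\\/g, "/").replace(/^\.\//, "");
}

function requiresApproval(path: string): boolean {
  const normalized = normalizePath(path);
  // ASSUMPTION: treat traversal segments as suspicious and require approval.
  if (normalized.split("/").includes("..")) return true;
  return APPROVAL_PATTERNS.some((pattern) => pattern.test(normalized));
}
```

This is the kind of decision the design questions suggest hardcoding: the boundary is deterministic code, while what to write remains the agent's call.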
 </pattern>
 <design_questions>
 ## Questions to Ask When Designing
 1. **What events trigger agent turns?** (messages, webhooks, timers, user requests)
 2. **What primitives does the agent need?** (read, write, call API, restart)
 3. **What decisions should the agent make?** (format, structure, priority, action)
 4. **What decisions should be hardcoded?** (security boundaries, approval requirements)
 5. **How does the agent verify its work?** (health checks, build verification)
 6. **How does the agent recover from mistakes?** (git rollback, approval gates)
 </design_questions>
--- a/plugins/compounding-engineering/skills/agent-native-architecture/references/mcp-tool-design.md
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/mcp-tool-design.md
@@ -0,0 +1,316 @@
 <overview>
 How to design MCP tools following prompt-native principles. Tools should be primitives that enable capability, not workflows that encode decisions.
 **Core principle:** Whatever a user can do, the agent should be able to do. Don't artificially limit the agent—give it the same primitives a power user would have.
 </overview>
 <principle name="primitives-not-workflows">
 ## Tools Are Primitives, Not Workflows
 **Wrong approach:** Tools that encode business logic
 ```typescript
 tool("process_feedback", {
  feedback: z.string(),
  category: z.enum(["bug", "feature", "question"]),
  priority: z.enum(["low", "medium", "high"]),
 }, async ({ feedback, category, priority }) => {
  // Tool decides how to process
  const processed = categorize(feedback);
  const stored = await saveToDatabase(processed);
  const notification = await notify(priority);
  return { processed, stored, notification };
 });
 ```
 **Right approach:** Primitives that enable any workflow
 ```typescript
 tool("store_item", {
  key: z.string(),
  value: z.any(),
 }, async ({ key, value }) => {
  await db.set(key, value);
  return { text: `Stored ${key}` };
 });
 tool("send_message", {
  channel: z.string(),
  content: z.string(),
 }, async ({ channel, content }) => {
  await messenger.send(channel, content);
  return { text: "Sent" };
 });
 ```
 The agent decides categorization, priority, and when to notify based on the system prompt.
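
To make the division of labor concrete, here is a dependency-free sketch in which the two primitives are backed by in-memory stand-ins and a hand-written `agentPlan` plays the role of the tool calls a model might emit. Everything in it (`ToolCall`, `execute`, the plan itself) is hypothetical; a real agent produces the calls at runtime from the system prompt.

```typescript
// The only things the tools know how to do: store and send.
type ToolCall =
  | { tool: "store_item"; input: { key: string; value: unknown } }
  | { tool: "send_message"; input: { channel: string; content: string } };

const db = new Map<string, unknown>();
const outbox: { channel: string; content: string }[] = [];

function execute(call: ToolCall): string {
  switch (call.tool) {
    case "store_item":
      db.set(call.input.key, call.input.value);
      return `Stored ${call.input.key}`;
    case "send_message":
      outbox.push(call.input);
      return "Sent";
  }
}

// Stand-in for the model's decisions: it chose the importance and chose to notify.
const agentPlan: ToolCall[] = [
  {
    tool: "store_item",
    input: { key: "feedback/1", value: { content: "App crashes on launch", importance: 5 } },
  },
  {
    tool: "send_message",
    input: { channel: "alerts", content: "High-importance feedback: feedback/1" },
  },
];

const results = agentPlan.map(execute);
console.log(results.join("\n"));
```

Changing the prompt would change the plan (for example, no notification for low-importance items) without touching `execute`.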
 </principle>
 <principle name="descriptive-names">
 ## Tools Should Have Descriptive, Primitive Names
 Names should describe the capability, not the use case:
 | Wrong | Right |
 |-------|-------|
 | `process_user_feedback` | `store_item` |
 | `create_feedback_summary` | `write_file` |
 | `send_notification` | `send_message` |
 | `deploy_to_production` | `git_push` |
 The prompt tells the agent *when* to use primitives. The tool just provides *capability*.
 </principle>
 <principle name="simple-inputs">
 ## Inputs Should Be Simple
 Tools accept data. They don't accept decisions.
 **Wrong:** Tool accepts decisions
 ```typescript
 tool("format_content", {
  content: z.string(),
  format: z.enum(["markdown", "html", "json"]),
  style: z.enum(["formal", "casual", "technical"]),
 }, ...)
 ```
 **Right:** Tool accepts data, agent decides format
 ```typescript
 tool("write_file", {
  path: z.string(),
  content: z.string(),
 }, ...)
 // Agent decides to write index.html with HTML content, or data.json with JSON
 ```
 </principle>
 <principle name="rich-outputs">
 ## Outputs Should Be Rich
 Return enough information for the agent to verify and iterate.
 **Wrong:** Minimal output
 ```typescript
 async ({ key }) => {
  await db.delete(key);
  return { text: "Deleted" };
 }
 ```
 **Right:** Rich output
 ```typescript
 async ({ key }) => {
  const existed = await db.has(key);
  if (!existed) {
    return { text: `Key ${key} did not exist` };
  }
  await db.delete(key);
  return { text: `Deleted ${key}. ${await db.count()} items remaining.` };
 }
 ```
 </principle>
 <design_template>
 ## Tool Design Template
 ```typescript
 import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
 import { z } from "zod";
 export const serverName = createSdkMcpServer({
  name: "server-name",
  version: "1.0.0",
  tools: [
    // READ operations
    tool(
      "read_item",
      "Read an item by key",
      { key: z.string().describe("Item key") },
      async ({ key }) => {
        const item = await storage.get(key);
        return {
          content: [{
            type: "text",
            text: item ? JSON.stringify(item, null, 2) : `Not found: ${key}`,
          }],
          isError: !item,
        };
      }
    ),
    tool(
      "list_items",
      "List all items, optionally filtered",
      {
        prefix: z.string().optional().describe("Filter by key prefix"),
        limit: z.number().default(100).describe("Max items"),
      },
      async ({ prefix, limit }) => {
        const items = await storage.list({ prefix, limit });
        return {
          content: [{
            type: "text",
            text: `Found ${items.length} items:\n${items.map(i => i.key).join("\n")}`,
          }],
        };
      }
    ),
    // WRITE operations
    tool(
      "store_item",
      "Store an item",
      {
        key: z.string().describe("Item key"),
        value: z.any().describe("Item data"),
      },
      async ({ key, value }) => {
        await storage.set(key, value);
        return {
          content: [{ type: "text", text: `Stored ${key}` }],
        };
      }
    ),
    tool(
      "delete_item",
      "Delete an item",
      { key: z.string().describe("Item key") },
      async ({ key }) => {
        const existed = await storage.delete(key);
        return {
          content: [{
            type: "text",
            text: existed ? `Deleted ${key}` : `${key} did not exist`,
          }],
        };
      }
    ),
    // EXTERNAL operations
    tool(
      "call_api",
      "Make an HTTP request",
      {
        url: z.string().url(),
        method: z.enum(["GET", "POST", "PUT", "DELETE"]).default("GET"),
        body: z.any().optional(),
      },
      async ({ url, method, body }) => {
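         // Attach a JSON body (and Content-Type) only when one was provided;
         // fetch rejects GET requests that carry a body
         const hasBody = body !== undefined && method !== "GET";
         const response = await fetch(url, {
           method,
           headers: hasBody ? { "Content-Type": "application/json" } : undefined,
           body: hasBody ? JSON.stringify(body) : undefined,
         });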
        const text = await response.text();
        return {
          content: [{
            type: "text",
            text: `${response.status} ${response.statusText}\n\n${text}`,
          }],
          isError: !response.ok,
        };
      }
    ),
  ],
 });
 ```
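
Every handler in the template repeats the same `content: [{ type: "text", ... }]` wrapper. A small helper can cut that repetition. The sketch below uses simplified local types as a stand-in for the SDK's result type, so `ToolResult`, `textResult`, and `jsonResult` are illustrative names rather than SDK exports.

```typescript
// Simplified stand-in for the tool result shape used in the template above.
interface TextContent {
  type: "text";
  text: string;
}

interface ToolResult {
  content: TextContent[];
  isError?: boolean;
}

// Wrap plain text; only set isError when the call actually failed.
function textResult(text: string, isError = false): ToolResult {
  return isError
    ? { content: [{ type: "text", text }], isError: true }
    : { content: [{ type: "text", text }] };
}

// Pretty-print structured data so the agent can read and verify it.
function jsonResult(value: unknown): ToolResult {
  return textResult(JSON.stringify(value, null, 2));
}
```

With these, a read handler could return `item ? jsonResult(item) : textResult("Not found: " + key, true)`.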
 </design_template>
 <example name="feedback-server">
 ## Example: Feedback Storage Server
 This server provides primitives for storing feedback. It does NOT decide how to categorize or organize feedback—that's the agent's job via the prompt.
 ```typescript
 export const feedbackMcpServer = createSdkMcpServer({
  name: "feedback",
  version: "1.0.0",
  tools: [
    tool(
      "store_feedback",
      "Store a feedback item",
      {
        item: z.object({
          id: z.string(),
          author: z.string(),
          content: z.string(),
          importance: z.number().min(1).max(5),
          timestamp: z.string(),
          status: z.string().optional(),
          urls: z.array(z.string()).optional(),
          metadata: z.any().optional(),
        }).describe("Feedback item"),
      },
      async ({ item }) => {
        await db.feedback.insert(item);
        return {
          content: [{
            type: "text",
            text: `Stored feedback ${item.id} from ${item.author}`,
          }],
        };
      }
    ),
    tool(
      "list_feedback",
      "List feedback items",
      {
        limit: z.number().default(50),
        status: z.string().optional(),
      },
      async ({ limit, status }) => {
        const items = await db.feedback.list({ limit, status });
        return {
          content: [{
            type: "text",
            text: JSON.stringify(items, null, 2),
          }],
        };
      }
    ),
    tool(
      "update_feedback",
      "Update a feedback item",
      {
        id: z.string(),
        updates: z.object({
          status: z.string().optional(),
          importance: z.number().optional(),
          metadata: z.any().optional(),
        }),
      },
      async ({ id, updates }) => {
        await db.feedback.update(id, updates);
        return {
          content: [{ type: "text", text: `Updated ${id}` }],
        };
      }
    ),
  ],
 });
 ```
 The system prompt then tells the agent *how* to use these primitives:
 ```markdown
 ## Feedback Processing
 When someone shares feedback:
 1. Extract author, content, and any URLs
 2. Rate importance 1-5 based on actionability
 3. Store using feedback.store_feedback
 4. If high importance (4-5), notify the channel
 Use your judgment about importance ratings.
 ```
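
The prompt refers to the tool as `feedback.store_feedback`. When the tools are registered through an in-process MCP server, the name the model sees is, to my understanding of the Claude Agent SDK, prefixed as `mcp__<server>__<tool>`. The sketch below builds those names for an allow-list; `qualifiedToolName` is a hypothetical helper, and the naming convention should be checked against the SDK version in use.

```typescript
// Build the fully qualified name for a tool exposed by an MCP server.
// ASSUMPTION: the mcp__<server>__<tool> convention applies to your SDK version.
function qualifiedToolName(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

// Tool names taken from the feedback server example above.
const feedbackTools = ["store_feedback", "list_feedback", "update_feedback"];

const allowedTools = feedbackTools.map((name) => qualifiedToolName("feedback", name));
console.log(allowedTools.join("\n"));
```

The resulting list could be passed wherever the agent's permitted tools are configured, keeping the system prompt and the registered names in sync.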
 </example>
 <checklist>
 ## MCP Tool Design Checklist
 - [ ] Tool names describe capability, not use case
 - [ ] Inputs are data, not decisions
 - [ ] Outputs are rich (enough for agent to verify)
 - [ ] CRUD operations are separate tools (not one mega-tool)
 - [ ] No business logic in tool implementations
 - [ ] Error states clearly communicated via `isError`
 - [ ] Descriptions explain what the tool does, not when to use it
 </checklist>
--- a/plugins/compounding-engineering/skills/agent-native-architecture/references/refactoring-to-prompt-native.md
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/refactoring-to-prompt-native.md
@@ -0,0 +1,317 @@
 <overview>
 How to refactor existing agent code to follow prompt-native principles. The goal: move behavior from code into prompts, and simplify tools into primitives.
 </overview>
 <diagnosis>
 ## Diagnosing Non-Prompt-Native Code
 Signs your agent isn't prompt-native:
 **Tools that encode workflows:**
 ```typescript
 // RED FLAG: Tool contains business logic
 tool("process_feedback", async ({ message }) => {
  const category = categorize(message);        // Logic in code
  const priority = calculatePriority(message); // Logic in code
  await store(message, category, priority);    // Orchestration in code
  if (priority > 3) await notify();            // Decision in code
 });
 ```
 **Agent calls functions instead of figuring things out:**
 ```typescript
 // RED FLAG: Agent is just a function caller
 "Use process_feedback to handle incoming messages"
 // vs.
 "When feedback comes in, decide importance, store it, notify if high"
 ```
 **Artificial limits on agent capability:**
 ```typescript
 // RED FLAG: Tool prevents agent from doing what users can do
 tool("read_file", async ({ path }) => {
  if (!ALLOWED_PATHS.includes(path)) {
    throw new Error("Not allowed to read this file");
  }
  return readFile(path);
 });
 ```
 **Prompts that specify HOW instead of WHAT:**
 ```markdown
 // RED FLAG: Micromanaging the agent
 When creating a summary:
 1. Use exactly 3 bullet points
 2. Each bullet must be under 20 words
 3. Format with em-dashes for sub-points
 4. Bold the first word of each bullet
 ```
 </diagnosis>
 <refactoring_workflow>
 ## Step-by-Step Refactoring
 **Step 1: Identify workflow tools**
 List all your tools. Mark any that:
 - Have business logic (categorize, calculate, decide)
 - Orchestrate multiple operations
 - Make decisions on behalf of the agent
 - Contain conditional logic (if/else based on content)
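
For a large tool inventory, a rough first pass over Step 1 can be automated. The sketch below is a triage aid built on crude, hypothetical heuristics (name prefixes, `if`/`switch` in the handler source, more than one `await`). It will produce false positives, for instance a primitive that checks existence before deleting, so treat its output as a list to review rather than a verdict.

```typescript
interface ToolInfo {
  name: string;
  handlerSource: string; // handler body as text
}

// ASSUMPTION: these patterns are illustrative heuristics, not a definitive rule set.
const WORKFLOW_VERBS = /^(process|handle|generate|calculate|categorize|decide)_/;
const BRANCHING = /\b(if|switch)\s*\(/;

function flagWorkflowTools(tools: ToolInfo[]): { name: string; reasons: string[] }[] {
  return tools
    .map((tool) => {
      const reasons: string[] = [];
      if (WORKFLOW_VERBS.test(tool.name)) {
        reasons.push("name describes a use case, not a capability");
      }
      if (BRANCHING.test(tool.handlerSource)) {
        reasons.push("handler contains conditional logic");
      }
      const awaits = (tool.handlerSource.match(/\bawait\b/g) ?? []).length;
      if (awaits > 1) {
        reasons.push("handler orchestrates multiple operations");
      }
      return { name: tool.name, reasons };
    })
    .filter((entry) => entry.reasons.length > 0);
}
```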
 **Step 2: Extract the primitives**
 For each workflow tool, identify the underlying primitives:
 | Workflow Tool | Hidden Primitives |
 |---------------|-------------------|
 | `process_feedback` | `store_item`, `send_message` |
 | `generate_report` | `read_file`, `write_file` |
 | `deploy_and_notify` | `git_push`, `send_message` |
 **Step 3: Move behavior to the prompt**
 Take the logic from your workflow tools and express it in natural language:
 ```typescript
 // Before (in code):
 async function processFeedback(message) {
  const priority = message.includes("crash") ? 5 :
                   message.includes("bug") ? 4 : 3;
  await store(message, priority);
  if (priority >= 4) await notify();
 }
 ```
 ```markdown
 // After (in prompt):
 ## Feedback Processing
 When someone shares feedback:
 1. Rate importance 1-5:
   - 5: Crashes, data loss, security issues
   - 4: Bug reports with clear reproduction steps
   - 3: General suggestions, minor issues
 2. Store using store_item
 3. If importance >= 4, notify the team
 Use your judgment. Context matters more than keywords.
 ```
 **Step 4: Simplify tools to primitives**
 ```typescript
 // Before: 1 workflow tool
 tool("process_feedback", { message, category, priority }, ...complex logic...)
 // After: 2 primitive tools
 tool("store_item", { key: z.string(), value: z.any() }, ...simple storage...)
 tool("send_message", { channel: z.string(), content: z.string() }, ...simple send...)
 ```
 **Step 5: Remove artificial limits**
 ```typescript
 // Before: Limited capability
 tool("read_file", async ({ path }) => {
  if (!isAllowed(path)) throw new Error("Forbidden");
  return readFile(path);
 });
 // After: Full capability
 tool("read_file", async ({ path }) => {
  return readFile(path);  // Agent can read anything
 });
 // Use approval gates for WRITES, not artificial limits on READS
 ```
 **Step 6: Test with outcomes, not procedures**
 Instead of testing "does it call the right function?", test "does it achieve the outcome?"
 ```typescript
 // Before: Testing procedure
 expect(mockProcessFeedback).toHaveBeenCalledWith(...)
 // After: Testing outcome
 // Send feedback → Check it was stored with reasonable importance
 // Send high-priority feedback → Check notification was sent
 ```
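
The outcome-style checks in the comments above could be sketched as a function that inspects only the end state. This assumes the test harness can capture what was stored and which notifications went out after an agent run; the `World` shape and `checkHighPriorityOutcome` are hypothetical names for illustration.

```typescript
interface StoredFeedback {
  content: string;
  importance: number;
}

// End state captured after an agent run (hypothetical harness output).
interface World {
  stored: StoredFeedback[];
  notifications: string[];
}

// Checks results only. It does not care which tools ran or in what order.
function checkHighPriorityOutcome(
  world: World,
  contentFragment: string,
  minImportance: number,
): string[] {
  const failures: string[] = [];
  const item = world.stored.find((f) => f.content.includes(contentFragment));
  if (!item) {
    failures.push("feedback was not stored");
  } else if (item.importance < minImportance) {
    failures.push(`importance ${item.importance} is below ${minImportance}`);
  }
  if (world.notifications.length === 0) {
    failures.push("no notification was sent");
  }
  return failures;
}
```

Because the check accepts a range rather than an exact rating, it tolerates the agent rating a crash as 4 or 5 while still catching a rating that is clearly too low.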
 </refactoring_workflow>
 <before_after>
 ## Before/After Examples
 **Example 1: Feedback Processing**
 Before:
 ```typescript
 tool("handle_feedback", async ({ message, author }) => {
  const category = detectCategory(message);
  const priority = calculatePriority(message, category);
  const feedbackId = await db.feedback.insert({
    id: generateId(),
    author,
    message,
    category,
    priority,
    timestamp: new Date().toISOString(),
  });
  if (priority >= 4) {
    await discord.send(ALERT_CHANNEL, `High priority feedback from ${author}`);
  }
  return { feedbackId, category, priority };
 });
 ```
 After:
 ```typescript
 // Simple storage primitive
 tool("store_feedback", async ({ item }) => {
  await db.feedback.insert(item);
  return { text: `Stored feedback ${item.id}` };
 });
 // Simple message primitive
 tool("send_message", async ({ channel, content }) => {
  await discord.send(channel, content);
  return { text: "Sent" };
 });
 ```
 System prompt:
 ```markdown
 ## Feedback Processing
 When someone shares feedback:
 1. Generate a unique ID
 2. Rate importance 1-5 based on impact and urgency
 3. Store using store_feedback with the full item
 4. If importance >= 4, send a notification to the team channel
 Importance guidelines:
 - 5: Critical (crashes, data loss, security)
 - 4: High (detailed bug reports, blocking issues)
 - 3: Medium (suggestions, minor bugs)
 - 2: Low (cosmetic, edge cases)
 - 1: Minimal (off-topic, duplicates)
 ```
 **Example 2: Report Generation**
 Before:
 ```typescript
 tool("generate_weekly_report", async ({ startDate, endDate, format }) => {
  const data = await fetchMetrics(startDate, endDate);
  const summary = summarizeMetrics(data);
  const charts = generateCharts(data);
  if (format === "html") {
    return renderHtmlReport(summary, charts);
  } else if (format === "markdown") {
    return renderMarkdownReport(summary, charts);
  } else {
    return renderPdfReport(summary, charts);
  }
 });
 ```
 After:
 ```typescript
 tool("query_metrics", async ({ start, end }) => {
  const data = await db.metrics.query({ start, end });
  return { text: JSON.stringify(data, null, 2) };
 });
 tool("write_file", async ({ path, content }) => {
  writeFileSync(path, content);
  return { text: `Wrote ${path}` };
 });
 ```
 System prompt:
 ```markdown
 ## Report Generation
 When asked to generate a report:
 1. Query the relevant metrics using query_metrics
 2. Analyze the data and identify key trends
 3. Create a clear, well-formatted report
 4. Write it using write_file in the appropriate format
 Use your judgment about format and structure. Make it useful.
 ```
 </before_after>
 <common_challenges>
 ## Common Refactoring Challenges
 **"But the agent might make mistakes!"**
 Yes, and you can iterate. Change the prompt to add guidance:
 ```markdown
 // Before
 Rate importance 1-5.
 // After (if agent keeps rating too high)
 Rate importance 1-5. Be conservative—most feedback is 2-3.
 Only use 4-5 for truly blocking or critical issues.
 ```
 **"The workflow is complex!"**
 Complex workflows can still be expressed in prompts. The agent is smart.
 ```markdown
 When processing video feedback:
 1. Check if it's a Loom, YouTube, or direct link
 2. For YouTube, pass URL directly to video analysis
 3. For others, download first, then analyze
 4. Extract timestamped issues
 5. Rate based on issue density and severity
 ```
 **"We need deterministic behavior!"**
 Some operations should stay in code. That's fine. Prompt-native isn't all-or-nothing.
 Keep in code:
 - Security validation
 - Rate limiting
 - Audit logging
 - Exact format requirements
 Move to prompts:
 - Categorization decisions
 - Priority judgments
 - Content generation
 - Workflow orchestration
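One way to read this split: deterministic checks wrap a primitive, while the decision about what to write stays with the agent. The sketch below is a minimal illustration under assumptions, not part of the skill itself. `guardedWriteFile`, `ALLOWED_ROOT`, the rate limit, and the in-memory `auditLog` are all hypothetical, and the file writer is injected so the example runs without touching disk.

```typescript
// Hypothetical sketch: security validation, rate limiting, and audit logging
// stay in code; the agent still decides what to write and where.
type ToolResult = { text: string; isError?: boolean };
type AuditEntry = { at: number; tool: string; path: string; allowed: boolean };

const ALLOWED_ROOT = "reports/"; // assumption: the agent may only write here
const MAX_CALLS_PER_MINUTE = 3; // assumption: illustrative limit

const auditLog: AuditEntry[] = [];
let recentCalls: number[] = [];

function guardedWriteFile(
  path: string,
  content: string,
  write: (path: string, content: string) => void,
  now: () => number = Date.now,
): ToolResult {
  const at = now();

  // Security validation: stay inside the allowed directory, no traversal
  const safe = path.indexOf(ALLOWED_ROOT) === 0 && path.split("/").indexOf("..") === -1;

  // Rate limiting: keep only calls from the last minute, then count them
  recentCalls = recentCalls.filter((t) => at - t < 60000);
  const underLimit = recentCalls.length < MAX_CALLS_PER_MINUTE;

  // Audit logging: record every attempt, allowed or not
  auditLog.push({ at, tool: "write_file", path, allowed: safe && underLimit });

  if (!safe) {
    return { text: `Refused: ${path} is outside ${ALLOWED_ROOT}`, isError: true };
  }
  if (!underLimit) {
    return { text: "Refused: rate limit exceeded", isError: true };
  }

  recentCalls.push(at);
  write(path, content);
  return { text: `Wrote ${path}` };
}
```

Nothing in the wrapper decides what belongs in the file or when to write it; it only refuses calls that break a hard constraint.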
 **"What about testing?"**
 Test outcomes, not procedures:
 - "Given this input, does the agent achieve the right result?"
 - "Does stored feedback have reasonable importance ratings?"
 - "Are notifications sent for truly high-priority items?"
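These questions can be turned into checks over a recorded agent run. The sketch below assumes a transcript of tool calls is available as `{ tool, input }` records; that shape, the `send_notification` tool name, and `checkFeedbackOutcome` are hypothetical and only illustrate asserting on outcomes rather than on the order of steps.

```typescript
// Hypothetical sketch: assert on what the agent achieved, not on the steps it took.
type ToolCall = { tool: string; input: Record<string, unknown> };
type OutcomeReport = { passed: boolean; failures: string[] };

// Checks a recorded run against outcome expectations:
// - exactly one feedback item was stored, with importance in the expected range
// - a notification was sent if and only if the stored importance is >= 4
function checkFeedbackOutcome(
  calls: ToolCall[],
  expectedImportance: { min: number; max: number },
): OutcomeReport {
  const failures: string[] = [];
  const stored = calls.filter((c) => c.tool === "store_feedback");
  const notified = calls.some((c) => c.tool === "send_notification");

  const [item] = stored;
  if (item === undefined || stored.length !== 1) {
    failures.push(`expected 1 stored item, got ${stored.length}`);
    return { passed: false, failures };
  }

  const importance = Number(item.input["importance"]);
  const inRange =
    importance >= expectedImportance.min && importance <= expectedImportance.max;
  if (!inRange) {
    failures.push(
      `importance ${importance} outside ${expectedImportance.min}-${expectedImportance.max}`,
    );
  }
  if (importance >= 4 && !notified) {
    failures.push("high-importance item was not notified");
  }
  if (importance < 4 && notified) {
    failures.push("notification sent for low-importance item");
  }
  return { passed: failures.length === 0, failures };
}
```

The expected range is deliberately a range: the check accepts any rating a reasonable reviewer would give, so the test does not pin the agent to one exact answer.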
 </common_challenges>
 <checklist>
 ## Refactoring Checklist
 Diagnosis:
 - [ ] Listed all tools with business logic
 - [ ] Identified artificial limits on agent capability
 - [ ] Found prompts that micromanage HOW
 Refactoring:
 - [ ] Extracted primitives from workflow tools
 - [ ] Moved business logic to system prompt
 - [ ] Removed artificial limits
 - [ ] Simplified tool inputs to data, not decisions
 Validation:
 - [ ] Agent achieves same outcomes with primitives
 - [ ] Behavior can be changed by editing prompts
 - [ ] New features could be added without new tools
 </checklist>
--- a/plugins/compounding-engineering/skills/agent-native-architecture/references/self-modification.md
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/self-modification.md
@@ -0,0 +1,269 @@
 <overview>
 Self-modification is the advanced tier of agent native engineering: agents that can evolve their own code, prompts, and behavior. It is not required for every app, but it is a large part of where agent native engineering is heading.
 This is the logical extension of "whatever the developer can do, the agent can do."
 </overview>
 <why_self_modification>
 ## Why Self-Modification?
 Traditional software is static—it does what you wrote, nothing more. Self-modifying agents can:
 - **Fix their own bugs** - See an error, patch the code, restart
 - **Add new capabilities** - User asks for something new, agent implements it
 - **Evolve behavior** - Learn from feedback and adjust prompts
 - **Deploy themselves** - Push code, trigger builds, restart
 The agent becomes a living system that improves over time, not frozen code.
 </why_self_modification>
 <capabilities>
 ## What Self-Modification Enables
 **Code modification:**
 - Read and understand source files
 - Write fixes and new features
 - Commit and push to version control
 - Trigger builds and verify they pass
 **Prompt evolution:**
 - Edit the system prompt based on feedback
 - Add new features as prompt sections
 - Refine judgment criteria that aren't working
 **Infrastructure control:**
 - Pull latest code from upstream
 - Merge from other branches/instances
 - Restart after changes
 - Roll back if something breaks
 **Site/output generation:**
 - Generate and maintain websites
 - Create documentation
 - Build dashboards from data
 </capabilities>
 <guardrails>
 ## Required Guardrails
 Self-modification is powerful. It needs safety mechanisms.
 **Approval gates for code changes:**
 ```typescript
 tool("write_file", async ({ path, content }) => {
  if (isCodeFile(path)) {
    // Store for approval, don't apply immediately
    pendingChanges.set(path, content);
    const diff = generateDiff(path, content);
    return { text: `Requires approval:\n\n${diff}\n\nReply "yes" to apply.` };
  }
  // Non-code files apply immediately
  writeFileSync(path, content);
  return { text: `Wrote ${path}` };
 });
 ```
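The gate above only stages a change; something still has to apply or discard it once a human replies. A minimal sketch of that other half follows, assuming an in-memory queue. The `PendingChanges` class and its method names are hypothetical, and the writer is injected so the example runs without disk access.

```typescript
// Hypothetical sketch of the approval side: staged code changes are written
// only after explicit approval.
type Writer = (path: string, content: string) => void;

class PendingChanges {
  private readonly changes = new Map<string, string>();

  // Stage a change instead of applying it; a later stage for the same path wins
  stage(path: string, content: string): void {
    this.changes.set(path, content);
  }

  // List staged paths so a human can review them
  list(): string[] {
    return Array.from(this.changes.keys());
  }

  // Apply every staged change, then empty the queue
  apply(write: Writer): string[] {
    const applied: string[] = [];
    this.changes.forEach((content, path) => {
      write(path, content);
      applied.push(path);
    });
    this.changes.clear();
    return applied;
  }

  // Discard staged changes without applying them; returns how many were dropped
  clear(): number {
    const count = this.changes.size;
    this.changes.clear();
    return count;
  }
}
```

The three methods correspond roughly to what `get_pending`, `apply_pending`, and `clear_pending` tools would expose.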
 **Auto-commit before changes:**
 ```typescript
 tool("self_deploy", async () => {
  // Commit any uncommitted work first so it can be restored later
  runGit("add -A");
  runGit('commit --allow-empty -m "Auto-save before self-deploy"');
  // Then pull/merge
  runGit("fetch origin");
  runGit("merge origin/main --no-edit");
  // Build and verify
  runCommand("npm run build");
  // Only then restart
  scheduleRestart();
 });
 ```
 **Build verification:**
 ```typescript
 // Don't restart unless build passes
 try {
  runCommand("npm run build", { timeout: 120000 });
 } catch (error) {
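  // Roll back the merge. It has already completed, so `merge --abort` no longer
  // applies; reset to ORIG_HEAD, which git sets to the pre-merge commit
  // (assumes the merge actually moved HEAD)
  runGit("reset --hard ORIG_HEAD");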
  return { text: "Build failed, aborting deploy", isError: true };
 }
 ```
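The deploy and build-verification steps can be combined by recording the current commit before merging and resetting to it if the build fails. The sketch below is one possible arrangement, not the skill's implementation: `deployWithRollback` is a hypothetical name, and `git`, `build`, and `restart` are injected stand-ins for `runGit`, `runCommand`, and `scheduleRestart`.

```typescript
// Hypothetical sketch: merge, build, and roll back to the pre-merge commit on failure.
type DeployResult = { text: string; isError?: boolean };

function deployWithRollback(
  git: (args: string) => string,
  build: () => void,
  restart: () => void,
): DeployResult {
  // Remember where we were before touching anything
  const previousHead = git("rev-parse HEAD").trim();

  git("fetch origin");
  git("merge origin/main --no-edit");

  try {
    build();
  } catch (error) {
    // The merge already completed, so reset to the recorded commit
    git(`reset --hard ${previousHead}`);
    const reason = error instanceof Error ? error.message : String(error);
    return {
      text: `Build failed (${reason}), rolled back to ${previousHead}`,
      isError: true,
    };
  }

  // Only restart once the build has passed
  restart();
  return { text: "Build passed, restart scheduled" };
}
```

Recording the commit explicitly avoids depending on whatever state git leaves behind after the merge.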
 **Health checks after restart:**
 ```typescript
 tool("health_check", async () => {
  const uptime = process.uptime();
  const buildValid = existsSync("dist/index.js");
  const gitClean = !runGit("status --porcelain");
  return {
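    text: JSON.stringify({
      // Report unhealthy when the build output is missing instead of always claiming "healthy"
      status: buildValid ? "healthy" : "unhealthy",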
      uptime: `${Math.floor(uptime / 60)}m`,
      build: buildValid ? "valid" : "missing",
      git: gitClean ? "clean" : "uncommitted changes",
    }, null, 2),
  };
 });
 ```
 </guardrails>
 <git_architecture>
 ## Git-Based Self-Modification
 Use git as the foundation for self-modification. It provides:
 - Version history (rollback capability)
 - Branching (experiment safely)
 - Merge (sync with other instances)
 - Push/pull (deploy and collaborate)
 **Essential git tools:**
 ```typescript
 tool("status", "Show git status", {}, ...);
 tool("diff", "Show file changes", { path: z.string().optional() }, ...);
 tool("log", "Show commit history", { count: z.number() }, ...);
 tool("commit_code", "Commit code changes", { message: z.string() }, ...);
 tool("git_push", "Push to GitHub", { branch: z.string().optional() }, ...);
 tool("pull", "Pull from GitHub", { source: z.enum(["main", "instance"]) }, ...);
 tool("rollback", "Revert recent commits", { commits: z.number() }, ...);
 ```
 **Multi-instance architecture:**
 ```
 main                      # Shared code
 ├── instance/bot-a       # Instance A's branch
 ├── instance/bot-b       # Instance B's branch
 └── instance/bot-c       # Instance C's branch
 ```
 Each instance can:
 - Pull updates from main
 - Push improvements back to main (via PR)
 - Sync features from other instances
 - Maintain instance-specific config
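The `source` argument of the `pull` tool has to be turned into a concrete ref somewhere. A small sketch of that mapping follows, assuming instance branches use the `instance/<id>` layout from the diagram and that `"instance"` means the instance's own branch; `resolvePullRef` and the id format are hypothetical.

```typescript
// Hypothetical sketch: resolve the `source` argument of the pull tool to a remote ref.
type PullSource = "main" | "instance";

function resolvePullRef(source: PullSource, instanceId: string): string {
  if (source === "main") {
    return "origin/main";
  }
  // Assumption: ids are lowercase letters, digits, and hyphens (e.g. "bot-a").
  // Validating here keeps arbitrary text out of the git command line.
  if (!/^[a-z0-9][a-z0-9-]*$/.test(instanceId)) {
    throw new Error(`Invalid instance id: ${instanceId}`);
  }
  return `origin/instance/${instanceId}`;
}
```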
 </git_architecture>
 <prompt_evolution>
 ## Self-Modifying Prompts
 The system prompt is a file the agent can read and write.
 ```typescript
 // Agent can read its own prompt
 tool("read_file", ...);  // Can read src/prompts/system.md
 // Agent can propose changes
 tool("write_file", ...);  // Can write to src/prompts/system.md (with approval)
 ```
 **System prompt as living document:**
 ```markdown
 ## Feedback Processing
 When someone shares feedback:
 1. Acknowledge warmly
 2. Rate importance 1-5
 3. Store using feedback tools
 <!-- Note to self: Video walkthroughs should always be 4-5,
     learned this from Dan's feedback on 2024-12-07 -->
 ```
 The agent can:
 - Add notes to itself
 - Refine judgment criteria
 - Add new feature sections
 - Document edge cases it learned
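Adding a note to itself amounts to a text edit on the prompt file. The sketch below shows one way to place a dated comment at the end of a named section; `addNoteToSection` is a hypothetical helper, and it assumes features are delimited by `## ` headings as in the example above.

```typescript
// Hypothetical sketch: add a dated "note to self" at the end of a named prompt section.
function addNoteToSection(
  prompt: string,
  heading: string,
  note: string,
  date: string,
): string {
  const lines = prompt.split("\n");
  const start = lines.findIndex((line) => line.trim() === `## ${heading}`);
  if (start === -1) {
    throw new Error(`Section not found: ${heading}`);
  }

  // The section ends at the next "## " heading, or at the end of the prompt
  const next = lines.findIndex((line, i) => i > start && line.indexOf("## ") === 0);
  const end = next === -1 ? lines.length : next;

  const comment = `<!-- Note to self (${date}): ${note} -->`;
  return lines.slice(0, end).concat([comment], lines.slice(end)).join("\n");
}
```

The result would still go through `write_file`, so the approval gate applies to prompt edits the same way it applies to code.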
 </prompt_evolution>
 <when_to_use>
 ## When to Implement Self-Modification
 **Good candidates:**
 - Long-running autonomous agents
 - Agents that need to adapt to feedback
 - Systems where behavior evolution is valuable
 - Internal tools where rapid iteration matters
 **Not necessary for:**
 - Simple single-task agents
 - Highly regulated environments
 - Systems where behavior must be auditable
 - One-off or short-lived agents
 Start with a non-self-modifying prompt-native agent. Add self-modification when you need it.
 </when_to_use>
 <example_tools>
 ## Complete Self-Modification Toolset
 ```typescript
 const selfMcpServer = createSdkMcpServer({
  name: "self",
  version: "1.0.0",
  tools: [
    // FILE OPERATIONS
    tool("read_file", "Read any project file", { path: z.string() }, ...),
    tool("write_file", "Write a file (code requires approval)", { path: z.string(), content: z.string() }, ...),
    tool("list_files", "List directory contents", { path: z.string() }, ...),
    tool("search_code", "Search for patterns", { pattern: z.string() }, ...),
    // APPROVAL WORKFLOW
    tool("apply_pending", "Apply approved changes", {}, ...),
    tool("get_pending", "Show pending changes", {}, ...),
    tool("clear_pending", "Discard pending changes", {}, ...),
    // RESTART
    tool("restart", "Rebuild and restart", {}, ...),
    tool("health_check", "Check if bot is healthy", {}, ...),
  ],
 });
 const gitMcpServer = createSdkMcpServer({
  name: "git",
  version: "1.0.0",
  tools: [
    // STATUS
    tool("status", "Show git status", {}, ...),
    tool("diff", "Show changes", { path: z.string().optional() }, ...),
    tool("log", "Show history", { count: z.number() }, ...),
    // COMMIT & PUSH
    tool("commit_code", "Commit code changes", { message: z.string() }, ...),
    tool("git_push", "Push to GitHub", { branch: z.string().optional() }, ...),
    // SYNC
    tool("pull", "Pull from upstream", { source: z.enum(["main", "instance"]) }, ...),
    tool("self_deploy", "Pull, build, restart", { source: z.enum(["main", "instance"]) }, ...),
    // SAFETY
    tool("rollback", "Revert commits", { commits: z.number() }, ...),
    tool("health_check", "Detailed health report", {}, ...),
  ],
 });
 ```
 </example_tools>
 <checklist>
 ## Self-Modification Checklist
 Before enabling self-modification:
 - [ ] Git-based version control set up
 - [ ] Approval gates for code changes
 - [ ] Build verification before restart
 - [ ] Rollback mechanism available
 - [ ] Health check endpoint
 - [ ] Instance identity configured
 When implementing:
 - [ ] Agent can read all project files
 - [ ] Agent can write files (with appropriate approval)
 - [ ] Agent can commit and push
 - [ ] Agent can pull updates
 - [ ] Agent can restart itself
 - [ ] Agent can roll back if needed
 </checklist>
--- a/plugins/compounding-engineering/skills/agent-native-architecture/references/system-prompt-design.md
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/system-prompt-design.md
@@ -0,0 +1,250 @@
 <overview>
 How to write system prompts for prompt-native agents. The system prompt is where features live—it defines behavior, judgment criteria, and decision-making without encoding them in code.
 </overview>
 <principle name="features-in-prompts">
 ## Features Are Prompt Sections
 Each feature is a section of the system prompt that tells the agent how to behave.
 **Traditional approach:** Feature = function in codebase
 ```typescript
 async function processFeedback(message) {
  const category = categorize(message);
  const priority = calculatePriority(message);
  await store(message, category, priority);
  if (priority > 3) await notify();
 }
 ```
 **Prompt-native approach:** Feature = section in system prompt
 ```markdown
 ## Feedback Processing
 When someone shares feedback:
 1. Read the message to understand what they're saying
 2. Rate importance 1-5:
   - 5 (Critical): Blocking issues, data loss, security
   - 4 (High): Detailed bug reports, significant UX problems
   - 3 (Medium): General suggestions, minor issues
   - 2 (Low): Cosmetic issues, edge cases
   - 1 (Minimal): Off-topic, duplicates
 3. Store using feedback.store_feedback
 4. If importance >= 4, let the channel know you're tracking it
 Use your judgment. Context matters.
 ```
 </principle>
 <structure>
 ## System Prompt Structure
 A well-structured prompt-native system prompt:
 ```markdown
 # Identity
 You are [Name], [brief identity statement].
 ## Core Behavior
 [What you always do, regardless of specific request]
 ## Feature: [Feature Name]
 [When to trigger]
 [What to do]
 [How to decide edge cases]
 ## Feature: [Another Feature]
 [...]
 ## Tool Usage
 [Guidance on when/how to use available tools]
 ## Tone and Style
 [Communication guidelines]
 ## What NOT to Do
 [Explicit boundaries]
 ```
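If features are kept as separate pieces of text, the structure above can be assembled mechanically. The sketch below is one way to do that under assumptions: `PromptSpec`, `Feature`, and `buildSystemPrompt` are hypothetical names, and only a subset of the sections is modelled.

```typescript
// Hypothetical sketch: build the system prompt from independent feature sections.
type Feature = { name: string; guidance: string };

type PromptSpec = {
  identity: string;
  coreBehavior: string[];
  features: Feature[];
  boundaries: string[];
};

function buildSystemPrompt(spec: PromptSpec): string {
  const bullets = (items: string[]): string =>
    items.map((item) => `- ${item}`).join("\n");

  const sections: string[] = [
    `# Identity\n${spec.identity}`,
    `## Core Behavior\n${bullets(spec.coreBehavior)}`,
  ];

  // One section per feature: adding a feature means adding an entry, not code
  for (const feature of spec.features) {
    sections.push(`## Feature: ${feature.name}\n${feature.guidance.trim()}`);
  }

  sections.push(`## What NOT to Do\n${bullets(spec.boundaries)}`);
  return sections.join("\n\n") + "\n";
}
```

Keeping each feature as its own entry makes it easy to add, remove, or revise one behavior without touching the others.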
 </structure>
 <principle name="guide-not-micromanage">
 ## Guide, Don't Micromanage
 Tell the agent what to achieve, not exactly how to do it.
 **Micromanaging (bad):**
 ```markdown
 When creating a summary:
 1. Use exactly 3 bullet points
 2. Each bullet under 20 words
 3. Use em-dashes for sub-points
 4. Bold the first word of each bullet
 5. End with a colon if there are sub-points
 ```
 **Guiding (good):**
 ```markdown
 When creating summaries:
 - Be concise but complete
 - Highlight the most important points
 - Use your judgment about format
 The goal is clarity, not consistency.
 ```
 Trust the agent's intelligence. It knows how to communicate.
 </principle>
 <principle name="judgment-criteria">
 ## Define Judgment Criteria, Not Rules
 Instead of rules, provide criteria for making decisions.
 **Rules (rigid):**
 ```markdown
 If the message contains "bug", set importance to 4.
 If the message contains "crash", set importance to 5.
 ```
 **Judgment criteria (flexible):**
 ```markdown
 ## Importance Rating
 Rate importance based on:
 - **Impact**: How many users affected? How severe?
 - **Urgency**: Is this blocking? Time-sensitive?
 - **Actionability**: Can we actually fix this?
 - **Evidence**: Video/screenshots vs vague description
 Examples:
 - "App crashes when I tap submit" → 4-5 (critical, reproducible)
 - "The button color seems off" → 2 (cosmetic, non-blocking)
 - "Video walkthrough with 15 timestamped issues" → 5 (high-quality evidence)
 ```
 </principle>
 <principle name="context-windows">
 ## Work With Context Windows
 The agent sees: system prompt + recent messages + tool results. Design for this.
 **Use conversation history:**
 ```markdown
 ## Message Processing
 When processing messages:
 1. Check if this relates to recent conversation
 2. If someone is continuing a previous thread, maintain context
 3. Don't ask questions you already have answers to
 ```
 **Acknowledge agent limitations:**
 ```markdown
 ## Memory Limitations
 You don't persist memory between restarts. Use the memory server:
 - Before responding, check memory.recall for relevant context
 - After important decisions, use memory.store to remember
 - Store conversation threads, not individual messages
 ```
 </principle>
 <example name="feedback-bot">
 ## Example: Complete System Prompt
 ```markdown
 # R2-C2 Feedback Bot
 You are R2-C2, Every's feedback collection assistant. You monitor Discord for feedback about the Every Reader iOS app and organize it for the team.
 ## Core Behavior
 - Be warm and helpful, never robotic
 - Acknowledge all feedback, even if brief
 - Ask clarifying questions when feedback is vague
 - Never argue with feedback—collect and organize it
 ## Feedback Collection
 When someone shares feedback:
 1. **Acknowledge** warmly: "Thanks for this!" or "Good catch!"
 2. **Clarify** if needed: "Can you tell me more about when this happens?"
 3. **Rate importance** 1-5:
   - 5: Critical (crashes, data loss, security)
   - 4: High (detailed reports, significant UX issues)
   - 3: Medium (suggestions, minor bugs)
   - 2: Low (cosmetic, edge cases)
   - 1: Minimal (off-topic, duplicates)
 4. **Store** using feedback.store_feedback
 5. **Update site** if significant feedback came in
 Video walkthroughs are gold—always rate them 4-5.
 ## Site Management
 You maintain a public feedback site. When feedback accumulates:
 1. Sync data to site/public/content/feedback.json
 2. Update status counts and organization
 3. Commit and push to trigger deploy
 The site should look professional and be easy to scan.
 ## Message Deduplication
 Before processing any message:
 1. Check memory.recall(key: "processed_{messageId}")
 2. Skip if already processed
 3. After processing, store the key
 ## Tone
 - Casual and friendly
 - Brief but warm
 - Technical when discussing bugs
 - Never defensive
 ## Don't
 - Don't promise fixes or timelines
 - Don't share internal discussions
 - Don't ignore feedback even if it seems minor
 - Don't repeat yourself—vary acknowledgments
 ```
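The Message Deduplication section in the prompt above relies on the memory tools behaving like a key-value store. The sketch below shows what the recall-then-store sequence amounts to; `MemoryStore` and `processOnce` are hypothetical, and the store is in-memory here, whereas a real memory server would persist across restarts.

```typescript
// Hypothetical sketch of a key-value store behind memory.recall / memory.store.
class MemoryStore {
  private readonly entries = new Map<string, string>();

  store(key: string, value: string): void {
    this.entries.set(key, value);
  }

  recall(key: string): string | undefined {
    return this.entries.get(key);
  }
}

// Runs `handle` at most once per message id, using the store as the record.
// Returns true if the message was processed, false if it was skipped.
function processOnce(
  memory: MemoryStore,
  messageId: string,
  handle: () => void,
): boolean {
  const key = `processed_${messageId}`;
  if (memory.recall(key) !== undefined) {
    return false; // already processed, skip
  }
  handle();
  // Store the key only after processing succeeds
  memory.store(key, new Date().toISOString());
  return true;
}
```

Storing the key after `handle` returns means a crash mid-processing leads to a retry rather than a silently dropped message.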
 </example>
 <iteration>
 ## Iterating on System Prompts
 Prompt-native development means rapid iteration:
 1. **Observe** agent behavior in production
 2. **Identify** gaps: "It's not rating video feedback high enough"
 3. **Add guidance**: "Video walkthroughs are gold—always rate them 4-5"
 4. **Deploy** (just edit the prompt file)
 5. **Repeat**
 No code changes. No recompilation. Just prose.
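For an edit to the prompt file to count as a deploy, the prompt has to be read at run time instead of being baked into the build. A minimal sketch, assuming the host re-reads the file at the start of each agent run; `createPromptLoader` is a hypothetical helper and the reader is injected so the example runs without a file system.

```typescript
// Hypothetical sketch: re-read the prompt on every run so an edit takes
// effect on the next message without a rebuild.
type ReadText = (path: string) => string;

function createPromptLoader(readText: ReadText, path: string): () => string {
  return () => {
    const prompt = readText(path).trim();
    // Refuse to run with an empty prompt: a bad edit should fail loudly
    if (prompt.length === 0) {
      throw new Error(`System prompt at ${path} is empty`);
    }
    return prompt;
  };
}
```

In a Node host the reader could be `(p) => readFileSync(p, "utf8")` with a path such as `src/prompts/system.md`.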
 </iteration>
 <checklist>
 ## System Prompt Checklist
 - [ ] Clear identity statement
 - [ ] Core behaviors that always apply
 - [ ] Features as separate sections
 - [ ] Judgment criteria instead of rigid rules
 - [ ] Examples for ambiguous cases
 - [ ] Explicit boundaries (what NOT to do)
 - [ ] Tone guidance
 - [ ] Tool usage guidance (when to use each)
 - [ ] Memory/context handling
 </checklist>