diff --git a/plugins/compounding-engineering/skills/agent-native-architecture/SKILL.md b/plugins/compounding-engineering/skills/agent-native-architecture/SKILL.md
new file mode 100644
index 0000000..111c9e6
--- /dev/null
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/SKILL.md
@@ -0,0 +1,201 @@
+---
+name: agent-native-architecture
+description: Build AI agents using prompt-native architecture where features are defined in prompts, not code. Use when creating autonomous agents, designing MCP servers, implementing self-modifying systems, or adopting the "trust the agent's intelligence" philosophy.
+---
+
+
+## The Prompt-Native Philosophy
+
+Agent-native engineering inverts traditional software architecture. Instead of writing code that the agent executes, you define outcomes in prompts and let the agent figure out HOW to achieve them.
+
+### The Foundational Principle
+
+**Whatever the user can do, the agent can do. Many things the developer can do, the agent can do.**
+
+Don't artificially limit the agent. If a user could read files, write code, browse the web, deploy an app—the agent should be able to do those things too. The agent figures out HOW to achieve an outcome; it doesn't just call your pre-written functions.
+
+### Features Are Prompts
+
+Each feature is a prompt that defines an outcome and gives the agent the tools it needs. The agent then figures out how to accomplish it.
+
+**Traditional:** Feature = function in codebase that agent calls
+**Prompt-native:** Feature = prompt defining desired outcome + primitive tools
+
+The agent doesn't execute your code. It uses primitives to achieve outcomes you describe.
+
+### Tools Provide Capability, Not Behavior
+
+Tools should be primitives that enable capability. The prompt defines what to do with that capability.
+
+**Wrong:** `generate_dashboard(data, layout, filters)` — agent executes your workflow
+**Right:** `read_file`, `write_file`, `list_files` — agent figures out how to build a dashboard
+
+Pure primitives are better, but domain primitives (like `store_feedback`) are OK if they don't encode logic—just storage/retrieval.
+
+### The Development Lifecycle
+
+1. **Start in the prompt** - New features begin as natural language defining outcomes
+2. **Iterate rapidly** - Change behavior by editing prose, not refactoring code
+3. **Graduate when stable** - Harden to code when requirements stabilize AND speed/reliability matter
+4. **Many features stay as prompts** - Not everything needs to become code
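+
+As a sketch of step 3, a judgment rule that has stabilized in the prompt can graduate into a small deterministic function (the keyword heuristic here is hypothetical, not a prescribed rating scheme):
+
+```typescript
+// Hypothetical: after the prompt rule "rate crashes 5, reproducible
+// bugs 4, everything else 3" has been stable for months, harden it.
+function rateImportance(message: string): number {
+  if (/crash|data loss|security/i.test(message)) return 5;
+  if (/bug|error|broken/i.test(message)) return 4;
+  return 3;
+}
+```
+
+The prompt then shrinks to "store feedback with its computed importance," and that one judgment becomes fast, cheap, and deterministic while the rest of the feature stays in prose.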
+
+### Self-Modification (Advanced)
+
+The advanced tier: agents that can evolve their own code, prompts, and behavior. Not required for every app, but a big part of the future.
+
+When implementing:
+- Approval gates for code changes
+- Auto-commit before modifications (rollback capability)
+- Health checks after changes
+- Build verification before restart
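+
+A minimal health-check sketch, assuming a local `/health` endpoint and `runGit`/`scheduleRestart` helpers like those shown in the self-modification reference:
+
+```typescript
+// Hypothetical guardrail: after a self-modification, verify the agent
+// is healthy; if not, roll back the last commit and restart.
+async function verifyAfterChange(): Promise<void> {
+  try {
+    const res = await fetch("http://localhost:3000/health");
+    if (!res.ok) throw new Error(`Health check failed: ${res.status}`);
+  } catch (error) {
+    runGit("reset --hard HEAD~1"); // rollback via git history
+    scheduleRestart();
+    throw error;
+  }
+}
+```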
+
+### When NOT to Use This Approach
+
+- **High-frequency operations** - thousands of calls per second
+- **Deterministic requirements** - exact same output every time
+- **Cost-sensitive scenarios** - when API costs would be prohibitive
+- **High-security contexts** - when untrusted agent actions are unacceptable (though this risk is often overstated for typical apps)
+
+
+
+What aspect of agent native architecture do you need help with?
+
+1. **Design architecture** - Plan a new prompt-native agent system
+2. **Create MCP tools** - Build primitive tools following the philosophy
+3. **Write system prompts** - Define agent behavior in prompts
+4. **Self-modification** - Enable agents to safely evolve themselves
+5. **Review/refactor** - Make existing code more prompt-native
+
+**Wait for response before proceeding.**
+
+
+
+| Response | Action |
+|----------|--------|
+| 1, "design", "architecture", "plan" | Read references/architecture-patterns.md |
+| 2, "tool", "mcp", "primitive" | Read references/mcp-tool-design.md |
+| 3, "prompt", "system prompt", "behavior" | Read references/system-prompt-design.md |
+| 4, "self-modify", "evolve", "git" | Read references/self-modification.md |
+| 5, "review", "refactor", "existing" | Read references/refactoring-to-prompt-native.md |
+
+**After reading the reference, apply those patterns to the user's specific context.**
+
+
+
+Build a prompt-native agent in three steps:
+
+**Step 1: Define primitive tools**
+```typescript
+const tools = [
+ tool("read_file", "Read any file", { path: z.string() }, ...),
+ tool("write_file", "Write any file", { path: z.string(), content: z.string() }, ...),
+ tool("list_files", "List directory", { path: z.string() }, ...),
+];
+```
+
+**Step 2: Write behavior in the system prompt**
+```markdown
+## Your Responsibilities
+When asked to organize content, you should:
+1. Read existing files to understand the structure
+2. Analyze what organization makes sense
+3. Create appropriate pages using write_file
+4. Use your judgment about layout and formatting
+
+You decide the structure. Make it good.
+```
+
+**Step 3: Let the agent work**
+```typescript
+query({
+ prompt: userMessage,
+ options: {
+ systemPrompt,
+ mcpServers: { files: fileServer },
+ permissionMode: "acceptEdits",
+ }
+});
+```
+
+
+
+## Domain Knowledge
+
+All references in `references/`:
+
+**Architecture:** architecture-patterns.md
+**Tool Design:** mcp-tool-design.md
+**Prompts:** system-prompt-design.md
+**Self-Modification:** self-modification.md
+**Refactoring:** refactoring-to-prompt-native.md
+
+
+
+## What NOT to Do
+
+**THE CARDINAL SIN: Agent executes your code instead of figuring things out**
+
+This is the most common mistake. You fall back into writing workflow code and having the agent call it, instead of defining outcomes and letting the agent figure out HOW.
+
+```typescript
+// WRONG - You wrote the workflow, agent just executes it
+tool("process_feedback", async ({ message }) => {
+ const category = categorize(message); // Your code
+ const priority = calculatePriority(message); // Your code
+ await store(message, category, priority); // Your code
+ if (priority > 3) await notify(); // Your code
+});
+
+// RIGHT - Agent figures out how to process feedback
+tool("store_item", { key, value }, ...); // Primitive
+tool("send_message", { channel, content }, ...); // Primitive
+// Prompt says: "Rate importance 1-5 based on actionability, store feedback, notify if >= 4"
+```
+
+**Don't artificially limit what the agent can do**
+
+If a user could do it, the agent should be able to do it.
+
+```typescript
+// WRONG - limiting agent capabilities
+tool("read_approved_files", { path }, async ({ path }) => {
+ if (!ALLOWED_PATHS.includes(path)) throw new Error("Not allowed");
+ return readFile(path);
+});
+
+// RIGHT - give full capability, use guardrails appropriately
+tool("read_file", { path }, ...); // Agent can read anything
+// Use approval gates for writes, not artificial limits on reads
+```
+
+**Don't encode decisions in tools**
+```typescript
+// Wrong - tool decides format
+tool("format_report", { format: z.enum(["markdown", "html", "pdf"]) }, ...)
+
+// Right - agent decides format via prompt
+tool("write_file", ...) // Agent chooses what to write
+```
+
+**Don't over-specify in prompts**
+```markdown
+// Wrong - micromanaging the HOW
+When creating a summary, use exactly 3 bullet points,
+each under 20 words, formatted with em-dashes...
+
+// Right - define outcome, trust intelligence
+Create clear, useful summaries. Use your judgment.
+```
+
+
+
+You've built a prompt-native agent when:
+
+- [ ] The agent figures out HOW to achieve outcomes, not just calls your functions
+- [ ] Whatever a user could do, the agent can do (no artificial limits)
+- [ ] Features are prompts that define outcomes, not code that defines workflows
+- [ ] Tools are primitives (read, write, store, call API) that enable capability
+- [ ] Changing behavior means editing prose, not refactoring code
+- [ ] The agent can surprise you with clever approaches you didn't anticipate
+- [ ] You could add a new feature by writing a new prompt section, not new code
+
diff --git a/plugins/compounding-engineering/skills/agent-native-architecture/references/architecture-patterns.md b/plugins/compounding-engineering/skills/agent-native-architecture/references/architecture-patterns.md
new file mode 100644
index 0000000..a76a019
--- /dev/null
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/architecture-patterns.md
@@ -0,0 +1,215 @@
+
+Architectural patterns for building prompt-native agent systems. These patterns emerge from the philosophy that features should be defined in prompts, not code, and that tools should be primitives.
+
+
+
+## Event-Driven Agent Architecture
+
+The agent runs as a long-lived process that responds to events. Events become prompts.
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ Agent Loop │
+├─────────────────────────────────────────────────────────────┤
+│ Event Source → Agent (Claude) → Tool Calls → Response │
+└─────────────────────────────────────────────────────────────┘
+ │
+ ┌───────────────┼───────────────┐
+ ▼ ▼ ▼
+ ┌─────────┐ ┌──────────┐ ┌───────────┐
+ │ Content │ │ Self │ │ Data │
+ │ Tools │ │ Tools │ │ Tools │
+ └─────────┘ └──────────┘ └───────────┘
+ (write_file) (read_source) (store_item)
+ (restart) (list_items)
+```
+
+**Key characteristics:**
+- Events (messages, webhooks, timers) trigger agent turns
+- Agent decides how to respond based on system prompt
+- Tools are primitives for IO, not business logic
+- State persists between events via data tools
+
+**Example: Discord feedback bot**
+```typescript
+// Event source
+client.on("messageCreate", (message) => {
+ if (!message.author.bot) {
+ runAgent({
+ userMessage: `New message from ${message.author}: "${message.content}"`,
+ channelId: message.channelId,
+ });
+ }
+});
+
+// System prompt defines behavior
+const systemPrompt = `
+When someone shares feedback:
+1. Acknowledge their feedback warmly
+2. Ask clarifying questions if needed
+3. Store it using the feedback tools
+4. Update the feedback site
+
+Use your judgment about importance and categorization.
+`;
+```
+
+
+
+## Two-Layer Git Architecture
+
+For self-modifying agents, separate code (shared) from data (instance-specific).
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ GitHub (shared repo) │
+│ - src/ (agent code) │
+│ - site/ (web interface) │
+│ - package.json (dependencies) │
+│ - .gitignore (excludes data/, logs/) │
+└─────────────────────────────────────────────────────────────┘
+ │
+ git clone
+ │
+ ▼
+┌─────────────────────────────────────────────────────────────┐
+│ Instance (Server) │
+│ │
+│ FROM GITHUB (tracked): │
+│ - src/ → pushed back on code changes │
+│ - site/ → pushed, triggers deployment │
+│ │
+│ LOCAL ONLY (untracked): │
+│ - data/ → instance-specific storage │
+│ - logs/ → runtime logs │
+│ - .env → secrets │
+└─────────────────────────────────────────────────────────────┘
+```
+
+**Why this works:**
+- Code and site are version controlled (GitHub)
+- Raw data stays local (instance-specific)
+- Site is generated from data, so reproducible
+- Automatic rollback via git history
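+
+A sketch of the instance's `.gitignore` that enforces the split (paths assumed from the diagram above):
+
+```gitignore
+# Instance-specific state stays local, never pushed to the shared repo
+data/
+logs/
+.env
+```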
+
+
+
+## Multi-Instance Branching
+
+Each agent instance gets its own branch while sharing core code.
+
+```
+main # Shared features, bug fixes
+├── instance/feedback-bot # Every Reader feedback bot
+├── instance/support-bot # Customer support bot
+└── instance/research-bot # Research assistant
+```
+
+**Change flow:**
+| Change Type | Work On | Then |
+|-------------|---------|------|
+| Core features | main | Merge to instance branches |
+| Bug fixes | main | Merge to instance branches |
+| Instance config | instance branch | Done |
+| Instance data | instance branch | Done |
+
+**Sync tools:**
+```typescript
+tool("self_deploy", "Pull latest from main, rebuild, restart", ...)
+tool("sync_from_instance", "Merge from another instance", ...)
+tool("propose_to_main", "Create PR to share improvements", ...)
+```
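+
+One way `propose_to_main` might be implemented; the branch naming and use of the GitHub CLI (`gh pr create`) are assumptions, not part of the pattern:
+
+```typescript
+import { execSync } from "node:child_process";
+
+// Hypothetical sketch: push the instance's improvement to a new
+// branch and open a PR against main for human review.
+tool(
+  "propose_to_main",
+  "Create PR to share improvements",
+  { title: z.string(), description: z.string() },
+  async ({ title, description }) => {
+    const branch = `proposal/${Date.now()}`;
+    execSync(`git checkout -b ${branch} && git push origin ${branch}`);
+    execSync(
+      `gh pr create --base main --title ${JSON.stringify(title)} --body ${JSON.stringify(description)}`
+    );
+    return { content: [{ type: "text", text: `Opened PR from ${branch}` }] };
+  }
+);
+```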
+
+
+
+## Site as Agent Output
+
+The agent generates and maintains a website as a natural output, not through specialized site tools.
+
+```
+Discord Message
+ ↓
+Agent processes it, extracts insights
+ ↓
+Agent decides what site updates are needed
+ ↓
+Agent writes files using write_file primitive
+ ↓
+Git commit + push triggers deployment
+ ↓
+Site updates automatically
+```
+
+**Key insight:** Don't build site generation tools. Give the agent file tools and teach it in the prompt how to create good sites.
+
+```markdown
+## Site Management
+
+You maintain a public feedback site. When feedback comes in:
+1. Use write_file to update site/public/content/feedback.json
+2. If the site's React components need improvement, modify them
+3. Commit changes and push to trigger Vercel deploy
+
+The site should be:
+- Clean, modern dashboard aesthetic
+- Clear visual hierarchy
+- Status organization (Inbox, Active, Done)
+
+You decide the structure. Make it good.
+```
+
+
+
+## Approval Gates Pattern
+
+Separate "propose" from "apply" for dangerous operations.
+
+```typescript
+// Pending changes stored separately
+const pendingChanges = new Map();
+
+tool("write_file", async ({ path, content }) => {
+ if (requiresApproval(path)) {
+ // Store for approval
+ pendingChanges.set(path, content);
+ const diff = generateDiff(path, content);
+ return {
+ text: `Change requires approval.\n\n${diff}\n\nReply "yes" to apply.`
+ };
+ } else {
+ // Apply immediately
+ writeFileSync(path, content);
+ return { text: `Wrote ${path}` };
+ }
+});
+
+tool("apply_pending", async () => {
+ for (const [path, content] of pendingChanges) {
+ writeFileSync(path, content);
+ }
+ pendingChanges.clear();
+ return { text: "Applied all pending changes" };
+});
+```
+
+**What requires approval:**
+- src/*.ts (agent code)
+- package.json (dependencies)
+- system prompt changes
+
+**What doesn't:**
+- data/* (instance data)
+- site/* (generated content)
+- docs/* (documentation)
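+
+One possible implementation of the `requiresApproval` helper used above, encoding these rules (the path conventions, including the prompt file name, are assumptions):
+
+```typescript
+// Sketch: code, dependency, and prompt changes need approval;
+// instance data and generated content apply immediately.
+function requiresApproval(path: string): boolean {
+  if (path.startsWith("src/") && path.endsWith(".ts")) return true;
+  if (path === "package.json") return true;
+  if (path.includes("system-prompt")) return true; // assumed prompt file naming
+  return false; // data/, site/, docs/ are safe to write directly
+}
+```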
+
+
+
+## Questions to Ask When Designing
+
+1. **What events trigger agent turns?** (messages, webhooks, timers, user requests)
+2. **What primitives does the agent need?** (read, write, call API, restart)
+3. **What decisions should the agent make?** (format, structure, priority, action)
+4. **What decisions should be hardcoded?** (security boundaries, approval requirements)
+5. **How does the agent verify its work?** (health checks, build verification)
+6. **How does the agent recover from mistakes?** (git rollback, approval gates)
+
diff --git a/plugins/compounding-engineering/skills/agent-native-architecture/references/mcp-tool-design.md b/plugins/compounding-engineering/skills/agent-native-architecture/references/mcp-tool-design.md
new file mode 100644
index 0000000..f7133da
--- /dev/null
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/mcp-tool-design.md
@@ -0,0 +1,316 @@
+
+How to design MCP tools following prompt-native principles. Tools should be primitives that enable capability, not workflows that encode decisions.
+
+**Core principle:** Whatever a user can do, the agent should be able to do. Don't artificially limit the agent—give it the same primitives a power user would have.
+
+
+
+## Tools Are Primitives, Not Workflows
+
+**Wrong approach:** Tools that encode business logic
+```typescript
+tool("process_feedback", {
+ feedback: z.string(),
+ category: z.enum(["bug", "feature", "question"]),
+ priority: z.enum(["low", "medium", "high"]),
+}, async ({ feedback, category, priority }) => {
+ // Tool decides how to process
+ const processed = categorize(feedback);
+ const stored = await saveToDatabase(processed);
+ const notification = await notify(priority);
+ return { processed, stored, notification };
+});
+```
+
+**Right approach:** Primitives that enable any workflow
+```typescript
+tool("store_item", {
+ key: z.string(),
+ value: z.any(),
+}, async ({ key, value }) => {
+ await db.set(key, value);
+ return { text: `Stored ${key}` };
+});
+
+tool("send_message", {
+ channel: z.string(),
+ content: z.string(),
+}, async ({ channel, content }) => {
+ await messenger.send(channel, content);
+ return { text: "Sent" };
+});
+```
+
+The agent decides categorization, priority, and when to notify based on the system prompt.
+
+
+
+## Tools Should Have Descriptive, Primitive Names
+
+Names should describe the capability, not the use case:
+
+| Wrong | Right |
+|-------|-------|
+| `process_user_feedback` | `store_item` |
+| `create_feedback_summary` | `write_file` |
+| `send_notification` | `send_message` |
+| `deploy_to_production` | `git_push` |
+
+The prompt tells the agent *when* to use primitives. The tool just provides *capability*.
+
+
+
+## Inputs Should Be Simple
+
+Tools accept data. They don't accept decisions.
+
+**Wrong:** Tool accepts decisions
+```typescript
+tool("format_content", {
+ content: z.string(),
+ format: z.enum(["markdown", "html", "json"]),
+ style: z.enum(["formal", "casual", "technical"]),
+}, ...)
+```
+
+**Right:** Tool accepts data, agent decides format
+```typescript
+tool("write_file", {
+ path: z.string(),
+ content: z.string(),
+}, ...)
+// Agent decides to write index.html with HTML content, or data.json with JSON
+```
+
+
+
+## Outputs Should Be Rich
+
+Return enough information for the agent to verify and iterate.
+
+**Wrong:** Minimal output
+```typescript
+async ({ key }) => {
+ await db.delete(key);
+ return { text: "Deleted" };
+}
+```
+
+**Right:** Rich output
+```typescript
+async ({ key }) => {
+ const existed = await db.has(key);
+ if (!existed) {
+ return { text: `Key ${key} did not exist` };
+ }
+ await db.delete(key);
+ return { text: `Deleted ${key}. ${await db.count()} items remaining.` };
+}
+```
+
+
+
+## Tool Design Template
+
+```typescript
+import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
+import { z } from "zod";
+
+export const serverName = createSdkMcpServer({
+ name: "server-name",
+ version: "1.0.0",
+ tools: [
+ // READ operations
+ tool(
+ "read_item",
+ "Read an item by key",
+ { key: z.string().describe("Item key") },
+ async ({ key }) => {
+ const item = await storage.get(key);
+ return {
+ content: [{
+ type: "text",
+ text: item ? JSON.stringify(item, null, 2) : `Not found: ${key}`,
+ }],
+ isError: !item,
+ };
+ }
+ ),
+
+ tool(
+ "list_items",
+ "List all items, optionally filtered",
+ {
+ prefix: z.string().optional().describe("Filter by key prefix"),
+ limit: z.number().default(100).describe("Max items"),
+ },
+ async ({ prefix, limit }) => {
+ const items = await storage.list({ prefix, limit });
+ return {
+ content: [{
+ type: "text",
+ text: `Found ${items.length} items:\n${items.map(i => i.key).join("\n")}`,
+ }],
+ };
+ }
+ ),
+
+ // WRITE operations
+ tool(
+ "store_item",
+ "Store an item",
+ {
+ key: z.string().describe("Item key"),
+ value: z.any().describe("Item data"),
+ },
+ async ({ key, value }) => {
+ await storage.set(key, value);
+ return {
+ content: [{ type: "text", text: `Stored ${key}` }],
+ };
+ }
+ ),
+
+ tool(
+ "delete_item",
+ "Delete an item",
+ { key: z.string().describe("Item key") },
+ async ({ key }) => {
+ const existed = await storage.delete(key);
+ return {
+ content: [{
+ type: "text",
+ text: existed ? `Deleted ${key}` : `${key} did not exist`,
+ }],
+ };
+ }
+ ),
+
+ // EXTERNAL operations
+ tool(
+ "call_api",
+ "Make an HTTP request",
+ {
+ url: z.string().url(),
+ method: z.enum(["GET", "POST", "PUT", "DELETE"]).default("GET"),
+ body: z.any().optional(),
+ },
+    async ({ url, method, body }) => {
+      const response = await fetch(url, {
+        method,
+        headers: body ? { "content-type": "application/json" } : undefined,
+        body: body ? JSON.stringify(body) : undefined,
+      });
+ const text = await response.text();
+ return {
+ content: [{
+ type: "text",
+ text: `${response.status} ${response.statusText}\n\n${text}`,
+ }],
+ isError: !response.ok,
+ };
+ }
+ ),
+ ],
+});
+```
+
+
+
+## Example: Feedback Storage Server
+
+This server provides primitives for storing feedback. It does NOT decide how to categorize or organize feedback—that's the agent's job via the prompt.
+
+```typescript
+export const feedbackMcpServer = createSdkMcpServer({
+ name: "feedback",
+ version: "1.0.0",
+ tools: [
+ tool(
+ "store_feedback",
+ "Store a feedback item",
+ {
+ item: z.object({
+ id: z.string(),
+ author: z.string(),
+ content: z.string(),
+ importance: z.number().min(1).max(5),
+ timestamp: z.string(),
+ status: z.string().optional(),
+ urls: z.array(z.string()).optional(),
+ metadata: z.any().optional(),
+ }).describe("Feedback item"),
+ },
+ async ({ item }) => {
+ await db.feedback.insert(item);
+ return {
+ content: [{
+ type: "text",
+ text: `Stored feedback ${item.id} from ${item.author}`,
+ }],
+ };
+ }
+ ),
+
+ tool(
+ "list_feedback",
+ "List feedback items",
+ {
+ limit: z.number().default(50),
+ status: z.string().optional(),
+ },
+ async ({ limit, status }) => {
+ const items = await db.feedback.list({ limit, status });
+ return {
+ content: [{
+ type: "text",
+ text: JSON.stringify(items, null, 2),
+ }],
+ };
+ }
+ ),
+
+ tool(
+ "update_feedback",
+ "Update a feedback item",
+ {
+ id: z.string(),
+ updates: z.object({
+ status: z.string().optional(),
+ importance: z.number().optional(),
+ metadata: z.any().optional(),
+ }),
+ },
+ async ({ id, updates }) => {
+ await db.feedback.update(id, updates);
+ return {
+ content: [{ type: "text", text: `Updated ${id}` }],
+ };
+ }
+ ),
+ ],
+});
+```
+
+The system prompt then tells the agent *how* to use these primitives:
+
+```markdown
+## Feedback Processing
+
+When someone shares feedback:
+1. Extract author, content, and any URLs
+2. Rate importance 1-5 based on actionability
+3. Store using feedback.store_feedback
+4. If high importance (4-5), notify the channel
+
+Use your judgment about importance ratings.
+```
+
+
+
+## MCP Tool Design Checklist
+
+- [ ] Tool names describe capability, not use case
+- [ ] Inputs are data, not decisions
+- [ ] Outputs are rich (enough for agent to verify)
+- [ ] CRUD operations are separate tools (not one mega-tool)
+- [ ] No business logic in tool implementations
+- [ ] Error states clearly communicated via `isError`
+- [ ] Descriptions explain what the tool does, not when to use it
+
diff --git a/plugins/compounding-engineering/skills/agent-native-architecture/references/refactoring-to-prompt-native.md b/plugins/compounding-engineering/skills/agent-native-architecture/references/refactoring-to-prompt-native.md
new file mode 100644
index 0000000..03e94ef
--- /dev/null
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/refactoring-to-prompt-native.md
@@ -0,0 +1,317 @@
+
+How to refactor existing agent code to follow prompt-native principles. The goal: move behavior from code into prompts, and simplify tools into primitives.
+
+
+
+## Diagnosing Non-Prompt-Native Code
+
+Signs your agent isn't prompt-native:
+
+**Tools that encode workflows:**
+```typescript
+// RED FLAG: Tool contains business logic
+tool("process_feedback", async ({ message }) => {
+ const category = categorize(message); // Logic in code
+ const priority = calculatePriority(message); // Logic in code
+ await store(message, category, priority); // Orchestration in code
+ if (priority > 3) await notify(); // Decision in code
+});
+```
+
+**Agent calls functions instead of figuring things out:**
+```typescript
+// RED FLAG: Agent is just a function caller
+"Use process_feedback to handle incoming messages"
+// vs.
+"When feedback comes in, decide importance, store it, notify if high"
+```
+
+**Artificial limits on agent capability:**
+```typescript
+// RED FLAG: Tool prevents agent from doing what users can do
+tool("read_file", async ({ path }) => {
+ if (!ALLOWED_PATHS.includes(path)) {
+ throw new Error("Not allowed to read this file");
+ }
+ return readFile(path);
+});
+```
+
+**Prompts that specify HOW instead of WHAT:**
+```markdown
+// RED FLAG: Micromanaging the agent
+When creating a summary:
+1. Use exactly 3 bullet points
+2. Each bullet must be under 20 words
+3. Format with em-dashes for sub-points
+4. Bold the first word of each bullet
+```
+
+
+
+## Step-by-Step Refactoring
+
+**Step 1: Identify workflow tools**
+
+List all your tools. Mark any that:
+- Have business logic (categorize, calculate, decide)
+- Orchestrate multiple operations
+- Make decisions on behalf of the agent
+- Contain conditional logic (if/else based on content)
+
+**Step 2: Extract the primitives**
+
+For each workflow tool, identify the underlying primitives:
+
+| Workflow Tool | Hidden Primitives |
+|---------------|-------------------|
+| `process_feedback` | `store_item`, `send_message` |
+| `generate_report` | `read_file`, `write_file` |
+| `deploy_and_notify` | `git_push`, `send_message` |
+
+**Step 3: Move behavior to the prompt**
+
+Take the logic from your workflow tools and express it in natural language:
+
+```typescript
+// Before (in code):
+async function processFeedback(message) {
+ const priority = message.includes("crash") ? 5 :
+ message.includes("bug") ? 4 : 3;
+ await store(message, priority);
+ if (priority >= 4) await notify();
+}
+```
+
+```markdown
+// After (in prompt):
+## Feedback Processing
+
+When someone shares feedback:
+1. Rate importance 1-5:
+ - 5: Crashes, data loss, security issues
+ - 4: Bug reports with clear reproduction steps
+ - 3: General suggestions, minor issues
+2. Store using store_item
+3. If importance >= 4, notify the team
+
+Use your judgment. Context matters more than keywords.
+```
+
+**Step 4: Simplify tools to primitives**
+
+```typescript
+// Before: 1 workflow tool
+tool("process_feedback", { message, category, priority }, ...complex logic...)
+
+// After: 2 primitive tools
+tool("store_item", { key: z.string(), value: z.any() }, ...simple storage...)
+tool("send_message", { channel: z.string(), content: z.string() }, ...simple send...)
+```
+
+**Step 5: Remove artificial limits**
+
+```typescript
+// Before: Limited capability
+tool("read_file", async ({ path }) => {
+ if (!isAllowed(path)) throw new Error("Forbidden");
+ return readFile(path);
+});
+
+// After: Full capability
+tool("read_file", async ({ path }) => {
+ return readFile(path); // Agent can read anything
+});
+// Use approval gates for WRITES, not artificial limits on READS
+```
+
+**Step 6: Test with outcomes, not procedures**
+
+Instead of testing "does it call the right function?", test "does it achieve the outcome?"
+
+```typescript
+// Before: Testing procedure
+expect(mockProcessFeedback).toHaveBeenCalledWith(...)
+
+// After: Testing outcome
+// Send feedback → Check it was stored with reasonable importance
+// Send high-priority feedback → Check notification was sent
+```
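+
+A sketch of what an outcome-style test might look like; `runAgent` and `db` are assumed stand-ins for your own harness and storage:
+
+```typescript
+// Outcome test: we don't assert which tools were called, only that
+// the resulting state is reasonable.
+test("feedback gets stored with a sane importance", async () => {
+  await runAgent({ userMessage: 'New message from alice: "App crashes on login"' });
+
+  const items = await db.feedback.list({ limit: 10 });
+  const stored = items.find((i) => i.content.includes("crashes on login"));
+  expect(stored).toBeDefined();
+  expect(stored!.importance).toBeGreaterThanOrEqual(4); // a crash should rate high
+});
+```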
+
+
+
+## Before/After Examples
+
+**Example 1: Feedback Processing**
+
+Before:
+```typescript
+tool("handle_feedback", async ({ message, author }) => {
+ const category = detectCategory(message);
+ const priority = calculatePriority(message, category);
+ const feedbackId = await db.feedback.insert({
+ id: generateId(),
+ author,
+ message,
+ category,
+ priority,
+ timestamp: new Date().toISOString(),
+ });
+
+ if (priority >= 4) {
+ await discord.send(ALERT_CHANNEL, `High priority feedback from ${author}`);
+ }
+
+ return { feedbackId, category, priority };
+});
+```
+
+After:
+```typescript
+// Simple storage primitive
+tool("store_feedback", async ({ item }) => {
+ await db.feedback.insert(item);
+ return { text: `Stored feedback ${item.id}` };
+});
+
+// Simple message primitive
+tool("send_message", async ({ channel, content }) => {
+ await discord.send(channel, content);
+ return { text: "Sent" };
+});
+```
+
+System prompt:
+```markdown
+## Feedback Processing
+
+When someone shares feedback:
+1. Generate a unique ID
+2. Rate importance 1-5 based on impact and urgency
+3. Store using store_feedback with the full item
+4. If importance >= 4, send a notification to the team channel
+
+Importance guidelines:
+- 5: Critical (crashes, data loss, security)
+- 4: High (detailed bug reports, blocking issues)
+- 3: Medium (suggestions, minor bugs)
+- 2: Low (cosmetic, edge cases)
+- 1: Minimal (off-topic, duplicates)
+```
+
+**Example 2: Report Generation**
+
+Before:
+```typescript
+tool("generate_weekly_report", async ({ startDate, endDate, format }) => {
+ const data = await fetchMetrics(startDate, endDate);
+ const summary = summarizeMetrics(data);
+ const charts = generateCharts(data);
+
+ if (format === "html") {
+ return renderHtmlReport(summary, charts);
+ } else if (format === "markdown") {
+ return renderMarkdownReport(summary, charts);
+ } else {
+ return renderPdfReport(summary, charts);
+ }
+});
+```
+
+After:
+```typescript
+tool("query_metrics", async ({ start, end }) => {
+ const data = await db.metrics.query({ start, end });
+ return { text: JSON.stringify(data, null, 2) };
+});
+
+tool("write_file", async ({ path, content }) => {
+ writeFileSync(path, content);
+ return { text: `Wrote ${path}` };
+});
+```
+
+System prompt:
+```markdown
+## Report Generation
+
+When asked to generate a report:
+1. Query the relevant metrics using query_metrics
+2. Analyze the data and identify key trends
+3. Create a clear, well-formatted report
+4. Write it using write_file in the appropriate format
+
+Use your judgment about format and structure. Make it useful.
+```
+
+
+
+## Common Refactoring Challenges
+
+**"But the agent might make mistakes!"**
+
+Yes, and you can iterate. Change the prompt to add guidance:
+```markdown
+// Before
+Rate importance 1-5.
+
+// After (if the agent keeps rating too high)
+Rate importance 1-5. Be conservative—most feedback is 2-3.
+Only use 4-5 for truly blocking or critical issues.
+```
+
+**"The workflow is complex!"**
+
+Complex workflows can still be expressed in prompts. The agent is smart.
+```markdown
+When processing video feedback:
+1. Check if it's a Loom, YouTube, or direct link
+2. For YouTube, pass URL directly to video analysis
+3. For others, download first, then analyze
+4. Extract timestamped issues
+5. Rate based on issue density and severity
+```
+
+**"We need deterministic behavior!"**
+
+Some operations should stay in code. That's fine. Prompt-native isn't all-or-nothing.
+
+Keep in code:
+- Security validation
+- Rate limiting
+- Audit logging
+- Exact format requirements
+
+Move to prompts:
+- Categorization decisions
+- Priority judgments
+- Content generation
+- Workflow orchestration
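+
+For example, rate limiting can stay in code as a thin wrapper around a primitive, while the agent still decides what to send and when (the limit and window here are hypothetical):
+
+```typescript
+// Deterministic guardrail lives in code; behavioral decisions stay in the prompt.
+let sentThisMinute = 0;
+setInterval(() => { sentThisMinute = 0; }, 60_000);
+
+tool("send_message", { channel: z.string(), content: z.string() },
+  async ({ channel, content }) => {
+    if (sentThisMinute >= 10) {
+      return { text: "Rate limit reached, try again in a minute", isError: true };
+    }
+    sentThisMinute++;
+    await messenger.send(channel, content);
+    return { text: "Sent" };
+  });
+```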
+
+**"What about testing?"**
+
+Test outcomes, not procedures:
+- "Given this input, does the agent achieve the right result?"
+- "Does stored feedback have reasonable importance ratings?"
+- "Are notifications sent for truly high-priority items?"
+
+
+
+## Refactoring Checklist
+
+Diagnosis:
+- [ ] Listed all tools with business logic
+- [ ] Identified artificial limits on agent capability
+- [ ] Found prompts that micromanage HOW
+
+Refactoring:
+- [ ] Extracted primitives from workflow tools
+- [ ] Moved business logic to system prompt
+- [ ] Removed artificial limits
+- [ ] Simplified tool inputs to data, not decisions
+
+Validation:
+- [ ] Agent achieves same outcomes with primitives
+- [ ] Behavior can be changed by editing prompts
+- [ ] New features could be added without new tools
+
diff --git a/plugins/compounding-engineering/skills/agent-native-architecture/references/self-modification.md b/plugins/compounding-engineering/skills/agent-native-architecture/references/self-modification.md
new file mode 100644
index 0000000..7bad83a
--- /dev/null
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/self-modification.md
@@ -0,0 +1,269 @@
+
+Self-modification is the advanced tier of agent-native engineering: agents that can evolve their own code, prompts, and behavior. Not required for every app, but a big part of the future.
+
+This is the logical extension of "whatever the developer can do, the agent can do."
+
+
+
+## Why Self-Modification?
+
+Traditional software is static—it does what you wrote, nothing more. Self-modifying agents can:
+
+- **Fix their own bugs** - See an error, patch the code, restart
+- **Add new capabilities** - User asks for something new, agent implements it
+- **Evolve behavior** - Learn from feedback and adjust prompts
+- **Deploy themselves** - Push code, trigger builds, restart
+
+The agent becomes a living system that improves over time, not frozen code.
+
+
+
+## What Self-Modification Enables
+
+**Code modification:**
+- Read and understand source files
+- Write fixes and new features
+- Commit and push to version control
+- Trigger builds and verify they pass
+
+**Prompt evolution:**
+- Edit the system prompt based on feedback
+- Add new features as prompt sections
+- Refine judgment criteria that aren't working
+
+**Infrastructure control:**
+- Pull latest code from upstream
+- Merge from other branches/instances
+- Restart after changes
+- Roll back if something breaks
+
+**Site/output generation:**
+- Generate and maintain websites
+- Create documentation
+- Build dashboards from data
+
+
+
+## Required Guardrails
+
+Self-modification is powerful. It needs safety mechanisms.
+
+**Approval gates for code changes:**
+```typescript
+import { writeFileSync } from "node:fs";
+
+tool("write_file", async ({ path, content }) => {
+ if (isCodeFile(path)) {
+ // Store for approval, don't apply immediately
+ pendingChanges.set(path, content);
+ const diff = generateDiff(path, content);
+ return { text: `Requires approval:\n\n${diff}\n\nReply "yes" to apply.` };
+ }
+ // Non-code files apply immediately
+ writeFileSync(path, content);
+ return { text: `Wrote ${path}` };
+});
+```
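The `pendingChanges` store that gate relies on can be plain state with an explicit apply step, which is what the `apply_pending` / `get_pending` / `clear_pending` tools in the toolset below expose. A self-contained sketch; the class and method names are assumptions:

```typescript
// Sketch: proposed code changes held until a human approves them.
class PendingChanges {
  private pending = new Map<string, string>();

  // Record a proposed write without touching the filesystem.
  propose(path: string, content: string): void {
    this.pending.set(path, content);
  }

  // Paths currently awaiting approval.
  list(): string[] {
    return [...this.pending.keys()];
  }

  // Apply everything via an injected writer (e.g. fs.writeFileSync in
  // production); returns how many files were written, then clears the queue.
  apply(write: (path: string, content: string) => void): number {
    let applied = 0;
    for (const [path, content] of this.pending) {
      write(path, content);
      applied++;
    }
    this.pending.clear();
    return applied;
  }

  // Discard everything without applying.
  clear(): void {
    this.pending.clear();
  }
}
```

Injecting the writer keeps the approval logic testable without a real filesystem.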
+
+**Auto-commit before changes:**
+```typescript
+tool("self_deploy", async () => {
+ // Save current state first
+ runGit("stash"); // or commit uncommitted changes
+
+ // Then pull/merge
+ runGit("fetch origin");
+ runGit("merge origin/main --no-edit");
+
+ // Build and verify
+ runCommand("npm run build");
+
+  // Only then restart
+  scheduleRestart();
+  return { text: "Merged origin/main, build passed, restarting" };
+});
+```
+
+**Build verification:**
+```typescript
+// Don't restart unless build passes
+try {
+ runCommand("npm run build", { timeout: 120000 });
+} catch (error) {
+ // Rollback the merge
+ runGit("merge --abort");
+ return { text: "Build failed, aborting deploy", isError: true };
+}
+```
+
+**Health checks after restart:**
+```typescript
+tool("health_check", async () => {
+ const uptime = process.uptime();
+ const buildValid = existsSync("dist/index.js");
+ const gitClean = !runGit("status --porcelain");
+
+ return {
+ text: JSON.stringify({
+ status: "healthy",
+ uptime: `${Math.floor(uptime / 60)}m`,
+ build: buildValid ? "valid" : "missing",
+ git: gitClean ? "clean" : "uncommitted changes",
+ }, null, 2),
+ };
+});
+```
+
+
+
+## Git-Based Self-Modification
+
+Use git as the foundation for self-modification. It provides:
+- Version history (rollback capability)
+- Branching (experiment safely)
+- Merge (sync with other instances)
+- Push/pull (deploy and collaborate)
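The `runGit` calls used throughout these sketches are assumed helpers, not SDK functions. One possible shape, built on Node's `execSync` (the timeout and error handling are illustrative):

```typescript
import { execSync } from "node:child_process";

// Sketch: run a shell command and return trimmed stdout; throws on non-zero exit.
function run(command: string): string {
  return execSync(command, { encoding: "utf8", timeout: 120_000 }).trim();
}

// All the git sketches in this document reduce to this thin wrapper.
function runGit(args: string): string {
  return run(`git ${args}`);
}
```

With this shape, `runGit("status --porcelain")` returns an empty string for a clean tree, which is why `!runGit(...)` reads as a cleanliness check in the health-check sketch above.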
+
+**Essential git tools:**
+```typescript
+tool("status", "Show git status", {}, ...);
+tool("diff", "Show file changes", { path: z.string().optional() }, ...);
+tool("log", "Show commit history", { count: z.number() }, ...);
+tool("commit_code", "Commit code changes", { message: z.string() }, ...);
+tool("git_push", "Push to GitHub", { branch: z.string().optional() }, ...);
+tool("pull", "Pull from GitHub", { source: z.enum(["main", "instance"]) }, ...);
+tool("rollback", "Revert recent commits", { commits: z.number() }, ...);
+```
+
+**Multi-instance architecture:**
+```
+main # Shared code
+├── instance/bot-a # Instance A's branch
+├── instance/bot-b # Instance B's branch
+└── instance/bot-c # Instance C's branch
+```
+
+Each instance can:
+- Pull updates from main
+- Push improvements back to main (via PR)
+- Sync features from other instances
+- Maintain instance-specific config
+
+
+
+## Self-Modifying Prompts
+
+The system prompt is a file the agent can read and write.
+
+```typescript
+// Agent can read its own prompt
+tool("read_file", ...); // Can read src/prompts/system.md
+
+// Agent can propose changes
+tool("write_file", ...); // Can write to src/prompts/system.md (with approval)
+```
+
+**System prompt as living document:**
+```markdown
+## Feedback Processing
+
+When someone shares feedback:
+1. Acknowledge warmly
+2. Rate importance 1-5
+3. Store using feedback tools
+
+
+```
+
+The agent can:
+- Add notes to itself
+- Refine judgment criteria
+- Add new feature sections
+- Document edge cases it learned
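The "add notes to itself" step can be pure string manipulation over the prompt text, performed before any file write or approval gate. A sketch under the assumption that learned notes accumulate under a dedicated trailing section; the heading name is invented for illustration:

```typescript
// Sketch: append a learned note under a "## Learned Notes" section,
// creating the section at the end of the prompt if it doesn't exist yet.
// Assumes the notes section, once created, stays last in the file.
const NOTES_HEADING = "## Learned Notes";

function addLearnedNote(prompt: string, note: string): string {
  const entry = `- ${note}`;
  if (prompt.includes(NOTES_HEADING)) {
    // Section exists at the end: append the new bullet to it.
    return `${prompt.trimEnd()}\n${entry}\n`;
  }
  // No section yet: create it, then add the bullet.
  return `${prompt.trimEnd()}\n\n${NOTES_HEADING}\n\n${entry}\n`;
}
```

The agent would call `read_file`, transform the text like this, and propose the result through the approval-gated `write_file`.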
+
+
+
+## When to Implement Self-Modification
+
+**Good candidates:**
+- Long-running autonomous agents
+- Agents that need to adapt to feedback
+- Systems where behavior evolution is valuable
+- Internal tools where rapid iteration matters
+
+**Not necessary for:**
+- Simple single-task agents
+- Highly regulated environments
+- Systems where behavior must be auditable
+- One-off or short-lived agents
+
+Start with a non-self-modifying prompt-native agent. Add self-modification when you need it.
+
+
+
+## Complete Self-Modification Toolset
+
+```typescript
+const selfMcpServer = createSdkMcpServer({
+ name: "self",
+ version: "1.0.0",
+ tools: [
+ // FILE OPERATIONS
+ tool("read_file", "Read any project file", { path: z.string() }, ...),
+    tool("write_file", "Write a file (code requires approval)", { path: z.string(), content: z.string() }, ...),
+ tool("list_files", "List directory contents", { path: z.string() }, ...),
+ tool("search_code", "Search for patterns", { pattern: z.string() }, ...),
+
+ // APPROVAL WORKFLOW
+ tool("apply_pending", "Apply approved changes", {}, ...),
+ tool("get_pending", "Show pending changes", {}, ...),
+ tool("clear_pending", "Discard pending changes", {}, ...),
+
+ // RESTART
+ tool("restart", "Rebuild and restart", {}, ...),
+ tool("health_check", "Check if bot is healthy", {}, ...),
+ ],
+});
+
+const gitMcpServer = createSdkMcpServer({
+ name: "git",
+ version: "1.0.0",
+ tools: [
+ // STATUS
+ tool("status", "Show git status", {}, ...),
+ tool("diff", "Show changes", { path: z.string().optional() }, ...),
+ tool("log", "Show history", { count: z.number() }, ...),
+
+ // COMMIT & PUSH
+ tool("commit_code", "Commit code changes", { message: z.string() }, ...),
+ tool("git_push", "Push to GitHub", { branch: z.string().optional() }, ...),
+
+ // SYNC
+ tool("pull", "Pull from upstream", { source: z.enum(["main", "instance"]) }, ...),
+ tool("self_deploy", "Pull, build, restart", { source: z.enum(["main", "instance"]) }, ...),
+
+ // SAFETY
+ tool("rollback", "Revert commits", { commits: z.number() }, ...),
+ tool("health_check", "Detailed health report", {}, ...),
+ ],
+});
+```
+
+
+
+## Self-Modification Checklist
+
+Before enabling self-modification:
+- [ ] Git-based version control set up
+- [ ] Approval gates for code changes
+- [ ] Build verification before restart
+- [ ] Rollback mechanism available
+- [ ] Health check endpoint
+- [ ] Instance identity configured
+
+When implementing:
+- [ ] Agent can read all project files
+- [ ] Agent can write files (with appropriate approval)
+- [ ] Agent can commit and push
+- [ ] Agent can pull updates
+- [ ] Agent can restart itself
+- [ ] Agent can roll back if needed
+
diff --git a/plugins/compounding-engineering/skills/agent-native-architecture/references/system-prompt-design.md b/plugins/compounding-engineering/skills/agent-native-architecture/references/system-prompt-design.md
new file mode 100644
index 0000000..377f45f
--- /dev/null
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/system-prompt-design.md
@@ -0,0 +1,250 @@
+
+How to write system prompts for prompt-native agents. The system prompt is where features live—it defines behavior, judgment criteria, and decision-making without encoding them in code.
+
+
+
+## Features Are Prompt Sections
+
+Each feature is a section of the system prompt that tells the agent how to behave.
+
+**Traditional approach:** Feature = function in codebase
+```typescript
+async function processFeedback(message) {
+ const category = categorize(message);
+ const priority = calculatePriority(message);
+ await store(message, category, priority);
+ if (priority > 3) await notify();
+}
+```
+
+**Prompt-native approach:** Feature = section in system prompt
+```markdown
+## Feedback Processing
+
+When someone shares feedback:
+1. Read the message to understand what they're saying
+2. Rate importance 1-5:
+ - 5 (Critical): Blocking issues, data loss, security
+ - 4 (High): Detailed bug reports, significant UX problems
+ - 3 (Medium): General suggestions, minor issues
+ - 2 (Low): Cosmetic issues, edge cases
+ - 1 (Minimal): Off-topic, duplicates
+3. Store using feedback.store_feedback
+4. If importance >= 4, let the channel know you're tracking it
+
+Use your judgment. Context matters.
+```
+
+
+
+## System Prompt Structure
+
+A well-structured prompt-native system prompt:
+
+```markdown
+# Identity
+
+You are [Name], [brief identity statement].
+
+## Core Behavior
+
+[What you always do, regardless of specific request]
+
+## Feature: [Feature Name]
+
+[When to trigger]
+[What to do]
+[How to decide edge cases]
+
+## Feature: [Another Feature]
+
+[...]
+
+## Tool Usage
+
+[Guidance on when/how to use available tools]
+
+## Tone and Style
+
+[Communication guidelines]
+
+## What NOT to Do
+
+[Explicit boundaries]
+```
+
+
+
+## Guide, Don't Micromanage
+
+Tell the agent what to achieve, not exactly how to do it.
+
+**Micromanaging (bad):**
+```markdown
+When creating a summary:
+1. Use exactly 3 bullet points
+2. Each bullet under 20 words
+3. Use em-dashes for sub-points
+4. Bold the first word of each bullet
+5. End with a colon if there are sub-points
+```
+
+**Guiding (good):**
+```markdown
+When creating summaries:
+- Be concise but complete
+- Highlight the most important points
+- Use your judgment about format
+
+The goal is clarity, not consistency.
+```
+
+Trust the agent's intelligence. It knows how to communicate.
+
+
+
+## Define Judgment Criteria, Not Rules
+
+Instead of rules, provide criteria for making decisions.
+
+**Rules (rigid):**
+```markdown
+If the message contains "bug", set importance to 4.
+If the message contains "crash", set importance to 5.
+```
+
+**Judgment criteria (flexible):**
+```markdown
+## Importance Rating
+
+Rate importance based on:
+- **Impact**: How many users affected? How severe?
+- **Urgency**: Is this blocking? Time-sensitive?
+- **Actionability**: Can we actually fix this?
+- **Evidence**: Video/screenshots vs vague description
+
+Examples:
+- "App crashes when I tap submit" → 4-5 (critical, reproducible)
+- "The button color seems off" → 2 (cosmetic, non-blocking)
+- "Video walkthrough with 15 timestamped issues" → 5 (high-quality evidence)
+```
+
+
+
+## Work With Context Windows
+
+The agent sees: system prompt + recent messages + tool results. Design for this.
+
+**Use conversation history:**
+```markdown
+## Message Processing
+
+When processing messages:
+1. Check if this relates to recent conversation
+2. If someone is continuing a previous thread, maintain context
+3. Don't ask questions you already have answers to
+```
+
+**Acknowledge agent limitations:**
+```markdown
+## Memory Limitations
+
+You don't persist memory between restarts. Use the memory server:
+- Before responding, check memory.recall for relevant context
+- After important decisions, use memory.store to remember
+- Store conversation threads, not individual messages
+```
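The `memory.recall` / `memory.store` calls above assume a persistence tool behind an MCP server. A minimal in-memory sketch of that contract, plus the `processed_{messageId}` dedup pattern the example prompt later relies on (a real implementation would persist to disk or a database):

```typescript
// Sketch: the store/recall contract the prompt leans on.
class Memory {
  private entries = new Map<string, string>();

  store(key: string, value: string): void {
    this.entries.set(key, value);
  }

  recall(key: string): string | undefined {
    return this.entries.get(key);
  }
}

// The dedup pattern from the example prompt: process a message only once.
function shouldProcess(memory: Memory, messageId: string): boolean {
  const key = `processed_${messageId}`;
  if (memory.recall(key) !== undefined) return false;
  memory.store(key, new Date().toISOString());
  return true;
}
```

The prompt describes the recall-then-store discipline; code like this only provides the capability.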
+
+
+
+## Example: Complete System Prompt
+
+```markdown
+# R2-C2 Feedback Bot
+
+You are R2-C2, Every's feedback collection assistant. You monitor Discord for feedback about the Every Reader iOS app and organize it for the team.
+
+## Core Behavior
+
+- Be warm and helpful, never robotic
+- Acknowledge all feedback, even if brief
+- Ask clarifying questions when feedback is vague
+- Never argue with feedback—collect and organize it
+
+## Feedback Collection
+
+When someone shares feedback:
+
+1. **Acknowledge** warmly: "Thanks for this!" or "Good catch!"
+2. **Clarify** if needed: "Can you tell me more about when this happens?"
+3. **Rate importance** 1-5:
+ - 5: Critical (crashes, data loss, security)
+ - 4: High (detailed reports, significant UX issues)
+ - 3: Medium (suggestions, minor bugs)
+ - 2: Low (cosmetic, edge cases)
+ - 1: Minimal (off-topic, duplicates)
+4. **Store** using feedback.store_feedback
+5. **Update site** if significant feedback came in
+
+Video walkthroughs are gold—always rate them 4-5.
+
+## Site Management
+
+You maintain a public feedback site. When feedback accumulates:
+
+1. Sync data to site/public/content/feedback.json
+2. Update status counts and organization
+3. Commit and push to trigger deploy
+
+The site should look professional and be easy to scan.
+
+## Message Deduplication
+
+Before processing any message:
+1. Check memory.recall(key: "processed_{messageId}")
+2. Skip if already processed
+3. After processing, store the key
+
+## Tone
+
+- Casual and friendly
+- Brief but warm
+- Technical when discussing bugs
+- Never defensive
+
+## Don't
+
+- Don't promise fixes or timelines
+- Don't share internal discussions
+- Don't ignore feedback even if it seems minor
+- Don't repeat yourself—vary acknowledgments
+```
+
+
+
+## Iterating on System Prompts
+
+Prompt-native development means rapid iteration:
+
+1. **Observe** agent behavior in production
+2. **Identify** gaps: "It's not rating video feedback high enough"
+3. **Add guidance**: "Video walkthroughs are gold—always rate them 4-5"
+4. **Deploy** (just edit the prompt file)
+5. **Repeat**
+
+No code changes. No recompilation. Just prose.
+
+
+
+## System Prompt Checklist
+
+- [ ] Clear identity statement
+- [ ] Core behaviors that always apply
+- [ ] Features as separate sections
+- [ ] Judgment criteria instead of rigid rules
+- [ ] Examples for ambiguous cases
+- [ ] Explicit boundaries (what NOT to do)
+- [ ] Tone guidance
+- [ ] Tool usage guidance (when to use each)
+- [ ] Memory/context handling
+