From 27d07d068c4fd990308b4659caa2339ae04fc0cb Mon Sep 17 00:00:00 2001
From: Dan Shipper
Date: Mon, 8 Dec 2025 16:23:31 -0800
Subject: [PATCH] Add agent-native-architecture skill
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

New skill teaching prompt-native development patterns:
- Features defined in prompts, not code
- Tools as primitives that enable capability
- "Whatever the user can do, the agent can do"
- Self-modification patterns (advanced tier)
- Refactoring guide for existing codebases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude
---
 .../skills/agent-native-architecture/SKILL.md | 201 +++++++++++
 .../references/architecture-patterns.md       | 215 ++++++++++++
 .../references/mcp-tool-design.md             | 316 +++++++++++++++++
 .../refactoring-to-prompt-native.md           | 317 ++++++++++++++++++
 .../references/self-modification.md           | 269 +++++++++++++++
 .../references/system-prompt-design.md        | 250 ++++++++++++++
 6 files changed, 1568 insertions(+)
 create mode 100644 plugins/compounding-engineering/skills/agent-native-architecture/SKILL.md
 create mode 100644 plugins/compounding-engineering/skills/agent-native-architecture/references/architecture-patterns.md
 create mode 100644 plugins/compounding-engineering/skills/agent-native-architecture/references/mcp-tool-design.md
 create mode 100644 plugins/compounding-engineering/skills/agent-native-architecture/references/refactoring-to-prompt-native.md
 create mode 100644 plugins/compounding-engineering/skills/agent-native-architecture/references/self-modification.md
 create mode 100644 plugins/compounding-engineering/skills/agent-native-architecture/references/system-prompt-design.md

diff --git a/plugins/compounding-engineering/skills/agent-native-architecture/SKILL.md b/plugins/compounding-engineering/skills/agent-native-architecture/SKILL.md
new file mode 100644
index 0000000..111c9e6
--- /dev/null
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/SKILL.md
@@ -0,0 +1,201 @@
+---
+name: agent-native-architecture
+description: Build AI agents using prompt-native architecture where features are defined in prompts, not code. Use when creating autonomous agents, designing MCP servers, implementing self-modifying systems, or adopting the "trust the agent's intelligence" philosophy.
+---
+
+
+## The Prompt-Native Philosophy
+
+Agent native engineering inverts traditional software architecture. Instead of writing code that the agent executes, you define outcomes in prompts and let the agent figure out HOW to achieve them.
+
+### The Foundational Principle
+
+**Whatever the user can do, the agent can do. Many things the developer can do, the agent can do.**
+
+Don't artificially limit the agent. If a user could read files, write code, browse the web, deploy an app, the agent should be able to do those things too. The agent figures out HOW to achieve an outcome; it doesn't just call your pre-written functions.
+
+### Features Are Prompts
+
+Each feature is a prompt that defines an outcome and gives the agent the tools it needs. The agent then figures out how to accomplish it.
+
+**Traditional:** Feature = function in codebase that agent calls
+**Prompt-native:** Feature = prompt defining desired outcome + primitive tools
+
+The agent doesn't execute your code. It uses primitives to achieve outcomes you describe.
+
+### Tools Provide Capability, Not Behavior
+
+Tools should be primitives that enable capability. The prompt defines what to do with that capability.
+
+**Wrong:** `generate_dashboard(data, layout, filters)` - agent executes your workflow
+**Right:** `read_file`, `write_file`, `list_files` - agent figures out how to build a dashboard
+
+Pure primitives are better, but domain primitives (like `store_feedback`) are OK if they don't encode logic, just storage/retrieval.
+
+### The Development Lifecycle
+
+1. **Start in the prompt** - New features begin as natural language defining outcomes
+2. **Iterate rapidly** - Change behavior by editing prose, not refactoring code
+3. **Graduate when stable** - Harden to code when requirements stabilize AND speed/reliability matter
+4. **Many features stay as prompts** - Not everything needs to become code
+
+### Self-Modification (Advanced)
+
+The advanced tier: agents that can evolve their own code, prompts, and behavior. Not required for every app, but a big part of the future.
+
+When implementing:
+- Approval gates for code changes
+- Auto-commit before modifications (rollback capability)
+- Health checks after changes
+- Build verification before restart
+
+### When NOT to Use This Approach
+
+- **High-frequency operations** - thousands of calls per second
+- **Deterministic requirements** - exact same output every time
+- **Cost-sensitive scenarios** - when API costs would be prohibitive
+- **High security** - though this is overblown for most apps
+
+
+
+What aspect of agent native architecture do you need help with?
+
+1. **Design architecture** - Plan a new prompt-native agent system
+2. **Create MCP tools** - Build primitive tools following the philosophy
+3. **Write system prompts** - Define agent behavior in prompts
+4. **Self-modification** - Enable agents to safely evolve themselves
+5. **Review/refactor** - Make existing code more prompt-native
+
+**Wait for response before proceeding.**
+
+
+
+| Response | Action |
+|----------|--------|
+| 1, "design", "architecture", "plan" | Read references/architecture-patterns.md |
+| 2, "tool", "mcp", "primitive" | Read references/mcp-tool-design.md |
+| 3, "prompt", "system prompt", "behavior" | Read references/system-prompt-design.md |
+| 4, "self-modify", "evolve", "git" | Read references/self-modification.md |
+| 5, "review", "refactor", "existing" | Read references/refactoring-to-prompt-native.md |
+
+**After reading the reference, apply those patterns to the user's specific context.**
+
+
+
+Build a prompt-native agent in three steps:
+
+**Step 1: Define primitive tools**
+```typescript
+const tools = [
+  tool("read_file", "Read any file", { path: z.string() }, ...),
+  tool("write_file", "Write any file", { path: z.string(), content: z.string() }, ...),
+  tool("list_files", "List directory", { path: z.string() }, ...),
+];
+```
+
+**Step 2: Write behavior in the system prompt**
+```markdown
+## Your Responsibilities
+When asked to organize content, you should:
+1. Read existing files to understand the structure
+2. Analyze what organization makes sense
+3. Create appropriate pages using write_file
+4. Use your judgment about layout and formatting
+
+You decide the structure. Make it good.
+```
+
+**Step 3: Let the agent work**
+```typescript
+query({
+  prompt: userMessage,
+  options: {
+    systemPrompt,
+    mcpServers: { files: fileServer },
+    permissionMode: "acceptEdits",
+  }
+});
+```
+
+
+
+## Domain Knowledge
+
+All references in `references/`:
+
+**Architecture:** architecture-patterns.md
+**Tool Design:** mcp-tool-design.md
+**Prompts:** system-prompt-design.md
+**Self-Modification:** self-modification.md
+**Refactoring:** refactoring-to-prompt-native.md
+
+
+
+## What NOT to Do
+
+**THE CARDINAL SIN: Agent executes your code instead of figuring things out**
+
+This is the most common mistake. You fall back into writing workflow code and having the agent call it, instead of defining outcomes and letting the agent figure out HOW.
+
+```typescript
+// WRONG - You wrote the workflow, agent just executes it
+tool("process_feedback", async ({ message }) => {
+  const category = categorize(message);        // Your code
+  const priority = calculatePriority(message); // Your code
+  await store(message, category, priority);    // Your code
+  if (priority > 3) await notify();            // Your code
+});
+
+// RIGHT - Agent figures out how to process feedback
+tool("store_item", { key, value }, ...);          // Primitive
+tool("send_message", { channel, content }, ...);  // Primitive
+// Prompt says: "Rate importance 1-5 based on actionability, store feedback, notify if >= 4"
+```
+
+**Don't artificially limit what the agent can do**
+
+If a user could do it, the agent should be able to do it.
+
+```typescript
+// WRONG - limiting agent capabilities
+tool("read_approved_files", { path }, async ({ path }) => {
+  if (!ALLOWED_PATHS.includes(path)) throw new Error("Not allowed");
+  return readFile(path);
+});
+
+// RIGHT - give full capability, use guardrails appropriately
+tool("read_file", { path }, ...);  // Agent can read anything
+// Use approval gates for writes, not artificial limits on reads
+```
+
+**Don't encode decisions in tools**
+```typescript
+// Wrong - tool decides format
+tool("format_report", { format: z.enum(["markdown", "html", "pdf"]) }, ...)
+
+// Right - agent decides format via prompt
+tool("write_file", ...)  // Agent chooses what to write
+```
+
+**Don't over-specify in prompts**
+```markdown
+// Wrong - micromanaging the HOW
+When creating a summary, use exactly 3 bullet points,
+each under 20 words, formatted with em-dashes...
+
+// Right - define outcome, trust intelligence
+Create clear, useful summaries. Use your judgment.
+```
+
+
+
+You've built a prompt-native agent when:
+
+- [ ] The agent figures out HOW to achieve outcomes, not just calls your functions
+- [ ] Whatever a user could do, the agent can do (no artificial limits)
+- [ ] Features are prompts that define outcomes, not code that defines workflows
+- [ ] Tools are primitives (read, write, store, call API) that enable capability
+- [ ] Changing behavior means editing prose, not refactoring code
+- [ ] The agent can surprise you with clever approaches you didn't anticipate
+- [ ] You could add a new feature by writing a new prompt section, not new code
+
diff --git a/plugins/compounding-engineering/skills/agent-native-architecture/references/architecture-patterns.md b/plugins/compounding-engineering/skills/agent-native-architecture/references/architecture-patterns.md
new file mode 100644
index 0000000..a76a019
--- /dev/null
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/architecture-patterns.md
@@ -0,0 +1,215 @@
+
+Architectural patterns for building prompt-native agent systems. These patterns emerge from the philosophy that features should be defined in prompts, not code, and that tools should be primitives.
+
+
+
+## Event-Driven Agent Architecture
+
+The agent runs as a long-lived process that responds to events. Events become prompts.
+
+```
+┌────────────────────────────────────────────────────────────┐
+│                         Agent Loop                         │
+├────────────────────────────────────────────────────────────┤
+│  Event Source → Agent (Claude) → Tool Calls → Response     │
+└────────────────────────────────────────────────────────────┘
+                              │
+              ┌───────────────┼───────────────┐
+              ▼               ▼               ▼
+         ┌─────────┐    ┌──────────┐    ┌───────────┐
+         │ Content │    │   Self   │    │   Data    │
+         │  Tools  │    │  Tools   │    │   Tools   │
+         └─────────┘    └──────────┘    └───────────┘
+        (write_file)    (read_source)   (store_item)
+                          (restart)     (list_items)
+```
+
+**Key characteristics:**
+- Events (messages, webhooks, timers) trigger agent turns
+- Agent decides how to respond based on system prompt
+- Tools are primitives for IO, not business logic
+- State persists between events via data tools
+
+**Example: Discord feedback bot**
+```typescript
+// Event source
+client.on("messageCreate", (message) => {
+  if (!message.author.bot) {
+    runAgent({
+      userMessage: `New message from ${message.author}: "${message.content}"`,
+      channelId: message.channelId,
+    });
+  }
+});
+
+// System prompt defines behavior
+const systemPrompt = `
+When someone shares feedback:
+1. Acknowledge their feedback warmly
+2. Ask clarifying questions if needed
+3. Store it using the feedback tools
+4. Update the feedback site
+
+Use your judgment about importance and categorization.
+`;
+```
+
+
+
+## Two-Layer Git Architecture
+
+For self-modifying agents, separate code (shared) from data (instance-specific).
+
+```
+┌────────────────────────────────────────────────────────────┐
+│                    GitHub (shared repo)                    │
+│  - src/           (agent code)                             │
+│  - site/          (web interface)                          │
+│  - package.json   (dependencies)                           │
+│  - .gitignore     (excludes data/, logs/)                  │
+└────────────────────────────────────────────────────────────┘
+                              │
+                          git clone
+                              │
+                              ▼
+┌────────────────────────────────────────────────────────────┐
+│                     Instance (Server)                      │
+│                                                            │
+│  FROM GITHUB (tracked):                                    │
+│  - src/           → pushed back on code changes            │
+│  - site/          → pushed, triggers deployment            │
+│                                                            │
+│  LOCAL ONLY (untracked):                                   │
+│  - data/          → instance-specific storage              │
+│  - logs/          → runtime logs                           │
+│  - .env           → secrets                                │
+└────────────────────────────────────────────────────────────┘
+```
+
+**Why this works:**
+- Code and site are version controlled (GitHub)
+- Raw data stays local (instance-specific)
+- Site is generated from data, so reproducible
+- Automatic rollback via git history
+
+
+
+## Multi-Instance Branching
+
+Each agent instance gets its own branch while sharing core code.
+
+```
+main                        # Shared features, bug fixes
+├── instance/feedback-bot   # Every Reader feedback bot
+├── instance/support-bot    # Customer support bot
+└── instance/research-bot   # Research assistant
+```
+
+**Change flow:**
+| Change Type | Work On | Then |
+|-------------|---------|------|
+| Core features | main | Merge to instance branches |
+| Bug fixes | main | Merge to instance branches |
+| Instance config | instance branch | Done |
+| Instance data | instance branch | Done |
+
+**Sync tools:**
+```typescript
+tool("self_deploy", "Pull latest from main, rebuild, restart", ...)
+tool("sync_from_instance", "Merge from another instance", ...)
+tool("propose_to_main", "Create PR to share improvements", ...)
+```
+
+
+
+## Site as Agent Output
+
+The agent generates and maintains a website as a natural output, not through specialized site tools.
+
+```
+Discord Message
+      ↓
+Agent processes it, extracts insights
+      ↓
+Agent decides what site updates are needed
+      ↓
+Agent writes files using write_file primitive
+      ↓
+Git commit + push triggers deployment
+      ↓
+Site updates automatically
+```
+
+**Key insight:** Don't build site generation tools. Give the agent file tools and teach it in the prompt how to create good sites.
+
+```markdown
+## Site Management
+
+You maintain a public feedback site. When feedback comes in:
+1. Use write_file to update site/public/content/feedback.json
+2. If the site's React components need improvement, modify them
+3. Commit changes and push to trigger Vercel deploy
+
+The site should be:
+- Clean, modern dashboard aesthetic
+- Clear visual hierarchy
+- Status organization (Inbox, Active, Done)
+
+You decide the structure. Make it good.
+```
+
+
+
+## Approval Gates Pattern
+
+Separate "propose" from "apply" for dangerous operations.
+
+```typescript
+// Pending changes stored separately
+const pendingChanges = new Map();
+
+tool("write_file", async ({ path, content }) => {
+  if (requiresApproval(path)) {
+    // Store for approval
+    pendingChanges.set(path, content);
+    const diff = generateDiff(path, content);
+    return {
+      text: `Change requires approval.\n\n${diff}\n\nReply "yes" to apply.`
+    };
+  } else {
+    // Apply immediately
+    writeFileSync(path, content);
+    return { text: `Wrote ${path}` };
+  }
+});
+
+tool("apply_pending", async () => {
+  for (const [path, content] of pendingChanges) {
+    writeFileSync(path, content);
+  }
+  pendingChanges.clear();
+  return { text: "Applied all pending changes" };
+});
+```
+
+**What requires approval:**
+- src/*.ts (agent code)
+- package.json (dependencies)
+- system prompt changes
+
+**What doesn't:**
+- data/* (instance data)
+- site/* (generated content)
+- docs/* (documentation)
+
+
+
+## Questions to Ask When Designing
+
+1. **What events trigger agent turns?** (messages, webhooks, timers, user requests)
+2. **What primitives does the agent need?** (read, write, call API, restart)
+3. **What decisions should the agent make?** (format, structure, priority, action)
+4. **What decisions should be hardcoded?** (security boundaries, approval requirements)
+5. **How does the agent verify its work?** (health checks, build verification)
+6. **How does the agent recover from mistakes?** (git rollback, approval gates)
+
diff --git a/plugins/compounding-engineering/skills/agent-native-architecture/references/mcp-tool-design.md b/plugins/compounding-engineering/skills/agent-native-architecture/references/mcp-tool-design.md
new file mode 100644
index 0000000..f7133da
--- /dev/null
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/mcp-tool-design.md
@@ -0,0 +1,316 @@
+
+How to design MCP tools following prompt-native principles. Tools should be primitives that enable capability, not workflows that encode decisions.
+
+**Core principle:** Whatever a user can do, the agent should be able to do. Don't artificially limit the agent; give it the same primitives a power user would have.
+
+
+
+## Tools Are Primitives, Not Workflows
+
+**Wrong approach:** Tools that encode business logic
+```typescript
+tool("process_feedback", {
+  feedback: z.string(),
+  category: z.enum(["bug", "feature", "question"]),
+  priority: z.enum(["low", "medium", "high"]),
+}, async ({ feedback, category, priority }) => {
+  // Tool decides how to process
+  const processed = categorize(feedback);
+  const stored = await saveToDatabase(processed);
+  const notification = await notify(priority);
+  return { processed, stored, notification };
+});
+```
+
+**Right approach:** Primitives that enable any workflow
+```typescript
+tool("store_item", {
+  key: z.string(),
+  value: z.any(),
+}, async ({ key, value }) => {
+  await db.set(key, value);
+  return { text: `Stored ${key}` };
+});
+
+tool("send_message", {
+  channel: z.string(),
+  content: z.string(),
+}, async ({ channel, content }) => {
+  await messenger.send(channel, content);
+  return { text: "Sent" };
+});
+```
+
+The agent decides categorization, priority, and when to notify based on the system prompt.
+
+
+
+## Tools Should Have Descriptive, Primitive Names
+
+Names should describe the capability, not the use case:
+
+| Wrong | Right |
+|-------|-------|
+| `process_user_feedback` | `store_item` |
+| `create_feedback_summary` | `write_file` |
+| `send_notification` | `send_message` |
+| `deploy_to_production` | `git_push` |
+
+The prompt tells the agent *when* to use primitives. The tool just provides *capability*.
+
+
+
+## Inputs Should Be Simple
+
+Tools accept data. They don't accept decisions.
+
+**Wrong:** Tool accepts decisions
+```typescript
+tool("format_content", {
+  content: z.string(),
+  format: z.enum(["markdown", "html", "json"]),
+  style: z.enum(["formal", "casual", "technical"]),
+}, ...)
+```
+
+**Right:** Tool accepts data, agent decides format
+```typescript
+tool("write_file", {
+  path: z.string(),
+  content: z.string(),
+}, ...)
+// Agent decides to write index.html with HTML content, or data.json with JSON
+```
+
+
+
+## Outputs Should Be Rich
+
+Return enough information for the agent to verify and iterate.
+
+**Wrong:** Minimal output
+```typescript
+async ({ key }) => {
+  await db.delete(key);
+  return { text: "Deleted" };
+}
+```
+
+**Right:** Rich output
+```typescript
+async ({ key }) => {
+  const existed = await db.has(key);
+  if (!existed) {
+    return { text: `Key ${key} did not exist` };
+  }
+  await db.delete(key);
+  return { text: `Deleted ${key}. ${await db.count()} items remaining.` };
+}
+```
+
+
+
+## Tool Design Template
+
+```typescript
+import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
+import { z } from "zod";
+
+export const serverName = createSdkMcpServer({
+  name: "server-name",
+  version: "1.0.0",
+  tools: [
+    // READ operations
+    tool(
+      "read_item",
+      "Read an item by key",
+      { key: z.string().describe("Item key") },
+      async ({ key }) => {
+        const item = await storage.get(key);
+        return {
+          content: [{
+            type: "text",
+            text: item ? JSON.stringify(item, null, 2) : `Not found: ${key}`,
+          }],
+          isError: !item,
+        };
+      }
+    ),
+
+    tool(
+      "list_items",
+      "List all items, optionally filtered",
+      {
+        prefix: z.string().optional().describe("Filter by key prefix"),
+        limit: z.number().default(100).describe("Max items"),
+      },
+      async ({ prefix, limit }) => {
+        const items = await storage.list({ prefix, limit });
+        return {
+          content: [{
+            type: "text",
+            text: `Found ${items.length} items:\n${items.map(i => i.key).join("\n")}`,
+          }],
+        };
+      }
+    ),
+
+    // WRITE operations
+    tool(
+      "store_item",
+      "Store an item",
+      {
+        key: z.string().describe("Item key"),
+        value: z.any().describe("Item data"),
+      },
+      async ({ key, value }) => {
+        await storage.set(key, value);
+        return {
+          content: [{ type: "text", text: `Stored ${key}` }],
+        };
+      }
+    ),
+
+    tool(
+      "delete_item",
+      "Delete an item",
+      { key: z.string().describe("Item key") },
+      async ({ key }) => {
+        const existed = await storage.delete(key);
+        return {
+          content: [{
+            type: "text",
+            text: existed ? `Deleted ${key}` : `${key} did not exist`,
+          }],
+        };
+      }
+    ),
+
+    // EXTERNAL operations
+    tool(
+      "call_api",
+      "Make an HTTP request",
+      {
+        url: z.string().url(),
+        method: z.enum(["GET", "POST", "PUT", "DELETE"]).default("GET"),
+        body: z.any().optional(),
+      },
+      async ({ url, method, body }) => {
+        const response = await fetch(url, { method, body: JSON.stringify(body) });
+        const text = await response.text();
+        return {
+          content: [{
+            type: "text",
+            text: `${response.status} ${response.statusText}\n\n${text}`,
+          }],
+          isError: !response.ok,
+        };
+      }
+    ),
+  ],
+});
+```
+
+
+
+## Example: Feedback Storage Server
+
+This server provides primitives for storing feedback. It does NOT decide how to categorize or organize feedback; that's the agent's job via the prompt.
+ +```typescript +export const feedbackMcpServer = createSdkMcpServer({ + name: "feedback", + version: "1.0.0", + tools: [ + tool( + "store_feedback", + "Store a feedback item", + { + item: z.object({ + id: z.string(), + author: z.string(), + content: z.string(), + importance: z.number().min(1).max(5), + timestamp: z.string(), + status: z.string().optional(), + urls: z.array(z.string()).optional(), + metadata: z.any().optional(), + }).describe("Feedback item"), + }, + async ({ item }) => { + await db.feedback.insert(item); + return { + content: [{ + type: "text", + text: `Stored feedback ${item.id} from ${item.author}`, + }], + }; + } + ), + + tool( + "list_feedback", + "List feedback items", + { + limit: z.number().default(50), + status: z.string().optional(), + }, + async ({ limit, status }) => { + const items = await db.feedback.list({ limit, status }); + return { + content: [{ + type: "text", + text: JSON.stringify(items, null, 2), + }], + }; + } + ), + + tool( + "update_feedback", + "Update a feedback item", + { + id: z.string(), + updates: z.object({ + status: z.string().optional(), + importance: z.number().optional(), + metadata: z.any().optional(), + }), + }, + async ({ id, updates }) => { + await db.feedback.update(id, updates); + return { + content: [{ type: "text", text: `Updated ${id}` }], + }; + } + ), + ], +}); +``` + +The system prompt then tells the agent *how* to use these primitives: + +```markdown +## Feedback Processing + +When someone shares feedback: +1. Extract author, content, and any URLs +2. Rate importance 1-5 based on actionability +3. Store using feedback.store_feedback +4. If high importance (4-5), notify the channel + +Use your judgment about importance ratings. 
+``` + + + +## MCP Tool Design Checklist + +- [ ] Tool names describe capability, not use case +- [ ] Inputs are data, not decisions +- [ ] Outputs are rich (enough for agent to verify) +- [ ] CRUD operations are separate tools (not one mega-tool) +- [ ] No business logic in tool implementations +- [ ] Error states clearly communicated via `isError` +- [ ] Descriptions explain what the tool does, not when to use it + diff --git a/plugins/compounding-engineering/skills/agent-native-architecture/references/refactoring-to-prompt-native.md b/plugins/compounding-engineering/skills/agent-native-architecture/references/refactoring-to-prompt-native.md new file mode 100644 index 0000000..03e94ef --- /dev/null +++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/refactoring-to-prompt-native.md @@ -0,0 +1,317 @@ + +How to refactor existing agent code to follow prompt-native principles. The goal: move behavior from code into prompts, and simplify tools into primitives. + + + +## Diagnosing Non-Prompt-Native Code + +Signs your agent isn't prompt-native: + +**Tools that encode workflows:** +```typescript +// RED FLAG: Tool contains business logic +tool("process_feedback", async ({ message }) => { + const category = categorize(message); // Logic in code + const priority = calculatePriority(message); // Logic in code + await store(message, category, priority); // Orchestration in code + if (priority > 3) await notify(); // Decision in code +}); +``` + +**Agent calls functions instead of figuring things out:** +```typescript +// RED FLAG: Agent is just a function caller +"Use process_feedback to handle incoming messages" +// vs. 
+"When feedback comes in, decide importance, store it, notify if high" +``` + +**Artificial limits on agent capability:** +```typescript +// RED FLAG: Tool prevents agent from doing what users can do +tool("read_file", async ({ path }) => { + if (!ALLOWED_PATHS.includes(path)) { + throw new Error("Not allowed to read this file"); + } + return readFile(path); +}); +``` + +**Prompts that specify HOW instead of WHAT:** +```markdown +// RED FLAG: Micromanaging the agent +When creating a summary: +1. Use exactly 3 bullet points +2. Each bullet must be under 20 words +3. Format with em-dashes for sub-points +4. Bold the first word of each bullet +``` + + + +## Step-by-Step Refactoring + +**Step 1: Identify workflow tools** + +List all your tools. Mark any that: +- Have business logic (categorize, calculate, decide) +- Orchestrate multiple operations +- Make decisions on behalf of the agent +- Contain conditional logic (if/else based on content) + +**Step 2: Extract the primitives** + +For each workflow tool, identify the underlying primitives: + +| Workflow Tool | Hidden Primitives | +|---------------|-------------------| +| `process_feedback` | `store_item`, `send_message` | +| `generate_report` | `read_file`, `write_file` | +| `deploy_and_notify` | `git_push`, `send_message` | + +**Step 3: Move behavior to the prompt** + +Take the logic from your workflow tools and express it in natural language: + +```typescript +// Before (in code): +async function processFeedback(message) { + const priority = message.includes("crash") ? 5 : + message.includes("bug") ? 4 : 3; + await store(message, priority); + if (priority >= 4) await notify(); +} +``` + +```markdown +// After (in prompt): +## Feedback Processing + +When someone shares feedback: +1. Rate importance 1-5: + - 5: Crashes, data loss, security issues + - 4: Bug reports with clear reproduction steps + - 3: General suggestions, minor issues +2. Store using store_item +3. 
If importance >= 4, notify the team + +Use your judgment. Context matters more than keywords. +``` + +**Step 4: Simplify tools to primitives** + +```typescript +// Before: 1 workflow tool +tool("process_feedback", { message, category, priority }, ...complex logic...) + +// After: 2 primitive tools +tool("store_item", { key: z.string(), value: z.any() }, ...simple storage...) +tool("send_message", { channel: z.string(), content: z.string() }, ...simple send...) +``` + +**Step 5: Remove artificial limits** + +```typescript +// Before: Limited capability +tool("read_file", async ({ path }) => { + if (!isAllowed(path)) throw new Error("Forbidden"); + return readFile(path); +}); + +// After: Full capability +tool("read_file", async ({ path }) => { + return readFile(path); // Agent can read anything +}); +// Use approval gates for WRITES, not artificial limits on READS +``` + +**Step 6: Test with outcomes, not procedures** + +Instead of testing "does it call the right function?", test "does it achieve the outcome?" + +```typescript +// Before: Testing procedure +expect(mockProcessFeedback).toHaveBeenCalledWith(...) 
+ +// After: Testing outcome +// Send feedback β†’ Check it was stored with reasonable importance +// Send high-priority feedback β†’ Check notification was sent +``` + + + +## Before/After Examples + +**Example 1: Feedback Processing** + +Before: +```typescript +tool("handle_feedback", async ({ message, author }) => { + const category = detectCategory(message); + const priority = calculatePriority(message, category); + const feedbackId = await db.feedback.insert({ + id: generateId(), + author, + message, + category, + priority, + timestamp: new Date().toISOString(), + }); + + if (priority >= 4) { + await discord.send(ALERT_CHANNEL, `High priority feedback from ${author}`); + } + + return { feedbackId, category, priority }; +}); +``` + +After: +```typescript +// Simple storage primitive +tool("store_feedback", async ({ item }) => { + await db.feedback.insert(item); + return { text: `Stored feedback ${item.id}` }; +}); + +// Simple message primitive +tool("send_message", async ({ channel, content }) => { + await discord.send(channel, content); + return { text: "Sent" }; +}); +``` + +System prompt: +```markdown +## Feedback Processing + +When someone shares feedback: +1. Generate a unique ID +2. Rate importance 1-5 based on impact and urgency +3. Store using store_feedback with the full item +4. 
If importance >= 4, send a notification to the team channel + +Importance guidelines: +- 5: Critical (crashes, data loss, security) +- 4: High (detailed bug reports, blocking issues) +- 3: Medium (suggestions, minor bugs) +- 2: Low (cosmetic, edge cases) +- 1: Minimal (off-topic, duplicates) +``` + +**Example 2: Report Generation** + +Before: +```typescript +tool("generate_weekly_report", async ({ startDate, endDate, format }) => { + const data = await fetchMetrics(startDate, endDate); + const summary = summarizeMetrics(data); + const charts = generateCharts(data); + + if (format === "html") { + return renderHtmlReport(summary, charts); + } else if (format === "markdown") { + return renderMarkdownReport(summary, charts); + } else { + return renderPdfReport(summary, charts); + } +}); +``` + +After: +```typescript +tool("query_metrics", async ({ start, end }) => { + const data = await db.metrics.query({ start, end }); + return { text: JSON.stringify(data, null, 2) }; +}); + +tool("write_file", async ({ path, content }) => { + writeFileSync(path, content); + return { text: `Wrote ${path}` }; +}); +``` + +System prompt: +```markdown +## Report Generation + +When asked to generate a report: +1. Query the relevant metrics using query_metrics +2. Analyze the data and identify key trends +3. Create a clear, well-formatted report +4. Write it using write_file in the appropriate format + +Use your judgment about format and structure. Make it useful. +``` + + + +## Common Refactoring Challenges + +**"But the agent might make mistakes!"** + +Yes, and you can iterate. Change the prompt to add guidance: +```markdown +// Before +Rate importance 1-5. + +// After (if agent keeps rating too high) +Rate importance 1-5. Be conservativeβ€”most feedback is 2-3. +Only use 4-5 for truly blocking or critical issues. +``` + +**"The workflow is complex!"** + +Complex workflows can still be expressed in prompts. The agent is smart. +```markdown +When processing video feedback: +1. 
Check if it's a Loom, YouTube, or direct link
+2. For YouTube, pass URL directly to video analysis
+3. For others, download first, then analyze
+4. Extract timestamped issues
+5. Rate based on issue density and severity
+```
+
+**"We need deterministic behavior!"**
+
+Some operations should stay in code. That's fine. Prompt-native isn't all-or-nothing.
+
+Keep in code:
+- Security validation
+- Rate limiting
+- Audit logging
+- Exact format requirements
+
+Move to prompts:
+- Categorization decisions
+- Priority judgments
+- Content generation
+- Workflow orchestration
+
+**"What about testing?"**
+
+Test outcomes, not procedures:
+- "Given this input, does the agent achieve the right result?"
+- "Does stored feedback have reasonable importance ratings?"
+- "Are notifications sent for truly high-priority items?"
+
+
+
+## Refactoring Checklist
+
+Diagnosis:
+- [ ] Listed all tools with business logic
+- [ ] Identified artificial limits on agent capability
+- [ ] Found prompts that micromanage HOW
+
+Refactoring:
+- [ ] Extracted primitives from workflow tools
+- [ ] Moved business logic to system prompt
+- [ ] Removed artificial limits
+- [ ] Simplified tool inputs to data, not decisions
+
+Validation:
+- [ ] Agent achieves same outcomes with primitives
+- [ ] Behavior can be changed by editing prompts
+- [ ] New features could be added without new tools
+
diff --git a/plugins/compounding-engineering/skills/agent-native-architecture/references/self-modification.md b/plugins/compounding-engineering/skills/agent-native-architecture/references/self-modification.md
new file mode 100644
index 0000000..7bad83a
--- /dev/null
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/self-modification.md
@@ -0,0 +1,269 @@
+
+Self-modification is the advanced tier of agent native engineering: agents that can evolve their own code, prompts, and behavior. Not required for every app, but a big part of the future.
+
+This is the logical extension of "whatever the developer can do, the agent can do."
+
+
+
+## Why Self-Modification?
+
+Traditional software is static: it does what you wrote, nothing more. Self-modifying agents can:
+
+- **Fix their own bugs** - See an error, patch the code, restart
+- **Add new capabilities** - User asks for something new, agent implements it
+- **Evolve behavior** - Learn from feedback and adjust prompts
+- **Deploy themselves** - Push code, trigger builds, restart
+
+The agent becomes a living system that improves over time, not frozen code.
+
+
+
+## What Self-Modification Enables
+
+**Code modification:**
+- Read and understand source files
+- Write fixes and new features
+- Commit and push to version control
+- Trigger builds and verify they pass
+
+**Prompt evolution:**
+- Edit the system prompt based on feedback
+- Add new features as prompt sections
+- Refine judgment criteria that aren't working
+
+**Infrastructure control:**
+- Pull latest code from upstream
+- Merge from other branches/instances
+- Restart after changes
+- Roll back if something breaks
+
+**Site/output generation:**
+- Generate and maintain websites
+- Create documentation
+- Build dashboards from data
+
+
+
+## Required Guardrails
+
+Self-modification is powerful. It needs safety mechanisms.
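The guardrail snippets in this section call helpers such as `isCodeFile` and `pendingChanges` without defining them. The following is a minimal sketch of how such an approval queue might work, not the patch's implementation. The extension list, the rule that everything under `src/` needs approval, and the names `proposeWrite`, `applyPending`, `WriteFn`, and `disk` are all assumptions made for illustration.

```typescript
// Hypothetical helpers for an approval gate; none of these names come from the patch.
const CODE_EXTENSIONS = [".ts", ".tsx", ".js"];

// Assumption: anything under src/ (including prompts) or with a code extension needs approval.
function isCodeFile(path: string): boolean {
  return path.startsWith("src/") || CODE_EXTENSIONS.some((ext) => path.endsWith(ext));
}

// Proposed file contents keyed by path, held until a human approves them.
const pendingChanges = new Map<string, string>();

type WriteFn = (path: string, content: string) => void;

// Queue gated files for approval; write everything else immediately.
function proposeWrite(path: string, content: string, write: WriteFn): string {
  if (isCodeFile(path)) {
    pendingChanges.set(path, content);
    return `Requires approval: ${path}`;
  }
  write(path, content);
  return `Wrote ${path}`;
}

// Apply every queued change once approval is given, then empty the queue.
function applyPending(write: WriteFn): string[] {
  const appliedPaths: string[] = [];
  pendingChanges.forEach((content, path) => {
    write(path, content);
    appliedPaths.push(path);
  });
  pendingChanges.clear();
  return appliedPaths;
}

// Usage with an in-memory "disk" standing in for the real filesystem.
const disk = new Map<string, string>();
const writeToDisk: WriteFn = (path, content) => {
  disk.set(path, content);
};

const noteResult = proposeWrite("docs/notes.md", "# Notes", writeToDisk);
const codeResult = proposeWrite("src/index.ts", "export {};", writeToDisk);
const heldBack = !disk.has("src/index.ts"); // true: the code change is only queued
const appliedPaths = applyPending(writeToDisk); // simulates the user replying "yes"
console.log(noteResult, codeResult, heldBack, appliedPaths);
```

Passing the write function in keeps the gate testable without touching real files; in an actual agent it would presumably wrap `writeFileSync`.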
+
+**Approval gates for code changes:**
+```typescript
+tool("write_file", async ({ path, content }) => {
+  if (isCodeFile(path)) {
+    // Store for approval, don't apply immediately
+    pendingChanges.set(path, content);
+    const diff = generateDiff(path, content);
+    return { text: `Requires approval:\n\n${diff}\n\nReply "yes" to apply.` };
+  }
+  // Non-code files apply immediately
+  writeFileSync(path, content);
+  return { text: `Wrote ${path}` };
+});
+```
+
+**Auto-commit before changes:**
+```typescript
+tool("self_deploy", async () => {
+  // Save current state first
+  runGit("stash"); // or commit uncommitted changes
+
+  // Then pull/merge
+  runGit("fetch origin");
+  runGit("merge origin/main --no-edit");
+
+  // Build and verify
+  runCommand("npm run build");
+
+  // Only then restart
+  scheduleRestart();
+});
+```
+
+**Build verification:**
+```typescript
+// Don't restart unless build passes
+try {
+  runCommand("npm run build", { timeout: 120000 });
+} catch (error) {
+  // Rollback the merge
+  runGit("merge --abort");
+  return { text: "Build failed, aborting deploy", isError: true };
+}
+```
+
+**Health checks after restart:**
+```typescript
+tool("health_check", async () => {
+  const uptime = process.uptime();
+  const buildValid = existsSync("dist/index.js");
+  const gitClean = !runGit("status --porcelain");
+
+  return {
+    text: JSON.stringify({
+      status: "healthy",
+      uptime: `${Math.floor(uptime / 60)}m`,
+      build: buildValid ? "valid" : "missing",
+      git: gitClean ? "clean" : "uncommitted changes",
+    }, null, 2),
+  };
+});
+```
+
+
+
+## Git-Based Self-Modification
+
+Use git as the foundation for self-modification.
It provides:
+- Version history (rollback capability)
+- Branching (experiment safely)
+- Merge (sync with other instances)
+- Push/pull (deploy and collaborate)
+
+**Essential git tools:**
+```typescript
+tool("status", "Show git status", {}, ...);
+tool("diff", "Show file changes", { path: z.string().optional() }, ...);
+tool("log", "Show commit history", { count: z.number() }, ...);
+tool("commit_code", "Commit code changes", { message: z.string() }, ...);
+tool("git_push", "Push to GitHub", { branch: z.string().optional() }, ...);
+tool("pull", "Pull from GitHub", { source: z.enum(["main", "instance"]) }, ...);
+tool("rollback", "Revert recent commits", { commits: z.number() }, ...);
+```
+
+**Multi-instance architecture:**
+```
+main                      # Shared code
+├── instance/bot-a        # Instance A's branch
+├── instance/bot-b        # Instance B's branch
+└── instance/bot-c        # Instance C's branch
+```
+
+Each instance can:
+- Pull updates from main
+- Push improvements back to main (via PR)
+- Sync features from other instances
+- Maintain instance-specific config
+
+
+
+## Self-Modifying Prompts
+
+The system prompt is a file the agent can read and write.
+
+```typescript
+// Agent can read its own prompt
+tool("read_file", ...); // Can read src/prompts/system.md
+
+// Agent can propose changes
+tool("write_file", ...); // Can write to src/prompts/system.md (with approval)
+```
+
+**System prompt as living document:**
+```markdown
+## Feedback Processing
+
+When someone shares feedback:
+1. Acknowledge warmly
+2. Rate importance 1-5
+3.
Store using feedback tools
+
+
+```
+
+The agent can:
+- Add notes to itself
+- Refine judgment criteria
+- Add new feature sections
+- Document edge cases it learned
+
+
+
+## When to Implement Self-Modification
+
+**Good candidates:**
+- Long-running autonomous agents
+- Agents that need to adapt to feedback
+- Systems where behavior evolution is valuable
+- Internal tools where rapid iteration matters
+
+**Not necessary for:**
+- Simple single-task agents
+- Highly regulated environments
+- Systems where behavior must be auditable
+- One-off or short-lived agents
+
+Start with a non-self-modifying prompt-native agent. Add self-modification when you need it.
+
+
+
+## Complete Self-Modification Toolset
+
+```typescript
+const selfMcpServer = createSdkMcpServer({
+  name: "self",
+  version: "1.0.0",
+  tools: [
+    // FILE OPERATIONS
+    tool("read_file", "Read any project file", { path: z.string() }, ...),
+    tool("write_file", "Write a file (code requires approval)", { path, content }, ...),
+    tool("list_files", "List directory contents", { path: z.string() }, ...),
+    tool("search_code", "Search for patterns", { pattern: z.string() }, ...),
+
+    // APPROVAL WORKFLOW
+    tool("apply_pending", "Apply approved changes", {}, ...),
+    tool("get_pending", "Show pending changes", {}, ...),
+    tool("clear_pending", "Discard pending changes", {}, ...),
+
+    // RESTART
+    tool("restart", "Rebuild and restart", {}, ...),
+    tool("health_check", "Check if bot is healthy", {}, ...),
+  ],
+});
+
+const gitMcpServer = createSdkMcpServer({
+  name: "git",
+  version: "1.0.0",
+  tools: [
+    // STATUS
+    tool("status", "Show git status", {}, ...),
+    tool("diff", "Show changes", { path: z.string().optional() }, ...),
+    tool("log", "Show history", { count: z.number() }, ...),
+
+    // COMMIT & PUSH
+    tool("commit_code", "Commit code changes", { message: z.string() }, ...),
+    tool("git_push", "Push to GitHub", { branch: z.string().optional() }, ...),
+
+    // SYNC
+    tool("pull", "Pull from upstream", {
source: z.enum(["main", "instance"]) }, ...),
+    tool("self_deploy", "Pull, build, restart", { source: z.enum(["main", "instance"]) }, ...),
+
+    // SAFETY
+    tool("rollback", "Revert commits", { commits: z.number() }, ...),
+    tool("health_check", "Detailed health report", {}, ...),
+  ],
+});
+```
+
+
+
+## Self-Modification Checklist
+
+Before enabling self-modification:
+- [ ] Git-based version control set up
+- [ ] Approval gates for code changes
+- [ ] Build verification before restart
+- [ ] Rollback mechanism available
+- [ ] Health check endpoint
+- [ ] Instance identity configured
+
+When implementing:
+- [ ] Agent can read all project files
+- [ ] Agent can write files (with appropriate approval)
+- [ ] Agent can commit and push
+- [ ] Agent can pull updates
+- [ ] Agent can restart itself
+- [ ] Agent can roll back if needed
+
diff --git a/plugins/compounding-engineering/skills/agent-native-architecture/references/system-prompt-design.md b/plugins/compounding-engineering/skills/agent-native-architecture/references/system-prompt-design.md
new file mode 100644
index 0000000..377f45f
--- /dev/null
+++ b/plugins/compounding-engineering/skills/agent-native-architecture/references/system-prompt-design.md
@@ -0,0 +1,250 @@
+
+How to write system prompts for prompt-native agents. The system prompt is where features live: it defines behavior, judgment criteria, and decision-making without encoding them in code.
+
+
+
+## Features Are Prompt Sections
+
+Each feature is a section of the system prompt that tells the agent how to behave.
+
+**Traditional approach:** Feature = function in codebase
+```typescript
+async function processFeedback(message) {
+  const category = categorize(message);
+  const priority = calculatePriority(message);
+  await store(message, category, priority);
+  if (priority > 3) await notify();
+}
+```
+
+**Prompt-native approach:** Feature = section in system prompt
+```markdown
+## Feedback Processing
+
+When someone shares feedback:
+1.
Read the message to understand what they're saying
+2. Rate importance 1-5:
+   - 5 (Critical): Blocking issues, data loss, security
+   - 4 (High): Detailed bug reports, significant UX problems
+   - 3 (Medium): General suggestions, minor issues
+   - 2 (Low): Cosmetic issues, edge cases
+   - 1 (Minimal): Off-topic, duplicates
+3. Store using feedback.store_feedback
+4. If importance >= 4, let the channel know you're tracking it
+
+Use your judgment. Context matters.
+```
+
+
+
+## System Prompt Structure
+
+A well-structured prompt-native system prompt:
+
+```markdown
+# Identity
+
+You are [Name], [brief identity statement].
+
+## Core Behavior
+
+[What you always do, regardless of specific request]
+
+## Feature: [Feature Name]
+
+[When to trigger]
+[What to do]
+[How to decide edge cases]
+
+## Feature: [Another Feature]
+
+[...]
+
+## Tool Usage
+
+[Guidance on when/how to use available tools]
+
+## Tone and Style
+
+[Communication guidelines]
+
+## What NOT to Do
+
+[Explicit boundaries]
+```
+
+
+
+## Guide, Don't Micromanage
+
+Tell the agent what to achieve, not exactly how to do it.
+
+**Micromanaging (bad):**
+```markdown
+When creating a summary:
+1. Use exactly 3 bullet points
+2. Each bullet under 20 words
+3. Use em-dashes for sub-points
+4. Bold the first word of each bullet
+5. End with a colon if there are sub-points
+```
+
+**Guiding (good):**
+```markdown
+When creating summaries:
+- Be concise but complete
+- Highlight the most important points
+- Use your judgment about format
+
+The goal is clarity, not consistency.
+```
+
+Trust the agent's intelligence. It knows how to communicate.
+
+
+
+## Define Judgment Criteria, Not Rules
+
+Instead of rules, provide criteria for making decisions.
+
+**Rules (rigid):**
+```markdown
+If the message contains "bug", set importance to 4.
+If the message contains "crash", set importance to 5.
+```
+
+**Judgment criteria (flexible):**
+```markdown
+## Importance Rating
+
+Rate importance based on:
+- **Impact**: How many users affected? How severe?
+- **Urgency**: Is this blocking? Time-sensitive?
+- **Actionability**: Can we actually fix this?
+- **Evidence**: Video/screenshots vs vague description
+
+Examples:
+- "App crashes when I tap submit" → 4-5 (critical, reproducible)
+- "The button color seems off" → 2 (cosmetic, non-blocking)
+- "Video walkthrough with 15 timestamped issues" → 5 (high-quality evidence)
+```
+
+
+
+## Work With Context Windows
+
+The agent sees: system prompt + recent messages + tool results. Design for this.
+
+**Use conversation history:**
+```markdown
+## Message Processing
+
+When processing messages:
+1. Check if this relates to recent conversation
+2. If someone is continuing a previous thread, maintain context
+3. Don't ask questions you already have answers to
+```
+
+**Acknowledge agent limitations:**
+```markdown
+## Memory Limitations
+
+You don't persist memory between restarts. Use the memory server:
+- Before responding, check memory.recall for relevant context
+- After important decisions, use memory.store to remember
+- Store conversation threads, not individual messages
+```
+
+
+
+## Example: Complete System Prompt
+
+```markdown
+# R2-C2 Feedback Bot
+
+You are R2-C2, Every's feedback collection assistant. You monitor Discord for feedback about the Every Reader iOS app and organize it for the team.
+
+## Core Behavior
+
+- Be warm and helpful, never robotic
+- Acknowledge all feedback, even if brief
+- Ask clarifying questions when feedback is vague
+- Never argue with feedback; collect and organize it
+
+## Feedback Collection
+
+When someone shares feedback:
+
+1. **Acknowledge** warmly: "Thanks for this!" or "Good catch!"
+2. **Clarify** if needed: "Can you tell me more about when this happens?"
+3.
**Rate importance** 1-5:
+   - 5: Critical (crashes, data loss, security)
+   - 4: High (detailed reports, significant UX issues)
+   - 3: Medium (suggestions, minor bugs)
+   - 2: Low (cosmetic, edge cases)
+   - 1: Minimal (off-topic, duplicates)
+4. **Store** using feedback.store_feedback
+5. **Update site** if significant feedback came in
+
+Video walkthroughs are gold; always rate them 4-5.
+
+## Site Management
+
+You maintain a public feedback site. When feedback accumulates:
+
+1. Sync data to site/public/content/feedback.json
+2. Update status counts and organization
+3. Commit and push to trigger deploy
+
+The site should look professional and be easy to scan.
+
+## Message Deduplication
+
+Before processing any message:
+1. Check memory.recall(key: "processed_{messageId}")
+2. Skip if already processed
+3. After processing, store the key
+
+## Tone
+
+- Casual and friendly
+- Brief but warm
+- Technical when discussing bugs
+- Never defensive
+
+## Don't
+
+- Don't promise fixes or timelines
+- Don't share internal discussions
+- Don't ignore feedback even if it seems minor
+- Don't repeat yourself; vary acknowledgments
+```
+
+
+
+## Iterating on System Prompts
+
+Prompt-native development means rapid iteration:
+
+1. **Observe** agent behavior in production
+2. **Identify** gaps: "It's not rating video feedback high enough"
+3. **Add guidance**: "Video walkthroughs are gold; always rate them 4-5"
+4. **Deploy** (just edit the prompt file)
+5. **Repeat**
+
+No code changes. No recompilation. Just prose.
+
+
+
+## System Prompt Checklist
+
+- [ ] Clear identity statement
+- [ ] Core behaviors that always apply
+- [ ] Features as separate sections
+- [ ] Judgment criteria instead of rigid rules
+- [ ] Examples for ambiguous cases
+- [ ] Explicit boundaries (what NOT to do)
+- [ ] Tone guidance
+- [ ] Tool usage guidance (when to use each)
+- [ ] Memory/context handling
+
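The checklist item "Features as separate sections" might be easier to keep honest if the prompt is assembled from small section records, so that adding a feature means adding one prose entry. This is a rough sketch under that assumption and is not part of the patch: `PromptSection`, `PromptSpec`, `buildSystemPrompt`, and `missingSections` are hypothetical names, and the heading check is a plain substring match.

```typescript
// Hypothetical shape for one section of a system prompt; not defined in the patch.
interface PromptSection {
  heading: string; // e.g. "Core Behavior" or "Feedback Collection"
  body: string; // prose guidance for the agent
}

interface PromptSpec {
  title: string; // used as the top-level "# ..." heading
  identity: string; // short identity statement
  sections: PromptSection[];
}

// Render the spec as markdown: a "# title" line, the identity statement,
// then one "## heading" block per section, in the order given.
function buildSystemPrompt(spec: PromptSpec): string {
  const parts: string[] = [`# ${spec.title}`, spec.identity];
  for (const section of spec.sections) {
    parts.push(`## ${section.heading}\n\n${section.body.trim()}`);
  }
  return parts.join("\n\n") + "\n";
}

// Report required headings that no section covers (case-insensitive substring match).
function missingSections(spec: PromptSpec, required: string[]): string[] {
  const headings = spec.sections.map((s) => s.heading.toLowerCase());
  return required.filter(
    (wanted) => !headings.some((h) => h.includes(wanted.toLowerCase())),
  );
}

// Example spec borrowing a few lines from the R2-C2 prompt above.
const spec: PromptSpec = {
  title: "R2-C2 Feedback Bot",
  identity: "You are R2-C2, Every's feedback collection assistant.",
  sections: [
    { heading: "Core Behavior", body: "- Be warm and helpful, never robotic" },
    { heading: "Feedback Collection", body: "Rate importance 1-5, then store it." },
    { heading: "Tone", body: "- Casual and friendly" },
  ],
};

const renderedPrompt = buildSystemPrompt(spec);
const missing = missingSections(spec, ["Core Behavior", "Tone", "Don't"]);
console.log(renderedPrompt);
console.log("Missing sections:", missing);
```

In this example the `Don't` section is reported as missing, which is the kind of gap the checklist is meant to catch. The prose inside each section stays the real feature definition; the code only concatenates and checks headings.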