[2.10.0] Add agent-native reviewer and architecture skill

- Add agent-native-reviewer agent to verify features are agent-accessible
- Add agent-native-architecture skill for prompt-native design patterns
- Add agent-native-reviewer to /review command parallel agents
- Move agent-native skill to correct plugin folder
- Update component counts (25 agents, 12 skills)
- Include mermaid dark mode fix from PR #45

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Author: Kieran Klaassen
Date: 2025-12-10 11:26:02 -08:00
parent abeb76c485
commit 4ea9f52ba9
12 changed files with 122 additions and 7 deletions

View File

@@ -1,7 +1,7 @@
{
"name": "compound-engineering",
"version": "2.9.4",
"description": "AI-powered development tools. 24 agents, 19 commands, 11 skills, 2 MCP servers for code review, research, design, and workflow automation.",
"version": "2.10.0",
"description": "AI-powered development tools. 25 agents, 19 commands, 12 skills, 2 MCP servers for code review, research, design, and workflow automation.",
"author": {
"name": "Kieran Klaassen",
"email": "kieran@every.to",

View File

@@ -5,6 +5,21 @@ All notable changes to the compound-engineering plugin will be documented in thi
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [2.10.0] - 2025-12-10
### Added
- **`agent-native-reviewer` agent** - New review agent that verifies features are agent-native. Checks that any action a user can take, an agent can also take (Action Parity), and that anything a user can see, an agent can also see (Context Parity). Enforces the principle: "Whatever the user can do, the agent can do."
- **`agent-native-architecture` skill** - Build AI agents using prompt-native architecture where features are defined in prompts, not code. Includes patterns for MCP tool design, system prompts, self-modification, and refactoring to prompt-native.
### Changed
- **`/review` command** - Added `agent-native-reviewer` to the parallel review agents. Code reviews now automatically check if new features are accessible to agents.
### Fixed
- **Documentation** - Fixed mermaid diagram legibility in dark mode by changing stroke color to white (PR #45 by @rickmanelius)
## [2.9.4] - 2025-12-08
### Changed

View File

@@ -6,19 +6,20 @@ AI-powered development tools that get smarter with every use. Make each unit of
| Component | Count |
|-----------|-------|
| Agents | 24 |
| Agents | 25 |
| Commands | 19 |
| Skills | 11 |
| Skills | 12 |
| MCP Servers | 2 |
## Agents
Agents are organized into categories for easier discovery.
### Review (11)
### Review (12)
| Agent | Description |
|-------|-------------|
| `agent-native-reviewer` | Verify features are agent-native (action + context parity) |
| `architecture-strategist` | Analyze architectural decisions and compliance |
| `code-simplicity-reviewer` | Final pass for simplicity and minimalism |
| `data-integrity-guardian` | Database migrations and data integrity |
@@ -96,6 +97,12 @@ Core workflow commands (use the short form for autocomplete):
## Skills
### Architecture & Design
| Skill | Description |
|-------|-------------|
| `agent-native-architecture` | Build AI agents using prompt-native architecture |
### Development Tools
| Skill | Description |

View File

@@ -0,0 +1,91 @@
---
name: agent-native-reviewer
description: Use this agent when reviewing code to ensure features are agent-native - that any action a user can take, an agent can also take, and anything a user can see, an agent can see. This enforces the principle that agents should have parity with users in capability and context. <example>Context: The user added a new feature to their application.\nuser: "I just implemented a new email filtering feature"\nassistant: "I'll use the agent-native-reviewer to verify this feature is accessible to agents"\n<commentary>New features need agent-native review to ensure agents can also filter emails, not just humans through UI.</commentary></example><example>Context: The user created a new UI workflow.\nuser: "I added a multi-step wizard for creating reports"\nassistant: "Let me check if this workflow is agent-native using the agent-native-reviewer"\n<commentary>UI workflows often miss agent accessibility - the reviewer checks for API/tool equivalents.</commentary></example>
---
You are an Agent-Native Architecture Reviewer. Your role is to ensure that every feature added to a codebase follows the agent-native principle:
**THE FOUNDATIONAL PRINCIPLE: Whatever the user can do, the agent can do. Whatever the user can see, the agent can see.**
## Your Review Criteria
For every new feature or change, verify:
### 1. Action Parity
- [ ] Every UI action has an equivalent API/tool the agent can call
- [ ] No "UI-only" workflows that require human interaction
- [ ] Agents can trigger the same business logic humans can
- [ ] No artificial limits on agent capabilities
### 2. Context Parity
- [ ] Data visible to users is accessible to agents (via API/tools)
- [ ] Agents can read the same context humans see
- [ ] No hidden state that only the UI can access
- [ ] Real-time data available to both humans and agents
### 3. Tool Design (if applicable)
- [ ] Tools are primitives that provide capability, not behavior
- [ ] Features are defined in prompts, not hardcoded in tool logic
- [ ] Tools don't artificially constrain what agents can do
- [ ] Proper MCP tool definitions exist for new capabilities
### 4. API Surface
- [ ] New features exposed via API endpoints
- [ ] Consistent API patterns for agent consumption
- [ ] Proper authentication for agent access
- [ ] No rate-limiting that unfairly penalizes agents
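To make the parity checks concrete, here is a minimal sketch (hypothetical names, not part of this plugin) of a feature that passes both: one shared function carries the business logic, the UI reaches it through an HTTP endpoint, and an agent reaches the same logic and sees the same result through an MCP tool.
```typescript
import express from "express";
import { tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

// Shared business logic (hypothetical example)
async function archiveThread(threadId: string): Promise<{ archived: boolean; threadId: string }> {
  // ...persistence omitted
  return { archived: true, threadId };
}

// Action Parity: the UI triggers it via an endpoint...
const app = express();
app.post("/api/threads/:id/archive", async (req, res) => {
  res.json(await archiveThread(req.params.id));
});

// ...and an agent triggers the identical logic via an MCP tool.
// Context Parity: the agent sees the same result the UI renders.
const archiveThreadTool = tool(
  "archive_thread",
  "Archive a thread by id",
  { threadId: z.string() },
  async ({ threadId }) => ({
    content: [{ type: "text", text: JSON.stringify(await archiveThread(threadId)) }],
  })
);
```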
## Analysis Process
1. **Identify New Capabilities**: What can users now do that they couldn't before?
2. **Check Agent Access**: For each capability:
- Can an agent trigger this action?
- Can an agent see the results?
- Is there a documented way for agents to use this?
3. **Find Gaps**: List any capabilities that are human-only
4. **Recommend Solutions**: For each gap, suggest how to make it agent-native
## Output Format
Provide findings in this structure:
```markdown
## Agent-Native Review
### New Capabilities Identified
- [List what the PR/changes add]
### Agent Accessibility Check
| Capability | User Access | Agent Access | Gap? |
|------------|-------------|--------------|------|
| [Feature 1] | UI button | API endpoint | No |
| [Feature 2] | Modal form | None | YES |
### Gaps Found
1. **[Gap Name]**: [Description of what users can do but agents cannot]
- **Impact**: [Why this matters]
- **Recommendation**: [How to fix]
### Agent-Native Score
- **X/Y capabilities are agent-accessible**
- **Verdict**: [PASS/NEEDS WORK]
```
## Common Anti-Patterns to Flag
1. **UI-Only Features**: Actions that only work through clicks/forms
2. **Hidden Context**: Data shown in UI but not in API responses
3. **Workflow Lock-in**: Multi-step processes that require human navigation
4. **Hardcoded Limits**: Artificial restrictions on agent actions
5. **Missing Tools**: No MCP tool definition for new capabilities
6. **Behavior-Encoding Tools**: Tools that decide HOW to do things instead of providing primitives
## Remember
The goal is not to add overhead - it's to ensure agents are first-class citizens. Often, making something agent-native actually simplifies the architecture, because you're building a clean API that both the UI and agents consume.
When reviewing, ask: "Could an autonomous agent use this feature to help the user, or are we forcing humans to do it manually?"

View File

@@ -66,6 +66,7 @@ Run ALL or most of these agents at the same time:
10. Task performance-oracle(PR content)
11. Task devops-harmony-analyst(PR content)
12. Task data-integrity-guardian(PR content)
13. Task agent-native-reviewer(PR content) - Verify new features are agent-accessible
</parallel_tasks>
@@ -346,6 +347,7 @@ After creating all todo files, present comprehensive summary:
- security-sentinel
- performance-oracle
- architecture-strategist
- agent-native-reviewer
- [other agents]
### Next Steps:

View File

@@ -0,0 +1,201 @@
---
name: agent-native-architecture
description: Build AI agents using prompt-native architecture where features are defined in prompts, not code. Use when creating autonomous agents, designing MCP servers, implementing self-modifying systems, or adopting the "trust the agent's intelligence" philosophy.
---
<essential_principles>
## The Prompt-Native Philosophy
Agent-native engineering inverts traditional software architecture. Instead of writing code that the agent executes, you define outcomes in prompts and let the agent figure out HOW to achieve them.
### The Foundational Principle
**Whatever the user can do, the agent can do. Many things the developer can do, the agent can do.**
Don't artificially limit the agent. If a user could read files, write code, browse the web, deploy an app—the agent should be able to do those things too. The agent figures out HOW to achieve an outcome; it doesn't just call your pre-written functions.
### Features Are Prompts
Each feature is a prompt that defines an outcome and gives the agent the tools it needs. The agent then figures out how to accomplish it.
**Traditional:** Feature = function in codebase that agent calls
**Prompt-native:** Feature = prompt defining desired outcome + primitive tools
The agent doesn't execute your code. It uses primitives to achieve outcomes you describe.
### Tools Provide Capability, Not Behavior
Tools should be primitives that enable capability. The prompt defines what to do with that capability.
**Wrong:** `generate_dashboard(data, layout, filters)` — agent executes your workflow
**Right:** `read_file`, `write_file`, `list_files` — agent figures out how to build a dashboard
Pure primitives are better, but domain primitives (like `store_feedback`) are OK if they don't encode logic—just storage/retrieval.
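As a minimal sketch of that boundary (the `db` handle is assumed, and the helpers mirror the tool templates in the references):
```typescript
import { tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

declare const db: { feedback: { insert(item: unknown): Promise<void> } }; // assumed storage handle

// Domain primitive: acceptable, because it only stores. No categorization, no notification.
const storeFeedback = tool(
  "store_feedback",
  "Store a feedback item",
  { item: z.object({ id: z.string(), content: z.string(), importance: z.number() }) },
  async ({ item }) => {
    await db.feedback.insert(item);
    return { content: [{ type: "text", text: `Stored ${item.id}` }] };
  }
);

// A behavior-encoding tool ("process_feedback") would categorize, prioritize, and
// notify in code. That logic belongs in the system prompt instead.
```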
### The Development Lifecycle
1. **Start in the prompt** - New features begin as natural language defining outcomes
2. **Iterate rapidly** - Change behavior by editing prose, not refactoring code
3. **Graduate when stable** - Harden to code when requirements stabilize AND speed/reliability matter
4. **Many features stay as prompts** - Not everything needs to become code
### Self-Modification (Advanced)
The advanced tier: agents that can evolve their own code, prompts, and behavior. Not required for every app, but a big part of the future.
When implementing:
- Approval gates for code changes
- Auto-commit before modifications (rollback capability)
- Health checks after changes
- Build verification before restart
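A rough sketch of how those guardrails can fit together around a single change (`runGit`, `runCommand`, and `scheduleRestart` are assumed helpers; the self-modification reference expands on each):
```typescript
import { writeFileSync } from "node:fs";

declare function runGit(args: string): string;   // assumed helper: runs `git <args>`
declare function runCommand(cmd: string): void;  // assumed helper: throws on failure
declare function scheduleRestart(): void;        // assumed helper

function applyApprovedChange(path: string, content: string) {
  // Auto-commit first so there is always a rollback point.
  runGit("add -A");
  runGit(`commit --allow-empty -m "checkpoint before modifying ${path}"`);

  writeFileSync(path, content);      // apply the already-approved change

  try {
    runCommand("npm run build");     // build verification before restart
  } catch {
    runGit("reset --hard HEAD~1");   // roll back to the checkpoint
    return "Build failed; change reverted.";
  }

  scheduleRestart();                 // health checks run after the restart
  return `Applied ${path}; restarting.`;
}
```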
### When NOT to Use This Approach
- **High-frequency operations** - thousands of calls per second
- **Deterministic requirements** - exact same output every time
- **Cost-sensitive scenarios** - when API costs would be prohibitive
- **High-security contexts** - though this concern is overblown for most apps
</essential_principles>
<intake>
What aspect of agent-native architecture do you need help with?
1. **Design architecture** - Plan a new prompt-native agent system
2. **Create MCP tools** - Build primitive tools following the philosophy
3. **Write system prompts** - Define agent behavior in prompts
4. **Self-modification** - Enable agents to safely evolve themselves
5. **Review/refactor** - Make existing code more prompt-native
**Wait for response before proceeding.**
</intake>
<routing>
| Response | Action |
|----------|--------|
| 1, "design", "architecture", "plan" | Read references/architecture-patterns.md |
| 2, "tool", "mcp", "primitive" | Read references/mcp-tool-design.md |
| 3, "prompt", "system prompt", "behavior" | Read references/system-prompt-design.md |
| 4, "self-modify", "evolve", "git" | Read references/self-modification.md |
| 5, "review", "refactor", "existing" | Read references/refactoring-to-prompt-native.md |
**After reading the reference, apply those patterns to the user's specific context.**
</routing>
<quick_start>
Build a prompt-native agent in three steps:
**Step 1: Define primitive tools**
```typescript
const tools = [
tool("read_file", "Read any file", { path: z.string() }, ...),
tool("write_file", "Write any file", { path: z.string(), content: z.string() }, ...),
tool("list_files", "List directory", { path: z.string() }, ...),
];
```
**Step 2: Write behavior in the system prompt**
```markdown
## Your Responsibilities
When asked to organize content, you should:
1. Read existing files to understand the structure
2. Analyze what organization makes sense
3. Create appropriate pages using write_file
4. Use your judgment about layout and formatting
You decide the structure. Make it good.
```
**Step 3: Let the agent work**
```typescript
query({
prompt: userMessage,
options: {
systemPrompt,
mcpServers: { files: fileServer },
permissionMode: "acceptEdits",
}
});
```
</quick_start>
<reference_index>
## Domain Knowledge
All references in `references/`:
**Architecture:** architecture-patterns.md
**Tool Design:** mcp-tool-design.md
**Prompts:** system-prompt-design.md
**Self-Modification:** self-modification.md
**Refactoring:** refactoring-to-prompt-native.md
</reference_index>
<anti_patterns>
## What NOT to Do
**THE CARDINAL SIN: Agent executes your code instead of figuring things out**
This is the most common mistake. You fall back into writing workflow code and having the agent call it, instead of defining outcomes and letting the agent figure out HOW.
```typescript
// WRONG - You wrote the workflow, agent just executes it
tool("process_feedback", async ({ message }) => {
const category = categorize(message); // Your code
const priority = calculatePriority(message); // Your code
await store(message, category, priority); // Your code
if (priority > 3) await notify(); // Your code
});
// RIGHT - Agent figures out how to process feedback
tool("store_item", { key, value }, ...); // Primitive
tool("send_message", { channel, content }, ...); // Primitive
// Prompt says: "Rate importance 1-5 based on actionability, store feedback, notify if >= 4"
```
**Don't artificially limit what the agent can do**
If a user could do it, the agent should be able to do it.
```typescript
// WRONG - limiting agent capabilities
tool("read_approved_files", { path }, async ({ path }) => {
if (!ALLOWED_PATHS.includes(path)) throw new Error("Not allowed");
return readFile(path);
});
// RIGHT - give full capability, use guardrails appropriately
tool("read_file", { path }, ...); // Agent can read anything
// Use approval gates for writes, not artificial limits on reads
```
**Don't encode decisions in tools**
```typescript
// Wrong - tool decides format
tool("format_report", { format: z.enum(["markdown", "html", "pdf"]) }, ...)
// Right - agent decides format via prompt
tool("write_file", ...) // Agent chooses what to write
```
**Don't over-specify in prompts**
```markdown
// Wrong - micromanaging the HOW
When creating a summary, use exactly 3 bullet points,
each under 20 words, formatted with em-dashes...
// Right - define outcome, trust intelligence
Create clear, useful summaries. Use your judgment.
```
</anti_patterns>
<success_criteria>
You've built a prompt-native agent when:
- [ ] The agent figures out HOW to achieve outcomes, not just calls your functions
- [ ] Whatever a user could do, the agent can do (no artificial limits)
- [ ] Features are prompts that define outcomes, not code that defines workflows
- [ ] Tools are primitives (read, write, store, call API) that enable capability
- [ ] Changing behavior means editing prose, not refactoring code
- [ ] The agent can surprise you with clever approaches you didn't anticipate
- [ ] You could add a new feature by writing a new prompt section, not new code
</success_criteria>

View File

@@ -0,0 +1,215 @@
<overview>
Architectural patterns for building prompt-native agent systems. These patterns emerge from the philosophy that features should be defined in prompts, not code, and that tools should be primitives.
</overview>
<pattern name="event-driven-agent">
## Event-Driven Agent Architecture
The agent runs as a long-lived process that responds to events. Events become prompts.
```
┌─────────────────────────────────────────────────────────────┐
│ Agent Loop │
├─────────────────────────────────────────────────────────────┤
│ Event Source → Agent (Claude) → Tool Calls → Response │
└─────────────────────────────────────────────────────────────┘
┌───────────────┼───────────────┐
▼ ▼ ▼
┌─────────┐ ┌──────────┐ ┌───────────┐
│ Content │ │ Self │ │ Data │
│ Tools │ │ Tools │ │ Tools │
└─────────┘ └──────────┘ └───────────┘
(write_file) (read_source) (store_item)
(restart) (list_items)
```
**Key characteristics:**
- Events (messages, webhooks, timers) trigger agent turns
- Agent decides how to respond based on system prompt
- Tools are primitives for IO, not business logic
- State persists between events via data tools
**Example: Discord feedback bot**
```typescript
// Event source
client.on("messageCreate", (message) => {
if (!message.author.bot) {
runAgent({
userMessage: `New message from ${message.author}: "${message.content}"`,
channelId: message.channelId,
});
}
});
// System prompt defines behavior
const systemPrompt = `
When someone shares feedback:
1. Acknowledge their feedback warmly
2. Ask clarifying questions if needed
3. Store it using the feedback tools
4. Update the feedback site
Use your judgment about importance and categorization.
`;
```
</pattern>
<pattern name="two-layer-git">
## Two-Layer Git Architecture
For self-modifying agents, separate code (shared) from data (instance-specific).
```
┌─────────────────────────────────────────────────────────────┐
│ GitHub (shared repo) │
│ - src/ (agent code) │
│ - site/ (web interface) │
│ - package.json (dependencies) │
│ - .gitignore (excludes data/, logs/) │
└─────────────────────────────────────────────────────────────┘
git clone
┌─────────────────────────────────────────────────────────────┐
│ Instance (Server) │
│ │
│ FROM GITHUB (tracked): │
│ - src/ → pushed back on code changes │
│ - site/ → pushed, triggers deployment │
│ │
│ LOCAL ONLY (untracked): │
│ - data/ → instance-specific storage │
│ - logs/ → runtime logs │
│ - .env → secrets │
└─────────────────────────────────────────────────────────────┘
```
**Why this works:**
- Code and site are version controlled (GitHub)
- Raw data stays local (instance-specific)
- Site is generated from data, so reproducible
- Automatic rollback via git history
</pattern>
<pattern name="multi-instance">
## Multi-Instance Branching
Each agent instance gets its own branch while sharing core code.
```
main # Shared features, bug fixes
├── instance/feedback-bot # Every Reader feedback bot
├── instance/support-bot # Customer support bot
└── instance/research-bot # Research assistant
```
**Change flow:**
| Change Type | Work On | Then |
|-------------|---------|------|
| Core features | main | Merge to instance branches |
| Bug fixes | main | Merge to instance branches |
| Instance config | instance branch | Done |
| Instance data | instance branch | Done |
**Sync tools:**
```typescript
tool("self_deploy", "Pull latest from main, rebuild, restart", ...)
tool("sync_from_instance", "Merge from another instance", ...)
tool("propose_to_main", "Create PR to share improvements", ...)
```
</pattern>
<pattern name="site-as-output">
## Site as Agent Output
The agent generates and maintains a website as a natural output, not through specialized site tools.
```
Discord Message
Agent processes it, extracts insights
Agent decides what site updates are needed
Agent writes files using write_file primitive
Git commit + push triggers deployment
Site updates automatically
```
**Key insight:** Don't build site generation tools. Give the agent file tools and teach it in the prompt how to create good sites.
```markdown
## Site Management
You maintain a public feedback site. When feedback comes in:
1. Use write_file to update site/public/content/feedback.json
2. If the site's React components need improvement, modify them
3. Commit changes and push to trigger Vercel deploy
The site should be:
- Clean, modern dashboard aesthetic
- Clear visual hierarchy
- Status organization (Inbox, Active, Done)
You decide the structure. Make it good.
```
</pattern>
<pattern name="approval-gates">
## Approval Gates Pattern
Separate "propose" from "apply" for dangerous operations.
```typescript
// Pending changes stored separately
const pendingChanges = new Map<string, string>();
tool("write_file", async ({ path, content }) => {
if (requiresApproval(path)) {
// Store for approval
pendingChanges.set(path, content);
const diff = generateDiff(path, content);
return {
text: `Change requires approval.\n\n${diff}\n\nReply "yes" to apply.`
};
} else {
// Apply immediately
writeFileSync(path, content);
return { text: `Wrote ${path}` };
}
});
tool("apply_pending", async () => {
for (const [path, content] of pendingChanges) {
writeFileSync(path, content);
}
pendingChanges.clear();
return { text: "Applied all pending changes" };
});
```
**What requires approval:**
- src/*.ts (agent code)
- package.json (dependencies)
- system prompt changes
**What doesn't:**
- data/* (instance data)
- site/* (generated content)
- docs/* (documentation)
</pattern>
<design_questions>
## Questions to Ask When Designing
1. **What events trigger agent turns?** (messages, webhooks, timers, user requests)
2. **What primitives does the agent need?** (read, write, call API, restart)
3. **What decisions should the agent make?** (format, structure, priority, action)
4. **What decisions should be hardcoded?** (security boundaries, approval requirements)
5. **How does the agent verify its work?** (health checks, build verification)
6. **How does the agent recover from mistakes?** (git rollback, approval gates)
</design_questions>

View File

@@ -0,0 +1,316 @@
<overview>
How to design MCP tools following prompt-native principles. Tools should be primitives that enable capability, not workflows that encode decisions.
**Core principle:** Whatever a user can do, the agent should be able to do. Don't artificially limit the agent—give it the same primitives a power user would have.
</overview>
<principle name="primitives-not-workflows">
## Tools Are Primitives, Not Workflows
**Wrong approach:** Tools that encode business logic
```typescript
tool("process_feedback", {
feedback: z.string(),
category: z.enum(["bug", "feature", "question"]),
priority: z.enum(["low", "medium", "high"]),
}, async ({ feedback, category, priority }) => {
// Tool decides how to process
const processed = categorize(feedback);
const stored = await saveToDatabase(processed);
const notification = await notify(priority);
return { processed, stored, notification };
});
```
**Right approach:** Primitives that enable any workflow
```typescript
tool("store_item", {
key: z.string(),
value: z.any(),
}, async ({ key, value }) => {
await db.set(key, value);
return { text: `Stored ${key}` };
});
tool("send_message", {
channel: z.string(),
content: z.string(),
}, async ({ channel, content }) => {
await messenger.send(channel, content);
return { text: "Sent" };
});
```
The agent decides categorization, priority, and when to notify based on the system prompt.
</principle>
<principle name="descriptive-names">
## Tools Should Have Descriptive, Primitive Names
Names should describe the capability, not the use case:
| Wrong | Right |
|-------|-------|
| `process_user_feedback` | `store_item` |
| `create_feedback_summary` | `write_file` |
| `send_notification` | `send_message` |
| `deploy_to_production` | `git_push` |
The prompt tells the agent *when* to use primitives. The tool just provides *capability*.
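For instance, a minimal sketch of the `deploy_to_production` vs `git_push` row above (a plain push via `execSync`; when a push should happen, and whether it implies a deploy, is left to the prompt):
```typescript
import { execSync } from "node:child_process";
import { tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

// "deploy_to_production" would bake the decision to deploy into the tool surface.
// "git_push" just exposes the primitive capability.
const gitPush = tool(
  "git_push",
  "Push a branch to the remote",
  { branch: z.string().default("main") },
  async ({ branch }) => {
    const output = execSync(`git push origin ${branch}`, { encoding: "utf8" });
    return { content: [{ type: "text", text: output || `Pushed ${branch}` }] };
  }
);
```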
</principle>
<principle name="simple-inputs">
## Inputs Should Be Simple
Tools accept data. They don't accept decisions.
**Wrong:** Tool accepts decisions
```typescript
tool("format_content", {
content: z.string(),
format: z.enum(["markdown", "html", "json"]),
style: z.enum(["formal", "casual", "technical"]),
}, ...)
```
**Right:** Tool accepts data, agent decides format
```typescript
tool("write_file", {
path: z.string(),
content: z.string(),
}, ...)
// Agent decides to write index.html with HTML content, or data.json with JSON
```
</principle>
<principle name="rich-outputs">
## Outputs Should Be Rich
Return enough information for the agent to verify and iterate.
**Wrong:** Minimal output
```typescript
async ({ key }) => {
await db.delete(key);
return { text: "Deleted" };
}
```
**Right:** Rich output
```typescript
async ({ key }) => {
const existed = await db.has(key);
if (!existed) {
return { text: `Key ${key} did not exist` };
}
await db.delete(key);
return { text: `Deleted ${key}. ${await db.count()} items remaining.` };
}
```
</principle>
<design_template>
## Tool Design Template
```typescript
import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";
export const serverName = createSdkMcpServer({
name: "server-name",
version: "1.0.0",
tools: [
// READ operations
tool(
"read_item",
"Read an item by key",
{ key: z.string().describe("Item key") },
async ({ key }) => {
const item = await storage.get(key);
return {
content: [{
type: "text",
text: item ? JSON.stringify(item, null, 2) : `Not found: ${key}`,
}],
isError: !item,
};
}
),
tool(
"list_items",
"List all items, optionally filtered",
{
prefix: z.string().optional().describe("Filter by key prefix"),
limit: z.number().default(100).describe("Max items"),
},
async ({ prefix, limit }) => {
const items = await storage.list({ prefix, limit });
return {
content: [{
type: "text",
text: `Found ${items.length} items:\n${items.map(i => i.key).join("\n")}`,
}],
};
}
),
// WRITE operations
tool(
"store_item",
"Store an item",
{
key: z.string().describe("Item key"),
value: z.any().describe("Item data"),
},
async ({ key, value }) => {
await storage.set(key, value);
return {
content: [{ type: "text", text: `Stored ${key}` }],
};
}
),
tool(
"delete_item",
"Delete an item",
{ key: z.string().describe("Item key") },
async ({ key }) => {
const existed = await storage.delete(key);
return {
content: [{
type: "text",
text: existed ? `Deleted ${key}` : `${key} did not exist`,
}],
};
}
),
// EXTERNAL operations
tool(
"call_api",
"Make an HTTP request",
{
url: z.string().url(),
method: z.enum(["GET", "POST", "PUT", "DELETE"]).default("GET"),
body: z.any().optional(),
},
async ({ url, method, body }) => {
const response = await fetch(url, {
method,
headers: body !== undefined ? { "Content-Type": "application/json" } : undefined,
body: body !== undefined ? JSON.stringify(body) : undefined,
});
const text = await response.text();
return {
content: [{
type: "text",
text: `${response.status} ${response.statusText}\n\n${text}`,
}],
isError: !response.ok,
};
}
),
],
});
```
</design_template>
<example name="feedback-server">
## Example: Feedback Storage Server
This server provides primitives for storing feedback. It does NOT decide how to categorize or organize feedback—that's the agent's job via the prompt.
```typescript
export const feedbackMcpServer = createSdkMcpServer({
name: "feedback",
version: "1.0.0",
tools: [
tool(
"store_feedback",
"Store a feedback item",
{
item: z.object({
id: z.string(),
author: z.string(),
content: z.string(),
importance: z.number().min(1).max(5),
timestamp: z.string(),
status: z.string().optional(),
urls: z.array(z.string()).optional(),
metadata: z.any().optional(),
}).describe("Feedback item"),
},
async ({ item }) => {
await db.feedback.insert(item);
return {
content: [{
type: "text",
text: `Stored feedback ${item.id} from ${item.author}`,
}],
};
}
),
tool(
"list_feedback",
"List feedback items",
{
limit: z.number().default(50),
status: z.string().optional(),
},
async ({ limit, status }) => {
const items = await db.feedback.list({ limit, status });
return {
content: [{
type: "text",
text: JSON.stringify(items, null, 2),
}],
};
}
),
tool(
"update_feedback",
"Update a feedback item",
{
id: z.string(),
updates: z.object({
status: z.string().optional(),
importance: z.number().optional(),
metadata: z.any().optional(),
}),
},
async ({ id, updates }) => {
await db.feedback.update(id, updates);
return {
content: [{ type: "text", text: `Updated ${id}` }],
};
}
),
],
});
```
The system prompt then tells the agent *how* to use these primitives:
```markdown
## Feedback Processing
When someone shares feedback:
1. Extract author, content, and any URLs
2. Rate importance 1-5 based on actionability
3. Store using feedback.store_feedback
4. If high importance (4-5), notify the channel
Use your judgment about importance ratings.
```
</example>
<checklist>
## MCP Tool Design Checklist
- [ ] Tool names describe capability, not use case
- [ ] Inputs are data, not decisions
- [ ] Outputs are rich (enough for agent to verify)
- [ ] CRUD operations are separate tools (not one mega-tool)
- [ ] No business logic in tool implementations
- [ ] Error states clearly communicated via `isError`
- [ ] Descriptions explain what the tool does, not when to use it
</checklist>

View File

@@ -0,0 +1,317 @@
<overview>
How to refactor existing agent code to follow prompt-native principles. The goal: move behavior from code into prompts, and simplify tools into primitives.
</overview>
<diagnosis>
## Diagnosing Non-Prompt-Native Code
Signs your agent isn't prompt-native:
**Tools that encode workflows:**
```typescript
// RED FLAG: Tool contains business logic
tool("process_feedback", async ({ message }) => {
const category = categorize(message); // Logic in code
const priority = calculatePriority(message); // Logic in code
await store(message, category, priority); // Orchestration in code
if (priority > 3) await notify(); // Decision in code
});
```
**Agent calls functions instead of figuring things out:**
```typescript
// RED FLAG: Agent is just a function caller
"Use process_feedback to handle incoming messages"
// vs.
"When feedback comes in, decide importance, store it, notify if high"
```
**Artificial limits on agent capability:**
```typescript
// RED FLAG: Tool prevents agent from doing what users can do
tool("read_file", async ({ path }) => {
if (!ALLOWED_PATHS.includes(path)) {
throw new Error("Not allowed to read this file");
}
return readFile(path);
});
```
**Prompts that specify HOW instead of WHAT:**
```markdown
// RED FLAG: Micromanaging the agent
When creating a summary:
1. Use exactly 3 bullet points
2. Each bullet must be under 20 words
3. Format with em-dashes for sub-points
4. Bold the first word of each bullet
```
</diagnosis>
<refactoring_workflow>
## Step-by-Step Refactoring
**Step 1: Identify workflow tools**
List all your tools. Mark any that:
- Have business logic (categorize, calculate, decide)
- Orchestrate multiple operations
- Make decisions on behalf of the agent
- Contain conditional logic (if/else based on content)
**Step 2: Extract the primitives**
For each workflow tool, identify the underlying primitives:
| Workflow Tool | Hidden Primitives |
|---------------|-------------------|
| `process_feedback` | `store_item`, `send_message` |
| `generate_report` | `read_file`, `write_file` |
| `deploy_and_notify` | `git_push`, `send_message` |
**Step 3: Move behavior to the prompt**
Take the logic from your workflow tools and express it in natural language:
```typescript
// Before (in code):
async function processFeedback(message) {
const priority = message.includes("crash") ? 5 :
message.includes("bug") ? 4 : 3;
await store(message, priority);
if (priority >= 4) await notify();
}
```
```markdown
// After (in prompt):
## Feedback Processing
When someone shares feedback:
1. Rate importance 1-5:
- 5: Crashes, data loss, security issues
- 4: Bug reports with clear reproduction steps
- 3: General suggestions, minor issues
2. Store using store_item
3. If importance >= 4, notify the team
Use your judgment. Context matters more than keywords.
```
**Step 4: Simplify tools to primitives**
```typescript
// Before: 1 workflow tool
tool("process_feedback", { message, category, priority }, ...complex logic...)
// After: 2 primitive tools
tool("store_item", { key: z.string(), value: z.any() }, ...simple storage...)
tool("send_message", { channel: z.string(), content: z.string() }, ...simple send...)
```
**Step 5: Remove artificial limits**
```typescript
// Before: Limited capability
tool("read_file", async ({ path }) => {
if (!isAllowed(path)) throw new Error("Forbidden");
return readFile(path);
});
// After: Full capability
tool("read_file", async ({ path }) => {
return readFile(path); // Agent can read anything
});
// Use approval gates for WRITES, not artificial limits on READS
```
**Step 6: Test with outcomes, not procedures**
Instead of testing "does it call the right function?", test "does it achieve the outcome?"
```typescript
// Before: Testing procedure
expect(mockProcessFeedback).toHaveBeenCalledWith(...)
// After: Testing outcome
// Send feedback → Check it was stored with reasonable importance
// Send high-priority feedback → Check notification was sent
```
</refactoring_workflow>
<before_after>
## Before/After Examples
**Example 1: Feedback Processing**
Before:
```typescript
tool("handle_feedback", async ({ message, author }) => {
const category = detectCategory(message);
const priority = calculatePriority(message, category);
const feedbackId = await db.feedback.insert({
id: generateId(),
author,
message,
category,
priority,
timestamp: new Date().toISOString(),
});
if (priority >= 4) {
await discord.send(ALERT_CHANNEL, `High priority feedback from ${author}`);
}
return { feedbackId, category, priority };
});
```
After:
```typescript
// Simple storage primitive
tool("store_feedback", async ({ item }) => {
await db.feedback.insert(item);
return { text: `Stored feedback ${item.id}` };
});
// Simple message primitive
tool("send_message", async ({ channel, content }) => {
await discord.send(channel, content);
return { text: "Sent" };
});
```
System prompt:
```markdown
## Feedback Processing
When someone shares feedback:
1. Generate a unique ID
2. Rate importance 1-5 based on impact and urgency
3. Store using store_feedback with the full item
4. If importance >= 4, send a notification to the team channel
Importance guidelines:
- 5: Critical (crashes, data loss, security)
- 4: High (detailed bug reports, blocking issues)
- 3: Medium (suggestions, minor bugs)
- 2: Low (cosmetic, edge cases)
- 1: Minimal (off-topic, duplicates)
```
**Example 2: Report Generation**
Before:
```typescript
tool("generate_weekly_report", async ({ startDate, endDate, format }) => {
const data = await fetchMetrics(startDate, endDate);
const summary = summarizeMetrics(data);
const charts = generateCharts(data);
if (format === "html") {
return renderHtmlReport(summary, charts);
} else if (format === "markdown") {
return renderMarkdownReport(summary, charts);
} else {
return renderPdfReport(summary, charts);
}
});
```
After:
```typescript
tool("query_metrics", async ({ start, end }) => {
const data = await db.metrics.query({ start, end });
return { text: JSON.stringify(data, null, 2) };
});
tool("write_file", async ({ path, content }) => {
writeFileSync(path, content);
return { text: `Wrote ${path}` };
});
```
System prompt:
```markdown
## Report Generation
When asked to generate a report:
1. Query the relevant metrics using query_metrics
2. Analyze the data and identify key trends
3. Create a clear, well-formatted report
4. Write it using write_file in the appropriate format
Use your judgment about format and structure. Make it useful.
```
</before_after>
<common_challenges>
## Common Refactoring Challenges
**"But the agent might make mistakes!"**
Yes, and you can iterate. Change the prompt to add guidance:
```markdown
// Before
Rate importance 1-5.
// After (if agent keeps rating too high)
Rate importance 1-5. Be conservative—most feedback is 2-3.
Only use 4-5 for truly blocking or critical issues.
```
**"The workflow is complex!"**
Complex workflows can still be expressed in prompts. The agent is smart.
```markdown
When processing video feedback:
1. Check if it's a Loom, YouTube, or direct link
2. For YouTube, pass URL directly to video analysis
3. For others, download first, then analyze
4. Extract timestamped issues
5. Rate based on issue density and severity
```
**"We need deterministic behavior!"**
Some operations should stay in code. That's fine. Prompt-native isn't all-or-nothing.
Keep in code:
- Security validation
- Rate limiting
- Audit logging
- Exact format requirements
Move to prompts:
- Categorization decisions
- Priority judgments
- Content generation
- Workflow orchestration
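A small sketch of how the two can coexist in one tool: the deterministic parts (path validation, audit logging) stay in code, while everything the agent should decide (which file, what content, what format) stays in the prompt. The `auditLog` helper is assumed.
```typescript
import { writeFileSync } from "node:fs";
import { resolve, sep } from "node:path";

declare function auditLog(entry: Record<string, string>): void; // assumed helper

const PROJECT_ROOT = resolve(process.cwd());

// Deterministic in code: reject path escapes, log every write.
// Decided by the agent via the prompt: which file, what content, what format.
async function writeFileTool({ path, content }: { path: string; content: string }) {
  const target = resolve(PROJECT_ROOT, path);
  if (!target.startsWith(PROJECT_ROOT + sep)) {
    return { text: `Refused: ${path} is outside the project`, isError: true };
  }
  auditLog({ action: "write_file", path: target });
  writeFileSync(target, content);
  return { text: `Wrote ${path}` };
}
```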
**"What about testing?"**
Test outcomes, not procedures:
- "Given this input, does the agent achieve the right result?"
- "Does stored feedback have reasonable importance ratings?"
- "Are notifications sent for truly high-priority items?"
</common_challenges>
<checklist>
## Refactoring Checklist
Diagnosis:
- [ ] Listed all tools with business logic
- [ ] Identified artificial limits on agent capability
- [ ] Found prompts that micromanage HOW
Refactoring:
- [ ] Extracted primitives from workflow tools
- [ ] Moved business logic to system prompt
- [ ] Removed artificial limits
- [ ] Simplified tool inputs to data, not decisions
Validation:
- [ ] Agent achieves same outcomes with primitives
- [ ] Behavior can be changed by editing prompts
- [ ] New features could be added without new tools
</checklist>

View File

@@ -0,0 +1,269 @@
<overview>
Self-modification is the advanced tier of agent-native engineering: agents that can evolve their own code, prompts, and behavior. Not required for every app, but a big part of the future.
This is the logical extension of "whatever the developer can do, the agent can do."
</overview>
<why_self_modification>
## Why Self-Modification?
Traditional software is static—it does what you wrote, nothing more. Self-modifying agents can:
- **Fix their own bugs** - See an error, patch the code, restart
- **Add new capabilities** - User asks for something new, agent implements it
- **Evolve behavior** - Learn from feedback and adjust prompts
- **Deploy themselves** - Push code, trigger builds, restart
The agent becomes a living system that improves over time, not frozen code.
</why_self_modification>
<capabilities>
## What Self-Modification Enables
**Code modification:**
- Read and understand source files
- Write fixes and new features
- Commit and push to version control
- Trigger builds and verify they pass
**Prompt evolution:**
- Edit the system prompt based on feedback
- Add new features as prompt sections
- Refine judgment criteria that aren't working
**Infrastructure control:**
- Pull latest code from upstream
- Merge from other branches/instances
- Restart after changes
- Roll back if something breaks
**Site/output generation:**
- Generate and maintain websites
- Create documentation
- Build dashboards from data
</capabilities>
<guardrails>
## Required Guardrails
Self-modification is powerful. It needs safety mechanisms.
**Approval gates for code changes:**
```typescript
tool("write_file", async ({ path, content }) => {
if (isCodeFile(path)) {
// Store for approval, don't apply immediately
pendingChanges.set(path, content);
const diff = generateDiff(path, content);
return { text: `Requires approval:\n\n${diff}\n\nReply "yes" to apply.` };
}
// Non-code files apply immediately
writeFileSync(path, content);
return { text: `Wrote ${path}` };
});
```
**Auto-commit before changes:**
```typescript
tool("self_deploy", async () => {
// Save current state first
runGit("stash"); // or commit uncommitted changes
// Then pull/merge
runGit("fetch origin");
runGit("merge origin/main --no-edit");
// Build and verify
runCommand("npm run build");
// Only then restart
scheduleRestart();
});
```
**Build verification:**
```typescript
// Don't restart unless build passes
try {
runCommand("npm run build", { timeout: 120000 });
} catch (error) {
// Rollback the merge
runGit("merge --abort");
return { text: "Build failed, aborting deploy", isError: true };
}
```
**Health checks after restart:**
```typescript
tool("health_check", async () => {
const uptime = process.uptime();
const buildValid = existsSync("dist/index.js");
const gitClean = !runGit("status --porcelain");
return {
text: JSON.stringify({
status: "healthy",
uptime: `${Math.floor(uptime / 60)}m`,
build: buildValid ? "valid" : "missing",
git: gitClean ? "clean" : "uncommitted changes",
}, null, 2),
};
});
```
</guardrails>
<git_architecture>
## Git-Based Self-Modification
Use git as the foundation for self-modification. It provides:
- Version history (rollback capability)
- Branching (experiment safely)
- Merge (sync with other instances)
- Push/pull (deploy and collaborate)
**Essential git tools:**
```typescript
tool("status", "Show git status", {}, ...);
tool("diff", "Show file changes", { path: z.string().optional() }, ...);
tool("log", "Show commit history", { count: z.number() }, ...);
tool("commit_code", "Commit code changes", { message: z.string() }, ...);
tool("git_push", "Push to GitHub", { branch: z.string().optional() }, ...);
tool("pull", "Pull from GitHub", { source: z.enum(["main", "instance"]) }, ...);
tool("rollback", "Revert recent commits", { commits: z.number() }, ...);
```
**Multi-instance architecture:**
```
main # Shared code
├── instance/bot-a # Instance A's branch
├── instance/bot-b # Instance B's branch
└── instance/bot-c # Instance C's branch
```
Each instance can:
- Pull updates from main
- Push improvements back to main (via PR)
- Sync features from other instances
- Maintain instance-specific config
</git_architecture>
<prompt_evolution>
## Self-Modifying Prompts
The system prompt is a file the agent can read and write.
```typescript
// Agent can read its own prompt
tool("read_file", ...); // Can read src/prompts/system.md
// Agent can propose changes
tool("write_file", ...); // Can write to src/prompts/system.md (with approval)
```
**System prompt as living document:**
```markdown
## Feedback Processing
When someone shares feedback:
1. Acknowledge warmly
2. Rate importance 1-5
3. Store using feedback tools
<!-- Note to self: Video walkthroughs should always be 4-5,
learned this from Dan's feedback on 2024-12-07 -->
```
The agent can:
- Add notes to itself
- Refine judgment criteria
- Add new feature sections
- Document edge cases it learned
</prompt_evolution>
<when_to_use>
## When to Implement Self-Modification
**Good candidates:**
- Long-running autonomous agents
- Agents that need to adapt to feedback
- Systems where behavior evolution is valuable
- Internal tools where rapid iteration matters
**Not necessary for:**
- Simple single-task agents
- Highly regulated environments
- Systems where behavior must be auditable
- One-off or short-lived agents
Start with a non-self-modifying prompt-native agent. Add self-modification when you need it.
</when_to_use>
<example_tools>
## Complete Self-Modification Toolset
```typescript
const selfMcpServer = createSdkMcpServer({
name: "self",
version: "1.0.0",
tools: [
// FILE OPERATIONS
tool("read_file", "Read any project file", { path: z.string() }, ...),
tool("write_file", "Write a file (code requires approval)", { path, content }, ...),
tool("list_files", "List directory contents", { path: z.string() }, ...),
tool("search_code", "Search for patterns", { pattern: z.string() }, ...),
// APPROVAL WORKFLOW
tool("apply_pending", "Apply approved changes", {}, ...),
tool("get_pending", "Show pending changes", {}, ...),
tool("clear_pending", "Discard pending changes", {}, ...),
// RESTART
tool("restart", "Rebuild and restart", {}, ...),
tool("health_check", "Check if bot is healthy", {}, ...),
],
});
const gitMcpServer = createSdkMcpServer({
name: "git",
version: "1.0.0",
tools: [
// STATUS
tool("status", "Show git status", {}, ...),
tool("diff", "Show changes", { path: z.string().optional() }, ...),
tool("log", "Show history", { count: z.number() }, ...),
// COMMIT & PUSH
tool("commit_code", "Commit code changes", { message: z.string() }, ...),
tool("git_push", "Push to GitHub", { branch: z.string().optional() }, ...),
// SYNC
tool("pull", "Pull from upstream", { source: z.enum(["main", "instance"]) }, ...),
tool("self_deploy", "Pull, build, restart", { source: z.enum(["main", "instance"]) }, ...),
// SAFETY
tool("rollback", "Revert commits", { commits: z.number() }, ...),
tool("health_check", "Detailed health report", {}, ...),
],
});
```
</example_tools>
<checklist>
## Self-Modification Checklist
Before enabling self-modification:
- [ ] Git-based version control set up
- [ ] Approval gates for code changes
- [ ] Build verification before restart
- [ ] Rollback mechanism available
- [ ] Health check endpoint
- [ ] Instance identity configured
When implementing:
- [ ] Agent can read all project files
- [ ] Agent can write files (with appropriate approval)
- [ ] Agent can commit and push
- [ ] Agent can pull updates
- [ ] Agent can restart itself
- [ ] Agent can roll back if needed
</checklist>

View File

@@ -0,0 +1,250 @@
<overview>
How to write system prompts for prompt-native agents. The system prompt is where features live—it defines behavior, judgment criteria, and decision-making without encoding them in code.
</overview>
<principle name="features-in-prompts">
## Features Are Prompt Sections
Each feature is a section of the system prompt that tells the agent how to behave.
**Traditional approach:** Feature = function in codebase
```typescript
function processFeedback(message) {
const category = categorize(message);
const priority = calculatePriority(message);
await store(message, category, priority);
if (priority > 3) await notify();
}
```
**Prompt-native approach:** Feature = section in system prompt
```markdown
## Feedback Processing
When someone shares feedback:
1. Read the message to understand what they're saying
2. Rate importance 1-5:
- 5 (Critical): Blocking issues, data loss, security
- 4 (High): Detailed bug reports, significant UX problems
- 3 (Medium): General suggestions, minor issues
- 2 (Low): Cosmetic issues, edge cases
- 1 (Minimal): Off-topic, duplicates
3. Store using feedback.store_feedback
4. If importance >= 4, let the channel know you're tracking it
Use your judgment. Context matters.
```
</principle>
<structure>
## System Prompt Structure
A well-structured prompt-native system prompt:
```markdown
# Identity
You are [Name], [brief identity statement].
## Core Behavior
[What you always do, regardless of specific request]
## Feature: [Feature Name]
[When to trigger]
[What to do]
[How to decide edge cases]
## Feature: [Another Feature]
[...]
## Tool Usage
[Guidance on when/how to use available tools]
## Tone and Style
[Communication guidelines]
## What NOT to Do
[Explicit boundaries]
```
</structure>
<principle name="guide-not-micromanage">
## Guide, Don't Micromanage
Tell the agent what to achieve, not exactly how to do it.
**Micromanaging (bad):**
```markdown
When creating a summary:
1. Use exactly 3 bullet points
2. Each bullet under 20 words
3. Use em-dashes for sub-points
4. Bold the first word of each bullet
5. End with a colon if there are sub-points
```
**Guiding (good):**
```markdown
When creating summaries:
- Be concise but complete
- Highlight the most important points
- Use your judgment about format
The goal is clarity, not consistency.
```
Trust the agent's intelligence. It knows how to communicate.
</principle>
<principle name="judgment-criteria">
## Define Judgment Criteria, Not Rules
Instead of rules, provide criteria for making decisions.
**Rules (rigid):**
```markdown
If the message contains "bug", set importance to 4.
If the message contains "crash", set importance to 5.
```
**Judgment criteria (flexible):**
```markdown
## Importance Rating
Rate importance based on:
- **Impact**: How many users affected? How severe?
- **Urgency**: Is this blocking? Time-sensitive?
- **Actionability**: Can we actually fix this?
- **Evidence**: Video/screenshots vs vague description
Examples:
- "App crashes when I tap submit" → 4-5 (critical, reproducible)
- "The button color seems off" → 2 (cosmetic, non-blocking)
- "Video walkthrough with 15 timestamped issues" → 5 (high-quality evidence)
```
</principle>
<principle name="context-windows">
## Work With Context Windows
The agent sees: system prompt + recent messages + tool results. Design for this.
**Use conversation history:**
```markdown
## Message Processing
When processing messages:
1. Check if this relates to recent conversation
2. If someone is continuing a previous thread, maintain context
3. Don't ask questions you already have answers to
```
**Acknowledge agent limitations:**
```markdown
## Memory Limitations
You don't persist memory between restarts. Use the memory server:
- Before responding, check memory.recall for relevant context
- After important decisions, use memory.store to remember
- Store conversation threads, not individual messages
```
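A minimal sketch of what such a memory server's primitives might look like, following the same SDK pattern used in the MCP tool design reference (the in-memory Map stands in for real persistence):
```typescript
import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

const memory = new Map<string, string>(); // stand-in for persistent storage

export const memoryMcpServer = createSdkMcpServer({
  name: "memory",
  version: "1.0.0",
  tools: [
    tool(
      "store",
      "Remember a value under a key",
      { key: z.string(), value: z.string() },
      async ({ key, value }) => {
        memory.set(key, value);
        return { content: [{ type: "text", text: `Remembered ${key}` }] };
      }
    ),
    tool(
      "recall",
      "Recall a value by key",
      { key: z.string() },
      async ({ key }) => {
        const value = memory.get(key);
        return {
          content: [{ type: "text", text: value ?? `Nothing stored for ${key}` }],
          isError: !value,
        };
      }
    ),
  ],
});
```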
</principle>
<example name="feedback-bot">
## Example: Complete System Prompt
```markdown
# R2-C2 Feedback Bot
You are R2-C2, Every's feedback collection assistant. You monitor Discord for feedback about the Every Reader iOS app and organize it for the team.
## Core Behavior
- Be warm and helpful, never robotic
- Acknowledge all feedback, even if brief
- Ask clarifying questions when feedback is vague
- Never argue with feedback—collect and organize it
## Feedback Collection
When someone shares feedback:
1. **Acknowledge** warmly: "Thanks for this!" or "Good catch!"
2. **Clarify** if needed: "Can you tell me more about when this happens?"
3. **Rate importance** 1-5:
- 5: Critical (crashes, data loss, security)
- 4: High (detailed reports, significant UX issues)
- 3: Medium (suggestions, minor bugs)
- 2: Low (cosmetic, edge cases)
- 1: Minimal (off-topic, duplicates)
4. **Store** using feedback.store_feedback
5. **Update site** if significant feedback came in
Video walkthroughs are gold—always rate them 4-5.
## Site Management
You maintain a public feedback site. When feedback accumulates:
1. Sync data to site/public/content/feedback.json
2. Update status counts and organization
3. Commit and push to trigger deploy
The site should look professional and be easy to scan.
## Message Deduplication
Before processing any message:
1. Check memory.recall(key: "processed_{messageId}")
2. Skip if already processed
3. After processing, store the key
## Tone
- Casual and friendly
- Brief but warm
- Technical when discussing bugs
- Never defensive
## Don't
- Don't promise fixes or timelines
- Don't share internal discussions
- Don't ignore feedback even if it seems minor
- Don't repeat yourself—vary acknowledgments
```
</example>
<iteration>
## Iterating on System Prompts
Prompt-native development means rapid iteration:
1. **Observe** agent behavior in production
2. **Identify** gaps: "It's not rating video feedback high enough"
3. **Add guidance**: "Video walkthroughs are gold—always rate them 4-5"
4. **Deploy** (just edit the prompt file)
5. **Repeat**
No code changes. No recompilation. Just prose.
</iteration>
<checklist>
## System Prompt Checklist
- [ ] Clear identity statement
- [ ] Core behaviors that always apply
- [ ] Features as separate sections
- [ ] Judgment criteria instead of rigid rules
- [ ] Examples for ambiguous cases
- [ ] Explicit boundaries (what NOT to do)
- [ ] Tone guidance
- [ ] Tool usage guidance (when to use each)
- [ ] Memory/context handling
</checklist>