Add agent-native-architecture skill
New skill teaching prompt-native development patterns:
- Features defined in prompts, not code
- Tools as primitives that enable capability
- "Whatever the user can do, the agent can do"
- Self-modification patterns (advanced tier)
- Refactoring guide for existing codebases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -0,0 +1,201 @@
---
name: agent-native-architecture
description: Build AI agents using prompt-native architecture where features are defined in prompts, not code. Use when creating autonomous agents, designing MCP servers, implementing self-modifying systems, or adopting the "trust the agent's intelligence" philosophy.
---

<essential_principles>
## The Prompt-Native Philosophy

Agent-native engineering inverts traditional software architecture. Instead of writing code that the agent executes, you define outcomes in prompts and let the agent figure out HOW to achieve them.

### The Foundational Principle

**Whatever the user can do, the agent can do. Many things the developer can do, the agent can do.**

Don't artificially limit the agent. If a user could read files, write code, browse the web, or deploy an app, the agent should be able to do those things too. The agent figures out HOW to achieve an outcome; it doesn't just call your pre-written functions.

### Features Are Prompts

Each feature is a prompt that defines an outcome and gives the agent the tools it needs. The agent then figures out how to accomplish it.

**Traditional:** Feature = function in codebase that agent calls
**Prompt-native:** Feature = prompt defining desired outcome + primitive tools

The agent doesn't execute your code. It uses primitives to achieve outcomes you describe.

### Tools Provide Capability, Not Behavior

Tools should be primitives that enable capability. The prompt defines what to do with that capability.

**Wrong:** `generate_dashboard(data, layout, filters)` - agent executes your workflow
**Right:** `read_file`, `write_file`, `list_files` - agent figures out how to build a dashboard

Pure primitives are better, but domain primitives (like `store_feedback`) are OK if they don't encode logic, only storage/retrieval.

### The Development Lifecycle

1. **Start in the prompt** - New features begin as natural language defining outcomes
2. **Iterate rapidly** - Change behavior by editing prose, not refactoring code
3. **Graduate when stable** - Harden to code when requirements stabilize AND speed/reliability matter
4. **Many features stay as prompts** - Not everything needs to become code

### Self-Modification (Advanced)

The advanced tier: agents that can evolve their own code, prompts, and behavior. Not required for every app, but a big part of the future.

When implementing:
- Approval gates for code changes
- Auto-commit before modifications (rollback capability)
- Health checks after changes
- Build verification before restart
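
One way the four safeguards above could be sequenced is sketched below. It is a minimal illustration rather than a reference implementation: the `SelfModifyHooks` interface, `selfModify`, and every hook name are hypothetical, the ordering is an assumption, and the hooks are synchronous only to keep the sketch short.

```typescript
// Hypothetical hooks; each would wrap a real git, build, or process command.
// Kept synchronous for brevity; real implementations would likely be async.
interface SelfModifyHooks {
  approved: () => boolean;         // approval gate for the proposed change
  checkpoint: () => string;        // auto-commit, returns a ref to roll back to
  apply: () => void;               // write the proposed change to disk
  buildPasses: () => boolean;      // build verification before restart
  restart: () => void;             // restart the agent process
  healthy: () => boolean;          // health check after the change
  rollback: (ref: string) => void; // restore the checkpoint
}

type Outcome = "rejected" | "applied" | "rolled_back_build" | "rolled_back_health";

function selfModify(hooks: SelfModifyHooks): Outcome {
  if (!hooks.approved()) return "rejected";

  const ref = hooks.checkpoint();
  hooks.apply();

  // Never restart into code that does not build.
  if (!hooks.buildPasses()) {
    hooks.rollback(ref);
    return "rolled_back_build";
  }

  hooks.restart();
  if (!hooks.healthy()) {
    // The new code built but misbehaves: restore and restart the old version.
    hooks.rollback(ref);
    hooks.restart();
    return "rolled_back_health";
  }
  return "applied";
}
```

The outcome value gives the agent something concrete to report back, so a failed change is visible instead of silent.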

### When NOT to Use This Approach

- **High-frequency operations** - thousands of calls per second
- **Deterministic requirements** - exact same output every time
- **Cost-sensitive scenarios** - when API costs would be prohibitive
- **High security** - though this is overblown for most apps
</essential_principles>

<intake>
What aspect of agent-native architecture do you need help with?

1. **Design architecture** - Plan a new prompt-native agent system
2. **Create MCP tools** - Build primitive tools following the philosophy
3. **Write system prompts** - Define agent behavior in prompts
4. **Self-modification** - Enable agents to safely evolve themselves
5. **Review/refactor** - Make existing code more prompt-native

**Wait for response before proceeding.**
</intake>

<routing>
| Response | Action |
|----------|--------|
| 1, "design", "architecture", "plan" | Read references/architecture-patterns.md |
| 2, "tool", "mcp", "primitive" | Read references/mcp-tool-design.md |
| 3, "prompt", "system prompt", "behavior" | Read references/system-prompt-design.md |
| 4, "self-modify", "evolve", "git" | Read references/self-modification.md |
| 5, "review", "refactor", "existing" | Read references/refactoring-to-prompt-native.md |

**After reading the reference, apply those patterns to the user's specific context.**
</routing>

<quick_start>
Build a prompt-native agent in three steps:

**Step 1: Define primitive tools**
```typescript
const tools = [
  tool("read_file", "Read any file", { path: z.string() }, ...),
  tool("write_file", "Write any file", { path: z.string(), content: z.string() }, ...),
  tool("list_files", "List directory", { path: z.string() }, ...),
];
```

**Step 2: Write behavior in the system prompt**
```markdown
## Your Responsibilities
When asked to organize content, you should:
1. Read existing files to understand the structure
2. Analyze what organization makes sense
3. Create appropriate pages using write_file
4. Use your judgment about layout and formatting

You decide the structure. Make it good.
```

**Step 3: Let the agent work**
```typescript
query({
  prompt: userMessage,
  options: {
    systemPrompt,
    mcpServers: { files: fileServer },
    permissionMode: "acceptEdits",
  }
});
```
</quick_start>

<reference_index>
## Domain Knowledge

All references in `references/`:

**Architecture:** architecture-patterns.md
**Tool Design:** mcp-tool-design.md
**Prompts:** system-prompt-design.md
**Self-Modification:** self-modification.md
**Refactoring:** refactoring-to-prompt-native.md
</reference_index>

<anti_patterns>
## What NOT to Do

**THE CARDINAL SIN: Agent executes your code instead of figuring things out**

This is the most common mistake. You fall back into writing workflow code and having the agent call it, instead of defining outcomes and letting the agent figure out HOW.

```typescript
// WRONG - You wrote the workflow, agent just executes it
tool("process_feedback", async ({ message }) => {
  const category = categorize(message);        // Your code
  const priority = calculatePriority(message); // Your code
  await store(message, category, priority);    // Your code
  if (priority > 3) await notify();            // Your code
});

// RIGHT - Agent figures out how to process feedback
tool("store_item", { key, value }, ...);  // Primitive
tool("send_message", { channel, content }, ...);  // Primitive
// Prompt says: "Rate importance 1-5 based on actionability, store feedback, notify if >= 4"
```

**Don't artificially limit what the agent can do**

If a user could do it, the agent should be able to do it.

```typescript
// WRONG - limiting agent capabilities
tool("read_approved_files", { path }, async ({ path }) => {
  if (!ALLOWED_PATHS.includes(path)) throw new Error("Not allowed");
  return readFile(path);
});

// RIGHT - give full capability, use guardrails appropriately
tool("read_file", { path }, ...);  // Agent can read anything
// Use approval gates for writes, not artificial limits on reads
```

**Don't encode decisions in tools**
```typescript
// Wrong - tool decides format
tool("format_report", { format: z.enum(["markdown", "html", "pdf"]) }, ...)

// Right - agent decides format via prompt
tool("write_file", ...)  // Agent chooses what to write
```

**Don't over-specify in prompts**
```markdown
// Wrong - micromanaging the HOW
When creating a summary, use exactly 3 bullet points,
each under 20 words, formatted with em-dashes...

// Right - define outcome, trust intelligence
Create clear, useful summaries. Use your judgment.
```
</anti_patterns>

<success_criteria>
You've built a prompt-native agent when:

- [ ] The agent figures out HOW to achieve outcomes instead of just calling your functions
- [ ] Whatever a user could do, the agent can do (no artificial limits)
- [ ] Features are prompts that define outcomes, not code that defines workflows
- [ ] Tools are primitives (read, write, store, call API) that enable capability
- [ ] Changing behavior means editing prose, not refactoring code
- [ ] The agent can surprise you with clever approaches you didn't anticipate
- [ ] You could add a new feature by writing a new prompt section, not new code
</success_criteria>
@@ -0,0 +1,215 @@
<overview>
Architectural patterns for building prompt-native agent systems. These patterns emerge from the philosophy that features should be defined in prompts, not code, and that tools should be primitives.
</overview>

<pattern name="event-driven-agent">
## Event-Driven Agent Architecture

The agent runs as a long-lived process that responds to events. Events become prompts.

```
┌─────────────────────────────────────────────────────────────┐
│                    Agent Loop                               │
├─────────────────────────────────────────────────────────────┤
│  Event Source → Agent (Claude) → Tool Calls → Response      │
└─────────────────────────────────────────────────────────────┘
                               │
               ┌───────────────┼───────────────┐
               ▼               ▼               ▼
          ┌─────────┐    ┌──────────┐    ┌───────────┐
          │ Content │    │   Self   │    │   Data    │
          │  Tools  │    │  Tools   │    │   Tools   │
          └─────────┘    └──────────┘    └───────────┘
          (write_file)   (read_source)   (store_item)
                           (restart)     (list_items)
```

**Key characteristics:**
- Events (messages, webhooks, timers) trigger agent turns
- Agent decides how to respond based on system prompt
- Tools are primitives for IO, not business logic
- State persists between events via data tools

**Example: Discord feedback bot**
```typescript
// Event source
client.on("messageCreate", (message) => {
  if (!message.author.bot) {
    runAgent({
      userMessage: `New message from ${message.author}: "${message.content}"`,
      channelId: message.channelId,
    });
  }
});

// System prompt defines behavior
const systemPrompt = `
When someone shares feedback:
1. Acknowledge their feedback warmly
2. Ask clarifying questions if needed
3. Store it using the feedback tools
4. Update the feedback site

Use your judgment about importance and categorization.
`;
```
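
The "events become prompts" step can be sketched for the other event types this pattern names (webhooks and timers). The `AgentEvent` shapes and the `eventToPrompt` function are assumptions made for illustration; only the message case mirrors the template string in the Discord example above.

```typescript
// Hypothetical event shapes; the pattern names messages, webhooks, and timers.
type AgentEvent =
  | { kind: "message"; author: string; content: string }
  | { kind: "webhook"; source: string; payload: unknown }
  | { kind: "timer"; name: string; firedAt: string };

// Turn any event into the text the agent receives for that turn.
// No business logic lives here: the system prompt decides what to do with it.
function eventToPrompt(event: AgentEvent): string {
  switch (event.kind) {
    case "message":
      return `New message from ${event.author}: "${event.content}"`;
    case "webhook":
      return `Webhook from ${event.source}:\n${JSON.stringify(event.payload, null, 2)}`;
    case "timer":
      return `Timer "${event.name}" fired at ${event.firedAt}`;
  }
}
```

Keeping this translation purely descriptive leaves every decision about how to respond in the prompt, which is the point of the pattern.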
</pattern>

<pattern name="two-layer-git">
## Two-Layer Git Architecture

For self-modifying agents, separate code (shared) from data (instance-specific).

```
┌─────────────────────────────────────────────────────────────┐
│                    GitHub (shared repo)                     │
│  - src/           (agent code)                              │
│  - site/          (web interface)                           │
│  - package.json   (dependencies)                            │
│  - .gitignore     (excludes data/, logs/)                   │
└─────────────────────────────────────────────────────────────┘
                               │
                           git clone
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      Instance (Server)                      │
│                                                             │
│  FROM GITHUB (tracked):                                     │
│  - src/           → pushed back on code changes             │
│  - site/          → pushed, triggers deployment             │
│                                                             │
│  LOCAL ONLY (untracked):                                    │
│  - data/          → instance-specific storage               │
│  - logs/          → runtime logs                            │
│  - .env           → secrets                                 │
└─────────────────────────────────────────────────────────────┘
```

**Why this works:**
- Code and site are version controlled (GitHub)
- Raw data stays local (instance-specific)
- Site is generated from data, so reproducible
- Automatic rollback via git history
</pattern>

<pattern name="multi-instance">
## Multi-Instance Branching

Each agent instance gets its own branch while sharing core code.

```
main                        # Shared features, bug fixes
├── instance/feedback-bot   # Every Reader feedback bot
├── instance/support-bot    # Customer support bot
└── instance/research-bot   # Research assistant
```

**Change flow:**
| Change Type | Work On | Then |
|-------------|---------|------|
| Core features | main | Merge to instance branches |
| Bug fixes | main | Merge to instance branches |
| Instance config | instance branch | Done |
| Instance data | instance branch | Done |

**Sync tools:**
```typescript
tool("self_deploy", "Pull latest from main, rebuild, restart", ...)
tool("sync_from_instance", "Merge from another instance", ...)
tool("propose_to_main", "Create PR to share improvements", ...)
```
</pattern>

<pattern name="site-as-output">
## Site as Agent Output

The agent generates and maintains a website as a natural output, not through specialized site tools.

```
Discord Message
    ↓
Agent processes it, extracts insights
    ↓
Agent decides what site updates are needed
    ↓
Agent writes files using write_file primitive
    ↓
Git commit + push triggers deployment
    ↓
Site updates automatically
```

**Key insight:** Don't build site generation tools. Give the agent file tools and teach it in the prompt how to create good sites.

```markdown
## Site Management

You maintain a public feedback site. When feedback comes in:
1. Use write_file to update site/public/content/feedback.json
2. If the site's React components need improvement, modify them
3. Commit changes and push to trigger Vercel deploy

The site should be:
- Clean, modern dashboard aesthetic
- Clear visual hierarchy
- Status organization (Inbox, Active, Done)

You decide the structure. Make it good.
```
</pattern>

<pattern name="approval-gates">
## Approval Gates Pattern

Separate "propose" from "apply" for dangerous operations.

```typescript
// Pending changes stored separately
const pendingChanges = new Map<string, string>();

tool("write_file", async ({ path, content }) => {
  if (requiresApproval(path)) {
    // Store for approval
    pendingChanges.set(path, content);
    const diff = generateDiff(path, content);
    return {
      text: `Change requires approval.\n\n${diff}\n\nReply "yes" to apply.`
    };
  } else {
    // Apply immediately
    writeFileSync(path, content);
    return { text: `Wrote ${path}` };
  }
});

tool("apply_pending", async () => {
  for (const [path, content] of pendingChanges) {
    writeFileSync(path, content);
  }
  pendingChanges.clear();
  return { text: "Applied all pending changes" };
});
```

**What requires approval:**
- src/*.ts (agent code)
- package.json (dependencies)
- system prompt changes

**What doesn't:**
- data/* (instance data)
- site/* (generated content)
- docs/* (documentation)
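
The `requiresApproval(path)` call in the code above is left undefined. A minimal sketch that encodes the two lists might look like the following. It makes several assumptions that are not in the original: paths are repo-relative, the system prompt lives at a hypothetical `prompts/system.md`, and any path on neither list defaults to requiring approval.

```typescript
// Resolve "." and ".." segments so "data/../src/agent.ts" is seen as "src/agent.ts".
function normalize(path: string): string {
  const out: string[] = [];
  for (const part of path.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") out.pop();
    else out.push(part);
  }
  return out.join("/");
}

// Hypothetical location for the system prompt; the source does not name one.
const SYSTEM_PROMPT_PATH = "prompts/system.md";

// Directories whose contents are written without approval.
const AUTO_APPLY_PREFIXES = ["data/", "site/", "docs/"];

function requiresApproval(path: string): boolean {
  const p = normalize(path);
  // Explicit cases from the "requires approval" list.
  if (p === "package.json" || p === SYSTEM_PROMPT_PATH) return true;
  if (p.startsWith("src/") && p.endsWith(".ts")) return true;
  // Explicit cases from the "doesn't" list.
  if (AUTO_APPLY_PREFIXES.some((prefix) => p.startsWith(prefix))) return false;
  // Assumption: anything unlisted is gated rather than silently applied.
  return true;
}
```

Normalizing first matters because a prefix check alone would let `data/../src/agent.ts` bypass the gate.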
</pattern>

<design_questions>
## Questions to Ask When Designing

1. **What events trigger agent turns?** (messages, webhooks, timers, user requests)
2. **What primitives does the agent need?** (read, write, call API, restart)
3. **What decisions should the agent make?** (format, structure, priority, action)
4. **What decisions should be hardcoded?** (security boundaries, approval requirements)
5. **How does the agent verify its work?** (health checks, build verification)
6. **How does the agent recover from mistakes?** (git rollback, approval gates)
</design_questions>
@@ -0,0 +1,316 @@
<overview>
How to design MCP tools following prompt-native principles. Tools should be primitives that enable capability, not workflows that encode decisions.

**Core principle:** Whatever a user can do, the agent should be able to do. Don't artificially limit the agent; give it the same primitives a power user would have.
</overview>

<principle name="primitives-not-workflows">
## Tools Are Primitives, Not Workflows

**Wrong approach:** Tools that encode business logic
```typescript
tool("process_feedback", {
  feedback: z.string(),
  category: z.enum(["bug", "feature", "question"]),
  priority: z.enum(["low", "medium", "high"]),
}, async ({ feedback, category, priority }) => {
  // Tool decides how to process
  const processed = categorize(feedback);
  const stored = await saveToDatabase(processed);
  const notification = await notify(priority);
  return { processed, stored, notification };
});
```

**Right approach:** Primitives that enable any workflow
```typescript
tool("store_item", {
  key: z.string(),
  value: z.any(),
}, async ({ key, value }) => {
  await db.set(key, value);
  return { text: `Stored ${key}` };
});

tool("send_message", {
  channel: z.string(),
  content: z.string(),
}, async ({ channel, content }) => {
  await messenger.send(channel, content);
  return { text: "Sent" };
});
```

The agent decides categorization, priority, and when to notify based on the system prompt.
</principle>

<principle name="descriptive-names">
## Tools Should Have Descriptive, Primitive Names

Names should describe the capability, not the use case:

| Wrong | Right |
|-------|-------|
| `process_user_feedback` | `store_item` |
| `create_feedback_summary` | `write_file` |
| `send_notification` | `send_message` |
| `deploy_to_production` | `git_push` |

The prompt tells the agent *when* to use primitives. The tool just provides *capability*.
</principle>

<principle name="simple-inputs">
## Inputs Should Be Simple

Tools accept data. They don't accept decisions.

**Wrong:** Tool accepts decisions
```typescript
tool("format_content", {
  content: z.string(),
  format: z.enum(["markdown", "html", "json"]),
  style: z.enum(["formal", "casual", "technical"]),
}, ...)
```

**Right:** Tool accepts data, agent decides format
```typescript
tool("write_file", {
  path: z.string(),
  content: z.string(),
}, ...)
// Agent decides to write index.html with HTML content, or data.json with JSON
```
</principle>

<principle name="rich-outputs">
## Outputs Should Be Rich

Return enough information for the agent to verify and iterate.

**Wrong:** Minimal output
```typescript
async ({ key }) => {
  await db.delete(key);
  return { text: "Deleted" };
}
```

**Right:** Rich output
```typescript
async ({ key }) => {
  const existed = await db.has(key);
  if (!existed) {
    return { text: `Key ${key} did not exist` };
  }
  await db.delete(key);
  return { text: `Deleted ${key}. ${await db.count()} items remaining.` };
}
```
</principle>

<design_template>
## Tool Design Template

```typescript
import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

export const serverName = createSdkMcpServer({
  name: "server-name",
  version: "1.0.0",
  tools: [
    // READ operations
    tool(
      "read_item",
      "Read an item by key",
      { key: z.string().describe("Item key") },
      async ({ key }) => {
        const item = await storage.get(key);
        return {
          content: [{
            type: "text",
            text: item ? JSON.stringify(item, null, 2) : `Not found: ${key}`,
          }],
          isError: !item,
        };
      }
    ),

    tool(
      "list_items",
      "List all items, optionally filtered",
      {
        prefix: z.string().optional().describe("Filter by key prefix"),
        limit: z.number().default(100).describe("Max items"),
      },
      async ({ prefix, limit }) => {
        const items = await storage.list({ prefix, limit });
        return {
          content: [{
            type: "text",
            text: `Found ${items.length} items:\n${items.map(i => i.key).join("\n")}`,
          }],
        };
      }
    ),

    // WRITE operations
    tool(
      "store_item",
      "Store an item",
      {
        key: z.string().describe("Item key"),
        value: z.any().describe("Item data"),
      },
      async ({ key, value }) => {
        await storage.set(key, value);
        return {
          content: [{ type: "text", text: `Stored ${key}` }],
        };
      }
    ),

    tool(
      "delete_item",
      "Delete an item",
      { key: z.string().describe("Item key") },
      async ({ key }) => {
        const existed = await storage.delete(key);
        return {
          content: [{
            type: "text",
            text: existed ? `Deleted ${key}` : `${key} did not exist`,
          }],
        };
      }
    ),

    // EXTERNAL operations
    tool(
      "call_api",
      "Make an HTTP request",
      {
        url: z.string().url(),
        method: z.enum(["GET", "POST", "PUT", "DELETE"]).default("GET"),
        body: z.any().optional(),
      },
      async ({ url, method, body }) => {
        const response = await fetch(url, { method, body: JSON.stringify(body) });
        const text = await response.text();
        return {
          content: [{
            type: "text",
            text: `${response.status} ${response.statusText}\n\n${text}`,
          }],
          isError: !response.ok,
        };
      }
    ),
  ],
});
```
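
The template calls a `storage` object that it never defines. For local experiments, an in-memory stand-in along these lines could fill that gap. The `MemoryStorage` class and `StoredItem` type are hypothetical; the method names and return shapes are inferred from how the template uses them, not taken from any real library.

```typescript
interface StoredItem {
  key: string;
  value: unknown;
}

// Hypothetical in-memory stand-in for the `storage` object used in the template.
// Methods are synchronous; `await` on a plain value still works, so the
// template's `await storage.get(key)` calls run unchanged against it.
class MemoryStorage {
  private items = new Map<string, unknown>();

  get(key: string): unknown {
    return this.items.get(key);
  }

  set(key: string, value: unknown): void {
    this.items.set(key, value);
  }

  // Returns items in insertion order, optionally filtered by key prefix.
  list(options: { prefix?: string; limit?: number } = {}): StoredItem[] {
    const { prefix, limit = 100 } = options;
    return Array.from(this.items.entries())
      .filter(([key]) => prefix === undefined || key.startsWith(prefix))
      .slice(0, limit)
      .map(([key, value]) => ({ key, value }));
  }

  // Returns true if the key existed, matching how `delete_item` reads the result.
  delete(key: string): boolean {
    return this.items.delete(key);
  }
}

const storage = new MemoryStorage();
```

Like the tools built on it, this layer only stores and retrieves. It makes no decisions about what the data means.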
</design_template>

<example name="feedback-server">
## Example: Feedback Storage Server

This server provides primitives for storing feedback. It does NOT decide how to categorize or organize feedback; that's the agent's job via the prompt.

```typescript
export const feedbackMcpServer = createSdkMcpServer({
  name: "feedback",
  version: "1.0.0",
  tools: [
    tool(
      "store_feedback",
      "Store a feedback item",
      {
        item: z.object({
          id: z.string(),
          author: z.string(),
          content: z.string(),
          importance: z.number().min(1).max(5),
          timestamp: z.string(),
          status: z.string().optional(),
          urls: z.array(z.string()).optional(),
          metadata: z.any().optional(),
        }).describe("Feedback item"),
      },
      async ({ item }) => {
        await db.feedback.insert(item);
        return {
          content: [{
            type: "text",
            text: `Stored feedback ${item.id} from ${item.author}`,
          }],
        };
      }
    ),

    tool(
      "list_feedback",
      "List feedback items",
      {
        limit: z.number().default(50),
        status: z.string().optional(),
      },
      async ({ limit, status }) => {
        const items = await db.feedback.list({ limit, status });
        return {
          content: [{
            type: "text",
            text: JSON.stringify(items, null, 2),
          }],
        };
      }
    ),

    tool(
      "update_feedback",
      "Update a feedback item",
      {
        id: z.string(),
        updates: z.object({
          status: z.string().optional(),
          importance: z.number().optional(),
          metadata: z.any().optional(),
        }),
      },
      async ({ id, updates }) => {
        await db.feedback.update(id, updates);
        return {
          content: [{ type: "text", text: `Updated ${id}` }],
        };
      }
    ),
  ],
});
```

The system prompt then tells the agent *how* to use these primitives:

```markdown
## Feedback Processing

When someone shares feedback:
1. Extract author, content, and any URLs
2. Rate importance 1-5 based on actionability
3. Store using feedback.store_feedback
4. If high importance (4-5), notify the channel

Use your judgment about importance ratings.
```
</example>

<checklist>
## MCP Tool Design Checklist

- [ ] Tool names describe capability, not use case
- [ ] Inputs are data, not decisions
- [ ] Outputs are rich (enough for agent to verify)
- [ ] CRUD operations are separate tools (not one mega-tool)
- [ ] No business logic in tool implementations
- [ ] Error states clearly communicated via `isError`
- [ ] Descriptions explain what the tool does, not when to use it
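
To keep the `isError` item easy to satisfy, the result shape used in the template could be wrapped in two small helpers. The `ToolResult` type and the `ok` and `fail` names are hypothetical; the shape itself (`content` array of text parts plus optional `isError`) is copied from the handlers shown earlier in this reference.

```typescript
// Result shape used by the tool handlers in this document.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Successful result: text only, no error flag.
function ok(text: string): ToolResult {
  return { content: [{ type: "text", text }] };
}

// Failed result: same text channel, with isError set so the agent can react.
function fail(text: string): ToolResult {
  return { content: [{ type: "text", text }], isError: true };
}
```

A handler could then end with `return item ? ok(JSON.stringify(item, null, 2)) : fail(`Not found: ${key}`);`, which keeps the error path as visible as the success path.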
</checklist>
@@ -0,0 +1,317 @@
<overview>
How to refactor existing agent code to follow prompt-native principles. The goal: move behavior from code into prompts, and simplify tools into primitives.
</overview>

<diagnosis>
## Diagnosing Non-Prompt-Native Code

Signs your agent isn't prompt-native:

**Tools that encode workflows:**
```typescript
// RED FLAG: Tool contains business logic
tool("process_feedback", async ({ message }) => {
  const category = categorize(message);        // Logic in code
  const priority = calculatePriority(message); // Logic in code
  await store(message, category, priority);    // Orchestration in code
  if (priority > 3) await notify();            // Decision in code
});
```

**Agent calls functions instead of figuring things out:**
```typescript
// RED FLAG: Agent is just a function caller
"Use process_feedback to handle incoming messages"
// vs.
"When feedback comes in, decide importance, store it, notify if high"
```

**Artificial limits on agent capability:**
```typescript
// RED FLAG: Tool prevents agent from doing what users can do
tool("read_file", async ({ path }) => {
  if (!ALLOWED_PATHS.includes(path)) {
    throw new Error("Not allowed to read this file");
  }
  return readFile(path);
});
```

**Prompts that specify HOW instead of WHAT:**
```markdown
// RED FLAG: Micromanaging the agent
When creating a summary:
1. Use exactly 3 bullet points
2. Each bullet must be under 20 words
3. Format with em-dashes for sub-points
4. Bold the first word of each bullet
```
</diagnosis>

<refactoring_workflow>
## Step-by-Step Refactoring

**Step 1: Identify workflow tools**

List all your tools. Mark any that:
- Have business logic (categorize, calculate, decide)
- Orchestrate multiple operations
- Make decisions on behalf of the agent
- Contain conditional logic (if/else based on content)

**Step 2: Extract the primitives**

For each workflow tool, identify the underlying primitives:

| Workflow Tool | Hidden Primitives |
|---------------|-------------------|
| `process_feedback` | `store_item`, `send_message` |
| `generate_report` | `read_file`, `write_file` |
| `deploy_and_notify` | `git_push`, `send_message` |

**Step 3: Move behavior to the prompt**

Take the logic from your workflow tools and express it in natural language:

```typescript
// Before (in code):
async function processFeedback(message) {
  const priority = message.includes("crash") ? 5 :
                   message.includes("bug") ? 4 : 3;
  await store(message, priority);
  if (priority >= 4) await notify();
}
```

```markdown
// After (in prompt):
## Feedback Processing

When someone shares feedback:
1. Rate importance 1-5:
   - 5: Crashes, data loss, security issues
   - 4: Bug reports with clear reproduction steps
   - 3: General suggestions, minor issues
2. Store using store_item
3. If importance >= 4, notify the team

Use your judgment. Context matters more than keywords.
```
|
||||||
|
|
||||||
|
**Step 4: Simplify tools to primitives**
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
// Before: 1 workflow tool
|
||||||
|
tool("process_feedback", { message, category, priority }, ...complex logic...)
|
||||||
|
|
||||||
|
// After: 2 primitive tools
|
||||||
|
tool("store_item", { key: z.string(), value: z.any() }, ...simple storage...)
|
||||||
|
tool("send_message", { channel: z.string(), content: z.string() }, ...simple send...)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Step 5: Remove artificial limits**
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
// Before: Limited capability
|
||||||
|
tool("read_file", async ({ path }) => {
|
||||||
|
if (!isAllowed(path)) throw new Error("Forbidden");
|
||||||
|
return readFile(path);
|
||||||
|
});
|
||||||
|
|
||||||
|
// After: Full capability
|
||||||
|
tool("read_file", async ({ path }) => {
|
||||||
|
return readFile(path); // Agent can read anything
|
||||||
|
});
|
||||||
|
// Use approval gates for WRITES, not artificial limits on READS
|
||||||
|
```
|
||||||
|
|
||||||
|
**Step 6: Test with outcomes, not procedures**
|
||||||
|
|
||||||
|
Instead of testing "does it call the right function?", test "does it achieve the outcome?"
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
// Before: Testing procedure
|
||||||
|
expect(mockProcessFeedback).toHaveBeenCalledWith(...)
|
||||||
|
|
||||||
|
// After: Testing outcome
|
||||||
|
// Send feedback → Check it was stored with reasonable importance
|
||||||
|
// Send high-priority feedback → Check notification was sent
|
||||||
|
```
|
||||||
|
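The outcome-style checks in Step 6 could be sketched roughly as follows. This is only an illustration under stated assumptions: `runAgent` is a hypothetical stand-in for a real agent loop (a trivial stub here, so the example runs without a model), and `store_item` and `send_message` are in-memory fakes of the primitives rather than the real tools.

```typescript
// In-memory fakes of the two primitives, so resulting state can be inspected directly.
type StoredItem = { message: string; importance: number };
const stored: StoredItem[] = [];
const sent: string[] = [];

const primitives = {
  store_item: (item: StoredItem): void => { stored.push(item); },
  send_message: (content: string): void => { sent.push(content); },
};

// Hypothetical stand-in for the agent. A real test would run the model with the
// system prompt; this stub exists only to make the sketch executable.
function runAgent(feedback: string, tools: typeof primitives): void {
  const importance = /crash|data loss/i.test(feedback) ? 5 : 3;
  tools.store_item({ message: feedback, importance: importance });
  if (importance >= 4) tools.send_message("High-importance feedback: " + feedback);
}

// Outcome checks: look at what ended up stored and sent, not at which functions ran.
function checkOutcomes(): { storedCount: number; notifications: number; ratingsInRange: boolean } {
  runAgent("The app crashes when I tap submit", primitives);
  runAgent("Maybe a darker theme would be nice", primitives);
  return {
    storedCount: stored.length,
    notifications: sent.length,
    ratingsInRange: stored.every((s) => s.importance >= 1 && s.importance <= 5),
  };
}
```

Because the assertions only inspect state, the same checks keep working if the prompt changes how the agent reaches the outcome.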
</refactoring_workflow>

<before_after>
## Before/After Examples

**Example 1: Feedback Processing**

Before:
```typescript
tool("handle_feedback", async ({ message, author }) => {
  const category = detectCategory(message);
  const priority = calculatePriority(message, category);
  const feedbackId = await db.feedback.insert({
    id: generateId(),
    author,
    message,
    category,
    priority,
    timestamp: new Date().toISOString(),
  });

  if (priority >= 4) {
    await discord.send(ALERT_CHANNEL, `High priority feedback from ${author}`);
  }

  return { feedbackId, category, priority };
});
```

After:
```typescript
// Simple storage primitive
tool("store_feedback", async ({ item }) => {
  await db.feedback.insert(item);
  return { text: `Stored feedback ${item.id}` };
});

// Simple message primitive
tool("send_message", async ({ channel, content }) => {
  await discord.send(channel, content);
  return { text: "Sent" };
});
```

System prompt:
```markdown
## Feedback Processing

When someone shares feedback:
1. Generate a unique ID
2. Rate importance 1-5 based on impact and urgency
3. Store using store_feedback with the full item
4. If importance >= 4, send a notification to the team channel

Importance guidelines:
- 5: Critical (crashes, data loss, security)
- 4: High (detailed bug reports, blocking issues)
- 3: Medium (suggestions, minor bugs)
- 2: Low (cosmetic, edge cases)
- 1: Minimal (off-topic, duplicates)
```

**Example 2: Report Generation**

Before:
```typescript
tool("generate_weekly_report", async ({ startDate, endDate, format }) => {
  const data = await fetchMetrics(startDate, endDate);
  const summary = summarizeMetrics(data);
  const charts = generateCharts(data);

  if (format === "html") {
    return renderHtmlReport(summary, charts);
  } else if (format === "markdown") {
    return renderMarkdownReport(summary, charts);
  } else {
    return renderPdfReport(summary, charts);
  }
});
```

After:
```typescript
tool("query_metrics", async ({ start, end }) => {
  const data = await db.metrics.query({ start, end });
  return { text: JSON.stringify(data, null, 2) };
});

tool("write_file", async ({ path, content }) => {
  writeFileSync(path, content);
  return { text: `Wrote ${path}` };
});
```

System prompt:
```markdown
## Report Generation

When asked to generate a report:
1. Query the relevant metrics using query_metrics
2. Analyze the data and identify key trends
3. Create a clear, well-formatted report
4. Write it using write_file in the appropriate format

Use your judgment about format and structure. Make it useful.
```
</before_after>

<common_challenges>
## Common Refactoring Challenges

**"But the agent might make mistakes!"**

Yes, and you can iterate. Change the prompt to add guidance:
```markdown
// Before
Rate importance 1-5.

// After (if agent keeps rating too high)
Rate importance 1-5. Be conservative: most feedback is 2-3.
Only use 4-5 for truly blocking or critical issues.
```

**"The workflow is complex!"**

Complex workflows can still be expressed in prompts. The agent is smart.
```markdown
When processing video feedback:
1. Check if it's a Loom, YouTube, or direct link
2. For YouTube, pass URL directly to video analysis
3. For others, download first, then analyze
4. Extract timestamped issues
5. Rate based on issue density and severity
```

**"We need deterministic behavior!"**

Some operations should stay in code. That's fine. Prompt-native isn't all-or-nothing.

Keep in code:
- Security validation
- Rate limiting
- Audit logging
- Exact format requirements

Move to prompts:
- Categorization decisions
- Priority judgments
- Content generation
- Workflow orchestration

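As one way to picture the "keep in code" side of that split, the sketch below wraps a primitive with a fixed-window rate limit and an audit trail while leaving the decision of when to call it to the agent. The names `withGuards`, `auditLog`, and `outbox` are hypothetical and not part of the skill; the limit values are arbitrary.

```typescript
type Handler<A, R> = (args: A) => R;
type AuditEntry = { tool: string; at: number; ok: boolean };

const auditLog: AuditEntry[] = [];

// Wraps a primitive with deterministic guards: a fixed-window rate limit and an audit record per call.
function withGuards<A, R>(
  toolName: string,
  handler: Handler<A, R>,
  maxCallsPerWindow: number,
  windowMs: number,
  now: () => number = Date.now,
): Handler<A, R> {
  let windowStart = now();
  let calls = 0;
  return (args: A): R => {
    const t = now();
    if (t - windowStart >= windowMs) {
      // A new window begins: reset the counter.
      windowStart = t;
      calls = 0;
    }
    if (calls >= maxCallsPerWindow) {
      auditLog.push({ tool: toolName, at: t, ok: false });
      throw new Error("Rate limit exceeded for " + toolName);
    }
    calls += 1;
    auditLog.push({ tool: toolName, at: t, ok: true });
    return handler(args);
  };
}

// Usage: the agent still decides when to send; code only enforces the limit.
const outbox: string[] = [];
const sendMessage = withGuards("send_message", (content: string) => {
  outbox.push(content);
  return "Sent";
}, 2, 60000);
```

The wrapper holds no business logic about message content, so it stays compatible with the primitive-tools approach.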
**"What about testing?"**

Test outcomes, not procedures:
- "Given this input, does the agent achieve the right result?"
- "Does stored feedback have reasonable importance ratings?"
- "Are notifications sent for truly high-priority items?"
</common_challenges>

<checklist>
## Refactoring Checklist

Diagnosis:
- [ ] Listed all tools with business logic
- [ ] Identified artificial limits on agent capability
- [ ] Found prompts that micromanage HOW

Refactoring:
- [ ] Extracted primitives from workflow tools
- [ ] Moved business logic to system prompt
- [ ] Removed artificial limits
- [ ] Simplified tool inputs to data, not decisions

Validation:
- [ ] Agent achieves same outcomes with primitives
- [ ] Behavior can be changed by editing prompts
- [ ] New features could be added without new tools
</checklist>
@@ -0,0 +1,269 @@
<overview>
Self-modification is the advanced tier of agent native engineering: agents that can evolve their own code, prompts, and behavior. Not required for every app, but a big part of the future.

This is the logical extension of "whatever the developer can do, the agent can do."
</overview>

<why_self_modification>
## Why Self-Modification?

Traditional software is static: it does what you wrote, nothing more. Self-modifying agents can:

- **Fix their own bugs** - See an error, patch the code, restart
- **Add new capabilities** - User asks for something new, agent implements it
- **Evolve behavior** - Learn from feedback and adjust prompts
- **Deploy themselves** - Push code, trigger builds, restart

The agent becomes a living system that improves over time, not frozen code.
</why_self_modification>

<capabilities>
## What Self-Modification Enables

**Code modification:**
- Read and understand source files
- Write fixes and new features
- Commit and push to version control
- Trigger builds and verify they pass

**Prompt evolution:**
- Edit the system prompt based on feedback
- Add new features as prompt sections
- Refine judgment criteria that aren't working

**Infrastructure control:**
- Pull latest code from upstream
- Merge from other branches/instances
- Restart after changes
- Roll back if something breaks

**Site/output generation:**
- Generate and maintain websites
- Create documentation
- Build dashboards from data
</capabilities>

<guardrails>
## Required Guardrails

Self-modification is powerful. It needs safety mechanisms.

**Approval gates for code changes:**
```typescript
tool("write_file", async ({ path, content }) => {
  if (isCodeFile(path)) {
    // Store for approval, don't apply immediately
    pendingChanges.set(path, content);
    const diff = generateDiff(path, content);
    return { text: `Requires approval:\n\n${diff}\n\nReply "yes" to apply.` };
  }
  // Non-code files apply immediately
  writeFileSync(path, content);
  return { text: `Wrote ${path}` };
});
```
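The other half of the approval gate, applying or discarding what was queued, might look something like the sketch below. It is an assumption-laden illustration: the in-memory `files` object stands in for the filesystem, the `isCodeFile` rule (anything under `src/` ending in `.ts` or `.js`) is invented for the example, and `applyPending` and `clearPending` only mirror the `apply_pending` and `clear_pending` tool names listed elsewhere in this skill.

```typescript
// In-memory stand-ins, so the sketch runs without touching disk.
const files: Record<string, string> = {};
let pendingChanges: Record<string, string> = {};

// Assumed rule for the example: anything under src/ ending in .ts or .js counts as code.
function isCodeFile(path: string): boolean {
  return /^src\/.*\.(ts|js)$/.test(path);
}

// write_file: code is queued for approval, everything else is written immediately.
function writeFile(path: string, content: string): string {
  if (isCodeFile(path)) {
    pendingChanges[path] = content;
    return "Requires approval: " + path;
  }
  files[path] = content;
  return "Wrote " + path;
}

// apply_pending: meant to be called only after a human has approved the queued changes.
function applyPending(): string[] {
  const applied = Object.keys(pendingChanges);
  for (const path of applied) {
    files[path] = pendingChanges[path];
  }
  pendingChanges = {};
  return applied;
}

// clear_pending: discard proposals that were rejected.
function clearPending(): number {
  const discarded = Object.keys(pendingChanges).length;
  pendingChanges = {};
  return discarded;
}
```

Keeping the queue separate from the write path means a rejected change never touches the working tree.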
**Auto-commit before changes:**
```typescript
tool("self_deploy", async () => {
  // Save current state first
  runGit("stash"); // or commit uncommitted changes

  // Then pull/merge
  runGit("fetch origin");
  runGit("merge origin/main --no-edit");

  // Build and verify
  runCommand("npm run build");

  // Only then restart
  scheduleRestart();
});
```

**Build verification:**
```typescript
// Don't restart unless build passes
try {
  runCommand("npm run build", { timeout: 120000 });
} catch (error) {
  // Roll back the completed merge ("git merge --abort" only works while a merge is still in progress)
  runGit("reset --hard ORIG_HEAD");
  return { text: "Build failed, aborting deploy", isError: true };
}
```

**Health checks after restart:**
```typescript
tool("health_check", async () => {
  const uptime = process.uptime();
  const buildValid = existsSync("dist/index.js");
  const gitClean = !runGit("status --porcelain");

  return {
    text: JSON.stringify({
      status: "healthy",
      uptime: `${Math.floor(uptime / 60)}m`,
      build: buildValid ? "valid" : "missing",
      git: gitClean ? "clean" : "uncommitted changes",
    }, null, 2),
  };
});
```
</guardrails>

<git_architecture>
## Git-Based Self-Modification

Use git as the foundation for self-modification. It provides:
- Version history (rollback capability)
- Branching (experiment safely)
- Merge (sync with other instances)
- Push/pull (deploy and collaborate)

**Essential git tools:**
```typescript
tool("status", "Show git status", {}, ...);
tool("diff", "Show file changes", { path: z.string().optional() }, ...);
tool("log", "Show commit history", { count: z.number() }, ...);
tool("commit_code", "Commit code changes", { message: z.string() }, ...);
tool("git_push", "Push to GitHub", { branch: z.string().optional() }, ...);
tool("pull", "Pull from GitHub", { source: z.enum(["main", "instance"]) }, ...);
tool("rollback", "Revert recent commits", { commits: z.number() }, ...);
```
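The handler bodies are elided above. As a hedged sketch of what the `rollback` tool might do internally, the helper below builds the argument list for `git revert` over the last N commits. The function name `rollbackArgs` is hypothetical, and using `git revert` (rather than a reset) is an assumption about the intended behavior; it was chosen here because it preserves history.

```typescript
// Builds the argument list for reverting the last `commits` commits.
function rollbackArgs(commits: number): string[] {
  if (!isFinite(commits) || commits < 1 || Math.floor(commits) !== commits) {
    throw new Error("commits must be a positive integer");
  }
  // `git revert <range>` adds new commits that undo each commit in the range,
  // so history is kept and the rollback itself can later be reverted.
  return ["revert", "--no-edit", "HEAD~" + commits + "..HEAD"];
}
```

A handler could pass these arguments to whatever `runGit` helper the project uses. Note that reverting merge commits additionally requires git's `-m` option, which this sketch does not handle.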
**Multi-instance architecture:**
```
main                      # Shared code
├── instance/bot-a        # Instance A's branch
├── instance/bot-b        # Instance B's branch
└── instance/bot-c        # Instance C's branch
```

Each instance can:
- Pull updates from main
- Push improvements back to main (via PR)
- Sync features from other instances
- Maintain instance-specific config
</git_architecture>

<prompt_evolution>
## Self-Modifying Prompts

The system prompt is a file the agent can read and write.

```typescript
// Agent can read its own prompt
tool("read_file", ...); // Can read src/prompts/system.md

// Agent can propose changes
tool("write_file", ...); // Can write to src/prompts/system.md (with approval)
```

**System prompt as living document:**
```markdown
## Feedback Processing

When someone shares feedback:
1. Acknowledge warmly
2. Rate importance 1-5
3. Store using feedback tools

<!-- Note to self: Video walkthroughs should always be 4-5,
     learned this from Dan's feedback on 2024-12-07 -->
```

The agent can:
- Add notes to itself
- Refine judgment criteria
- Add new feature sections
- Document edge cases it learned
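To make "add notes to itself" concrete, here is a small sketch of the text edit involved: inserting an HTML-comment note at the end of a named `## ` section of the prompt. The helper `addNoteToSection` is hypothetical. In the architecture described here the agent would typically produce the edited text itself and submit it through `write_file`, so this function only illustrates the shape of the change.

```typescript
// Inserts an HTML-comment note at the end of the named "## " section of a markdown prompt.
function addNoteToSection(prompt: string, sectionTitle: string, note: string): string {
  const lines = prompt.split("\n");
  const start = lines.indexOf("## " + sectionTitle);
  if (start === -1) throw new Error("Section not found: " + sectionTitle);

  // The section ends at the next "## " heading, or at the end of the document.
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i].indexOf("## ") === 0) {
      end = i;
      break;
    }
  }
  // Step back over trailing blank lines so the note sits right under the section body.
  while (end > start + 1 && lines[end - 1].trim() === "") end--;

  lines.splice(end, 0, "", "<!-- Note to self: " + note + " -->");
  return lines.join("\n");
}
```

Because the note is an HTML comment, it stays out of rendered views of the prompt while remaining visible to the agent when it reads the file.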
</prompt_evolution>

<when_to_use>
## When to Implement Self-Modification

**Good candidates:**
- Long-running autonomous agents
- Agents that need to adapt to feedback
- Systems where behavior evolution is valuable
- Internal tools where rapid iteration matters

**Not necessary for:**
- Simple single-task agents
- Highly regulated environments
- Systems where behavior must be auditable
- One-off or short-lived agents

Start with a non-self-modifying prompt-native agent. Add self-modification when you need it.
</when_to_use>

<example_tools>
## Complete Self-Modification Toolset

```typescript
const selfMcpServer = createSdkMcpServer({
  name: "self",
  version: "1.0.0",
  tools: [
    // FILE OPERATIONS
    tool("read_file", "Read any project file", { path: z.string() }, ...),
    tool("write_file", "Write a file (code requires approval)", { path, content }, ...),
    tool("list_files", "List directory contents", { path: z.string() }, ...),
    tool("search_code", "Search for patterns", { pattern: z.string() }, ...),

    // APPROVAL WORKFLOW
    tool("apply_pending", "Apply approved changes", {}, ...),
    tool("get_pending", "Show pending changes", {}, ...),
    tool("clear_pending", "Discard pending changes", {}, ...),

    // RESTART
    tool("restart", "Rebuild and restart", {}, ...),
    tool("health_check", "Check if bot is healthy", {}, ...),
  ],
});

const gitMcpServer = createSdkMcpServer({
  name: "git",
  version: "1.0.0",
  tools: [
    // STATUS
    tool("status", "Show git status", {}, ...),
    tool("diff", "Show changes", { path: z.string().optional() }, ...),
    tool("log", "Show history", { count: z.number() }, ...),

    // COMMIT & PUSH
    tool("commit_code", "Commit code changes", { message: z.string() }, ...),
    tool("git_push", "Push to GitHub", { branch: z.string().optional() }, ...),

    // SYNC
    tool("pull", "Pull from upstream", { source: z.enum(["main", "instance"]) }, ...),
    tool("self_deploy", "Pull, build, restart", { source: z.enum(["main", "instance"]) }, ...),

    // SAFETY
    tool("rollback", "Revert commits", { commits: z.number() }, ...),
    tool("health_check", "Detailed health report", {}, ...),
  ],
});
```
</example_tools>

<checklist>
## Self-Modification Checklist

Before enabling self-modification:
- [ ] Git-based version control set up
- [ ] Approval gates for code changes
- [ ] Build verification before restart
- [ ] Rollback mechanism available
- [ ] Health check endpoint
- [ ] Instance identity configured

When implementing:
- [ ] Agent can read all project files
- [ ] Agent can write files (with appropriate approval)
- [ ] Agent can commit and push
- [ ] Agent can pull updates
- [ ] Agent can restart itself
- [ ] Agent can roll back if needed
</checklist>
@@ -0,0 +1,250 @@
<overview>
How to write system prompts for prompt-native agents. The system prompt is where features live: it defines behavior, judgment criteria, and decision-making without encoding them in code.
</overview>

<principle name="features-in-prompts">
## Features Are Prompt Sections

Each feature is a section of the system prompt that tells the agent how to behave.

**Traditional approach:** Feature = function in codebase
```typescript
// Declared async because the body awaits store() and notify()
async function processFeedback(message) {
  const category = categorize(message);
  const priority = calculatePriority(message);
  await store(message, category, priority);
  if (priority > 3) await notify();
}
```

**Prompt-native approach:** Feature = section in system prompt
```markdown
## Feedback Processing

When someone shares feedback:
1. Read the message to understand what they're saying
2. Rate importance 1-5:
   - 5 (Critical): Blocking issues, data loss, security
   - 4 (High): Detailed bug reports, significant UX problems
   - 3 (Medium): General suggestions, minor issues
   - 2 (Low): Cosmetic issues, edge cases
   - 1 (Minimal): Off-topic, duplicates
3. Store using feedback.store_feedback
4. If importance >= 4, let the channel know you're tracking it

Use your judgment. Context matters.
```
</principle>

<structure>
## System Prompt Structure

A well-structured prompt-native system prompt:

```markdown
# Identity

You are [Name], [brief identity statement].

## Core Behavior

[What you always do, regardless of specific request]

## Feature: [Feature Name]

[When to trigger]
[What to do]
[How to decide edge cases]

## Feature: [Another Feature]

[...]

## Tool Usage

[Guidance on when/how to use available tools]

## Tone and Style

[Communication guidelines]

## What NOT to Do

[Explicit boundaries]
```
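If the prompt is assembled at startup instead of kept as a single file, the same structure can be treated as data. The sketch below is one possible arrangement, not something the skill prescribes: `PromptParts`, `Feature`, and `buildSystemPrompt` are hypothetical names, and only a subset of the template's sections is included to keep it short.

```typescript
type Feature = { title: string; body: string };
type PromptParts = {
  agentName: string;
  identity: string;
  coreBehavior: string[];
  features: Feature[];
  boundaries: string[];
};

// Renders a list of strings as markdown bullets.
function bullets(items: string[]): string {
  return items.map((item) => "- " + item).join("\n");
}

// Assembles the sections in the order used by the template above.
function buildSystemPrompt(parts: PromptParts): string {
  const sections: string[] = [
    "# Identity\n\nYou are " + parts.agentName + ", " + parts.identity + ".",
    "## Core Behavior\n\n" + bullets(parts.coreBehavior),
  ];
  for (const feature of parts.features) {
    // Each feature is just prose under its own heading.
    sections.push("## Feature: " + feature.title + "\n\n" + feature.body);
  }
  sections.push("## What NOT to Do\n\n" + bullets(parts.boundaries));
  return sections.join("\n\n");
}
```

Adding a feature then means appending one more `{ title, body }` entry of prose, which matches the idea that features live in the prompt rather than in new functions.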
</structure>

<principle name="guide-not-micromanage">
## Guide, Don't Micromanage

Tell the agent what to achieve, not exactly how to do it.

**Micromanaging (bad):**
```markdown
When creating a summary:
1. Use exactly 3 bullet points
2. Each bullet under 20 words
3. Use em-dashes for sub-points
4. Bold the first word of each bullet
5. End with a colon if there are sub-points
```

**Guiding (good):**
```markdown
When creating summaries:
- Be concise but complete
- Highlight the most important points
- Use your judgment about format

The goal is clarity, not consistency.
```

Trust the agent's intelligence. It knows how to communicate.
</principle>

<principle name="judgment-criteria">
## Define Judgment Criteria, Not Rules

Instead of rules, provide criteria for making decisions.

**Rules (rigid):**
```markdown
If the message contains "bug", set importance to 4.
If the message contains "crash", set importance to 5.
```

**Judgment criteria (flexible):**
```markdown
## Importance Rating

Rate importance based on:
- **Impact**: How many users affected? How severe?
- **Urgency**: Is this blocking? Time-sensitive?
- **Actionability**: Can we actually fix this?
- **Evidence**: Video/screenshots vs vague description

Examples:
- "App crashes when I tap submit" → 4-5 (critical, reproducible)
- "The button color seems off" → 2 (cosmetic, non-blocking)
- "Video walkthrough with 15 timestamped issues" → 5 (high-quality evidence)
```
</principle>

<principle name="context-windows">
## Work With Context Windows

The agent sees: system prompt + recent messages + tool results. Design for this.

**Use conversation history:**
```markdown
## Message Processing

When processing messages:
1. Check if this relates to recent conversation
2. If someone is continuing a previous thread, maintain context
3. Don't ask questions you already have answers to
```

**Acknowledge agent limitations:**
```markdown
## Memory Limitations

You don't persist memory between restarts. Use the memory server:
- Before responding, check memory.recall for relevant context
- After important decisions, use memory.store to remember
- Store conversation threads, not individual messages
```
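The `memory.store` and `memory.recall` primitives that this prompt refers to could be as small as the sketch below. Everything here is assumed for illustration: the in-memory object stands in for a store that would need to survive restarts, and `processOnce` is not meant to be a tool. It only mimics the sequence of calls the agent would make when following a recall-then-store instruction.

```typescript
// In-memory stand-in for the memory server. A real one would persist across restarts.
const memoryBackend: Record<string, string> = Object.create(null);

const memory = {
  store: (key: string, value: string): void => {
    memoryBackend[key] = value;
  },
  recall: (key: string): string | undefined => memoryBackend[key],
};

// Mimics what the agent does with the two primitives: recall first, act, then store.
function processOnce(messageId: string, handle: () => void): boolean {
  const key = "processed_" + messageId;
  if (memory.recall(key) !== undefined) return false; // already handled, skip
  handle();
  memory.store(key, new Date().toISOString());
  return true;
}
```

The primitives stay free of policy. Whether to recall before responding, and what is worth storing, remains guidance in the prompt.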
</principle>

<example name="feedback-bot">
## Example: Complete System Prompt

```markdown
# R2-C2 Feedback Bot

You are R2-C2, Every's feedback collection assistant. You monitor Discord for feedback about the Every Reader iOS app and organize it for the team.

## Core Behavior

- Be warm and helpful, never robotic
- Acknowledge all feedback, even if brief
- Ask clarifying questions when feedback is vague
- Never argue with feedback; collect and organize it

## Feedback Collection

When someone shares feedback:

1. **Acknowledge** warmly: "Thanks for this!" or "Good catch!"
2. **Clarify** if needed: "Can you tell me more about when this happens?"
3. **Rate importance** 1-5:
   - 5: Critical (crashes, data loss, security)
   - 4: High (detailed reports, significant UX issues)
   - 3: Medium (suggestions, minor bugs)
   - 2: Low (cosmetic, edge cases)
   - 1: Minimal (off-topic, duplicates)
4. **Store** using feedback.store_feedback
5. **Update site** if significant feedback came in

Video walkthroughs are gold: always rate them 4-5.

## Site Management

You maintain a public feedback site. When feedback accumulates:

1. Sync data to site/public/content/feedback.json
2. Update status counts and organization
3. Commit and push to trigger deploy

The site should look professional and be easy to scan.

## Message Deduplication

Before processing any message:
1. Check memory.recall(key: "processed_{messageId}")
2. Skip if already processed
3. After processing, store the key

## Tone

- Casual and friendly
- Brief but warm
- Technical when discussing bugs
- Never defensive

## Don't

- Don't promise fixes or timelines
- Don't share internal discussions
- Don't ignore feedback even if it seems minor
- Don't repeat yourself; vary acknowledgments
```
</example>

<iteration>
## Iterating on System Prompts

Prompt-native development means rapid iteration:

1. **Observe** agent behavior in production
2. **Identify** gaps: "It's not rating video feedback high enough"
3. **Add guidance**: "Video walkthroughs are gold: always rate them 4-5"
4. **Deploy** (just edit the prompt file)
5. **Repeat**

No code changes. No recompilation. Just prose.
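For "just edit the prompt file" to work without a restart, the prompt has to be read at run time and not baked into the build. A minimal sketch of that arrangement follows, with assumptions called out: `createAgentRunner`, `loadPrompt`, and `callModel` are hypothetical names, and the model call is injected as a plain function so the example stays self-contained.

```typescript
type PromptSource = () => string;
type ModelCall = (systemPrompt: string, input: string) => string;

// Returns a run function that re-reads the prompt on every invocation, so an
// edited prompt takes effect on the next run without a rebuild or restart.
function createAgentRunner(loadPrompt: PromptSource, callModel: ModelCall): (input: string) => string {
  return (input: string): string => callModel(loadPrompt(), input);
}
```

In a Node deployment, `loadPrompt` might be something like `() => readFileSync("src/prompts/system.md", "utf8")`, using the prompt path mentioned earlier in this skill. That wiring is an assumption, not something the source specifies.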
</iteration>

<checklist>
## System Prompt Checklist

- [ ] Clear identity statement
- [ ] Core behaviors that always apply
- [ ] Features as separate sections
- [ ] Judgment criteria instead of rigid rules
- [ ] Examples for ambiguous cases
- [ ] Explicit boundaries (what NOT to do)
- [ ] Tone guidance
- [ ] Tool usage guidance (when to use each)
- [ ] Memory/context handling
</checklist>