How to refactor existing agent code to follow prompt-native principles. The goal: move behavior from code into prompts, and simplify tools into primitives.
## Diagnosing Non-Prompt-Native Code
Signs your agent isn't prompt-native:
**Tools that encode workflows:**
```typescript
// RED FLAG: Tool contains business logic
tool("process_feedback", async ({ message }) => {
  const category = categorize(message); // Logic in code
  const priority = calculatePriority(message); // Logic in code
  await store(message, category, priority); // Orchestration in code
  if (priority > 3) await notify(); // Decision in code
});
```
**Agent calls functions instead of figuring things out:**
```typescript
// RED FLAG: Agent is just a function caller
"Use process_feedback to handle incoming messages"
// vs.
"When feedback comes in, decide importance, store it, notify if high"
```
**Artificial limits on agent capability:**
```typescript
// RED FLAG: Tool prevents agent from doing what users can do
tool("read_file", async ({ path }) => {
  if (!ALLOWED_PATHS.includes(path)) {
    throw new Error("Not allowed to read this file");
  }
  return readFile(path);
});
```
**Prompts that specify HOW instead of WHAT:**
```markdown
// RED FLAG: Micromanaging the agent
When creating a summary:
1. Use exactly 3 bullet points
2. Each bullet must be under 20 words
3. Format with em-dashes for sub-points
4. Bold the first word of each bullet
```
## Step-by-Step Refactoring
**Step 1: Identify workflow tools**
List all your tools. Mark any that:
- Have business logic (categorize, calculate, decide)
- Orchestrate multiple operations
- Make decisions on behalf of the agent
- Contain conditional logic (if/else based on content)
**Step 2: Extract the primitives**
For each workflow tool, identify the underlying primitives:
| Workflow Tool | Hidden Primitives |
|---------------|-------------------|
| `process_feedback` | `store_item`, `send_message` |
| `generate_report` | `read_file`, `write_file` |
| `deploy_and_notify` | `git_push`, `send_message` |
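For the last row, the extraction might look like the sketch below; the git wrapper is a hypothetical stand-in, and `send_message` is the same generic primitive that appears in Step 4. The "deploy, then announce it" sequencing moves into the prompt.
```typescript
// Hypothetical primitive extracted from deploy_and_notify.
declare const git: { push(remote: string, branch: string): Promise<void> };

tool("git_push", async ({ remote, branch }) => {
  await git.push(remote, branch);
  return { text: `Pushed ${branch} to ${remote}` };
});
```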
**Step 3: Move behavior to the prompt**
Take the logic from your workflow tools and express it in natural language:
```typescript
// Before (in code):
async function processFeedback(message) {
  const priority = message.includes("crash") ? 5 :
                   message.includes("bug") ? 4 : 3;
  await store(message, priority);
  if (priority >= 4) await notify();
}
```
```markdown
// After (in prompt):
## Feedback Processing
When someone shares feedback:
1. Rate importance 1-5:
   - 5: Crashes, data loss, security issues
   - 4: Bug reports with clear reproduction steps
   - 3: General suggestions, minor issues
2. Store using store_item
3. If importance >= 4, notify the team
Use your judgment. Context matters more than keywords.
```
**Step 4: Simplify tools to primitives**
```typescript
// Before: 1 workflow tool
tool("process_feedback", { message, category, priority }, ...complex logic...)
// After: 2 primitive tools
tool("store_item", { key: z.string(), value: z.any() }, ...simple storage...)
tool("send_message", { channel: z.string(), content: z.string() }, ...simple send...)
```
**Step 5: Remove artificial limits**
```typescript
// Before: Limited capability
tool("read_file", async ({ path }) => {
  if (!isAllowed(path)) throw new Error("Forbidden");
  return readFile(path);
});
// After: Full capability
tool("read_file", async ({ path }) => {
  return readFile(path); // Agent can read anything
});
// Use approval gates for WRITES, not artificial limits on READS
```
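What an approval gate might look like: a minimal sketch, assuming a hypothetical `requestApproval` helper that surfaces the pending write to a human (Slack, CLI prompt, dashboard) and resolves with their decision. Nothing here is a specific SDK API.
```typescript
import { writeFileSync } from "node:fs";

// Hypothetical: blocks until a human approves or rejects the action.
declare function requestApproval(summary: string): Promise<boolean>;

tool("write_file", async ({ path, content }) => {
  // Gate the side effect; don't limit what the agent can decide to write.
  const approved = await requestApproval(`Write ${content.length} chars to ${path}?`);
  if (!approved) return { text: "Write rejected by reviewer" };
  writeFileSync(path, content);
  return { text: `Wrote ${path}` };
});
```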
**Step 6: Test with outcomes, not procedures**
Instead of testing "does it call the right function?", test "does it achieve the outcome?"
```typescript
// Before: Testing procedure
expect(mockProcessFeedback).toHaveBeenCalledWith(...)
// After: Testing outcome
// Send feedback → Check it was stored with reasonable importance
// Send high-priority feedback → Check notification was sent
```
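Concretely, an outcome test might look like the sketch below (vitest style). `runAgent` and the in-memory `db` are assumptions: a harness that runs one agent turn with the real primitives wired to a test store. Because agent output varies, assert on ranges and presence, not exact values or call order.
```typescript
import { test, expect } from "vitest";

// Assumed harness: one agent turn, real tools, in-memory store.
declare function runAgent(message: string): Promise<void>;
declare const db: {
  feedback: { findAll(): Promise<Array<{ importance: number }>> };
};

test("crash reports are stored with high importance", async () => {
  await runAgent("The app crashes every time I open settings");

  const stored = await db.feedback.findAll();
  expect(stored).toHaveLength(1);
  // Outcome, not procedure: a reasonable rating, not a specific tool call.
  expect(stored[0].importance).toBeGreaterThanOrEqual(4);
});
```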
## Before/After Examples
**Example 1: Feedback Processing**
Before:
```typescript
tool("handle_feedback", async ({ message, author }) => {
  const category = detectCategory(message);
  const priority = calculatePriority(message, category);
  const feedbackId = await db.feedback.insert({
    id: generateId(),
    author,
    message,
    category,
    priority,
    timestamp: new Date().toISOString(),
  });
  if (priority >= 4) {
    await discord.send(ALERT_CHANNEL, `High priority feedback from ${author}`);
  }
  return { feedbackId, category, priority };
});
```
After:
```typescript
// Simple storage primitive
tool("store_feedback", async ({ item }) => {
  await db.feedback.insert(item);
  return { text: `Stored feedback ${item.id}` };
});
// Simple message primitive
tool("send_message", async ({ channel, content }) => {
  await discord.send(channel, content);
  return { text: "Sent" };
});
```
System prompt:
```markdown
## Feedback Processing
When someone shares feedback:
1. Generate a unique ID
2. Rate importance 1-5 based on impact and urgency
3. Store using store_feedback with the full item
4. If importance >= 4, send a notification to the team channel
Importance guidelines:
- 5: Critical (crashes, data loss, security)
- 4: High (detailed bug reports, blocking issues)
- 3: Medium (suggestions, minor bugs)
- 2: Low (cosmetic, edge cases)
- 1: Minimal (off-topic, duplicates)
```
**Example 2: Report Generation**
Before:
```typescript
tool("generate_weekly_report", async ({ startDate, endDate, format }) => {
  const data = await fetchMetrics(startDate, endDate);
  const summary = summarizeMetrics(data);
  const charts = generateCharts(data);
  if (format === "html") {
    return renderHtmlReport(summary, charts);
  } else if (format === "markdown") {
    return renderMarkdownReport(summary, charts);
  } else {
    return renderPdfReport(summary, charts);
  }
});
```
After:
```typescript
tool("query_metrics", async ({ start, end }) => {
const data = await db.metrics.query({ start, end });
return { text: JSON.stringify(data, null, 2) };
});
tool("write_file", async ({ path, content }) => {
writeFileSync(path, content);
return { text: `Wrote ${path}` };
});
```
System prompt:
```markdown
## Report Generation
When asked to generate a report:
1. Query the relevant metrics using query_metrics
2. Analyze the data and identify key trends
3. Create a clear, well-formatted report
4. Write it using write_file in the appropriate format
Use your judgment about format and structure. Make it useful.
```
## Common Refactoring Challenges
**"But the agent might make mistakes!"**
Yes, and you can iterate. Instead of patching code, add guidance to the prompt:
```markdown
// Before
Rate importance 1-5.
// After (if agent keeps rating too high)
Rate importance 1-5. Be conservative—most feedback is 2-3.
Only use 4-5 for truly blocking or critical issues.
```
**"The workflow is complex!"**
Complex workflows can still be expressed in prompts. The agent is smart.
```markdown
When processing video feedback:
1. Check if it's a Loom, YouTube, or direct link
2. For YouTube, pass URL directly to video analysis
3. For others, download first, then analyze
4. Extract timestamped issues
5. Rate based on issue density and severity
```
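Note what the code needs to support this: two primitives, nothing more. The branching, ordering, and rating all live in the prompt. A sketch with hypothetical backends (`fetchToTempFile` and `videoModel` are stand-ins for whatever you use):
```typescript
// Hypothetical backends; swap in your own download and analysis services.
declare function fetchToTempFile(url: string): Promise<string>;
declare const videoModel: { analyze(source: string): Promise<object> };

tool("download_video", async ({ url }) => {
  const path = await fetchToTempFile(url);
  return { text: `Downloaded to ${path}` };
});

tool("analyze_video", async ({ source }) => {
  const findings = await videoModel.analyze(source);
  return { text: JSON.stringify(findings, null, 2) };
});
```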
**"We need deterministic behavior!"**
Some operations should stay in code. That's fine. Prompt-native isn't all-or-nothing; the sketch after these lists shows the two working together.
Keep in code:
- Security validation
- Rate limiting
- Audit logging
- Exact format requirements
Move to prompts:
- Categorization decisions
- Priority judgments
- Content generation
- Workflow orchestration
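The two halves compose inside a single primitive. A sketch, with `rateLimiter` and `auditLog` as illustrative stand-ins: the deterministic guarantees run in code on every call, while whether, when, and what to send remains the agent's judgment.
```typescript
import { z } from "zod";

// Illustrative stand-ins for your infrastructure.
declare const rateLimiter: { take(key: string): boolean };
declare const auditLog: { record(event: object): void };
declare const discord: { send(channel: string, content: string): Promise<void> };

tool("send_message", { channel: z.string(), content: z.string() },
  async ({ channel, content }) => {
    // Deterministic and always enforced, in code:
    if (!rateLimiter.take(channel)) {
      return { text: "Rate limit hit; try again in a minute" };
    }
    auditLog.record({ tool: "send_message", channel, at: Date.now() });
    // Judgment (whether to send, to whom, saying what) came from the prompt:
    await discord.send(channel, content);
    return { text: "Sent" };
  }
);
```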
**"What about testing?"**
Test outcomes, not procedures:
- "Given this input, does the agent achieve the right result?"
- "Does stored feedback have reasonable importance ratings?"
- "Are notifications sent for truly high-priority items?"
## Refactoring Checklist
Diagnosis:
- [ ] Listed all tools with business logic
- [ ] Identified artificial limits on agent capability
- [ ] Found prompts that micromanage HOW
Refactoring:
- [ ] Extracted primitives from workflow tools
- [ ] Moved business logic to system prompt
- [ ] Removed artificial limits
- [ ] Simplified tool inputs to data, not decisions
Validation:
- [ ] Agent achieves same outcomes with primitives
- [ ] Behavior can be changed by editing prompts
- [ ] New features could be added without new tools