<overview>
How to refactor existing agent code to follow prompt-native principles. The goal: move behavior from code into prompts, and simplify tools into primitives.
</overview>

<diagnosis>
## Diagnosing Non-Prompt-Native Code

Signs your agent isn't prompt-native:

**Tools that encode workflows:**
```typescript
// RED FLAG: Tool contains business logic
tool("process_feedback", async ({ message }) => {
  const category = categorize(message);        // Logic in code
  const priority = calculatePriority(message); // Logic in code
  await store(message, category, priority);    // Orchestration in code
  if (priority > 3) await notify();            // Decision in code
});
```

**Agent calls functions instead of figuring things out:**
```typescript
// RED FLAG: Agent is just a function caller
"Use process_feedback to handle incoming messages"
// vs.
"When feedback comes in, decide importance, store it, notify if high"
```

**Artificial limits on agent capability:**
```typescript
// RED FLAG: Tool prevents agent from doing what users can do
tool("read_file", async ({ path }) => {
  if (!ALLOWED_PATHS.includes(path)) {
    throw new Error("Not allowed to read this file");
  }
  return readFile(path);
});
```

**Prompts that specify HOW instead of WHAT:**
```markdown
// RED FLAG: Micromanaging the agent
When creating a summary:
1. Use exactly 3 bullet points
2. Each bullet must be under 20 words
3. Format with em-dashes for sub-points
4. Bold the first word of each bullet
```
</diagnosis>

<refactoring_workflow>
## Step-by-Step Refactoring

**Step 1: Identify workflow tools**

List all your tools. Mark any that:
- Have business logic (categorize, calculate, decide)
- Orchestrate multiple operations
- Make decisions on behalf of the agent
- Contain conditional logic (if/else based on content)


**Step 2: Extract the primitives**

For each workflow tool, identify the underlying primitives:

| Workflow Tool | Hidden Primitives |
|---------------|-------------------|
| `process_feedback` | `store_item`, `send_message` |
| `generate_report` | `read_file`, `write_file` |
| `deploy_and_notify` | `git_push`, `send_message` |


**Step 3: Move behavior to the prompt**

Take the logic from your workflow tools and express it in natural language:

```typescript
// Before (in code):
async function processFeedback(message) {
  const priority = message.includes("crash") ? 5 :
                   message.includes("bug") ? 4 : 3;
  await store(message, priority);
  if (priority >= 4) await notify();
}
```

```markdown
// After (in prompt):
## Feedback Processing

When someone shares feedback:
1. Rate importance 1-5:
   - 5: Crashes, data loss, security issues
   - 4: Bug reports with clear reproduction steps
   - 3: General suggestions, minor issues
2. Store using store_item
3. If importance >= 4, notify the team

Use your judgment. Context matters more than keywords.
```

**Step 4: Simplify tools to primitives**

```typescript
// Before: 1 workflow tool
tool("process_feedback", { message, category, priority }, ...complex logic...)

// After: 2 primitive tools
tool("store_item", { key: z.string(), value: z.any() }, ...simple storage...)
tool("send_message", { channel: z.string(), content: z.string() }, ...simple send...)
```

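To make the pseudocode concrete, here is a minimal sketch. The `tool()` registration helper, the in-memory `Map` backend, and the console transport are assumptions standing in for whatever SDK, storage, and messaging you actually use; only the input shapes mirror the snippet above. Note that the inputs are plain data (a key, a channel, some content), never decisions.

```typescript
import { z } from "zod";

// Assumed registration helper; swap in your framework's equivalent.
type Handler<T> = (input: T) => Promise<{ text: string }>;
const registry = new Map<string, { schema: z.ZodTypeAny; handler: Handler<any> }>();

function tool<S extends z.ZodTypeAny>(name: string, schema: S, handler: Handler<z.infer<S>>) {
  registry.set(name, { schema, handler });
}

// In-memory stand-ins for real storage and messaging backends.
const items = new Map<string, unknown>();

tool("store_item", z.object({ key: z.string(), value: z.any() }), async ({ key, value }) => {
  items.set(key, value); // primitive: persist exactly what the agent decided to store
  return { text: `Stored ${key}` };
});

tool("send_message", z.object({ channel: z.string(), content: z.string() }), async ({ channel, content }) => {
  console.log(`[${channel}] ${content}`); // primitive: deliver; the prompt decides when and why
  return { text: "Sent" };
});
```
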
**Step 5: Remove artificial limits**

```typescript
// Before: Limited capability
tool("read_file", async ({ path }) => {
  if (!isAllowed(path)) throw new Error("Forbidden");
  return readFile(path);
});

// After: Full capability
tool("read_file", async ({ path }) => {
  return readFile(path); // Agent can read anything
});
// Use approval gates for WRITES, not artificial limits on READS
```

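To make the approval-gate comment concrete, one possible shape is sketched below: reads stay unrestricted, while the write primitive pauses on a `requestApproval` hook. The hook is hypothetical; wire it to whatever human-in-the-loop surface you have.

```typescript
import { writeFileSync } from "node:fs";

// Hypothetical approval hook; connect it to your UI, chat, or CI flow.
async function requestApproval(action: string): Promise<boolean> {
  console.log(`Approval requested: ${action}`);
  return true; // placeholder: auto-approve in this sketch
}

// The tool stays a primitive; the gate covers the side effect, not the agent's reasoning.
async function writeFileTool({ path, content }: { path: string; content: string }) {
  const approved = await requestApproval(`write ${content.length} bytes to ${path}`);
  if (!approved) return { text: `Write to ${path} was declined` };
  writeFileSync(path, content);
  return { text: `Wrote ${path}` };
}
```
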
**Step 6: Test with outcomes, not procedures**

Instead of testing "does it call the right function?", test "does it achieve the outcome?"

```typescript
// Before: Testing procedure
expect(mockProcessFeedback).toHaveBeenCalledWith(...)

// After: Testing outcome
// Send feedback → Check it was stored with reasonable importance
// Send high-priority feedback → Check notification was sent
```

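An outcome-oriented test might look like the sketch below (Vitest-style assertions; the `Harness` interface for running the agent and inspecting stored feedback and sent notifications is hypothetical and would be backed by your own setup):

```typescript
import { describe, expect, it } from "vitest";

// Hypothetical harness: however you run the agent and observe its effects.
interface Harness {
  runAgent(message: string): Promise<void>;
  storedFeedback(): Array<{ message: string; importance: number }>;
  sentNotifications(): string[];
}

export function outcomeTests(harness: Harness) {
  describe("feedback handling", () => {
    it("rates crash reports high and notifies the team", async () => {
      await harness.runAgent("The app crashes on startup and I lost my draft");

      const stored = harness.storedFeedback();
      expect(stored.length).toBeGreaterThan(0);
      // Assert outcomes, not which functions were called or in what order.
      expect(stored[0].importance).toBeGreaterThanOrEqual(4);
      expect(harness.sentNotifications().length).toBeGreaterThan(0);
    });
  });
}
```
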
</refactoring_workflow>

<before_after>
## Before/After Examples

**Example 1: Feedback Processing**

Before:
```typescript
tool("handle_feedback", async ({ message, author }) => {
  const category = detectCategory(message);
  const priority = calculatePriority(message, category);
  const feedbackId = await db.feedback.insert({
    id: generateId(),
    author,
    message,
    category,
    priority,
    timestamp: new Date().toISOString(),
  });

  if (priority >= 4) {
    await discord.send(ALERT_CHANNEL, `High priority feedback from ${author}`);
  }

  return { feedbackId, category, priority };
});
```

After:
```typescript
// Simple storage primitive
tool("store_feedback", async ({ item }) => {
  await db.feedback.insert(item);
  return { text: `Stored feedback ${item.id}` };
});

// Simple message primitive
tool("send_message", async ({ channel, content }) => {
  await discord.send(channel, content);
  return { text: "Sent" };
});
```

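If you want the stored item's shape enforced deterministically, the storage primitive can validate it with a schema while the importance rating itself remains the agent's judgment. A sketch, assuming zod and field names that match the system prompt below:

```typescript
import { z } from "zod";

// `db` is the storage handle from the example above; declared here only so the sketch compiles.
declare const db: { feedback: { insert(item: unknown): Promise<void> } };

// Hypothetical item shape matching the system prompt's instructions.
const FeedbackItem = z.object({
  id: z.string(),
  author: z.string(),
  message: z.string(),
  importance: z.number().int().min(1).max(5), // the agent's judgment, shape-checked in code
  timestamp: z.string(),
});

// store_feedback stays a primitive: validate the shape, persist, report back.
async function storeFeedback({ item }: { item: unknown }) {
  const parsed = FeedbackItem.parse(item);
  await db.feedback.insert(parsed);
  return { text: `Stored feedback ${parsed.id}` };
}
```
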
System prompt:
```markdown
## Feedback Processing

When someone shares feedback:
1. Generate a unique ID
2. Rate importance 1-5 based on impact and urgency
3. Store using store_feedback with the full item
4. If importance >= 4, send a notification to the team channel

Importance guidelines:
- 5: Critical (crashes, data loss, security)
- 4: High (detailed bug reports, blocking issues)
- 3: Medium (suggestions, minor bugs)
- 2: Low (cosmetic, edge cases)
- 1: Minimal (off-topic, duplicates)
```

**Example 2: Report Generation**

Before:
```typescript
tool("generate_weekly_report", async ({ startDate, endDate, format }) => {
  const data = await fetchMetrics(startDate, endDate);
  const summary = summarizeMetrics(data);
  const charts = generateCharts(data);

  if (format === "html") {
    return renderHtmlReport(summary, charts);
  } else if (format === "markdown") {
    return renderMarkdownReport(summary, charts);
  } else {
    return renderPdfReport(summary, charts);
  }
});
```

After:
```typescript
tool("query_metrics", async ({ start, end }) => {
  const data = await db.metrics.query({ start, end });
  return { text: JSON.stringify(data, null, 2) };
});

tool("write_file", async ({ path, content }) => {
  writeFileSync(path, content);
  return { text: `Wrote ${path}` };
});
```

System prompt:
```markdown
## Report Generation

When asked to generate a report:
1. Query the relevant metrics using query_metrics
2. Analyze the data and identify key trends
3. Create a clear, well-formatted report
4. Write it using write_file in the appropriate format

Use your judgment about format and structure. Make it useful.
```
</before_after>

<common_challenges>
## Common Refactoring Challenges

**"But the agent might make mistakes!"**

Yes, and you can iterate. Change the prompt to add guidance:
```markdown
// Before
Rate importance 1-5.

// After (if agent keeps rating too high)
Rate importance 1-5. Be conservative—most feedback is 2-3.
Only use 4-5 for truly blocking or critical issues.
```

**"The workflow is complex!"**
|
|
|
|
Complex workflows can still be expressed in prompts. The agent is smart.
|
|
```markdown
|
|
When processing video feedback:
|
|
1. Check if it's a Loom, YouTube, or direct link
|
|
2. For YouTube, pass URL directly to video analysis
|
|
3. For others, download first, then analyze
|
|
4. Extract timestamped issues
|
|
5. Rate based on issue density and severity
|
|
```
|
|
|
|
**"We need deterministic behavior!"**
|
|
|
|
Some operations should stay in code. That's fine. Prompt-native isn't all-or-nothing.
|
|
|
|
Keep in code:
|
|
- Security validation
|
|
- Rate limiting
|
|
- Audit logging
|
|
- Exact format requirements
|
|
|
|
Move to prompts:
|
|
- Categorization decisions
|
|
- Priority judgments
|
|
- Content generation
|
|
- Workflow orchestration
|
|
|
|
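
Here is that sketch: rate limiting and audit logging stay deterministic in code, while the prompt keeps deciding when a notification is warranted. The threshold and console transport are placeholders.

```typescript
// Deterministic guardrails live in code; the prompt still decides *when* to notify.
const auditLog: Array<{ at: string; channel: string }> = [];
let lastSentAt = 0;
const MIN_INTERVAL_MS = 5_000; // placeholder rate limit

async function sendMessage({ channel, content }: { channel: string; content: string }) {
  const now = Date.now();
  if (now - lastSentAt < MIN_INTERVAL_MS) {
    return { text: "Rate limited: try again in a few seconds" };
  }
  lastSentAt = now;
  auditLog.push({ at: new Date(now).toISOString(), channel }); // audit trail stays in code
  console.log(`[${channel}] ${content}`);                      // stand-in for the real transport
  return { text: "Sent" };
}
```
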
**"What about testing?"**
|
|
|
|
Test outcomes, not procedures:
|
|
- "Given this input, does the agent achieve the right result?"
|
|
- "Does stored feedback have reasonable importance ratings?"
|
|
- "Are notifications sent for truly high-priority items?"
|
|
</common_challenges>
|
|
|
|
<checklist>
## Refactoring Checklist

Diagnosis:
- [ ] Listed all tools with business logic
- [ ] Identified artificial limits on agent capability
- [ ] Found prompts that micromanage HOW

Refactoring:
- [ ] Extracted primitives from workflow tools
- [ ] Moved business logic to system prompt
- [ ] Removed artificial limits
- [ ] Simplified tool inputs to data, not decisions

Validation:
- [ ] Agent achieves same outcomes with primitives
- [ ] Behavior can be changed by editing prompts
- [ ] New features could be added without new tools
</checklist>