How to refactor existing agent code to follow prompt-native principles. The goal: move behavior from code into prompts, and simplify tools into primitives.

## Diagnosing Non-Prompt-Native Code

Signs your agent isn't prompt-native:

Tools that encode workflows:

```typescript
// RED FLAG: Tool contains business logic
tool("process_feedback", async ({ message }) => {
  const category = categorize(message);        // Logic in code
  const priority = calculatePriority(message); // Logic in code
  await store(message, category, priority);    // Orchestration in code
  if (priority > 3) await notify();            // Decision in code
});
```

Agent calls functions instead of figuring things out:

```typescript
// RED FLAG: Agent is just a function caller
"Use process_feedback to handle incoming messages"
// vs.
"When feedback comes in, decide importance, store it, notify if high"
```

Artificial limits on agent capability:

```typescript
// RED FLAG: Tool prevents agent from doing what users can do
tool("read_file", async ({ path }) => {
  if (!ALLOWED_PATHS.includes(path)) {
    throw new Error("Not allowed to read this file");
  }
  return readFile(path);
});
```

Prompts that specify HOW instead of WHAT:

```markdown
// RED FLAG: Micromanaging the agent
When creating a summary:
1. Use exactly 3 bullet points
2. Each bullet must be under 20 words
3. Format with em-dashes for sub-points
4. Bold the first word of each bullet
```

<refactoring_workflow>

## Step-by-Step Refactoring

Step 1: Identify workflow tools

List all your tools. Mark any that (see the annotated sketch after this list):

- Have business logic (categorize, calculate, decide)
- Orchestrate multiple operations
- Make decisions on behalf of the agent
- Contain conditional logic (if/else based on content)
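
For illustration, here is a hypothetical workflow tool that trips all four markers at once. The tool name, its body, and the classifyTicket/pickAssignee helpers are invented for this sketch; db and discord are the same assumed clients as in the examples below:

```typescript
// Hypothetical workflow tool: every marker from the list above appears here.
tool("triage_ticket", async ({ ticket }) => {
  const category = classifyTicket(ticket);                  // business logic in code
  const owner = pickAssignee(category);                     // decision made for the agent
  await db.tickets.insert({ ...ticket, category, owner });  // orchestration of several operations
  if (category === "outage") {                              // conditional logic based on content
    await discord.send("ops-alerts", `Outage reported: ${ticket.title}`);
  }
  return { category, owner };
});
```

Every commented line is behavior the later steps will move into the prompt.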

Step 2: Extract the primitives

For each workflow tool, identify the underlying primitives (a worked example follows the table):

| Workflow Tool | Hidden Primitives |
| --- | --- |
| process_feedback | store_item, send_message |
| generate_report | read_file, write_file |
| deploy_and_notify | git_push, send_message |
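
Reading a tool's body line by line is usually enough to surface them. Taking the deploy_and_notify row as a worked example (the body shown here is invented, and git and discord are assumed clients):

```typescript
// Hypothetical deploy_and_notify before refactoring:
// each await maps to a primitive from the table, and the if is a decision
// that belongs in the prompt, not in code.
tool("deploy_and_notify", async ({ branch }) => {
  await git.push("origin", branch);                        // hidden primitive: git_push
  if (branch === "main") {                                 // decision for the prompt
    await discord.send("releases", `Deployed ${branch}`);  // hidden primitive: send_message
  }
});
```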

Step 3: Move behavior to the prompt

Take the logic from your workflow tools and express it in natural language:

```typescript
// Before (in code):
async function processFeedback(message) {
  const priority = message.includes("crash") ? 5 :
                   message.includes("bug") ? 4 : 3;
  await store(message, priority);
  if (priority >= 4) await notify();
}
```

```markdown
// After (in prompt):
## Feedback Processing

When someone shares feedback:
1. Rate importance 1-5:
   - 5: Crashes, data loss, security issues
   - 4: Bug reports with clear reproduction steps
   - 3: General suggestions, minor issues
2. Store using store_item
3. If importance >= 4, notify the team

Use your judgment. Context matters more than keywords.
```

Step 4: Simplify tools to primitives

```typescript
// Before: 1 workflow tool
tool("process_feedback", { message, category, priority }, ...complex logic...)

// After: 2 primitive tools
tool("store_item", { key: z.string(), value: z.any() }, ...simple storage...)
tool("send_message", { channel: z.string(), content: z.string() }, ...simple send...)
```

Step 5: Remove artificial limits

```typescript
// Before: Limited capability
tool("read_file", async ({ path }) => {
  if (!isAllowed(path)) throw new Error("Forbidden");
  return readFile(path);
});

// After: Full capability
tool("read_file", async ({ path }) => {
  return readFile(path);  // Agent can read anything
});
// Use approval gates for WRITES, not artificial limits on READS
```
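
What an approval gate on a write might look like, as a sketch only. requestApproval is a hypothetical helper that surfaces the pending change to a human and resolves with their decision:

```typescript
import { writeFileSync } from "node:fs";

// Reads stay unrestricted; writes pause for human sign-off.
tool("write_file", async ({ path, content }) => {
  // requestApproval is hypothetical: it shows the pending change to a human
  // and resolves true or false with their decision.
  const approved = await requestApproval({
    action: "write_file",
    path,
    preview: content.slice(0, 500),
  });
  if (!approved) {
    return { text: `Write to ${path} was not approved` };
  }
  writeFileSync(path, content);
  return { text: `Wrote ${path}` };
});
```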

Step 6: Test with outcomes, not procedures

Instead of testing "does it call the right function?", test "does it achieve the outcome?"

```typescript
// Before: Testing procedure
expect(mockProcessFeedback).toHaveBeenCalledWith(...)

// After: Testing outcome
// Send feedback → Check it was stored with reasonable importance
// Send high-priority feedback → Check notification was sent
```
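
Concretely, an outcome test might look like the following Jest-style sketch, assuming a hypothetical runAgent harness that sends the agent a message and returns once it finishes, plus fake db and discord clients that record what happened:

```typescript
// Outcome test: we don't assert which tools were called or in what order,
// only that the feedback was stored sensibly and the team was notified.
test("crash reports are stored and trigger a notification", async () => {
  await runAgent("The app crashes every time I open settings"); // hypothetical harness

  const stored = await db.feedback.findByMessage("crashes every time"); // hypothetical fake
  expect(stored).toBeDefined();
  expect(stored.importance).toBeGreaterThanOrEqual(4); // a reasonable rating, not an exact value

  expect(discord.sent).toContainEqual(
    expect.objectContaining({ channel: "team-alerts" }) // a notification went out
  );
});
```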

</refactoring_workflow>

<before_after>

## Before/After Examples

Example 1: Feedback Processing

Before:

tool("handle_feedback", async ({ message, author }) => {
  const category = detectCategory(message);
  const priority = calculatePriority(message, category);
  const feedbackId = await db.feedback.insert({
    id: generateId(),
    author,
    message,
    category,
    priority,
    timestamp: new Date().toISOString(),
  });

  if (priority >= 4) {
    await discord.send(ALERT_CHANNEL, `High priority feedback from ${author}`);
  }

  return { feedbackId, category, priority };
});

After:

```typescript
// Simple storage primitive
tool("store_feedback", async ({ item }) => {
  await db.feedback.insert(item);
  return { text: `Stored feedback ${item.id}` };
});

// Simple message primitive
tool("send_message", async ({ channel, content }) => {
  await discord.send(channel, content);
  return { text: "Sent" };
});
```

System prompt:

```markdown
## Feedback Processing

When someone shares feedback:
1. Generate a unique ID
2. Rate importance 1-5 based on impact and urgency
3. Store using store_feedback with the full item
4. If importance >= 4, send a notification to the team channel

Importance guidelines:
- 5: Critical (crashes, data loss, security)
- 4: High (detailed bug reports, blocking issues)
- 3: Medium (suggestions, minor bugs)
- 2: Low (cosmetic, edge cases)
- 1: Minimal (off-topic, duplicates)
```

Example 2: Report Generation

Before:

tool("generate_weekly_report", async ({ startDate, endDate, format }) => {
  const data = await fetchMetrics(startDate, endDate);
  const summary = summarizeMetrics(data);
  const charts = generateCharts(data);

  if (format === "html") {
    return renderHtmlReport(summary, charts);
  } else if (format === "markdown") {
    return renderMarkdownReport(summary, charts);
  } else {
    return renderPdfReport(summary, charts);
  }
});

After:

tool("query_metrics", async ({ start, end }) => {
  const data = await db.metrics.query({ start, end });
  return { text: JSON.stringify(data, null, 2) };
});

tool("write_file", async ({ path, content }) => {
  writeFileSync(path, content);
  return { text: `Wrote ${path}` };
});

System prompt:

```markdown
## Report Generation

When asked to generate a report:
1. Query the relevant metrics using query_metrics
2. Analyze the data and identify key trends
3. Create a clear, well-formatted report
4. Write it using write_file in the appropriate format

Use your judgment about format and structure. Make it useful.
```

</before_after>

<common_challenges>

## Common Refactoring Challenges

"But the agent might make mistakes!"

Yes, and you can iterate. Change the prompt to add guidance:

```markdown
// Before
Rate importance 1-5.

// After (if agent keeps rating too high)
Rate importance 1-5. Be conservative—most feedback is 2-3.
Only use 4-5 for truly blocking or critical issues.
```

"The workflow is complex!"

Complex workflows can still be expressed in prompts. The agent is smart.

```markdown
When processing video feedback:
1. Check if it's a Loom, YouTube, or direct link
2. For YouTube, pass URL directly to video analysis
3. For others, download first, then analyze
4. Extract timestamped issues
5. Rate based on issue density and severity
```

"We need deterministic behavior!"

Some operations should stay in code. That's fine. Prompt-native isn't all-or-nothing; the sketch after the lists below shows how the two halves fit together.

Keep in code:

- Security validation
- Rate limiting
- Audit logging
- Exact format requirements

Move to prompts:

- Categorization decisions
- Priority judgments
- Content generation
- Workflow orchestration
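
As a sketch of how that split can look inside a single primitive (checkRateLimit and auditLog are hypothetical helpers; discord is the same assumed client as in the examples above):

```typescript
import { z } from "zod";

// Deterministic guardrails stay in code; what to send, and when, stays in the prompt.
tool(
  "send_message",
  { channel: z.string(), content: z.string() },
  async ({ channel, content }) => {
    // Security validation: exact and deterministic, never left to judgment
    if (!/^[a-z0-9-]+$/.test(channel)) {
      return { text: `Invalid channel name: ${channel}` };
    }

    // Rate limiting: checkRateLimit is a hypothetical helper enforcing a hard cap
    if (!(await checkRateLimit("send_message"))) {
      return { text: "Rate limit reached; try again shortly" };
    }

    await discord.send(channel, content);

    // Audit logging: auditLog is a hypothetical helper recording every outbound message
    await auditLog({ tool: "send_message", channel, content });
    return { text: "Sent" };
  }
);
```

The tool enforces the non-negotiables; whether to notify at all, and what the message says, is still decided in the prompt.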

"What about testing?"

Test outcomes, not procedures:

  • "Given this input, does the agent achieve the right result?"
  • "Does stored feedback have reasonable importance ratings?"
  • "Are notifications sent for truly high-priority items?" </common_challenges>
## Refactoring Checklist

Diagnosis:

- Listed all tools with business logic
- Identified artificial limits on agent capability
- Found prompts that micromanage HOW

Refactoring:

- Extracted primitives from workflow tools
- Moved business logic to system prompt
- Removed artificial limits
- Simplified tool inputs to data, not decisions

Validation:

- Agent achieves same outcomes with primitives
- Behavior can be changed by editing prompts
- New features could be added without new tools