How to refactor existing agent code to follow prompt-native principles. The goal: move behavior from code into prompts, and simplify tools into primitives.

## Diagnosing Non-Prompt-Native Code

Signs your agent isn't prompt-native:

Tools that encode workflows:

```typescript
// RED FLAG: Tool contains business logic
tool("process_feedback", async ({ message }) => {
  const category = categorize(message);        // Logic in code
  const priority = calculatePriority(message); // Logic in code
  await store(message, category, priority);    // Orchestration in code
  if (priority > 3) await notify();            // Decision in code
});
```

Agent calls functions instead of figuring things out:

```typescript
// RED FLAG: Agent is just a function caller
"Use process_feedback to handle incoming messages"
// vs.
"When feedback comes in, decide importance, store it, notify if high"
```

Artificial limits on agent capability:

```typescript
// RED FLAG: Tool prevents agent from doing what users can do
tool("read_file", async ({ path }) => {
  if (!ALLOWED_PATHS.includes(path)) {
    throw new Error("Not allowed to read this file");
  }
  return readFile(path);
});
```

Prompts that specify HOW instead of WHAT:

```markdown
// RED FLAG: Micromanaging the agent
When creating a summary:
1. Use exactly 3 bullet points
2. Each bullet must be under 20 words
3. Format with em-dashes for sub-points
4. Bold the first word of each bullet
```

<refactoring_workflow>

## Step-by-Step Refactoring

Step 1: Identify workflow tools

List all your tools. Mark any that (see the annotated sketch after this list):

- Have business logic (categorize, calculate, decide)
- Orchestrate multiple operations
- Make decisions on behalf of the agent
- Contain conditional logic (if/else based on content)
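
For illustration, here is a hypothetical workflow tool that trips all four markers at once. The tool name, its body, and the classifyTicket/pickAssignee helpers are invented for this sketch; db and discord are the same assumed clients as in the examples below:

```typescript
// Hypothetical workflow tool: every marker from the list above appears here.
tool("triage_ticket", async ({ ticket }) => {
  const category = classifyTicket(ticket);                  // business logic in code
  const owner = pickAssignee(category);                     // decision made for the agent
  await db.tickets.insert({ ...ticket, category, owner });  // orchestration of several operations
  if (category === "outage") {                              // conditional logic based on content
    await discord.send("ops-alerts", `Outage reported: ${ticket.title}`);
  }
  return { category, owner };
});
```

Every commented line is behavior the later steps will move into the prompt.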

Step 2: Extract the primitives

For each workflow tool, identify the underlying primitives (a worked example follows the table):

| Workflow Tool | Hidden Primitives |
| --- | --- |
| process_feedback | store_item, send_message |
| generate_report | read_file, write_file |
| deploy_and_notify | git_push, send_message |
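
Reading a tool's body line by line is usually enough to surface them. Taking the deploy_and_notify row as a worked example (the body shown here is invented, and git and discord are assumed clients):

```typescript
// Hypothetical deploy_and_notify before refactoring:
// each await maps to a primitive from the table, and the if is a decision
// that belongs in the prompt, not in code.
tool("deploy_and_notify", async ({ branch }) => {
  await git.push("origin", branch);                        // hidden primitive: git_push
  if (branch === "main") {                                 // decision for the prompt
    await discord.send("releases", `Deployed ${branch}`);  // hidden primitive: send_message
  }
});
```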

Step 3: Move behavior to the prompt

Take the logic from your workflow tools and express it in natural language:

```typescript
// Before (in code):
async function processFeedback(message) {
  const priority = message.includes("crash") ? 5 :
                   message.includes("bug") ? 4 : 3;
  await store(message, priority);
  if (priority >= 4) await notify();
}
```

```markdown
// After (in prompt):
## Feedback Processing

When someone shares feedback:
1. Rate importance 1-5:
   - 5: Crashes, data loss, security issues
   - 4: Bug reports with clear reproduction steps
   - 3: General suggestions, minor issues
2. Store using store_item
3. If importance >= 4, notify the team

Use your judgment. Context matters more than keywords.
```

Step 4: Simplify tools to primitives

```typescript
// Before: 1 workflow tool
tool("process_feedback", { message, category, priority }, ...complex logic...)

// After: 2 primitive tools
tool("store_item", { key: z.string(), value: z.any() }, ...simple storage...)
tool("send_message", { channel: z.string(), content: z.string() }, ...simple send...)
```

Step 5: Remove artificial limits

```typescript
// Before: Limited capability
tool("read_file", async ({ path }) => {
  if (!isAllowed(path)) throw new Error("Forbidden");
  return readFile(path);
});

// After: Full capability
tool("read_file", async ({ path }) => {
  return readFile(path);  // Agent can read anything
});
// Use approval gates for WRITES, not artificial limits on READS
```
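
What an approval gate on a write might look like, as a sketch only. requestApproval is a hypothetical helper that surfaces the pending change to a human and resolves with their decision:

```typescript
import { writeFileSync } from "node:fs";

// Reads stay unrestricted; writes pause for human sign-off.
tool("write_file", async ({ path, content }) => {
  // requestApproval is hypothetical: it shows the pending change to a human
  // and resolves true or false with their decision.
  const approved = await requestApproval({
    action: "write_file",
    path,
    preview: content.slice(0, 500),
  });
  if (!approved) {
    return { text: `Write to ${path} was not approved` };
  }
  writeFileSync(path, content);
  return { text: `Wrote ${path}` };
});
```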

Step 6: Test with outcomes, not procedures

Instead of testing "does it call the right function?", test "does it achieve the outcome?"

```typescript
// Before: Testing procedure
expect(mockProcessFeedback).toHaveBeenCalledWith(...)

// After: Testing outcome
// Send feedback → Check it was stored with reasonable importance
// Send high-priority feedback → Check notification was sent
```
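
Concretely, an outcome test might look like the following Jest-style sketch, assuming a hypothetical runAgent harness that sends the agent a message and returns once it finishes, plus fake db and discord clients that record what happened:

```typescript
// Outcome test: we don't assert which tools were called or in what order,
// only that the feedback was stored sensibly and the team was notified.
test("crash reports are stored and trigger a notification", async () => {
  await runAgent("The app crashes every time I open settings"); // hypothetical harness

  const stored = await db.feedback.findByMessage("crashes every time"); // hypothetical fake
  expect(stored).toBeDefined();
  expect(stored.importance).toBeGreaterThanOrEqual(4); // a reasonable rating, not an exact value

  expect(discord.sent).toContainEqual(
    expect.objectContaining({ channel: "team-alerts" }) // a notification went out
  );
});
```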

</refactoring_workflow>

<before_after>

## Before/After Examples

Example 1: Feedback Processing

Before:

tool("handle_feedback", async ({ message, author }) => {
  const category = detectCategory(message);
  const priority = calculatePriority(message, category);
  const feedbackId = await db.feedback.insert({
    id: generateId(),
    author,
    message,
    category,
    priority,
    timestamp: new Date().toISOString(),
  });

  if (priority >= 4) {
    await discord.send(ALERT_CHANNEL, `High priority feedback from ${author}`);
  }

  return { feedbackId, category, priority };
});

After:

```typescript
// Simple storage primitive
tool("store_feedback", async ({ item }) => {
  await db.feedback.insert(item);
  return { text: `Stored feedback ${item.id}` };
});

// Simple message primitive
tool("send_message", async ({ channel, content }) => {
  await discord.send(channel, content);
  return { text: "Sent" };
});
```

System prompt:

```markdown
## Feedback Processing

When someone shares feedback:
1. Generate a unique ID
2. Rate importance 1-5 based on impact and urgency
3. Store using store_feedback with the full item
4. If importance >= 4, send a notification to the team channel

Importance guidelines:
- 5: Critical (crashes, data loss, security)
- 4: High (detailed bug reports, blocking issues)
- 3: Medium (suggestions, minor bugs)
- 2: Low (cosmetic, edge cases)
- 1: Minimal (off-topic, duplicates)
```

Example 2: Report Generation

Before:

tool("generate_weekly_report", async ({ startDate, endDate, format }) => {
  const data = await fetchMetrics(startDate, endDate);
  const summary = summarizeMetrics(data);
  const charts = generateCharts(data);

  if (format === "html") {
    return renderHtmlReport(summary, charts);
  } else if (format === "markdown") {
    return renderMarkdownReport(summary, charts);
  } else {
    return renderPdfReport(summary, charts);
  }
});

After:

tool("query_metrics", async ({ start, end }) => {
  const data = await db.metrics.query({ start, end });
  return { text: JSON.stringify(data, null, 2) };
});

tool("write_file", async ({ path, content }) => {
  writeFileSync(path, content);
  return { text: `Wrote ${path}` };
});

System prompt:

```markdown
## Report Generation

When asked to generate a report:
1. Query the relevant metrics using query_metrics
2. Analyze the data and identify key trends
3. Create a clear, well-formatted report
4. Write it using write_file in the appropriate format

Use your judgment about format and structure. Make it useful.
```

</before_after>

<common_challenges>

## Common Refactoring Challenges

"But the agent might make mistakes!"

Yes, and you can iterate. Change the prompt to add guidance:

```markdown
// Before
Rate importance 1-5.

// After (if agent keeps rating too high)
Rate importance 1-5. Be conservative—most feedback is 2-3.
Only use 4-5 for truly blocking or critical issues.
```

"The workflow is complex!"

Complex workflows can still be expressed in prompts. The agent is smart.

```markdown
When processing video feedback:
1. Check if it's a Loom, YouTube, or direct link
2. For YouTube, pass URL directly to video analysis
3. For others, download first, then analyze
4. Extract timestamped issues
5. Rate based on issue density and severity
```

"We need deterministic behavior!"

Some operations should stay in code. That's fine. Prompt-native isn't all-or-nothing; the sketch after the lists below shows how the two halves fit together.

Keep in code:

- Security validation
- Rate limiting
- Audit logging
- Exact format requirements

Move to prompts:

- Categorization decisions
- Priority judgments
- Content generation
- Workflow orchestration
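
As a sketch of how that split can look inside a single primitive (checkRateLimit and auditLog are hypothetical helpers; discord is the same assumed client as in the examples above):

```typescript
import { z } from "zod";

// Deterministic guardrails stay in code; what to send, and when, stays in the prompt.
tool(
  "send_message",
  { channel: z.string(), content: z.string() },
  async ({ channel, content }) => {
    // Security validation: exact and deterministic, never left to judgment
    if (!/^[a-z0-9-]+$/.test(channel)) {
      return { text: `Invalid channel name: ${channel}` };
    }

    // Rate limiting: checkRateLimit is a hypothetical helper enforcing a hard cap
    if (!(await checkRateLimit("send_message"))) {
      return { text: "Rate limit reached; try again shortly" };
    }

    await discord.send(channel, content);

    // Audit logging: auditLog is a hypothetical helper recording every outbound message
    await auditLog({ tool: "send_message", channel, content });
    return { text: "Sent" };
  }
);
```

The tool enforces the non-negotiables; whether to notify at all, and what the message says, is still decided in the prompt.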

"What about testing?"

Test outcomes, not procedures:

  • "Given this input, does the agent achieve the right result?"
  • "Does stored feedback have reasonable importance ratings?"
  • "Are notifications sent for truly high-priority items?" </common_challenges>
## Refactoring Checklist

Diagnosis:

- Listed all tools with business logic
- Identified artificial limits on agent capability
- Found prompts that micromanage HOW

Refactoring:

- Extracted primitives from workflow tools
- Moved business logic to system prompt
- Removed artificial limits
- Simplified tool inputs to data, not decisions

Validation:

- Agent achieves same outcomes with primitives
- Behavior can be changed by editing prompts
- New features could be added without new tools