claude-engineering-plugin/plugins/compound-engineering/skills/agent-native-architecture/references/mcp-tool-design.md
Dan Shipper 1bc6bd9164 [2.18.0] Add Dynamic Capability Discovery and iCloud sync patterns (#62)
2025-12-25 12:03:07 -06:00


This reference explains how to design MCP tools that follow prompt-native principles. Tools should be primitives that enable capabilities, not workflows that encode decisions.

Core principle: Whatever a user can do, the agent should be able to do. Don't artificially limit the agent—give it the same primitives a power user would have.

## Tools Are Primitives, Not Workflows

Wrong approach: Tools that encode business logic

tool("process_feedback", {
  feedback: z.string(),
  category: z.enum(["bug", "feature", "question"]),
  priority: z.enum(["low", "medium", "high"]),
}, async ({ feedback, category, priority }) => {
  // Tool decides how to process
  const processed = categorize(feedback);
  const stored = await saveToDatabase(processed);
  const notification = await notify(priority);
  return { processed, stored, notification };
});

Right approach: Primitives that enable any workflow

tool("store_item", {
  key: z.string(),
  value: z.any(),
}, async ({ key, value }) => {
  await db.set(key, value);
  return { text: `Stored ${key}` };
});

tool("send_message", {
  channel: z.string(),
  content: z.string(),
}, async ({ channel, content }) => {
  await messenger.send(channel, content);
  return { text: "Sent" };
});

The agent decides categorization, priority, and when to notify based on the system prompt.
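To make the division of labor concrete, here is a minimal sketch of the same feedback flow built from the two primitives. The in-memory `db` and `outbox`, the `agentTurn` function, and the key and channel names are hypothetical stand-ins. In a real system the calls inside `agentTurn` would be chosen by the model from the system prompt, not hardcoded.

```typescript
// In-memory stand-ins for the storage and messaging backends (assumptions).
const db: Record<string, unknown> = {};
const outbox: { channel: string; content: string }[] = [];

// Primitive 1: store any value under a key. No categorization logic here.
function storeItem(key: string, value: unknown): { text: string } {
  db[key] = value;
  return { text: `Stored ${key}` };
}

// Primitive 2: send a message to a channel. No priority logic here.
function sendMessage(channel: string, content: string): { text: string } {
  outbox.push({ channel, content });
  return { text: "Sent" };
}

// Simulates the tool calls an agent might choose after reading a system
// prompt such as "notify #bugs about high-priority bug reports".
function agentTurn(feedback: string): string[] {
  const results: string[] = [];
  // These decisions belong to the agent, not to the tools.
  const decision = { category: "bug", priority: "high" };
  results.push(storeItem("feedback/001", { feedback, ...decision }).text);
  if (decision.priority === "high") {
    results.push(sendMessage("#bugs", `High-priority bug: ${feedback}`).text);
  }
  return results;
}

console.log(agentTurn("App crashes on launch"));
```

Changing the categorization scheme or the notification rule then means editing the prompt; neither primitive needs to change.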

## Tools Should Have Descriptive, Primitive Names

Names should describe the capability, not the use case:

| Wrong | Right |
|---|---|
| `process_user_feedback` | `store_item` |
| `create_feedback_summary` | `write_file` |
| `send_notification` | `send_message` |
| `deploy_to_production` | `git_push` |

The prompt tells the agent when to use primitives. The tool just provides capability.

## Inputs Should Be Simple

Tools accept data. They don't accept decisions.

Wrong: Tool accepts decisions

tool("format_content", {
  content: z.string(),
  format: z.enum(["markdown", "html", "json"]),
  style: z.enum(["formal", "casual", "technical"]),
}, ...)

Right: Tool accepts data, agent decides format

tool("write_file", {
  path: z.string(),
  content: z.string(),
}, ...)
// Agent decides to write index.html with HTML content, or data.json with JSON

## Outputs Should Be Rich

Return enough information for the agent to verify and iterate.

Wrong: Minimal output

async ({ key }) => {
  await db.delete(key);
  return { text: "Deleted" };
}

Right: Rich output

async ({ key }) => {
  const existed = await db.has(key);
  if (!existed) {
    return { text: `Key ${key} did not exist` };
  }
  await db.delete(key);
  return { text: `Deleted ${key}. ${await db.count()} items remaining.` };
}

<design_template>

## Tool Design Template

import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

export const serverName = createSdkMcpServer({
  name: "server-name",
  version: "1.0.0",
  tools: [
    // READ operations
    tool(
      "read_item",
      "Read an item by key",
      { key: z.string().describe("Item key") },
      async ({ key }) => {
        const item = await storage.get(key);
        return {
          content: [{
            type: "text",
            text: item ? JSON.stringify(item, null, 2) : `Not found: ${key}`,
          }],
          isError: !item,
        };
      }
    ),

    tool(
      "list_items",
      "List all items, optionally filtered",
      {
        prefix: z.string().optional().describe("Filter by key prefix"),
        limit: z.number().default(100).describe("Max items"),
      },
      async ({ prefix, limit }) => {
        const items = await storage.list({ prefix, limit });
        return {
          content: [{
            type: "text",
            text: `Found ${items.length} items:\n${items.map(i => i.key).join("\n")}`,
          }],
        };
      }
    ),

    // WRITE operations
    tool(
      "store_item",
      "Store an item",
      {
        key: z.string().describe("Item key"),
        value: z.any().describe("Item data"),
      },
      async ({ key, value }) => {
        await storage.set(key, value);
        return {
          content: [{ type: "text", text: `Stored ${key}` }],
        };
      }
    ),

    tool(
      "delete_item",
      "Delete an item",
      { key: z.string().describe("Item key") },
      async ({ key }) => {
        const existed = await storage.delete(key);
        return {
          content: [{
            type: "text",
            text: existed ? `Deleted ${key}` : `${key} did not exist`,
          }],
        };
      }
    ),

    // EXTERNAL operations
    tool(
      "call_api",
      "Make an HTTP request",
      {
        url: z.string().url(),
        method: z.enum(["GET", "POST", "PUT", "DELETE"]).default("GET"),
        body: z.any().optional(),
      },
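      async ({ url, method, body }) => {
        // Only attach a JSON body (and its Content-Type) when one was provided;
        // fetch rejects GET requests that carry a body.
        const hasBody = body !== undefined && method !== "GET";
        const response = await fetch(url, {
          method,
          headers: hasBody ? { "Content-Type": "application/json" } : undefined,
          body: hasBody ? JSON.stringify(body) : undefined,
        });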
        const text = await response.text();
        return {
          content: [{
            type: "text",
            text: `${response.status} ${response.statusText}\n\n${text}`,
          }],
          isError: !response.ok,
        };
      }
    ),
  ],
});

</design_template>

## Example: Feedback Storage Server

This server provides primitives for storing feedback. It does NOT decide how to categorize or organize feedback—that's the agent's job via the prompt.

export const feedbackMcpServer = createSdkMcpServer({
  name: "feedback",
  version: "1.0.0",
  tools: [
    tool(
      "store_feedback",
      "Store a feedback item",
      {
        item: z.object({
          id: z.string(),
          author: z.string(),
          content: z.string(),
          importance: z.number().min(1).max(5),
          timestamp: z.string(),
          status: z.string().optional(),
          urls: z.array(z.string()).optional(),
          metadata: z.any().optional(),
        }).describe("Feedback item"),
      },
      async ({ item }) => {
        await db.feedback.insert(item);
        return {
          content: [{
            type: "text",
            text: `Stored feedback ${item.id} from ${item.author}`,
          }],
        };
      }
    ),

    tool(
      "list_feedback",
      "List feedback items",
      {
        limit: z.number().default(50),
        status: z.string().optional(),
      },
      async ({ limit, status }) => {
        const items = await db.feedback.list({ limit, status });
        return {
          content: [{
            type: "text",
            text: JSON.stringify(items, null, 2),
          }],
        };
      }
    ),

    tool(
      "update_feedback",
      "Update a feedback item",
      {
        id: z.string(),
        updates: z.object({
          status: z.string().optional(),
          importance: z.number().min(1).max(5).optional(), // same 1-5 range as store_feedback
          metadata: z.any().optional(),
        }),
      },
      async ({ id, updates }) => {
        await db.feedback.update(id, updates);
        return {
          content: [{ type: "text", text: `Updated ${id}` }],
        };
      }
    ),
  ],
});

The system prompt then tells the agent how to use these primitives:

## Feedback Processing

When someone shares feedback:
1. Extract author, content, and any URLs
2. Rate importance 1-5 based on actionability
3. Store using feedback.store_feedback
4. If high importance (4-5), notify the channel

Use your judgment about importance ratings.

## Dynamic Capability Discovery vs Static Tool Mapping

This pattern is specifically for agent-native apps where you want the agent to have full access to an external API—the same access a user would have. It follows the core agent-native principle: "Whatever the user can do, the agent can do."

If you're building a constrained agent with limited capabilities, static tool mapping may be intentional. But for agent-native apps integrating with HealthKit, HomeKit, GraphQL, or similar APIs:

Static Tool Mapping (Anti-pattern for Agent-Native): Build an individual tool for each API capability. The tool set falls behind whenever the API changes, and the agent is limited to what you anticipated.

// ❌ Static: Every API type needs a hardcoded tool
tool("read_steps", async ({ startDate, endDate }) => {
  return healthKit.query(HKQuantityType.stepCount, startDate, endDate);
});

tool("read_heart_rate", async ({ startDate, endDate }) => {
  return healthKit.query(HKQuantityType.heartRate, startDate, endDate);
});

tool("read_sleep", async ({ startDate, endDate }) => {
  return healthKit.query(HKCategoryType.sleepAnalysis, startDate, endDate);
});

// When HealthKit adds glucose tracking... you need a code change

Dynamic Capability Discovery (Preferred): Build a meta-tool that discovers what's available, and a generic tool that can access anything.

// ✅ Dynamic: Agent discovers and uses any capability

// Discovery tool - returns what's available at runtime
tool("list_available_capabilities", async () => {
  const quantityTypes = await healthKit.availableQuantityTypes();
  const categoryTypes = await healthKit.availableCategoryTypes();

  return {
    text: `Available health metrics:\n` +
          `Quantity types: ${quantityTypes.join(", ")}\n` +
          `Category types: ${categoryTypes.join(", ")}\n` +
          `\nUse read_health_data with any of these types.`
  };
});

// Generic access tool - type is a string, API validates
tool("read_health_data", {
  dataType: z.string(),  // NOT z.enum - let HealthKit validate
  startDate: z.string(),
  endDate: z.string(),
  aggregation: z.enum(["sum", "average", "samples"]).optional()
}, async ({ dataType, startDate, endDate, aggregation }) => {
  // HealthKit validates the type, returns helpful error if invalid
  const result = await healthKit.query(dataType, startDate, endDate, aggregation);
  return { text: JSON.stringify(result, null, 2) };
});

When to Use Each Approach:

| Dynamic (Agent-Native) | Static (Constrained Agent) |
|---|---|
| Agent should access anything user can | Agent has intentionally limited scope |
| External API with many endpoints (HealthKit, HomeKit, GraphQL) | Internal domain with fixed operations |
| API evolves independently of your code | Tightly coupled domain logic |
| You want full action parity | You want strict guardrails |

The agent-native default is Dynamic. Only use Static when you're intentionally limiting the agent's capabilities.
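The contrast can be sketched without HealthKit at all. In the example below, `apiMetrics`, `apiAvailableTypes`, and `apiQuery` are a hypothetical in-memory stand-in for an external API that knows its own types, and `listCapabilities` and `readData` are illustrative names, not SDK functions. The tool layer passes the type string through and lets the API reject unknown values, so a metric added to the API later needs no tool change.

```typescript
type ToolResult = { text: string; isError?: boolean };

// Hypothetical stand-in for an external API that owns its own type list.
const apiMetrics: Record<string, number[]> = {
  stepCount: [4200, 8100],
  heartRate: [62, 71],
};

function apiAvailableTypes(): string[] {
  return Object.keys(apiMetrics);
}

function apiQuery(dataType: string): number[] {
  if (!Object.prototype.hasOwnProperty.call(apiMetrics, dataType)) {
    throw new Error(`Unknown type: ${dataType}`);
  }
  return apiMetrics[dataType];
}

// Discovery tool: reports whatever the API exposes right now.
function listCapabilities(): ToolResult {
  return { text: `Available types: ${apiAvailableTypes().join(", ")}` };
}

// Generic access tool: dataType is a plain string; the API is the validator.
function readData(dataType: string): ToolResult {
  try {
    return { text: JSON.stringify(apiQuery(dataType)) };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return {
      text: `${reason}. Call listCapabilities to see valid types.`,
      isError: true,
    };
  }
}

// The API gains a metric after the tools shipped; no tool code changes.
apiMetrics["bloodGlucose"] = [5.4];

console.log(listCapabilities().text);
console.log(readData("bloodGlucose").text);
console.log(readData("moodRing").text);
```

The error path matters as much as the happy path: because the failure message points back to the discovery tool, the agent can recover from a wrong guess on its own.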

Complete Dynamic Pattern:

// 1. Discovery tool: What can I access?
// Note: HealthKit's identifier types are structs with static constants and are
// not CaseIterable. `allCases` below assumes an app-defined extension that
// lists the identifiers your app supports.
tool("list_health_types", "Get available health data types") { _ in
    let quantityTypes = HKQuantityTypeIdentifier.allCases.map { $0.rawValue }
    let categoryTypes = HKCategoryTypeIdentifier.allCases.map { $0.rawValue }
    let characteristicTypes = HKCharacteristicTypeIdentifier.allCases.map { $0.rawValue }

    return ToolResult(text: """
        Available HealthKit types:

        ## Quantity Types (numeric values)
        \(quantityTypes.joined(separator: ", "))

        ## Category Types (categorical data)
        \(categoryTypes.joined(separator: ", "))

        ## Characteristic Types (user info)
        \(characteristicTypes.joined(separator: ", "))

        Use read_health_data or write_health_data with any of these.
        """)
}

// 2. Generic read: Access any type by name
tool("read_health_data", "Read any health metric", {
    dataType: z.string().describe("Type name from list_health_types"),
    startDate: z.string(),
    endDate: z.string()
}) { request in
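    // Let HealthKit validate the type name: these lookups return nil for unknown identifiers
    let resolved: HKSampleType? =
        HKObjectType.quantityType(forIdentifier: HKQuantityTypeIdentifier(rawValue: request.dataType))
        ?? HKObjectType.categoryType(forIdentifier: HKCategoryTypeIdentifier(rawValue: request.dataType))
    guard let type = resolved else {
        return ToolResult(
            text: "Unknown type: \(request.dataType). Use list_health_types to see available types.",
            isError: true
        )
    }

    // `healthStore.querySamples` is assumed to be an app-level wrapper, not a HealthKit API
    let samples = try await healthStore.querySamples(type: type, start: request.startDate, end: request.endDate)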
    return ToolResult(text: samples.formatted())
}

// 3. Context injection: Tell agent what's available in system prompt
func buildSystemPrompt() -> String {
    let availableTypes = healthService.getAuthorizedTypes()

    return """
    ## Available Health Data

    You have access to these health metrics:
    \(availableTypes.map { "- \($0)" }.joined(separator: "\n"))

    Use read_health_data with any type above. For new types not listed,
    use list_health_types to discover what's available.
    """
}

Benefits:

  • Agent can use any API capability, including ones added after your code shipped
  • API is the validator, not your enum definition
  • Smaller tool surface (2-3 tools vs N tools)
  • Agent naturally discovers capabilities by asking
  • Works with any API that has introspection (HealthKit, GraphQL, OpenAPI)

## CRUD Completeness

Every data type the agent can create, it should be able to read, update, and delete. Incomplete CRUD = broken action parity.

Anti-pattern: Create-only tools

// ❌ Can create but not modify or delete
tool("create_experiment", { hypothesis, variable, metric })
tool("write_journal_entry", { content, author, tags })
// User: "Delete that experiment" → Agent: "I can't do that"

Correct: Full CRUD for each entity

// ✅ Complete CRUD
tool("create_experiment", { hypothesis, variable, metric })
tool("read_experiment", { id })
tool("update_experiment", { id, updates: { hypothesis?, status?, endDate? } })
tool("delete_experiment", { id })

tool("create_journal_entry", { content, author, tags })
tool("read_journal", { query?, dateRange?, author? })
tool("update_journal_entry", { id, content, tags? })
tool("delete_journal_entry", { id })

The CRUD Audit: For each entity type in your app, verify:

  • Create: Agent can create new instances
  • Read: Agent can query/search/list instances
  • Update: Agent can modify existing instances
  • Delete: Agent can remove instances

If any operation is missing, users will eventually ask for it and the agent will fail.
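The audit itself is easy to automate. The sketch below assumes a hypothetical naming convention of `<op>_<entity>` for tool names; `auditCrud` and `registeredTools` are illustrative, and tools that deviate from the convention (such as `read_journal` above) would need an explicit mapping instead.

```typescript
type CrudOp = "create" | "read" | "update" | "delete";
const CRUD_OPS: CrudOp[] = ["create", "read", "update", "delete"];

// Returns, per entity, the operations that have no matching tool.
// Assumes tool names follow the hypothetical `<op>_<entity>` convention.
function auditCrud(
  entities: string[],
  toolNames: string[],
): Record<string, CrudOp[]> {
  const gaps: Record<string, CrudOp[]> = {};
  for (const entity of entities) {
    const missing = CRUD_OPS.filter(
      (op) => toolNames.indexOf(`${op}_${entity}`) === -1,
    );
    if (missing.length > 0) {
      gaps[entity] = missing;
    }
  }
  return gaps;
}

// Example registry: experiments are complete, journal entries are not.
const registeredTools = [
  "create_experiment",
  "read_experiment",
  "update_experiment",
  "delete_experiment",
  "create_journal_entry",
  "read_journal_entry",
];

// Reports that journal_entry is missing its update and delete tools.
console.log(auditCrud(["experiment", "journal_entry"], registeredTools));
```

Running a check like this in CI would flag a create-only entity before a user ever asks the agent to delete something.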

## MCP Tool Design Checklist

Fundamentals:

  • Tool names describe capability, not use case
  • Inputs are data, not decisions
  • Outputs are rich (enough for agent to verify)
  • CRUD operations are separate tools (not one mega-tool)
  • No business logic in tool implementations
  • Error states clearly communicated via isError
  • Descriptions explain what the tool does, not when to use it

Dynamic Capability Discovery (for agent-native apps):

  • For external APIs where agent should have full access, use dynamic discovery
  • Include a list_* or discover_* tool for each API surface
  • Use string inputs (not enums) when the API validates
  • Inject available capabilities into system prompt at runtime
  • Only use static tool mapping if intentionally limiting agent scope

CRUD Completeness:

  • Every entity has create, read, update, delete operations
  • Every UI action has a corresponding agent tool
  • Test: "Can the agent undo what it just did?"
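The undo question can be turned into a round-trip test: create an entity, delete it, and confirm the store is back where it started. The sketch below uses a hypothetical in-memory `notes` store with illustrative `createNote` and `deleteNote` functions standing in for real tool handlers.

```typescript
type ToolResult = { text: string; isError?: boolean };

// Hypothetical in-memory store standing in for the app's real data layer.
const notes: Record<string, string> = {};

function hasNote(id: string): boolean {
  return Object.prototype.hasOwnProperty.call(notes, id);
}

function createNote(id: string, body: string): ToolResult {
  if (hasNote(id)) {
    return { text: `Note ${id} already exists`, isError: true };
  }
  notes[id] = body;
  return { text: `Created note ${id}. ${Object.keys(notes).length} notes total.` };
}

function deleteNote(id: string): ToolResult {
  if (!hasNote(id)) {
    return { text: `Note ${id} did not exist`, isError: true };
  }
  delete notes[id];
  return { text: `Deleted note ${id}. ${Object.keys(notes).length} notes remaining.` };
}

// Round-trip check: after create + delete, the store matches its starting state.
function canUndoCreate(id: string): boolean {
  const before = JSON.stringify(notes);
  const created = createNote(id, "draft");
  const deleted = deleteNote(id);
  return !created.isError && !deleted.isError && JSON.stringify(notes) === before;
}

console.log(canUndoCreate("n1"));
```

The same shape extends to update: record the old value, apply the change, restore it, and compare.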