* [2.17.0] Expand agent-native skill with mobile app learnings Major expansion of agent-native-architecture skill based on real-world learnings from building the Every Reader iOS app. New reference documents: - dynamic-context-injection.md: Runtime app state in system prompts - action-parity-discipline.md: Ensuring agents can do what users can - shared-workspace-architecture.md: Agents and users in same data space - agent-native-testing.md: Testing patterns for agent-native apps - mobile-patterns.md: Background execution, permissions, cost awareness Updated references: - architecture-patterns.md: Added Unified Agent Architecture, Agent-to-UI Communication, and Model Tier Selection patterns Enhanced agent-native-reviewer with comprehensive review process covering all new patterns, including mobile-specific verification. Key insight: "The agent should be able to do anything the user can do, through tools that mirror UI capabilities, with full context about the app state." 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * [2.18.0] Add Dynamic Capability Discovery and iCloud sync patterns New patterns in agent-native-architecture skill: - **Dynamic Capability Discovery** - For agent-native apps integrating with external APIs (HealthKit, HomeKit, GraphQL), use a discovery tool (list_*) plus a generic access tool instead of individual tools per endpoint. (Note: Static mapping is fine for constrained agents with limited scope.) - **CRUD Completeness** - Every entity needs create, read, update, AND delete. - **iCloud File Storage** - Use iCloud Documents for shared workspace to get free, automatic multi-device sync without building a sync layer. - **Architecture Review Checklist** - Pushes reviewer findings earlier into design phase. Covers tool design, action parity, UI integration, context. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>
507 lines
15 KiB
Markdown
<overview>
How to design MCP tools following prompt-native principles. Tools should be primitives that enable capability, not workflows that encode decisions.

**Core principle:** Whatever a user can do, the agent should be able to do. Don't artificially limit the agent; give it the same primitives a power user would have.
</overview>

<principle name="primitives-not-workflows">
## Tools Are Primitives, Not Workflows

**Wrong approach:** Tools that encode business logic
```typescript
tool("process_feedback", {
  feedback: z.string(),
  category: z.enum(["bug", "feature", "question"]),
  priority: z.enum(["low", "medium", "high"]),
}, async ({ feedback, category, priority }) => {
  // Tool decides how to process
  const processed = categorize(feedback);
  const stored = await saveToDatabase(processed);
  const notification = await notify(priority);
  return { processed, stored, notification };
});
```

**Right approach:** Primitives that enable any workflow
```typescript
tool("store_item", {
  key: z.string(),
  value: z.any(),
}, async ({ key, value }) => {
  await db.set(key, value);
  return { text: `Stored ${key}` };
});

tool("send_message", {
  channel: z.string(),
  content: z.string(),
}, async ({ channel, content }) => {
  await messenger.send(channel, content);
  return { text: "Sent" };
});
```

The agent decides categorization, priority, and when to notify based on the system prompt.
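
As a rough sketch of what this buys you, the two primitives can be backed by in-memory stand-ins and driven by two different call sequences an agent might choose. The `db` map, the `outbox` array, and the plain functions `storeItem` and `sendMessage` are hypothetical stand-ins for the `db`, `messenger`, and `tool(...)` wiring above, not the SDK's API.

```typescript
// Hypothetical in-memory stand-ins for the `db` and `messenger` used above.
const db = new Map<string, unknown>();
const outbox: { channel: string; content: string }[] = [];

// Primitive 1: store any value under a key. No categorization, no priority logic.
function storeItem(key: string, value: unknown): { text: string } {
  db.set(key, value);
  return { text: `Stored ${key}` };
}

// Primitive 2: send a message to a channel. It does not decide when to notify.
function sendMessage(channel: string, content: string): { text: string } {
  outbox.push({ channel, content });
  return { text: "Sent" };
}

// Workflow A: what an agent might do when the prompt says "triage feedback".
// Category and priority are the agent's judgment, passed in as plain data.
storeItem("feedback/1", { text: "App crashes on launch", category: "bug", priority: "high" });
sendMessage("#bugs", "High-priority bug stored as feedback/1");

// Workflow B: the same primitives reused for a weekly digest after a prompt change.
storeItem("digest/week-1", { items: ["feedback/1"] });
sendMessage("#team", "Weekly digest stored as digest/week-1");
```

Neither workflow required a new tool; only the prompt (simulated here by the call sequence) changed.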
</principle>

<principle name="descriptive-names">
## Tools Should Have Descriptive, Primitive Names

Names should describe the capability, not the use case:

| Wrong | Right |
|-------|-------|
| `process_user_feedback` | `store_item` |
| `create_feedback_summary` | `write_file` |
| `send_notification` | `send_message` |
| `deploy_to_production` | `git_push` |

The prompt tells the agent *when* to use primitives. The tool just provides *capability*.
</principle>

<principle name="simple-inputs">
## Inputs Should Be Simple

Tools accept data. They don't accept decisions.

**Wrong:** Tool accepts decisions
```typescript
tool("format_content", {
  content: z.string(),
  format: z.enum(["markdown", "html", "json"]),
  style: z.enum(["formal", "casual", "technical"]),
}, ...)
```

**Right:** Tool accepts data, agent decides format
```typescript
tool("write_file", {
  path: z.string(),
  content: z.string(),
}, ...)
// Agent decides to write index.html with HTML content, or data.json with JSON
```
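
A small runnable sketch of the "Right" version, assuming an in-memory map as a stand-in for the file system. The `files` map and the `writeFile` function are hypothetical names for illustration only.

```typescript
// Hypothetical in-memory file system standing in for real disk access.
const files = new Map<string, string>();

// The tool takes data only: a path and the content to write.
// There is no `format` or `style` parameter for the tool to branch on.
function writeFile(path: string, content: string): { text: string } {
  files.set(path, content);
  return { text: `Wrote ${content.length} characters to ${path}` };
}

// The agent expresses its format decision through the data it passes.
const htmlResult = writeFile("index.html", "<h1>Feedback summary</h1>");
const jsonResult = writeFile("data.json", JSON.stringify({ items: 3 }));
```

The same tool serves both calls because the choice of format lives in the arguments, not in the tool's schema.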
</principle>

<principle name="rich-outputs">
## Outputs Should Be Rich

Return enough information for the agent to verify and iterate.

**Wrong:** Minimal output
```typescript
async ({ key }) => {
  await db.delete(key);
  return { text: "Deleted" };
}
```

**Right:** Rich output
```typescript
async ({ key }) => {
  const existed = await db.has(key);
  if (!existed) {
    return { text: `Key ${key} did not exist` };
  }
  await db.delete(key);
  return { text: `Deleted ${key}. ${await db.count()} items remaining.` };
}
```
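
The shortened handlers above return `{ text }`, while the design template further down returns a `content` array with an optional `isError` flag. One way to keep handlers short while still producing that shape is a small helper. This is only a sketch: `textResult`, `ToolTextResult`, and the in-memory `items` map (standing in for `db`) are hypothetical names.

```typescript
// Result shape used by the design template: text blocks plus an error flag.
interface ToolTextResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Hypothetical helper: wrap a message in the result shape.
function textResult(text: string, isError = false): ToolTextResult {
  return { content: [{ type: "text", text }], isError };
}

// Hypothetical in-memory store standing in for `db`.
const items = new Map<string, unknown>([["a", 1], ["b", 2]]);

// Rich delete: reports whether the key existed and how many items remain.
function deleteItem(key: string): ToolTextResult {
  if (!items.has(key)) {
    return textResult(`Key ${key} did not exist`);
  }
  items.delete(key);
  return textResult(`Deleted ${key}. ${items.size} items remaining.`);
}

// Rich read: a missing key is flagged as an error so the agent can react.
function readItem(key: string): ToolTextResult {
  const item = items.get(key);
  return item === undefined
    ? textResult(`Not found: ${key}`, true)
    : textResult(JSON.stringify(item));
}
```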
</principle>

<design_template>
## Tool Design Template

```typescript
import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

export const serverName = createSdkMcpServer({
  name: "server-name",
  version: "1.0.0",
  tools: [
    // READ operations
    tool(
      "read_item",
      "Read an item by key",
      { key: z.string().describe("Item key") },
      async ({ key }) => {
        const item = await storage.get(key);
        return {
          content: [{
            type: "text",
            text: item ? JSON.stringify(item, null, 2) : `Not found: ${key}`,
          }],
          isError: !item,
        };
      }
    ),

    tool(
      "list_items",
      "List all items, optionally filtered",
      {
        prefix: z.string().optional().describe("Filter by key prefix"),
        limit: z.number().default(100).describe("Max items"),
      },
      async ({ prefix, limit }) => {
        const items = await storage.list({ prefix, limit });
        return {
          content: [{
            type: "text",
            text: `Found ${items.length} items:\n${items.map(i => i.key).join("\n")}`,
          }],
        };
      }
    ),

    // WRITE operations
    tool(
      "store_item",
      "Store an item",
      {
        key: z.string().describe("Item key"),
        value: z.any().describe("Item data"),
      },
      async ({ key, value }) => {
        await storage.set(key, value);
        return {
          content: [{ type: "text", text: `Stored ${key}` }],
        };
      }
    ),

    tool(
      "delete_item",
      "Delete an item",
      { key: z.string().describe("Item key") },
      async ({ key }) => {
        const existed = await storage.delete(key);
        return {
          content: [{
            type: "text",
            text: existed ? `Deleted ${key}` : `${key} did not exist`,
          }],
        };
      }
    ),

    // EXTERNAL operations
    tool(
      "call_api",
      "Make an HTTP request",
      {
        url: z.string().url(),
        method: z.enum(["GET", "POST", "PUT", "DELETE"]).default("GET"),
        body: z.any().optional(),
      },
      async ({ url, method, body }) => {
        const response = await fetch(url, { method, body: JSON.stringify(body) });
        const text = await response.text();
        return {
          content: [{
            type: "text",
            text: `${response.status} ${response.statusText}\n\n${text}`,
          }],
          isError: !response.ok,
        };
      }
    ),
  ],
});
```
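
One detail of the `call_api` handler may deserve a sketch: `fetch` rejects a `GET` request that carries a body, and a JSON body normally wants a `Content-Type` header. The helper below is a hypothetical variation, not part of the template; `buildRequestInit`, `HttpMethod`, and `RequestOptions` are assumed names.

```typescript
type HttpMethod = "GET" | "POST" | "PUT" | "DELETE";

// Subset of the options object passed as the second argument to fetch().
interface RequestOptions {
  method: HttpMethod;
  headers?: Record<string, string>;
  body?: string;
}

// Hypothetical helper: attach a JSON body only when one was given
// and the method is allowed to carry it.
function buildRequestInit(method: HttpMethod, body?: unknown): RequestOptions {
  if (method === "GET" || body === undefined) {
    return { method };
  }
  return {
    method,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Inside the handler this would be used as:
//   const response = await fetch(url, buildRequestInit(method, body));
const getInit = buildRequestInit("GET", { ignored: true });
const postInit = buildRequestInit("POST", { title: "hello" });
```

Silently dropping the body on `GET` is a design choice made for brevity here. Returning an `isError` result that explains the problem may serve the agent better, in keeping with the rich-output principle.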
</design_template>

<example name="feedback-server">
## Example: Feedback Storage Server

This server provides primitives for storing feedback. It does NOT decide how to categorize or organize feedback; that's the agent's job via the prompt.

```typescript
export const feedbackMcpServer = createSdkMcpServer({
  name: "feedback",
  version: "1.0.0",
  tools: [
    tool(
      "store_feedback",
      "Store a feedback item",
      {
        item: z.object({
          id: z.string(),
          author: z.string(),
          content: z.string(),
          importance: z.number().min(1).max(5),
          timestamp: z.string(),
          status: z.string().optional(),
          urls: z.array(z.string()).optional(),
          metadata: z.any().optional(),
        }).describe("Feedback item"),
      },
      async ({ item }) => {
        await db.feedback.insert(item);
        return {
          content: [{
            type: "text",
            text: `Stored feedback ${item.id} from ${item.author}`,
          }],
        };
      }
    ),

    tool(
      "list_feedback",
      "List feedback items",
      {
        limit: z.number().default(50),
        status: z.string().optional(),
      },
      async ({ limit, status }) => {
        const items = await db.feedback.list({ limit, status });
        return {
          content: [{
            type: "text",
            text: JSON.stringify(items, null, 2),
          }],
        };
      }
    ),

    tool(
      "update_feedback",
      "Update a feedback item",
      {
        id: z.string(),
        updates: z.object({
          status: z.string().optional(),
          importance: z.number().optional(),
          metadata: z.any().optional(),
        }),
      },
      async ({ id, updates }) => {
        await db.feedback.update(id, updates);
        return {
          content: [{ type: "text", text: `Updated ${id}` }],
        };
      }
    ),
  ],
});
```

The system prompt then tells the agent *how* to use these primitives:

```markdown
## Feedback Processing

When someone shares feedback:
1. Extract author, content, and any URLs
2. Rate importance 1-5 based on actionability
3. Store using feedback.store_feedback
4. If high importance (4-5), notify the channel

Use your judgment about importance ratings.
```
</example>

<principle name="dynamic-capability-discovery">
## Dynamic Capability Discovery vs Static Tool Mapping

**This pattern is specifically for agent-native apps** where you want the agent to have full access to an external API, with the same access a user would have. It follows the core agent-native principle: "Whatever the user can do, the agent can do."

If you're building a constrained agent with limited capabilities, static tool mapping may be intentional. But for agent-native apps integrating with HealthKit, HomeKit, GraphQL, or similar APIs:

**Static Tool Mapping (Anti-pattern for Agent-Native):**
Build an individual tool for each API capability. The tool set is always out of date and limits the agent to what you anticipated.

```typescript
// ❌ Static: Every API type needs a hardcoded tool
tool("read_steps", async ({ startDate, endDate }) => {
  return healthKit.query(HKQuantityType.stepCount, startDate, endDate);
});

tool("read_heart_rate", async ({ startDate, endDate }) => {
  return healthKit.query(HKQuantityType.heartRate, startDate, endDate);
});

tool("read_sleep", async ({ startDate, endDate }) => {
  return healthKit.query(HKCategoryType.sleepAnalysis, startDate, endDate);
});

// When HealthKit adds glucose tracking... you need a code change
```

**Dynamic Capability Discovery (Preferred):**
Build a meta-tool that discovers what's available, and a generic tool that can access anything.

```typescript
// ✅ Dynamic: Agent discovers and uses any capability

// Discovery tool - returns what's available at runtime
tool("list_available_capabilities", async () => {
  const quantityTypes = await healthKit.availableQuantityTypes();
  const categoryTypes = await healthKit.availableCategoryTypes();

  return {
    text: `Available health metrics:\n` +
      `Quantity types: ${quantityTypes.join(", ")}\n` +
      `Category types: ${categoryTypes.join(", ")}\n` +
      `\nUse read_health_data with any of these types.`
  };
});

// Generic access tool - type is a string, API validates
tool("read_health_data", {
  dataType: z.string(), // NOT z.enum - let HealthKit validate
  startDate: z.string(),
  endDate: z.string(),
  aggregation: z.enum(["sum", "average", "samples"]).optional()
}, async ({ dataType, startDate, endDate, aggregation }) => {
  // HealthKit validates the type, returns helpful error if invalid
  const result = await healthKit.query(dataType, startDate, endDate, aggregation);
  return { text: JSON.stringify(result, null, 2) };
});
```

**When to Use Each Approach:**

| Dynamic (Agent-Native) | Static (Constrained Agent) |
|------------------------|---------------------------|
| Agent should access anything user can | Agent has intentionally limited scope |
| External API with many endpoints (HealthKit, HomeKit, GraphQL) | Internal domain with fixed operations |
| API evolves independently of your code | Tightly coupled domain logic |
| You want full action parity | You want strict guardrails |

**The agent-native default is Dynamic.** Only use Static when you're intentionally limiting the agent's capabilities.

**Complete Dynamic Pattern:**

```swift
// NOTE: Illustrative pseudocode. `tool`, `ToolResult`, and the `z` schema helpers are
// assumed wrappers, and HealthKit identifier types do not provide `allCases`, so a real
// app would maintain its own list of identifiers.

// 1. Discovery tool: What can I access?
tool("list_health_types", "Get available health data types") { _ in
    let store = HKHealthStore()

    let quantityTypes = HKQuantityTypeIdentifier.allCases.map { $0.rawValue }
    let categoryTypes = HKCategoryTypeIdentifier.allCases.map { $0.rawValue }
    let characteristicTypes = HKCharacteristicTypeIdentifier.allCases.map { $0.rawValue }

    return ToolResult(text: """
        Available HealthKit types:

        ## Quantity Types (numeric values)
        \(quantityTypes.joined(separator: ", "))

        ## Category Types (categorical data)
        \(categoryTypes.joined(separator: ", "))

        ## Characteristic Types (user info)
        \(characteristicTypes.joined(separator: ", "))

        Use read_health_data or write_health_data with any of these.
        """)
}

// 2. Generic read: Access any type by name
tool("read_health_data", "Read any health metric", {
    dataType: z.string().describe("Type name from list_health_types"),
    startDate: z.string(),
    endDate: z.string()
}) { request in
    // Let HealthKit validate the type name
    guard let type = HKQuantityTypeIdentifier(rawValue: request.dataType)
                  ?? HKCategoryTypeIdentifier(rawValue: request.dataType) else {
        return ToolResult(
            text: "Unknown type: \(request.dataType). Use list_health_types to see available types.",
            isError: true
        )
    }

    // Dates come from the request, like dataType above
    let samples = try await healthStore.querySamples(type: type, start: request.startDate, end: request.endDate)
    return ToolResult(text: samples.formatted())
}

// 3. Context injection: Tell agent what's available in system prompt
func buildSystemPrompt() -> String {
    let availableTypes = healthService.getAuthorizedTypes()

    return """
    ## Available Health Data

    You have access to these health metrics:
    \(availableTypes.map { "- \($0)" }.joined(separator: "\n"))

    Use read_health_data with any type above. For new types not listed,
    use list_health_types to discover what's available.
    """
}
```

**Benefits:**
- Agent can use any API capability, including ones added after your code shipped
- API is the validator, not your enum definition
- Smaller tool surface (2-3 tools vs N tools)
- Agent naturally discovers capabilities by asking
- Works with any API that has introspection (HealthKit, GraphQL, OpenAPI)
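
To see the discover-then-access loop end to end without HealthKit, here is a minimal sketch against a hypothetical in-memory API. `FakeHealthApi`, its type names, and the sample values are invented for illustration; only the pattern (a `list_*` tool, a generic read tool that takes a string type, and prompt injection) comes from this section.

```typescript
// Hypothetical stand-in for an external API that knows its own data types.
class FakeHealthApi {
  private data = new Map<string, number[]>([
    ["stepCount", [4200, 8100]],
    ["heartRate", [62, 71]],
  ]);

  availableTypes(): string[] {
    return Array.from(this.data.keys());
  }

  // The API is the validator: it throws on unknown types.
  query(dataType: string): number[] {
    const samples = this.data.get(dataType);
    if (!samples) throw new Error(`Unknown data type: ${dataType}`);
    return samples;
  }

  // Simulates the API gaining a capability after the agent code shipped.
  addType(name: string, samples: number[]): void {
    this.data.set(name, samples);
  }
}

interface ToolResult { text: string; isError?: boolean }

const api = new FakeHealthApi();

// Discovery tool: reports whatever the API exposes right now.
function listAvailableCapabilities(): ToolResult {
  return { text: `Available types: ${api.availableTypes().join(", ")}` };
}

// Generic access tool: dataType is a plain string, not an enum.
function readHealthData(dataType: string): ToolResult {
  try {
    return { text: JSON.stringify(api.query(dataType)) };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return {
      text: `${reason}. Use list_available_capabilities to see valid types.`,
      isError: true,
    };
  }
}

// Context injection: list the current types in the system prompt.
function buildSystemPrompt(): string {
  const lines = api.availableTypes().map((t) => `- ${t}`);
  return `## Available Health Data\n${lines.join("\n")}`;
}

// A new type appears at runtime; no tool definitions change.
api.addType("bloodGlucose", [5.4]);
const glucose = readHealthData("bloodGlucose");
const missing = readHealthData("moodScore");
```

The error path matters as much as the success path: because the read tool forwards the API's complaint and points at the discovery tool, the agent can recover on its own.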
</principle>

<principle name="crud-completeness">
## CRUD Completeness

Every data type the agent can create, it should be able to read, update, and delete. Incomplete CRUD = broken action parity.

**Anti-pattern: Create-only tools**
```typescript
// ❌ Can create but not modify or delete
tool("create_experiment", { hypothesis, variable, metric })
tool("write_journal_entry", { content, author, tags })
// User: "Delete that experiment" → Agent: "I can't do that"
```

**Correct: Full CRUD for each entity**
```typescript
// ✅ Complete CRUD
tool("create_experiment", { hypothesis, variable, metric })
tool("read_experiment", { id })
tool("update_experiment", { id, updates: { hypothesis?, status?, endDate? } })
tool("delete_experiment", { id })

tool("create_journal_entry", { content, author, tags })
tool("read_journal", { query?, dateRange?, author? })
tool("update_journal_entry", { id, content, tags? })
tool("delete_journal_entry", { id })
```

**The CRUD Audit:**
For each entity type in your app, verify:
- [ ] Create: Agent can create new instances
- [ ] Read: Agent can query/search/list instances
- [ ] Update: Agent can modify existing instances
- [ ] Delete: Agent can remove instances

If any operation is missing, users will eventually ask for it and the agent will fail.
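
The audit can be partly automated. The sketch below assumes tools follow a `<verb>_<entity>` naming convention with the verbs `create`, `read`, `update`, and `delete`; names such as `read_journal` or `write_journal_entry` above would need a mapping step first. `auditCrud` is a hypothetical helper, not an existing utility.

```typescript
const CRUD_VERBS = ["create", "read", "update", "delete"] as const;
type CrudVerb = (typeof CRUD_VERBS)[number];

// Returns, for each entity, the CRUD verbs that have no matching tool.
// Assumes tool names look like "<verb>_<entity>"; other names are ignored.
function auditCrud(toolNames: string[]): Record<string, CrudVerb[]> {
  const seen = new Map<string, Set<CrudVerb>>();
  for (const toolName of toolNames) {
    const verb = CRUD_VERBS.find((v) => toolName.startsWith(`${v}_`));
    if (!verb) continue;
    const entity = toolName.slice(verb.length + 1);
    const verbs = seen.get(entity) ?? new Set<CrudVerb>();
    verbs.add(verb);
    seen.set(entity, verbs);
  }

  const missing: Record<string, CrudVerb[]> = {};
  seen.forEach((verbs, entity) => {
    const gaps = CRUD_VERBS.filter((v) => !verbs.has(v));
    if (gaps.length > 0) missing[entity] = gaps;
  });
  return missing;
}

// Example: experiments are complete, journal entries lack update and delete.
const crudGaps = auditCrud([
  "create_experiment",
  "read_experiment",
  "update_experiment",
  "delete_experiment",
  "create_journal_entry",
  "read_journal_entry",
]);
```

A check like this could run in a unit test over the registered tool names, so a missing operation shows up before a user asks for it.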
</principle>

<checklist>
## MCP Tool Design Checklist

**Fundamentals:**
- [ ] Tool names describe capability, not use case
- [ ] Inputs are data, not decisions
- [ ] Outputs are rich (enough for agent to verify)
- [ ] CRUD operations are separate tools (not one mega-tool)
- [ ] No business logic in tool implementations
- [ ] Error states clearly communicated via `isError`
- [ ] Descriptions explain what the tool does, not when to use it

**Dynamic Capability Discovery (for agent-native apps):**
- [ ] For external APIs where agent should have full access, use dynamic discovery
- [ ] Include a `list_*` or `discover_*` tool for each API surface
- [ ] Use string inputs (not enums) when the API validates
- [ ] Inject available capabilities into system prompt at runtime
- [ ] Only use static tool mapping if intentionally limiting agent scope

**CRUD Completeness:**
- [ ] Every entity has create, read, update, delete operations
- [ ] Every UI action has a corresponding agent tool
- [ ] Test: "Can the agent undo what it just did?"
</checklist>