refactor(cli)!: rename all skills and agents to consistent ce- prefix (#503)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 15:44:22 -07:00
parent 49249d7317
commit 5c0ec9137a
233 changed files with 3199 additions and 936 deletions
--- a/plugins/compound-engineering/skills/ce-agent-native-architecture/references/mcp-tool-design.md
+++ b/plugins/compound-engineering/skills/ce-agent-native-architecture/references/mcp-tool-design.md
@@ -0,0 +1,506 @@
+<overview>
+How to design MCP tools following prompt-native principles. Tools should be primitives that enable capability, not workflows that encode decisions.
+
+**Core principle:** Whatever a user can do, the agent should be able to do. Don't artificially limit the agent—give it the same primitives a power user would have.
+</overview>
+
+<principle name="primitives-not-workflows">
+## Tools Are Primitives, Not Workflows
+
+**Wrong approach:** Tools that encode business logic
+```typescript
+tool("process_feedback", {
+  feedback: z.string(),
+  category: z.enum(["bug", "feature", "question"]),
+  priority: z.enum(["low", "medium", "high"]),
+}, async ({ feedback, category, priority }) => {
+  // Tool decides how to process
+  const processed = categorize(feedback);
+  const stored = await saveToDatabase(processed);
+  const notification = await notify(priority);
+  return { processed, stored, notification };
+});
+```
+
+**Right approach:** Primitives that enable any workflow
+```typescript
+tool("store_item", {
+  key: z.string(),
+  value: z.any(),
+}, async ({ key, value }) => {
+  await db.set(key, value);
+  return { text: `Stored ${key}` };
+});
+
+tool("send_message", {
+  channel: z.string(),
+  content: z.string(),
+}, async ({ channel, content }) => {
+  await messenger.send(channel, content);
+  return { text: "Sent" };
+});
+```
+
+The agent decides categorization, priority, and when to notify based on the system prompt.
+</principle>
+
+<principle name="descriptive-names">
+## Tools Should Have Descriptive, Primitive Names
+
+Names should describe the capability, not the use case:
+
+| Wrong | Right |
+|-------|-------|
+| `process_user_feedback` | `store_item` |
+| `create_feedback_summary` | `write_file` |
+| `send_notification` | `send_message` |
+| `deploy_to_production` | `git_push` |
+
+The prompt tells the agent *when* to use primitives. The tool just provides *capability*.
+</principle>
+
+<principle name="simple-inputs">
+## Inputs Should Be Simple
+
+Tools accept data. They don't accept decisions.
+
+**Wrong:** Tool accepts decisions
+```typescript
+tool("format_content", {
+  content: z.string(),
+  format: z.enum(["markdown", "html", "json"]),
+  style: z.enum(["formal", "casual", "technical"]),
+}, ...)
+```
+
+**Right:** Tool accepts data, agent decides format
+```typescript
+tool("write_file", {
+  path: z.string(),
+  content: z.string(),
+}, ...)
+// Agent decides to write index.html with HTML content, or data.json with JSON
+```
+</principle>
+
+<principle name="rich-outputs">
+## Outputs Should Be Rich
+
+Return enough information for the agent to verify and iterate.
+
+**Wrong:** Minimal output
+```typescript
+async ({ key }) => {
+  await db.delete(key);
+  return { text: "Deleted" };
+}
+```
+
+**Right:** Rich output
+```typescript
+async ({ key }) => {
+  const existed = await db.has(key);
+  if (!existed) {
+    return { text: `Key ${key} did not exist` };
+  }
+  await db.delete(key);
+  return { text: `Deleted ${key}. ${await db.count()} items remaining.` };
+}
+```
+</principle>
+
+<design_template>
+## Tool Design Template
+
+```typescript
+import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
+import { z } from "zod";
+
+export const serverName = createSdkMcpServer({
+  name: "server-name",
+  version: "1.0.0",
+  tools: [
+    // READ operations
+    tool(
+      "read_item",
+      "Read an item by key",
+      { key: z.string().describe("Item key") },
+      async ({ key }) => {
+        const item = await storage.get(key);
+        return {
+          content: [{
+            type: "text",
+            text: item ? JSON.stringify(item, null, 2) : `Not found: ${key}`,
+          }],
+          isError: !item,
+        };
+      }
+    ),
+
+    tool(
+      "list_items",
+      "List all items, optionally filtered",
+      {
+        prefix: z.string().optional().describe("Filter by key prefix"),
+        limit: z.number().default(100).describe("Max items"),
+      },
+      async ({ prefix, limit }) => {
+        const items = await storage.list({ prefix, limit });
+        return {
+          content: [{
+            type: "text",
+            text: `Found ${items.length} items:\n${items.map(i => i.key).join("\n")}`,
+          }],
+        };
+      }
+    ),
+
+    // WRITE operations
+    tool(
+      "store_item",
+      "Store an item",
+      {
+        key: z.string().describe("Item key"),
+        value: z.any().describe("Item data"),
+      },
+      async ({ key, value }) => {
+        await storage.set(key, value);
+        return {
+          content: [{ type: "text", text: `Stored ${key}` }],
+        };
+      }
+    ),
+
+    tool(
+      "delete_item",
+      "Delete an item",
+      { key: z.string().describe("Item key") },
+      async ({ key }) => {
+        const existed = await storage.delete(key);
+        return {
+          content: [{
+            type: "text",
+            text: existed ? `Deleted ${key}` : `${key} did not exist`,
+          }],
+        };
+      }
+    ),
+
+    // EXTERNAL operations
+    tool(
+      "call_api",
+      "Make an HTTP request",
+      {
+        url: z.string().url(),
+        method: z.enum(["GET", "POST", "PUT", "DELETE"]).default("GET"),
+        body: z.any().optional(),
+      },
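+      async ({ url, method, body }) => {
+        // Only attach a JSON body (and the matching header) when one was provided;
+        // fetch rejects GET requests that carry a body.
+        const hasBody = body !== undefined && method !== "GET";
+        const response = await fetch(url, {
+          method,
+          headers: hasBody ? { "Content-Type": "application/json" } : undefined,
+          body: hasBody ? JSON.stringify(body) : undefined,
+        });
+        const text = await response.text();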
+        return {
+          content: [{
+            type: "text",
+            text: `${response.status} ${response.statusText}\n\n${text}`,
+          }],
+          isError: !response.ok,
+        };
+      }
+    ),
+  ],
+});
+```
+</design_template>
+
+<example name="feedback-server">
+## Example: Feedback Storage Server
+
+This server provides primitives for storing feedback. It does NOT decide how to categorize or organize feedback—that's the agent's job via the prompt.
+
+```typescript
+export const feedbackMcpServer = createSdkMcpServer({
+  name: "feedback",
+  version: "1.0.0",
+  tools: [
+    tool(
+      "store_feedback",
+      "Store a feedback item",
+      {
+        item: z.object({
+          id: z.string(),
+          author: z.string(),
+          content: z.string(),
+          importance: z.number().min(1).max(5),
+          timestamp: z.string(),
+          status: z.string().optional(),
+          urls: z.array(z.string()).optional(),
+          metadata: z.any().optional(),
+        }).describe("Feedback item"),
+      },
+      async ({ item }) => {
+        await db.feedback.insert(item);
+        return {
+          content: [{
+            type: "text",
+            text: `Stored feedback ${item.id} from ${item.author}`,
+          }],
+        };
+      }
+    ),
+
+    tool(
+      "list_feedback",
+      "List feedback items",
+      {
+        limit: z.number().default(50),
+        status: z.string().optional(),
+      },
+      async ({ limit, status }) => {
+        const items = await db.feedback.list({ limit, status });
+        return {
+          content: [{
+            type: "text",
+            text: JSON.stringify(items, null, 2),
+          }],
+        };
+      }
+    ),
+
+    tool(
+      "update_feedback",
+      "Update a feedback item",
+      {
+        id: z.string(),
+        updates: z.object({
+          status: z.string().optional(),
+          importance: z.number().min(1).max(5).optional(), // same 1-5 range as store_feedback
+          metadata: z.any().optional(),
+        }),
+      },
+      async ({ id, updates }) => {
+        await db.feedback.update(id, updates);
+        return {
+          content: [{ type: "text", text: `Updated ${id}` }],
+        };
+      }
+    ),
+  ],
+});
+```
+
+The system prompt then tells the agent *how* to use these primitives:
+
+```markdown
+## Feedback Processing
+
+When someone shares feedback:
+1. Extract author, content, and any URLs
+2. Rate importance 1-5 based on actionability
+3. Store using feedback.store_feedback
+4. If high importance (4-5), notify the channel
+
+Use your judgment about importance ratings.
+```
+</example>
+
+<principle name="dynamic-capability-discovery">
+## Dynamic Capability Discovery vs Static Tool Mapping
+
+**This pattern is specifically for agent-native apps** where you want the agent to have full access to an external API—the same access a user would have. It follows the core agent-native principle: "Whatever the user can do, the agent can do."
+
+If you're building a constrained agent with limited capabilities, static tool mapping may be intentional. But for agent-native apps integrating with HealthKit, HomeKit, GraphQL, or similar APIs, compare the two approaches:
+
+**Static Tool Mapping (Anti-pattern for Agent-Native):**
+Building an individual tool for each API capability means the tool set is always out of date and limits the agent to only what you anticipated.
+
+```typescript
+// ❌ Static: Every API type needs a hardcoded tool
+const dateRange = { startDate: z.string(), endDate: z.string() };
+
+tool("read_steps", dateRange, async ({ startDate, endDate }) => {
+  return healthKit.query(HKQuantityType.stepCount, startDate, endDate);
+});
+
+tool("read_heart_rate", dateRange, async ({ startDate, endDate }) => {
+  return healthKit.query(HKQuantityType.heartRate, startDate, endDate);
+});
+
+tool("read_sleep", dateRange, async ({ startDate, endDate }) => {
+  return healthKit.query(HKCategoryType.sleepAnalysis, startDate, endDate);
+});
+
+// When HealthKit adds glucose tracking... you need a code change
+```
+
+**Dynamic Capability Discovery (Preferred):**
+Build a meta-tool that discovers what's available, and a generic tool that can access anything.
+
+```typescript
+// ✅ Dynamic: Agent discovers and uses any capability
+
+// Discovery tool - returns what's available at runtime
+tool("list_available_capabilities", {}, async () => {
+  const quantityTypes = await healthKit.availableQuantityTypes();
+  const categoryTypes = await healthKit.availableCategoryTypes();
+
+  return {
+    text: `Available health metrics:\n` +
+          `Quantity types: ${quantityTypes.join(", ")}\n` +
+          `Category types: ${categoryTypes.join(", ")}\n` +
+          `\nUse read_health_data with any of these types.`
+  };
+});
+
+// Generic access tool - type is a string, API validates
+tool("read_health_data", {
+  dataType: z.string(),  // NOT z.enum - let HealthKit validate
+  startDate: z.string(),
+  endDate: z.string(),
+  aggregation: z.enum(["sum", "average", "samples"]).optional()
+}, async ({ dataType, startDate, endDate, aggregation }) => {
+  // HealthKit validates the type, returns helpful error if invalid
+  const result = await healthKit.query(dataType, startDate, endDate, aggregation);
+  return { text: JSON.stringify(result, null, 2) };
+});
+```
+
+**When to Use Each Approach:**
+
+| Dynamic (Agent-Native) | Static (Constrained Agent) |
+|------------------------|---------------------------|
+| Agent should access anything user can | Agent has intentionally limited scope |
+| External API with many endpoints (HealthKit, HomeKit, GraphQL) | Internal domain with fixed operations |
+| API evolves independently of your code | Tightly coupled domain logic |
+| You want full action parity | You want strict guardrails |
+
+**The agent-native default is Dynamic.** Only use Static when you're intentionally limiting the agent's capabilities.
+
+**Complete Dynamic Pattern:**
+
+```swift
+// 1. Discovery tool: What can I access?
+tool("list_health_types", "Get available health data types") { _ in
+    // Assumption: HealthKit identifier types are not CaseIterable out of the box,
+    // so `allCases` here relies on an app-maintained extension that lists them.
+    let quantityTypes = HKQuantityTypeIdentifier.allCases.map { $0.rawValue }
+    let categoryTypes = HKCategoryTypeIdentifier.allCases.map { $0.rawValue }
+    let characteristicTypes = HKCharacteristicTypeIdentifier.allCases.map { $0.rawValue }
+
+    return ToolResult(text: """
+        Available HealthKit types:
+
+        ## Quantity Types (numeric values)
+        \(quantityTypes.joined(separator: ", "))
+
+        ## Category Types (categorical data)
+        \(categoryTypes.joined(separator: ", "))
+
+        ## Characteristic Types (user info)
+        \(characteristicTypes.joined(separator: ", "))
+
+        Use read_health_data or write_health_data with any of these.
+        """)
+}
+
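+// 2. Generic read: Access any type by name
+// (The `tool` DSL and its zod-style schema are illustrative shorthand, not a real Swift API.)
+tool("read_health_data", "Read any health metric", {
+    dataType: z.string().describe("Type name from list_health_types"),
+    startDate: z.string(),
+    endDate: z.string()
+}) { request in
+    // Let HealthKit validate the type name: these lookups are expected to return nil
+    // for identifiers HealthKit does not recognize
+    let quantityType: HKSampleType? = HKObjectType.quantityType(
+        forIdentifier: HKQuantityTypeIdentifier(rawValue: request.dataType))
+    let categoryType: HKSampleType? = HKObjectType.categoryType(
+        forIdentifier: HKCategoryTypeIdentifier(rawValue: request.dataType))
+    guard let type = quantityType ?? categoryType else {
+        return ToolResult(
+            text: "Unknown type: \(request.dataType). Use list_health_types to see available types.",
+            isError: true
+        )
+    }
+
+    // `querySamples` is assumed to be an app-level wrapper around HKSampleQuery
+    let samples = try await healthStore.querySamples(
+        type: type, start: request.startDate, end: request.endDate)
+    return ToolResult(text: samples.formatted())
+}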
+
+// 3. Context injection: Tell agent what's available in system prompt
+func buildSystemPrompt() -> String {
+    let availableTypes = healthService.getAuthorizedTypes()
+
+    return """
+    ## Available Health Data
+
+    You have access to these health metrics:
+    \(availableTypes.map { "- \($0)" }.joined(separator: "\n"))
+
+    Use read_health_data with any type above. For new types not listed,
+    use list_health_types to discover what's available.
+    """
+}
+```
+
+**Benefits:**
+- Agent can use any API capability, including ones added after your code shipped
+- API is the validator, not your enum definition
+- Smaller tool surface (2-3 tools vs N tools)
+- Agent naturally discovers capabilities by asking
+- Works with any API that has introspection (HealthKit, GraphQL, OpenAPI)
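+
+A minimal, self-contained sketch of the first two benefits, using an in-memory stand-in for the external API. `HealthApi`, `addType`, and the sample data are hypothetical and not part of HealthKit or any SDK; the handlers are plain functions rather than registered tools so the snippet can run on its own.
+
+```typescript
+// Illustrative only: `HealthApi`, its methods, and the sample data are
+// hypothetical stand-ins for an external API with runtime introspection.
+type ToolResult = { content: { type: "text"; text: string }[]; isError?: boolean };
+
+class HealthApi {
+  private data = new Map<string, number[]>([
+    ["stepCount", [4200, 8100]],
+    ["heartRate", [62, 71]],
+  ]);
+
+  availableTypes(): string[] {
+    return Array.from(this.data.keys());
+  }
+
+  query(type: string): number[] {
+    const samples = this.data.get(type);
+    if (!samples) {
+      throw new Error(`Unknown type: ${type}`);
+    }
+    return samples;
+  }
+
+  // Simulates the API gaining a capability after the tools shipped.
+  addType(type: string, samples: number[]): void {
+    this.data.set(type, samples);
+  }
+}
+
+const textResult = (text: string, isError = false): ToolResult => ({
+  content: [{ type: "text", text }],
+  isError,
+});
+
+// Discovery handler: reports whatever the API exposes right now.
+function listCapabilities(api: HealthApi): ToolResult {
+  return textResult(`Available types: ${api.availableTypes().join(", ")}`);
+}
+
+// Generic access handler: the type is a plain string and the API validates it.
+function readData(api: HealthApi, dataType: string): ToolResult {
+  try {
+    return textResult(JSON.stringify(api.query(dataType)));
+  } catch (err) {
+    const hint = "Use list_available_capabilities to see valid types.";
+    return textResult(`${(err as Error).message}. ${hint}`, true);
+  }
+}
+
+const api = new HealthApi();
+const before = readData(api, "bloodGlucose"); // rejected by the API
+api.addType("bloodGlucose", [5.4]);           // API evolves, handlers unchanged
+const after = readData(api, "bloodGlucose");  // same handler, new capability
+console.log(before.isError, after.content[0].text);
+```
+
+The read handler never changes when `bloodGlucose` appears: validation lives in the API, so the first call fails with a corrective hint and the second succeeds.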
+</principle>
+
+<principle name="crud-completeness">
+## CRUD Completeness
+
+Every data type the agent can create, it should be able to read, update, and delete. Incomplete CRUD = broken action parity.
+
+**Anti-pattern: Create-only tools**
+```typescript
+// ❌ Can create but not modify or delete
+tool("create_experiment", { hypothesis, variable, metric })
+tool("write_journal_entry", { content, author, tags })
+// User: "Delete that experiment" → Agent: "I can't do that"
+```
+
+**Correct: Full CRUD for each entity**
+```typescript
+// ✅ Complete CRUD
+tool("create_experiment", { hypothesis, variable, metric })
+tool("read_experiment", { id })
+tool("update_experiment", { id, updates: { hypothesis?, status?, endDate? } })
+tool("delete_experiment", { id })
+
+tool("create_journal_entry", { content, author, tags })
+tool("read_journal", { query?, dateRange?, author? })
+tool("update_journal_entry", { id, content, tags? })
+tool("delete_journal_entry", { id })
+```
+
+**The CRUD Audit:**
+For each entity type in your app, verify:
+- [ ] Create: Agent can create new instances
+- [ ] Read: Agent can query/search/list instances
+- [ ] Update: Agent can modify existing instances
+- [ ] Delete: Agent can remove instances
+
+If any operation is missing, users will eventually ask for it and the agent will fail.
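+
+The audit can also be automated. The sketch below is one possible approach, not part of any SDK: `auditCrud` is a hypothetical helper, and it assumes tools follow a strict `<operation>_<entity>` naming convention (a tool named `read_journal` would need to be renamed `read_journal_entry`, or the matching relaxed).
+
+```typescript
+// Hypothetical helper: reports which CRUD operations each entity is missing.
+// Assumes tool names follow a strict `<operation>_<entity>` convention.
+type CrudOp = "create" | "read" | "update" | "delete";
+
+const CRUD_OPS: CrudOp[] = ["create", "read", "update", "delete"];
+
+function auditCrud(
+  toolNames: string[],
+  entities: string[],
+): Record<string, CrudOp[]> {
+  const available = new Set(toolNames);
+  const missing: Record<string, CrudOp[]> = {};
+  for (const entity of entities) {
+    const gaps = CRUD_OPS.filter((op) => !available.has(`${op}_${entity}`));
+    if (gaps.length > 0) {
+      missing[entity] = gaps;
+    }
+  }
+  return missing;
+}
+
+// A create-only entity next to a complete one.
+const report = auditCrud(
+  [
+    "create_experiment",
+    "create_journal_entry",
+    "read_journal_entry",
+    "update_journal_entry",
+    "delete_journal_entry",
+  ],
+  ["experiment", "journal_entry"],
+);
+console.log(report);
+```
+
+Running a check like this in CI against the registered tool names would surface create-only entities before a user asks the agent to delete one.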
+</principle>
+
+<checklist>
+## MCP Tool Design Checklist
+
+**Fundamentals:**
+- [ ] Tool names describe capability, not use case
+- [ ] Inputs are data, not decisions
+- [ ] Outputs are rich (enough for agent to verify)
+- [ ] CRUD operations are separate tools (not one mega-tool)
+- [ ] No business logic in tool implementations
+- [ ] Error states clearly communicated via `isError`
+- [ ] Descriptions explain what the tool does, not when to use it
+
+**Dynamic Capability Discovery (for agent-native apps):**
+- [ ] For external APIs where agent should have full access, use dynamic discovery
+- [ ] Include a `list_*` or `discover_*` tool for each API surface
+- [ ] Use string inputs (not enums) when the API validates
+- [ ] Inject available capabilities into system prompt at runtime
+- [ ] Only use static tool mapping if intentionally limiting agent scope
+
+**CRUD Completeness:**
+- [ ] Every entity has create, read, update, delete operations
+- [ ] Every UI action has a corresponding agent tool
+- [ ] Test: "Can the agent undo what it just did?"
+</checklist>