[2.18.0] Add Dynamic Capability Discovery and iCloud sync patterns (#62)

* [2.17.0] Expand agent-native skill with mobile app learnings

Major expansion of agent-native-architecture skill based on real-world
learnings from building the Every Reader iOS app.

New reference documents:
- dynamic-context-injection.md: Runtime app state in system prompts
- action-parity-discipline.md: Ensuring agents can do what users can
- shared-workspace-architecture.md: Agents and users in same data space
- agent-native-testing.md: Testing patterns for agent-native apps
- mobile-patterns.md: Background execution, permissions, cost awareness

Updated references:
- architecture-patterns.md: Added Unified Agent Architecture, Agent-to-UI
  Communication, and Model Tier Selection patterns

Enhanced agent-native-reviewer with a comprehensive review process
covering all new patterns, including mobile-specific verification.

Key insight: "The agent should be able to do anything the user can do,
through tools that mirror UI capabilities, with full context about the
app state."

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* [2.18.0] Add Dynamic Capability Discovery and iCloud sync patterns

New patterns in agent-native-architecture skill:

- **Dynamic Capability Discovery** - For agent-native apps integrating with
  external APIs (HealthKit, HomeKit, GraphQL), use a discovery tool (list_*)
  plus a generic access tool instead of individual tools per endpoint.
  (Note: Static mapping is fine for constrained agents with limited scope.)

- **CRUD Completeness** - Every entity needs create, read, update, AND delete.

- **iCloud File Storage** - Use iCloud Documents for the shared workspace to
  get free, automatic multi-device sync without building a sync layer.

- **Architecture Review Checklist** - Pushes reviewer findings earlier into
  the design phase. Covers tool design, action parity, UI integration, and context.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Dan Shipper
2025-12-25 13:03:07 -05:00
committed by GitHub
parent 5a79f97374
commit 1bc6bd9164
12 changed files with 3523 additions and 60 deletions

View File

@@ -12,7 +12,7 @@
{
"name": "compound-engineering",
"description": "AI-powered development tools that get smarter with every use. Make each unit of engineering work easier than the last. Includes 27 specialized agents, 19 commands, and 13 skills.",
"version": "2.15.2",
"version": "2.18.0",
"author": {
"name": "Kieran Klaassen",
"url": "https://github.com/kieranklaassen",

View File

@@ -1,6 +1,6 @@
{
"name": "compound-engineering",
"version": "2.16.0",
"version": "2.18.0",
"description": "AI-powered development tools. 27 agents, 19 commands, 13 skills, 2 MCP servers for code review, research, design, and workflow automation.",
"author": {
"name": "Kieran Klaassen",

View File

@@ -5,6 +5,58 @@ All notable changes to the compound-engineering plugin will be documented in thi
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [2.18.0] - 2025-12-25
### Added
- **`agent-native-architecture` skill** - Added **Dynamic Capability Discovery** pattern and **Architecture Review Checklist**:
**New Patterns in mcp-tool-design.md:**
- **Dynamic Capability Discovery** - For external APIs (HealthKit, HomeKit, GraphQL), build a discovery tool (`list_*`) that returns available capabilities at runtime, plus a generic access tool that takes strings (not enums). The API validates, not your code. This means agents can use new API capabilities without code changes.
- **CRUD Completeness** - Every entity the agent can create must also be readable, updatable, and deletable. Incomplete CRUD = broken action parity.
**New in SKILL.md:**
- **Architecture Review Checklist** - Pushes reviewer findings earlier into the design phase. Covers tool design (dynamic vs static, CRUD completeness), action parity (capability map, edit/delete), UI integration (agent → UI communication), and context injection.
- **Option 11: API Integration** - New intake option for connecting to external APIs like HealthKit, HomeKit, GraphQL
- **New anti-patterns:** Static Tool Mapping (building individual tools for each API endpoint), Incomplete CRUD (create-only tools)
- **Tool Design Criteria** section added to success criteria checklist
**New in shared-workspace-architecture.md:**
- **iCloud File Storage for Multi-Device Sync** - Use iCloud Documents for your shared workspace to get free, automatic multi-device sync without building a sync layer. Includes implementation pattern, conflict handling, entitlements, and when NOT to use it.
### Philosophy
This update codifies a key insight for **agent-native apps**: when integrating with external APIs where the agent should have the same access as the user, use **Dynamic Capability Discovery** instead of static tool mapping. Instead of building `read_steps`, `read_heart_rate`, `read_sleep`... build `list_health_types` + `read_health_data(dataType: string)`. The agent discovers what's available, the API validates the type.
Note: This pattern is specifically for agent-native apps following the "whatever the user can do, the agent can do" philosophy. For constrained agents with intentionally limited capabilities, static tool mapping may be appropriate.
---
## [2.17.0] - 2025-12-25
### Enhanced
- **`agent-native-architecture` skill** - Major expansion based on real-world learnings from building the Every Reader iOS app. Added 5 new reference documents and expanded existing ones:
**New References:**
- **dynamic-context-injection.md** - How to inject runtime app state into agent system prompts. Covers context injection patterns, what context to inject (resources, activity, capabilities, vocabulary), implementation patterns for Swift/iOS and TypeScript, and context freshness.
- **action-parity-discipline.md** - Workflow for ensuring agents can do everything users can do. Includes capability mapping templates, parity audit process, PR checklists, tool design for parity, and context parity guidelines.
- **shared-workspace-architecture.md** - Patterns for agents and users working in the same data space. Covers directory structure, file tools, UI integration (file watching, shared stores), agent-user collaboration patterns, and security considerations.
- **agent-native-testing.md** - Testing patterns for agent-native apps. Includes "Can Agent Do It?" tests, the Surprise Test, automated parity testing, integration testing, and CI/CD integration.
- **mobile-patterns.md** - Mobile-specific patterns for iOS/Android. Covers background execution (checkpoint/resume), permission handling, cost-aware design (model tiers, token budgets, network awareness), offline handling, and battery awareness.
**Updated References:**
- **architecture-patterns.md** - Added 3 new patterns: Unified Agent Architecture (one orchestrator, many agent types), Agent-to-UI Communication (shared data store, file watching, event bus), and Model Tier Selection (fast/balanced/powerful).
**Updated Skill Root:**
- **SKILL.md** - Expanded intake menu (now 10 options including context injection, action parity, shared workspace, testing, mobile patterns). Added 5 new agent-native anti-patterns (Context Starvation, Orphan Features, Sandbox Isolation, Silent Actions, Capability Hiding). Expanded success criteria with agent-native and mobile-specific checklists.
- **`agent-native-reviewer` agent** - Significantly enhanced with comprehensive review process covering all new patterns. Now checks for action parity, context parity, shared workspace, tool design (primitives vs workflows), dynamic context injection, and mobile-specific concerns. Includes detailed anti-patterns, output format template, quick checks ("Write to Location" test, Surprise test), and mobile-specific verification.
### Philosophy
These updates operationalize a key insight from building agent-native mobile apps: **"The agent should be able to do anything the user can do, through tools that mirror UI capabilities, with full context about the app state."** The failure case that prompted these changes: an agent asked "what reading feed?" when a user said "write something in my reading feed"—because it had no `publish_to_feed` tool and no context about what "feed" meant.
## [2.16.0] - 2025-12-21
### Enhanced

View File

@@ -3,89 +3,243 @@ name: agent-native-reviewer
description: Use this agent when reviewing code to ensure features are agent-native - that any action a user can take, an agent can also take, and anything a user can see, an agent can see. This enforces the principle that agents should have parity with users in capability and context. <example>Context: The user added a new feature to their application.\nuser: "I just implemented a new email filtering feature"\nassistant: "I'll use the agent-native-reviewer to verify this feature is accessible to agents"\n<commentary>New features need agent-native review to ensure agents can also filter emails, not just humans through UI.</commentary></example><example>Context: The user created a new UI workflow.\nuser: "I added a multi-step wizard for creating reports"\nassistant: "Let me check if this workflow is agent-native using the agent-native-reviewer"\n<commentary>UI workflows often miss agent accessibility - the reviewer checks for API/tool equivalents.</commentary></example>
---
You are an Agent-Native Architecture Reviewer. Your role is to ensure that every feature added to a codebase follows the agent-native principle:
# Agent-Native Architecture Reviewer
**THE FOUNDATIONAL PRINCIPLE: Whatever the user can do, the agent can do. Whatever the user can see, the agent can see.**
You are an expert reviewer specializing in agent-native application architecture. Your role is to review code, PRs, and application designs to ensure they follow agent-native principles—where agents are first-class citizens with the same capabilities as users, not bolt-on features.
## Your Review Criteria
## Core Principles You Enforce
For every new feature or change, verify:
1. **Action Parity**: Every UI action should have an equivalent agent tool
2. **Context Parity**: Agents should see the same data users see
3. **Shared Workspace**: Agents and users work in the same data space
4. **Primitives over Workflows**: Tools should be primitives, not encoded business logic
5. **Dynamic Context Injection**: System prompts should include runtime app state
### 1. Action Parity
- [ ] Every UI action has an equivalent API/tool the agent can call
- [ ] No "UI-only" workflows that require human interaction
- [ ] Agents can trigger the same business logic humans can
- [ ] No artificial limits on agent capabilities
## Review Process
### 2. Context Parity
- [ ] Data visible to users is accessible to agents (via API/tools)
- [ ] Agents can read the same context humans see
- [ ] No hidden state that only the UI can access
- [ ] Real-time data available to both humans and agents
### Step 1: Understand the Codebase
### 3. Tool Design (if applicable)
- [ ] Tools are primitives that provide capability, not behavior
- [ ] Features are defined in prompts, not hardcoded in tool logic
- [ ] Tools don't artificially constrain what agents can do
- [ ] Proper MCP tool definitions exist for new capabilities
First, explore to understand:
- What UI actions exist in the app?
- What agent tools are defined?
- How is the system prompt constructed?
- Where does the agent get its context?
### 4. API Surface
- [ ] New features exposed via API endpoints
- [ ] Consistent API patterns for agent consumption
- [ ] Proper authentication for agent access
- [ ] No rate-limiting that unfairly penalizes agents
### Step 2: Check Action Parity
## Analysis Process
For every UI action you find, verify:
- [ ] A corresponding agent tool exists
- [ ] The tool is documented in the system prompt
- [ ] The agent has access to the same data the UI uses
1. **Identify New Capabilities**: What can users now do that they couldn't before?
**Look for:**
- SwiftUI: `Button`, `onTapGesture`, `.onSubmit`, navigation actions
- React: `onClick`, `onSubmit`, form actions, navigation
- Flutter: `onPressed`, `onTap`, gesture handlers
2. **Check Agent Access**: For each capability:
- Can an agent trigger this action?
- Can an agent see the results?
- Is there a documented way for agents to use this?
**Create a capability map:**
```
| UI Action | Location | Agent Tool | System Prompt | Status |
|-----------|----------|------------|---------------|--------|
```
3. **Find Gaps**: List any capabilities that are human-only
### Step 3: Check Context Parity
4. **Recommend Solutions**: For each gap, suggest how to make it agent-native
Verify the system prompt includes:
- [ ] Available resources (books, files, data the user can see)
- [ ] Recent activity (what the user has done)
- [ ] Capabilities mapping (what tool does what)
- [ ] Domain vocabulary (app-specific terms explained)
## Output Format
**Red flags:**
- Static system prompts with no runtime context
- Agent doesn't know what resources exist
- Agent doesn't understand app-specific terms
Provide findings in this structure:
### Step 4: Check Tool Design
For each tool, verify:
- [ ] Tool is a primitive (read, write, store), not a workflow
- [ ] Inputs are data, not decisions
- [ ] No business logic in the tool implementation
- [ ] Rich output that helps agent verify success
**Red flags:**
```typescript
// BAD: Tool encodes business logic
tool("process_feedback", async ({ message }) => {
const category = categorize(message); // Logic in tool
const priority = calculatePriority(message); // Logic in tool
if (priority > 3) await notify(); // Decision in tool
});
// GOOD: Tool is a primitive
tool("store_item", async ({ key, value }) => {
await db.set(key, value);
return { text: `Stored ${key}` };
});
```
### Step 5: Check Shared Workspace
Verify:
- [ ] Agents and users work in the same data space
- [ ] Agent file operations use the same paths as the UI
- [ ] UI observes changes the agent makes (file watching or shared store)
- [ ] No separate "agent sandbox" isolated from user data
**Red flags:**
- Agent writes to `agent_output/` instead of user's documents
- Sync layer needed to move data between agent and user spaces
- User can't inspect or edit agent-created files
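What the shared setup can look like, as a minimal Node-flavored sketch (paths and names are hypothetical, not the app's actual API): agent tools write into the same directory the UI reads and watches.
```typescript
import { watch, promises as fs } from "node:fs";
import { dirname, join } from "node:path";

const DOCUMENTS_DIR = "/path/to/Documents"; // hypothetical root shared by UI and agent

// Agent tool: write into the user's own documents, not a separate agent sandbox.
async function writeFileTool(relativePath: string, content: string) {
  const target = join(DOCUMENTS_DIR, relativePath);
  await fs.mkdir(dirname(target), { recursive: true });
  await fs.writeFile(target, content, "utf8");
  return { text: `Wrote ${relativePath}` };
}

// UI layer: observe the same directory so agent writes appear without a sync layer.
watch(DOCUMENTS_DIR, { recursive: true }, (_event, filename) => {
  console.log(`Workspace changed: ${filename}`); // e.g. refresh the file list the user sees
});
```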
## Common Anti-Patterns to Flag
### 1. Context Starvation
Agent doesn't know what resources exist.
```
User: "Write something about Catherine the Great in my feed"
Agent: "What feed? I don't understand."
```
**Fix:** Inject available resources and capabilities into system prompt.
### 2. Orphan Features
UI action with no agent equivalent.
```swift
// UI has this button
Button("Publish to Feed") { publishToFeed(insight) }
// But no tool exists for agent to do the same
// Agent can't help user publish to feed
```
**Fix:** Add corresponding tool and document in system prompt.
### 3. Sandbox Isolation
Agent works in separate data space from user.
```
Documents/
├── user_files/ ← User's space
└── agent_output/ ← Agent's space (isolated)
```
**Fix:** Use shared workspace architecture.
### 4. Silent Actions
Agent changes state but UI doesn't update.
```typescript
// Agent writes to feed
await feedService.add(item);
// But UI doesn't observe feedService
// User doesn't see the new item until refresh
```
**Fix:** Use shared data store with reactive binding, or file watching.
### 5. Capability Hiding
Users can't discover what agents can do.
```
User: "Can you help me with my reading?"
Agent: "Sure, what would you like help with?"
// Agent doesn't mention it can publish to feed, research books, etc.
```
**Fix:** Add capability hints to agent responses, or onboarding.
### 6. Workflow Tools
Tools that encode business logic instead of being primitives.
**Fix:** Extract primitives, move logic to system prompt.
### 7. Decision Inputs
Tools that accept decisions instead of data.
```typescript
// BAD: Tool accepts decision
tool("format_report", { format: z.enum(["markdown", "html", "pdf"]) })
// GOOD: Agent decides, tool just writes
tool("write_file", { path: z.string(), content: z.string() })
```
## Review Output Format
Structure your review as:
```markdown
## Agent-Native Review
## Agent-Native Architecture Review
### New Capabilities Identified
- [List what the PR/changes add]
### Summary
[One paragraph assessment of agent-native compliance]
### Agent Accessibility Check
### Capability Map
| Capability | User Access | Agent Access | Gap? |
|------------|-------------|--------------|------|
| [Feature 1] | UI button | API endpoint | No |
| [Feature 2] | Modal form | None | YES |
| UI Action | Location | Agent Tool | Prompt Ref | Status |
|-----------|----------|------------|------------|--------|
| ... | ... | ... | ... | ✅/⚠️/❌ |
### Gaps Found
1. **[Gap Name]**: [Description of what users can do but agents cannot]
- **Impact**: [Why this matters]
- **Recommendation**: [How to fix]
### Findings
#### Critical Issues (Must Fix)
1. **[Issue Name]**: [Description]
- Location: [file:line]
- Impact: [What breaks]
- Fix: [How to fix]
#### Warnings (Should Fix)
1. **[Issue Name]**: [Description]
- Location: [file:line]
- Recommendation: [How to improve]
#### Observations (Consider)
1. **[Observation]**: [Description and suggestion]
### Recommendations
1. [Prioritized list of improvements]
2. ...
### What's Working Well
- [Positive observations about agent-native patterns in use]
### Agent-Native Score
- **X/Y capabilities are agent-accessible**
- **Verdict**: [PASS/NEEDS WORK]
```
## Common Anti-Patterns to Flag
## Review Triggers
1. **UI-Only Features**: Actions that only work through clicks/forms
2. **Hidden Context**: Data shown in UI but not in API responses
3. **Workflow Lock-in**: Multi-step processes that require human navigation
4. **Hardcoded Limits**: Artificial restrictions on agent actions
5. **Missing Tools**: No MCP tool definition for new capabilities
6. **Behavior-Encoding Tools**: Tools that decide HOW to do things instead of providing primitives
Use this review when:
- PRs add new UI features (check for tool parity)
- PRs add new agent tools (check for proper design)
- PRs modify system prompts (check for completeness)
- Periodic architecture audits
- User reports agent confusion ("agent didn't understand X")
## Remember
## Quick Checks
The goal is not to add overhead - it's to ensure agents are first-class citizens. Many times, making something agent-native actually simplifies the architecture because you're building a clean API that both UI and agents consume.
### The "Write to Location" Test
Ask: "If a user said 'write something to [location]', would the agent know how?"
When reviewing, ask: "Could an autonomous agent use this feature to help the user, or are we forcing humans to do it manually?"
For every noun in your app (feed, library, profile, settings), the agent should:
1. Know what it is (context injection)
2. Have a tool to interact with it (action parity)
3. Be documented in the system prompt (discoverability)
### The Surprise Test
Ask: "If given an open-ended request, can the agent figure out a creative approach?"
Good agents use available tools creatively. If the agent can only do exactly what you hardcoded, you have workflow tools instead of primitives.
## Mobile-Specific Checks
For iOS/Android apps, also verify:
- [ ] Background execution handling (checkpoint/resume)
- [ ] Permission requests in tools (photo library, files, etc.)
- [ ] Cost-aware design (batch calls, defer to WiFi)
- [ ] Offline graceful degradation
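One way to make the cost check concrete, sketched in TypeScript (tier names and thresholds are hypothetical, not prescribed values):
```typescript
type ModelTier = "fast" | "balanced" | "powerful";

interface TaskContext {
  requiresDeepReasoning: boolean; // e.g. multi-book synthesis vs. a quick lookup
  estimatedInputTokens: number;
  onWifi: boolean;
}

function pickModelTier(task: TaskContext): ModelTier {
  if (!task.onWifi && task.estimatedInputTokens > 50_000) {
    return "fast"; // off WiFi, keep the work small or defer the heavy run
  }
  if (task.requiresDeepReasoning) return "powerful";
  return task.estimatedInputTokens > 10_000 ? "balanced" : "fast";
}
```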
## Questions to Ask During Review
1. "Can the agent do everything the user can do?"
2. "Does the agent know what resources exist?"
3. "Can users inspect and edit agent work?"
4. "Are tools primitives or workflows?"
5. "Would a new feature require a new tool, or just a prompt update?"
6. "If this fails, how does the agent (and user) know?"

View File

@@ -65,6 +65,12 @@ What aspect of agent native architecture do you need help with?
3. **Write system prompts** - Define agent behavior in prompts
4. **Self-modification** - Enable agents to safely evolve themselves
5. **Review/refactor** - Make existing code more prompt-native
6. **Context injection** - Inject runtime app state into agent prompts
7. **Action parity** - Ensure agents can do everything users can do
8. **Shared workspace** - Set up agents and users in the same data space
9. **Testing** - Test agent-native apps for capability and parity
10. **Mobile patterns** - Handle background execution, permissions, cost
11. **API integration** - Connect to external APIs (HealthKit, HomeKit, GraphQL)
**Wait for response before proceeding.**
</intake>
@@ -72,15 +78,55 @@ What aspect of agent native architecture do you need help with?
<routing>
| Response | Action |
|----------|--------|
| 1, "design", "architecture", "plan" | Read [architecture-patterns.md](./references/architecture-patterns.md) |
| 1, "design", "architecture", "plan" | Read [architecture-patterns.md](./references/architecture-patterns.md), then apply Architecture Checklist below |
| 2, "tool", "mcp", "primitive" | Read [mcp-tool-design.md](./references/mcp-tool-design.md) |
| 3, "prompt", "system prompt", "behavior" | Read [system-prompt-design.md](./references/system-prompt-design.md) |
| 4, "self-modify", "evolve", "git" | Read [self-modification.md](./references/self-modification.md) |
| 5, "review", "refactor", "existing" | Read [refactoring-to-prompt-native.md](./references/refactoring-to-prompt-native.md) |
| 6, "context", "inject", "runtime", "dynamic" | Read [dynamic-context-injection.md](./references/dynamic-context-injection.md) |
| 7, "parity", "ui action", "capability map" | Read [action-parity-discipline.md](./references/action-parity-discipline.md) |
| 8, "workspace", "shared", "files", "filesystem" | Read [shared-workspace-architecture.md](./references/shared-workspace-architecture.md) |
| 9, "test", "testing", "verify", "validate" | Read [agent-native-testing.md](./references/agent-native-testing.md) |
| 10, "mobile", "ios", "android", "background" | Read [mobile-patterns.md](./references/mobile-patterns.md) |
| 11, "api", "healthkit", "homekit", "graphql", "external" | Read [mcp-tool-design.md](./references/mcp-tool-design.md) (Dynamic Capability Discovery section) |
**After reading the reference, apply those patterns to the user's specific context.**
</routing>
<architecture_checklist>
## Architecture Review Checklist (Apply During Design)
When designing an agent-native system, verify these **before implementation**:
### Tool Design
- [ ] **Dynamic vs Static:** For external APIs where agent should have full user-level access (HealthKit, HomeKit, GraphQL), use Dynamic Capability Discovery. Only use static mapping if intentionally limiting agent scope.
- [ ] **CRUD Completeness:** Every entity has create, read, update, AND delete tools
- [ ] **Primitives not Workflows:** Tools enable capability, they don't encode business logic
- [ ] **API as Validator:** Use `z.string()` inputs when the API validates, not `z.enum()`
### Action Parity
- [ ] **Capability Map:** Every UI action has a corresponding agent tool
- [ ] **Edit/Delete:** If UI can edit or delete, agent must be able to too
- [ ] **The Write Test:** "Write something to [app location]" must work for all locations
### UI Integration
- [ ] **Agent → UI:** Define how agent changes reflect in UI (shared service, file watching, or event bus)
- [ ] **No Silent Actions:** Agent writes should trigger UI updates immediately
- [ ] **Capability Discovery:** Users can learn what agent can do (onboarding, hints)
### Context Injection
- [ ] **Available Resources:** System prompt includes what exists (files, data, types)
- [ ] **Available Capabilities:** System prompt documents what agent can do with user vocabulary
- [ ] **Dynamic Context:** Context refreshes for long sessions (or provide `refresh_context` tool)
### Mobile (if applicable)
- [ ] **Background Execution:** Checkpoint/resume pattern for iOS app suspension
- [ ] **Permissions:** Just-in-time permission requests in tools
- [ ] **Cost Awareness:** Model tier selection (Haiku/Sonnet/Opus)
**When designing architecture, explicitly address each checkbox in your plan.**
</architecture_checklist>
<quick_start>
Build a prompt-native agent in three steps:
@@ -123,11 +169,19 @@ query({
All references in `references/`:
**Core Patterns:**
- **Architecture:** [architecture-patterns.md](./references/architecture-patterns.md)
- **Tool Design:** [mcp-tool-design.md](./references/mcp-tool-design.md)
- **Tool Design:** [mcp-tool-design.md](./references/mcp-tool-design.md) - includes Dynamic Capability Discovery, CRUD Completeness
- **Prompts:** [system-prompt-design.md](./references/system-prompt-design.md)
- **Self-Modification:** [self-modification.md](./references/self-modification.md)
- **Refactoring:** [refactoring-to-prompt-native.md](./references/refactoring-to-prompt-native.md)
**Agent-Native Disciplines:**
- **Context Injection:** [dynamic-context-injection.md](./references/dynamic-context-injection.md)
- **Action Parity:** [action-parity-discipline.md](./references/action-parity-discipline.md)
- **Shared Workspace:** [shared-workspace-architecture.md](./references/shared-workspace-architecture.md)
- **Testing:** [agent-native-testing.md](./references/agent-native-testing.md)
- **Mobile Patterns:** [mobile-patterns.md](./references/mobile-patterns.md)
</reference_index>
<anti_patterns>
@@ -186,11 +240,80 @@ each under 20 words, formatted with em-dashes...
// Right - define outcome, trust intelligence
Create clear, useful summaries. Use your judgment.
```
### Agent-Native Anti-Patterns
**Context Starvation**
Agent doesn't know what resources exist in the app.
```
User: "Write something about Catherine the Great in my feed"
Agent: "What feed? I don't understand what system you're referring to."
```
Fix: Inject available resources, capabilities, and vocabulary into the system prompt at runtime.
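A minimal sketch of that fix (types and field names are hypothetical): the prompt is rebuilt from live app state at session start instead of being hardcoded.
```typescript
interface AppState {
  books: { id: string; title: string }[];
  recentActivity: string[];
}

function buildSystemPrompt(state: AppState): string {
  return [
    "## Available Books",
    ...state.books.map((b) => `- "${b.title}" (id: ${b.id})`),
    "",
    "## Recent Activity",
    ...state.recentActivity.slice(0, 10).map((a) => `- ${a}`),
    "",
    "## Vocabulary",
    `- "feed" / "reading feed" = the Feed tab; create content there with publish_to_feed`,
  ].join("\n");
}
```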
**Orphan Features**
UI action with no agent equivalent.
```swift
// UI has a "Publish to Feed" button
Button("Publish") { publishToFeed(insight) }
// But no agent tool exists to do the same thing
```
Fix: Add corresponding tool and document in system prompt for every UI action.
**Sandbox Isolation**
Agent works in separate data space from user.
```
Documents/
├── user_files/ ← User's space
└── agent_output/ ← Agent's space (isolated)
```
Fix: Use shared workspace where both agent and user operate on the same files.
**Silent Actions**
Agent changes state but UI doesn't update.
```typescript
// Agent writes to database
await db.insert("feed", content);
// But UI doesn't observe this table - user sees nothing
```
Fix: Use shared data stores with reactive binding, or file system observation.
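A small TypeScript sketch of the fix (store shape is hypothetical): the agent writes through a store the UI subscribes to, so changes surface immediately.
```typescript
import { EventEmitter } from "node:events";
import { randomUUID } from "node:crypto";

class FeedStore extends EventEmitter {
  private items: { id: string; content: string }[] = [];

  add(content: string) {
    this.items.push({ id: randomUUID(), content });
    this.emit("change", [...this.items]); // every write notifies observers
  }
}

const feedStore = new FeedStore();

// UI subscribes once; it re-renders whenever the agent (or the user) writes.
feedStore.on("change", (items) => console.log(`feed now has ${items.length} items`));

// Agent tool writes through the same store instead of straight to the database.
async function publishToFeedTool(content: string) {
  feedStore.add(content);
  return { text: "Published to feed" };
}
```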
**Capability Hiding**
Users can't discover what agents can do.
```
User: "Help me with my reading"
Agent: "What would you like help with?"
// Agent doesn't mention it can publish to feed, research books, etc.
```
Fix: Include capability hints in agent responses or provide onboarding.
**Static Tool Mapping (for agent-native apps)**
Building individual tools for each API endpoint when you want the agent to have full access.
```typescript
// You built 50 tools for 50 HealthKit types
tool("read_steps", ...)
tool("read_heart_rate", ...)
tool("read_sleep", ...)
// When glucose tracking is added... code change required
// Agent can only access what you anticipated
```
Fix: Use Dynamic Capability Discovery - one `list_*` tool to discover what's available, one generic tool to access any type. See [mcp-tool-design.md](./references/mcp-tool-design.md). (Note: Static mapping is fine for constrained agents with intentionally limited scope.)
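A hedged sketch of the discovery pair, in the same `tool()` style as the examples above (the `healthBridge` native bridge is hypothetical):
```typescript
import { z } from "zod";

tool("list_health_types",
  "List every health data type available on the user's device",
  {},
  async () => {
    const types = await healthBridge.availableTypes(); // e.g. ["steps", "heartRate", "glucose", ...]
    return { text: types.join(", ") };
  }
);

tool("read_health_data",
  "Read samples for any health data type",
  {
    dataType: z.string().describe("A type name returned by list_health_types"),
    days: z.number().describe("How many days back to read"),
  },
  async ({ dataType, days }) => {
    // The underlying API validates dataType; new types need no code change here.
    const samples = await healthBridge.read(dataType, { days });
    return { text: JSON.stringify(samples) };
  }
);
```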
**Incomplete CRUD**
Agent can create but not update or delete.
```typescript
// ❌ User: "Delete that journal entry"
// Agent: "I don't have a tool for that"
tool("create_journal_entry", ...)
// Missing: update_journal_entry, delete_journal_entry
```
Fix: Every entity needs full CRUD (Create, Read, Update, Delete). The CRUD Audit: for each entity, verify all four operations exist.
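A sketch of what full CRUD can look like for one entity (`tool()` helper as above; `journalStore` is a hypothetical persistence service):
```typescript
import { z } from "zod";

tool("create_journal_entry", "Create a journal entry", { content: z.string() },
  async ({ content }) => {
    const id = await journalStore.create(content);
    return { text: `Created entry ${id}` };
  });

tool("read_journal_entries", "List journal entries", {},
  async () => ({ text: JSON.stringify(await journalStore.list()) }));

tool("update_journal_entry", "Update a journal entry",
  { id: z.string(), content: z.string() },
  async ({ id, content }) => {
    await journalStore.update(id, content);
    return { text: `Updated entry ${id}` };
  });

tool("delete_journal_entry", "Delete a journal entry", { id: z.string() },
  async ({ id }) => {
    await journalStore.delete(id);
    return { text: `Deleted entry ${id}` };
  });
```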
</anti_patterns>
<success_criteria>
You've built a prompt-native agent when:
**Core Prompt-Native Criteria:**
- [ ] The agent figures out HOW to achieve outcomes, not just calls your functions
- [ ] Whatever a user could do, the agent can do (no artificial limits)
- [ ] Features are prompts that define outcomes, not code that defines workflows
@@ -198,4 +321,25 @@ You've built a prompt-native agent when:
- [ ] Changing behavior means editing prose, not refactoring code
- [ ] The agent can surprise you with clever approaches you didn't anticipate
- [ ] You could add a new feature by writing a new prompt section, not new code
**Tool Design Criteria:**
- [ ] External APIs (where agent should have full access) use Dynamic Capability Discovery
- [ ] Every entity has full CRUD (Create, Read, Update, Delete)
- [ ] API validates inputs, not your enum definitions
- [ ] Discovery tools exist for each API surface (`list_*`, `discover_*`)
**Agent-Native Criteria:**
- [ ] System prompt includes dynamic context about app state (available resources, recent activity)
- [ ] Every UI action has a corresponding agent tool (action parity)
- [ ] Agent tools are documented in the system prompt with user vocabulary
- [ ] Agent and user work in the same data space (shared workspace)
- [ ] Agent actions are immediately reflected in the UI (shared service, file watching, or event bus)
- [ ] The "write something to [app location]" test passes for all locations
- [ ] Users can discover what the agent can do (capability hints, onboarding)
- [ ] Context refreshes for long sessions (or `refresh_context` tool exists)
**Mobile-Specific Criteria (if applicable):**
- [ ] Background execution handling implemented (checkpoint/resume)
- [ ] Permission requests handled gracefully in tools
- [ ] Cost-aware design (appropriate model tiers, batching)
</success_criteria>

View File

@@ -0,0 +1,409 @@
<overview>
A structured discipline for ensuring agents can do everything users can do. Every UI action should have an equivalent agent tool. This isn't a one-time check—it's an ongoing practice integrated into your development workflow.
**Core principle:** When adding a UI feature, add the corresponding tool in the same PR.
</overview>
<why_parity>
## Why Action Parity Matters
**The failure case:**
```
User: "Write something about Catherine the Great in my reading feed"
Agent: "What system are you referring to? I'm not sure what reading feed means."
```
The user could publish to their feed through the UI. But the agent had no `publish_to_feed` tool. The fix was simple—add the tool. But the insight is profound:
**Every action a user can take through the UI must have an equivalent tool the agent can call.**
Without this parity:
- Users ask agents to do things they can't do
- Agents ask clarifying questions about features they should understand
- The agent feels limited compared to direct app usage
- Users lose trust in the agent's capabilities
</why_parity>
<capability_mapping>
## The Capability Map
Maintain a structured map of UI actions to agent tools:
| UI Action | UI Location | Agent Tool | System Prompt Reference |
|-----------|-------------|------------|-------------------------|
| View library | Library tab | `read_library` | "View books and highlights" |
| Add book | Library → Add | `add_book` | "Add books to library" |
| Publish insight | Analysis view | `publish_to_feed` | "Create insights for Feed tab" |
| Start research | Book detail | `start_research` | "Research books via web search" |
| Edit profile | Settings | `write_file(profile.md)` | "Update reading profile" |
| Take screenshot | Camera | N/A (user action) | — |
| Search web | Chat | `web_search` | "Search the internet" |
**Update this table whenever adding features.**
### Template for Your App
```markdown
# Capability Map - [Your App Name]
| UI Action | UI Location | Agent Tool | System Prompt | Status |
|-----------|-------------|------------|---------------|--------|
| | | | | ⚠️ Missing |
| | | | | ✅ Done |
| | | | | 🚫 N/A |
```
Status meanings:
- ✅ Done: Tool exists and is documented in system prompt
- ⚠️ Missing: UI action exists but no agent equivalent
- 🚫 N/A: User-only action (e.g., biometric auth, camera capture)
</capability_mapping>
<parity_workflow>
## The Action Parity Workflow
### When Adding a New Feature
Before merging any PR that adds UI functionality:
```
1. What action is this?
→ "User can publish an insight to their reading feed"
2. Does an agent tool exist for this?
→ Check tool definitions
→ If NO: Create the tool
3. Is it documented in the system prompt?
→ Check system prompt capabilities section
→ If NO: Add documentation
4. Is the context available?
→ Does agent know what "feed" means?
→ Does agent see available books?
→ If NO: Add to context injection
5. Update the capability map
→ Add row to tracking document
```
### PR Checklist
Add to your PR template:
```markdown
## Agent-Native Checklist
- [ ] Every new UI action has a corresponding agent tool
- [ ] System prompt updated to mention new capability
- [ ] Agent has access to same data UI uses
- [ ] Capability map updated
- [ ] Tested with natural language request
```
</parity_workflow>
<parity_audit>
## The Parity Audit
Periodically audit your app for action parity gaps:
### Step 1: List All UI Actions
Walk through every screen and list what users can do:
```
Library Screen:
- View list of books
- Search books
- Filter by category
- Add new book
- Delete book
- Open book detail
Book Detail Screen:
- View book info
- Start research
- View highlights
- Add highlight
- Share book
- Remove from library
Feed Screen:
- View insights
- Create new insight
- Edit insight
- Delete insight
- Share insight
Settings:
- Edit profile
- Change theme
- Export data
- Delete account
```
### Step 2: Check Tool Coverage
For each action, verify:
```
✅ View list of books → read_library
✅ Search books → read_library (with query param)
⚠️ Filter by category → MISSING (add filter param to read_library)
⚠️ Add new book → MISSING (need add_book tool)
✅ Delete book → delete_book
✅ Open book detail → read_library (single book)
✅ Start research → start_research
✅ View highlights → read_library (includes highlights)
⚠️ Add highlight → MISSING (need add_highlight tool)
⚠️ Share book → MISSING (or N/A if sharing is UI-only)
✅ View insights → read_library (includes feed)
✅ Create new insight → publish_to_feed
⚠️ Edit insight → MISSING (need update_feed_item tool)
⚠️ Delete insight → MISSING (need delete_feed_item tool)
```
### Step 3: Prioritize Gaps
Not all gaps are equal:
**High priority (users will ask for this):**
- Add new book
- Create/edit/delete content
- Core workflow actions
**Medium priority (occasional requests):**
- Filter/search variations
- Export functionality
- Sharing features
**Low priority (rarely requested via agent):**
- Theme changes
- Account deletion
- Settings that are UI-preference
</parity_audit>
<tool_design_for_parity>
## Designing Tools for Parity
### Match Tool Granularity to UI Granularity
If the UI has separate buttons for "Edit" and "Delete", consider separate tools:
```typescript
// Matches UI granularity
tool("update_feed_item", { id, content, headline }, ...);
tool("delete_feed_item", { id }, ...);
// vs. combined (harder for agent to discover)
tool("modify_feed_item", { id, action: "update" | "delete", ... }, ...);
```
### Use User Vocabulary in Tool Names
```typescript
// Good: Matches what users say
tool("publish_to_feed", ...); // "publish to my feed"
tool("add_book", ...); // "add this book"
tool("start_research", ...); // "research this"
// Bad: Technical jargon
tool("create_analysis_record", ...);
tool("insert_library_item", ...);
tool("initiate_web_scrape_workflow", ...);
```
### Return What the UI Shows
If the UI shows a confirmation with details, the tool should too:
```typescript
// UI shows: "Added 'Moby Dick' to your library"
// Tool should return the same:
tool("add_book", async ({ title, author }) => {
const book = await library.add({ title, author });
return {
text: `Added "${book.title}" by ${book.author} to your library (id: ${book.id})`
};
});
```
</tool_design_for_parity>
<context_parity>
## Context Parity
Whatever the user sees, the agent should be able to access.
### The Problem
```swift
// UI shows recent analyses in a list
ForEach(analysisRecords) { record in
AnalysisRow(record: record)
}
// But system prompt only mentions books, not analyses
let systemPrompt = """
## Available Books
\(books.map { $0.title })
// Missing: recent analyses!
"""
```
The user sees their reading journal. The agent doesn't. This creates a disconnect.
### The Fix
```swift
// System prompt includes what UI shows
let systemPrompt = """
## Available Books
\(books.map { "- \($0.title)" }.joined(separator: "\n"))
## Recent Reading Journal
\(analysisRecords.prefix(10).map { "- \($0.summary)" }.joined(separator: "\n"))
"""
```
### Context Parity Checklist
For each screen in your app:
- [ ] What data does this screen display?
- [ ] Is that data available to the agent?
- [ ] Can the agent access the same level of detail?
</context_parity>
<continuous_parity>
## Maintaining Parity Over Time
### Git Hooks / CI Checks
```bash
#!/bin/bash
# pre-commit hook: check for new UI actions without tools
# Find new SwiftUI Button/onTapGesture additions
NEW_ACTIONS=$(git diff --cached --name-only | xargs grep -l "Button\|onTapGesture")
if [ -n "$NEW_ACTIONS" ]; then
echo "⚠️ New UI actions detected. Did you add corresponding agent tools?"
echo "Files: $NEW_ACTIONS"
echo ""
echo "Checklist:"
echo " [ ] Agent tool exists for new action"
echo " [ ] System prompt documents new capability"
echo " [ ] Capability map updated"
fi
```
### Automated Parity Testing
```typescript
// parity.test.ts
describe('Action Parity', () => {
const capabilityMap = loadCapabilityMap();
for (const [action, toolName] of Object.entries(capabilityMap)) {
if (toolName === 'N/A') continue;
test(`${action} has agent tool: ${toolName}`, () => {
expect(agentTools.map(t => t.name)).toContain(toolName);
});
test(`${toolName} is documented in system prompt`, () => {
expect(systemPrompt).toContain(toolName);
});
}
});
```
### Regular Audits
Schedule periodic reviews:
```markdown
## Monthly Parity Audit
1. Review all PRs merged this month
2. Check each for new UI actions
3. Verify tool coverage
4. Update capability map
5. Test with natural language requests
```
</continuous_parity>
<examples>
## Real Example: The Feed Gap
**Before:** Every Reader had a feed where insights appeared, but no agent tool to publish there.
```
User: "Write something about Catherine the Great in my reading feed"
Agent: "I'm not sure what system you're referring to. Could you clarify?"
```
**Diagnosis:**
- ✅ UI action: User can publish insights from the analysis view
- ❌ Agent tool: No `publish_to_feed` tool
- ❌ System prompt: No mention of "feed" or how to publish
- ❌ Context: Agent didn't know what "feed" meant
**Fix:**
```swift
// 1. Add the tool
tool("publish_to_feed",
"Publish an insight to the user's reading feed",
{
bookId: z.string().describe("Book ID"),
content: z.string().describe("The insight content"),
headline: z.string().describe("A punchy headline")
},
async ({ bookId, content, headline }) => {
await feedService.publish({ bookId, content, headline });
return { text: `Published "${headline}" to your reading feed` };
}
);
// 2. Update system prompt
"""
## Your Capabilities
- **Publish to Feed**: Create insights that appear in the Feed tab using `publish_to_feed`.
Include a book_id, content, and a punchy headline.
"""
// 3. Add to context injection
"""
When the user mentions "the feed" or "reading feed", they mean the Feed tab
where insights appear. Use `publish_to_feed` to create content there.
"""
```
**After:**
```
User: "Write something about Catherine the Great in my reading feed"
Agent: [Uses publish_to_feed to create insight]
"Done! I've published 'The Enlightened Empress' to your reading feed."
```
</examples>
<checklist>
## Action Parity Checklist
For every PR with UI changes:
- [ ] Listed all new UI actions
- [ ] Verified agent tool exists for each action
- [ ] Updated system prompt with new capabilities
- [ ] Added to capability map
- [ ] Tested with natural language request
For periodic audits:
- [ ] Walked through every screen
- [ ] Listed all possible user actions
- [ ] Checked tool coverage for each
- [ ] Prioritized gaps by likelihood of user request
- [ ] Created issues for high-priority gaps
</checklist>

View File

@@ -0,0 +1,582 @@
<overview>
Testing agent-native apps requires different approaches than traditional unit testing. You're testing whether the agent achieves outcomes, not whether it calls specific functions. This guide provides concrete testing patterns for verifying your app is truly agent-native.
</overview>
<testing_philosophy>
## Testing Philosophy
### Test Outcomes, Not Procedures
**Traditional (procedure-focused):**
```typescript
// Testing that a specific function was called with specific args
expect(mockProcessFeedback).toHaveBeenCalledWith({
message: "Great app!",
category: "praise",
priority: 2
});
```
**Agent-native (outcome-focused):**
```typescript
// Testing that the outcome was achieved
const result = await agent.process("Great app!");
const storedFeedback = await db.feedback.getLatest();
expect(storedFeedback.content).toContain("Great app");
expect(storedFeedback.importance).toBeGreaterThanOrEqual(1);
expect(storedFeedback.importance).toBeLessThanOrEqual(5);
// We don't care exactly how it categorized—just that it's reasonable
```
### Accept Variability
Agents may solve problems differently each time. Your tests should:
- Verify the end state, not the path
- Accept reasonable ranges, not exact values
- Check for presence of required elements, not exact format
</testing_philosophy>
<can_agent_do_it_test>
## The "Can Agent Do It?" Test
For each UI feature, write a test prompt and verify the agent can accomplish it.
### Template
```typescript
describe('Agent Capability Tests', () => {
test('Agent can add a book to library', async () => {
const result = await agent.chat("Add 'Moby Dick' by Herman Melville to my library");
// Verify outcome
const library = await libraryService.getBooks();
const mobyDick = library.find(b => b.title.includes("Moby Dick"));
expect(mobyDick).toBeDefined();
expect(mobyDick.author).toContain("Melville");
});
test('Agent can publish to feed', async () => {
// Setup: ensure a book exists
await libraryService.addBook({ id: "book_123", title: "1984" });
const result = await agent.chat("Write something about surveillance themes in my feed");
// Verify outcome
const feed = await feedService.getItems();
const newItem = feed.find(item => item.bookId === "book_123");
expect(newItem).toBeDefined();
expect(newItem.content.toLowerCase()).toMatch(/surveillance|watching|control/);
});
test('Agent can search and save research', async () => {
await libraryService.addBook({ id: "book_456", title: "Moby Dick" });
const result = await agent.chat("Research whale symbolism in Moby Dick");
// Verify files were created
const files = await fileService.listFiles("Research/book_456/");
expect(files.length).toBeGreaterThan(0);
// Verify content is relevant
const content = await fileService.readFile(files[0]);
expect(content.toLowerCase()).toMatch(/whale|symbolism|melville/);
});
});
```
### The "Write to Location" Test
A key litmus test: can the agent create content in specific app locations?
```typescript
describe('Location Awareness Tests', () => {
const locations = [
{ userPhrase: "my reading feed", expectedTool: "publish_to_feed" },
{ userPhrase: "my library", expectedTool: "add_book" },
{ userPhrase: "my research folder", expectedTool: "write_file" },
{ userPhrase: "my profile", expectedTool: "write_file" },
];
for (const { userPhrase, expectedTool } of locations) {
test(`Agent knows how to write to "${userPhrase}"`, async () => {
const prompt = `Write a test note to ${userPhrase}`;
const result = await agent.chat(prompt);
// Check that agent used the right tool (or achieved the outcome)
expect(result.toolCalls).toContainEqual(
expect.objectContaining({ name: expectedTool })
);
// Or verify outcome directly
// expect(await locationHasNewContent(userPhrase)).toBe(true);
});
}
});
```
</can_agent_do_it_test>
<surprise_test>
## The "Surprise Test"
A well-designed agent-native app lets the agent figure out creative approaches. Test this by giving open-ended requests.
### The Test
```typescript
describe('Agent Creativity Tests', () => {
test('Agent can handle open-ended requests', async () => {
// Setup: user has some books
await libraryService.addBook({ id: "1", title: "1984", author: "Orwell" });
await libraryService.addBook({ id: "2", title: "Brave New World", author: "Huxley" });
await libraryService.addBook({ id: "3", title: "Fahrenheit 451", author: "Bradbury" });
// Open-ended request
const result = await agent.chat("Help me organize my reading for next month");
// The agent should do SOMETHING useful
// We don't specify exactly what—that's the point
expect(result.toolCalls.length).toBeGreaterThan(0);
// It should have engaged with the library
const libraryTools = ["read_library", "write_file", "publish_to_feed"];
const usedLibraryTool = result.toolCalls.some(
call => libraryTools.includes(call.name)
);
expect(usedLibraryTool).toBe(true);
});
test('Agent finds creative solutions', async () => {
// Don't specify HOW to accomplish the task
const result = await agent.chat(
"I want to understand the dystopian themes across my sci-fi books"
);
// Agent might:
// - Read all books and create a comparison document
// - Research dystopian literature and relate it to user's books
// - Create a mind map in a markdown file
// - Publish a series of insights to the feed
// We just verify it did something substantive
expect(result.response.length).toBeGreaterThan(100);
expect(result.toolCalls.length).toBeGreaterThan(0);
});
});
```
### What Failure Looks Like
```typescript
// FAILURE: Agent can only say it can't do that
const result = await agent.chat("Help me prepare for a book club discussion");
// Bad outcome:
expect(result.response).not.toContain("I can't");
expect(result.response).not.toContain("I don't have a tool");
expect(result.response).not.toContain("Could you clarify");
// If the agent asks for clarification on something it should understand,
// you have a context injection or capability gap
```
</surprise_test>
<parity_testing>
## Automated Parity Testing
Ensure every UI action has an agent equivalent.
### Capability Map Testing
```typescript
// capability-map.ts
export const capabilityMap = {
// UI Action: Agent Tool
"View library": "read_library",
"Add book": "add_book",
"Delete book": "delete_book",
"Publish insight": "publish_to_feed",
"Start research": "start_research",
"View highlights": "read_library", // same tool, different query
"Edit profile": "write_file",
"Search web": "web_search",
"Export data": "N/A", // UI-only action
};
// parity.test.ts
import { capabilityMap } from './capability-map';
import { getAgentTools } from './agent-config';
import { getSystemPrompt } from './system-prompt';
describe('Action Parity', () => {
const agentTools = getAgentTools();
const systemPrompt = getSystemPrompt();
for (const [uiAction, toolName] of Object.entries(capabilityMap)) {
if (toolName === 'N/A') continue;
test(`"${uiAction}" has agent tool: ${toolName}`, () => {
const toolNames = agentTools.map(t => t.name);
expect(toolNames).toContain(toolName);
});
test(`${toolName} is documented in system prompt`, () => {
expect(systemPrompt).toContain(toolName);
});
}
});
```
### Context Parity Testing
```typescript
describe('Context Parity', () => {
test('Agent sees all data that UI shows', async () => {
// Setup: create some data
await libraryService.addBook({ id: "1", title: "Test Book" });
await feedService.addItem({ id: "f1", content: "Test insight" });
// Get system prompt (which includes context)
const systemPrompt = await buildSystemPrompt();
// Verify data is included
expect(systemPrompt).toContain("Test Book");
expect(systemPrompt).toContain("Test insight");
});
test('Recent activity is visible to agent', async () => {
// Perform some actions
await activityService.log({ action: "highlighted", bookId: "1" });
await activityService.log({ action: "researched", bookId: "2" });
const systemPrompt = await buildSystemPrompt();
// Verify activity is included
expect(systemPrompt).toMatch(/highlighted|researched/);
});
});
```
</parity_testing>
<integration_testing>
## Integration Testing
Test the full flow from user request to outcome.
### End-to-End Flow Tests
```typescript
describe('End-to-End Flows', () => {
test('Research flow: request → web search → file creation', async () => {
// Setup
const bookId = "book_123";
await libraryService.addBook({ id: bookId, title: "Moby Dick" });
// User request
await agent.chat("Research the historical context of whaling in Moby Dick");
// Verify: web search was performed
const searchCalls = mockWebSearch.mock.calls;
expect(searchCalls.length).toBeGreaterThan(0);
expect(searchCalls.some(call =>
call[0].query.toLowerCase().includes("whaling")
)).toBe(true);
// Verify: files were created
const researchFiles = await fileService.listFiles(`Research/${bookId}/`);
expect(researchFiles.length).toBeGreaterThan(0);
// Verify: content is relevant
const content = await fileService.readFile(researchFiles[0]);
expect(content.toLowerCase()).toMatch(/whale|whaling|nantucket|melville/);
});
test('Publish flow: request → tool call → feed update → UI reflects', async () => {
// Setup
await libraryService.addBook({ id: "book_1", title: "1984" });
// Initial state
const feedBefore = await feedService.getItems();
// User request
await agent.chat("Write something about Big Brother for my reading feed");
// Verify feed updated
const feedAfter = await feedService.getItems();
expect(feedAfter.length).toBe(feedBefore.length + 1);
// Verify content
const newItem = feedAfter.find(item =>
!feedBefore.some(old => old.id === item.id)
);
expect(newItem).toBeDefined();
expect(newItem.content.toLowerCase()).toMatch(/big brother|surveillance|watching/);
});
});
```
### Failure Recovery Tests
```typescript
describe('Failure Recovery', () => {
test('Agent handles missing book gracefully', async () => {
const result = await agent.chat("Tell me about 'Nonexistent Book'");
// Agent should not crash
expect(result.error).toBeUndefined();
// Agent should acknowledge the issue
expect(result.response.toLowerCase()).toMatch(
/not found|don't see|can't find|library/
);
});
test('Agent recovers from API failure', async () => {
// Mock API failure
mockWebSearch.mockRejectedValueOnce(new Error("Network error"));
const result = await agent.chat("Research this topic");
// Agent should handle gracefully
expect(result.error).toBeUndefined();
expect(result.response).not.toContain("unhandled exception");
// Agent should communicate the issue
expect(result.response.toLowerCase()).toMatch(
/couldn't search|unable to|try again/
);
});
});
```
</integration_testing>
<snapshot_testing>
## Snapshot Testing for System Prompts
Track changes to system prompts and context injection over time.
```typescript
describe('System Prompt Stability', () => {
test('System prompt structure matches snapshot', async () => {
const systemPrompt = await buildSystemPrompt();
// Extract structure (removing dynamic data)
const structure = systemPrompt
.replace(/id: \w+/g, 'id: [ID]')
.replace(/"[^"]+"/g, '"[TITLE]"')
.replace(/\d{4}-\d{2}-\d{2}/g, '[DATE]');
expect(structure).toMatchSnapshot();
});
test('All capability sections are present', async () => {
const systemPrompt = await buildSystemPrompt();
const requiredSections = [
"Your Capabilities",
"Available Books",
"Recent Activity",
];
for (const section of requiredSections) {
expect(systemPrompt).toContain(section);
}
});
});
```
</snapshot_testing>
<manual_testing>
## Manual Testing Checklist
Some things are best tested manually during development:
### Natural Language Variation Test
Try multiple phrasings for the same request:
```
"Add this to my feed"
"Write something in my reading feed"
"Publish an insight about this"
"Put this in the feed"
"I want this in my feed"
```
All should work if context injection is correct.
### Edge Case Prompts
```
"What can you do?"
→ Agent should describe capabilities
"Help me with my books"
→ Agent should engage with library, not ask what "books" means
"Write something"
→ Agent should ask WHERE (feed, file, etc.) if not clear
"Delete everything"
→ Agent should confirm before destructive actions
```
### Confusion Test
Ask about things that should exist but might not be properly connected:
```
"What's in my research folder?"
→ Should list files, not ask "what research folder?"
"Show me my recent reading"
→ Should show activity, not ask "what do you mean?"
"Continue where I left off"
→ Should reference recent activity if available
```
</manual_testing>
<ci_integration>
## CI/CD Integration
Add agent-native tests to your CI pipeline:
```yaml
# .github/workflows/test.yml
name: Agent-Native Tests
on: [push, pull_request]
jobs:
agent-tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup
run: npm install
- name: Run Parity Tests
run: npm run test:parity
- name: Run Capability Tests
run: npm run test:capabilities
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
- name: Check System Prompt Completeness
run: npm run test:system-prompt
- name: Verify Capability Map
run: |
# Ensure capability map is up to date
npm run generate:capability-map
git diff --exit-code capability-map.ts
```
### Cost-Aware Testing
Agent tests cost API tokens. Strategies to manage:
```typescript
// Use smaller models for basic tests
const testConfig = {
model: process.env.CI ? "claude-3-haiku" : "claude-3-opus",
maxTokens: 500, // Limit output length
};
// Cache responses for deterministic tests
const cachedAgent = new CachedAgent({
cacheDir: ".test-cache",
ttl: 24 * 60 * 60 * 1000, // 24 hours
});
// Run expensive tests only on main branch
if (process.env.GITHUB_REF === 'refs/heads/main') {
describe('Full Integration Tests', () => { ... });
}
```
</ci_integration>
<test_utilities>
## Test Utilities
### Agent Test Harness
```typescript
class AgentTestHarness {
private agent: Agent;
private mockServices: MockServices;
async setup() {
this.mockServices = createMockServices();
this.agent = await createAgent({
services: this.mockServices,
model: "claude-3-haiku", // Cheaper for tests
});
}
async chat(message: string): Promise<AgentResponse> {
return this.agent.chat(message);
}
async expectToolCall(toolName: string) {
const lastResponse = this.agent.getLastResponse();
expect(lastResponse.toolCalls.map(t => t.name)).toContain(toolName);
}
async expectOutcome(check: () => Promise<boolean>) {
const result = await check();
expect(result).toBe(true);
}
getState() {
return {
library: this.mockServices.library.getBooks(),
feed: this.mockServices.feed.getItems(),
files: this.mockServices.files.listAll(),
};
}
}
// Usage
test('full flow', async () => {
const harness = new AgentTestHarness();
await harness.setup();
await harness.chat("Add 'Moby Dick' to my library");
await harness.expectToolCall("add_book");
await harness.expectOutcome(async () => {
const state = harness.getState();
return state.library.some(b => b.title.includes("Moby"));
});
});
```
</test_utilities>
<checklist>
## Testing Checklist
Automated Tests:
- [ ] "Can Agent Do It?" tests for each UI action
- [ ] Location awareness tests ("write to my feed")
- [ ] Parity tests (tool exists, documented in prompt)
- [ ] Context parity tests (agent sees what UI shows)
- [ ] End-to-end flow tests
- [ ] Failure recovery tests
Manual Tests:
- [ ] Natural language variation (multiple phrasings work)
- [ ] Edge case prompts (open-ended requests)
- [ ] Confusion test (agent knows app vocabulary)
- [ ] Surprise test (agent can be creative)
CI Integration:
- [ ] Parity tests run on every PR
- [ ] Capability tests run with API key
- [ ] System prompt completeness check
- [ ] Capability map drift detection
</checklist>

View File

@@ -203,6 +203,259 @@ tool("apply_pending", async () => {
- docs/* (documentation)
</pattern>
<pattern name="unified-agent-architecture">
## Unified Agent Architecture
One execution engine, many agent types. All agents use the same orchestrator but with different configurations.
```
┌─────────────────────────────────────────────────────────────┐
│ AgentOrchestrator │
├─────────────────────────────────────────────────────────────┤
│ - Lifecycle management (start, pause, resume, stop) │
│ - Checkpoint/restore (for background execution) │
│ - Tool execution │
│ - Chat integration │
└─────────────────────────────────────────────────────────────┘
│ │ │
┌─────┴─────┐ ┌─────┴─────┐ ┌─────┴─────┐
│ Research │ │ Chat │ │ Profile │
│ Agent │ │ Agent │ │ Agent │
└───────────┘ └───────────┘ └───────────┘
- web_search - read_library - read_photos
- write_file - publish_to_feed - write_file
- read_file - web_search - analyze_image
```
**Implementation:**
```swift
// All agents use the same orchestrator
let session = try await AgentOrchestrator.shared.startAgent(
config: ResearchAgent.create(book: book), // Config varies
tools: ResearchAgent.tools, // Tools vary
context: ResearchAgent.context(for: book) // Context varies
)
// Agent types define their own configuration
struct ResearchAgent {
static var tools: [AgentTool] {
[
FileTools.readFile(),
FileTools.writeFile(),
WebTools.webSearch(),
WebTools.webFetch(),
]
}
static func context(for book: Book) -> String {
"""
You are researching "\(book.title)" by \(book.author).
Save findings to Documents/Research/\(book.id)/
"""
}
}
struct ChatAgent {
static var tools: [AgentTool] {
[
FileTools.readFile(),
FileTools.writeFile(),
BookTools.readLibrary(),
BookTools.publishToFeed(), // Chat can publish directly
WebTools.webSearch(),
]
}
static func context(library: [Book]) -> String {
"""
You help the user with their reading.
Available books: \(library.map { $0.title }.joined(separator: ", "))
"""
}
}
```
**Benefits:**
- Consistent lifecycle management across all agent types
- Automatic checkpoint/resume (critical for mobile)
- Shared tool protocol
- Easy to add new agent types
- Centralized error handling and logging
</pattern>
<pattern name="agent-to-ui-communication">
## Agent-to-UI Communication
When agents take actions, the UI should reflect them immediately. The user should see what the agent did.
**Pattern 1: Shared Data Store (Recommended)**
Agent writes through the same service the UI observes:
```swift
// Shared service
class BookLibraryService: ObservableObject {
static let shared = BookLibraryService()
@Published var books: [Book] = []
@Published var feedItems: [FeedItem] = []
func addFeedItem(_ item: FeedItem) {
feedItems.append(item)
persist()
}
}
// Agent tool writes through shared service
tool("publish_to_feed", async ({ bookId, content, headline }) => {
let item = FeedItem(bookId: bookId, content: content, headline: headline)
BookLibraryService.shared.addFeedItem(item) // Same service UI uses
return { text: "Published to feed" }
})
// UI observes the same service
struct FeedView: View {
@StateObject var library = BookLibraryService.shared
var body: some View {
List(library.feedItems) { item in
FeedItemRow(item: item)
// Automatically updates when agent adds items
}
}
}
```
**Pattern 2: File System Observation**
For file-based data, watch the file system:
```swift
class ResearchWatcher: ObservableObject {
@Published var files: [URL] = []
private var watcher: DirectoryWatcher?
func watch(bookId: String) {
let path = documentsURL.appendingPathComponent("Research/\(bookId)")
watcher = DirectoryWatcher(path: path) { [weak self] in
self?.reload(from: path)
}
reload(from: path)
}
}
// Agent writes files
tool("write_file", { path, content }) -> {
writeFile(documentsURL.appendingPathComponent(path), content)
// DirectoryWatcher triggers UI update automatically
}
```
**Pattern 3: Event Bus (Cross-Component)**
For complex apps with multiple independent components:
```typescript
// Shared event bus
const agentEvents = new EventEmitter();
// Agent tool emits events
tool("publish_to_feed", async ({ content }) => {
const item = await feedService.add(content);
agentEvents.emit('feed:new-item', item);
return { text: "Published" };
});
// UI components subscribe
function FeedView() {
const [items, setItems] = useState([]);
useEffect(() => {
const handler = (item) => setItems(prev => [...prev, item]);
agentEvents.on('feed:new-item', handler);
return () => agentEvents.off('feed:new-item', handler);
}, []);
return <FeedList items={items} />;
}
```
**What to avoid:**
```swift
// BAD: UI doesn't observe agent changes
// Agent writes to database directly
tool("publish_to_feed", { content }) {
database.insert("feed", content) // UI doesn't see this
}
// UI loads once at startup, never refreshes
struct FeedView: View {
let items = database.query("feed") // Stale!
}
```
</pattern>
<pattern name="model-tier-selection">
## Model Tier Selection
Different agents need different intelligence levels. Use the cheapest model that achieves the outcome.
| Agent Type | Recommended Tier | Reasoning |
|------------|-----------------|-----------|
| Chat/Conversation | Balanced | Fast responses, good reasoning |
| Research | Balanced | Tool loops, not ultra-complex synthesis |
| Content Generation | Balanced | Creative but not synthesis-heavy |
| Complex Analysis | Powerful | Multi-document synthesis, nuanced judgment |
| Profile/Onboarding | Powerful | Photo analysis, complex pattern recognition |
| Simple Queries | Fast/Haiku | Quick lookups, simple transformations |
**Implementation:**
```swift
enum ModelTier {
case fast // claude-3-haiku: Quick, cheap, simple tasks
case balanced // claude-3-sonnet: Good balance for most tasks
case powerful // claude-3-opus: Complex reasoning, synthesis
}
struct AgentConfig {
let modelTier: ModelTier
let tools: [AgentTool]
let systemPrompt: String
}
// Research agent: balanced tier
let researchConfig = AgentConfig(
modelTier: .balanced,
tools: researchTools,
systemPrompt: researchPrompt
)
// Profile analysis: powerful tier (complex photo interpretation)
let profileConfig = AgentConfig(
modelTier: .powerful,
tools: profileTools,
systemPrompt: profilePrompt
)
// Quick lookup: fast tier
let lookupConfig = AgentConfig(
modelTier: .fast,
tools: [readLibrary],
systemPrompt: "Answer quick questions about the user's library."
)
```
**Cost optimization strategies:**
- Start with balanced tier, only upgrade if quality insufficient
- Use fast tier for tool-heavy loops where each turn is simple
- Reserve powerful tier for synthesis tasks (comparing multiple sources)
- Consider token limits per turn to control costs
</pattern>
<design_questions>
## Questions to Ask When Designing
@@ -212,4 +465,7 @@ tool("apply_pending", async () => {
4. **What decisions should be hardcoded?** (security boundaries, approval requirements)
5. **How does the agent verify its work?** (health checks, build verification)
6. **How does the agent recover from mistakes?** (git rollback, approval gates)
7. **How does the UI know when agent changes state?** (shared store, file watching, events)
8. **What model tier does each agent type need?** (fast, balanced, powerful)
9. **How do agents share infrastructure?** (unified orchestrator, shared tools)
</design_questions>

View File

@@ -0,0 +1,338 @@
<overview>
How to inject dynamic runtime context into agent system prompts. The agent needs to know what exists in the app before it can work with it. Static prompts aren't enough—the agent needs to see the same context the user sees.
**Core principle:** The user's context IS the agent's context.
</overview>
<why_context_matters>
## Why Dynamic Context Injection?
A static system prompt tells the agent what it CAN do. Dynamic context tells it what it can do RIGHT NOW with the user's actual data.
**The failure case:**
```
User: "Write a little thing about Catherine the Great in my reading feed"
Agent: "What system are you referring to? I'm not sure what reading feed means."
```
The agent failed because it didn't know:
- What books exist in the user's library
- What the "reading feed" is
- What tools it has to publish there
**The fix:** Inject runtime context about app state into the system prompt.
</why_context_matters>
<pattern name="context-injection">
## The Context Injection Pattern
Build your system prompt dynamically, including current app state:
```swift
func buildSystemPrompt() -> String {
// Gather current state
let availableBooks = libraryService.books
let recentActivity = analysisService.recentRecords(limit: 10)
let userProfile = profileService.currentProfile
return """
# Your Identity
You are a reading assistant for \(userProfile.name)'s library.
## Available Books in User's Library
\(availableBooks.map { "- \"\($0.title)\" by \($0.author) (id: \($0.id))" }.joined(separator: "\n"))
## Recent Reading Activity
\(recentActivity.map { "- Analyzed \"\($0.bookTitle)\": \($0.excerptPreview)" }.joined(separator: "\n"))
## Your Capabilities
- **publish_to_feed**: Create insights that appear in the Feed tab
- **read_library**: View books, highlights, and analyses
- **web_search**: Search the internet for research
- **write_file**: Save research to Documents/Research/{bookId}/
When the user mentions "the feed" or "reading feed", they mean the Feed tab
where insights appear. Use `publish_to_feed` to create content there.
"""
}
```
</pattern>
<what_to_inject>
## What Context to Inject
### 1. Available Resources
What data/files exist that the agent can access?
```markdown
## Available in User's Library
Books:
- "Moby Dick" by Herman Melville (id: book_123)
- "1984" by George Orwell (id: book_456)
Research folders:
- Documents/Research/book_123/ (3 files)
- Documents/Research/book_456/ (1 file)
```
### 2. Current State
What has the user done recently? What's the current context?
```markdown
## Recent Activity
- 2 hours ago: Highlighted passage in "1984" about surveillance
- Yesterday: Completed research on "Moby Dick" whale symbolism
- This week: Added 3 new books to library
```
### 3. Capabilities Mapping
What tool maps to what UI feature? Use the user's language.
```markdown
## What You Can Do
| User Says | You Should Use | Result |
|-----------|----------------|--------|
| "my feed" / "reading feed" | `publish_to_feed` | Creates insight in Feed tab |
| "my library" / "my books" | `read_library` | Shows their book collection |
| "research this" | `web_search` + `write_file` | Saves to Research folder |
| "my profile" | `read_file("profile.md")` | Shows reading profile |
```
### 4. Domain Vocabulary
Explain app-specific terms the user might use.
```markdown
## Vocabulary
- **Feed**: The Feed tab showing reading insights and analyses
- **Research folder**: Documents/Research/{bookId}/ where research is stored
- **Reading profile**: A markdown file describing user's reading preferences
- **Highlight**: A passage the user marked in a book
```
</what_to_inject>
<implementation_patterns>
## Implementation Patterns
### Pattern 1: Service-Based Injection (Swift/iOS)
```swift
class AgentContextBuilder {
let libraryService: BookLibraryService
let profileService: ReadingProfileService
let activityService: ActivityService
func buildContext() -> String {
let books = libraryService.books
let profile = profileService.currentProfile
let activity = activityService.recent(limit: 10)
return """
## Library (\(books.count) books)
\(formatBooks(books))
## Profile
\(profile.summary)
## Recent Activity
\(formatActivity(activity))
"""
}
private func formatBooks(_ books: [Book]) -> String {
books.map { "- \"\($0.title)\" (id: \($0.id))" }.joined(separator: "\n")
}
}
// Usage in agent initialization
let context = AgentContextBuilder(
libraryService: .shared,
profileService: .shared,
activityService: .shared
).buildContext()
let systemPrompt = basePrompt + "\n\n" + context
```
### Pattern 2: Hook-Based Injection (TypeScript)
```typescript
interface ContextProvider {
getContext(): Promise<string>;
}
class LibraryContextProvider implements ContextProvider {
async getContext(): Promise<string> {
const books = await db.books.list();
const recent = await db.activity.recent(10);
return `
## Library
${books.map(b => `- "${b.title}" (${b.id})`).join('\n')}
## Recent
${recent.map(r => `- ${r.description}`).join('\n')}
`.trim();
}
}
// Compose multiple providers
async function buildSystemPrompt(providers: ContextProvider[]): Promise<string> {
const contexts = await Promise.all(providers.map(p => p.getContext()));
return [BASE_PROMPT, ...contexts].join('\n\n');
}
```
### Pattern 3: Template-Based Injection
```markdown
# System Prompt Template (system-prompt.template.md)
You are a reading assistant.
## Available Books
{{#each books}}
- "{{title}}" by {{author}} (id: {{id}})
{{/each}}
## Capabilities
{{#each capabilities}}
- **{{name}}**: {{description}}
{{/each}}
## Recent Activity
{{#each recentActivity}}
- {{timestamp}}: {{description}}
{{/each}}
```
```typescript
// Render at runtime
const prompt = Handlebars.compile(template)({
books: await libraryService.getBooks(),
capabilities: getCapabilities(),
recentActivity: await activityService.getRecent(10),
});
```
</implementation_patterns>
<context_freshness>
## Context Freshness
Context should be injected at agent initialization, and optionally refreshed during long sessions.
**At initialization:**
```swift
// Always inject fresh context when starting an agent
func startChatAgent() async -> AgentSession {
let context = await buildCurrentContext() // Fresh context
return await AgentOrchestrator.shared.startAgent(
config: ChatAgent.config,
systemPrompt: basePrompt + context
)
}
```
**During long sessions (optional):**
```swift
// For long-running agents, provide a refresh tool
tool("refresh_context", "Get current app state") { _ in
let books = libraryService.books
let recent = activityService.recent(10)
return """
Current library: \(books.count) books
Recent: \(recent.map { $0.summary }.joined(separator: ", "))
"""
}
```
**What NOT to do:**
```swift
// DON'T: Use stale context from app launch
let cachedContext = appLaunchContext // Stale!
// Books may have been added, activity may have changed
```
</context_freshness>
<examples>
## Real-World Example: Every Reader
The Every Reader app injects context for its chat agent:
```swift
func getChatAgentSystemPrompt() -> String {
// Get current library state
let books = BookLibraryService.shared.books
let analyses = BookLibraryService.shared.analysisRecords.prefix(10)
let profile = ReadingProfileService.shared.getProfileForSystemPrompt()
let bookList = books.map { book in
"- \"\(book.title)\" by \(book.author) (id: \(book.id))"
}.joined(separator: "\n")
let recentList = analyses.map { record in
let title = books.first { $0.id == record.bookId }?.title ?? "Unknown"
return "- From \"\(title)\": \"\(record.excerptPreview)\""
}.joined(separator: "\n")
return """
# Reading Assistant
You help the user with their reading and book research.
## Available Books in User's Library
\(bookList.isEmpty ? "No books yet." : bookList)
## Recent Reading Journal (Latest Analyses)
\(recentList.isEmpty ? "No analyses yet." : recentList)
## Reading Profile
\(profile)
## Your Capabilities
- **Publish to Feed**: Create insights using `publish_to_feed` that appear in the Feed tab
- **Library Access**: View books and highlights using `read_library`
- **Research**: Search web and save to Documents/Research/{bookId}/
- **Profile**: Read/update the user's reading profile
When the user asks you to "write something for their feed" or "add to my reading feed",
use the `publish_to_feed` tool with the relevant book_id.
"""
}
```
**Result:** When user says "write a little thing about Catherine the Great in my reading feed", the agent:
1. Sees "reading feed" → knows to use `publish_to_feed`
2. Sees available books → finds the relevant book ID
3. Creates appropriate content for the Feed tab
</examples>
<checklist>
## Context Injection Checklist
Before launching an agent:
- [ ] System prompt includes current resources (books, files, data)
- [ ] Recent activity is visible to the agent
- [ ] Capabilities are mapped to user vocabulary
- [ ] Domain-specific terms are explained
- [ ] Context is fresh (gathered at agent start, not cached)
When adding new features:
- [ ] New resources are included in context injection
- [ ] New capabilities are documented in system prompt
- [ ] User vocabulary for the feature is mapped
</checklist>

View File

@@ -303,9 +303,187 @@ Use your judgment about importance ratings.
```
</example>
<principle name="dynamic-capability-discovery">
## Dynamic Capability Discovery vs Static Tool Mapping
**This pattern is specifically for agent-native apps** where you want the agent to have full access to an external API—the same access a user would have. It follows the core agent-native principle: "Whatever the user can do, the agent can do."
If you're building a constrained agent with limited capabilities, static tool mapping may be intentional. But for agent-native apps integrating with HealthKit, HomeKit, GraphQL, or similar APIs:
**Static Tool Mapping (Anti-pattern for Agent-Native):**
Build individual tools for each API capability. This is always out of date and limits the agent to only what you anticipated.
```typescript
// ❌ Static: Every API type needs a hardcoded tool
tool("read_steps", async ({ startDate, endDate }) => {
return healthKit.query(HKQuantityType.stepCount, startDate, endDate);
});
tool("read_heart_rate", async ({ startDate, endDate }) => {
return healthKit.query(HKQuantityType.heartRate, startDate, endDate);
});
tool("read_sleep", async ({ startDate, endDate }) => {
return healthKit.query(HKCategoryType.sleepAnalysis, startDate, endDate);
});
// When HealthKit adds glucose tracking... you need a code change
```
**Dynamic Capability Discovery (Preferred):**
Build a meta-tool that discovers what's available, and a generic tool that can access anything.
```typescript
// ✅ Dynamic: Agent discovers and uses any capability
// Discovery tool - returns what's available at runtime
tool("list_available_capabilities", async () => {
const quantityTypes = await healthKit.availableQuantityTypes();
const categoryTypes = await healthKit.availableCategoryTypes();
return {
text: `Available health metrics:\n` +
`Quantity types: ${quantityTypes.join(", ")}\n` +
`Category types: ${categoryTypes.join(", ")}\n` +
`\nUse read_health_data with any of these types.`
};
});
// Generic access tool - type is a string, API validates
tool("read_health_data", {
dataType: z.string(), // NOT z.enum - let HealthKit validate
startDate: z.string(),
endDate: z.string(),
aggregation: z.enum(["sum", "average", "samples"]).optional()
}, async ({ dataType, startDate, endDate, aggregation }) => {
// HealthKit validates the type, returns helpful error if invalid
const result = await healthKit.query(dataType, startDate, endDate, aggregation);
return { text: JSON.stringify(result, null, 2) };
});
```
**When to Use Each Approach:**
| Dynamic (Agent-Native) | Static (Constrained Agent) |
|------------------------|---------------------------|
| Agent should access anything user can | Agent has intentionally limited scope |
| External API with many endpoints (HealthKit, HomeKit, GraphQL) | Internal domain with fixed operations |
| API evolves independently of your code | Tightly coupled domain logic |
| You want full action parity | You want strict guardrails |
**The agent-native default is Dynamic.** Only use Static when you're intentionally limiting the agent's capabilities.
**Complete Dynamic Pattern:**
```swift
// 1. Discovery tool: What can I access?
tool("list_health_types", "Get available health data types") { _ in
let store = HKHealthStore()
let quantityTypes = HKQuantityTypeIdentifier.allCases.map { $0.rawValue }
let categoryTypes = HKCategoryTypeIdentifier.allCases.map { $0.rawValue }
let characteristicTypes = HKCharacteristicTypeIdentifier.allCases.map { $0.rawValue }
return ToolResult(text: """
Available HealthKit types:
## Quantity Types (numeric values)
\(quantityTypes.joined(separator: ", "))
## Category Types (categorical data)
\(categoryTypes.joined(separator: ", "))
## Characteristic Types (user info)
\(characteristicTypes.joined(separator: ", "))
Use read_health_data or write_health_data with any of these.
""")
}
// 2. Generic read: Access any type by name
tool("read_health_data", "Read any health metric", {
dataType: z.string().describe("Type name from list_health_types"),
startDate: z.string(),
endDate: z.string()
}) { request in
// Let HealthKit validate the type name
guard let type = HKQuantityTypeIdentifier(rawValue: request.dataType)
?? HKCategoryTypeIdentifier(rawValue: request.dataType) else {
return ToolResult(
text: "Unknown type: \(request.dataType). Use list_health_types to see available types.",
isError: true
)
}
let samples = try await healthStore.querySamples(type: type, start: startDate, end: endDate)
return ToolResult(text: samples.formatted())
}
// 3. Context injection: Tell agent what's available in system prompt
func buildSystemPrompt() -> String {
let availableTypes = healthService.getAuthorizedTypes()
return """
## Available Health Data
You have access to these health metrics:
\(availableTypes.map { "- \($0)" }.joined(separator: "\n"))
Use read_health_data with any type above. For new types not listed,
use list_health_types to discover what's available.
"""
}
```
**Benefits:**
- Agent can use any API capability, including ones added after your code shipped
- API is the validator, not your enum definition
- Smaller tool surface (2-3 tools vs N tools)
- Agent naturally discovers capabilities by asking
- Works with any API that has introspection (HealthKit, GraphQL, OpenAPI)
</principle>
<principle name="crud-completeness">
## CRUD Completeness
Every data type the agent can create, it should be able to read, update, and delete. Incomplete CRUD = broken action parity.
**Anti-pattern: Create-only tools**
```typescript
// ❌ Can create but not modify or delete
tool("create_experiment", { hypothesis, variable, metric })
tool("write_journal_entry", { content, author, tags })
// User: "Delete that experiment" → Agent: "I can't do that"
```
**Correct: Full CRUD for each entity**
```typescript
// ✅ Complete CRUD
tool("create_experiment", { hypothesis, variable, metric })
tool("read_experiment", { id })
tool("update_experiment", { id, updates: { hypothesis?, status?, endDate? } })
tool("delete_experiment", { id })
tool("create_journal_entry", { content, author, tags })
tool("read_journal", { query?, dateRange?, author? })
tool("update_journal_entry", { id, content, tags? })
tool("delete_journal_entry", { id })
```
**The CRUD Audit:**
For each entity type in your app, verify:
- [ ] Create: Agent can create new instances
- [ ] Read: Agent can query/search/list instances
- [ ] Update: Agent can modify existing instances
- [ ] Delete: Agent can remove instances
If any operation is missing, users will eventually ask for it and the agent will fail.
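A minimal sketch of automating this audit, assuming tools follow the `create_/read_/update_/delete_<entity>` naming used in the experiment example above and that a hypothetical `AgentToolRegistry` exposes the registered tool names:
```swift
import XCTest

// Sketch only: AgentToolRegistry is a hypothetical registry exposing the names
// of every registered agent tool; adapt to however your app registers tools.
final class CRUDAuditTests: XCTestCase {
    func testEveryEntityHasFullCRUD() {
        let registeredTools: Set<String> = AgentToolRegistry.shared.toolNames
        let entities = ["experiment"] // Add each entity type in your app
        for entity in entities {
            for operation in ["create", "read", "update", "delete"] {
                XCTAssertTrue(
                    registeredTools.contains("\(operation)_\(entity)"),
                    "Missing \(operation) tool for \(entity): CRUD is incomplete"
                )
            }
        }
    }
}
```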
</principle>
<checklist>
## MCP Tool Design Checklist
**Fundamentals:**
- [ ] Tool names describe capability, not use case
- [ ] Inputs are data, not decisions
- [ ] Outputs are rich (enough for agent to verify)
@@ -313,4 +491,16 @@ Use your judgment about importance ratings.
- [ ] No business logic in tool implementations
- [ ] Error states clearly communicated via `isError`
- [ ] Descriptions explain what the tool does, not when to use it
**Dynamic Capability Discovery (for agent-native apps):**
- [ ] For external APIs where agent should have full access, use dynamic discovery
- [ ] Include a `list_*` or `discover_*` tool for each API surface
- [ ] Use string inputs (not enums) when the API validates
- [ ] Inject available capabilities into system prompt at runtime
- [ ] Only use static tool mapping if intentionally limiting agent scope
**CRUD Completeness:**
- [ ] Every entity has create, read, update, delete operations
- [ ] Every UI action has a corresponding agent tool
- [ ] Test: "Can the agent undo what it just did?"
</checklist>

View File

@@ -0,0 +1,658 @@
<overview>
Mobile agent-native apps face unique challenges: background execution limits, system permissions, network constraints, and cost sensitivity. This guide covers patterns for building robust agent experiences on iOS and Android.
</overview>
<background_execution>
## Background Execution & Resumption
Mobile apps can be suspended or terminated at any time. Agents must handle this gracefully.
### The Challenge
```
User starts research agent
Agent begins web search
User switches to another app
iOS suspends your app
Agent is mid-execution... what happens?
```
### Checkpoint/Resume Pattern
Save agent state before backgrounding, restore on foreground:
```swift
class AgentOrchestrator: ObservableObject {
@Published var activeSessions: [AgentSession] = []
// Called when app is about to background
func handleAppWillBackground() {
for session in activeSessions {
saveCheckpoint(session)
session.transition(to: .backgrounded)
}
}
// Called when app returns to foreground
func handleAppDidForeground() {
for session in activeSessions where session.state == .backgrounded {
if let checkpoint = loadCheckpoint(session.id) {
resumeFromCheckpoint(session, checkpoint)
}
}
}
private func saveCheckpoint(_ session: AgentSession) {
let checkpoint = AgentCheckpoint(
sessionId: session.id,
conversationHistory: session.messages,
pendingToolCalls: session.pendingToolCalls,
partialResults: session.partialResults,
timestamp: Date()
)
storage.save(checkpoint, for: session.id)
}
private func resumeFromCheckpoint(_ session: AgentSession, _ checkpoint: AgentCheckpoint) {
session.messages = checkpoint.conversationHistory
session.pendingToolCalls = checkpoint.pendingToolCalls
// Resume execution if there were pending tool calls
if !checkpoint.pendingToolCalls.isEmpty {
session.transition(to: .running)
Task { await executeNextTool(session) }
}
}
}
```
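`AgentCheckpoint` and `storage` are used above but not defined. A minimal sketch, assuming the app's message and tool-call types are `Codable`, the session id is a `UUID`, and JSON files on disk are an acceptable store:
```swift
// Sketch: checkpoint payload plus a simple JSON-file store.
// Message, ToolCall, and the UUID session id are assumptions about the app's types.
struct AgentCheckpoint: Codable {
    let sessionId: UUID
    let conversationHistory: [Message]
    let pendingToolCalls: [ToolCall]
    let partialResults: [String]
    let timestamp: Date
}

struct CheckpointStorage {
    let directory: URL

    func save(_ checkpoint: AgentCheckpoint, for sessionId: UUID) {
        try? FileManager.default.createDirectory(at: directory, withIntermediateDirectories: true)
        if let data = try? JSONEncoder().encode(checkpoint) {
            try? data.write(to: directory.appendingPathComponent("\(sessionId).json"), options: .atomic)
        }
    }

    func load(for sessionId: UUID) -> AgentCheckpoint? {
        let url = directory.appendingPathComponent("\(sessionId).json")
        guard let data = try? Data(contentsOf: url) else { return nil }
        return try? JSONDecoder().decode(AgentCheckpoint.self, from: data)
    }
}
```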
### State Machine for Agent Lifecycle
```swift
enum AgentState {
case idle // Not running
case running // Actively executing
case waitingForUser // Paused, waiting for user input
case backgrounded // App backgrounded, state saved
case completed // Finished successfully
case failed(Error) // Finished with error
}
class AgentSession: ObservableObject {
@Published var state: AgentState = .idle
func transition(to newState: AgentState) {
let validTransitions: [AgentState: Set<AgentState>] = [
.idle: [.running],
.running: [.waitingForUser, .backgrounded, .completed, .failed],
.waitingForUser: [.running, .backgrounded],
.backgrounded: [.running, .completed],
]
guard validTransitions[state]?.contains(newState) == true else {
        logger.warning("Invalid transition: \(state) → \(newState)")
return
}
state = newState
}
}
```
### Background Task Extension (iOS)
Request extra time when backgrounded during critical operations:
```swift
class AgentOrchestrator {
private var backgroundTask: UIBackgroundTaskIdentifier = .invalid
func handleAppWillBackground() {
// Request extra time for saving state
backgroundTask = UIApplication.shared.beginBackgroundTask { [weak self] in
self?.endBackgroundTask()
}
// Save all checkpoints
Task {
for session in activeSessions {
await saveCheckpoint(session)
}
endBackgroundTask()
}
}
private func endBackgroundTask() {
if backgroundTask != .invalid {
UIApplication.shared.endBackgroundTask(backgroundTask)
backgroundTask = .invalid
}
}
}
```
### User Communication
Let users know what's happening:
```swift
struct AgentStatusView: View {
@ObservedObject var session: AgentSession
var body: some View {
switch session.state {
case .backgrounded:
Label("Paused (app in background)", systemImage: "pause.circle")
.foregroundColor(.orange)
case .running:
Label("Working...", systemImage: "ellipsis.circle")
.foregroundColor(.blue)
case .waitingForUser:
Label("Waiting for your input", systemImage: "person.circle")
.foregroundColor(.green)
// ...
}
}
}
```
</background_execution>
<permissions>
## Permission Handling
Mobile agents may need access to system resources. Handle permission requests gracefully.
### Common Permissions
| Resource | iOS Permission | Use Case |
|----------|---------------|----------|
| Photo Library | PHPhotoLibrary | Profile generation from photos |
| Files | Document picker | Reading user documents |
| Camera | AVCaptureDevice | Scanning book covers |
| Location | CLLocationManager | Location-aware recommendations |
| Network | (automatic) | Web search, API calls |
### Permission-Aware Tools
Check permissions before executing:
```swift
struct PhotoTools {
static func readPhotos() -> AgentTool {
tool(
name: "read_photos",
description: "Read photos from the user's photo library",
parameters: [
"limit": .number("Maximum photos to read"),
"dateRange": .string("Date range filter").optional()
],
execute: { params, context in
// Check permission first
let status = await PHPhotoLibrary.requestAuthorization(for: .readWrite)
switch status {
case .authorized, .limited:
// Proceed with reading photos
let photos = await fetchPhotos(params)
return ToolResult(text: "Found \(photos.count) photos", images: photos)
case .denied, .restricted:
return ToolResult(
text: "Photo access needed. Please grant permission in Settings → Privacy → Photos.",
isError: true
)
case .notDetermined:
return ToolResult(
text: "Photo permission required. Please try again.",
isError: true
)
@unknown default:
return ToolResult(text: "Unknown permission status", isError: true)
}
}
)
}
}
```
### Graceful Degradation
When permissions aren't granted, offer alternatives:
```swift
func readPhotos() async -> ToolResult {
let status = PHPhotoLibrary.authorizationStatus(for: .readWrite)
switch status {
case .denied, .restricted:
// Suggest alternative
return ToolResult(
text: """
I don't have access to your photos. You can either:
1. Grant access in Settings → Privacy → Photos
2. Share specific photos directly in our chat
Would you like me to help with something else instead?
""",
isError: false // Not a hard error, just a limitation
)
// ...
}
}
```
### Permission Request Timing
Don't request permissions until needed:
```swift
// BAD: Request all permissions at launch
func applicationDidFinishLaunching() {
requestPhotoAccess()
requestCameraAccess()
requestLocationAccess()
// User is overwhelmed with permission dialogs
}
// GOOD: Request when the feature is used
tool("analyze_book_cover", async ({ image }) => {
// Only request camera access when user tries to scan a cover
let status = await AVCaptureDevice.requestAccess(for: .video)
if status {
return await scanCover(image)
} else {
return ToolResult(text: "Camera access needed for book scanning")
}
})
```
</permissions>
<cost_awareness>
## Cost-Aware Design
Mobile users may be on cellular data or concerned about API costs. Design agents to be efficient.
### Model Tier Selection
Use the cheapest model that achieves the outcome:
```swift
enum ModelTier {
case fast // claude-3-haiku: ~$0.25/1M tokens
case balanced // claude-3-sonnet: ~$3/1M tokens
case powerful // claude-3-opus: ~$15/1M tokens
var modelId: String {
switch self {
case .fast: return "claude-3-haiku-20240307"
case .balanced: return "claude-3-sonnet-20240229"
case .powerful: return "claude-3-opus-20240229"
}
}
}
// Match model to task complexity
let agentConfigs: [AgentType: ModelTier] = [
.quickLookup: .fast, // "What's in my library?"
.chatAssistant: .balanced, // General conversation
.researchAgent: .balanced, // Web search + synthesis
.profileGenerator: .powerful, // Complex photo analysis
.introductionWriter: .balanced,
]
```
### Token Budgets
Limit tokens per agent session:
```swift
struct AgentConfig {
let modelTier: ModelTier
let maxInputTokens: Int
let maxOutputTokens: Int
let maxTurns: Int
static let research = AgentConfig(
modelTier: .balanced,
maxInputTokens: 50_000,
maxOutputTokens: 4_000,
maxTurns: 20
)
static let quickChat = AgentConfig(
modelTier: .fast,
maxInputTokens: 10_000,
maxOutputTokens: 1_000,
maxTurns: 5
)
}
class AgentSession {
var totalTokensUsed: Int = 0
func checkBudget() -> Bool {
if totalTokensUsed > config.maxInputTokens {
transition(to: .failed(AgentError.budgetExceeded))
return false
}
return true
}
}
```
### Network-Aware Execution
Defer heavy operations to WiFi:
```swift
class NetworkMonitor: ObservableObject {
    @Published var isConnected: Bool = true // Any network at all (used by offline handling below)
    @Published var isOnWiFi: Bool = false
    @Published var isExpensive: Bool = false // Cellular or hotspot
    private let monitor = NWPathMonitor()
    func startMonitoring() {
        monitor.pathUpdateHandler = { [weak self] path in
            DispatchQueue.main.async {
                self?.isConnected = path.status == .satisfied
                self?.isOnWiFi = path.usesInterfaceType(.wifi)
                self?.isExpensive = path.isExpensive
            }
        }
        monitor.start(queue: .global())
    }
}
class AgentOrchestrator {
@ObservedObject var network = NetworkMonitor()
func startResearchAgent(for book: Book) async {
if network.isExpensive {
// Warn user or defer
let proceed = await showAlert(
"Research uses data",
message: "This will use approximately 1-2 MB of cellular data. Continue?"
)
if !proceed { return }
}
// Proceed with research
await runAgent(ResearchAgent.create(book: book))
}
}
```
### Batch API Calls
Combine multiple small requests:
```swift
// BAD: Many small API calls
for book in books {
await agent.chat("Summarize \(book.title)")
}
// GOOD: Batch into one request
let bookList = books.map { $0.title }.joined(separator: ", ")
await agent.chat("Summarize each of these books briefly: \(bookList)")
```
### Caching
Cache expensive operations:
```swift
class ResearchCache {
private var cache: [String: CachedResearch] = [:]
func getCachedResearch(for bookId: String) -> CachedResearch? {
guard let cached = cache[bookId] else { return nil }
// Expire after 24 hours
if Date().timeIntervalSince(cached.timestamp) > 86400 {
cache.removeValue(forKey: bookId)
return nil
}
return cached
}
func cacheResearch(_ research: Research, for bookId: String) {
cache[bookId] = CachedResearch(
research: research,
timestamp: Date()
)
}
}
// In research tool
tool("web_search", async ({ query, bookId }) => {
// Check cache first
if let cached = cache.getCachedResearch(for: bookId) {
return ToolResult(text: cached.research.summary, cached: true)
}
// Otherwise, perform search
let results = await webSearch(query)
cache.cacheResearch(results, for: bookId)
return ToolResult(text: results.summary)
})
```
### Cost Visibility
Show users what they're spending:
```swift
struct AgentCostView: View {
@ObservedObject var session: AgentSession
var body: some View {
VStack(alignment: .leading) {
Text("Session Stats")
.font(.headline)
HStack {
Label("\(session.turnCount) turns", systemImage: "arrow.2.squarepath")
Spacer()
Label(formatTokens(session.totalTokensUsed), systemImage: "text.word.spacing")
}
if let estimatedCost = session.estimatedCost {
Text("Est. cost: \(estimatedCost, format: .currency(code: "USD"))")
.font(.caption)
.foregroundColor(.secondary)
}
}
}
}
```
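`session.estimatedCost` is displayed above but never derived. A minimal sketch of computing it from token usage, assuming the session exposes its `AgentConfig` as `config` (as `checkBudget()` implies) and using the rough per-million-token prices from the `ModelTier` comments, which are illustrative rather than current pricing:
```swift
extension AgentSession {
    var estimatedCost: Double? {
        guard totalTokensUsed > 0 else { return nil }
        // Illustrative prices per million tokens, mirroring the ModelTier comments above
        let pricePerMillionTokens: Double
        switch config.modelTier {
        case .fast: pricePerMillionTokens = 0.25
        case .balanced: pricePerMillionTokens = 3.0
        case .powerful: pricePerMillionTokens = 15.0
        }
        // Simplification: input and output tokens are priced differently in reality
        return Double(totalTokensUsed) / 1_000_000 * pricePerMillionTokens
    }
}
```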
</cost_awareness>
<offline_handling>
## Offline Graceful Degradation
Handle offline scenarios gracefully:
```swift
class ConnectivityAwareAgent {
@ObservedObject var network = NetworkMonitor()
func executeToolCall(_ toolCall: ToolCall) async -> ToolResult {
// Check if tool requires network
let requiresNetwork = ["web_search", "web_fetch", "call_api"]
.contains(toolCall.name)
if requiresNetwork && !network.isConnected {
return ToolResult(
text: """
I can't access the internet right now. Here's what I can do offline:
- Read your library and existing research
- Answer questions from cached data
- Write notes and drafts for later
Would you like me to try something that works offline?
""",
isError: false
)
}
return await executeOnline(toolCall)
}
}
```
### Offline-First Tools
Some tools should work entirely offline:
```swift
let offlineTools: Set<String> = [
"read_file",
"write_file",
"list_files",
"read_library", // Local database
"search_local", // Local search
]
let onlineTools: Set<String> = [
"web_search",
"web_fetch",
"publish_to_cloud",
]
let hybridTools: Set<String> = [
"publish_to_feed", // Works offline, syncs later
]
```
### Queued Actions
Queue actions that require connectivity:
```swift
class OfflineQueue: ObservableObject {
    @Published var pendingActions: [QueuedAction] = []
    private let network: NetworkMonitor
    private var cancellables = Set<AnyCancellable>()
    init(network: NetworkMonitor) {
        self.network = network
    }
    func queue(_ action: QueuedAction) {
        pendingActions.append(action)
        persist()
    }
    func processWhenOnline() {
        network.$isConnected
            .filter { $0 }
            .sink { [weak self] _ in
                self?.processPendingActions()
            }
            .store(in: &cancellables) // Keep the subscription alive
    }
    private func processPendingActions() {
        for action in pendingActions {
            Task {
                try await execute(action)
                remove(action)
            }
        }
    }
}
```
</offline_handling>
<battery_awareness>
## Battery-Aware Execution
Respect device battery state:
```swift
class BatteryMonitor: ObservableObject {
    @Published var batteryLevel: Float = 1.0
    @Published var isCharging: Bool = false
    @Published var isLowPowerMode: Bool = false
    var shouldDeferHeavyWork: Bool {
        return batteryLevel < 0.2 && !isCharging
    }
    func startMonitoring() {
        UIDevice.current.isBatteryMonitoringEnabled = true
        batteryLevel = UIDevice.current.batteryLevel
        isCharging = UIDevice.current.batteryState == .charging || UIDevice.current.batteryState == .full
        isLowPowerMode = ProcessInfo.processInfo.isLowPowerModeEnabled
        NotificationCenter.default.addObserver(
            forName: UIDevice.batteryLevelDidChangeNotification,
            object: nil,
            queue: .main
        ) { [weak self] _ in
            self?.batteryLevel = UIDevice.current.batteryLevel
        }
        NotificationCenter.default.addObserver(
            forName: UIDevice.batteryStateDidChangeNotification,
            object: nil,
            queue: .main
        ) { [weak self] _ in
            let state = UIDevice.current.batteryState
            self?.isCharging = state == .charging || state == .full
        }
        NotificationCenter.default.addObserver(
            forName: NSNotification.Name.NSProcessInfoPowerStateDidChange,
            object: nil,
            queue: .main
        ) { [weak self] _ in
            self?.isLowPowerMode = ProcessInfo.processInfo.isLowPowerModeEnabled
        }
    }
}
class AgentOrchestrator {
@ObservedObject var battery = BatteryMonitor()
func startAgent(_ config: AgentConfig) async {
if battery.shouldDeferHeavyWork && config.isHeavy {
let proceed = await showAlert(
"Low Battery",
message: "This task uses significant battery. Continue or defer until charging?"
)
if !proceed { return }
}
// Adjust model tier based on battery
let adjustedConfig = battery.isLowPowerMode
? config.withModelTier(.fast)
: config
await runAgent(adjustedConfig)
}
}
```
</battery_awareness>
<checklist>
## Mobile Agent-Native Checklist
**Background Execution:**
- [ ] Checkpoint/resume implemented for all agent sessions
- [ ] State machine for agent lifecycle (idle, running, backgrounded, etc.)
- [ ] Background task extension for critical saves
- [ ] User-visible status for backgrounded agents
**Permissions:**
- [ ] Permissions requested only when needed, not at launch
- [ ] Graceful degradation when permissions denied
- [ ] Clear error messages with Settings deep links
- [ ] Alternative paths when permissions unavailable
**Cost Awareness:**
- [ ] Model tier matched to task complexity
- [ ] Token budgets per session
- [ ] Network-aware (defer heavy work to WiFi)
- [ ] Caching for expensive operations
- [ ] Cost visibility to users
**Offline Handling:**
- [ ] Offline-capable tools identified
- [ ] Graceful degradation for online-only features
- [ ] Action queue for sync when online
- [ ] Clear user communication about offline state
**Battery Awareness:**
- [ ] Battery monitoring for heavy operations
- [ ] Low power mode detection
- [ ] Defer or downgrade based on battery state
</checklist>

View File

@@ -0,0 +1,680 @@
<overview>
Agents and users should work in the same data space, not separate sandboxes. When the agent writes a file, the user can see it. When the user edits something, the agent can read the changes. This creates transparency, enables collaboration, and eliminates the need for sync layers.
**Core principle:** The agent operates in the same filesystem as the user, not a walled garden.
</overview>
<why_shared_workspace>
## Why Shared Workspace?
### The Sandbox Anti-Pattern
Many agent implementations isolate the agent:
```
┌─────────────────┐ ┌─────────────────┐
│ User Space │ │ Agent Space │
├─────────────────┤ ├─────────────────┤
│ Documents/ │ │ agent_output/ │
│ user_files/ │ ←→ │ temp_files/ │
│ settings.json │sync │ cache/ │
└─────────────────┘ └─────────────────┘
```
Problems:
- Need a sync layer to move data between spaces
- User can't easily inspect agent work
- Agent can't build on user contributions
- Duplication of state
- Complexity in keeping spaces consistent
### The Shared Workspace Pattern
```
┌─────────────────────────────────────────┐
│ Shared Workspace │
├─────────────────────────────────────────┤
│ Documents/ │
│ ├── Research/ │
│ │ └── {bookId}/ ← Agent writes │
│ │ ├── full_text.txt │
│ │ ├── introduction.md ← User can edit │
│ │ └── sources/ │
│ ├── Chats/ ← Both read/write │
│ └── profile.md ← Agent generates, user refines │
└─────────────────────────────────────────┘
↑ ↑
User Agent
(UI) (Tools)
```
Benefits:
- Users can inspect, edit, and extend agent work
- Agents can build on user contributions
- No synchronization layer needed
- Complete transparency
- Single source of truth
</why_shared_workspace>
<directory_structure>
## Designing Your Shared Workspace
### Structure by Domain
Organize by what the data represents, not who created it:
```
Documents/
├── Research/
│ └── {bookId}/
│ ├── full_text.txt # Agent downloads
│ ├── introduction.md # Agent generates, user can edit
│ ├── notes.md # User adds, agent can read
│ └── sources/
│ └── {source}.md # Agent gathers
├── Chats/
│ └── {conversationId}.json # Both read/write
├── Exports/
│ └── {date}/ # Agent generates for user
└── profile.md # Agent generates from photos
```
### Don't Structure by Actor
```
# BAD - Separates by who created it
Documents/
├── user_created/
│ └── notes.md
├── agent_created/
│ └── research.md
└── system/
└── config.json
```
This creates artificial boundaries and makes collaboration harder.
### Use Conventions for Metadata
If you need to track who created/modified something:
```markdown
<!-- introduction.md -->
---
created_by: agent
created_at: 2024-01-15
last_modified_by: user
last_modified_at: 2024-01-16
---
# Introduction to Moby Dick
This personalized introduction was generated by your reading assistant
and refined by you on January 16th.
```
</directory_structure>
<file_tools>
## File Tools for Shared Workspace
Give the agent the same file primitives the app uses:
```swift
// iOS/Swift implementation
struct FileTools {
static func readFile() -> AgentTool {
tool(
name: "read_file",
description: "Read a file from the user's documents",
parameters: ["path": .string("File path relative to Documents/")],
execute: { params in
let path = params["path"] as! String
let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
let fileURL = documentsURL.appendingPathComponent(path)
let content = try String(contentsOf: fileURL)
return ToolResult(text: content)
}
)
}
static func writeFile() -> AgentTool {
tool(
name: "write_file",
description: "Write a file to the user's documents",
parameters: [
"path": .string("File path relative to Documents/"),
"content": .string("File content")
],
execute: { params in
let path = params["path"] as! String
let content = params["content"] as! String
let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
let fileURL = documentsURL.appendingPathComponent(path)
// Create parent directories if needed
try FileManager.default.createDirectory(
at: fileURL.deletingLastPathComponent(),
withIntermediateDirectories: true
)
try content.write(to: fileURL, atomically: true, encoding: .utf8)
return ToolResult(text: "Wrote \(path)")
}
)
}
static func listFiles() -> AgentTool {
tool(
name: "list_files",
description: "List files in a directory",
parameters: ["path": .string("Directory path relative to Documents/")],
execute: { params in
let path = params["path"] as! String
let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
let dirURL = documentsURL.appendingPathComponent(path)
let contents = try FileManager.default.contentsOfDirectory(atPath: dirURL.path)
return ToolResult(text: contents.joined(separator: "\n"))
}
)
}
    static func searchText() -> AgentTool {
        tool(
            name: "search_text",
            description: "Search for text across files",
            parameters: [
                "query": .string("Text to search for"),
                "path": .string("Directory to search in").optional()
            ],
            execute: { params in
                let query = params["query"] as! String
                let subpath = params["path"] as? String ?? ""
                let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
                let searchRoot = documentsURL.appendingPathComponent(subpath)
                // Simple scan: report the files whose contents contain the query
                var matches: [String] = []
                let enumerator = FileManager.default.enumerator(at: searchRoot, includingPropertiesForKeys: nil)
                while let fileURL = enumerator?.nextObject() as? URL {
                    if let content = try? String(contentsOf: fileURL), content.contains(query) {
                        matches.append(fileURL.lastPathComponent)
                    }
                }
                return ToolResult(text: matches.isEmpty ? "No matches for \"\(query)\"" : matches.joined(separator: "\n"))
            }
        )
    }
}
```
### TypeScript/Node.js Implementation
```typescript
const fileTools = [
tool(
"read_file",
"Read a file from the workspace",
{ path: z.string().describe("File path") },
async ({ path }) => {
const content = await fs.readFile(path, 'utf-8');
return { text: content };
}
),
tool(
"write_file",
"Write a file to the workspace",
{
path: z.string().describe("File path"),
content: z.string().describe("File content")
},
async ({ path, content }) => {
await fs.mkdir(dirname(path), { recursive: true });
await fs.writeFile(path, content, 'utf-8');
return { text: `Wrote ${path}` };
}
),
tool(
"list_files",
"List files in a directory",
{ path: z.string().describe("Directory path") },
async ({ path }) => {
const files = await fs.readdir(path);
return { text: files.join('\n') };
}
),
tool(
"append_file",
"Append content to a file",
{
path: z.string().describe("File path"),
content: z.string().describe("Content to append")
},
async ({ path, content }) => {
await fs.appendFile(path, content, 'utf-8');
return { text: `Appended to ${path}` };
}
),
];
```
</file_tools>
<ui_integration>
## UI Integration with Shared Workspace
The UI should observe the same files the agent writes to:
### Pattern 1: File-Based Reactivity (iOS)
```swift
class ResearchViewModel: ObservableObject {
@Published var researchFiles: [ResearchFile] = []
private var watcher: DirectoryWatcher?
func startWatching(bookId: String) {
let researchPath = documentsURL
.appendingPathComponent("Research")
.appendingPathComponent(bookId)
watcher = DirectoryWatcher(url: researchPath) { [weak self] in
// Reload when agent writes new files
self?.loadResearchFiles(from: researchPath)
}
loadResearchFiles(from: researchPath)
}
}
// SwiftUI automatically updates when files change
struct ResearchView: View {
@StateObject var viewModel = ResearchViewModel()
var body: some View {
List(viewModel.researchFiles) { file in
ResearchFileRow(file: file)
}
}
}
```
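`DirectoryWatcher` is not a system API. A minimal sketch built on `DispatchSource`, assuming the UI only needs a coarse "something changed in this directory" signal:
```swift
// Sketch: fire a callback whenever the directory's contents change.
final class DirectoryWatcher {
    private let descriptor: CInt
    private let source: DispatchSourceFileSystemObject

    init?(url: URL, onChange: @escaping () -> Void) {
        descriptor = open(url.path, O_EVTONLY)
        guard descriptor >= 0 else { return nil }
        source = DispatchSource.makeFileSystemObjectSource(
            fileDescriptor: descriptor,
            eventMask: .write, // Directory writes = files added, removed, or renamed
            queue: .main
        )
        let fd = descriptor
        source.setEventHandler(handler: onChange)
        source.setCancelHandler { close(fd) }
        source.resume()
    }

    deinit {
        source.cancel()
    }
}
```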
### Pattern 2: Shared Data Store
When file-watching isn't practical, use a shared data store:
```swift
// Shared service that both UI and agent tools use
class BookLibraryService: ObservableObject {
static let shared = BookLibraryService()
@Published var books: [Book] = []
@Published var analysisRecords: [AnalysisRecord] = []
func addAnalysisRecord(_ record: AnalysisRecord) {
analysisRecords.append(record)
// Persists to shared storage
saveToStorage()
}
}
// Agent tool writes through the same service
tool("publish_to_feed", async ({ bookId, content, headline }) => {
let record = AnalysisRecord(bookId: bookId, content: content, headline: headline)
BookLibraryService.shared.addAnalysisRecord(record)
return { text: "Published to feed" }
})
// UI observes the same service
struct FeedView: View {
@StateObject var library = BookLibraryService.shared
var body: some View {
List(library.analysisRecords) { record in
FeedItemRow(record: record)
}
}
}
```
### Pattern 3: Hybrid (Files + Index)
Use files for content, database for indexing:
```
Documents/
├── Research/
│ └── book_123/
│ └── introduction.md # Actual content (file)
Database:
├── research_index
│ └── { bookId: "book_123", path: "Research/book_123/introduction.md", ... }
```
```swift
// Agent writes file
await writeFile("Research/\(bookId)/introduction.md", content)
// And updates index
await database.insert("research_index", {
bookId: bookId,
path: "Research/\(bookId)/introduction.md",
title: extractTitle(content),
createdAt: Date()
})
// UI queries index, then reads files
let items = database.query("research_index", where: bookId == "book_123")
for item in items {
let content = readFile(item.path)
// Display...
}
```
</ui_integration>
<collaboration_patterns>
## Agent-User Collaboration Patterns
### Pattern: Agent Drafts, User Refines
```
1. Agent generates introduction.md
2. User opens in Files app or in-app editor
3. User makes refinements
4. Agent can see changes via read_file
5. Future agent work builds on user refinements
```
The agent's system prompt should acknowledge this:
```markdown
## Working with User Content
When you create content (introductions, research notes, etc.), the user may
edit it afterward. Always read existing files before modifying them—the user
may have made improvements you should preserve.
If a file exists and has been modified by the user (check the metadata or
compare to your last known version), ask before overwriting.
```
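A sketch of enforcing this in the write path, assuming the app keeps its own record of when the agent last wrote each file (the `lastAgentWrite(for:)` and `recordAgentWrite(for:)` helpers are hypothetical):
```swift
// Sketch: refuse to silently overwrite a file the user has edited since the
// agent's last write. lastAgentWrite(for:) is a hypothetical lookup into the
// agent's own write log.
func safeWrite(_ content: String, to url: URL) throws {
    let fm = FileManager.default
    if fm.fileExists(atPath: url.path),
       let attrs = try? fm.attributesOfItem(atPath: url.path),
       let modified = attrs[.modificationDate] as? Date,
       let lastWrite = lastAgentWrite(for: url),
       modified > lastWrite {
        throw ToolError("File changed since the last agent write. Read it and confirm before overwriting.")
    }
    try content.write(to: url, atomically: true, encoding: .utf8)
    recordAgentWrite(for: url) // Hypothetical: remember when the agent last wrote this file
}
```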
### Pattern: User Seeds, Agent Expands
```
1. User creates notes.md with initial thoughts
2. User asks: "Research more about this"
3. Agent reads notes.md to understand context
4. Agent adds to notes.md or creates related files
5. User continues building on agent additions
```
### Pattern: Append-Only Collaboration
For chat logs or activity streams:
```markdown
<!-- activity.md - Both append, neither overwrites -->
## 2024-01-15
**User:** Started reading "Moby Dick"
**Agent:** Downloaded full text and created research folder
**User:** Added highlight about whale symbolism
**Agent:** Found 3 academic sources on whale symbolism in Melville's work
```
</collaboration_patterns>
<security_considerations>
## Security in Shared Workspace
### Scope the Workspace
Don't give agents access to the entire filesystem:
```swift
// GOOD: Scoped to app's documents
let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
tool("read_file", { path }) {
    // Resolve "../" segments before checking that the path stays inside Documents
    let fileURL = documentsURL.appendingPathComponent(path).standardizedFileURL
    guard fileURL.path.hasPrefix(documentsURL.path) else {
throw ToolError("Invalid path")
}
return try String(contentsOf: fileURL)
}
// BAD: Absolute paths allow escape
tool("read_file", { path }) {
return try String(contentsOf: URL(fileURLWithPath: path)) // Can read /etc/passwd!
}
```
### Protect Sensitive Files
```swift
let protectedPaths = [".env", "credentials.json", "secrets/"]
tool("read_file", { path }) {
    if protectedPaths.contains(where: { path.contains($0) }) {
throw ToolError("Cannot access protected file")
}
// ...
}
```
### Audit Agent Actions
Log what the agent reads/writes:
```swift
func logFileAccess(action: String, path: String, agentId: String) {
logger.info("[\(agentId)] \(action): \(path)")
}
tool("write_file", { path, content }) {
logFileAccess(action: "WRITE", path: path, agentId: context.agentId)
// ...
}
```
</security_considerations>
<examples>
## Real-World Example: Every Reader
The Every Reader app uses shared workspace for research:
```
Documents/
├── Research/
│ └── book_moby_dick/
│ ├── full_text.txt # Agent downloads from Gutenberg
│ ├── introduction.md # Agent generates, personalized
│ ├── sources/
│ │ ├── whale_symbolism.md # Agent researches
│ │ └── melville_bio.md # Agent researches
│ └── user_notes.md # User can add their own notes
├── Chats/
│ └── 2024-01-15.json # Chat history
└── profile.md # Agent generated from photos
```
**How it works:**
1. User adds "Moby Dick" to library
2. User starts research agent
3. Agent downloads full text to `Research/book_moby_dick/full_text.txt`
4. Agent researches and writes to `sources/`
5. Agent generates `introduction.md` based on user's reading profile
6. User can view all files in the app or Files.app
7. User can edit `introduction.md` to refine it
8. Chat agent can read all of this context when answering questions
</examples>
<icloud_sync>
## iCloud File Storage for Multi-Device Sync (iOS)
For agent-native iOS apps, use iCloud Drive's Documents folder for your shared workspace. This gives you **free, automatic multi-device sync** without building a sync layer or running a server.
### Why iCloud Documents?
| Approach | Cost | Complexity | Offline | Multi-Device |
|----------|------|------------|---------|--------------|
| Custom backend + sync | $$$ | High | Manual | Yes |
| CloudKit database | Free tier limits | Medium | Manual | Yes |
| **iCloud Documents** | Free (user's storage) | Low | Automatic | Automatic |
iCloud Documents:
- Uses user's existing iCloud storage (free 5GB, most users have more)
- Automatic sync across all user's devices
- Works offline, syncs when online
- Files visible in Files.app for transparency
- No server costs, no sync code to maintain
### Implementation Pattern
```swift
// Get the iCloud Documents container
func iCloudDocumentsURL() -> URL? {
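    // Note: this lookup can block while the iCloud container is set up; Apple recommends calling it off the main thread.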
FileManager.default.url(forUbiquityContainerIdentifier: nil)?
.appendingPathComponent("Documents")
}
// Your shared workspace lives in iCloud
class SharedWorkspace {
let rootURL: URL
init() {
// Use iCloud if available, fall back to local
if let iCloudURL = iCloudDocumentsURL() {
self.rootURL = iCloudURL
} else {
// Fallback to local Documents (user not signed into iCloud)
self.rootURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!
}
}
// All file operations go through this root
func researchPath(for bookId: String) -> URL {
rootURL.appendingPathComponent("Research/\(bookId)")
}
func journalPath() -> URL {
rootURL.appendingPathComponent("Journal")
}
}
```
### Directory Structure in iCloud
```
iCloud Drive/
└── YourApp/ # Your app's container
└── Documents/ # Visible in Files.app
├── Journal/
│ ├── user/
│ │ └── 2025-01-15.md # Syncs across devices
│ └── agent/
│ └── 2025-01-15.md # Agent observations sync too
├── Experiments/
│ └── magnesium-sleep/
│ ├── config.json
│ └── log.json
└── Research/
└── {topic}/
└── sources.md
```
### Handling Sync Conflicts
iCloud detects conflicts and picks a current version automatically, but your app should still design for them:
```swift
// Check for conflicts when reading
func readJournalEntry(at url: URL) throws -> JournalEntry {
// iCloud may create .icloud placeholder files for not-yet-downloaded content
if url.pathExtension == "icloud" {
// Trigger download
try FileManager.default.startDownloadingUbiquitousItem(at: url)
throw FileNotYetAvailableError()
}
let data = try Data(contentsOf: url)
return try JSONDecoder().decode(JournalEntry.self, from: data)
}
// For writes, use coordinated file access
func writeJournalEntry(_ entry: JournalEntry, to url: URL) throws {
let coordinator = NSFileCoordinator()
var error: NSError?
coordinator.coordinate(writingItemAt: url, options: .forReplacing, error: &error) { newURL in
let data = try? JSONEncoder().encode(entry)
try? data?.write(to: newURL)
}
if let error = error {
throw error
}
}
```
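When a conflict does occur, iCloud keeps the losing copies as `NSFileVersion` conflict versions. A minimal sketch that keeps the current version and discards the rest (one reasonable policy; merging or asking the user are others):
```swift
// Sketch: resolve a conflict by keeping the current version of the file.
func resolveConflicts(at url: URL) throws {
    guard let conflicts = NSFileVersion.unresolvedConflictVersionsOfItem(at: url),
          !conflicts.isEmpty else { return }
    for version in conflicts {
        version.isResolved = true // Mark the competing version as handled
    }
    // Remove the now-resolved versions; wrap in NSFileCoordinator in production code
    try NSFileVersion.removeOtherVersionsOfItem(at: url)
}
```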
### What This Enables
1. **User starts experiment on iPhone** → Agent creates `Experiments/sleep-tracking/config.json`
2. **User opens app on iPad** → Same experiment visible, no sync code needed
3. **Agent logs observation on iPhone** → Syncs to iPad automatically
4. **User edits journal on iPad** → iPhone sees the edit
### Entitlements Required
Add to your app's entitlements:
```xml
<key>com.apple.developer.icloud-container-identifiers</key>
<array>
<string>iCloud.com.yourcompany.yourapp</string>
</array>
<key>com.apple.developer.icloud-services</key>
<array>
<string>CloudDocuments</string>
</array>
<key>com.apple.developer.ubiquity-container-identifiers</key>
<array>
<string>iCloud.com.yourcompany.yourapp</string>
</array>
```
### When NOT to Use iCloud Documents
- **Sensitive data** - Use Keychain or encrypted local storage instead
- **High-frequency writes** - iCloud sync has latency; use local + periodic sync
- **Large media files** - Consider CloudKit Assets or on-demand resources
- **Shared between users** - iCloud Documents is single-user; use CloudKit for sharing
</icloud_sync>
<checklist>
## Shared Workspace Checklist
Architecture:
- [ ] Single shared directory for agent and user data
- [ ] Organized by domain, not by actor
- [ ] File tools scoped to workspace (no escape)
- [ ] Protected paths for sensitive files
Tools:
- [ ] `read_file` - Read any file in workspace
- [ ] `write_file` - Write any file in workspace
- [ ] `list_files` - Browse directory structure
- [ ] `search_text` - Find content across files (optional)
UI Integration:
- [ ] UI observes same files agent writes
- [ ] Changes reflect immediately (file watching or shared store)
- [ ] User can edit agent-created files
- [ ] Agent reads user modifications before overwriting
Collaboration:
- [ ] System prompt acknowledges user may edit files
- [ ] Agent checks for user modifications before overwriting
- [ ] Metadata tracks who created/modified (optional)
Multi-Device (iOS):
- [ ] Use iCloud Documents for shared workspace (free sync)
- [ ] Fallback to local Documents if iCloud unavailable
- [ ] Handle `.icloud` placeholder files (trigger download)
- [ ] Use NSFileCoordinator for conflict-safe writes
</checklist>