[2.23.0] Major update to agent-native-architecture skill (#70)
Align skill with canonical Agent-Native Architecture document:

## Core Changes
- Restructure SKILL.md with 5 named principles from canonical:
  - Parity: Agent can do whatever user can do
  - Granularity: Prefer atomic primitives
  - Composability: Features are prompts
  - Emergent Capability: Handle unanticipated requests
  - Improvement Over Time: Context accumulation
- Add "The test" for each principle
- Add "Why Now" section (Claude Code origin story)
- Update terminology from "prompt-native" to "agent-native"
- Add "The Ultimate Test" to success criteria

## New Reference Files
- files-universal-interface.md: Why files, organization patterns, context.md pattern, conflict model
- from-primitives-to-domain-tools.md: When to add domain tools, graduating to code
- agent-execution-patterns.md: Completion signals, partial completion, context limits
- product-implications.md: Progressive disclosure, latent demand discovery, approval matrix

## Updated Reference Files
- mobile-patterns.md: Add iOS storage architecture (iCloud-first), "needs validation" callouts, on-device vs cloud section
- architecture-patterns.md: Update overview to reference 5 principles and cross-link new files

## Anti-Patterns
- Add missing anti-patterns: agent as router, build-then-add-agent, request/response thinking, defensive tool design, happy path in code

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
@@ -0,0 +1,467 @@
|
||||
<overview>
|
||||
Agent execution patterns for building robust agent loops. This covers how agents signal completion, track partial progress for resume, select appropriate model tiers, and handle context limits.
|
||||
</overview>
|
||||
|
||||
<completion_signals>
|
||||
## Completion Signals
|
||||
|
||||
Agents need an explicit way to say "I'm done."
|
||||
|
||||
### Anti-Pattern: Heuristic Detection
|
||||
|
||||
Detecting completion through heuristics is fragile:
|
||||
|
||||
- Consecutive iterations without tool calls
|
||||
- Checking for expected output files
|
||||
- Tracking "no progress" states
|
||||
- Time-based timeouts
|
||||
|
||||
These break in edge cases and create unpredictable behavior.
|
||||
|
||||
### Pattern: Explicit Completion Tool
|
||||
|
||||
Provide a `complete_task` tool that:
|
||||
- Takes a summary of what was accomplished
|
||||
- Returns a signal that stops the loop
|
||||
- Works identically across all agent types
|
||||
|
||||
```typescript
tool("complete_task", {
  summary: z.string().describe("Summary of what was accomplished"),
  status: z.enum(["success", "partial", "blocked"]).optional(),
}, async ({ summary, status = "success" }) => {
  return {
    // Include the status so "blocked" or "partial" is distinguishable from "success"
    text: `[${status}] ${summary}`,
    shouldContinue: false, // Key: signals loop should stop
  };
});
```
|
||||
|
||||
### The ToolResult Pattern
|
||||
|
||||
Structure tool results to separate success from continuation:
|
||||
|
||||
```swift
|
||||
struct ToolResult {
|
||||
let success: Bool // Did tool succeed?
|
||||
let output: String // What happened?
|
||||
let shouldContinue: Bool // Should agent loop continue?
|
||||
}
|
||||
|
||||
// Three common cases:
|
||||
extension ToolResult {
|
||||
static func success(_ output: String) -> ToolResult {
|
||||
// Tool succeeded, keep going
|
||||
ToolResult(success: true, output: output, shouldContinue: true)
|
||||
}
|
||||
|
||||
static func error(_ message: String) -> ToolResult {
|
||||
// Tool failed but recoverable, agent can try something else
|
||||
ToolResult(success: false, output: message, shouldContinue: true)
|
||||
}
|
||||
|
||||
static func complete(_ summary: String) -> ToolResult {
|
||||
// Task done, stop the loop
|
||||
ToolResult(success: true, output: summary, shouldContinue: false)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Key Insight
|
||||
|
||||
**This is different from success/failure:**
|
||||
|
||||
- A tool can **succeed** AND signal **stop** (task complete)
|
||||
- A tool can **fail** AND signal **continue** (recoverable error, try something else)
|
||||
|
||||
```typescript
|
||||
// Examples:
|
||||
read_file("/missing.txt")
|
||||
// → { success: false, output: "File not found", shouldContinue: true }
|
||||
// Agent can try a different file or ask for clarification
|
||||
|
||||
complete_task("Organized all downloads into folders")
|
||||
// → { success: true, output: "...", shouldContinue: false }
|
||||
// Agent is done
|
||||
|
||||
write_file("/output.md", content)
|
||||
// → { success: true, output: "Wrote file", shouldContinue: true }
|
||||
// Agent keeps working toward the goal
|
||||
```
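To make the separation concrete, here is a minimal TypeScript sketch of a loop driver that stops only on `shouldContinue: false`, never on `success: false`. The names `ok`, `fail`, `complete`, `ToolCall`, and `runLoop` are hypothetical, and the scripted list of calls stands in for real model turns.

```typescript
// Hypothetical TypeScript mirror of the ToolResult struct above.
interface ToolResult {
  success: boolean;        // Did the tool succeed?
  output: string;          // What happened?
  shouldContinue: boolean; // Should the agent loop continue?
}

const ok = (output: string): ToolResult => ({ success: true, output, shouldContinue: true });
const fail = (message: string): ToolResult => ({ success: false, output: message, shouldContinue: true });
const complete = (summary: string): ToolResult => ({ success: true, output: summary, shouldContinue: false });

// Assumption: each "turn" is a scripted function instead of a model response.
type ToolCall = () => ToolResult;

interface LoopOutcome {
  stoppedBy: "complete_task" | "max_iterations" | "no_more_calls";
  results: ToolResult[];
}

function runLoop(calls: ToolCall[], maxIterations: number): LoopOutcome {
  const results: ToolResult[] = [];
  let iterations = 0;
  for (const call of calls) {
    if (iterations >= maxIterations) {
      return { stoppedBy: "max_iterations", results };
    }
    iterations += 1;
    const result = call();
    results.push(result);
    // A failed tool does not end the loop; only the explicit signal does.
    if (!result.shouldContinue) {
      return { stoppedBy: "complete_task", results };
    }
  }
  return { stoppedBy: "no_more_calls", results };
}
```

A failed `read_file` followed by a successful `write_file` keeps the loop running; the loop ends when `complete` is returned or the iteration cap is reached.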
|
||||
|
||||
### System Prompt Guidance
|
||||
|
||||
Tell the agent when to complete:
|
||||
|
||||
```markdown
|
||||
## Completing Tasks
|
||||
|
||||
When you've accomplished the user's request:
|
||||
1. Verify your work (read back files you created, check results)
|
||||
2. Call `complete_task` with a summary of what you did
|
||||
3. Don't keep working after the goal is achieved
|
||||
|
||||
If you're blocked and can't proceed:
|
||||
- Call `complete_task` with status "blocked" and explain why
|
||||
- Don't loop forever trying the same thing
|
||||
```
|
||||
</completion_signals>
|
||||
|
||||
<partial_completion>
|
||||
## Partial Completion
|
||||
|
||||
For multi-step tasks, track progress at the task level for resume capability.
|
||||
|
||||
### Task State Tracking
|
||||
|
||||
```swift
|
||||
enum TaskStatus {
|
||||
case pending // Not yet started
|
||||
case inProgress // Currently working on
|
||||
case completed // Finished successfully
|
||||
case failed // Couldn't complete (with reason)
|
||||
case skipped // Intentionally not done
|
||||
}
|
||||
|
||||
struct AgentTask {
|
||||
let id: String
|
||||
let description: String
|
||||
var status: TaskStatus
|
||||
var notes: String? // Why it failed, what was done
|
||||
}
|
||||
|
||||
struct AgentSession {
|
||||
var tasks: [AgentTask]
|
||||
|
||||
var isComplete: Bool {
|
||||
tasks.allSatisfy { $0.status == .completed || $0.status == .skipped }
|
||||
}
|
||||
|
||||
var progress: (completed: Int, total: Int) {
|
||||
let done = tasks.filter { $0.status == .completed }.count
|
||||
return (done, tasks.count)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### UI Progress Display
|
||||
|
||||
Show users what's happening:
|
||||
|
||||
```
|
||||
Progress: 3/5 tasks complete (60%)
|
||||
✅ [1] Find source materials
|
||||
✅ [2] Download full text
|
||||
✅ [3] Extract key passages
|
||||
❌ [4] Generate summary - Error: context limit exceeded
|
||||
⏳ [5] Create outline - Pending
|
||||
```
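One way to produce a display like this from task state is sketched below in TypeScript. `formatProgress` and the icon choices for in-progress and skipped tasks are assumptions for illustration; only the ✅, ❌, and ⏳ markers come from the example above.

```typescript
type TaskStatus = "pending" | "inProgress" | "completed" | "failed" | "skipped";

interface AgentTask {
  id: string;
  description: string;
  status: TaskStatus;
  notes?: string; // Why it failed, what was done
}

// Icons for inProgress and skipped are hypothetical choices.
const ICONS: Record<TaskStatus, string> = {
  pending: "⏳",
  inProgress: "▶",
  completed: "✅",
  failed: "❌",
  skipped: "⏭",
};

function formatProgress(tasks: AgentTask[]): string {
  const done = tasks.filter((t) => t.status === "completed").length;
  const percent = tasks.length === 0 ? 0 : Math.round((done / tasks.length) * 100);
  const header = `Progress: ${done}/${tasks.length} tasks complete (${percent}%)`;
  const lines = tasks.map((t, i) => {
    let suffix = "";
    if (t.status === "failed") suffix = ` - Error: ${t.notes ?? "unknown"}`;
    if (t.status === "pending") suffix = " - Pending";
    return `${ICONS[t.status]} [${i + 1}] ${t.description}${suffix}`;
  });
  return [header, ...lines].join("\n");
}
```

Only `completed` tasks count toward the numerator, matching the `progress` property in the Swift sketch.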
|
||||
|
||||
### Partial Completion Scenarios
|
||||
|
||||
**Agent hits max iterations before finishing:**
|
||||
- Some tasks completed, some pending
|
||||
- Checkpoint saved with current state
|
||||
- Resume continues from where it left off, not from beginning
|
||||
|
||||
**Agent fails on one task:**
|
||||
- Task marked `.failed` with error in notes
|
||||
- Other tasks may continue (agent decides)
|
||||
- Orchestrator doesn't automatically abort entire session
|
||||
|
||||
**Network error mid-task:**
|
||||
- Current iteration throws
|
||||
- Session marked `.failed`
|
||||
- Checkpoint preserves messages up to that point
|
||||
- Resume possible from checkpoint
|
||||
|
||||
### Checkpoint Structure
|
||||
|
||||
```swift
|
||||
struct AgentCheckpoint: Codable {
|
||||
let sessionId: String
|
||||
let agentType: String
|
||||
let messages: [Message] // Full conversation history
|
||||
let iterationCount: Int
|
||||
let tasks: [AgentTask] // Task state
|
||||
let customState: [String: String] // Agent-specific state ([String: Any] is not Codable, so values must be Codable)
|
||||
let timestamp: Date
|
||||
|
||||
var isValid: Bool {
|
||||
// Checkpoints expire (default 1 hour)
|
||||
Date().timeIntervalSince(timestamp) < 3600
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Resume Flow
|
||||
|
||||
1. On app launch, scan for valid checkpoints
|
||||
2. Show user: "You have an incomplete session. Resume?"
|
||||
3. On resume:
|
||||
- Restore messages to conversation
|
||||
- Restore task states
|
||||
- Continue agent loop from where it left off
|
||||
4. On dismiss:
|
||||
- Delete checkpoint
|
||||
- Start fresh if user tries again
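Step 1 of the resume flow can be sketched in TypeScript as below. This is a minimal illustration, assuming checkpoints are already loaded into memory; `findResumable` and the choice to prefer the most recent valid checkpoint are assumptions, not part of the original design.

```typescript
interface Checkpoint {
  sessionId: string;
  iterationCount: number;
  timestampMs: number; // Epoch milliseconds when the checkpoint was saved
}

const CHECKPOINT_TTL_MS = 60 * 60 * 1000; // 1 hour, matching the default expiry above

function isValid(checkpoint: Checkpoint, nowMs: number): boolean {
  return nowMs - checkpoint.timestampMs < CHECKPOINT_TTL_MS;
}

// Returns the newest unexpired checkpoint, or undefined if none qualifies.
function findResumable(checkpoints: Checkpoint[], nowMs: number): Checkpoint | undefined {
  return checkpoints
    .filter((cp) => isValid(cp, nowMs)) // filter copies, so the input is not reordered
    .sort((a, b) => b.timestampMs - a.timestampMs)[0];
}
```

Passing `nowMs` in rather than reading the clock inside keeps the expiry rule easy to test.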
|
||||
</partial_completion>
|
||||
|
||||
<model_tier_selection>
|
||||
## Model Tier Selection
|
||||
|
||||
Different agents need different intelligence levels. Use the cheapest model that achieves the outcome.
|
||||
|
||||
### Tier Guidelines
|
||||
|
||||
| Agent Type | Recommended Tier | Reasoning |
|
||||
|------------|-----------------|-----------|
|
||||
| Chat/Conversation | Balanced (Sonnet) | Fast responses, good reasoning |
|
||||
| Research | Balanced (Sonnet) | Tool loops, not ultra-complex synthesis |
|
||||
| Content Generation | Balanced (Sonnet) | Creative but not synthesis-heavy |
|
||||
| Complex Analysis | Powerful (Opus) | Multi-document synthesis, nuanced judgment |
|
||||
| Profile Generation | Powerful (Opus) | Photo analysis, complex pattern recognition |
|
||||
| Quick Queries | Fast (Haiku) | Simple lookups, quick transformations |
|
||||
| Simple Classification | Fast (Haiku) | High volume, simple decisions |
|
||||
|
||||
### Implementation
|
||||
|
||||
```swift
|
||||
enum ModelTier {
|
||||
case fast // claude-3-haiku: Quick, cheap, simple tasks
|
||||
case balanced // claude-sonnet: Good balance for most tasks
|
||||
case powerful // claude-opus: Complex reasoning, synthesis
|
||||
|
||||
var modelId: String {
|
||||
switch self {
|
||||
case .fast: return "claude-3-haiku-20240307"
|
||||
case .balanced: return "claude-sonnet-4-20250514"
|
||||
case .powerful: return "claude-opus-4-20250514"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
struct AgentConfig {
|
||||
let name: String
|
||||
let modelTier: ModelTier
|
||||
let tools: [AgentTool]
|
||||
let systemPrompt: String
|
||||
let maxIterations: Int
|
||||
}
|
||||
|
||||
// Examples
|
||||
let researchConfig = AgentConfig(
|
||||
name: "research",
|
||||
modelTier: .balanced,
|
||||
tools: researchTools,
|
||||
systemPrompt: researchPrompt,
|
||||
maxIterations: 20
|
||||
)
|
||||
|
||||
let quickLookupConfig = AgentConfig(
|
||||
name: "lookup",
|
||||
modelTier: .fast,
|
||||
tools: [readLibrary],
|
||||
systemPrompt: "Answer quick questions about the user's library.",
|
||||
maxIterations: 3
|
||||
)
|
||||
```
|
||||
|
||||
### Cost Optimization Strategies
|
||||
|
||||
1. **Start with balanced, upgrade if quality insufficient**
|
||||
2. **Use fast tier for tool-heavy loops** where each turn is simple
|
||||
3. **Reserve powerful tier for synthesis tasks** (comparing multiple sources)
|
||||
4. **Consider token limits per turn** to control costs
|
||||
5. **Cache expensive operations** to avoid repeated calls
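Strategy 1 can be sketched as a small escalation loop. This TypeScript version is a simplified, synchronous illustration: `attempt` and `isGoodEnough` are hypothetical callbacks standing in for a model call and a quality check, and `runWithEscalation` is not an API from any library.

```typescript
type ModelTier = "fast" | "balanced" | "powerful";

// Escalation order, cheapest first.
const ESCALATION: ModelTier[] = ["fast", "balanced", "powerful"];

function nextTier(tier: ModelTier): ModelTier | undefined {
  return ESCALATION[ESCALATION.indexOf(tier) + 1];
}

function runWithEscalation<T>(
  start: ModelTier,
  attempt: (tier: ModelTier) => T,         // hypothetical: runs the task on a tier
  isGoodEnough: (result: T) => boolean     // hypothetical: quality gate
): { tier: ModelTier; result: T } {
  let tier: ModelTier = start;
  for (;;) {
    const result = attempt(tier);
    const upgrade = nextTier(tier);
    // Stop when the result passes, or when there is no stronger tier left.
    if (isGoodEnough(result) || upgrade === undefined) {
      return { tier, result };
    }
    tier = upgrade;
  }
}
```

Each failed attempt is paid for, so this only saves money when the cheaper tier passes the quality gate most of the time.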
|
||||
</model_tier_selection>
|
||||
|
||||
<context_limits>
|
||||
## Context Limits
|
||||
|
||||
Agent sessions can extend indefinitely, but context windows don't. Design for bounded context from the start.
|
||||
|
||||
### The Problem
|
||||
|
||||
```
|
||||
Turn 1: User asks question → 500 tokens
|
||||
Turn 2: Agent reads file → 10,000 tokens
|
||||
Turn 3: Agent reads another file → 10,000 tokens
|
||||
Turn 4: Agent researches → 20,000 tokens
|
||||
...
|
||||
Turn 10: Context window exceeded
|
||||
```
|
||||
|
||||
### Design Principles
|
||||
|
||||
**1. Tools should support iterative refinement**
|
||||
|
||||
Instead of all-or-nothing, design for summary → detail → full:
|
||||
|
||||
```typescript
|
||||
// Good: Supports iterative refinement
|
||||
tool("read_file", {
|
||||
path: z.string(),
|
||||
preview: z.boolean().default(true), // Return first 1000 chars by default
|
||||
full: z.boolean().default(false), // Opt-in to full content
|
||||
}, ...);
|
||||
|
||||
tool("search_files", {
|
||||
query: z.string(),
|
||||
summaryOnly: z.boolean().default(true), // Return matches, not full files
|
||||
}, ...);
|
||||
```
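The handler body behind a preview-first `read_file` might look like the following sketch. It is a pure function over already-loaded content; `renderFileContent`, `PREVIEW_CHARS`, and the wording of the truncation notice are assumptions.

```typescript
const PREVIEW_CHARS = 1000; // Preview size, matching the "first 1000 chars" default above

interface ReadOptions {
  full?: boolean; // Opt in to full content
}

function renderFileContent(content: string, options: ReadOptions = {}): string {
  // Short files and explicit full reads are returned unchanged.
  if (options.full || content.length <= PREVIEW_CHARS) {
    return content;
  }
  const omitted = content.length - PREVIEW_CHARS;
  // Tell the agent how much is missing and how to get it.
  return `${content.slice(0, PREVIEW_CHARS)}\n[preview: ${omitted} more characters; call again with full=true]`;
}
```

The sketch uses a single `full` flag, which avoids the case where `preview` and `full` are both set to true and the tool has to decide which one wins.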
|
||||
|
||||
**2. Provide consolidation tools**
|
||||
|
||||
Give agents a way to consolidate learnings mid-session:
|
||||
|
||||
```typescript
|
||||
tool("summarize_and_continue", {
|
||||
keyPoints: z.array(z.string()),
|
||||
nextSteps: z.array(z.string()),
|
||||
}, async ({ keyPoints, nextSteps }) => {
|
||||
// Store summary, potentially truncate earlier messages
|
||||
await saveSessionSummary({ keyPoints, nextSteps });
|
||||
return { text: "Summary saved. Continuing with focus on: " + nextSteps.join(", ") };
|
||||
});
|
||||
```
|
||||
|
||||
**3. Design for truncation**
|
||||
|
||||
Assume the orchestrator may truncate early messages. Important context should be:
|
||||
- In the system prompt (always present)
|
||||
- In files (can be re-read)
|
||||
- Summarized in context.md
|
||||
|
||||
### Implementation Strategies
|
||||
|
||||
```swift
|
||||
class AgentOrchestrator {
|
||||
let maxContextTokens = 100_000
|
||||
let targetContextTokens = 80_000 // Leave headroom
|
||||
|
||||
func shouldTruncate() -> Bool {
|
||||
estimateTokens(messages) > targetContextTokens
|
||||
}
|
||||
|
||||
func truncateIfNeeded() {
|
||||
if shouldTruncate() {
|
||||
// Keep system prompt + recent messages
|
||||
// Summarize or drop older messages
|
||||
messages = [systemMessage] + summarizeOldMessages() + recentMessages
|
||||
}
|
||||
}
|
||||
}
|
||||
```
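The "keep system prompt + recent messages" branch can be sketched in TypeScript as follows. This version only drops older messages and leaves summarization out. The 4-characters-per-token estimate is a rough assumption, not a real tokenizer, and the sketch assumes the first message is the system prompt.

```typescript
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Rough heuristic (assumption): about 4 characters per token.
const estimateTokens = (messages: Message[]): number =>
  messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);

function truncate(messages: Message[], targetTokens: number): Message[] {
  const [system, ...rest] = messages;
  if (system === undefined || estimateTokens(messages) <= targetTokens) {
    return messages; // Nothing to do
  }
  const kept: Message[] = [];
  let budget = targetTokens - estimateTokens([system]);
  // Walk from newest to oldest, keeping a contiguous run of recent messages.
  for (const message of [...rest].reverse()) {
    const cost = estimateTokens([message]);
    if (cost > budget) break;
    kept.unshift(message);
    budget -= cost;
  }
  return [system, ...kept];
}
```

Because older messages are dropped outright here, anything important in them needs to already live in the system prompt or in files.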
|
||||
|
||||
### System Prompt Guidance
|
||||
|
||||
```markdown
|
||||
## Managing Context
|
||||
|
||||
For long tasks, periodically consolidate what you've learned:
|
||||
1. If you've gathered a lot of information, summarize key points
|
||||
2. Save important findings to files (they persist beyond context)
|
||||
3. Use `summarize_and_continue` if the conversation is getting long
|
||||
|
||||
Don't try to hold everything in memory. Write it down.
|
||||
```
|
||||
</context_limits>
|
||||
|
||||
<orchestrator_pattern>
|
||||
## Unified Agent Orchestrator
|
||||
|
||||
One execution engine, many agent types. All agents use the same orchestrator with different configurations.
|
||||
|
||||
```swift
|
||||
class AgentOrchestrator {
|
||||
static let shared = AgentOrchestrator()
|
||||
|
||||
func run(config: AgentConfig, userMessage: String) async -> AgentResult {
|
||||
var messages: [Message] = [
|
||||
.system(config.systemPrompt),
|
||||
.user(userMessage)
|
||||
]
|
||||
|
||||
var iteration = 0
|
||||
|
||||
while iteration < config.maxIterations {
|
||||
// Get agent response
|
||||
let response = await claude.message(
|
||||
model: config.modelTier.modelId,
|
||||
messages: messages,
|
||||
tools: config.tools
|
||||
)
|
||||
|
||||
messages.append(.assistant(response))
|
||||
|
||||
// Process tool calls
|
||||
for toolCall in response.toolCalls {
|
||||
let result = await executeToolCall(toolCall, config: config)
|
||||
messages.append(.toolResult(result))
|
||||
|
||||
// Check for completion signal
|
||||
if !result.shouldContinue {
|
||||
return AgentResult(
|
||||
status: .completed,
|
||||
output: result.output,
|
||||
iterations: iteration + 1
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
// No tool calls = agent is responding, might be done
|
||||
if response.toolCalls.isEmpty {
|
||||
// Could be done, or waiting for user
|
||||
break
|
||||
}
|
||||
|
||||
iteration += 1
|
||||
}
|
||||
|
||||
return AgentResult(
|
||||
status: iteration >= config.maxIterations ? .maxIterations : .responded,
|
||||
output: messages.last?.content ?? "",
|
||||
iterations: iteration
|
||||
)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Benefits
|
||||
|
||||
- Consistent lifecycle management across all agent types
|
||||
- Automatic checkpoint/resume (critical for mobile)
|
||||
- Shared tool protocol
|
||||
- Easy to add new agent types
|
||||
- Centralized error handling and logging
|
||||
</orchestrator_pattern>
|
||||
|
||||
<checklist>
|
||||
## Agent Execution Checklist
|
||||
|
||||
### Completion Signals
|
||||
- [ ] `complete_task` tool provided (explicit completion)
|
||||
- [ ] No heuristic completion detection
|
||||
- [ ] Tool results include `shouldContinue` flag
|
||||
- [ ] System prompt guides when to complete
|
||||
|
||||
### Partial Completion
|
||||
- [ ] Tasks tracked with status (pending, in_progress, completed, failed, skipped)
|
||||
- [ ] Checkpoints saved for resume
|
||||
- [ ] Progress visible to user
|
||||
- [ ] Resume continues from where left off
|
||||
|
||||
### Model Tiers
|
||||
- [ ] Tier selected based on task complexity
|
||||
- [ ] Cost optimization considered
|
||||
- [ ] Fast tier for simple operations
|
||||
- [ ] Powerful tier reserved for synthesis
|
||||
|
||||
### Context Limits
|
||||
- [ ] Tools support iterative refinement (preview vs full)
|
||||
- [ ] Consolidation mechanism available
|
||||
- [ ] Important context persisted to files
|
||||
- [ ] Truncation strategy defined
|
||||
</checklist>
|
||||
@@ -1,5 +1,12 @@
|
||||
<overview>
|
||||
-Architectural patterns for building prompt-native agent systems. These patterns emerge from the philosophy that features should be defined in prompts, not code, and that tools should be primitives.
|
||||
Architectural patterns for building agent-native systems. These patterns emerge from the five core principles: Parity, Granularity, Composability, Emergent Capability, and Improvement Over Time.
|
||||
|
||||
Features are outcomes achieved by agents operating in a loop, not functions you write. Tools are atomic primitives. The agent applies judgment; the prompt defines the outcome.
|
||||
|
||||
See also:
|
||||
- [files-universal-interface.md](./files-universal-interface.md) for file organization and context.md patterns
|
||||
- [agent-execution-patterns.md](./agent-execution-patterns.md) for completion signals and partial completion
|
||||
- [product-implications.md](./product-implications.md) for progressive disclosure and approval patterns
|
||||
</overview>
|
||||
|
||||
<pattern name="event-driven-agent">
|
||||
|
||||
@@ -0,0 +1,301 @@
|
||||
<overview>
|
||||
Files are the universal interface for agent-native applications. Agents are naturally fluent with file operations—they already know how to read, write, and organize files. This document covers why files work so well, how to organize them, and the context.md pattern for accumulated knowledge.
|
||||
</overview>
|
||||
|
||||
<why_files>
|
||||
## Why Files
|
||||
|
||||
Agents are naturally good at files. Claude Code works because bash + filesystem is the most battle-tested agent interface. When building agent-native apps, lean into this.
|
||||
|
||||
### Agents Already Know How
|
||||
|
||||
You don't need to teach the agent your API—it already knows `cat`, `grep`, `mv`, `mkdir`. File operations are the primitives it's most fluent with.
|
||||
|
||||
### Files Are Inspectable
|
||||
|
||||
Users can see what the agent created, edit it, move it, delete it. No black box. Complete transparency into agent behavior.
|
||||
|
||||
### Files Are Portable
|
||||
|
||||
Export is trivial. Backup is trivial. Users own their data. No vendor lock-in, no complex migration paths.
|
||||
|
||||
### App State Stays in Sync
|
||||
|
||||
On mobile, if you use the file system with iCloud, all devices share the same file system. The agent's work on one device appears on all devices—without you having to build a server.
|
||||
|
||||
### Directory Structure Is Information Architecture
|
||||
|
||||
The filesystem gives you hierarchy for free. `/projects/acme/notes/` is self-documenting in a way that `SELECT * FROM notes WHERE project_id = 123` isn't.
|
||||
</why_files>
|
||||
|
||||
<file_organization>
|
||||
## File Organization Patterns
|
||||
|
||||
> **Needs validation:** These conventions are one approach that's worked so far, not a prescription. Better solutions should be considered.
|
||||
|
||||
A general principle of agent-native design: **Design for what agents can reason about.** The best proxy for that is what would make sense to a human. If a human can look at your file structure and understand what's going on, an agent probably can too.
|
||||
|
||||
### Entity-Scoped Directories
|
||||
|
||||
Organize files around entities, not actors or file types:
|
||||
|
||||
```
|
||||
{entity_type}/{entity_id}/
|
||||
├── primary content
|
||||
├── metadata
|
||||
└── related materials
|
||||
```
|
||||
|
||||
**Example:** `Research/books/{bookId}/` contains everything about one book—full text, notes, sources, agent logs.
|
||||
|
||||
### Naming Conventions
|
||||
|
||||
| File Type | Naming Pattern | Example |
|
||||
|-----------|---------------|---------|
|
||||
| Entity data | `{entity}.json` | `library.json`, `status.json` |
|
||||
| Human-readable content | `{content_type}.md` | `introduction.md`, `profile.md` |
|
||||
| Agent reasoning | `agent_log.md` | Per-entity agent history |
|
||||
| Primary content | `full_text.txt` | Downloaded/extracted text |
|
||||
| Multi-volume | `volume{N}.txt` | `volume1.txt`, `volume2.txt` |
|
||||
| External sources | `{source_name}.md` | `wikipedia.md`, `sparknotes.md` |
|
||||
| Checkpoints | `{sessionId}.checkpoint` | UUID-based |
|
||||
| Configuration | `config.json` | Feature settings |
|
||||
|
||||
### Directory Naming
|
||||
|
||||
- **Entity-scoped:** `{entityType}/{entityId}/` (e.g., `Research/books/{bookId}/`)
|
||||
- **Type-scoped:** `{type}/` (e.g., `AgentCheckpoints/`, `AgentLogs/`)
|
||||
- **Convention:** Lowercase with underscores, not camelCase
|
||||
|
||||
### Ephemeral vs. Durable Separation
|
||||
|
||||
Separate agent working files from user's permanent data:
|
||||
|
||||
```
|
||||
Documents/
|
||||
├── AgentCheckpoints/ # Ephemeral (can delete)
|
||||
│ └── {sessionId}.checkpoint
|
||||
├── AgentLogs/ # Ephemeral (debugging)
|
||||
│ └── {type}/{sessionId}.md
|
||||
└── Research/ # Durable (user's work)
|
||||
└── books/{bookId}/
|
||||
```
|
||||
|
||||
### The Split: Markdown vs JSON
|
||||
|
||||
- **Markdown:** For content users might read or edit
|
||||
- **JSON:** For structured data the app queries
|
||||
</file_organization>
|
||||
|
||||
<context_md_pattern>
|
||||
## The context.md Pattern
|
||||
|
||||
A file the agent reads at the start of each session and updates as it learns:
|
||||
|
||||
```markdown
|
||||
# Context
|
||||
|
||||
## Who I Am
|
||||
Reading assistant for the Every app.
|
||||
|
||||
## What I Know About This User
|
||||
- Interested in military history and Russian literature
|
||||
- Prefers concise analysis
|
||||
- Currently reading War and Peace
|
||||
|
||||
## What Exists
|
||||
- 12 notes in /notes
|
||||
- 3 active projects
|
||||
- User preferences at /preferences.md
|
||||
|
||||
## Recent Activity
|
||||
- User created "Project kickoff" (2 hours ago)
|
||||
- Analyzed passage about Austerlitz (yesterday)
|
||||
|
||||
## My Guidelines
|
||||
- Don't spoil books they're reading
|
||||
- Use their interests to personalize insights
|
||||
|
||||
## Current State
|
||||
- No pending tasks
|
||||
- Last sync: 10 minutes ago
|
||||
```
|
||||
|
||||
### Benefits
|
||||
|
||||
- **Agent behavior evolves without code changes** - Update the context, behavior changes
|
||||
- **Users can inspect and modify** - Complete transparency
|
||||
- **Natural place for accumulated context** - Learnings persist across sessions
|
||||
- **Portable across sessions** - Restart agent, knowledge preserved
|
||||
|
||||
### How It Works
|
||||
|
||||
1. Agent reads `context.md` at session start
|
||||
2. Agent updates it when learning something important
|
||||
3. System can also update it (recent activity, new resources)
|
||||
4. Context persists across sessions
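Steps 2 and 3 both amount to rewriting one section of `context.md` while leaving the rest untouched. A minimal TypeScript sketch of that update, operating on the file's text, is below. `upsertSection` is a hypothetical helper, and it assumes sections are delimited by `## ` headings as in the example file.

```typescript
// Replace the bullets under "## {heading}", or append the section if it is missing.
function upsertSection(markdown: string, heading: string, bullets: string[]): string {
  const title = `## ${heading}`;
  const body = bullets.map((b) => `- ${b}`);
  const lines = markdown.split("\n");
  const start = lines.findIndex((line) => line.trim() === title);

  if (start === -1) {
    // Section does not exist yet: append it at the end.
    return `${markdown.trimEnd()}\n\n${[title, ...body].join("\n")}\n`;
  }

  // The section runs until the next "## " heading or the end of the file.
  let end = lines.findIndex((line, i) => i > start && line.startsWith("## "));
  if (end === -1) end = lines.length;

  return [...lines.slice(0, start), title, ...body, "", ...lines.slice(end)].join("\n");
}
```

Replacing a whole section at a time keeps the file readable for users, at the cost of discarding any manual edits inside that section.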
|
||||
|
||||
### What to Include
|
||||
|
||||
| Section | Purpose |
|
||||
|---------|---------|
|
||||
| Who I Am | Agent identity and role |
|
||||
| What I Know About This User | Learned preferences, interests |
|
||||
| What Exists | Available resources, data |
|
||||
| Recent Activity | Context for continuity |
|
||||
| My Guidelines | Learned rules and constraints |
|
||||
| Current State | Session status, pending items |
|
||||
</context_md_pattern>
|
||||
|
||||
<files_vs_database>
|
||||
## Files vs. Database
|
||||
|
||||
> **Needs validation:** This framing is informed by mobile development. For web apps, the tradeoffs are different.
|
||||
|
||||
| Use files for... | Use database for... |
|
||||
|------------------|---------------------|
|
||||
| Content users should read/edit | High-volume structured data |
|
||||
| Configuration that benefits from version control | Data that needs complex queries |
|
||||
| Agent-generated content | Ephemeral state (sessions, caches) |
|
||||
| Anything that benefits from transparency | Data with relationships |
|
||||
| Large text content | Data that needs indexing |
|
||||
|
||||
**The principle:** Files for legibility, databases for structure. When in doubt, files—they're more transparent and users can always inspect them.
|
||||
|
||||
### When Files Work Best
|
||||
|
||||
- Scale is small (one user's library, not millions of records)
|
||||
- Transparency is valued over query speed
|
||||
- Cloud sync (iCloud, Dropbox) works well with files
|
||||
|
||||
### Hybrid Approach
|
||||
|
||||
Even if you need a database for performance, consider maintaining a file-based "source of truth" that the agent works with, synced to the database for the UI:
|
||||
|
||||
```
|
||||
Files (agent workspace):
|
||||
Research/books/book_123/introduction.md
|
||||
|
||||
Database (UI queries):
|
||||
research_index: { bookId, path, title, createdAt }
|
||||
```
|
||||
</files_vs_database>
|
||||
|
||||
<conflict_model>
|
||||
## Conflict Model
|
||||
|
||||
If agents and users write to the same files, you need a conflict model.
|
||||
|
||||
### Current Reality
|
||||
|
||||
Most implementations use **last-write-wins** via atomic writes:
|
||||
|
||||
```swift
|
||||
try data.write(to: url, options: [.atomic])
|
||||
```
|
||||
|
||||
This is simple but can lose changes.
|
||||
|
||||
### Options
|
||||
|
||||
| Strategy | Pros | Cons |
|
||||
|----------|------|------|
|
||||
| **Last write wins** | Simple | Changes can be lost |
|
||||
| **Agent checks before writing** | Preserves user edits | More complexity |
|
||||
| **Separate spaces** | No conflicts | Less collaboration |
|
||||
| **Append-only logs** | Never overwrites | Files grow forever |
|
||||
| **File locking** | Safe concurrent access | Complexity, can block |
|
||||
|
||||
### Recommended Approaches
|
||||
|
||||
**For files agents write frequently (logs, status):** Last-write-wins is fine. Conflicts are rare.
|
||||
|
||||
**For files users edit (profiles, notes):** Consider explicit handling:
|
||||
- Agent checks modification time before overwriting
|
||||
- Or keep agent output separate from user-editable content
|
||||
- Or use append-only pattern
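The first option, checking modification time before overwriting, reduces to a small decision function. The TypeScript sketch below assumes the app records when the agent last wrote each file; `FileRecord` and `decideWrite` are hypothetical names, and reading the actual modification time from disk is left to the caller.

```typescript
type WriteDecision = "write" | "ask_user";

interface FileRecord {
  lastAgentWriteMs: number | undefined; // When the agent last wrote this file, if ever
  currentMtimeMs: number | undefined;   // File's current modification time; undefined if the file is missing
}

function decideWrite(record: FileRecord): WriteDecision {
  // No file on disk: nothing to clobber.
  if (record.currentMtimeMs === undefined) return "write";
  // File exists but the agent never wrote it: treat it as user-owned.
  if (record.lastAgentWriteMs === undefined) return "ask_user";
  // Modified after the agent's last write: someone else edited it.
  return record.currentMtimeMs > record.lastAgentWriteMs ? "ask_user" : "write";
}
```

In practice the recorded write time should be taken from the file's modification time right after the write, since comparing against a separate clock reading can produce false positives.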
|
||||
|
||||
### iCloud Considerations
|
||||
|
||||
iCloud sync adds complexity. It creates `{filename} (conflict).md` files when sync conflicts occur. Monitor for these:
|
||||
|
||||
```swift
|
||||
NotificationCenter.default.addObserver(
|
||||
forName: .NSMetadataQueryDidUpdate,
|
||||
...
|
||||
)
|
||||
```
|
||||
|
||||
### System Prompt Guidance
|
||||
|
||||
Tell the agent about the conflict model:
|
||||
|
||||
```markdown
|
||||
## Working with User Content
|
||||
|
||||
When you create content, the user may edit it afterward. Always read
|
||||
existing files before modifying them—the user may have made improvements
|
||||
you should preserve.
|
||||
|
||||
If a file has been modified since you last wrote it, ask before overwriting.
|
||||
```
|
||||
</conflict_model>
|
||||
|
||||
<examples>
|
||||
## Example: Reading App File Structure
|
||||
|
||||
```
|
||||
Documents/
|
||||
├── Library/
|
||||
│ └── library.json # Book metadata
|
||||
├── Research/
|
||||
│ └── books/
|
||||
│ └── {bookId}/
|
||||
│ ├── full_text.txt # Downloaded content
|
||||
│ ├── introduction.md # Agent-generated, user-editable
|
||||
│ ├── notes.md # User notes
|
||||
│ └── sources/
|
||||
│ ├── wikipedia.md # Research gathered by agent
|
||||
│ └── reviews.md
|
||||
├── Chats/
|
||||
│ └── {conversationId}.json # Chat history
|
||||
├── Profile/
|
||||
│ └── profile.md # User reading profile
|
||||
└── context.md # Agent's accumulated knowledge
|
||||
```
|
||||
|
||||
**How it works:**
|
||||
|
||||
1. User adds book → creates entry in `library.json`
|
||||
2. Agent downloads text → saves to `Research/books/{id}/full_text.txt`
|
||||
3. Agent researches → saves to `sources/`
|
||||
4. Agent generates intro → saves to `introduction.md`
|
||||
5. User edits intro → agent sees changes on next read
|
||||
6. Agent updates `context.md` with learnings
|
||||
</examples>
|
||||
|
||||
<checklist>
|
||||
## Files as Universal Interface Checklist
|
||||
|
||||
### Organization
|
||||
- [ ] Entity-scoped directories (`{type}/{id}/`)
|
||||
- [ ] Consistent naming conventions
|
||||
- [ ] Ephemeral vs durable separation
|
||||
- [ ] Markdown for human content, JSON for structured data
|
||||
|
||||
### context.md
|
||||
- [ ] Agent reads context at session start
|
||||
- [ ] Agent updates context when learning
|
||||
- [ ] Includes: identity, user knowledge, what exists, guidelines
|
||||
- [ ] Persists across sessions
|
||||
|
||||
### Conflict Handling
|
||||
- [ ] Conflict model defined (last-write-wins, check-before-write, etc.)
|
||||
- [ ] Agent guidance in system prompt
|
||||
- [ ] iCloud conflict monitoring (if applicable)
|
||||
|
||||
### Integration
|
||||
- [ ] UI observes file changes (or shared service)
|
||||
- [ ] Agent can read user edits
|
||||
- [ ] User can inspect agent output
|
||||
</checklist>
|
||||
@@ -0,0 +1,359 @@
|
||||
<overview>
|
||||
Start with pure primitives: bash, file operations, basic storage. This proves the architecture works and reveals what the agent actually needs. As patterns emerge, add domain-specific tools deliberately. This document covers when and how to evolve from primitives to domain tools, and when to graduate to optimized code.
|
||||
</overview>
|
||||
|
||||
<start_with_primitives>
|
||||
## Start with Pure Primitives
|
||||
|
||||
Begin every agent-native system with the most atomic tools possible:
|
||||
|
||||
- `read_file` / `write_file` / `list_files`
|
||||
- `bash` (for everything else)
|
||||
- Basic storage (`store_item` / `get_item`)
|
||||
- HTTP requests (`fetch_url`)
|
||||
|
||||
**Why start here:**
|
||||
|
||||
1. **Proves the architecture** - If it works with primitives, your prompts are doing their job
|
||||
2. **Reveals actual needs** - You'll discover what domain concepts matter
|
||||
3. **Maximum flexibility** - Agent can do anything, not just what you anticipated
|
||||
4. **Forces good prompts** - You can't lean on tool logic as a crutch
|
||||
|
||||
### Example: Starting Primitive
|
||||
|
||||
```typescript
|
||||
// Start with just these
|
||||
const tools = [
|
||||
tool("read_file", { path: z.string() }, ...),
|
||||
tool("write_file", { path: z.string(), content: z.string() }, ...),
|
||||
tool("list_files", { path: z.string() }, ...),
|
||||
tool("bash", { command: z.string() }, ...),
|
||||
];
|
||||
|
||||
// Prompt handles the domain logic
|
||||
const prompt = `
|
||||
When processing feedback:
|
||||
1. Read existing feedback from data/feedback.json
|
||||
2. Add the new feedback with your assessment of importance (1-5)
|
||||
3. Write the updated file
|
||||
4. If importance >= 4, create a notification file in data/alerts/
|
||||
`;
|
||||
```
|
||||
</start_with_primitives>
|
||||
|
||||
<when_to_add_domain_tools>
|
||||
## When to Add Domain Tools
|
||||
|
||||
As patterns emerge, you'll want to add domain-specific tools. This is good—but do it deliberately.
|
||||
|
||||
### Vocabulary Anchoring
|
||||
|
||||
**Add a domain tool when:** The agent needs to understand domain concepts.
|
||||
|
||||
A `create_note` tool teaches the agent what "note" means in your system better than "write a file to the notes directory with this format."
|
||||
|
||||
```typescript
|
||||
// Without domain tool - agent must infer structure
|
||||
await agent.chat("Create a note about the meeting");
|
||||
// Agent: writes to... notes/? documents/? what format?
|
||||
|
||||
// With domain tool - vocabulary is anchored
|
||||
tool("create_note", {
|
||||
title: z.string(),
|
||||
content: z.string(),
|
||||
tags: z.array(z.string()).optional(),
|
||||
}, async ({ title, content, tags }) => {
|
||||
// Tool enforces structure, agent understands "note"
|
||||
});
|
||||
```
|
||||
|
||||
### Guardrails
|
||||
|
||||
**Add a domain tool when:** An operation needs validation or constraints that shouldn't be left to agent judgment.
|
||||
|
||||
```typescript
|
||||
// publish_to_feed might enforce format requirements or content policies
|
||||
tool("publish_to_feed", {
|
||||
bookId: z.string(),
|
||||
content: z.string(),
|
||||
headline: z.string().max(100), // Enforce headline length
|
||||
}, async ({ bookId, content, headline }) => {
|
||||
// Validate content meets guidelines
|
||||
if (containsProhibitedContent(content)) {
|
||||
return { text: "Content doesn't meet guidelines", isError: true };
|
||||
}
|
||||
// Enforce proper structure
|
||||
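await feedService.publish({ bookId, content, headline, publishedAt: new Date() });
// Report success so the agent knows the publish went through
return { text: "Published to feed" };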
|
||||
});
|
||||
```
|
||||
|
||||
### Efficiency
|
||||
|
||||
**Add a domain tool when:** Common operations would take many primitive calls.
|
||||
|
||||
```typescript
|
||||
// Primitive approach: multiple calls
|
||||
await agent.chat("Get book details");
|
||||
// Agent: read library.json, parse, find book, read full_text.txt, read introduction.md...
|
||||
|
||||
// Domain tool: one call for common operation
|
||||
tool("get_book_with_content", { bookId: z.string() }, async ({ bookId }) => {
|
||||
const book = await library.getBook(bookId);
|
||||
const fullText = await readFile(`Research/${bookId}/full_text.txt`);
|
||||
const intro = await readFile(`Research/${bookId}/introduction.md`);
|
||||
return { text: JSON.stringify({ book, fullText, intro }) };
|
||||
});
|
||||
```
|
||||
</when_to_add_domain_tools>
|
||||
|
||||
<the_rule>
|
||||
## The Rule for Domain Tools
|
||||
|
||||
**Domain tools should represent one conceptual action from the user's perspective.**
|
||||
|
||||
They can include mechanical validation, but **judgment about what to do or whether to do it belongs in the prompt**.
|
||||
|
||||
### Wrong: Bundles Judgment
|
||||
|
||||
```typescript
|
||||
// WRONG - analyze_and_publish bundles judgment into the tool
|
||||
tool("analyze_and_publish", { input: z.string() }, async ({ input }) => {
|
||||
const analysis = analyzeContent(input); // Tool decides how to analyze
|
||||
const shouldPublish = analysis.score > 0.7; // Tool decides whether to publish
|
||||
if (shouldPublish) {
|
||||
await publish(analysis.summary); // Tool decides what to publish
|
||||
}
|
||||
});
|
||||
```
|
||||
|
||||
### Right: One Action, Agent Decides
|
||||
|
||||
```typescript
|
||||
// RIGHT - separate tools, agent decides
|
||||
tool("analyze_content", { content: z.string() }, ...); // Returns analysis
|
||||
tool("publish", { content: z.string() }, ...); // Publishes what agent provides
|
||||
|
||||
// Prompt: "Analyze the content. If it's high quality, publish a summary."
|
||||
// Agent decides what "high quality" means and what summary to write.
|
||||
```
|
||||
|
||||
### The Test
|
||||
|
||||
Ask: "Who is making the decision here?"
|
||||
|
||||
- If the answer is "the tool code" → you've encoded judgment, refactor
|
||||
- If the answer is "the agent based on the prompt" → good
|
||||
</the_rule>
|
||||
|
||||
<keep_primitives_available>
|
||||
## Keep Primitives Available
|
||||
|
||||
**Domain tools are shortcuts, not gates.**
|
||||
|
||||
Unless there's a specific reason to restrict access (security, data integrity), the agent should still be able to use underlying primitives for edge cases.
|
||||
|
||||
```typescript
|
||||
// Domain tool for common case
|
||||
tool("create_note", { title, content }, ...);
|
||||
|
||||
// But primitives still available for edge cases
|
||||
tool("read_file", { path }, ...);
|
||||
tool("write_file", { path, content }, ...);
|
||||
|
||||
// Agent can use create_note normally, but for weird edge case:
|
||||
// "Create a note in a non-standard location with custom metadata"
|
||||
// → Agent uses write_file directly
|
||||
```
|
||||
|
||||
### When to Gate
|
||||
|
||||
Gating (making the domain tool the only way to perform an operation) is appropriate for:
|
||||
|
||||
- **Security:** User authentication, payment processing
|
||||
- **Data integrity:** Operations that must maintain invariants
|
||||
- **Audit requirements:** Actions that must be logged in specific ways
|
||||
|
||||
**The default is open.** When you do gate something, make it a conscious decision with a clear reason.
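
One way to keep the default open while gating only specific paths is a small guard in front of the write primitive. This is a minimal sketch, not the skill's own implementation: the `GATES` list, the gated directory names, the domain tool names in the reasons, and the in-memory `files` map (standing in for a real filesystem) are all hypothetical.

```typescript
// Hypothetical sketch: primitives stay open by default; only listed paths are gated.
type ToolResult = { text: string; isError?: boolean };

// Each gate records why it exists, so gating stays a conscious decision.
// Prefixes and reasons are illustrative assumptions.
const GATES: { prefix: string; reason: string }[] = [
  { prefix: "data/payments/", reason: "security: use the payment domain tool" },
  { prefix: "data/audit/", reason: "audit: entries must go through the audit tool" },
];

// In-memory stand-in for the real filesystem (assumption for this example).
const files = new Map<string, string>();

// Returns the reason a path is gated, or undefined when the path is open.
// Note: a real guard should normalize the path first (e.g. resolve "..").
function gateReason(path: string): string | undefined {
  for (const gate of GATES) {
    if (path.startsWith(gate.prefix)) return gate.reason;
  }
  return undefined;
}

// The write primitive: open everywhere except explicitly gated prefixes.
function writeFilePrimitive(path: string, content: string): ToolResult {
  const reason = gateReason(path);
  if (reason !== undefined) {
    return { text: `Path is gated (${reason})`, isError: true };
  }
  files.set(path, content);
  return { text: `Wrote ${path}` };
}
```

Returning the reason in the error text tells the agent which domain tool to use instead, so a gate redirects the agent rather than stranding it.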
|
||||
</keep_primitives_available>
|
||||
|
||||
<graduating_to_code>
|
||||
## Graduating to Code
|
||||
|
||||
Some operations will need to move from agent-orchestrated to optimized code for performance or reliability.
|
||||
|
||||
### The Progression
|
||||
|
||||
```
|
||||
Stage 1: Agent uses primitives in a loop
|
||||
→ Flexible, proves the concept
|
||||
→ Slow, potentially expensive
|
||||
|
||||
Stage 2: Add domain tools for common operations
|
||||
→ Faster, still agent-orchestrated
|
||||
→ Agent still decides when/whether to use
|
||||
|
||||
Stage 3: For hot paths, implement in optimized code
|
||||
→ Fast, deterministic
|
||||
→ Agent can still trigger, but execution is code
|
||||
```
|
||||
|
||||
### Example Progression
|
||||
|
||||
**Stage 1: Pure primitives**
|
||||
```markdown
|
||||
Prompt: "When user asks for a summary, read all notes in /notes,
|
||||
analyze them, and write a summary to /summaries/{date}.md"
|
||||
|
||||
Agent: Calls read_file 20 times, reasons about content, writes summary
|
||||
Time: 30 seconds, 50k tokens
|
||||
```
|
||||
|
||||
**Stage 2: Domain tool**
|
||||
```typescript
|
||||
tool("get_all_notes", {}, async () => {
|
||||
const notes = await readAllNotesFromDirectory();
|
||||
return { text: JSON.stringify(notes) };
|
||||
});
|
||||
|
||||
// Agent still decides how to summarize, but retrieval is faster
|
||||
// Time: 10 seconds, 30k tokens
|
||||
```
|
||||
|
||||
**Stage 3: Optimized code**
|
||||
```typescript
|
||||
tool("generate_weekly_summary", {}, async () => {
|
||||
// Entire operation in code for hot path
|
||||
const notes = await getNotes({ since: oneWeekAgo });
|
||||
const summary = await generateSummary(notes); // Could use cheaper model
|
||||
await writeSummary(summary);
|
||||
return { text: "Summary generated" };
|
||||
});
|
||||
|
||||
// Agent just triggers it
|
||||
// Time: 2 seconds, 5k tokens
|
||||
```
|
||||
|
||||
### The Caveat
|
||||
|
||||
**Even when an operation graduates to code, the agent should be able to:**
|
||||
|
||||
1. Trigger the optimized operation itself
|
||||
2. Fall back to primitives for edge cases the optimized path doesn't handle
|
||||
|
||||
Graduation is about efficiency. **Parity still holds.** The agent doesn't lose capability when you optimize.
|
||||
</graduating_to_code>
|
||||
|
||||
<decision_framework>
|
||||
## Decision Framework
|
||||
|
||||
### Should I Add a Domain Tool?
|
||||
|
||||
| Question | If Yes |
|
||||
|----------|--------|
|
||||
| Is the agent confused about what this concept means? | Add for vocabulary anchoring |
|
||||
| Does this operation need validation the agent shouldn't decide? | Add with guardrails |
|
||||
| Is this a common multi-step operation? | Add for efficiency |
|
||||
| Would changing behavior require code changes? | Keep as prompt instead |
|
||||
|
||||
### Should I Graduate to Code?
|
||||
|
||||
| Question | If Yes |
|
||||
|----------|--------|
|
||||
| Is this operation called very frequently? | Consider graduating |
|
||||
| Does latency matter significantly? | Consider graduating |
|
||||
| Are token costs problematic? | Consider graduating |
|
||||
| Do you need deterministic behavior? | Graduate to code |
|
||||
| Does the operation need complex state management? | Graduate to code |
|
||||
|
||||
### Should I Gate Access?
|
||||
|
||||
| Question | If Yes |
|
||||
|----------|--------|
|
||||
| Is there a security requirement? | Gate appropriately |
|
||||
| Must this operation maintain data integrity? | Gate appropriately |
|
||||
| Is there an audit/compliance requirement? | Gate appropriately |
|
||||
| Is it just "safer" with no specific risk? | Keep primitives available |
|
||||
</decision_framework>
|
||||
|
||||
<examples>
|
||||
## Examples
|
||||
|
||||
### Feedback Processing Evolution
|
||||
|
||||
**Stage 1: Primitives only**
|
||||
```typescript
|
||||
tools: [read_file, write_file, bash]
|
||||
prompt: "Store feedback in data/feedback.json, notify if important"
|
||||
// Agent figures out JSON structure, importance criteria, notification method
|
||||
```
|
||||
|
||||
**Stage 2: Domain tools for vocabulary**
|
||||
```typescript
|
||||
tools: [
|
||||
store_feedback, // Anchors "feedback" concept with proper structure
|
||||
send_notification, // Anchors "notify" with correct channels
|
||||
read_file, // Still available for edge cases
|
||||
write_file,
|
||||
]
|
||||
prompt: "Store feedback using store_feedback. Notify if importance >= 4."
|
||||
// Agent still decides importance, but vocabulary is anchored
|
||||
```
|
||||
|
||||
**Stage 3: Graduated hot path**
|
||||
```typescript
|
||||
tools: [
|
||||
process_feedback_batch, // Optimized for high-volume processing
|
||||
store_feedback, // For individual items
|
||||
send_notification,
|
||||
read_file,
|
||||
write_file,
|
||||
]
|
||||
// Batch processing is code, but agent can still use store_feedback for special cases
|
||||
```
|
||||
|
||||
### When NOT to Add Domain Tools
|
||||
|
||||
**Don't add a domain tool just to make things "cleaner":**
|
||||
```typescript
|
||||
// Unnecessary - agent can compose primitives
|
||||
tool("organize_files_by_date", ...) // Just use move_file + judgment
|
||||
|
||||
// Unnecessary - puts decision in wrong place
|
||||
tool("decide_file_importance", ...) // This is prompt territory
|
||||
```
|
||||
|
||||
**Don't add a domain tool if behavior might change:**
|
||||
```typescript
|
||||
// Bad - locked into code
|
||||
tool("generate_standard_report", ...) // What if report format evolves?
|
||||
|
||||
// Better - keep in prompt
|
||||
prompt: "Generate a report covering X, Y, Z. Format for readability."
|
||||
// Can adjust format by editing prompt
|
||||
```
|
||||
</examples>
|
||||
|
||||
<checklist>
|
||||
## Checklist: Primitives to Domain Tools
|
||||
|
||||
### Starting Out
|
||||
- [ ] Begin with pure primitives (read, write, list, bash)
|
||||
- [ ] Write behavior in prompts, not tool logic
|
||||
- [ ] Let patterns emerge from actual usage
|
||||
|
||||
### Adding Domain Tools
|
||||
- [ ] Clear reason: vocabulary anchoring, guardrails, or efficiency
|
||||
- [ ] Tool represents one conceptual action
|
||||
- [ ] Judgment stays in prompts, not tool code
|
||||
- [ ] Primitives remain available alongside domain tools
|
||||
|
||||
### Graduating to Code
|
||||
- [ ] Hot path identified (frequent, latency-sensitive, or expensive)
|
||||
- [ ] Optimized version doesn't remove agent capability
|
||||
- [ ] Fallback to primitives for edge cases still works
|
||||
|
||||
### Gating Decisions
|
||||
- [ ] Specific reason for each gate (security, integrity, audit)
|
||||
- [ ] Default is open access
|
||||
- [ ] Gates are conscious decisions, not defaults
|
||||
</checklist>
|
||||
@@ -1,10 +1,188 @@
|
||||
<overview>
|
||||
Mobile agent-native apps face unique challenges: background execution limits, system permissions, network constraints, and cost sensitivity. This guide covers patterns for building robust agent experiences on iOS and Android.
|
||||
Mobile is a first-class platform for agent-native apps. It has unique constraints and opportunities. This guide covers why mobile matters, iOS storage architecture, checkpoint/resume patterns, and cost-aware design.
|
||||
</overview>
|
||||
|
||||
<why_mobile>
|
||||
## Why Mobile Matters
|
||||
|
||||
Mobile devices offer unique advantages for agent-native apps:
|
||||
|
||||
### A File System
|
||||
Agents can work with files naturally, using the same primitives that work everywhere else. The filesystem is the universal interface.
|
||||
|
||||
### Rich Context
|
||||
Mobile gives you access to a walled garden of personal context: health data, location, photos, calendars. Much of this is unavailable or harder to reach on desktop or web, and it enables deeply personalized agent experiences.
|
||||
|
||||
### Local Apps
|
||||
Everyone has their own copy of the app. This opens opportunities that aren't fully realized yet: apps that modify themselves, fork themselves, evolve per-user. App Store policies constrain some of this today, but the foundation is there.
|
||||
|
||||
### Cross-Device Sync
|
||||
If you use the file system with iCloud, all devices share the same file system. The agent's work on one device appears on all devices—without you having to build a server.
|
||||
|
||||
### The Challenge
|
||||
|
||||
**Agents are long-running. Mobile apps are not.**
|
||||
|
||||
An agent might need 30 seconds, 5 minutes, or an hour to complete a task. But iOS suspends your app shortly after it moves to the background, and may terminate it entirely to reclaim memory. The user might switch apps, take a call, or lock their phone mid-task.
|
||||
|
||||
This means mobile agent apps need:
|
||||
- **Checkpointing** — Saving state so work isn't lost
|
||||
- **Resuming** — Picking up where you left off after interruption
|
||||
- **Background execution** — Using the limited time iOS gives you wisely
|
||||
- **On-device vs. cloud decisions** — What runs locally vs. what needs a server
|
||||
</why_mobile>
|
||||
|
||||
<ios_storage>
|
||||
## iOS Storage Architecture
|
||||
|
||||
> **Needs validation:** This is an approach that works well, but better solutions may exist.
|
||||
|
||||
For agent-native iOS apps, use iCloud Drive's Documents folder for your shared workspace. This gives you **free, automatic multi-device sync** without building a sync layer or running a server.
|
||||
|
||||
### Why iCloud Documents?
|
||||
|
||||
| Approach | Cost | Complexity | Offline | Multi-Device |
|
||||
|----------|------|------------|---------|--------------|
|
||||
| Custom backend + sync | $$$ | High | Manual | Yes |
|
||||
| CloudKit database | Free tier limits | Medium | Manual | Yes |
|
||||
| **iCloud Documents** | Free (user's storage) | Low | Automatic | Automatic |
|
||||
|
||||
iCloud Documents:
|
||||
- Uses the user's existing iCloud storage (5 GB free tier; files count against the user's quota)
|
||||
- Automatic sync across all user's devices
|
||||
- Works offline, syncs when online
|
||||
- Files visible in Files.app for transparency (once the container is declared under `NSUbiquitousContainers` in Info.plist)
|
||||
- No server costs, no sync code to maintain
|
||||
|
||||
### Implementation: iCloud-First with Local Fallback
|
||||
|
||||
```swift
|
||||
// Get the iCloud Documents container (this lookup can block, so avoid calling it on the main thread)
|
||||
func iCloudDocumentsURL() -> URL? {
|
||||
FileManager.default.url(forUbiquityContainerIdentifier: nil)?
|
||||
.appendingPathComponent("Documents")
|
||||
}
|
||||
|
||||
// Your shared workspace lives in iCloud
|
||||
class SharedWorkspace {
|
||||
let rootURL: URL
|
||||
|
||||
init() {
|
||||
// Use iCloud if available, fall back to local
|
||||
if let iCloudURL = iCloudDocumentsURL() {
|
||||
self.rootURL = iCloudURL
|
||||
} else {
|
||||
// Fallback to local Documents (user not signed into iCloud)
|
||||
self.rootURL = FileManager.default.urls(
|
||||
for: .documentDirectory,
|
||||
in: .userDomainMask
|
||||
).first!
|
||||
}
|
||||
}
|
||||
|
||||
// All file operations go through this root
|
||||
func researchPath(for bookId: String) -> URL {
|
||||
rootURL.appendingPathComponent("Research/\(bookId)")
|
||||
}
|
||||
|
||||
func journalPath() -> URL {
|
||||
rootURL.appendingPathComponent("Journal")
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Directory Structure in iCloud
|
||||
|
||||
```
|
||||
iCloud Drive/
|
||||
└── YourApp/ # Your app's container
|
||||
└── Documents/ # Visible in Files.app
|
||||
├── Journal/
|
||||
│ ├── user/
|
||||
│ │ └── 2025-01-15.md # Syncs across devices
|
||||
│ └── agent/
|
||||
│ └── 2025-01-15.md # Agent observations sync too
|
||||
├── Research/
|
||||
│ └── {bookId}/
|
||||
│ ├── full_text.txt
|
||||
│ └── sources/
|
||||
├── Chats/
|
||||
│ └── {conversationId}.json
|
||||
└── context.md # Agent's accumulated knowledge
|
||||
```
|
||||
|
||||
### Handling iCloud File States
|
||||
|
||||
iCloud files may not be downloaded locally. Handle this:
|
||||
|
||||
```swift
|
||||
func readFile(at url: URL) throws -> String {
|
||||
// iCloud may create .icloud placeholder files
|
||||
if url.pathExtension == "icloud" {
|
||||
// Trigger download
|
||||
try FileManager.default.startDownloadingUbiquitousItem(at: url)
|
||||
throw FileNotYetAvailableError()
|
||||
}
|
||||
|
||||
return try String(contentsOf: url, encoding: .utf8)
|
||||
}
|
||||
|
||||
// For writes, use coordinated file access
|
||||
func writeFile(_ content: String, to url: URL) throws {
    let coordinator = NSFileCoordinator()
    var coordinationError: NSError?
    var writeError: Error?

    coordinator.coordinate(
        writingItemAt: url,
        options: .forReplacing,
        error: &coordinationError
    ) { newURL in
        // Capture write failures instead of silently discarding them
        do {
            try content.write(to: newURL, atomically: true, encoding: .utf8)
        } catch {
            writeError = error
        }
    }

    // Surface either a coordination failure or a write failure to the caller
    if let error = coordinationError { throw error }
    if let error = writeError { throw error }
}
|
||||
```
|
||||
|
||||
### What iCloud Enables
|
||||
|
||||
1. **User starts experiment on iPhone** → Agent creates config file
|
||||
2. **User opens app on iPad** → Same experiment visible, no sync code needed
|
||||
3. **Agent logs observation on iPhone** → Syncs to iPad automatically
|
||||
4. **User edits journal on iPad** → iPhone sees the edit
|
||||
|
||||
### Entitlements Required
|
||||
|
||||
Add to your app's entitlements:
|
||||
|
||||
```xml
|
||||
<key>com.apple.developer.icloud-container-identifiers</key>
|
||||
<array>
|
||||
<string>iCloud.com.yourcompany.yourapp</string>
|
||||
</array>
|
||||
<key>com.apple.developer.icloud-services</key>
|
||||
<array>
|
||||
<string>CloudDocuments</string>
|
||||
</array>
|
||||
<key>com.apple.developer.ubiquity-container-identifiers</key>
|
||||
<array>
|
||||
<string>iCloud.com.yourcompany.yourapp</string>
|
||||
</array>
|
||||
```
|
||||
|
||||
### When NOT to Use iCloud Documents
|
||||
|
||||
- **Sensitive data** - Use Keychain or encrypted local storage instead
|
||||
- **High-frequency writes** - iCloud sync has latency; use local + periodic sync
|
||||
- **Large media files** - Consider CloudKit Assets or on-demand resources
|
||||
- **Shared between users** - iCloud Documents is single-user; use CloudKit for sharing
|
||||
</ios_storage>
|
||||
|
||||
<background_execution>
|
||||
## Background Execution & Resumption
|
||||
|
||||
> **Needs validation:** These patterns work but better solutions may exist.
|
||||
|
||||
Mobile apps can be suspended or terminated at any time. Agents must handle this gracefully.
|
||||
|
||||
### The Challenge
|
||||
@@ -623,13 +801,48 @@ class AgentOrchestrator {
|
||||
```
|
||||
</battery_awareness>
|
||||
|
||||
<on_device_vs_cloud>
|
||||
## On-Device vs. Cloud
|
||||
|
||||
Understanding what runs where in a mobile agent-native app:
|
||||
|
||||
| Component | On-Device | Cloud |
|
||||
|-----------|-----------|-------|
|
||||
| Orchestration | ✅ | |
|
||||
| Tool execution | ✅ (file ops, photo access, HealthKit) | |
|
||||
| LLM calls | | ✅ (Anthropic API) |
|
||||
| Checkpoints | ✅ (local files) | Optional via iCloud |
|
||||
| Long-running agents | Limited by iOS | Possible with server |
|
||||
|
||||
### Implications
|
||||
|
||||
**Network required for reasoning:**
|
||||
- The app needs network connectivity for LLM calls
|
||||
- Design tools to degrade gracefully when network is unavailable
|
||||
- Consider offline caching for common queries
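
The graceful-degradation and caching points above can be sketched as a thin wrapper around the LLM call. This is a minimal sketch under stated assumptions: `askWithFallback`, the `LlmCall` type, and the exact-match prompt cache are hypothetical names for illustration, and the real call would go through whatever API client the app uses.

```typescript
// Hypothetical sketch: answer from a local cache when the LLM call fails.
type LlmCall = (prompt: string) => Promise<string>;
type Answer = { text: string; source: "network" | "cache" | "unavailable" };

// Exact-match cache keyed by prompt (assumption: good enough for common queries).
const answerCache = new Map<string, string>();

async function askWithFallback(prompt: string, callLlm: LlmCall): Promise<Answer> {
  try {
    const text = await callLlm(prompt);
    answerCache.set(prompt, text); // remember successful answers for offline use
    return { text, source: "network" };
  } catch {
    // Network or API failure: degrade to the cache, then to a clear message.
    const cached = answerCache.get(prompt);
    if (cached !== undefined) return { text: cached, source: "cache" };
    return { text: "I need a network connection to answer that.", source: "unavailable" };
  }
}
```

Tagging each answer with its `source` lets the UI tell the user when they are seeing a cached response instead of a fresh one.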
|
||||
|
||||
**Data stays local:**
|
||||
- File operations happen on device
|
||||
- Files stay on the device unless explicitly synced (for example via iCloud) or included in an LLM request
|
||||
- Privacy is preserved by default
|
||||
|
||||
**Long-running agents:**
|
||||
For truly long-running agents (hours), consider a server-side orchestrator that can run indefinitely, with the mobile app as a viewer and input mechanism.
|
||||
</on_device_vs_cloud>
|
||||
|
||||
<checklist>
|
||||
## Mobile Agent-Native Checklist
|
||||
|
||||
**iOS Storage:**
|
||||
- [ ] iCloud Documents as primary storage (or conscious alternative)
|
||||
- [ ] Local Documents fallback when iCloud unavailable
|
||||
- [ ] Handle `.icloud` placeholder files (trigger download)
|
||||
- [ ] Use NSFileCoordinator for conflict-safe writes
|
||||
|
||||
**Background Execution:**
|
||||
- [ ] Checkpoint/resume implemented for all agent sessions
|
||||
- [ ] State machine for agent lifecycle (idle, running, backgrounded, etc.)
|
||||
- [ ] Background task extension for critical saves (30 second window)
|
||||
- [ ] User-visible status for backgrounded agents
|
||||
|
||||
**Permissions:**
|
||||
|
||||
@@ -0,0 +1,443 @@
|
||||
<overview>
|
||||
Agent-native architecture has consequences for how products feel, not just how they're built. This document covers progressive disclosure of complexity, discovering latent demand through agent usage, and designing approval flows that match stakes and reversibility.
|
||||
</overview>
|
||||
|
||||
<progressive_disclosure>
|
||||
## Progressive Disclosure of Complexity
|
||||
|
||||
The best agent-native applications are simple to start but endlessly powerful.
|
||||
|
||||
### The Excel Analogy
|
||||
|
||||
Excel is the canonical example: you can use it for a grocery list, or you can build complex financial models. The same tool, radically different depths of use.
|
||||
|
||||
Claude Code has this quality: fix a typo, or refactor an entire codebase. The interface is the same—natural language—but the capability scales with the ask.
|
||||
|
||||
### The Pattern
|
||||
|
||||
Agent-native applications should aspire to this:
|
||||
|
||||
**Simple entry:** Basic requests work immediately with no learning curve
|
||||
```
|
||||
User: "Organize my downloads"
|
||||
Agent: [Does it immediately, no configuration needed]
|
||||
```
|
||||
|
||||
**Discoverable depth:** Users find they can do more as they explore
|
||||
```
|
||||
User: "Organize my downloads by project"
|
||||
Agent: [Adapts to preference]
|
||||
|
||||
User: "Every Monday, review last week's downloads"
|
||||
Agent: [Sets up recurring workflow]
|
||||
```
|
||||
|
||||
**No ceiling:** Power users can push the system in ways you didn't anticipate
|
||||
```
|
||||
User: "Cross-reference my downloads with my calendar and flag
|
||||
anything I downloaded during a meeting that I haven't
|
||||
followed up on"
|
||||
Agent: [Composes capabilities to accomplish this]
|
||||
```
|
||||
|
||||
### How This Emerges
|
||||
|
||||
This isn't something you design directly. It **emerges naturally from the architecture:**
|
||||
|
||||
1. When features are prompts and tools are composable...
|
||||
2. Users can start simple ("organize my downloads")...
|
||||
3. And gradually discover complexity ("every Monday, review last week's...")...
|
||||
4. Without you having to build each level explicitly
|
||||
|
||||
The agent meets users where they are.
|
||||
|
||||
### Design Implications
|
||||
|
||||
- **Don't force configuration upfront** - Let users start immediately
|
||||
- **Don't hide capabilities** - Make them discoverable through use
|
||||
- **Don't cap complexity** - If the agent can do it, let users ask for it
|
||||
- **Do provide hints** - Help users discover what's possible
|
||||
</progressive_disclosure>
|
||||
|
||||
<latent_demand_discovery>
|
||||
## Latent Demand Discovery
|
||||
|
||||
Traditional product development: imagine what users want, build it, see if you're right.
|
||||
|
||||
Agent-native product development: build a capable foundation, observe what users ask the agent to do, formalize the patterns that emerge.
|
||||
|
||||
### The Shift
|
||||
|
||||
**Traditional approach:**
|
||||
```
|
||||
1. Imagine features users might want
|
||||
2. Build them
|
||||
3. Ship
|
||||
4. Hope you guessed right
|
||||
5. If wrong, rebuild
|
||||
```
|
||||
|
||||
**Agent-native approach:**
|
||||
```
|
||||
1. Build capable foundation (atomic tools, parity)
|
||||
2. Ship
|
||||
3. Users ask agent for things
|
||||
4. Observe what they're asking for
|
||||
5. Patterns emerge
|
||||
6. Formalize patterns into domain tools or prompts
|
||||
7. Repeat
|
||||
```
|
||||
|
||||
### The Flywheel
|
||||
|
||||
```
|
||||
Build with atomic tools and parity
|
||||
↓
|
||||
Users ask for things you didn't anticipate
|
||||
↓
|
||||
Agent composes tools to accomplish them
|
||||
(or fails, revealing a capability gap)
|
||||
↓
|
||||
You observe patterns in what's being requested
|
||||
↓
|
||||
Add domain tools or prompts to optimize common patterns
|
||||
↓
|
||||
(Repeat)
|
||||
```
|
||||
|
||||
### What You Learn
|
||||
|
||||
**When users ask and the agent succeeds:**
|
||||
- This is a real need
|
||||
- Your architecture supports it
|
||||
- Consider optimizing with a domain tool if it's common
|
||||
|
||||
**When users ask and the agent fails:**
|
||||
- This is a real need
|
||||
- You have a capability gap
|
||||
- Fix the gap: add tool, fix parity, improve context
|
||||
|
||||
**When users don't ask for something:**
|
||||
- Maybe they don't need it
|
||||
- Or maybe they don't know it's possible (capability hiding)
|
||||
|
||||
### Implementation
|
||||
|
||||
**Log agent requests:**
|
||||
```typescript
|
||||
async function handleAgentRequest(request: string) {
|
||||
// Log what users are asking for
|
||||
await analytics.log({
|
||||
type: 'agent_request',
|
||||
request: request,
|
||||
timestamp: Date.now(),
|
||||
});
|
||||
|
||||
// Process request...
|
||||
}
|
||||
```
|
||||
|
||||
**Track success/failure:**
|
||||
```typescript
|
||||
async function completeAgentSession(session: AgentSession) {
|
||||
await analytics.log({
|
||||
type: 'agent_session',
|
||||
request: session.initialRequest,
|
||||
succeeded: session.status === 'completed',
|
||||
toolsUsed: session.toolCalls.map(t => t.name),
|
||||
iterations: session.iterationCount,
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
**Review patterns:**
|
||||
- What are users asking for most?
|
||||
- What's failing? Why?
|
||||
- What would benefit from a domain tool?
|
||||
- What needs better context injection?
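
The review questions above can be answered with a small aggregation over the logged sessions. This is a minimal sketch, assuming records shaped like the `agent_session` entries logged above; `SessionRecord`, `normalize`, and `summarizeRequests` are hypothetical names, and the grouping heuristic is deliberately crude.

```typescript
// Hypothetical record shape mirroring the agent_session log entry.
interface SessionRecord {
  request: string;
  succeeded: boolean;
  toolsUsed: string[];
  iterations: number;
}

interface RequestStats {
  request: string;
  count: number;
  failureRate: number;
  avgIterations: number;
}

// Crude grouping key: lowercase and collapse whitespace. Real grouping would
// likely need clustering or manual review; this is only a starting point.
function normalize(request: string): string {
  return request.toLowerCase().trim().replace(/\s+/g, " ");
}

// Groups sessions by normalized request and returns the most common first.
function summarizeRequests(sessions: SessionRecord[]): RequestStats[] {
  const totals = new Map<string, { count: number; failures: number; iterations: number }>();
  for (const s of sessions) {
    const key = normalize(s.request);
    const t = totals.get(key) ?? { count: 0, failures: 0, iterations: 0 };
    t.count += 1;
    if (!s.succeeded) t.failures += 1;
    t.iterations += s.iterations;
    totals.set(key, t);
  }
  const stats: RequestStats[] = [];
  totals.forEach((t, request) => {
    stats.push({
      request,
      count: t.count,
      failureRate: t.failures / t.count,
      avgIterations: t.iterations / t.count,
    });
  });
  return stats.sort((a, b) => b.count - a.count);
}
```

Frequent requests with a high `failureRate` point to capability gaps, while frequent requests with a high `avgIterations` are candidates for a domain tool.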
|
||||
|
||||
### Example: Discovering "Weekly Review"
|
||||
|
||||
```
|
||||
Week 1: Users start asking "summarize my activity this week"
|
||||
Agent: Composes list_files + read_file, works but slow
|
||||
|
||||
Week 2: More users asking similar things
|
||||
Pattern emerges: weekly review is common
|
||||
|
||||
Week 3: Add prompt section for weekly review
|
||||
Faster, more consistent, still flexible
|
||||
|
||||
Week 4: If still common and performance matters
|
||||
Add domain tool: generate_weekly_summary
|
||||
```
|
||||
|
||||
You didn't have to guess that weekly review would be popular. You discovered it.
|
||||
</latent_demand_discovery>
|
||||
|
||||
<approval_and_agency>
|
||||
## Approval and User Agency
|
||||
|
||||
When agents take unsolicited actions—doing things on their own rather than responding to explicit requests—you need to decide how much autonomy to grant.
|
||||
|
||||
> **Note:** This framework applies to unsolicited agent actions. If the user explicitly asks the agent to do something ("send that email"), that's already approval—the agent just does it.
|
||||
|
||||
### The Stakes/Reversibility Matrix
|
||||
|
||||
Consider two dimensions:
|
||||
- **Stakes:** How much does it matter if this goes wrong?
|
||||
- **Reversibility:** How easy is it to undo?
|
||||
|
||||
| Stakes | Reversibility | Pattern | Example |
|
||||
|--------|---------------|---------|---------|
|
||||
| Low | Easy | **Auto-apply** | Organizing files |
|
||||
| Low | Hard | **Quick confirm** | Publishing to a private feed |
|
||||
| High | Easy | **Suggest + apply** | Code changes with undo |
|
||||
| High | Hard | **Explicit approval** | Sending emails, payments |
|
||||
|
||||
### Patterns in Detail
|
||||
|
||||
**Auto-apply (low stakes, easy reversal):**
|
||||
```
|
||||
Agent: [Organizes files into folders]
|
||||
Agent: "I organized your downloads into folders by type.
|
||||
You can undo with Cmd+Z or move them back."
|
||||
```
|
||||
User doesn't need to approve—it's easy to undo and doesn't matter much.
|
||||
|
||||
**Quick confirm (low stakes, hard reversal):**
|
||||
```
|
||||
Agent: "I've drafted a post about your reading insights.
|
||||
Publish to your feed?"
|
||||
[Publish] [Edit first] [Cancel]
|
||||
```
|
||||
One-tap confirm because stakes are low, but it's hard to un-publish.
|
||||
|
||||
**Suggest + apply (high stakes, easy reversal):**
|
||||
```
|
||||
Agent: "I recommend these code changes to fix the bug:
|
||||
[Shows diff]
|
||||
Apply? Changes can be reverted with git."
|
||||
[Apply] [Modify] [Cancel]
|
||||
```
|
||||
Shows what will happen, makes reversal clear.
|
||||
|
||||
**Explicit approval (high stakes, hard reversal):**
|
||||
```
|
||||
Agent: "I've drafted this email to your team about the deadline change:
|
||||
[Shows full email]
|
||||
This will send immediately and cannot be unsent.
|
||||
Type 'send' to confirm."
|
||||
```
|
||||
Requires explicit action, makes consequences clear.
|
||||
|
||||
### Implementation
|
||||
|
||||
```swift
|
||||
enum ApprovalLevel {
|
||||
case autoApply // Just do it
|
||||
case quickConfirm // One-tap approval
|
||||
case suggestApply // Show preview, ask to apply
|
||||
case explicitApproval // Require explicit confirmation
|
||||
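}

// Supporting types assumed by the functions below
enum Stakes { case low, high }
enum Reversibility { case easy, hard }
enum AgentAction { case organizeFiles, publishToFeed, modifyCode, sendEmail, makePayment }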
|
||||
|
||||
func approvalLevelFor(action: AgentAction) -> ApprovalLevel {
|
||||
let stakes = assessStakes(action)
|
||||
let reversibility = assessReversibility(action)
|
||||
|
||||
switch (stakes, reversibility) {
|
||||
case (.low, .easy): return .autoApply
|
||||
case (.low, .hard): return .quickConfirm
|
||||
case (.high, .easy): return .suggestApply
|
||||
case (.high, .hard): return .explicitApproval
|
||||
}
|
||||
}
|
||||
|
||||
func assessStakes(_ action: AgentAction) -> Stakes {
|
||||
switch action {
|
||||
case .organizeFiles: return .low
|
||||
case .publishToFeed: return .low
|
||||
case .modifyCode: return .high
|
||||
case .sendEmail: return .high
|
||||
case .makePayment: return .high
|
||||
}
|
||||
}
|
||||
|
||||
func assessReversibility(_ action: AgentAction) -> Reversibility {
|
||||
switch action {
|
||||
case .organizeFiles: return .easy // Can move back
|
||||
case .publishToFeed: return .hard // People might see it
|
||||
case .modifyCode: return .easy // Git revert
|
||||
case .sendEmail: return .hard // Can't unsend
|
||||
case .makePayment: return .hard // Money moved
|
||||
}
|
||||
}
|
||||
```

### Self-Modification Considerations

When agents can modify their own behavior (changing prompts, updating preferences, adjusting workflows), the goals are:

1. **Visibility:** User can see what changed
2. **Understanding:** User understands the effects
3. **Rollback:** User can undo changes

Approval flows are one way to achieve this. Audit logs with easy rollback could be another. **The principle is: make it legible.**

```swift
// When agent modifies its own prompt
func agentSelfModify(change: PromptChange) async {
    // Log the change
    await auditLog.record(change)

    // Create checkpoint for rollback
    await createCheckpoint(currentState)

    // Notify user (could be async/batched)
    await notifyUser("I've adjusted my approach: \(change.summary)")

    // Apply change
    await applyChange(change)
}
```
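
The audit-log-with-rollback alternative mentioned above could look roughly like the following. This is a minimal sketch under stated assumptions, not a prescribed implementation: `PromptAuditLog` and `AuditEntry` are hypothetical names, `PromptChange` is given a hypothetical shape here, and the agent's modifiable state is reduced to a single prompt string.

```swift
import Foundation

// Hypothetical types for illustration; not part of any real framework.
struct PromptChange {
    let summary: String
    let newPrompt: String
}

struct AuditEntry {
    let id: Int
    let summary: String
    let previousPrompt: String  // Checkpoint taken before the change was applied
    let date: Date
}

final class PromptAuditLog {
    private(set) var currentPrompt: String
    private(set) var entries: [AuditEntry] = []

    init(initialPrompt: String) {
        self.currentPrompt = initialPrompt
    }

    // Record a checkpoint of the current prompt, then apply the change.
    @discardableResult
    func apply(_ change: PromptChange) -> AuditEntry {
        let entry = AuditEntry(
            id: entries.count + 1,
            summary: change.summary,
            previousPrompt: currentPrompt,
            date: Date()
        )
        entries.append(entry)
        currentPrompt = change.newPrompt
        return entry
    }

    // Restore the prompt as it was before the given entry,
    // discarding that entry and every later one.
    // Returns false if no entry with that id exists.
    @discardableResult
    func rollback(to entryID: Int) -> Bool {
        guard let index = entries.firstIndex(where: { $0.id == entryID }) else {
            return false
        }
        currentPrompt = entries[index].previousPrompt
        entries.removeSubrange(index...)
        return true
    }
}
```

Because each entry stores the prompt it replaced, the same list can back both a "what changed" view (visibility) and an undo button (rollback). A real app would likely persist the entries to a file the user can inspect.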
</approval_and_agency>

<capability_visibility>
## Capability Visibility

Users need to discover what the agent can do. Hidden capabilities lead to underutilization.

### The Problem

```
User: "Help me with my reading"
Agent: "What would you like help with?"
// Agent doesn't mention it can publish to feed, research books,
// generate introductions, analyze themes...
```

The agent can do these things, but the user doesn't know.

### Solutions

**Onboarding hints:**
```
Agent: "I can help you with your reading in several ways:
- Research any book (web search + save findings)
- Generate personalized introductions
- Publish insights to your reading feed
- Analyze themes across your library
What interests you?"
```

**Contextual suggestions:**
```
User: "I just finished reading 1984"
Agent: "Great choice! Would you like me to:
- Research historical context?
- Compare it to other books in your library?
- Publish an insight about it to your feed?"
```

**Progressive revelation:**
```
// After user uses basic features
Agent: "By the way, you can also ask me to set up
recurring tasks, like 'every Monday, review my
reading progress.' Just let me know!"
```

### Balance

- **Don't overwhelm** with all capabilities upfront
- **Do reveal** capabilities naturally through use
- **Don't assume** users will discover things on their own
- **Do make** capabilities visible when relevant
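
One way to balance these points in code is to pick hints by relevance, cap how many appear per turn, and never repeat one. The sketch below is illustrative only: `CapabilityHint` and `HintSelector` are hypothetical names, and simple keyword matching stands in for whatever relevance signal a real app would use (in practice the agent itself could choose the hints from context).

```swift
import Foundation

// Hypothetical types for illustration.
struct CapabilityHint {
    let id: String
    let text: String
    let keywords: [String]  // Words in a user message that make this hint relevant
}

struct HintSelector {
    let hints: [CapabilityHint]
    let maxPerTurn: Int
    private(set) var shownIDs: Set<String> = []

    init(hints: [CapabilityHint], maxPerTurn: Int = 2) {
        self.hints = hints
        self.maxPerTurn = maxPerTurn
    }

    // Return relevant hints that have not been shown yet,
    // capped at maxPerTurn so the user isn't overwhelmed.
    mutating func suggestions(for message: String) -> [CapabilityHint] {
        let lowered = message.lowercased()
        let relevant = hints.filter { hint in
            !shownIDs.contains(hint.id)
                && hint.keywords.contains { lowered.contains($0.lowercased()) }
        }
        let selected = Array(relevant.prefix(maxPerTurn))
        shownIDs.formUnion(selected.map { $0.id })
        return selected
    }
}
```

Tracking `shownIDs` is what makes the revelation progressive: each capability is surfaced once, when it is relevant, instead of being listed upfront or repeated on every turn.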
</capability_visibility>

<designing_for_trust>
## Designing for Trust

Agent-native apps require trust. Users are giving an AI significant capability. Build trust through:

### Transparency

- Show what the agent is doing (tool calls, progress)
- Explain reasoning when it matters
- Make all agent work inspectable (files, logs)

### Predictability

- Consistent behavior for similar requests
- Clear patterns for when approval is needed
- No surprises in what the agent can access

### Reversibility

- Easy undo for agent actions
- Checkpoints before significant changes
- Clear rollback paths

### Control

- User can stop agent at any time
- User can adjust agent behavior (prompts, preferences)
- User can restrict capabilities if desired

### Implementation

```swift
struct AgentTransparency {
    // Show what's happening
    func onToolCall(_ tool: ToolCall) {
        showInUI("Using \(tool.name)...")
    }

    // Explain reasoning
    func onDecision(_ decision: AgentDecision) {
        if decision.needsExplanation {
            showInUI("I chose this because: \(decision.reasoning)")
        }
    }

    // Make work inspectable
    func onOutput(_ output: AgentOutput) {
        // All output is in files user can see
        // Or in visible UI state
    }
}
```
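
The Control points (stop at any time, restrict capabilities) can be sketched in a similar spirit. Everything here is a hypothetical illustration: `AgentControls` and `ToolDecision` are assumed names, and the check is assumed to run before every tool call in the agent loop.

```swift
// Hypothetical types for illustration.
enum ToolDecision: Equatable {
    case allowed
    case blocked(reason: String)
}

final class AgentControls {
    private(set) var isStopped = false
    private(set) var disabledTools: Set<String> = []

    // User can stop (and later resume) the agent at any time.
    func stop() { isStopped = true }
    func resume() { isStopped = false }

    // User can restrict or restore individual capabilities.
    func disable(tool name: String) { disabledTools.insert(name) }
    func enable(tool name: String) { disabledTools.remove(name) }

    // Intended to be checked before every tool call.
    func authorize(toolNamed name: String) -> ToolDecision {
        if isStopped {
            return .blocked(reason: "Agent was stopped by the user")
        }
        if disabledTools.contains(name) {
            return .blocked(reason: "\(name) is disabled in settings")
        }
        return .allowed
    }
}
```

Returning a reason with each blocked decision keeps the control legible: the agent can tell the user why a step was skipped instead of failing silently.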
</designing_for_trust>

<checklist>
## Product Design Checklist

### Progressive Disclosure
- [ ] Basic requests work immediately (no config)
- [ ] Depth is discoverable through use
- [ ] No artificial ceiling on complexity
- [ ] Capability hints provided

### Latent Demand Discovery
- [ ] Agent requests are logged
- [ ] Success/failure is tracked
- [ ] Patterns are reviewed regularly
- [ ] Common patterns formalized into tools/prompts

### Approval & Agency
- [ ] Stakes assessed for each action type
- [ ] Reversibility assessed for each action type
- [ ] Approval pattern matches stakes/reversibility
- [ ] Self-modification is legible (visible, understandable, reversible)

### Capability Visibility
- [ ] Onboarding reveals key capabilities
- [ ] Contextual suggestions provided
- [ ] Users aren't expected to guess what's possible

### Trust
- [ ] Agent actions are transparent
- [ ] Behavior is predictable
- [ ] Actions are reversible
- [ ] User has control
</checklist>