[2.23.0] Major update to agent-native-architecture skill (#70)

Align skill with canonical Agent-Native Architecture document:

## Core Changes
- Restructure SKILL.md around the 5 named principles from the canonical doc:
  - Parity: Agent can do whatever user can do
  - Granularity: Prefer atomic primitives
  - Composability: Features are prompts
  - Emergent Capability: Handle unanticipated requests
  - Improvement Over Time: Context accumulation

- Add "The test" for each principle
- Add "Why Now" section (Claude Code origin story)
- Update terminology from "prompt-native" to "agent-native"
- Add "The Ultimate Test" to success criteria

## New Reference Files
- files-universal-interface.md: Why files, organization patterns, context.md pattern, conflict model
- from-primitives-to-domain-tools.md: When to add domain tools, graduating to code
- agent-execution-patterns.md: Completion signals, partial completion, context limits
- product-implications.md: Progressive disclosure, latent demand discovery, approval matrix

## Updated Reference Files
- mobile-patterns.md: Add iOS storage architecture (iCloud-first), "needs validation" callouts, on-device vs cloud section
- architecture-patterns.md: Update overview to reference 5 principles and cross-link new files

## Anti-Patterns
- Add missing anti-patterns: agent as router, build-then-add-agent, request/response thinking, defensive tool design, happy path in code

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
Dan Shipper committed 2026-01-07 11:50:58 -05:00 (committed by GitHub)
parent be30002bbe · commit 68aa93678c
7 changed files with 2098 additions and 218 deletions

View File

@@ -0,0 +1,467 @@
<overview>
Agent execution patterns for building robust agent loops. This covers how agents signal completion, track partial progress for resume, select appropriate model tiers, and handle context limits.
</overview>
<completion_signals>
## Completion Signals
Agents need an explicit way to say "I'm done."
### Anti-Pattern: Heuristic Detection
Detecting completion through heuristics is fragile:
- Consecutive iterations without tool calls
- Checking for expected output files
- Tracking "no progress" states
- Time-based timeouts
These break in edge cases and create unpredictable behavior.
### Pattern: Explicit Completion Tool
Provide a `complete_task` tool that:
- Takes a summary of what was accomplished
- Returns a signal that stops the loop
- Works identically across all agent types
```typescript
tool("complete_task", {
summary: z.string().describe("Summary of what was accomplished"),
status: z.enum(["success", "partial", "blocked"]).optional(),
}, async ({ summary, status = "success" }) => {
return {
text: summary,
shouldContinue: false, // Key: signals loop should stop
};
});
```
### The ToolResult Pattern
Structure tool results to separate success from continuation:
```swift
struct ToolResult {
    let success: Bool        // Did the tool succeed?
    let output: String       // What happened?
    let shouldContinue: Bool // Should the agent loop continue?
}

// Three common cases:
extension ToolResult {
    static func success(_ output: String) -> ToolResult {
        // Tool succeeded, keep going
        ToolResult(success: true, output: output, shouldContinue: true)
    }

    static func error(_ message: String) -> ToolResult {
        // Tool failed but recoverable; the agent can try something else
        ToolResult(success: false, output: message, shouldContinue: true)
    }

    static func complete(_ summary: String) -> ToolResult {
        // Task done, stop the loop
        ToolResult(success: true, output: summary, shouldContinue: false)
    }
}
```
### Key Insight
**This is different from success/failure:**
- A tool can **succeed** AND signal **stop** (task complete)
- A tool can **fail** AND signal **continue** (recoverable error, try something else)
```typescript
// Examples:
read_file("/missing.txt")
// → { success: false, output: "File not found", shouldContinue: true }
// Agent can try a different file or ask for clarification
complete_task("Organized all downloads into folders")
// → { success: true, output: "...", shouldContinue: false }
// Agent is done
write_file("/output.md", content)
// → { success: true, output: "Wrote file", shouldContinue: true }
// Agent keeps working toward the goal
```
### System Prompt Guidance
Tell the agent when to complete:
```markdown
## Completing Tasks
When you've accomplished the user's request:
1. Verify your work (read back files you created, check results)
2. Call `complete_task` with a summary of what you did
3. Don't keep working after the goal is achieved
If you're blocked and can't proceed:
- Call `complete_task` with status "blocked" and explain why
- Don't loop forever trying the same thing
```
</completion_signals>
<partial_completion>
## Partial Completion
For multi-step tasks, track progress at the task level for resume capability.
### Task State Tracking
```swift
enum TaskStatus {
    case pending    // Not yet started
    case inProgress // Currently working on
    case completed  // Finished successfully
    case failed     // Couldn't complete (with reason)
    case skipped    // Intentionally not done
}

struct AgentTask {
    let id: String
    let description: String
    var status: TaskStatus
    var notes: String? // Why it failed, what was done
}

struct AgentSession {
    var tasks: [AgentTask]

    var isComplete: Bool {
        tasks.allSatisfy { $0.status == .completed || $0.status == .skipped }
    }

    var progress: (completed: Int, total: Int) {
        let done = tasks.filter { $0.status == .completed }.count
        return (done, tasks.count)
    }
}
```
### UI Progress Display
Show users what's happening:
```
Progress: 3/5 tasks complete (60%)
✅ [1] Find source materials
✅ [2] Download full text
✅ [3] Extract key passages
❌ [4] Generate summary - Error: context limit exceeded
⏳ [5] Create outline - Pending
```
### Partial Completion Scenarios
**Agent hits max iterations before finishing:**
- Some tasks completed, some pending
- Checkpoint saved with current state
- Resume continues from where it left off, not from beginning
**Agent fails on one task:**
- Task marked `.failed` with error in notes
- Other tasks may continue (agent decides)
- Orchestrator doesn't automatically abort entire session
**Network error mid-task:**
- Current iteration throws
- Session marked `.failed`
- Checkpoint preserves messages up to that point
- Resume possible from checkpoint
### Checkpoint Structure
```swift
struct AgentCheckpoint: Codable {
    let sessionId: String
    let agentType: String
    let messages: [Message]           // Full conversation history
    let iterationCount: Int
    let tasks: [AgentTask]            // Task state
    let customState: [String: String] // Agent-specific state ([String: Any] isn't Codable)
    let timestamp: Date

    var isValid: Bool {
        // Checkpoints expire (default 1 hour)
        Date().timeIntervalSince(timestamp) < 3600
    }
}
```
### Resume Flow
1. On app launch, scan for valid checkpoints
2. Show user: "You have an incomplete session. Resume?"
3. On resume:
- Restore messages to conversation
- Restore task states
- Continue agent loop from where it left off
4. On dismiss:
- Delete checkpoint
- Start fresh if user tries again
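
A minimal sketch of this launch-time flow, using the `AgentCheckpoint` type above. `CheckpointStore`, `promptUser`, and `resume(from:)` are hypothetical names standing in for your own persistence and UI plumbing:

```swift
// Sketch only: CheckpointStore, promptUser, and resume(from:) are
// placeholders for app-specific persistence and UI code.
func checkForResumableSession() async {
    // 1. Scan for valid (non-expired) checkpoints on launch
    guard let checkpoint = CheckpointStore.loadAll().first(where: { $0.isValid }) else {
        return
    }

    // 2. Ask the user whether to resume
    if await promptUser("You have an incomplete session. Resume?") {
        // 3. Restore messages and task state, continue the agent loop
        await AgentOrchestrator.shared.resume(from: checkpoint)
    } else {
        // 4. Dismissed: delete the checkpoint so the next run starts fresh
        CheckpointStore.delete(sessionId: checkpoint.sessionId)
    }
}
```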
</partial_completion>
<model_tier_selection>
## Model Tier Selection
Different agents need different intelligence levels. Use the cheapest model that achieves the outcome.
### Tier Guidelines
| Agent Type | Recommended Tier | Reasoning |
|------------|-----------------|-----------|
| Chat/Conversation | Balanced (Sonnet) | Fast responses, good reasoning |
| Research | Balanced (Sonnet) | Tool loops, not ultra-complex synthesis |
| Content Generation | Balanced (Sonnet) | Creative but not synthesis-heavy |
| Complex Analysis | Powerful (Opus) | Multi-document synthesis, nuanced judgment |
| Profile Generation | Powerful (Opus) | Photo analysis, complex pattern recognition |
| Quick Queries | Fast (Haiku) | Simple lookups, quick transformations |
| Simple Classification | Fast (Haiku) | High volume, simple decisions |
### Implementation
```swift
enum ModelTier {
    case fast     // claude-3-haiku: Quick, cheap, simple tasks
    case balanced // claude-sonnet: Good balance for most tasks
    case powerful // claude-opus: Complex reasoning, synthesis

    var modelId: String {
        switch self {
        case .fast: return "claude-3-haiku-20240307"
        case .balanced: return "claude-sonnet-4-20250514"
        case .powerful: return "claude-opus-4-20250514"
        }
    }
}

struct AgentConfig {
    let name: String
    let modelTier: ModelTier
    let tools: [AgentTool]
    let systemPrompt: String
    let maxIterations: Int
}

// Examples
let researchConfig = AgentConfig(
    name: "research",
    modelTier: .balanced,
    tools: researchTools,
    systemPrompt: researchPrompt,
    maxIterations: 20
)

let quickLookupConfig = AgentConfig(
    name: "lookup",
    modelTier: .fast,
    tools: [readLibrary],
    systemPrompt: "Answer quick questions about the user's library.",
    maxIterations: 3
)
```
### Cost Optimization Strategies
1. **Start with balanced, upgrade if quality insufficient** (see the sketch after this list)
2. **Use fast tier for tool-heavy loops** where each turn is simple
3. **Reserve powerful tier for synthesis tasks** (comparing multiple sources)
4. **Consider token limits per turn** to control costs
5. **Cache expensive operations** to avoid repeated calls
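
One way to encode strategy 1 in code, as a sketch: `runAgent` and `meetsQualityBar` are placeholders for your own orchestrator entry point and whatever quality evaluation the app already has.

```swift
// Sketch: try the balanced tier first; escalate to the powerful tier
// only when the result fails a quality check.
func runWithEscalation(config: AgentConfig, userMessage: String) async -> AgentResult {
    let first = await runAgent(config: config, userMessage: userMessage)
    if meetsQualityBar(first) { return first }

    // Escalate: same agent, more capable model
    let upgraded = AgentConfig(
        name: config.name,
        modelTier: .powerful,
        tools: config.tools,
        systemPrompt: config.systemPrompt,
        maxIterations: config.maxIterations
    )
    return await runAgent(config: upgraded, userMessage: userMessage)
}
```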
</model_tier_selection>
<context_limits>
## Context Limits
Agent sessions can extend indefinitely, but context windows don't. Design for bounded context from the start.
### The Problem
```
Turn 1: User asks question → 500 tokens
Turn 2: Agent reads file → 10,000 tokens
Turn 3: Agent reads another file → 10,000 tokens
Turn 4: Agent researches → 20,000 tokens
...
Turn 10: Context window exceeded
```
### Design Principles
**1. Tools should support iterative refinement**
Instead of all-or-nothing, design for summary → detail → full:
```typescript
// Good: Supports iterative refinement
tool("read_file", {
path: z.string(),
preview: z.boolean().default(true), // Return first 1000 chars by default
full: z.boolean().default(false), // Opt-in to full content
}, ...);
tool("search_files", {
query: z.string(),
summaryOnly: z.boolean().default(true), // Return matches, not full files
}, ...);
```
**2. Provide consolidation tools**
Give agents a way to consolidate learnings mid-session:
```typescript
tool("summarize_and_continue", {
keyPoints: z.array(z.string()),
nextSteps: z.array(z.string()),
}, async ({ keyPoints, nextSteps }) => {
// Store summary, potentially truncate earlier messages
await saveSessionSummary({ keyPoints, nextSteps });
return { text: "Summary saved. Continuing with focus on: " + nextSteps.join(", ") };
});
```
**3. Design for truncation**
Assume the orchestrator may truncate early messages. Important context should be:
- In the system prompt (always present)
- In files (can be re-read)
- Summarized in context.md
### Implementation Strategies
```swift
class AgentOrchestrator {
    let maxContextTokens = 100_000
    let targetContextTokens = 80_000 // Leave headroom

    func shouldTruncate() -> Bool {
        estimateTokens(messages) > targetContextTokens
    }

    func truncateIfNeeded() {
        if shouldTruncate() {
            // Keep system prompt + recent messages
            // Summarize or drop older messages
            messages = [systemMessage] + summarizeOldMessages() + recentMessages
        }
    }
}
```
### System Prompt Guidance
```markdown
## Managing Context
For long tasks, periodically consolidate what you've learned:
1. If you've gathered a lot of information, summarize key points
2. Save important findings to files (they persist beyond context)
3. Use `summarize_and_continue` if the conversation is getting long
Don't try to hold everything in memory. Write it down.
```
</context_limits>
<orchestrator_pattern>
## Unified Agent Orchestrator
One execution engine, many agent types. All agents use the same orchestrator with different configurations.
```swift
class AgentOrchestrator {
    static let shared = AgentOrchestrator()

    func run(config: AgentConfig, userMessage: String) async -> AgentResult {
        var messages: [Message] = [
            .system(config.systemPrompt),
            .user(userMessage)
        ]
        var iteration = 0

        while iteration < config.maxIterations {
            // Get agent response
            let response = await claude.message(
                model: config.modelTier.modelId,
                messages: messages,
                tools: config.tools
            )
            messages.append(.assistant(response))

            // Process tool calls
            for toolCall in response.toolCalls {
                let result = await executeToolCall(toolCall, config: config)
                messages.append(.toolResult(result))

                // Check for completion signal
                if !result.shouldContinue {
                    return AgentResult(
                        status: .completed,
                        output: result.output,
                        iterations: iteration + 1
                    )
                }
            }

            // No tool calls = agent is responding, might be done
            if response.toolCalls.isEmpty {
                // Could be done, or waiting for user
                break
            }
            iteration += 1
        }

        return AgentResult(
            status: iteration >= config.maxIterations ? .maxIterations : .responded,
            output: messages.last?.content ?? "",
            iterations: iteration
        )
    }
}
```
### Benefits
- Consistent lifecycle management across all agent types
- Automatic checkpoint/resume (critical for mobile)
- Shared tool protocol
- Easy to add new agent types
- Centralized error handling and logging
</orchestrator_pattern>
<checklist>
## Agent Execution Checklist
### Completion Signals
- [ ] `complete_task` tool provided (explicit completion)
- [ ] No heuristic completion detection
- [ ] Tool results include `shouldContinue` flag
- [ ] System prompt guides when to complete
### Partial Completion
- [ ] Tasks tracked with status (pending, in_progress, completed, failed)
- [ ] Checkpoints saved for resume
- [ ] Progress visible to user
- [ ] Resume continues from where left off
### Model Tiers
- [ ] Tier selected based on task complexity
- [ ] Cost optimization considered
- [ ] Fast tier for simple operations
- [ ] Powerful tier reserved for synthesis
### Context Limits
- [ ] Tools support iterative refinement (preview vs full)
- [ ] Consolidation mechanism available
- [ ] Important context persisted to files
- [ ] Truncation strategy defined
</checklist>

View File

@@ -1,5 +1,12 @@
<overview>
Architectural patterns for building prompt-native agent systems. These patterns emerge from the philosophy that features should be defined in prompts, not code, and that tools should be primitives.
Architectural patterns for building agent-native systems. These patterns emerge from the five core principles: Parity, Granularity, Composability, Emergent Capability, and Improvement Over Time.
Features are outcomes achieved by agents operating in a loop, not functions you write. Tools are atomic primitives. The agent applies judgment; the prompt defines the outcome.
See also:
- [files-universal-interface.md](./files-universal-interface.md) for file organization and context.md patterns
- [agent-execution-patterns.md](./agent-execution-patterns.md) for completion signals and partial completion
- [product-implications.md](./product-implications.md) for progressive disclosure and approval patterns
</overview>
<pattern name="event-driven-agent">

View File

@@ -0,0 +1,301 @@
<overview>
Files are the universal interface for agent-native applications. Agents are naturally fluent with file operations—they already know how to read, write, and organize files. This document covers why files work so well, how to organize them, and the context.md pattern for accumulated knowledge.
</overview>
<why_files>
## Why Files
Agents are naturally good at files. Claude Code works because bash + filesystem is the most battle-tested agent interface. When building agent-native apps, lean into this.
### Agents Already Know How
You don't need to teach the agent your API—it already knows `cat`, `grep`, `mv`, `mkdir`. File operations are the primitives it's most fluent with.
### Files Are Inspectable
Users can see what the agent created, edit it, move it, delete it. No black box. Complete transparency into agent behavior.
### Files Are Portable
Export is trivial. Backup is trivial. Users own their data. No vendor lock-in, no complex migration paths.
### App State Stays in Sync
On mobile, if you use the file system with iCloud, all devices share the same file system. The agent's work on one device appears on all devices—without you having to build a server.
### Directory Structure Is Information Architecture
The filesystem gives you hierarchy for free. `/projects/acme/notes/` is self-documenting in a way that `SELECT * FROM notes WHERE project_id = 123` isn't.
</why_files>
<file_organization>
## File Organization Patterns
> **Needs validation:** These conventions are one approach that's worked so far, not a prescription. Better solutions should be considered.
A general principle of agent-native design: **Design for what agents can reason about.** The best proxy for that is what would make sense to a human. If a human can look at your file structure and understand what's going on, an agent probably can too.
### Entity-Scoped Directories
Organize files around entities, not actors or file types:
```
{entity_type}/{entity_id}/
├── primary content
├── metadata
└── related materials
```
**Example:** `Research/books/{bookId}/` contains everything about one book—full text, notes, sources, agent logs.
### Naming Conventions
| File Type | Naming Pattern | Example |
|-----------|---------------|---------|
| Entity data | `{entity}.json` | `library.json`, `status.json` |
| Human-readable content | `{content_type}.md` | `introduction.md`, `profile.md` |
| Agent reasoning | `agent_log.md` | Per-entity agent history |
| Primary content | `full_text.txt` | Downloaded/extracted text |
| Multi-volume | `volume{N}.txt` | `volume1.txt`, `volume2.txt` |
| External sources | `{source_name}.md` | `wikipedia.md`, `sparknotes.md` |
| Checkpoints | `{sessionId}.checkpoint` | UUID-based |
| Configuration | `config.json` | Feature settings |
### Directory Naming
- **Entity-scoped:** `{entityType}/{entityId}/` (e.g., `Research/books/{bookId}/`)
- **Type-scoped:** `{type}/` (e.g., `AgentCheckpoints/`, `AgentLogs/`)
- **Convention:** Lowercase with underscores, not camelCase
### Ephemeral vs. Durable Separation
Separate agent working files from user's permanent data:
```
Documents/
├── AgentCheckpoints/ # Ephemeral (can delete)
│ └── {sessionId}.checkpoint
├── AgentLogs/ # Ephemeral (debugging)
│ └── {type}/{sessionId}.md
└── Research/ # Durable (user's work)
└── books/{bookId}/
```
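
A sketch of the payoff: because ephemeral and durable files live in separate trees, cleanup is a blanket delete on the ephemeral directories only (directory names as in the layout above).

```swift
import Foundation

// Sketch: clear agent working files without touching user work.
func clearEphemeralData(documentsRoot: URL) throws {
    let fm = FileManager.default
    for name in ["AgentCheckpoints", "AgentLogs"] {
        let dir = documentsRoot.appendingPathComponent(name)
        if fm.fileExists(atPath: dir.path) {
            try fm.removeItem(at: dir)
        }
    }
    // Research/ (the user's durable work) is never touched
}
```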
### The Split: Markdown vs JSON
- **Markdown:** For content users might read or edit
- **JSON:** For structured data the app queries
</file_organization>
<context_md_pattern>
## The context.md Pattern
A file the agent reads at the start of each session and updates as it learns:
```markdown
# Context
## Who I Am
Reading assistant for the Every app.
## What I Know About This User
- Interested in military history and Russian literature
- Prefers concise analysis
- Currently reading War and Peace
## What Exists
- 12 notes in /notes
- 3 active projects
- User preferences at /preferences.md
## Recent Activity
- User created "Project kickoff" (2 hours ago)
- Analyzed passage about Austerlitz (yesterday)
## My Guidelines
- Don't spoil books they're reading
- Use their interests to personalize insights
## Current State
- No pending tasks
- Last sync: 10 minutes ago
```
### Benefits
- **Agent behavior evolves without code changes** - Update the context, behavior changes
- **Users can inspect and modify** - Complete transparency
- **Natural place for accumulated context** - Learnings persist across sessions
- **Portable across sessions** - Restart agent, knowledge preserved
### How It Works
1. Agent reads `context.md` at session start
2. Agent updates it when learning something important
3. System can also update it (recent activity, new resources)
4. Context persists across sessions
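
A sketch of step 1 at the code level. The path and `runAgentLoop` helper are illustrative, not prescribed; the agent handles steps 2-3 itself via its `write_file` tool, so no special API is needed.

```swift
import Foundation

// Sketch: inject context.md into the system prompt at session start.
func startSession(workspaceRoot: URL, basePrompt: String, userMessage: String) async throws {
    let contextURL = workspaceRoot.appendingPathComponent("context.md")

    // Read accumulated context (empty string on first run)
    let context = (try? String(contentsOf: contextURL, encoding: .utf8)) ?? ""

    // Context rides along in the system prompt, so it is always present
    let systemPrompt = basePrompt + "\n\n## Accumulated Context\n" + context

    try await runAgentLoop(systemPrompt: systemPrompt, userMessage: userMessage)
}
```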
### What to Include
| Section | Purpose |
|---------|---------|
| Who I Am | Agent identity and role |
| What I Know About This User | Learned preferences, interests |
| What Exists | Available resources, data |
| Recent Activity | Context for continuity |
| My Guidelines | Learned rules and constraints |
| Current State | Session status, pending items |
</context_md_pattern>
<files_vs_database>
## Files vs. Database
> **Needs validation:** This framing is informed by mobile development. For web apps, the tradeoffs are different.
| Use files for... | Use database for... |
|------------------|---------------------|
| Content users should read/edit | High-volume structured data |
| Configuration that benefits from version control | Data that needs complex queries |
| Agent-generated content | Ephemeral state (sessions, caches) |
| Anything that benefits from transparency | Data with relationships |
| Large text content | Data that needs indexing |
**The principle:** Files for legibility, databases for structure. When in doubt, files—they're more transparent and users can always inspect them.
### When Files Work Best
- Scale is small (one user's library, not millions of records)
- Transparency is valued over query speed
- Cloud sync (iCloud, Dropbox) works well with files
### Hybrid Approach
Even if you need a database for performance, consider maintaining a file-based "source of truth" that the agent works with, synced to the database for the UI:
```
Files (agent workspace):
  Research/book_123/introduction.md

Database (UI queries):
  research_index: { bookId, path, title, createdAt }
```
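
One hedged sketch of such a sync pass, scanning the agent's files and upserting index rows. The `db.upsert` call and the file layout are assumptions standing in for whatever storage layer the app uses.

```swift
import Foundation

// Sketch: mirror file metadata into a database index for the UI.
struct ResearchIndexEntry {
    let bookId: String
    let path: String
    let title: String
    let createdAt: Date
}

func reindexResearch(root: URL) throws {
    let fm = FileManager.default
    let researchDir = root.appendingPathComponent("Research")
    for dir in try fm.contentsOfDirectory(at: researchDir, includingPropertiesForKeys: nil) {
        let intro = dir.appendingPathComponent("introduction.md")
        guard fm.fileExists(atPath: intro.path) else { continue }

        // First line of the markdown doubles as the title
        let title = (try? String(contentsOf: intro, encoding: .utf8))?
            .split(separator: "\n").first.map(String.init) ?? dir.lastPathComponent
        let created = (try? intro.resourceValues(forKeys: [.creationDateKey]).creationDate) ?? Date()

        db.upsert(ResearchIndexEntry(
            bookId: dir.lastPathComponent,
            path: intro.path,
            title: title,
            createdAt: created
        ))
    }
}
```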
</files_vs_database>
<conflict_model>
## Conflict Model
If agents and users write to the same files, you need a conflict model.
### Current Reality
Most implementations use **last-write-wins** via atomic writes:
```swift
try data.write(to: url, options: [.atomic])
```
This is simple but can lose changes.
### Options
| Strategy | Pros | Cons |
|----------|------|------|
| **Last write wins** | Simple | Changes can be lost |
| **Agent checks before writing** | Preserves user edits | More complexity |
| **Separate spaces** | No conflicts | Less collaboration |
| **Append-only logs** | Never overwrites | Files grow forever |
| **File locking** | Safe concurrent access | Complexity, can block |
### Recommended Approaches
**For files agents write frequently (logs, status):** Last-write-wins is fine. Conflicts are rare.
**For files users edit (profiles, notes):** Consider explicit handling:
- Agent checks modification time before overwriting (sketched below)
- Or keep agent output separate from user-editable content
- Or use append-only pattern
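
A sketch of the check-before-write option. The `lastAgentWrite` bookkeeping is an assumption; record it however your app tracks agent writes.

```swift
import Foundation

// Sketch: refuse to overwrite a file the user (or another device) has
// modified since the agent last wrote it. Returns false so the caller
// can surface the conflict to the agent instead of losing edits.
func safeWrite(_ content: String, to url: URL, lastAgentWrite: Date?) throws -> Bool {
    let attrs = try? FileManager.default.attributesOfItem(atPath: url.path)
    if let modified = attrs?[.modificationDate] as? Date,
       let lastAgentWrite, modified > lastAgentWrite {
        return false // File changed underneath us; don't clobber it
    }
    try content.write(to: url, atomically: true, encoding: .utf8)
    return true
}
```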
### iCloud Considerations
iCloud sync adds complexity. It creates `{filename} (conflict).md` files when sync conflicts occur. Monitor for these:
```swift
NotificationCenter.default.addObserver(
    forName: .NSMetadataQueryDidUpdate,
    ...
)
```
### System Prompt Guidance
Tell the agent about the conflict model:
```markdown
## Working with User Content
When you create content, the user may edit it afterward. Always read
existing files before modifying them—the user may have made improvements
you should preserve.
If a file has been modified since you last wrote it, ask before overwriting.
```
</conflict_model>
<examples>
## Example: Reading App File Structure
```
Documents/
├── Library/
│ └── library.json # Book metadata
├── Research/
│ └── books/
│ └── {bookId}/
│ ├── full_text.txt # Downloaded content
│ ├── introduction.md # Agent-generated, user-editable
│ ├── notes.md # User notes
│ └── sources/
│ ├── wikipedia.md # Research gathered by agent
│ └── reviews.md
├── Chats/
│ └── {conversationId}.json # Chat history
├── Profile/
│ └── profile.md # User reading profile
└── context.md # Agent's accumulated knowledge
```
**How it works:**
1. User adds book → creates entry in `library.json`
2. Agent downloads text → saves to `Research/books/{id}/full_text.txt`
3. Agent researches → saves to `sources/`
4. Agent generates intro → saves to `introduction.md`
5. User edits intro → agent sees changes on next read
6. Agent updates `context.md` with learnings
</examples>
<checklist>
## Files as Universal Interface Checklist
### Organization
- [ ] Entity-scoped directories (`{type}/{id}/`)
- [ ] Consistent naming conventions
- [ ] Ephemeral vs durable separation
- [ ] Markdown for human content, JSON for structured data
### context.md
- [ ] Agent reads context at session start
- [ ] Agent updates context when learning
- [ ] Includes: identity, user knowledge, what exists, guidelines
- [ ] Persists across sessions
### Conflict Handling
- [ ] Conflict model defined (last-write-wins, check-before-write, etc.)
- [ ] Agent guidance in system prompt
- [ ] iCloud conflict monitoring (if applicable)
### Integration
- [ ] UI observes file changes (or shared service)
- [ ] Agent can read user edits
- [ ] User can inspect agent output
</checklist>

View File

@@ -0,0 +1,359 @@
<overview>
Start with pure primitives: bash, file operations, basic storage. This proves the architecture works and reveals what the agent actually needs. As patterns emerge, add domain-specific tools deliberately. This document covers when and how to evolve from primitives to domain tools, and when to graduate to optimized code.
</overview>
<start_with_primitives>
## Start with Pure Primitives
Begin every agent-native system with the most atomic tools possible:
- `read_file` / `write_file` / `list_files`
- `bash` (for everything else)
- Basic storage (`store_item` / `get_item`)
- HTTP requests (`fetch_url`)
**Why start here:**
1. **Proves the architecture** - If it works with primitives, your prompts are doing their job
2. **Reveals actual needs** - You'll discover what domain concepts matter
3. **Maximum flexibility** - Agent can do anything, not just what you anticipated
4. **Forces good prompts** - You can't lean on tool logic as a crutch
### Example: Starting Primitive
```typescript
// Start with just these
const tools = [
  tool("read_file", { path: z.string() }, ...),
  tool("write_file", { path: z.string(), content: z.string() }, ...),
  tool("list_files", { path: z.string() }, ...),
  tool("bash", { command: z.string() }, ...),
];

// Prompt handles the domain logic
const prompt = `
When processing feedback:
1. Read existing feedback from data/feedback.json
2. Add the new feedback with your assessment of importance (1-5)
3. Write the updated file
4. If importance >= 4, create a notification file in data/alerts/
`;
```
</start_with_primitives>
<when_to_add_domain_tools>
## When to Add Domain Tools
As patterns emerge, you'll want to add domain-specific tools. This is good—but do it deliberately.
### Vocabulary Anchoring
**Add a domain tool when:** The agent needs to understand domain concepts.
A `create_note` tool teaches the agent what "note" means in your system better than "write a file to the notes directory with this format."
```typescript
// Without domain tool - agent must infer structure
await agent.chat("Create a note about the meeting");
// Agent: writes to... notes/? documents/? what format?

// With domain tool - vocabulary is anchored
tool("create_note", {
  title: z.string(),
  content: z.string(),
  tags: z.array(z.string()).optional(),
}, async ({ title, content, tags }) => {
  // Tool enforces structure, agent understands "note"
});
```
### Guardrails
**Add a domain tool when:** Some operations need validation or constraints that shouldn't be left to agent judgment.
```typescript
// publish_to_feed might enforce format requirements or content policies
tool("publish_to_feed", {
bookId: z.string(),
content: z.string(),
headline: z.string().max(100), // Enforce headline length
}, async ({ bookId, content, headline }) => {
// Validate content meets guidelines
if (containsProhibitedContent(content)) {
return { text: "Content doesn't meet guidelines", isError: true };
}
// Enforce proper structure
await feedService.publish({ bookId, content, headline, publishedAt: new Date() });
});
```
### Efficiency
**Add a domain tool when:** Common operations would take many primitive calls.
```typescript
// Primitive approach: multiple calls
await agent.chat("Get book details");
// Agent: read library.json, parse, find book, read full_text.txt, read introduction.md...
// Domain tool: one call for common operation
tool("get_book_with_content", { bookId: z.string() }, async ({ bookId }) => {
const book = await library.getBook(bookId);
const fullText = await readFile(`Research/${bookId}/full_text.txt`);
const intro = await readFile(`Research/${bookId}/introduction.md`);
return { text: JSON.stringify({ book, fullText, intro }) };
});
```
</when_to_add_domain_tools>
<the_rule>
## The Rule for Domain Tools
**Domain tools should represent one conceptual action from the user's perspective.**
They can include mechanical validation, but **judgment about what to do or whether to do it belongs in the prompt**.
### Wrong: Bundles Judgment
```typescript
// WRONG - analyze_and_publish bundles judgment into the tool
tool("analyze_and_publish", { input: z.string() }, async ({ input }) => {
  const analysis = analyzeContent(input);     // Tool decides how to analyze
  const shouldPublish = analysis.score > 0.7; // Tool decides whether to publish
  if (shouldPublish) {
    await publish(analysis.summary);          // Tool decides what to publish
  }
});
```
### Right: One Action, Agent Decides
```typescript
// RIGHT - separate tools, agent decides
tool("analyze_content", { content: z.string() }, ...); // Returns analysis
tool("publish", { content: z.string() }, ...); // Publishes what agent provides
// Prompt: "Analyze the content. If it's high quality, publish a summary."
// Agent decides what "high quality" means and what summary to write.
```
### The Test
Ask: "Who is making the decision here?"
- If the answer is "the tool code" → you've encoded judgment, refactor
- If the answer is "the agent based on the prompt" → good
</the_rule>
<keep_primitives_available>
## Keep Primitives Available
**Domain tools are shortcuts, not gates.**
Unless there's a specific reason to restrict access (security, data integrity), the agent should still be able to use underlying primitives for edge cases.
```typescript
// Domain tool for common case
tool("create_note", { title, content }, ...);
// But primitives still available for edge cases
tool("read_file", { path }, ...);
tool("write_file", { path, content }, ...);
// Agent can use create_note normally, but for weird edge case:
// "Create a note in a non-standard location with custom metadata"
// → Agent uses write_file directly
```
### When to Gate
Gating (making domain tool the only way) is appropriate for:
- **Security:** User authentication, payment processing
- **Data integrity:** Operations that must maintain invariants
- **Audit requirements:** Actions that must be logged in specific ways
**The default is open.** When you do gate something, make it a conscious decision with a clear reason.
</keep_primitives_available>
<graduating_to_code>
## Graduating to Code
Some operations will need to move from agent-orchestrated to optimized code for performance or reliability.
### The Progression
```
Stage 1: Agent uses primitives in a loop
→ Flexible, proves the concept
→ Slow, potentially expensive
Stage 2: Add domain tools for common operations
→ Faster, still agent-orchestrated
→ Agent still decides when/whether to use
Stage 3: For hot paths, implement in optimized code
→ Fast, deterministic
→ Agent can still trigger, but execution is code
```
### Example Progression
**Stage 1: Pure primitives**
```markdown
Prompt: "When user asks for a summary, read all notes in /notes,
analyze them, and write a summary to /summaries/{date}.md"
Agent: Calls read_file 20 times, reasons about content, writes summary
Time: 30 seconds, 50k tokens
```
**Stage 2: Domain tool**
```typescript
tool("get_all_notes", {}, async () => {
const notes = await readAllNotesFromDirectory();
return { text: JSON.stringify(notes) };
});
// Agent still decides how to summarize, but retrieval is faster
// Time: 10 seconds, 30k tokens
```
**Stage 3: Optimized code**
```typescript
tool("generate_weekly_summary", {}, async () => {
// Entire operation in code for hot path
const notes = await getNotes({ since: oneWeekAgo });
const summary = await generateSummary(notes); // Could use cheaper model
await writeSummary(summary);
return { text: "Summary generated" };
});
// Agent just triggers it
// Time: 2 seconds, 5k tokens
```
### The Caveat
**Even when an operation graduates to code, the agent should be able to:**
1. Trigger the optimized operation itself
2. Fall back to primitives for edge cases the optimized path doesn't handle
Graduation is about efficiency. **Parity still holds.** The agent doesn't lose capability when you optimize.
</graduating_to_code>
<decision_framework>
## Decision Framework
### Should I Add a Domain Tool?
| Question | If Yes |
|----------|--------|
| Is the agent confused about what this concept means? | Add for vocabulary anchoring |
| Does this operation need validation the agent shouldn't decide? | Add with guardrails |
| Is this a common multi-step operation? | Add for efficiency |
| Would changing behavior require code changes? | Keep as prompt instead |
### Should I Graduate to Code?
| Question | If Yes |
|----------|--------|
| Is this operation called very frequently? | Consider graduating |
| Does latency matter significantly? | Consider graduating |
| Are token costs problematic? | Consider graduating |
| Do you need deterministic behavior? | Graduate to code |
| Does the operation need complex state management? | Graduate to code |
### Should I Gate Access?
| Question | If Yes |
|----------|--------|
| Is there a security requirement? | Gate appropriately |
| Must this operation maintain data integrity? | Gate appropriately |
| Is there an audit/compliance requirement? | Gate appropriately |
| Is it just "safer" with no specific risk? | Keep primitives available |
</decision_framework>
<examples>
## Examples
### Feedback Processing Evolution
**Stage 1: Primitives only**
```typescript
tools: [read_file, write_file, bash]
prompt: "Store feedback in data/feedback.json, notify if important"
// Agent figures out JSON structure, importance criteria, notification method
```
**Stage 2: Domain tools for vocabulary**
```typescript
tools: [
  store_feedback,    // Anchors "feedback" concept with proper structure
  send_notification, // Anchors "notify" with correct channels
  read_file,         // Still available for edge cases
  write_file,
]
prompt: "Store feedback using store_feedback. Notify if importance >= 4."
// Agent still decides importance, but vocabulary is anchored
```
**Stage 3: Graduated hot path**
```typescript
tools: [
  process_feedback_batch, // Optimized for high-volume processing
  store_feedback,         // For individual items
  send_notification,
  read_file,
  write_file,
]
// Batch processing is code, but agent can still use store_feedback for special cases
```
### When NOT to Add Domain Tools
**Don't add a domain tool just to make things "cleaner":**
```typescript
// Unnecessary - agent can compose primitives
tool("organize_files_by_date", ...) // Just use move_file + judgment
// Unnecessary - puts decision in wrong place
tool("decide_file_importance", ...) // This is prompt territory
```
**Don't add a domain tool if behavior might change:**
```typescript
// Bad - locked into code
tool("generate_standard_report", ...) // What if report format evolves?
// Better - keep in prompt
prompt: "Generate a report covering X, Y, Z. Format for readability."
// Can adjust format by editing prompt
```
</examples>
<checklist>
## Checklist: Primitives to Domain Tools
### Starting Out
- [ ] Begin with pure primitives (read, write, list, bash)
- [ ] Write behavior in prompts, not tool logic
- [ ] Let patterns emerge from actual usage
### Adding Domain Tools
- [ ] Clear reason: vocabulary anchoring, guardrails, or efficiency
- [ ] Tool represents one conceptual action
- [ ] Judgment stays in prompts, not tool code
- [ ] Primitives remain available alongside domain tools
### Graduating to Code
- [ ] Hot path identified (frequent, latency-sensitive, or expensive)
- [ ] Optimized version doesn't remove agent capability
- [ ] Fallback to primitives for edge cases still works
### Gating Decisions
- [ ] Specific reason for each gate (security, integrity, audit)
- [ ] Default is open access
- [ ] Gates are conscious decisions, not defaults
</checklist>

View File

@@ -1,10 +1,188 @@
<overview>
Mobile agent-native apps face unique challenges: background execution limits, system permissions, network constraints, and cost sensitivity. This guide covers patterns for building robust agent experiences on iOS and Android.
Mobile is a first-class platform for agent-native apps. It has unique constraints and opportunities. This guide covers why mobile matters, iOS storage architecture, checkpoint/resume patterns, and cost-aware design.
</overview>
<why_mobile>
## Why Mobile Matters
Mobile devices offer unique advantages for agent-native apps:
### A File System
Agents can work with files naturally, using the same primitives that work everywhere else. The filesystem is the universal interface.
### Rich Context
A walled garden you get access to. Health data, location, photos, calendars—context that doesn't exist on desktop or web. This enables deeply personalized agent experiences.
### Local Apps
Everyone has their own copy of the app. This opens opportunities that aren't fully realized yet: apps that modify themselves, fork themselves, evolve per-user. App Store policies constrain some of this today, but the foundation is there.
### Cross-Device Sync
If you use the file system with iCloud, all devices share the same file system. The agent's work on one device appears on all devices—without you having to build a server.
### The Challenge
**Agents are long-running. Mobile apps are not.**
An agent might need 30 seconds, 5 minutes, or an hour to complete a task. But iOS will background your app after seconds of inactivity, and may kill it entirely to reclaim memory. The user might switch apps, take a call, or lock their phone mid-task.
This means mobile agent apps need:
- **Checkpointing** — Saving state so work isn't lost
- **Resuming** — Picking up where you left off after interruption
- **Background execution** — Using the limited time iOS gives you wisely
- **On-device vs. cloud decisions** — What runs locally vs. what needs a server
</why_mobile>
<ios_storage>
## iOS Storage Architecture
> **Needs validation:** This is an approach that works well, but better solutions may exist.
For agent-native iOS apps, use iCloud Drive's Documents folder for your shared workspace. This gives you **free, automatic multi-device sync** without building a sync layer or running a server.
### Why iCloud Documents?
| Approach | Cost | Complexity | Offline | Multi-Device |
|----------|------|------------|---------|--------------|
| Custom backend + sync | $$$ | High | Manual | Yes |
| CloudKit database | Free tier limits | Medium | Manual | Yes |
| **iCloud Documents** | Free (user's storage) | Low | Automatic | Automatic |
iCloud Documents:
- Uses user's existing iCloud storage (free 5GB, most users have more)
- Automatic sync across all user's devices
- Works offline, syncs when online
- Files visible in Files.app for transparency
- No server costs, no sync code to maintain
### Implementation: iCloud-First with Local Fallback
```swift
// Get the iCloud Documents container
func iCloudDocumentsURL() -> URL? {
    FileManager.default.url(forUbiquityContainerIdentifier: nil)?
        .appendingPathComponent("Documents")
}

// Your shared workspace lives in iCloud
class SharedWorkspace {
    let rootURL: URL

    init() {
        // Use iCloud if available, fall back to local
        if let iCloudURL = iCloudDocumentsURL() {
            self.rootURL = iCloudURL
        } else {
            // Fall back to local Documents (user not signed into iCloud)
            self.rootURL = FileManager.default.urls(
                for: .documentDirectory,
                in: .userDomainMask
            ).first!
        }
    }

    // All file operations go through this root
    func researchPath(for bookId: String) -> URL {
        rootURL.appendingPathComponent("Research/\(bookId)")
    }

    func journalPath() -> URL {
        rootURL.appendingPathComponent("Journal")
    }
}
```
### Directory Structure in iCloud
```
iCloud Drive/
└── YourApp/ # Your app's container
└── Documents/ # Visible in Files.app
├── Journal/
│ ├── user/
│ │ └── 2025-01-15.md # Syncs across devices
│ └── agent/
│ └── 2025-01-15.md # Agent observations sync too
├── Research/
│ └── {bookId}/
│ ├── full_text.txt
│ └── sources/
├── Chats/
│ └── {conversationId}.json
└── context.md # Agent's accumulated knowledge
```
### Handling iCloud File States
iCloud files may not be downloaded locally. Handle this:
```swift
struct FileNotYetAvailableError: Error {}

func readFile(at url: URL) throws -> String {
    // iCloud may create .icloud placeholder files
    if url.pathExtension == "icloud" {
        // Trigger download; the caller should retry once the file is local
        try FileManager.default.startDownloadingUbiquitousItem(at: url)
        throw FileNotYetAvailableError()
    }
    return try String(contentsOf: url, encoding: .utf8)
}

// For writes, use coordinated file access
func writeFile(_ content: String, to url: URL) throws {
    let coordinator = NSFileCoordinator()
    var error: NSError?
    coordinator.coordinate(
        writingItemAt: url,
        options: .forReplacing,
        error: &error
    ) { newURL in
        try? content.write(to: newURL, atomically: true, encoding: .utf8)
    }
    if let error = error { throw error }
}
```
### What iCloud Enables
1. **User starts experiment on iPhone** → Agent creates config file
2. **User opens app on iPad** → Same experiment visible, no sync code needed
3. **Agent logs observation on iPhone** → Syncs to iPad automatically
4. **User edits journal on iPad** → iPhone sees the edit
### Entitlements Required
Add to your app's entitlements:
```xml
<key>com.apple.developer.icloud-container-identifiers</key>
<array>
    <string>iCloud.com.yourcompany.yourapp</string>
</array>
<key>com.apple.developer.icloud-services</key>
<array>
    <string>CloudDocuments</string>
</array>
<key>com.apple.developer.ubiquity-container-identifiers</key>
<array>
    <string>iCloud.com.yourcompany.yourapp</string>
</array>
```
### When NOT to Use iCloud Documents
- **Sensitive data** - Use Keychain or encrypted local storage instead
- **High-frequency writes** - iCloud sync has latency; use local + periodic sync
- **Large media files** - Consider CloudKit Assets or on-demand resources
- **Shared between users** - iCloud Documents is single-user; use CloudKit for sharing
</ios_storage>
<background_execution>
## Background Execution & Resumption
> **Needs validation:** These patterns work but better solutions may exist.
Mobile apps can be suspended or terminated at any time. Agents must handle this gracefully.
### The Challenge
@@ -623,13 +801,48 @@ class AgentOrchestrator {
```
</battery_awareness>
<on_device_vs_cloud>
## On-Device vs. Cloud
Understanding what runs where in a mobile agent-native app:
| Component | On-Device | Cloud |
|-----------|-----------|-------|
| Orchestration | ✅ | |
| Tool execution | ✅ (file ops, photo access, HealthKit) | |
| LLM calls | | ✅ (Anthropic API) |
| Checkpoints | ✅ (local files) | Optional via iCloud |
| Long-running agents | Limited by iOS | Possible with server |
### Implications
**Network required for reasoning:**
- The app needs network connectivity for LLM calls
- Design tools to degrade gracefully when network is unavailable (see the sketch at the end of this section)
- Consider offline caching for common queries
**Data stays local:**
- File operations happen on device
- Sensitive data never leaves the device unless explicitly synced
- Privacy is preserved by default
**Long-running agents:**
For truly long-running agents (hours), consider a server-side orchestrator that can run indefinitely, with the mobile app as a viewer and input mechanism.
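
As a sketch of graceful degradation, gate only the reasoning path on connectivity. `NWPathMonitor` is the system connectivity API; `Message`, `Response`, and the `claude.message` call are placeholders for your own types and API client.

```swift
import Network

// Sketch: local tool execution keeps working offline; only LLM calls
// require the network.
final class ConnectivityMonitor {
    static let shared = ConnectivityMonitor()
    private let monitor = NWPathMonitor()
    private(set) var isOnline = true

    private init() {
        monitor.pathUpdateHandler = { [weak self] path in
            self?.isOnline = (path.status == .satisfied)
        }
        monitor.start(queue: DispatchQueue(label: "connectivity"))
    }
}

enum AgentError: Error {
    case offline(String)
}

func sendToModel(_ messages: [Message]) async throws -> Response {
    guard ConnectivityMonitor.shared.isOnline else {
        // Checkpoint and tell the user, rather than failing mid-task
        throw AgentError.offline("Reasoning requires a network connection.")
    }
    return try await claude.message(messages)
}
```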
</on_device_vs_cloud>
<checklist>
## Mobile Agent-Native Checklist
**iOS Storage:**
- [ ] iCloud Documents as primary storage (or conscious alternative)
- [ ] Local Documents fallback when iCloud unavailable
- [ ] Handle `.icloud` placeholder files (trigger download)
- [ ] Use NSFileCoordinator for conflict-safe writes
**Background Execution:**
- [ ] Checkpoint/resume implemented for all agent sessions
- [ ] State machine for agent lifecycle (idle, running, backgrounded, etc.)
- [ ] Background task extension for critical saves
- [ ] Background task extension for critical saves (30 second window)
- [ ] User-visible status for backgrounded agents
**Permissions:**

View File

@@ -0,0 +1,443 @@
<overview>
Agent-native architecture has consequences for how products feel, not just how they're built. This document covers progressive disclosure of complexity, discovering latent demand through agent usage, and designing approval flows that match stakes and reversibility.
</overview>
<progressive_disclosure>
## Progressive Disclosure of Complexity
The best agent-native applications are simple to start but endlessly powerful.
### The Excel Analogy
Excel is the canonical example: you can use it for a grocery list, or you can build complex financial models. The same tool, radically different depths of use.
Claude Code has this quality: fix a typo, or refactor an entire codebase. The interface is the same—natural language—but the capability scales with the ask.
### The Pattern
Agent-native applications should aspire to this:
**Simple entry:** Basic requests work immediately with no learning curve
```
User: "Organize my downloads"
Agent: [Does it immediately, no configuration needed]
```
**Discoverable depth:** Users find they can do more as they explore
```
User: "Organize my downloads by project"
Agent: [Adapts to preference]
User: "Every Monday, review last week's downloads"
Agent: [Sets up recurring workflow]
```
**No ceiling:** Power users can push the system in ways you didn't anticipate
```
User: "Cross-reference my downloads with my calendar and flag
anything I downloaded during a meeting that I haven't
followed up on"
Agent: [Composes capabilities to accomplish this]
```
### How This Emerges
This isn't something you design directly. It **emerges naturally from the architecture:**
1. When features are prompts and tools are composable...
2. Users can start simple ("organize my downloads")...
3. And gradually discover complexity ("every Monday, review last week's...")...
4. Without you having to build each level explicitly
The agent meets users where they are.
### Design Implications
- **Don't force configuration upfront** - Let users start immediately
- **Don't hide capabilities** - Make them discoverable through use
- **Don't cap complexity** - If the agent can do it, let users ask for it
- **Do provide hints** - Help users discover what's possible
</progressive_disclosure>
<latent_demand_discovery>
## Latent Demand Discovery
Traditional product development: imagine what users want, build it, see if you're right.
Agent-native product development: build a capable foundation, observe what users ask the agent to do, formalize the patterns that emerge.
### The Shift
**Traditional approach:**
```
1. Imagine features users might want
2. Build them
3. Ship
4. Hope you guessed right
5. If wrong, rebuild
```
**Agent-native approach:**
```
1. Build capable foundation (atomic tools, parity)
2. Ship
3. Users ask agent for things
4. Observe what they're asking for
5. Patterns emerge
6. Formalize patterns into domain tools or prompts
7. Repeat
```
### The Flywheel
```
Build with atomic tools and parity
Users ask for things you didn't anticipate
Agent composes tools to accomplish them
(or fails, revealing a capability gap)
You observe patterns in what's being requested
Add domain tools or prompts to optimize common patterns
(Repeat)
```
### What You Learn
**When users ask and the agent succeeds:**
- This is a real need
- Your architecture supports it
- Consider optimizing with a domain tool if it's common
**When users ask and the agent fails:**
- This is a real need
- You have a capability gap
- Fix the gap: add tool, fix parity, improve context
**When users don't ask for something:**
- Maybe they don't need it
- Or maybe they don't know it's possible (capability hiding)
### Implementation
**Log agent requests:**
```typescript
async function handleAgentRequest(request: string) {
  // Log what users are asking for
  await analytics.log({
    type: 'agent_request',
    request: request,
    timestamp: Date.now(),
  });
  // Process request...
}
```
**Track success/failure:**
```typescript
async function completeAgentSession(session: AgentSession) {
  await analytics.log({
    type: 'agent_session',
    request: session.initialRequest,
    succeeded: session.status === 'completed',
    toolsUsed: session.toolCalls.map(t => t.name),
    iterations: session.iterationCount,
  });
}
```
**Review patterns:**
- What are users asking for most?
- What's failing? Why?
- What would benefit from a domain tool?
- What needs better context injection?
### Example: Discovering "Weekly Review"
```
Week 1: Users start asking "summarize my activity this week"
Agent: Composes list_files + read_file, works but slow
Week 2: More users asking similar things
Pattern emerges: weekly review is common
Week 3: Add prompt section for weekly review
Faster, more consistent, still flexible
Week 4: If still common and performance matters
Add domain tool: generate_weekly_summary
```
You didn't have to guess that weekly review would be popular. You discovered it.
</latent_demand_discovery>
<approval_and_agency>
## Approval and User Agency
When agents take unsolicited actions—doing things on their own rather than responding to explicit requests—you need to decide how much autonomy to grant.
> **Note:** This framework applies to unsolicited agent actions. If the user explicitly asks the agent to do something ("send that email"), that's already approval—the agent just does it.
### The Stakes/Reversibility Matrix
Consider two dimensions:
- **Stakes:** How much does it matter if this goes wrong?
- **Reversibility:** How easy is it to undo?
| Stakes | Reversibility | Pattern | Example |
|--------|---------------|---------|---------|
| Low | Easy | **Auto-apply** | Organizing files |
| Low | Hard | **Quick confirm** | Publishing to a private feed |
| High | Easy | **Suggest + apply** | Code changes with undo |
| High | Hard | **Explicit approval** | Sending emails, payments |
### Patterns in Detail
**Auto-apply (low stakes, easy reversal):**
```
Agent: [Organizes files into folders]
Agent: "I organized your downloads into folders by type.
You can undo with Cmd+Z or move them back."
```
User doesn't need to approve—it's easy to undo and doesn't matter much.
**Quick confirm (low stakes, hard reversal):**
```
Agent: "I've drafted a post about your reading insights.
Publish to your feed?"
[Publish] [Edit first] [Cancel]
```
One-tap confirm because stakes are low, but it's hard to un-publish.
**Suggest + apply (high stakes, easy reversal):**
```
Agent: "I recommend these code changes to fix the bug:
[Shows diff]
Apply? Changes can be reverted with git."
[Apply] [Modify] [Cancel]
```
Shows what will happen, makes reversal clear.
**Explicit approval (high stakes, hard reversal):**
```
Agent: "I've drafted this email to your team about the deadline change:
[Shows full email]
This will send immediately and cannot be unsent.
Type 'send' to confirm."
```
Requires explicit action, makes consequences clear.
### Implementation
```swift
enum Stakes { case low, high }
enum Reversibility { case easy, hard }

enum ApprovalLevel {
    case autoApply        // Just do it
    case quickConfirm     // One-tap approval
    case suggestApply     // Show preview, ask to apply
    case explicitApproval // Require explicit confirmation
}

func approvalLevelFor(action: AgentAction) -> ApprovalLevel {
    let stakes = assessStakes(action)
    let reversibility = assessReversibility(action)

    switch (stakes, reversibility) {
    case (.low, .easy): return .autoApply
    case (.low, .hard): return .quickConfirm
    case (.high, .easy): return .suggestApply
    case (.high, .hard): return .explicitApproval
    }
}

func assessStakes(_ action: AgentAction) -> Stakes {
    switch action {
    case .organizeFiles: return .low
    case .publishToFeed: return .low
    case .modifyCode: return .high
    case .sendEmail: return .high
    case .makePayment: return .high
    }
}

func assessReversibility(_ action: AgentAction) -> Reversibility {
    switch action {
    case .organizeFiles: return .easy // Can move back
    case .publishToFeed: return .hard // People might see it
    case .modifyCode: return .easy    // Git revert
    case .sendEmail: return .hard     // Can't unsend
    case .makePayment: return .hard   // Money moved
    }
}
```
### Self-Modification Considerations
When agents can modify their own behavior—changing prompts, updating preferences, adjusting workflows—the goals are:
1. **Visibility:** User can see what changed
2. **Understanding:** User understands the effects
3. **Rollback:** User can undo changes
Approval flows are one way to achieve this. Audit logs with easy rollback could be another. **The principle is: make it legible.**
```swift
// When the agent modifies its own prompt
func agentSelfModify(change: PromptChange) async {
    // Log the change
    await auditLog.record(change)

    // Create a checkpoint for rollback
    await createCheckpoint(currentState)

    // Notify the user (could be async/batched)
    await notifyUser("I've adjusted my approach: \(change.summary)")

    // Apply the change
    await applyChange(change)
}
```
</approval_and_agency>
<capability_visibility>
## Capability Visibility
Users need to discover what the agent can do. Hidden capabilities lead to underutilization.
### The Problem
```
User: "Help me with my reading"
Agent: "What would you like help with?"
// Agent doesn't mention it can publish to feed, research books,
// generate introductions, analyze themes...
```
The agent can do these things, but the user doesn't know.
### Solutions
**Onboarding hints:**
```
Agent: "I can help you with your reading in several ways:
- Research any book (web search + save findings)
- Generate personalized introductions
- Publish insights to your reading feed
- Analyze themes across your library
What interests you?"
```
**Contextual suggestions:**
```
User: "I just finished reading 1984"
Agent: "Great choice! Would you like me to:
- Research historical context?
- Compare it to other books in your library?
- Publish an insight about it to your feed?"
```
**Progressive revelation:**
```
// After user uses basic features
Agent: "By the way, you can also ask me to set up
recurring tasks, like 'every Monday, review my
reading progress.' Just let me know!"
```
### Balance
- **Don't overwhelm** with all capabilities upfront
- **Do reveal** capabilities naturally through use
- **Don't assume** users will discover things on their own
- **Do make** capabilities visible when relevant
</capability_visibility>
<designing_for_trust>
## Designing for Trust
Agent-native apps require trust. Users are giving an AI significant capability. Build trust through:
### Transparency
- Show what the agent is doing (tool calls, progress)
- Explain reasoning when it matters
- Make all agent work inspectable (files, logs)
### Predictability
- Consistent behavior for similar requests
- Clear patterns for when approval is needed
- No surprises in what the agent can access
### Reversibility
- Easy undo for agent actions
- Checkpoints before significant changes
- Clear rollback paths
### Control
- User can stop agent at any time
- User can adjust agent behavior (prompts, preferences)
- User can restrict capabilities if desired
### Implementation
```swift
struct AgentTransparency {
    // Show what's happening
    func onToolCall(_ tool: ToolCall) {
        showInUI("Using \(tool.name)...")
    }

    // Explain reasoning
    func onDecision(_ decision: AgentDecision) {
        if decision.needsExplanation {
            showInUI("I chose this because: \(decision.reasoning)")
        }
    }

    // Make work inspectable
    func onOutput(_ output: AgentOutput) {
        // All output is in files the user can see,
        // or in visible UI state
    }
}
```
</designing_for_trust>
<checklist>
## Product Design Checklist
### Progressive Disclosure
- [ ] Basic requests work immediately (no config)
- [ ] Depth is discoverable through use
- [ ] No artificial ceiling on complexity
- [ ] Capability hints provided
### Latent Demand Discovery
- [ ] Agent requests are logged
- [ ] Success/failure is tracked
- [ ] Patterns are reviewed regularly
- [ ] Common patterns formalized into tools/prompts
### Approval & Agency
- [ ] Stakes assessed for each action type
- [ ] Reversibility assessed for each action type
- [ ] Approval pattern matches stakes/reversibility
- [ ] Self-modification is legible (visible, understandable, reversible)
### Capability Visibility
- [ ] Onboarding reveals key capabilities
- [ ] Contextual suggestions provided
- [ ] Users aren't expected to guess what's possible
### Trust
- [ ] Agent actions are transparent
- [ ] Behavior is predictable
- [ ] Actions are reversible
- [ ] User has control
</checklist>