[2.23.0] Major update to agent-native-architecture skill (#70)

Align skill with canonical Agent-Native Architecture document:

## Core Changes
- Restructure SKILL.md with the 5 named principles from the canonical document:
  - Parity: Agent can do whatever user can do
  - Granularity: Prefer atomic primitives
  - Composability: Features are prompts
  - Emergent Capability: Handle unanticipated requests
  - Improvement Over Time: Context accumulation

- Add "The test" for each principle
- Add "Why Now" section (Claude Code origin story)
- Update terminology from "prompt-native" to "agent-native"
- Add "The Ultimate Test" to success criteria

## New Reference Files
- files-universal-interface.md: Why files, organization patterns, context.md pattern, conflict model
- from-primitives-to-domain-tools.md: When to add domain tools, graduating to code
- agent-execution-patterns.md: Completion signals, partial completion, context limits
- product-implications.md: Progressive disclosure, latent demand discovery, approval matrix

## Updated Reference Files
- mobile-patterns.md: Add iOS storage architecture (iCloud-first), "needs validation" callouts, on-device vs cloud section
- architecture-patterns.md: Update overview to reference 5 principles and cross-link new files

## Anti-Patterns
- Add missing anti-patterns: agent as router, build-then-add-agent, request/response thinking, defensive tool design, happy path in code

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
Dan Shipper, 2026-01-07 11:50:58 -05:00, committed by GitHub
commit 68aa93678c, parent be30002bbe
7 changed files with 2098 additions and 218 deletions

View File

@@ -1,76 +1,174 @@
---
name: agent-native-architecture
description: Build applications where agents are first-class citizens. Use this skill when designing autonomous agents, creating MCP tools, implementing self-modifying systems, or building apps where features are outcomes achieved by agents operating in a loop.
---
<why_now>
## Why Now
Software agents work reliably now. Claude Code demonstrated that an LLM with access to bash and file tools, operating in a loop until an objective is achieved, can accomplish complex multi-step tasks autonomously.
The surprising discovery: **a really good coding agent is actually a really good general-purpose agent.** The same architecture that lets Claude Code refactor a codebase can let an agent organize your files, manage your reading list, or automate your workflows.
The Claude Code SDK makes this accessible. You can build applications where features aren't code you write—they're outcomes you describe, achieved by an agent with tools, operating in a loop until the outcome is reached.
This opens up a new field: software that works the way Claude Code works, applied to categories far beyond coding.
</why_now>
<core_principles>
## Core Principles
### 1. Parity
**Whatever the user can do through the UI, the agent should be able to achieve through tools.**
This is the foundational principle. Without it, nothing else matters.
Imagine you build a notes app with a beautiful interface for creating, organizing, and tagging notes. A user asks the agent: "Create a note summarizing my meeting and tag it as urgent."
If you built UI for creating notes but no agent capability to do the same, the agent is stuck. It might apologize or ask clarifying questions, but it can't help—even though the action is trivial for a human using the interface.
**The fix:** Ensure the agent has tools (or combinations of tools) that can accomplish anything the UI can do.
This isn't about creating a 1:1 mapping of UI buttons to tools. It's about ensuring the agent can **achieve the same outcomes**. Sometimes that's a single tool (`create_note`). Sometimes it's composing primitives (`write_file` to a notes directory with proper formatting).
**The discipline:** When adding any UI capability, ask: can the agent achieve this outcome? If not, add the necessary tools or primitives.
A capability map helps:
| User Action | How Agent Achieves It |
|-------------|----------------------|
| Create a note | `write_file` to notes directory, or `create_note` tool |
| Tag a note as urgent | `update_file` metadata, or `tag_note` tool |
| Search notes | `search_files` or `search_notes` tool |
| Delete a note | `delete_file` or `delete_note` tool |
**The test:** Pick any action a user can take in your UI. Describe it to the agent. Can it accomplish the outcome?
---
### 2. Granularity
**Prefer atomic primitives. Features are outcomes achieved by an agent operating in a loop.**
A tool is a primitive capability: read a file, write a file, run a bash command, store a record, send a notification.
A **feature** is not a function you write. It's an outcome you describe in a prompt, achieved by an agent that has tools and operates in a loop until the outcome is reached.
**Less granular (limits the agent):**
```
Tool: classify_and_organize_files(files)
→ You wrote the decision logic
→ Agent executes your code
→ To change behavior, you refactor
```
**More granular (empowers the agent):**
```
Tools: read_file, write_file, move_file, list_directory, bash
Prompt: "Organize the user's downloads folder. Analyze each file,
determine appropriate locations based on content and recency,
and move them there."
Agent: Operates in a loop—reads files, makes judgments, moves things,
checks results—until the folder is organized.
→ Agent makes the decisions
→ To change behavior, you edit the prompt
```
**The key shift:** The agent is pursuing an outcome with judgment, not executing a choreographed sequence. It might encounter unexpected file types, adjust its approach, or ask clarifying questions. The loop continues until the outcome is achieved.
The more atomic your tools, the more flexibly the agent can use them. If you bundle decision logic into tools, you've moved judgment back into code.
**The test:** To change how a feature behaves, do you edit prose or refactor code?
---
### 3. Composability
**With atomic tools and parity, you can create new features just by writing new prompts.**
This is the payoff of the first two principles. When your tools are atomic and the agent can do anything users can do, new features are just new prompts.
Want a "weekly review" feature that summarizes activity and suggests priorities? That's a prompt:
```
"Review files modified this week. Summarize key changes. Based on
incomplete items and approaching deadlines, suggest three priorities
for next week."
```
The agent uses `list_files`, `read_file`, and its judgment to accomplish this. You didn't write weekly-review code. You described an outcome, and the agent operates in a loop until it's achieved.
**This works for developers and users.** You can ship new features by adding prompts. Users can customize behavior by modifying prompts or creating their own. "When I say 'file this,' always move it to my Action folder and tag it urgent" becomes a user-level prompt that extends the application.
**The constraint:** This only works if tools are atomic enough to be composed in ways you didn't anticipate, and if the agent has parity with users. If tools encode too much logic, or the agent can't access key capabilities, composition breaks down.
**The test:** Can you add a new feature by writing a new prompt section, without adding new code?
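A sketch of what that can look like in practice, assuming the `agent.run` loop from the Quick Start below; the feature registry is illustrative:
```typescript
// Each feature is a named prompt, not a code path.
const features: Record<string, string> = {
  weekly_review:
    "Review files modified this week. Summarize key changes. Based on " +
    "incomplete items and approaching deadlines, suggest three priorities for next week.",
};

async function runFeature(name: string, userMessage: string) {
  // Same tools, same loop; only the prompt changes per feature.
  return agent.run({
    prompt: `${features[name]}\n\n${userMessage}`,
    tools,
    systemPrompt,
  });
}
```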
---
### 4. Emergent Capability
**The agent can accomplish things you didn't explicitly design for.**
When tools are atomic, parity is maintained, and prompts are composable, users will ask the agent for things you never anticipated. And often, the agent can figure it out.
*"Cross-reference my meeting notes with my task list and tell me what I've committed to but haven't scheduled."*
You didn't build a "commitment tracker" feature. But if the agent can read notes, read tasks, and reason about them—operating in a loop until it has an answer—it can accomplish this.
**This reveals latent demand.** Instead of guessing what features users want, you observe what they're asking the agent to do. When patterns emerge, you can optimize them with domain-specific tools or dedicated prompts. But you didn't have to anticipate them—you discovered them.
**The flywheel:**
1. Build with atomic tools and parity
2. Users ask for things you didn't anticipate
3. Agent composes tools to accomplish them (or fails, revealing a gap)
4. You observe patterns in what's being requested
5. Add domain tools or prompts to make common patterns efficient
6. Repeat
This changes how you build products. You're not trying to imagine every feature upfront. You're creating a capable foundation and learning from what emerges.
**The test:** Give the agent an open-ended request relevant to your domain. Can it figure out a reasonable approach, operating in a loop until it succeeds? If it just says "I don't have a feature for that," your architecture is too constrained.
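One lightweight way to run this flywheel, sketched below: log what users ask for and which tools the agent composed, then review the log for recurring patterns worth promoting to a domain tool or dedicated prompt. The log shape is an assumption, not a prescribed format:
```typescript
import * as fs from "node:fs/promises";

interface RequestLog {
  timestamp: string;
  request: string;      // what the user asked for
  toolsUsed: string[];  // how the agent composed primitives
  completed: boolean;   // did it reach the outcome, or reveal a gap?
}

async function logRequest(entry: RequestLog) {
  await fs.mkdir("logs", { recursive: true });
  // Append-only JSONL keeps the history inspectable and easy to grep.
  await fs.appendFile("logs/agent_requests.jsonl", JSON.stringify(entry) + "\n");
}
```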
---
### 5. Improvement Over Time
**Agent-native applications get better through accumulated context and prompt refinement.**
Unlike traditional software, agent-native applications can improve without shipping code:
**Accumulated context:** The agent can maintain state across sessions—what exists, what the user has done, what worked, what didn't. A `context.md` file the agent reads and updates is layer one. More sophisticated approaches involve structured memory and learned preferences.
**Prompt refinement at multiple levels:**
- **Developer level:** You ship updated prompts that change agent behavior for all users
- **User level:** Users customize prompts for their workflow
- **Agent level:** The agent modifies its own prompts based on feedback (advanced)
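A minimal sketch of how these layers can combine at session start; the file paths are illustrative:
```typescript
import * as fs from "node:fs/promises";

async function buildSystemPrompt(): Promise<string> {
  const read = (path: string) => fs.readFile(path, "utf8").catch(() => "");
  const developerPrompt = await read("prompts/base.md");  // shipped with the app
  const userPrompt = await read("prompts/user.md");       // user customizations
  const context = await read("context.md");               // accumulated knowledge
  return [developerPrompt, userPrompt, context].filter(Boolean).join("\n\n");
}
```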
**Self-modification (advanced):** Agents that can edit their own prompts or even their own code. For production use cases, consider adding safety rails—approval gates, automatic checkpoints for rollback, health checks. This is where things are heading.
The improvement mechanisms are still being discovered. Context and prompt refinement are proven. Self-modification is emerging. What's clear: the architecture supports getting better in ways traditional software doesn't.
**The test:** Does the application work better after a month of use than on day one, even without code changes?
</core_principles>
<intake>
## What aspect of agent-native architecture do you need help with?
1. **Design architecture** - Plan a new agent-native system from scratch
2. **Files & workspace** - Use files as the universal interface, shared workspace patterns
3. **Tool design** - Build primitive tools, dynamic capability discovery, CRUD completeness
4. **Domain tools** - Know when to add domain tools vs stay with primitives
5. **Execution patterns** - Completion signals, partial completion, context limits
6. **System prompts** - Define agent behavior in prompts, judgment criteria
7. **Context injection** - Inject runtime app state into agent prompts
8. **Action parity** - Ensure agents can do everything users can do
9. **Self-modification** - Enable agents to safely evolve themselves
10. **Product design** - Progressive disclosure, latent demand, approval patterns
11. **Mobile patterns** - iOS storage, background execution, checkpoint/resume
12. **Testing** - Test agent-native apps for capability and parity
13. **Refactoring** - Make existing code more agent-native
**Wait for response before proceeding.**
</intake>
@@ -79,63 +177,77 @@ What aspect of agent native architecture do you need help with?
| Response | Action |
|----------|--------|
| 1, "design", "architecture", "plan" | Read [architecture-patterns.md](./references/architecture-patterns.md), then apply Architecture Checklist below |
| 2, "files", "workspace", "filesystem" | Read [files-universal-interface.md](./references/files-universal-interface.md) and [shared-workspace-architecture.md](./references/shared-workspace-architecture.md) |
| 3, "tool", "mcp", "primitive", "crud" | Read [mcp-tool-design.md](./references/mcp-tool-design.md) |
| 4, "domain tool", "when to add" | Read [from-primitives-to-domain-tools.md](./references/from-primitives-to-domain-tools.md) |
| 5, "execution", "completion", "loop" | Read [agent-execution-patterns.md](./references/agent-execution-patterns.md) |
| 6, "prompt", "system prompt", "behavior" | Read [system-prompt-design.md](./references/system-prompt-design.md) |
| 7, "context", "inject", "runtime", "dynamic" | Read [dynamic-context-injection.md](./references/dynamic-context-injection.md) |
| 8, "parity", "ui action", "capability map" | Read [action-parity-discipline.md](./references/action-parity-discipline.md) |
| 9, "self-modify", "evolve", "git" | Read [self-modification.md](./references/self-modification.md) |
| 10, "product", "progressive", "approval", "latent demand" | Read [product-implications.md](./references/product-implications.md) |
| 11, "mobile", "ios", "android", "background", "checkpoint" | Read [mobile-patterns.md](./references/mobile-patterns.md) |
| 12, "test", "testing", "verify", "validate" | Read [agent-native-testing.md](./references/agent-native-testing.md) |
| 13, "review", "refactor", "existing" | Read [refactoring-to-prompt-native.md](./references/refactoring-to-prompt-native.md) |
**After reading the reference, apply those patterns to the user's specific context.**
</routing>
<architecture_checklist>
## Architecture Review Checklist
When designing an agent-native system, verify these **before implementation**:
### Core Principles
- [ ] **Parity:** Every UI action has a corresponding agent capability
- [ ] **Granularity:** Tools are primitives; features are prompt-defined outcomes
- [ ] **Composability:** New features can be added via prompts alone
- [ ] **Emergent Capability:** Agent can handle open-ended requests in your domain
### Tool Design
- [ ] **Dynamic vs Static:** For external APIs where agent should have full access, use Dynamic Capability Discovery
- [ ] **CRUD Completeness:** Every entity has create, read, update, AND delete
- [ ] **Primitives not Workflows:** Tools enable capability, don't encode business logic
- [ ] **API as Validator:** Use `z.string()` inputs when the API validates, not `z.enum()`
### Files & Workspace
- [ ] **Shared Workspace:** Agent and user work in same data space
- [ ] **context.md Pattern:** Agent reads/updates context file for accumulated knowledge
- [ ] **File Organization:** Entity-scoped directories with consistent naming
### Agent Execution
- [ ] **Completion Signals:** Agent has explicit `complete_task` tool (not heuristic detection)
- [ ] **Partial Completion:** Multi-step tasks track progress for resume
- [ ] **Context Limits:** Designed for bounded context from the start
### Context Injection
- [ ] **Available Resources:** System prompt includes what exists (files, data, types)
- [ ] **Available Capabilities:** System prompt documents tools with user vocabulary
- [ ] **Dynamic Context:** Context refreshes for long sessions (or provide `refresh_context` tool)
### UI Integration
- [ ] **Agent → UI:** Agent changes reflect in UI (shared service, file watching, or event bus)
- [ ] **No Silent Actions:** Agent writes trigger UI updates immediately
- [ ] **Capability Discovery:** Users can learn what agent can do
### Mobile (if applicable)
- [ ] **Checkpoint/Resume:** Handle iOS app suspension gracefully
- [ ] **iCloud Storage:** iCloud-first with local fallback for multi-device sync
- [ ] **Cost Awareness:** Model tier selection (Haiku/Sonnet/Opus)
**When designing architecture, explicitly address each checkbox in your plan.**
</architecture_checklist>
<quick_start>
## Quick Start: Build an Agent-Native Feature
**Step 1: Define atomic tools**
```typescript
const tools = [
  tool("read_file", "Read any file", { path: z.string() }, ...),
  tool("write_file", "Write any file", { path: z.string(), content: z.string() }, ...),
  tool("list_files", "List directory", { path: z.string() }, ...),
  tool("complete_task", "Signal task completion", { summary: z.string() }, ...),
];
```
@@ -145,201 +257,179 @@ const tools = [
When asked to organize content, you should:
1. Read existing files to understand the structure
2. Analyze what organization makes sense
3. Create/move files using your tools
4. Use your judgment about layout and formatting
5. Call complete_task when you're done
You decide the structure. Make it good.
```
**Step 3: Let the agent work in a loop**
```typescript
const result = await agent.run({
  prompt: userMessage,
  tools: tools,
  systemPrompt: systemPrompt,
  // Agent loops until it calls complete_task
});
```
</quick_start>
<reference_index>
## Reference Files
All references in `references/`:
**Core Patterns:**
- [architecture-patterns.md](./references/architecture-patterns.md) - Event-driven, unified orchestrator, agent-to-UI
- [files-universal-interface.md](./references/files-universal-interface.md) - Why files, organization patterns, context.md
- [mcp-tool-design.md](./references/mcp-tool-design.md) - Tool design, dynamic capability discovery, CRUD
- [from-primitives-to-domain-tools.md](./references/from-primitives-to-domain-tools.md) - When to add domain tools, graduating to code
- [agent-execution-patterns.md](./references/agent-execution-patterns.md) - Completion signals, partial completion, context limits
- [system-prompt-design.md](./references/system-prompt-design.md) - Features as prompts, judgment criteria
**Agent-Native Disciplines:**
- [dynamic-context-injection.md](./references/dynamic-context-injection.md) - Runtime context, what to inject
- [action-parity-discipline.md](./references/action-parity-discipline.md) - Capability mapping, parity workflow
- [shared-workspace-architecture.md](./references/shared-workspace-architecture.md) - Shared data space, UI integration
- [product-implications.md](./references/product-implications.md) - Progressive disclosure, latent demand, approval
- [agent-native-testing.md](./references/agent-native-testing.md) - Testing outcomes, parity tests
**Platform-Specific:**
- [mobile-patterns.md](./references/mobile-patterns.md) - iOS storage, checkpoint/resume, cost awareness
- [self-modification.md](./references/self-modification.md) - Git-based evolution, guardrails
- [refactoring-to-prompt-native.md](./references/refactoring-to-prompt-native.md) - Migrating existing code
</reference_index>
<anti_patterns>
## Anti-Patterns
### Common Approaches That Aren't Fully Agent-Native
These aren't necessarily wrong—they may be appropriate for your use case. But they're worth recognizing as different from the architecture this document describes.
**Agent as router** — The agent figures out what the user wants, then calls the right function. The agent's intelligence is used to route, not to act. This can work, but you're using a fraction of what agents can do.
**Build the app, then add agent** — You build features the traditional way (as code), then expose them to an agent. The agent can only do what your features already do. You won't get emergent capability.
**Request/response thinking** — Agent gets input, does one thing, returns output. This misses the loop: agent gets an outcome to achieve, operates until it's done, handles unexpected situations along the way.
**Defensive tool design** — You over-constrain tool inputs because you're used to defensive programming. Strict enums, validation at every layer. This is safe, but it prevents the agent from doing things you didn't anticipate.
**Happy path in code, agent just executes** — Traditional software handles edge cases in code—you write the logic for what happens when X goes wrong. Agent-native lets the agent handle edge cases with judgment. If your code handles all the edge cases, the agent is just a caller.
---
### Specific Anti-Patterns
**THE CARDINAL SIN: Agent executes your code instead of figuring things out**
This is the most common mistake. You fall back into writing workflow code and having the agent call it, instead of defining outcomes and letting the agent figure out HOW.
```typescript
// WRONG - You wrote the workflow, agent just executes it
tool("process_feedback", async ({ message }) => {
  const category = categorize(message);        // Your code decides
  const priority = calculatePriority(message); // Your code decides
  await store(message, category, priority);    // Your code orchestrates
  if (priority > 3) await notify();            // Your code decides
});
// RIGHT - Agent figures out how to process feedback
tools: store_item, send_message // Primitives
prompt: "Rate importance 1-5 based on actionability, store feedback, notify if >= 4"
```
**Workflow-shaped tools** — `analyze_and_organize` bundles judgment into the tool. Break it into primitives and let the agent compose them.
**Context starvation** — Agent doesn't know what resources exist in the app.
```
User: "Write something about Catherine the Great in my feed"
Agent: "What feed? I don't understand what system you're referring to."
```
Fix: Inject available resources, capabilities, and vocabulary into system prompt.
**Orphan UI actions** — User can do something through the UI that the agent can't achieve. Fix: maintain parity.
**Silent actions** — Agent changes state but UI doesn't update. Fix: Use shared data stores with reactive binding, or file system observation.
**Heuristic completion detection** — Detecting agent completion through heuristics (consecutive iterations without tool calls, checking for expected output files). This is fragile. Fix: Require agents to explicitly signal completion through a `complete_task` tool.
**Static tool mapping for dynamic APIs** — Building 50 tools for 50 API endpoints when a `discover` + `access` pattern would give more flexibility.
```typescript
// WRONG - Every API type needs a hardcoded tool
tool("read_steps", ...)
tool("read_heart_rate", ...)
tool("read_sleep", ...)
// When glucose tracking is added... code change required
// RIGHT - Dynamic capability discovery
tool("list_available_types", ...)                       // Discover what's available
tool("read_health_data", { dataType: z.string() }, ...) // Access any type
```
**Incomplete CRUD** — Agent can create but not update or delete.
```typescript
// User: "Delete that journal entry"
// Agent: "I don't have a tool for that"
tool("create_journal_entry", ...) // Missing: update, delete
```
Fix: Every entity needs full CRUD.
**Sandbox isolation** — Agent works in separate data space from user.
```
Documents/
├── user_files/     ← User's space
└── agent_output/   ← Agent's space (isolated)
```
Fix: Use shared workspace where both operate on same files.
**Gates without reason** — Domain tool is the only way to do something, and you didn't intend to restrict access. The default is open. Keep primitives available unless there's a specific reason to gate.
**Artificial capability limits** — Restricting what the agent can do out of vague safety concerns rather than specific risks. Be thoughtful about restricting capabilities. The agent should generally be able to do what users can do.
</anti_patterns>
<success_criteria>
## Success Criteria
You've built an agent-native application when:
### Architecture
- [ ] The agent can achieve anything users can achieve through the UI (parity)
- [ ] Tools are atomic primitives; domain tools are shortcuts, not gates (granularity)
- [ ] New features can be added by writing new prompts (composability)
- [ ] The agent can accomplish tasks you didn't explicitly design for (emergent capability)
- [ ] Changing behavior means editing prompts, not refactoring code
### Implementation
- [ ] System prompt includes dynamic context about app state
- [ ] Every UI action has a corresponding agent tool (action parity)
- [ ] Agent tools are documented in system prompt with user vocabulary
- [ ] Agent and user work in the same data space (shared workspace)
- [ ] Agent actions are immediately reflected in the UI
- [ ] Every entity has full CRUD (Create, Read, Update, Delete)
- [ ] Agents explicitly signal completion (no heuristic detection)
- [ ] context.md or equivalent for accumulated knowledge
### Product
- [ ] Simple requests work immediately with no learning curve
- [ ] Power users can push the system in unexpected directions
- [ ] You're learning what users want by observing what they ask the agent to do
- [ ] Approval requirements match stakes and reversibility
### Mobile (if applicable)
- [ ] Checkpoint/resume handles app interruption
- [ ] iCloud-first storage with local fallback
- [ ] Background execution uses available time wisely
- [ ] Model tier matched to task complexity
---
### The Ultimate Test
**Describe an outcome to the agent that's within your application's domain but that you didn't build a specific feature for.**
Can it figure out how to accomplish it, operating in a loop until it succeeds?
If yes, you've built something agent-native.
If it says "I don't have a feature for that"—your architecture is still too constrained.
</success_criteria>

View File

@@ -0,0 +1,467 @@
<overview>
Agent execution patterns for building robust agent loops. This covers how agents signal completion, track partial progress for resume, select appropriate model tiers, and handle context limits.
</overview>
<completion_signals>
## Completion Signals
Agents need an explicit way to say "I'm done."
### Anti-Pattern: Heuristic Detection
Detecting completion through heuristics is fragile:
- Consecutive iterations without tool calls
- Checking for expected output files
- Tracking "no progress" states
- Time-based timeouts
These break in edge cases and create unpredictable behavior.
### Pattern: Explicit Completion Tool
Provide a `complete_task` tool that:
- Takes a summary of what was accomplished
- Returns a signal that stops the loop
- Works identically across all agent types
```typescript
tool("complete_task", {
summary: z.string().describe("Summary of what was accomplished"),
status: z.enum(["success", "partial", "blocked"]).optional(),
}, async ({ summary, status = "success" }) => {
return {
text: summary,
shouldContinue: false, // Key: signals loop should stop
};
});
```
### The ToolResult Pattern
Structure tool results to separate success from continuation:
```swift
struct ToolResult {
let success: Bool // Did tool succeed?
let output: String // What happened?
let shouldContinue: Bool // Should agent loop continue?
}
// Three common cases:
extension ToolResult {
static func success(_ output: String) -> ToolResult {
// Tool succeeded, keep going
ToolResult(success: true, output: output, shouldContinue: true)
}
static func error(_ message: String) -> ToolResult {
// Tool failed but recoverable, agent can try something else
ToolResult(success: false, output: message, shouldContinue: true)
}
static func complete(_ summary: String) -> ToolResult {
// Task done, stop the loop
ToolResult(success: true, output: summary, shouldContinue: false)
}
}
```
### Key Insight
**This is different from success/failure:**
- A tool can **succeed** AND signal **stop** (task complete)
- A tool can **fail** AND signal **continue** (recoverable error, try something else)
```typescript
// Examples:
read_file("/missing.txt")
// → { success: false, output: "File not found", shouldContinue: true }
// Agent can try a different file or ask for clarification
complete_task("Organized all downloads into folders")
// → { success: true, output: "...", shouldContinue: false }
// Agent is done
write_file("/output.md", content)
// → { success: true, output: "Wrote file", shouldContinue: true }
// Agent keeps working toward the goal
```
### System Prompt Guidance
Tell the agent when to complete:
```markdown
## Completing Tasks
When you've accomplished the user's request:
1. Verify your work (read back files you created, check results)
2. Call `complete_task` with a summary of what you did
3. Don't keep working after the goal is achieved
If you're blocked and can't proceed:
- Call `complete_task` with status "blocked" and explain why
- Don't loop forever trying the same thing
```
</completion_signals>
<partial_completion>
## Partial Completion
For multi-step tasks, track progress at the task level for resume capability.
### Task State Tracking
```swift
enum TaskStatus {
case pending // Not yet started
case inProgress // Currently working on
case completed // Finished successfully
case failed // Couldn't complete (with reason)
case skipped // Intentionally not done
}
struct AgentTask {
let id: String
let description: String
var status: TaskStatus
var notes: String? // Why it failed, what was done
}
struct AgentSession {
var tasks: [AgentTask]
var isComplete: Bool {
tasks.allSatisfy { $0.status == .completed || $0.status == .skipped }
}
var progress: (completed: Int, total: Int) {
let done = tasks.filter { $0.status == .completed }.count
return (done, tasks.count)
}
}
```
### UI Progress Display
Show users what's happening:
```
Progress: 3/5 tasks complete (60%)
✅ [1] Find source materials
✅ [2] Download full text
✅ [3] Extract key passages
❌ [4] Generate summary - Error: context limit exceeded
⏳ [5] Create outline - Pending
```
### Partial Completion Scenarios
**Agent hits max iterations before finishing:**
- Some tasks completed, some pending
- Checkpoint saved with current state
- Resume continues from where it left off, not from beginning
**Agent fails on one task:**
- Task marked `.failed` with error in notes
- Other tasks may continue (agent decides)
- Orchestrator doesn't automatically abort entire session
**Network error mid-task:**
- Current iteration throws
- Session marked `.failed`
- Checkpoint preserves messages up to that point
- Resume possible from checkpoint
### Checkpoint Structure
```swift
struct AgentCheckpoint: Codable {
let sessionId: String
let agentType: String
let messages: [Message] // Full conversation history
let iterationCount: Int
let tasks: [AgentTask] // Task state
    let customState: [String: String] // Agent-specific state (string-valued so the struct stays Codable)
let timestamp: Date
var isValid: Bool {
// Checkpoints expire (default 1 hour)
Date().timeIntervalSince(timestamp) < 3600
}
}
```
### Resume Flow
1. On app launch, scan for valid checkpoints
2. Show user: "You have an incomplete session. Resume?"
3. On resume:
- Restore messages to conversation
- Restore task states
- Continue agent loop from where it left off
4. On dismiss:
- Delete checkpoint
- Start fresh if user tries again
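A compressed sketch of that flow, shown in TypeScript for brevity; helper names like `loadCheckpoints` and `continueAgentLoop` are illustrative stand-ins for whatever your orchestrator exposes:
```typescript
async function resumeOrStart(userAcceptedResume: boolean) {
  const checkpoints = await loadCheckpoints();                // scan checkpoint directory
  const latest = checkpoints
    .filter(c => Date.now() - c.timestamp < 60 * 60 * 1000)   // still valid (1 hour)
    .sort((a, b) => b.timestamp - a.timestamp)[0];

  if (!latest || !userAcceptedResume) {
    if (latest) await deleteCheckpoint(latest.sessionId);     // user dismissed it
    return startFreshSession();
  }
  // Restore conversation and task state, then continue the loop where it left off.
  return continueAgentLoop({
    messages: latest.messages,
    tasks: latest.tasks,
    iteration: latest.iterationCount,
  });
}
```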
</partial_completion>
<model_tier_selection>
## Model Tier Selection
Different agents need different intelligence levels. Use the cheapest model that achieves the outcome.
### Tier Guidelines
| Agent Type | Recommended Tier | Reasoning |
|------------|-----------------|-----------|
| Chat/Conversation | Balanced (Sonnet) | Fast responses, good reasoning |
| Research | Balanced (Sonnet) | Tool loops, not ultra-complex synthesis |
| Content Generation | Balanced (Sonnet) | Creative but not synthesis-heavy |
| Complex Analysis | Powerful (Opus) | Multi-document synthesis, nuanced judgment |
| Profile Generation | Powerful (Opus) | Photo analysis, complex pattern recognition |
| Quick Queries | Fast (Haiku) | Simple lookups, quick transformations |
| Simple Classification | Fast (Haiku) | High volume, simple decisions |
### Implementation
```swift
enum ModelTier {
case fast // claude-3-haiku: Quick, cheap, simple tasks
case balanced // claude-sonnet: Good balance for most tasks
case powerful // claude-opus: Complex reasoning, synthesis
var modelId: String {
switch self {
case .fast: return "claude-3-haiku-20240307"
case .balanced: return "claude-sonnet-4-20250514"
case .powerful: return "claude-opus-4-20250514"
}
}
}
struct AgentConfig {
let name: String
let modelTier: ModelTier
let tools: [AgentTool]
let systemPrompt: String
let maxIterations: Int
}
// Examples
let researchConfig = AgentConfig(
name: "research",
modelTier: .balanced,
tools: researchTools,
systemPrompt: researchPrompt,
maxIterations: 20
)
let quickLookupConfig = AgentConfig(
name: "lookup",
modelTier: .fast,
tools: [readLibrary],
systemPrompt: "Answer quick questions about the user's library.",
maxIterations: 3
)
```
### Cost Optimization Strategies
1. **Start with balanced, upgrade if quality insufficient**
2. **Use fast tier for tool-heavy loops** where each turn is simple
3. **Reserve powerful tier for synthesis tasks** (comparing multiple sources)
4. **Consider token limits per turn** to control costs
5. **Cache expensive operations** to avoid repeated calls
</model_tier_selection>
<context_limits>
## Context Limits
Agent sessions can extend indefinitely, but context windows don't. Design for bounded context from the start.
### The Problem
```
Turn 1: User asks question → 500 tokens
Turn 2: Agent reads file → 10,000 tokens
Turn 3: Agent reads another file → 10,000 tokens
Turn 4: Agent researches → 20,000 tokens
...
Turn 10: Context window exceeded
```
### Design Principles
**1. Tools should support iterative refinement**
Instead of all-or-nothing, design for summary → detail → full:
```typescript
// Good: Supports iterative refinement
tool("read_file", {
path: z.string(),
preview: z.boolean().default(true), // Return first 1000 chars by default
full: z.boolean().default(false), // Opt-in to full content
}, ...);
tool("search_files", {
query: z.string(),
summaryOnly: z.boolean().default(true), // Return matches, not full files
}, ...);
```
**2. Provide consolidation tools**
Give agents a way to consolidate learnings mid-session:
```typescript
tool("summarize_and_continue", {
keyPoints: z.array(z.string()),
nextSteps: z.array(z.string()),
}, async ({ keyPoints, nextSteps }) => {
// Store summary, potentially truncate earlier messages
await saveSessionSummary({ keyPoints, nextSteps });
return { text: "Summary saved. Continuing with focus on: " + nextSteps.join(", ") };
});
```
**3. Design for truncation**
Assume the orchestrator may truncate early messages. Important context should be:
- In the system prompt (always present)
- In files (can be re-read)
- Summarized in context.md
### Implementation Strategies
```swift
class AgentOrchestrator {
let maxContextTokens = 100_000
let targetContextTokens = 80_000 // Leave headroom
func shouldTruncate() -> Bool {
estimateTokens(messages) > targetContextTokens
}
func truncateIfNeeded() {
if shouldTruncate() {
// Keep system prompt + recent messages
// Summarize or drop older messages
messages = [systemMessage] + summarizeOldMessages() + recentMessages
}
}
}
```
### System Prompt Guidance
```markdown
## Managing Context
For long tasks, periodically consolidate what you've learned:
1. If you've gathered a lot of information, summarize key points
2. Save important findings to files (they persist beyond context)
3. Use `summarize_and_continue` if the conversation is getting long
Don't try to hold everything in memory. Write it down.
```
</context_limits>
<orchestrator_pattern>
## Unified Agent Orchestrator
One execution engine, many agent types. All agents use the same orchestrator with different configurations.
```swift
class AgentOrchestrator {
static let shared = AgentOrchestrator()
func run(config: AgentConfig, userMessage: String) async -> AgentResult {
var messages: [Message] = [
.system(config.systemPrompt),
.user(userMessage)
]
var iteration = 0
while iteration < config.maxIterations {
// Get agent response
let response = await claude.message(
model: config.modelTier.modelId,
messages: messages,
tools: config.tools
)
messages.append(.assistant(response))
// Process tool calls
for toolCall in response.toolCalls {
let result = await executeToolCall(toolCall, config: config)
messages.append(.toolResult(result))
// Check for completion signal
if !result.shouldContinue {
return AgentResult(
status: .completed,
output: result.output,
iterations: iteration + 1
)
}
}
// No tool calls = agent is responding, might be done
if response.toolCalls.isEmpty {
// Could be done, or waiting for user
break
}
iteration += 1
}
return AgentResult(
status: iteration >= config.maxIterations ? .maxIterations : .responded,
output: messages.last?.content ?? "",
iterations: iteration
)
}
}
```
### Benefits
- Consistent lifecycle management across all agent types
- Automatic checkpoint/resume (critical for mobile)
- Shared tool protocol
- Easy to add new agent types
- Centralized error handling and logging
</orchestrator_pattern>
<checklist>
## Agent Execution Checklist
### Completion Signals
- [ ] `complete_task` tool provided (explicit completion)
- [ ] No heuristic completion detection
- [ ] Tool results include `shouldContinue` flag
- [ ] System prompt guides when to complete
### Partial Completion
- [ ] Tasks tracked with status (pending, in_progress, completed, failed)
- [ ] Checkpoints saved for resume
- [ ] Progress visible to user
- [ ] Resume continues from where left off
### Model Tiers
- [ ] Tier selected based on task complexity
- [ ] Cost optimization considered
- [ ] Fast tier for simple operations
- [ ] Powerful tier reserved for synthesis
### Context Limits
- [ ] Tools support iterative refinement (preview vs full)
- [ ] Consolidation mechanism available
- [ ] Important context persisted to files
- [ ] Truncation strategy defined
</checklist>

View File

@@ -1,5 +1,12 @@
<overview>
Architectural patterns for building agent-native systems. These patterns emerge from the five core principles: Parity, Granularity, Composability, Emergent Capability, and Improvement Over Time.
Features are outcomes achieved by agents operating in a loop, not functions you write. Tools are atomic primitives. The agent applies judgment; the prompt defines the outcome.
See also:
- [files-universal-interface.md](./files-universal-interface.md) for file organization and context.md patterns
- [agent-execution-patterns.md](./agent-execution-patterns.md) for completion signals and partial completion
- [product-implications.md](./product-implications.md) for progressive disclosure and approval patterns
</overview>
<pattern name="event-driven-agent"> <pattern name="event-driven-agent">

View File

@@ -0,0 +1,301 @@
<overview>
Files are the universal interface for agent-native applications. Agents are naturally fluent with file operations—they already know how to read, write, and organize files. This document covers why files work so well, how to organize them, and the context.md pattern for accumulated knowledge.
</overview>
<why_files>
## Why Files
Agents are naturally good at files. Claude Code works because bash + filesystem is the most battle-tested agent interface. When building agent-native apps, lean into this.
### Agents Already Know How
You don't need to teach the agent your API—it already knows `cat`, `grep`, `mv`, `mkdir`. File operations are the primitives it's most fluent with.
### Files Are Inspectable
Users can see what the agent created, edit it, move it, delete it. No black box. Complete transparency into agent behavior.
### Files Are Portable
Export is trivial. Backup is trivial. Users own their data. No vendor lock-in, no complex migration paths.
### App State Stays in Sync
On mobile, if you use the file system with iCloud, all devices share the same file system. The agent's work on one device appears on all devices—without you having to build a server.
### Directory Structure Is Information Architecture
The filesystem gives you hierarchy for free. `/projects/acme/notes/` is self-documenting in a way that `SELECT * FROM notes WHERE project_id = 123` isn't.
</why_files>
<file_organization>
## File Organization Patterns
> **Needs validation:** These conventions are one approach that's worked so far, not a prescription. Better solutions should be considered.
A general principle of agent-native design: **Design for what agents can reason about.** The best proxy for that is what would make sense to a human. If a human can look at your file structure and understand what's going on, an agent probably can too.
### Entity-Scoped Directories
Organize files around entities, not actors or file types:
```
{entity_type}/{entity_id}/
├── primary content
├── metadata
└── related materials
```
**Example:** `Research/books/{bookId}/` contains everything about one book—full text, notes, sources, agent logs.
### Naming Conventions
| File Type | Naming Pattern | Example |
|-----------|---------------|---------|
| Entity data | `{entity}.json` | `library.json`, `status.json` |
| Human-readable content | `{content_type}.md` | `introduction.md`, `profile.md` |
| Agent reasoning | `agent_log.md` | Per-entity agent history |
| Primary content | `full_text.txt` | Downloaded/extracted text |
| Multi-volume | `volume{N}.txt` | `volume1.txt`, `volume2.txt` |
| External sources | `{source_name}.md` | `wikipedia.md`, `sparknotes.md` |
| Checkpoints | `{sessionId}.checkpoint` | UUID-based |
| Configuration | `config.json` | Feature settings |
### Directory Naming
- **Entity-scoped:** `{entityType}/{entityId}/` (e.g., `Research/books/{bookId}/`)
- **Type-scoped:** `{type}/` (e.g., `AgentCheckpoints/`, `AgentLogs/`)
- **Convention:** Lowercase with underscores, not camelCase
### Ephemeral vs. Durable Separation
Separate agent working files from user's permanent data:
```
Documents/
├── AgentCheckpoints/ # Ephemeral (can delete)
│ └── {sessionId}.checkpoint
├── AgentLogs/ # Ephemeral (debugging)
│ └── {type}/{sessionId}.md
└── Research/ # Durable (user's work)
└── books/{bookId}/
```
### The Split: Markdown vs JSON
- **Markdown:** For content users might read or edit
- **JSON:** For structured data the app queries
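As a small sketch of that split (paths are illustrative):
```typescript
import * as fs from "node:fs/promises";

// JSON for structured data the app queries...
await fs.writeFile(
  "Research/books/123/status.json",
  JSON.stringify({ stage: "summarized", updatedAt: new Date().toISOString() }, null, 2)
);
// ...Markdown for content the user might read or edit.
await fs.writeFile(
  "Research/books/123/introduction.md",
  "# Introduction\n\nWhat this book argues and why it matters.\n"
);
```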
</file_organization>
<context_md_pattern>
## The context.md Pattern
A file the agent reads at the start of each session and updates as it learns:
```markdown
# Context
## Who I Am
Reading assistant for the Every app.
## What I Know About This User
- Interested in military history and Russian literature
- Prefers concise analysis
- Currently reading War and Peace
## What Exists
- 12 notes in /notes
- 3 active projects
- User preferences at /preferences.md
## Recent Activity
- User created "Project kickoff" (2 hours ago)
- Analyzed passage about Austerlitz (yesterday)
## My Guidelines
- Don't spoil books they're reading
- Use their interests to personalize insights
## Current State
- No pending tasks
- Last sync: 10 minutes ago
```
### Benefits
- **Agent behavior evolves without code changes** - Update the context, behavior changes
- **Users can inspect and modify** - Complete transparency
- **Natural place for accumulated context** - Learnings persist across sessions
- **Portable across sessions** - Restart agent, knowledge preserved
### How It Works
1. Agent reads `context.md` at session start
2. Agent updates it when learning something important
3. System can also update it (recent activity, new resources)
4. Context persists across sessions
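A minimal sketch of the host side of this loop, in TypeScript. The workspace location and helper names are assumptions; in practice the agent usually edits `context.md` itself through its file tools, and the host only guarantees the file is injected at session start.
```typescript
import { readFile, writeFile } from "node:fs/promises";
import { join } from "node:path";

const workspaceRoot = process.env.WORKSPACE_ROOT ?? "./workspace"; // assumed location
const contextPath = join(workspaceRoot, "context.md");

// 1. Read context.md at session start and fold it into the system prompt.
async function buildSystemPrompt(basePrompt: string): Promise<string> {
  const context = await readFile(contextPath, "utf8")
    .catch(() => "# Context\n(No accumulated context yet.)");
  return `${basePrompt}\n\n## Accumulated context\n${context}`;
}

// 2-3. Either the agent or the system appends learnings as they occur.
async function recordLearning(note: string): Promise<void> {
  const existing = await readFile(contextPath, "utf8").catch(() => "# Context\n");
  await writeFile(contextPath, `${existing}\n- ${note} (${new Date().toISOString()})`, "utf8");
}
```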
### What to Include
| Section | Purpose |
|---------|---------|
| Who I Am | Agent identity and role |
| What I Know About This User | Learned preferences, interests |
| What Exists | Available resources, data |
| Recent Activity | Context for continuity |
| My Guidelines | Learned rules and constraints |
| Current State | Session status, pending items |
</context_md_pattern>
<files_vs_database>
## Files vs. Database
> **Needs validation:** This framing is informed by mobile development. For web apps, the tradeoffs are different.
| Use files for... | Use database for... |
|------------------|---------------------|
| Content users should read/edit | High-volume structured data |
| Configuration that benefits from version control | Data that needs complex queries |
| Agent-generated content | Ephemeral state (sessions, caches) |
| Anything that benefits from transparency | Data with relationships |
| Large text content | Data that needs indexing |
**The principle:** Files for legibility, databases for structure. When in doubt, files—they're more transparent and users can always inspect them.
### When Files Work Best
- Scale is small (one user's library, not millions of records)
- Transparency is valued over query speed
- Cloud sync (iCloud, Dropbox) works well with files
### Hybrid Approach
Even if you need a database for performance, consider maintaining a file-based "source of truth" that the agent works with, synced to the database for the UI:
```
Files (agent workspace):
Research/book_123/introduction.md
Database (UI queries):
research_index: { bookId, path, title, createdAt }
```
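A sketch of one way to keep that index current, assuming `better-sqlite3` for the UI-side table and the agent's Markdown files as the source of truth:
```typescript
import Database from "better-sqlite3"; // assumed index store
import { readFile, stat } from "node:fs/promises";
import { basename } from "node:path";

const db = new Database("research_index.db");
db.exec(`CREATE TABLE IF NOT EXISTS research_index (
  bookId TEXT, path TEXT PRIMARY KEY, title TEXT, createdAt TEXT
)`);

// The agent writes Markdown files; this keeps the UI's index in sync.
async function indexResearchFile(bookId: string, path: string): Promise<void> {
  const content = await readFile(path, "utf8");
  const title = content.split("\n")[0].replace(/^#\s*/, "") || basename(path);
  const createdAt = (await stat(path)).birthtime.toISOString();
  db.prepare(
    "INSERT OR REPLACE INTO research_index (bookId, path, title, createdAt) VALUES (?, ?, ?, ?)"
  ).run(bookId, path, title, createdAt);
}
```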
</files_vs_database>
<conflict_model>
## Conflict Model
If agents and users write to the same files, you need a conflict model.
### Current Reality
Most implementations use **last-write-wins** via atomic writes:
```swift
try data.write(to: url, options: [.atomic])
```
This is simple but can lose changes.
### Options
| Strategy | Pros | Cons |
|----------|------|------|
| **Last write wins** | Simple | Changes can be lost |
| **Agent checks before writing** | Preserves user edits | More complexity |
| **Separate spaces** | No conflicts | Less collaboration |
| **Append-only logs** | Never overwrites | Files grow forever |
| **File locking** | Safe concurrent access | Complexity, can block |
### Recommended Approaches
**For files agents write frequently (logs, status):** Last-write-wins is fine. Conflicts are rare.
**For files users edit (profiles, notes):** Consider explicit handling:
- Agent checks modification time before overwriting
- Or keep agent output separate from user-editable content
- Or use append-only pattern
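A sketch of the check-before-write option in TypeScript: the host records the modification time it last saw for each file and refuses to silently overwrite newer user edits.
```typescript
import { readFile, stat, writeFile } from "node:fs/promises";

// Modification time observed when the agent last read each file.
const lastSeen = new Map<string, number>();

async function readTracked(path: string): Promise<string> {
  const content = await readFile(path, "utf8");
  lastSeen.set(path, (await stat(path)).mtimeMs);
  return content;
}

// Refuse to overwrite silently if the user edited the file since we read it.
async function writeTracked(path: string, content: string): Promise<"written" | "conflict"> {
  const current = (await stat(path).catch(() => null))?.mtimeMs;
  const seen = lastSeen.get(path);
  if (current !== undefined && seen !== undefined && current > seen) {
    return "conflict"; // surface to the agent: re-read, merge, or ask the user
  }
  await writeFile(path, content, "utf8");
  lastSeen.set(path, (await stat(path)).mtimeMs);
  return "written";
}
```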
### iCloud Considerations
iCloud sync adds complexity. It creates `{filename} (conflict).md` files when sync conflicts occur. Monitor for these:
```swift
NotificationCenter.default.addObserver(
forName: .NSMetadataQueryDidUpdate,
...
)
```
### System Prompt Guidance
Tell the agent about the conflict model:
```markdown
## Working with User Content
When you create content, the user may edit it afterward. Always read
existing files before modifying them—the user may have made improvements
you should preserve.
If a file has been modified since you last wrote it, ask before overwriting.
```
</conflict_model>
<examples>
## Example: Reading App File Structure
```
Documents/
├── Library/
│ └── library.json # Book metadata
├── Research/
│ └── books/
│ └── {bookId}/
│ ├── full_text.txt # Downloaded content
│ ├── introduction.md # Agent-generated, user-editable
│ ├── notes.md # User notes
│ └── sources/
│ ├── wikipedia.md # Research gathered by agent
│ └── reviews.md
├── Chats/
│ └── {conversationId}.json # Chat history
├── Profile/
│ └── profile.md # User reading profile
└── context.md # Agent's accumulated knowledge
```
**How it works:**
1. User adds book → creates entry in `library.json`
2. Agent downloads text → saves to `Research/books/{id}/full_text.txt`
3. Agent researches → saves to `sources/`
4. Agent generates intro → saves to `introduction.md`
5. User edits intro → agent sees changes on next read
6. Agent updates `context.md` with learnings
</examples>
<checklist>
## Files as Universal Interface Checklist
### Organization
- [ ] Entity-scoped directories (`{type}/{id}/`)
- [ ] Consistent naming conventions
- [ ] Ephemeral vs durable separation
- [ ] Markdown for human content, JSON for structured data
### context.md
- [ ] Agent reads context at session start
- [ ] Agent updates context when learning
- [ ] Includes: identity, user knowledge, what exists, guidelines
- [ ] Persists across sessions
### Conflict Handling
- [ ] Conflict model defined (last-write-wins, check-before-write, etc.)
- [ ] Agent guidance in system prompt
- [ ] iCloud conflict monitoring (if applicable)
### Integration
- [ ] UI observes file changes (or shared service)
- [ ] Agent can read user edits
- [ ] User can inspect agent output
</checklist>


@@ -0,0 +1,359 @@
<overview>
Start with pure primitives: bash, file operations, basic storage. This proves the architecture works and reveals what the agent actually needs. As patterns emerge, add domain-specific tools deliberately. This document covers when and how to evolve from primitives to domain tools, and when to graduate to optimized code.
</overview>
<start_with_primitives>
## Start with Pure Primitives
Begin every agent-native system with the most atomic tools possible:
- `read_file` / `write_file` / `list_files`
- `bash` (for everything else)
- Basic storage (`store_item` / `get_item`)
- HTTP requests (`fetch_url`)
**Why start here:**
1. **Proves the architecture** - If it works with primitives, your prompts are doing their job
2. **Reveals actual needs** - You'll discover what domain concepts matter
3. **Maximum flexibility** - Agent can do anything, not just what you anticipated
4. **Forces good prompts** - You can't lean on tool logic as a crutch
### Example: Starting Primitive
```typescript
// Start with just these
const tools = [
tool("read_file", { path: z.string() }, ...),
tool("write_file", { path: z.string(), content: z.string() }, ...),
tool("list_files", { path: z.string() }, ...),
tool("bash", { command: z.string() }, ...),
];
// Prompt handles the domain logic
const prompt = `
When processing feedback:
1. Read existing feedback from data/feedback.json
2. Add the new feedback with your assessment of importance (1-5)
3. Write the updated file
4. If importance >= 4, create a notification file in data/alerts/
`;
```
</start_with_primitives>
<when_to_add_domain_tools>
## When to Add Domain Tools
As patterns emerge, you'll want to add domain-specific tools. This is good—but do it deliberately.
### Vocabulary Anchoring
**Add a domain tool when:** The agent needs to understand domain concepts.
A `create_note` tool teaches the agent what "note" means in your system better than "write a file to the notes directory with this format."
```typescript
// Without domain tool - agent must infer structure
await agent.chat("Create a note about the meeting");
// Agent: writes to... notes/? documents/? what format?
// With domain tool - vocabulary is anchored
tool("create_note", {
title: z.string(),
content: z.string(),
tags: z.array(z.string()).optional(),
}, async ({ title, content, tags }) => {
// Tool enforces structure, agent understands "note"
});
```
### Guardrails
**Add a domain tool when:** Some operations need validation or constraints that shouldn't be left to agent judgment.
```typescript
// publish_to_feed might enforce format requirements or content policies
tool("publish_to_feed", {
bookId: z.string(),
content: z.string(),
headline: z.string().max(100), // Enforce headline length
}, async ({ bookId, content, headline }) => {
// Validate content meets guidelines
if (containsProhibitedContent(content)) {
return { text: "Content doesn't meet guidelines", isError: true };
}
// Enforce proper structure
await feedService.publish({ bookId, content, headline, publishedAt: new Date() });
});
```
### Efficiency
**Add a domain tool when:** Common operations would take many primitive calls.
```typescript
// Primitive approach: multiple calls
await agent.chat("Get book details");
// Agent: read library.json, parse, find book, read full_text.txt, read introduction.md...
// Domain tool: one call for common operation
tool("get_book_with_content", { bookId: z.string() }, async ({ bookId }) => {
const book = await library.getBook(bookId);
const fullText = await readFile(`Research/${bookId}/full_text.txt`);
const intro = await readFile(`Research/${bookId}/introduction.md`);
return { text: JSON.stringify({ book, fullText, intro }) };
});
```
</when_to_add_domain_tools>
<the_rule>
## The Rule for Domain Tools
**Domain tools should represent one conceptual action from the user's perspective.**
They can include mechanical validation, but **judgment about what to do or whether to do it belongs in the prompt**.
### Wrong: Bundles Judgment
```typescript
// WRONG - analyze_and_publish bundles judgment into the tool
tool("analyze_and_publish", async ({ input }) => {
const analysis = analyzeContent(input); // Tool decides how to analyze
const shouldPublish = analysis.score > 0.7; // Tool decides whether to publish
if (shouldPublish) {
await publish(analysis.summary); // Tool decides what to publish
}
});
```
### Right: One Action, Agent Decides
```typescript
// RIGHT - separate tools, agent decides
tool("analyze_content", { content: z.string() }, ...); // Returns analysis
tool("publish", { content: z.string() }, ...); // Publishes what agent provides
// Prompt: "Analyze the content. If it's high quality, publish a summary."
// Agent decides what "high quality" means and what summary to write.
```
### The Test
Ask: "Who is making the decision here?"
- If the answer is "the tool code" → you've encoded judgment, refactor
- If the answer is "the agent based on the prompt" → good
</the_rule>
<keep_primitives_available>
## Keep Primitives Available
**Domain tools are shortcuts, not gates.**
Unless there's a specific reason to restrict access (security, data integrity), the agent should still be able to use underlying primitives for edge cases.
```typescript
// Domain tool for common case
tool("create_note", { title, content }, ...);
// But primitives still available for edge cases
tool("read_file", { path }, ...);
tool("write_file", { path, content }, ...);
// Agent can use create_note normally, but for weird edge case:
// "Create a note in a non-standard location with custom metadata"
// → Agent uses write_file directly
```
### When to Gate
Gating (making domain tool the only way) is appropriate for:
- **Security:** User authentication, payment processing
- **Data integrity:** Operations that must maintain invariants
- **Audit requirements:** Actions that must be logged in specific ways
**The default is open.** When you do gate something, make it a conscious decision with a clear reason.
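One way to keep gating a conscious decision is to require a stated reason at registration time. A sketch, with hypothetical names:
```typescript
// Sketch: gating must carry an explicit, documented reason; open access is the default.
interface ToolRegistration {
  name: string;
  gated: boolean;
  gateReason?: "security" | "data-integrity" | "audit"; // required when gated
}

function registerTool(reg: ToolRegistration): void {
  if (reg.gated && !reg.gateReason) {
    throw new Error(`${reg.name} is gated without a stated reason; the default should be open`);
  }
  // ...hand off to the agent runtime's tool registry...
}

registerTool({ name: "write_file", gated: false });
registerTool({ name: "process_payment", gated: true, gateReason: "security" });
```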
</keep_primitives_available>
<graduating_to_code>
## Graduating to Code
Some operations will need to move from agent-orchestrated to optimized code for performance or reliability.
### The Progression
```
Stage 1: Agent uses primitives in a loop
→ Flexible, proves the concept
→ Slow, potentially expensive
Stage 2: Add domain tools for common operations
→ Faster, still agent-orchestrated
→ Agent still decides when/whether to use
Stage 3: For hot paths, implement in optimized code
→ Fast, deterministic
→ Agent can still trigger, but execution is code
```
### Example Progression
**Stage 1: Pure primitives**
```markdown
Prompt: "When user asks for a summary, read all notes in /notes,
analyze them, and write a summary to /summaries/{date}.md"
Agent: Calls read_file 20 times, reasons about content, writes summary
Time: 30 seconds, 50k tokens
```
**Stage 2: Domain tool**
```typescript
tool("get_all_notes", {}, async () => {
const notes = await readAllNotesFromDirectory();
return { text: JSON.stringify(notes) };
});
// Agent still decides how to summarize, but retrieval is faster
// Time: 10 seconds, 30k tokens
```
**Stage 3: Optimized code**
```typescript
tool("generate_weekly_summary", {}, async () => {
// Entire operation in code for hot path
const notes = await getNotes({ since: oneWeekAgo });
const summary = await generateSummary(notes); // Could use cheaper model
await writeSummary(summary);
return { text: "Summary generated" };
});
// Agent just triggers it
// Time: 2 seconds, 5k tokens
```
### The Caveat
**Even when an operation graduates to code, the agent should be able to:**
1. Trigger the optimized operation itself
2. Fall back to primitives for edge cases the optimized path doesn't handle
Graduation is about efficiency. **Parity still holds.** The agent doesn't lose capability when you optimize.
</graduating_to_code>
<decision_framework>
## Decision Framework
### Should I Add a Domain Tool?
| Question | If Yes |
|----------|--------|
| Is the agent confused about what this concept means? | Add for vocabulary anchoring |
| Does this operation need validation the agent shouldn't decide? | Add with guardrails |
| Is this a common multi-step operation? | Add for efficiency |
| Would changing behavior require code changes? | Keep as prompt instead |
### Should I Graduate to Code?
| Question | If Yes |
|----------|--------|
| Is this operation called very frequently? | Consider graduating |
| Does latency matter significantly? | Consider graduating |
| Are token costs problematic? | Consider graduating |
| Do you need deterministic behavior? | Graduate to code |
| Does the operation need complex state management? | Graduate to code |
### Should I Gate Access?
| Question | If Yes |
|----------|--------|
| Is there a security requirement? | Gate appropriately |
| Must this operation maintain data integrity? | Gate appropriately |
| Is there an audit/compliance requirement? | Gate appropriately |
| Is it just "safer" with no specific risk? | Keep primitives available |
</decision_framework>
<examples>
## Examples
### Feedback Processing Evolution
**Stage 1: Primitives only**
```typescript
tools: [read_file, write_file, bash]
prompt: "Store feedback in data/feedback.json, notify if important"
// Agent figures out JSON structure, importance criteria, notification method
```
**Stage 2: Domain tools for vocabulary**
```typescript
tools: [
store_feedback, // Anchors "feedback" concept with proper structure
send_notification, // Anchors "notify" with correct channels
read_file, // Still available for edge cases
write_file,
]
prompt: "Store feedback using store_feedback. Notify if importance >= 4."
// Agent still decides importance, but vocabulary is anchored
```
**Stage 3: Graduated hot path**
```typescript
tools: [
process_feedback_batch, // Optimized for high-volume processing
store_feedback, // For individual items
send_notification,
read_file,
write_file,
]
// Batch processing is code, but agent can still use store_feedback for special cases
```
### When NOT to Add Domain Tools
**Don't add a domain tool just to make things "cleaner":**
```typescript
// Unnecessary - agent can compose primitives
tool("organize_files_by_date", ...) // Just use move_file + judgment
// Unnecessary - puts decision in wrong place
tool("decide_file_importance", ...) // This is prompt territory
```
**Don't add a domain tool if behavior might change:**
```typescript
// Bad - locked into code
tool("generate_standard_report", ...) // What if report format evolves?
// Better - keep in prompt
prompt: "Generate a report covering X, Y, Z. Format for readability."
// Can adjust format by editing prompt
```
</examples>
<checklist>
## Checklist: Primitives to Domain Tools
### Starting Out
- [ ] Begin with pure primitives (read, write, list, bash)
- [ ] Write behavior in prompts, not tool logic
- [ ] Let patterns emerge from actual usage
### Adding Domain Tools
- [ ] Clear reason: vocabulary anchoring, guardrails, or efficiency
- [ ] Tool represents one conceptual action
- [ ] Judgment stays in prompts, not tool code
- [ ] Primitives remain available alongside domain tools
### Graduating to Code
- [ ] Hot path identified (frequent, latency-sensitive, or expensive)
- [ ] Optimized version doesn't remove agent capability
- [ ] Fallback to primitives for edge cases still works
### Gating Decisions
- [ ] Specific reason for each gate (security, integrity, audit)
- [ ] Default is open access
- [ ] Gates are conscious decisions, not defaults
</checklist>


@@ -1,10 +1,188 @@
<overview>
Mobile is a first-class platform for agent-native apps. It has unique constraints and opportunities. This guide covers why mobile matters, iOS storage architecture, checkpoint/resume patterns, and cost-aware design.
</overview>
<why_mobile>
## Why Mobile Matters
Mobile devices offer unique advantages for agent-native apps:
### A File System
Agents can work with files naturally, using the same primitives that work everywhere else. The filesystem is the universal interface.
### Rich Context
Mobile is a walled garden, but your app gets access to it: health data, location, photos, calendars—context that doesn't exist on desktop or web. This enables deeply personalized agent experiences.
### Local Apps
Everyone has their own copy of the app. This opens opportunities that aren't fully realized yet: apps that modify themselves, fork themselves, evolve per-user. App Store policies constrain some of this today, but the foundation is there.
### Cross-Device Sync
If you use the file system with iCloud, all devices share the same file system. The agent's work on one device appears on all devices—without you having to build a server.
### The Challenge
**Agents are long-running. Mobile apps are not.**
An agent might need 30 seconds, 5 minutes, or an hour to complete a task. But iOS will background your app after seconds of inactivity, and may kill it entirely to reclaim memory. The user might switch apps, take a call, or lock their phone mid-task.
This means mobile agent apps need:
- **Checkpointing** — Saving state so work isn't lost
- **Resuming** — Picking up where you left off after interruption
- **Background execution** — Using the limited time iOS gives you wisely
- **On-device vs. cloud decisions** — What runs locally vs. what needs a server
</why_mobile>
<ios_storage>
## iOS Storage Architecture
> **Needs validation:** This is an approach that works well, but better solutions may exist.
For agent-native iOS apps, use iCloud Drive's Documents folder for your shared workspace. This gives you **free, automatic multi-device sync** without building a sync layer or running a server.
### Why iCloud Documents?
| Approach | Cost | Complexity | Offline | Multi-Device |
|----------|------|------------|---------|--------------|
| Custom backend + sync | $$$ | High | Manual | Yes |
| CloudKit database | Free tier limits | Medium | Manual | Yes |
| **iCloud Documents** | Free (user's storage) | Low | Automatic | Automatic |
iCloud Documents:
- Uses the user's existing iCloud storage (5GB free tier; most users have more)
- Automatic sync across all user's devices
- Works offline, syncs when online
- Files visible in Files.app for transparency
- No server costs, no sync code to maintain
### Implementation: iCloud-First with Local Fallback
```swift
// Get the iCloud Documents container
func iCloudDocumentsURL() -> URL? {
FileManager.default.url(forUbiquityContainerIdentifier: nil)?
.appendingPathComponent("Documents")
}
// Your shared workspace lives in iCloud
class SharedWorkspace {
let rootURL: URL
init() {
// Use iCloud if available, fall back to local
if let iCloudURL = iCloudDocumentsURL() {
self.rootURL = iCloudURL
} else {
// Fallback to local Documents (user not signed into iCloud)
self.rootURL = FileManager.default.urls(
for: .documentDirectory,
in: .userDomainMask
).first!
}
}
// All file operations go through this root
func researchPath(for bookId: String) -> URL {
rootURL.appendingPathComponent("Research/\(bookId)")
}
func journalPath() -> URL {
rootURL.appendingPathComponent("Journal")
}
}
```
### Directory Structure in iCloud
```
iCloud Drive/
└── YourApp/ # Your app's container
└── Documents/ # Visible in Files.app
├── Journal/
│ ├── user/
│ │ └── 2025-01-15.md # Syncs across devices
│ └── agent/
│ └── 2025-01-15.md # Agent observations sync too
├── Research/
│ └── {bookId}/
│ ├── full_text.txt
│ └── sources/
├── Chats/
│ └── {conversationId}.json
└── context.md # Agent's accumulated knowledge
```
### Handling iCloud File States
iCloud files may not be downloaded locally. Handle this:
```swift
func readFile(at url: URL) throws -> String {
// iCloud may create .icloud placeholder files
if url.pathExtension == "icloud" {
// Trigger download
try FileManager.default.startDownloadingUbiquitousItem(at: url)
throw FileNotYetAvailableError()
}
return try String(contentsOf: url, encoding: .utf8)
}
// For writes, use coordinated file access
func writeFile(_ content: String, to url: URL) throws {
let coordinator = NSFileCoordinator()
var error: NSError?
coordinator.coordinate(
writingItemAt: url,
options: .forReplacing,
error: &error
) { newURL in
try? content.write(to: newURL, atomically: true, encoding: .utf8)
}
if let error = error { throw error }
}
```
### What iCloud Enables
1. **User starts experiment on iPhone** → Agent creates config file
2. **User opens app on iPad** → Same experiment visible, no sync code needed
3. **Agent logs observation on iPhone** → Syncs to iPad automatically
4. **User edits journal on iPad** → iPhone sees the edit
### Entitlements Required
Add to your app's entitlements:
```xml
<key>com.apple.developer.icloud-container-identifiers</key>
<array>
<string>iCloud.com.yourcompany.yourapp</string>
</array>
<key>com.apple.developer.icloud-services</key>
<array>
<string>CloudDocuments</string>
</array>
<key>com.apple.developer.ubiquity-container-identifiers</key>
<array>
<string>iCloud.com.yourcompany.yourapp</string>
</array>
```
### When NOT to Use iCloud Documents
- **Sensitive data** - Use Keychain or encrypted local storage instead
- **High-frequency writes** - iCloud sync has latency; use local + periodic sync
- **Large media files** - Consider CloudKit Assets or on-demand resources
- **Shared between users** - iCloud Documents is single-user; use CloudKit for sharing
</ios_storage>
<background_execution>
## Background Execution & Resumption
> **Needs validation:** These patterns work but better solutions may exist.
Mobile apps can be suspended or terminated at any time. Agents must handle this gracefully.
### The Challenge
@@ -623,13 +801,48 @@ class AgentOrchestrator {
```
</battery_awareness>
<on_device_vs_cloud>
## On-Device vs. Cloud
Understanding what runs where in a mobile agent-native app:
| Component | On-Device | Cloud |
|-----------|-----------|-------|
| Orchestration | ✅ | |
| Tool execution | ✅ (file ops, photo access, HealthKit) | |
| LLM calls | | ✅ (Anthropic API) |
| Checkpoints | ✅ (local files) | Optional via iCloud |
| Long-running agents | Limited by iOS | Possible with server |
### Implications
**Network required for reasoning:**
- The app needs network connectivity for LLM calls
- Design tools to degrade gracefully when network is unavailable
- Consider offline caching for common queries
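A sketch of graceful degradation for a network-backed tool, in TypeScript for brevity: fall back to the last cached response when the request fails. The cache location and tool shape are assumptions.
```typescript
import { mkdir, readFile, writeFile } from "node:fs/promises";
import { createHash } from "node:crypto";

const cacheDir = "Caches/agent-fetch"; // assumed cache location

// fetch_url tool that serves a cached copy when the device is offline.
async function fetchUrlTool(url: string): Promise<{ text: string; fromCache: boolean }> {
  const key = createHash("sha256").update(url).digest("hex");
  const cachePath = `${cacheDir}/${key}.txt`;
  try {
    const res = await fetch(url); // global fetch, Node 18+ / modern runtimes
    const text = await res.text();
    await mkdir(cacheDir, { recursive: true });
    await writeFile(cachePath, text, "utf8");
    return { text, fromCache: false };
  } catch {
    // Offline or request failed: serve the cached copy if we have one.
    const cached = await readFile(cachePath, "utf8").catch(() => null);
    if (cached !== null) return { text: cached, fromCache: true };
    throw new Error("Network unavailable and no cached copy for this URL");
  }
}
```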
**Data stays local:**
- File operations happen on device
- Sensitive data never leaves the device unless explicitly synced
- Privacy is preserved by default
**Long-running agents:**
For truly long-running agents (hours), consider a server-side orchestrator that can run indefinitely, with the mobile app as a viewer and input mechanism.
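A sketch of the server side of that split, assuming an Express-style HTTP API: the phone creates a run, then polls for status while the loop continues on the server.
```typescript
import express from "express";
import { randomUUID } from "node:crypto";

interface AgentRun { id: string; status: "running" | "done" | "failed"; log: string[] }
const runs = new Map<string, AgentRun>();

const app = express();
app.use(express.json());

// Mobile app starts a run and gets back an id to poll.
app.post("/runs", (req, res) => {
  const run: AgentRun = { id: randomUUID(), status: "running", log: [] };
  runs.set(run.id, run);
  void runAgentLoop(run, req.body.objective); // fire-and-forget; can run for hours
  res.json({ id: run.id });
});

// Mobile app polls for progress; it is a viewer, not the orchestrator.
app.get("/runs/:id", (req, res) => {
  res.json(runs.get(req.params.id) ?? { error: "not found" });
});

async function runAgentLoop(run: AgentRun, objective: string): Promise<void> {
  run.log.push(`Started: ${objective}`);
  // ...call the model, execute tools, append progress to run.log until done...
  run.status = "done";
}

app.listen(3000);
```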
</on_device_vs_cloud>
<checklist>
## Mobile Agent-Native Checklist
**iOS Storage:**
- [ ] iCloud Documents as primary storage (or conscious alternative)
- [ ] Local Documents fallback when iCloud unavailable
- [ ] Handle `.icloud` placeholder files (trigger download)
- [ ] Use NSFileCoordinator for conflict-safe writes
**Background Execution:**
- [ ] Checkpoint/resume implemented for all agent sessions
- [ ] State machine for agent lifecycle (idle, running, backgrounded, etc.)
- [ ] Background task extension for critical saves (30 second window)
- [ ] User-visible status for backgrounded agents
**Permissions:**


@@ -0,0 +1,443 @@
<overview>
Agent-native architecture has consequences for how products feel, not just how they're built. This document covers progressive disclosure of complexity, discovering latent demand through agent usage, and designing approval flows that match stakes and reversibility.
</overview>
<progressive_disclosure>
## Progressive Disclosure of Complexity
The best agent-native applications are simple to start but endlessly powerful.
### The Excel Analogy
Excel is the canonical example: you can use it for a grocery list, or you can build complex financial models. The same tool, radically different depths of use.
Claude Code has this quality: fix a typo, or refactor an entire codebase. The interface is the same—natural language—but the capability scales with the ask.
### The Pattern
Agent-native applications should aspire to this:
**Simple entry:** Basic requests work immediately with no learning curve
```
User: "Organize my downloads"
Agent: [Does it immediately, no configuration needed]
```
**Discoverable depth:** Users find they can do more as they explore
```
User: "Organize my downloads by project"
Agent: [Adapts to preference]
User: "Every Monday, review last week's downloads"
Agent: [Sets up recurring workflow]
```
**No ceiling:** Power users can push the system in ways you didn't anticipate
```
User: "Cross-reference my downloads with my calendar and flag
anything I downloaded during a meeting that I haven't
followed up on"
Agent: [Composes capabilities to accomplish this]
```
### How This Emerges
This isn't something you design directly. It **emerges naturally from the architecture:**
1. When features are prompts and tools are composable...
2. Users can start simple ("organize my downloads")...
3. And gradually discover complexity ("every Monday, review last week's...")...
4. Without you having to build each level explicitly
The agent meets users where they are.
### Design Implications
- **Don't force configuration upfront** - Let users start immediately
- **Don't hide capabilities** - Make them discoverable through use
- **Don't cap complexity** - If the agent can do it, let users ask for it
- **Do provide hints** - Help users discover what's possible
</progressive_disclosure>
<latent_demand_discovery>
## Latent Demand Discovery
Traditional product development: imagine what users want, build it, see if you're right.
Agent-native product development: build a capable foundation, observe what users ask the agent to do, formalize the patterns that emerge.
### The Shift
**Traditional approach:**
```
1. Imagine features users might want
2. Build them
3. Ship
4. Hope you guessed right
5. If wrong, rebuild
```
**Agent-native approach:**
```
1. Build capable foundation (atomic tools, parity)
2. Ship
3. Users ask agent for things
4. Observe what they're asking for
5. Patterns emerge
6. Formalize patterns into domain tools or prompts
7. Repeat
```
### The Flywheel
```
Build with atomic tools and parity
Users ask for things you didn't anticipate
Agent composes tools to accomplish them
(or fails, revealing a capability gap)
You observe patterns in what's being requested
Add domain tools or prompts to optimize common patterns
(Repeat)
```
### What You Learn
**When users ask and the agent succeeds:**
- This is a real need
- Your architecture supports it
- Consider optimizing with a domain tool if it's common
**When users ask and the agent fails:**
- This is a real need
- You have a capability gap
- Fix the gap: add tool, fix parity, improve context
**When users don't ask for something:**
- Maybe they don't need it
- Or maybe they don't know it's possible (capability hiding)
### Implementation
**Log agent requests:**
```typescript
async function handleAgentRequest(request: string) {
// Log what users are asking for
await analytics.log({
type: 'agent_request',
request: request,
timestamp: Date.now(),
});
// Process request...
}
```
**Track success/failure:**
```typescript
async function completeAgentSession(session: AgentSession) {
await analytics.log({
type: 'agent_session',
request: session.initialRequest,
succeeded: session.status === 'completed',
toolsUsed: session.toolCalls.map(t => t.name),
iterations: session.iterationCount,
});
}
```
**Review patterns:**
- What are users asking for most?
- What's failing? Why?
- What would benefit from a domain tool?
- What needs better context injection?
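A sketch of the review step, assuming session events shaped like the logging snippets above:
```typescript
// Aggregate logged agent sessions to spot emerging patterns.
interface SessionEvent {
  request: string;
  succeeded: boolean;
  toolsUsed: string[];
}

function reviewPatterns(events: SessionEvent[]) {
  const byRequest = new Map<string, { count: number; failures: number }>();
  for (const e of events) {
    const key = e.request.toLowerCase().slice(0, 60); // crude normalization
    const entry = byRequest.get(key) ?? { count: 0, failures: 0 };
    entry.count += 1;
    if (!e.succeeded) entry.failures += 1;
    byRequest.set(key, entry);
  }
  // Most-requested first: candidates for domain tools or new prompt sections.
  return [...byRequest.entries()]
    .sort((a, b) => b[1].count - a[1].count)
    .slice(0, 20);
}
```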
### Example: Discovering "Weekly Review"
```
Week 1: Users start asking "summarize my activity this week"
Agent: Composes list_files + read_file, works but slow
Week 2: More users asking similar things
Pattern emerges: weekly review is common
Week 3: Add prompt section for weekly review
Faster, more consistent, still flexible
Week 4: If still common and performance matters
Add domain tool: generate_weekly_summary
```
You didn't have to guess that weekly review would be popular. You discovered it.
</latent_demand_discovery>
<approval_and_agency>
## Approval and User Agency
When agents take unsolicited actions—doing things on their own rather than responding to explicit requests—you need to decide how much autonomy to grant.
> **Note:** This framework applies to unsolicited agent actions. If the user explicitly asks the agent to do something ("send that email"), that's already approval—the agent just does it.
### The Stakes/Reversibility Matrix
Consider two dimensions:
- **Stakes:** How much does it matter if this goes wrong?
- **Reversibility:** How easy is it to undo?
| Stakes | Reversibility | Pattern | Example |
|--------|---------------|---------|---------|
| Low | Easy | **Auto-apply** | Organizing files |
| Low | Hard | **Quick confirm** | Publishing to a private feed |
| High | Easy | **Suggest + apply** | Code changes with undo |
| High | Hard | **Explicit approval** | Sending emails, payments |
### Patterns in Detail
**Auto-apply (low stakes, easy reversal):**
```
Agent: [Organizes files into folders]
Agent: "I organized your downloads into folders by type.
You can undo with Cmd+Z or move them back."
```
User doesn't need to approve—it's easy to undo and doesn't matter much.
**Quick confirm (low stakes, hard reversal):**
```
Agent: "I've drafted a post about your reading insights.
Publish to your feed?"
[Publish] [Edit first] [Cancel]
```
One-tap confirm because stakes are low, but it's hard to un-publish.
**Suggest + apply (high stakes, easy reversal):**
```
Agent: "I recommend these code changes to fix the bug:
[Shows diff]
Apply? Changes can be reverted with git."
[Apply] [Modify] [Cancel]
```
Shows what will happen, makes reversal clear.
**Explicit approval (high stakes, hard reversal):**
```
Agent: "I've drafted this email to your team about the deadline change:
[Shows full email]
This will send immediately and cannot be unsent.
Type 'send' to confirm."
```
Requires explicit action, makes consequences clear.
### Implementation
```swift
// Supporting types, assumed here so the sketch is self-contained
enum Stakes { case low, high }
enum Reversibility { case easy, hard }
enum AgentAction { case organizeFiles, publishToFeed, modifyCode, sendEmail, makePayment }

enum ApprovalLevel {
case autoApply // Just do it
case quickConfirm // One-tap approval
case suggestApply // Show preview, ask to apply
case explicitApproval // Require explicit confirmation
}
func approvalLevelFor(action: AgentAction) -> ApprovalLevel {
let stakes = assessStakes(action)
let reversibility = assessReversibility(action)
switch (stakes, reversibility) {
case (.low, .easy): return .autoApply
case (.low, .hard): return .quickConfirm
case (.high, .easy): return .suggestApply
case (.high, .hard): return .explicitApproval
}
}
func assessStakes(_ action: AgentAction) -> Stakes {
switch action {
case .organizeFiles: return .low
case .publishToFeed: return .low
case .modifyCode: return .high
case .sendEmail: return .high
case .makePayment: return .high
}
}
func assessReversibility(_ action: AgentAction) -> Reversibility {
switch action {
case .organizeFiles: return .easy // Can move back
case .publishToFeed: return .hard // People might see it
case .modifyCode: return .easy // Git revert
case .sendEmail: return .hard // Can't unsend
case .makePayment: return .hard // Money moved
}
}
```
### Self-Modification Considerations
When agents can modify their own behavior—changing prompts, updating preferences, adjusting workflows—the goals are:
1. **Visibility:** User can see what changed
2. **Understanding:** User understands the effects
3. **Rollback:** User can undo changes
Approval flows are one way to achieve this. Audit logs with easy rollback could be another. **The principle is: make it legible.**
```swift
// When agent modifies its own prompt
func agentSelfModify(change: PromptChange) async {
// Log the change
await auditLog.record(change)
// Create checkpoint for rollback
await createCheckpoint(currentState)
// Notify user (could be async/batched)
await notifyUser("I've adjusted my approach: \(change.summary)")
// Apply change
await applyChange(change)
}
```
</approval_and_agency>
<capability_visibility>
## Capability Visibility
Users need to discover what the agent can do. Hidden capabilities lead to underutilization.
### The Problem
```
User: "Help me with my reading"
Agent: "What would you like help with?"
// Agent doesn't mention it can publish to feed, research books,
// generate introductions, analyze themes...
```
The agent can do these things, but the user doesn't know.
### Solutions
**Onboarding hints:**
```
Agent: "I can help you with your reading in several ways:
- Research any book (web search + save findings)
- Generate personalized introductions
- Publish insights to your reading feed
- Analyze themes across your library
What interests you?"
```
**Contextual suggestions:**
```
User: "I just finished reading 1984"
Agent: "Great choice! Would you like me to:
- Research historical context?
- Compare it to other books in your library?
- Publish an insight about it to your feed?"
```
**Progressive revelation:**
```
// After user uses basic features
Agent: "By the way, you can also ask me to set up
recurring tasks, like 'every Monday, review my
reading progress.' Just let me know!"
```
### Balance
- **Don't overwhelm** with all capabilities upfront
- **Do reveal** capabilities naturally through use
- **Don't assume** users will discover things on their own
- **Do make** capabilities visible when relevant
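One way to make capabilities visible without overwhelming users is to inject a short, situation-dependent hint list into the system prompt. A sketch with hypothetical capability names and triggers:
```typescript
// Give the agent a small set of capabilities to surface when they are relevant.
interface UserContext { justFinishedBook: boolean; hasUsedFeed: boolean }
interface Capability {
  name: string;
  hint: string;
  relevantWhen: (ctx: UserContext) => boolean;
}

const capabilities: Capability[] = [
  { name: "research", hint: "I can research any book and save the findings.", relevantWhen: () => true },
  { name: "publish", hint: "I can publish insights to your reading feed.", relevantWhen: c => !c.hasUsedFeed },
  { name: "compare", hint: "I can compare a finished book to others in your library.", relevantWhen: c => c.justFinishedBook },
];

function capabilityHints(ctx: UserContext): string {
  const relevant = capabilities.filter(c => c.relevantWhen(ctx)).map(c => `- ${c.hint}`);
  return relevant.length
    ? `## Capabilities to mention when relevant\n${relevant.join("\n")}`
    : "";
}
```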
</capability_visibility>
<designing_for_trust>
## Designing for Trust
Agent-native apps require trust. Users are giving an AI significant capability. Build trust through:
### Transparency
- Show what the agent is doing (tool calls, progress)
- Explain reasoning when it matters
- Make all agent work inspectable (files, logs)
### Predictability
- Consistent behavior for similar requests
- Clear patterns for when approval is needed
- No surprises in what the agent can access
### Reversibility
- Easy undo for agent actions
- Checkpoints before significant changes
- Clear rollback paths
### Control
- User can stop agent at any time
- User can adjust agent behavior (prompts, preferences)
- User can restrict capabilities if desired
### Implementation
```swift
struct AgentTransparency {
// Show what's happening
func onToolCall(_ tool: ToolCall) {
showInUI("Using \(tool.name)...")
}
// Explain reasoning
func onDecision(_ decision: AgentDecision) {
if decision.needsExplanation {
showInUI("I chose this because: \(decision.reasoning)")
}
}
// Make work inspectable
func onOutput(_ output: AgentOutput) {
// All output is in files user can see
// Or in visible UI state
}
}
```
</designing_for_trust>
<checklist>
## Product Design Checklist
### Progressive Disclosure
- [ ] Basic requests work immediately (no config)
- [ ] Depth is discoverable through use
- [ ] No artificial ceiling on complexity
- [ ] Capability hints provided
### Latent Demand Discovery
- [ ] Agent requests are logged
- [ ] Success/failure is tracked
- [ ] Patterns are reviewed regularly
- [ ] Common patterns formalized into tools/prompts
### Approval & Agency
- [ ] Stakes assessed for each action type
- [ ] Reversibility assessed for each action type
- [ ] Approval pattern matches stakes/reversibility
- [ ] Self-modification is legible (visible, understandable, reversible)
### Capability Visibility
- [ ] Onboarding reveals key capabilities
- [ ] Contextual suggestions provided
- [ ] Users aren't expected to guess what's possible
### Trust
- [ ] Agent actions are transparent
- [ ] Behavior is predictable
- [ ] Actions are reversible
- [ ] User has control
</checklist>