[2.23.0] Major update to agent-native-architecture skill (#70)
Align skill with canonical Agent-Native Architecture document:

## Core Changes
- Restructure SKILL.md with 5 named principles from canonical:
  - Parity: Agent can do whatever user can do
  - Granularity: Prefer atomic primitives
  - Composability: Features are prompts
  - Emergent Capability: Handle unanticipated requests
  - Improvement Over Time: Context accumulation
- Add "The test" for each principle
- Add "Why Now" section (Claude Code origin story)
- Update terminology from "prompt-native" to "agent-native"
- Add "The Ultimate Test" to success criteria

## New Reference Files
- files-universal-interface.md: Why files, organization patterns, context.md pattern, conflict model
- from-primitives-to-domain-tools.md: When to add domain tools, graduating to code
- agent-execution-patterns.md: Completion signals, partial completion, context limits
- product-implications.md: Progressive disclosure, latent demand discovery, approval matrix

## Updated Reference Files
- mobile-patterns.md: Add iOS storage architecture (iCloud-first), "needs validation" callouts, on-device vs cloud section
- architecture-patterns.md: Update overview to reference 5 principles and cross-link new files

## Anti-Patterns
- Add missing anti-patterns: agent as router, build-then-add-agent, request/response thinking, defensive tool design, happy path in code

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
@@ -1,76 +1,174 @@
---
name: agent-native-architecture
description: This skill should be used when building AI agents using prompt-native architecture where features are defined in prompts, not code. Use it when creating autonomous agents, designing MCP servers, implementing self-modifying systems, or adopting the "trust the agent's intelligence" philosophy.
description: Build applications where agents are first-class citizens. Use this skill when designing autonomous agents, creating MCP tools, implementing self-modifying systems, or building apps where features are outcomes achieved by agents operating in a loop.
---
<essential_principles>
## The Prompt-Native Philosophy

<why_now>
## Why Now

Agent-native engineering inverts traditional software architecture. Instead of writing code that the agent executes, you define outcomes in prompts and let the agent figure out HOW to achieve them.

Software agents work reliably now. Claude Code demonstrated that an LLM with access to bash and file tools, operating in a loop until an objective is achieved, can accomplish complex multi-step tasks autonomously.

### The Foundational Principle

The surprising discovery: **a really good coding agent is actually a really good general-purpose agent.** The same architecture that lets Claude Code refactor a codebase can let an agent organize your files, manage your reading list, or automate your workflows.

**Whatever the user can do, the agent can do. Many things the developer can do, the agent can do.**

The Claude Code SDK makes this accessible. You can build applications where features aren't code you write—they're outcomes you describe, achieved by an agent with tools, operating in a loop until the outcome is reached.

Don't artificially limit the agent. If a user could read files, write code, browse the web, deploy an app—the agent should be able to do those things too. The agent figures out HOW to achieve an outcome; it doesn't just call your pre-written functions.

This opens up a new field: software that works the way Claude Code works, applied to categories far beyond coding.

</why_now>

### Features Are Prompts

<core_principles>
## Core Principles

Each feature is a prompt that defines an outcome and gives the agent the tools it needs. The agent then figures out how to accomplish it.

### 1. Parity

**Traditional:** Feature = function in codebase that agent calls

**Prompt-native:** Feature = prompt defining desired outcome + primitive tools

**Whatever the user can do through the UI, the agent should be able to achieve through tools.**

The agent doesn't execute your code. It uses primitives to achieve outcomes you describe.

This is the foundational principle. Without it, nothing else matters.

### Tools Provide Capability, Not Behavior

Imagine you build a notes app with a beautiful interface for creating, organizing, and tagging notes. A user asks the agent: "Create a note summarizing my meeting and tag it as urgent."

Tools should be primitives that enable capability. The prompt defines what to do with that capability.

If you built UI for creating notes but no agent capability to do the same, the agent is stuck. It might apologize or ask clarifying questions, but it can't help—even though the action is trivial for a human using the interface.

**Wrong:** `generate_dashboard(data, layout, filters)` — agent executes your workflow

**Right:** `read_file`, `write_file`, `list_files` — agent figures out how to build a dashboard

**The fix:** Ensure the agent has tools (or combinations of tools) that can accomplish anything the UI can do.

Pure primitives are better, but domain primitives (like `store_feedback`) are OK if they don't encode logic—just storage/retrieval.

This isn't about creating a 1:1 mapping of UI buttons to tools. It's about ensuring the agent can **achieve the same outcomes**. Sometimes that's a single tool (`create_note`). Sometimes it's composing primitives (`write_file` to a notes directory with proper formatting).

### The Development Lifecycle

**The discipline:** When adding any UI capability, ask: can the agent achieve this outcome? If not, add the necessary tools or primitives.

1. **Start in the prompt** - New features begin as natural language defining outcomes
2. **Iterate rapidly** - Change behavior by editing prose, not refactoring code
3. **Graduate when stable** - Harden to code when requirements stabilize AND speed/reliability matter
4. **Many features stay as prompts** - Not everything needs to become code

A capability map helps:

### Self-Modification (Advanced)

| User Action | How Agent Achieves It |
|-------------|----------------------|
| Create a note | `write_file` to notes directory, or `create_note` tool |
| Tag a note as urgent | `update_file` metadata, or `tag_note` tool |
| Search notes | `search_files` or `search_notes` tool |
| Delete a note | `delete_file` or `delete_note` tool |

The advanced tier: agents that can evolve their own code, prompts, and behavior. Not required for every app, but a big part of the future.

**The test:** Pick any action a user can take in your UI. Describe it to the agent. Can it accomplish the outcome?
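
This test can be automated against the capability map above. A minimal sketch, assuming a hypothetical tool registry; all names here are illustrative, not an SDK API:

```typescript
// Hypothetical capability map: each UI action lists the agent tools that can
// achieve the same outcome (primitives or domain tools).
type CapabilityMap = Record<string, string[]>;

const uiActions: CapabilityMap = {
  "create a note": ["write_file", "create_note"],
  "tag a note as urgent": ["update_file", "tag_note"],
  "search notes": ["search_files", "search_notes"],
  "delete a note": ["delete_file", "delete_note"],
};

// Tools actually registered with the agent.
const registered = new Set(["write_file", "update_file", "search_files", "delete_file"]);

// A parity gap is a UI action with no agent tool that can achieve it.
function findParityGaps(map: CapabilityMap, tools: Set<string>): string[] {
  return Object.entries(map)
    .filter(([, candidates]) => !candidates.some((t) => tools.has(t)))
    .map(([action]) => action);
}

console.log(findParityGaps(uiActions, registered)); // [] (no gaps)
```

Running a check like this in CI keeps parity from silently eroding as the UI grows.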

When implementing:
- Approval gates for code changes
- Auto-commit before modifications (rollback capability)
- Health checks after changes
- Build verification before restart

---

### When NOT to Use This Approach

### 2. Granularity

- **High-frequency operations** - thousands of calls per second
- **Deterministic requirements** - exact same output every time
- **Cost-sensitive scenarios** - when API costs would be prohibitive
- **High security** - though this is overblown for most apps

</essential_principles>

**Prefer atomic primitives. Features are outcomes achieved by an agent operating in a loop.**

A tool is a primitive capability: read a file, write a file, run a bash command, store a record, send a notification.

A **feature** is not a function you write. It's an outcome you describe in a prompt, achieved by an agent that has tools and operates in a loop until the outcome is reached.

**Less granular (limits the agent):**
```
Tool: classify_and_organize_files(files)
→ You wrote the decision logic
→ Agent executes your code
→ To change behavior, you refactor
```

**More granular (empowers the agent):**
```
Tools: read_file, write_file, move_file, list_directory, bash
Prompt: "Organize the user's downloads folder. Analyze each file,
        determine appropriate locations based on content and recency,
        and move them there."
Agent: Operates in a loop—reads files, makes judgments, moves things,
       checks results—until the folder is organized.
→ Agent makes the decisions
→ To change behavior, you edit the prompt
```

**The key shift:** The agent is pursuing an outcome with judgment, not executing a choreographed sequence. It might encounter unexpected file types, adjust its approach, or ask clarifying questions. The loop continues until the outcome is achieved.

The more atomic your tools, the more flexibly the agent can use them. If you bundle decision logic into tools, you've moved judgment back into code.

**The test:** To change how a feature behaves, do you edit prose or refactor code?
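
The contrast above can be made concrete. A minimal sketch of atomic tools, assuming an in-memory map in place of a real filesystem; the one-line destination choice stands in for the agent's judgment and is not logic you would ship:

```typescript
// Sketch: atomic file tools over an in-memory "filesystem". The tools only
// provide capability; deciding where each file belongs is left to the agent.
const files = new Map<string, string>([
  ["downloads/q3-report.txt", "Q3 revenue numbers"],
  ["downloads/trip-notes.txt", "Packing list for the trip"],
]);

const tools = {
  list_files: (dir: string): string[] =>
    [...files.keys()].filter((p) => p.startsWith(dir + "/")),
  read_file: (path: string): string => files.get(path) ?? "",
  move_file: (from: string, to: string): void => {
    const content = files.get(from);
    if (content !== undefined) {
      files.delete(from);
      files.set(to, content);
    }
  },
};

// The agent would loop: list, read, judge, move. The ternary below stands in
// for the agent's judgment about what counts as work vs personal.
for (const path of tools.list_files("downloads")) {
  const dest = tools.read_file(path).includes("Q3") ? "work" : "personal";
  tools.move_file(path, `${dest}/${path.split("/").pop()}`);
}
```

Note how none of the tools know anything about "organizing downloads"; that behavior lives entirely outside them.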

---

### 3. Composability

**With atomic tools and parity, you can create new features just by writing new prompts.**

This is the payoff of the first two principles. When your tools are atomic and the agent can do anything users can do, new features are just new prompts.

Want a "weekly review" feature that summarizes activity and suggests priorities? That's a prompt:

```
"Review files modified this week. Summarize key changes. Based on
incomplete items and approaching deadlines, suggest three priorities
for next week."
```

The agent uses `list_files`, `read_file`, and its judgment to accomplish this. You didn't write weekly-review code. You described an outcome, and the agent operates in a loop until it's achieved.

**This works for developers and users.** You can ship new features by adding prompts. Users can customize behavior by modifying prompts or creating their own. "When I say 'file this,' always move it to my Action folder and tag it urgent" becomes a user-level prompt that extends the application.

**The constraint:** This only works if tools are atomic enough to be composed in ways you didn't anticipate, and if the agent has parity with users. If tools encode too much logic, or the agent can't access key capabilities, composition breaks down.

**The test:** Can you add a new feature by writing a new prompt section, without adding new code?
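
This test can be passed by design if features live in a prompt registry. A sketch under stated assumptions: the registry shape and names are illustrative, not an SDK API:

```typescript
// Sketch: features as a prompt registry. Shipping a feature or accepting a
// user customization is a data change, not a code change.
const features = new Map<string, string>([
  [
    "weekly_review",
    "Review files modified this week. Summarize key changes. Based on " +
      "incomplete items and approaching deadlines, suggest three priorities " +
      "for next week.",
  ],
]);

// A user-level prompt extends the app the same way a developer feature does.
features.set(
  "file_this",
  "When I say 'file this', always move the item to my Action folder and tag it urgent."
);

// The runtime hands the selected prompt to the agent loop as its objective.
function buildObjective(feature: string): string {
  const prompt = features.get(feature);
  if (prompt === undefined) throw new Error(`unknown feature: ${feature}`);
  return prompt;
}
```

The registry could just as well be a directory of prompt files the agent itself can read and edit.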

---

### 4. Emergent Capability

**The agent can accomplish things you didn't explicitly design for.**

When tools are atomic, parity is maintained, and prompts are composable, users will ask the agent for things you never anticipated. And often, the agent can figure it out.

*"Cross-reference my meeting notes with my task list and tell me what I've committed to but haven't scheduled."*

You didn't build a "commitment tracker" feature. But if the agent can read notes, read tasks, and reason about them—operating in a loop until it has an answer—it can accomplish this.

**This reveals latent demand.** Instead of guessing what features users want, you observe what they're asking the agent to do. When patterns emerge, you can optimize them with domain-specific tools or dedicated prompts. But you didn't have to anticipate them—you discovered them.

**The flywheel:**
1. Build with atomic tools and parity
2. Users ask for things you didn't anticipate
3. Agent composes tools to accomplish them (or fails, revealing a gap)
4. You observe patterns in what's being requested
5. Add domain tools or prompts to make common patterns efficient
6. Repeat

This changes how you build products. You're not trying to imagine every feature upfront. You're creating a capable foundation and learning from what emerges.

**The test:** Give the agent an open-ended request relevant to your domain. Can it figure out a reasonable approach, operating in a loop until it succeeds? If it just says "I don't have a feature for that," your architecture is too constrained.

---

### 5. Improvement Over Time

**Agent-native applications get better through accumulated context and prompt refinement.**

Unlike traditional software, agent-native applications can improve without shipping code:

**Accumulated context:** The agent can maintain state across sessions—what exists, what the user has done, what worked, what didn't. A `context.md` file the agent reads and updates is layer one. More sophisticated approaches involve structured memory and learned preferences.

**Prompt refinement at multiple levels:**
- **Developer level:** You ship updated prompts that change agent behavior for all users
- **User level:** Users customize prompts for their workflow
- **Agent level:** The agent modifies its own prompts based on feedback (advanced)

**Self-modification (advanced):** Agents that can edit their own prompts or even their own code. For production use cases, consider adding safety rails—approval gates, automatic checkpoints for rollback, health checks. This is where things are heading.

The improvement mechanisms are still being discovered. Context and prompt refinement are proven. Self-modification is emerging. What's clear: the architecture supports getting better in ways traditional software doesn't.

**The test:** Does the application work better after a month of use than on day one, even without code changes?
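
One concrete form of accumulated context is the `context.md` pattern described above. A minimal sketch using Node's `fs`; the file path and helper names are illustrative:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Sketch of the context.md pattern: the agent reads accumulated knowledge
// before a session and appends what it learned afterward.
const contextPath = path.join(os.tmpdir(), "agent-context-demo.md");

function readContext(): string {
  return fs.existsSync(contextPath) ? fs.readFileSync(contextPath, "utf8") : "";
}

function recordLearning(note: string): void {
  const dated = `- ${new Date().toISOString().slice(0, 10)}: ${note}\n`;
  fs.appendFileSync(contextPath, dated);
}

recordLearning("User prefers summaries under 100 words.");
recordLearning("Weekly review runs on Fridays.");
```

Because the file is plain markdown in the shared workspace, both the user and the agent can read and correct it.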
</core_principles>

<intake>
What aspect of agent native architecture do you need help with?
## What aspect of agent-native architecture do you need help with?

1. **Design architecture** - Plan a new prompt-native agent system
2. **Create MCP tools** - Build primitive tools following the philosophy
3. **Write system prompts** - Define agent behavior in prompts
4. **Self-modification** - Enable agents to safely evolve themselves
5. **Review/refactor** - Make existing code more prompt-native
6. **Context injection** - Inject runtime app state into agent prompts
7. **Action parity** - Ensure agents can do everything users can do
8. **Shared workspace** - Set up agents and users in the same data space
9. **Testing** - Test agent-native apps for capability and parity
10. **Mobile patterns** - Handle background execution, permissions, cost
11. **API integration** - Connect to external APIs (HealthKit, HomeKit, GraphQL)

1. **Design architecture** - Plan a new agent-native system from scratch
2. **Files & workspace** - Use files as the universal interface, shared workspace patterns
3. **Tool design** - Build primitive tools, dynamic capability discovery, CRUD completeness
4. **Domain tools** - Know when to add domain tools vs stay with primitives
5. **Execution patterns** - Completion signals, partial completion, context limits
6. **System prompts** - Define agent behavior in prompts, judgment criteria
7. **Context injection** - Inject runtime app state into agent prompts
8. **Action parity** - Ensure agents can do everything users can do
9. **Self-modification** - Enable agents to safely evolve themselves
10. **Product design** - Progressive disclosure, latent demand, approval patterns
11. **Mobile patterns** - iOS storage, background execution, checkpoint/resume
12. **Testing** - Test agent-native apps for capability and parity
13. **Refactoring** - Make existing code more agent-native

**Wait for response before proceeding.**
</intake>

@@ -79,63 +177,77 @@ What aspect of agent native architecture do you need help with?

| Response | Action |
|----------|--------|
| 1, "design", "architecture", "plan" | Read [architecture-patterns.md](./references/architecture-patterns.md), then apply Architecture Checklist below |
| 2, "tool", "mcp", "primitive" | Read [mcp-tool-design.md](./references/mcp-tool-design.md) |
| 3, "prompt", "system prompt", "behavior" | Read [system-prompt-design.md](./references/system-prompt-design.md) |
| 4, "self-modify", "evolve", "git" | Read [self-modification.md](./references/self-modification.md) |
| 5, "review", "refactor", "existing" | Read [refactoring-to-prompt-native.md](./references/refactoring-to-prompt-native.md) |
| 6, "context", "inject", "runtime", "dynamic" | Read [dynamic-context-injection.md](./references/dynamic-context-injection.md) |
| 7, "parity", "ui action", "capability map" | Read [action-parity-discipline.md](./references/action-parity-discipline.md) |
| 8, "workspace", "shared", "files", "filesystem" | Read [shared-workspace-architecture.md](./references/shared-workspace-architecture.md) |
| 9, "test", "testing", "verify", "validate" | Read [agent-native-testing.md](./references/agent-native-testing.md) |
| 10, "mobile", "ios", "android", "background" | Read [mobile-patterns.md](./references/mobile-patterns.md) |
| 11, "api", "healthkit", "homekit", "graphql", "external" | Read [mcp-tool-design.md](./references/mcp-tool-design.md) (Dynamic Capability Discovery section) |
| 2, "files", "workspace", "filesystem" | Read [files-universal-interface.md](./references/files-universal-interface.md) and [shared-workspace-architecture.md](./references/shared-workspace-architecture.md) |
| 3, "tool", "mcp", "primitive", "crud" | Read [mcp-tool-design.md](./references/mcp-tool-design.md) |
| 4, "domain tool", "when to add" | Read [from-primitives-to-domain-tools.md](./references/from-primitives-to-domain-tools.md) |
| 5, "execution", "completion", "loop" | Read [agent-execution-patterns.md](./references/agent-execution-patterns.md) |
| 6, "prompt", "system prompt", "behavior" | Read [system-prompt-design.md](./references/system-prompt-design.md) |
| 7, "context", "inject", "runtime", "dynamic" | Read [dynamic-context-injection.md](./references/dynamic-context-injection.md) |
| 8, "parity", "ui action", "capability map" | Read [action-parity-discipline.md](./references/action-parity-discipline.md) |
| 9, "self-modify", "evolve", "git" | Read [self-modification.md](./references/self-modification.md) |
| 10, "product", "progressive", "approval", "latent demand" | Read [product-implications.md](./references/product-implications.md) |
| 11, "mobile", "ios", "android", "background", "checkpoint" | Read [mobile-patterns.md](./references/mobile-patterns.md) |
| 12, "test", "testing", "verify", "validate" | Read [agent-native-testing.md](./references/agent-native-testing.md) |
| 13, "review", "refactor", "existing" | Read [refactoring-to-prompt-native.md](./references/refactoring-to-prompt-native.md) |

**After reading the reference, apply those patterns to the user's specific context.**
</routing>

<architecture_checklist>
## Architecture Review Checklist (Apply During Design)
## Architecture Review Checklist

When designing an agent-native system, verify these **before implementation**:

### Core Principles
- [ ] **Parity:** Every UI action has a corresponding agent capability
- [ ] **Granularity:** Tools are primitives; features are prompt-defined outcomes
- [ ] **Composability:** New features can be added via prompts alone
- [ ] **Emergent Capability:** Agent can handle open-ended requests in your domain

### Tool Design
- [ ] **Dynamic vs Static:** For external APIs where agent should have full user-level access (HealthKit, HomeKit, GraphQL), use Dynamic Capability Discovery. Only use static mapping if intentionally limiting agent scope.
- [ ] **CRUD Completeness:** Every entity has create, read, update, AND delete tools
- [ ] **Primitives not Workflows:** Tools enable capability, they don't encode business logic
- [ ] **Dynamic vs Static:** For external APIs where agent should have full access, use Dynamic Capability Discovery
- [ ] **CRUD Completeness:** Every entity has create, read, update, AND delete
- [ ] **Primitives not Workflows:** Tools enable capability, don't encode business logic
- [ ] **API as Validator:** Use `z.string()` inputs when the API validates, not `z.enum()`

### Action Parity
- [ ] **Capability Map:** Every UI action has a corresponding agent tool
- [ ] **Edit/Delete:** If UI can edit or delete, agent must be able to too
- [ ] **The Write Test:** "Write something to [app location]" must work for all locations

### Files & Workspace
- [ ] **Shared Workspace:** Agent and user work in same data space
- [ ] **context.md Pattern:** Agent reads/updates context file for accumulated knowledge
- [ ] **File Organization:** Entity-scoped directories with consistent naming

### UI Integration
- [ ] **Agent → UI:** Define how agent changes reflect in UI (shared service, file watching, or event bus)
- [ ] **No Silent Actions:** Agent writes should trigger UI updates immediately
- [ ] **Capability Discovery:** Users can learn what agent can do (onboarding, hints)

### Agent Execution
- [ ] **Completion Signals:** Agent has explicit `complete_task` tool (not heuristic detection)
- [ ] **Partial Completion:** Multi-step tasks track progress for resume
- [ ] **Context Limits:** Designed for bounded context from the start
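
The Agent Execution items above can be sketched as an explicit loop; all names are illustrative, not an SDK API:

```typescript
// Sketch: the loop ends only when the agent explicitly calls complete_task,
// never on heuristics like "no tool calls for two iterations".
type ToolCall = { name: string; args: Record<string, unknown> };
type Agent = (iteration: number) => ToolCall;

function runLoop(agent: Agent, maxIterations = 50) {
  for (let i = 0; i < maxIterations; i++) {
    const call = agent(i);
    if (call.name === "complete_task") {
      return { done: true, summary: String(call.args.summary ?? "") };
    }
    // ...dispatch other tool calls here...
  }
  // Hitting the cap is partial completion, reported as such rather than
  // silently treated as success.
  return { done: false, summary: "iteration limit reached" };
}

// Stub agent: makes two tool calls, then signals completion.
const result = runLoop((i) =>
  i < 2
    ? { name: "write_file", args: {} }
    : { name: "complete_task", args: { summary: "organized 2 files" } }
);
```

The partial-completion branch is where a real implementation would checkpoint progress so a later session can resume.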

### Context Injection
- [ ] **Available Resources:** System prompt includes what exists (files, data, types)
- [ ] **Available Capabilities:** System prompt documents what agent can do with user vocabulary
- [ ] **Available Capabilities:** System prompt documents tools with user vocabulary
- [ ] **Dynamic Context:** Context refreshes for long sessions (or provide `refresh_context` tool)

### UI Integration
- [ ] **Agent → UI:** Agent changes reflect in UI (shared service, file watching, or event bus)
- [ ] **No Silent Actions:** Agent writes trigger UI updates immediately
- [ ] **Capability Discovery:** Users can learn what agent can do

### Mobile (if applicable)
- [ ] **Background Execution:** Checkpoint/resume pattern for iOS app suspension
- [ ] **Permissions:** Just-in-time permission requests in tools
- [ ] **Checkpoint/Resume:** Handle iOS app suspension gracefully
- [ ] **iCloud Storage:** iCloud-first with local fallback for multi-device sync
- [ ] **Cost Awareness:** Model tier selection (Haiku/Sonnet/Opus)

**When designing architecture, explicitly address each checkbox in your plan.**
</architecture_checklist>

<quick_start>
Build a prompt-native agent in three steps:
## Quick Start: Build an Agent-Native Feature

**Step 1: Define primitive tools**
**Step 1: Define atomic tools**
```typescript
const tools = [
  tool("read_file", "Read any file", { path: z.string() }, ...),
  tool("write_file", "Write any file", { path: z.string(), content: z.string() }, ...),
  tool("list_files", "List directory", { path: z.string() }, ...),
  tool("complete_task", "Signal task completion", { summary: z.string() }, ...),
];
```
@@ -145,201 +257,179 @@ const tools = [

@@ -145,201 +257,179 @@ const tools = [

When asked to organize content, you should:
1. Read existing files to understand the structure
2. Analyze what organization makes sense
3. Create appropriate pages using write_file
3. Create/move files using your tools
4. Use your judgment about layout and formatting
5. Call complete_task when you're done

You decide the structure. Make it good.
```

**Step 3: Let the agent work**
**Step 3: Let the agent work in a loop**
```typescript
query({
const result = await agent.run({
  prompt: userMessage,
  options: {
    systemPrompt,
    mcpServers: { files: fileServer },
    permissionMode: "acceptEdits",
  }
  tools: tools,
  systemPrompt: systemPrompt,
  // Agent loops until it calls complete_task
});
```
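
Whatever SDK drives the loop, the glue between model-chosen tool calls and your primitives can be a plain dispatch table. A hedged sketch with illustrative names:

```typescript
// Sketch: tool dispatch. The model chooses the tool name and arguments;
// this table is the only glue code between the loop and the primitives.
type Handler = (args: Record<string, string>) => string;

const store = new Map<string, string>();

const handlers: Record<string, Handler> = {
  write_file: ({ path, content }) => {
    store.set(path, content);
    return "ok";
  },
  read_file: ({ path }) => store.get(path) ?? "",
  list_files: () => [...store.keys()].join("\n"),
  complete_task: ({ summary }) => summary,
};

function dispatch(name: string, args: Record<string, string>): string {
  const handler = handlers[name];
  if (!handler) return `unknown tool: ${name}`;
  return handler(args);
}

dispatch("write_file", { path: "notes/todo.md", content: "- ship the skill" });
```

Returning an "unknown tool" message (rather than throwing) lets the agent recover and try a different approach.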
</quick_start>

<reference_index>
## Domain Knowledge
## Reference Files

All references in `references/`:

**Core Patterns:**
- **Architecture:** [architecture-patterns.md](./references/architecture-patterns.md)
- **Tool Design:** [mcp-tool-design.md](./references/mcp-tool-design.md) - includes Dynamic Capability Discovery, CRUD Completeness
- **Prompts:** [system-prompt-design.md](./references/system-prompt-design.md)
- **Self-Modification:** [self-modification.md](./references/self-modification.md)
- **Refactoring:** [refactoring-to-prompt-native.md](./references/refactoring-to-prompt-native.md)
- [architecture-patterns.md](./references/architecture-patterns.md) - Event-driven, unified orchestrator, agent-to-UI
- [files-universal-interface.md](./references/files-universal-interface.md) - Why files, organization patterns, context.md
- [mcp-tool-design.md](./references/mcp-tool-design.md) - Tool design, dynamic capability discovery, CRUD
- [from-primitives-to-domain-tools.md](./references/from-primitives-to-domain-tools.md) - When to add domain tools, graduating to code
- [agent-execution-patterns.md](./references/agent-execution-patterns.md) - Completion signals, partial completion, context limits
- [system-prompt-design.md](./references/system-prompt-design.md) - Features as prompts, judgment criteria

**Agent-Native Disciplines:**
- **Context Injection:** [dynamic-context-injection.md](./references/dynamic-context-injection.md)
- **Action Parity:** [action-parity-discipline.md](./references/action-parity-discipline.md)
- **Shared Workspace:** [shared-workspace-architecture.md](./references/shared-workspace-architecture.md)
- **Testing:** [agent-native-testing.md](./references/agent-native-testing.md)
- **Mobile Patterns:** [mobile-patterns.md](./references/mobile-patterns.md)
- [dynamic-context-injection.md](./references/dynamic-context-injection.md) - Runtime context, what to inject
- [action-parity-discipline.md](./references/action-parity-discipline.md) - Capability mapping, parity workflow
- [shared-workspace-architecture.md](./references/shared-workspace-architecture.md) - Shared data space, UI integration
- [product-implications.md](./references/product-implications.md) - Progressive disclosure, latent demand, approval
- [agent-native-testing.md](./references/agent-native-testing.md) - Testing outcomes, parity tests

**Platform-Specific:**
- [mobile-patterns.md](./references/mobile-patterns.md) - iOS storage, checkpoint/resume, cost awareness
- [self-modification.md](./references/self-modification.md) - Git-based evolution, guardrails
- [refactoring-to-prompt-native.md](./references/refactoring-to-prompt-native.md) - Migrating existing code
</reference_index>

<anti_patterns>
## What NOT to Do
## Anti-Patterns

### Common Approaches That Aren't Fully Agent-Native

These aren't necessarily wrong—they may be appropriate for your use case. But they're worth recognizing as different from the architecture this document describes.

**Agent as router** — The agent figures out what the user wants, then calls the right function. The agent's intelligence is used to route, not to act. This can work, but you're using a fraction of what agents can do.

**Build the app, then add agent** — You build features the traditional way (as code), then expose them to an agent. The agent can only do what your features already do. You won't get emergent capability.

**Request/response thinking** — Agent gets input, does one thing, returns output. This misses the loop: agent gets an outcome to achieve, operates until it's done, handles unexpected situations along the way.

**Defensive tool design** — You over-constrain tool inputs because you're used to defensive programming. Strict enums, validation at every layer. This is safe, but it prevents the agent from doing things you didn't anticipate.

**Happy path in code, agent just executes** — Traditional software handles edge cases in code—you write the logic for what happens when X goes wrong. Agent-native lets the agent handle edge cases with judgment. If your code handles all the edge cases, the agent is just a caller.

---

### Specific Anti-Patterns

**THE CARDINAL SIN: Agent executes your code instead of figuring things out**

This is the most common mistake. You fall back into writing workflow code and having the agent call it, instead of defining outcomes and letting the agent figure out HOW.

```typescript
// WRONG - You wrote the workflow, agent just executes it
tool("process_feedback", async ({ message }) => {
  const category = categorize(message); // Your code
  const priority = calculatePriority(message); // Your code
  await store(message, category, priority); // Your code
  if (priority > 3) await notify(); // Your code
  const category = categorize(message); // Your code decides
  const priority = calculatePriority(message); // Your code decides
  await store(message, category, priority); // Your code orchestrates
  if (priority > 3) await notify(); // Your code decides
});

// RIGHT - Agent figures out how to process feedback
tool("store_item", { key, value }, ...); // Primitive
tool("send_message", { channel, content }, ...); // Primitive
// Prompt says: "Rate importance 1-5 based on actionability, store feedback, notify if >= 4"
tools: store_item, send_message // Primitives
prompt: "Rate importance 1-5 based on actionability, store feedback, notify if >= 4"
```

**Don't artificially limit what the agent can do**
**Workflow-shaped tools** — `analyze_and_organize` bundles judgment into the tool. Break it into primitives and let the agent compose them.

If a user could do it, the agent should be able to do it.

```typescript
// WRONG - limiting agent capabilities
tool("read_approved_files", { path }, async ({ path }) => {
  if (!ALLOWED_PATHS.includes(path)) throw new Error("Not allowed");
  return readFile(path);
});

// RIGHT - give full capability, use guardrails appropriately
tool("read_file", { path }, ...); // Agent can read anything
// Use approval gates for writes, not artificial limits on reads
```

**Don't encode decisions in tools**
```typescript
// Wrong - tool decides format
tool("format_report", { format: z.enum(["markdown", "html", "pdf"]) }, ...)

// Right - agent decides format via prompt
tool("write_file", ...) // Agent chooses what to write
```

**Don't over-specify in prompts**
```markdown
// Wrong - micromanaging the HOW
When creating a summary, use exactly 3 bullet points,
each under 20 words, formatted with em-dashes...

// Right - define outcome, trust intelligence
Create clear, useful summaries. Use your judgment.
```
|
||||
|
||||
### Agent-Native Anti-Patterns

**Context starvation** — Agent doesn't know what resources exist in the app.

```
User: "Write something about Catherine the Great in my feed"
Agent: "What feed? I don't understand what system you're referring to."
```

Fix: Inject available resources, capabilities, and vocabulary into the system prompt at runtime.

**Orphan UI actions** — User can do something through the UI that the agent can't achieve.

```swift
// UI has a "Publish to Feed" button
Button("Publish") { publishToFeed(insight) }
// But no agent tool exists to do the same thing
```

Fix: Maintain parity. Add a corresponding tool, documented in the system prompt, for every UI action.

**Silent actions** — Agent changes state but UI doesn't update.

```typescript
// Agent writes to database
await db.insert("feed", content);
// But UI doesn't observe this table - user sees nothing
```

Fix: Use shared data stores with reactive binding, or file system observation.

**Heuristic completion detection** — Detecting agent completion through heuristics (consecutive iterations without tool calls, checking for expected output files). This is fragile. Fix: Require agents to explicitly signal completion through a `complete_task` tool.

**Static tool mapping for dynamic APIs** — Building 50 tools for 50 API endpoints when a `discover` + `access` pattern would give more flexibility.

```typescript
// WRONG - Every API type needs a hardcoded tool
tool("read_steps", ...)
tool("read_heart_rate", ...)
tool("read_sleep", ...)
// When glucose tracking is added... code change required

// RIGHT - Dynamic capability discovery
tool("list_available_types", ...) // Discover what's available
tool("read_health_data", { dataType: z.string() }, ...) // Access any type
```

See [mcp-tool-design.md](./references/mcp-tool-design.md). (Note: Static mapping is fine for constrained agents with intentionally limited scope.)

**Incomplete CRUD** — Agent can create but not update or delete.

```typescript
// User: "Delete that journal entry"
// Agent: "I don't have a tool for that"
tool("create_journal_entry", ...) // Missing: update, delete
```

Fix: Every entity needs full CRUD. The CRUD audit: for each entity, verify all four operations exist.
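To make the audit concrete, here is a minimal in-memory sketch of what a complete tool surface for the journal example might look like; the helper names and `Map`-backed store are illustrative, not a prescribed API:

```typescript
// All four operations for one entity, backed by an in-memory store.
type Entry = { id: string; text: string };
const entries = new Map<string, Entry>();

const journalTools = {
  create_journal_entry(text: string): Entry {
    const entry = { id: `entry-${entries.size + 1}`, text };
    entries.set(entry.id, entry);
    return entry;
  },
  read_journal_entry(id: string): Entry | undefined {
    return entries.get(id);
  },
  update_journal_entry(id: string, text: string): boolean {
    const entry = entries.get(id);
    if (!entry) return false; // unknown id: report failure, let the agent recover
    entry.text = text;
    return true;
  },
  delete_journal_entry(id: string): boolean {
    return entries.delete(id);
  },
};
```

The audit then reduces to checking that each entity's tool surface exposes all four verbs.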

**Sandbox isolation** — Agent works in separate data space from user.

```
Documents/
├── user_files/    ← User's space
└── agent_output/  ← Agent's space (isolated)
```

Fix: Use a shared workspace where agent and user operate on the same files.

**Gates without reason** — A domain tool is the only way to do something, and you didn't intend to restrict access. The default is open. Keep primitives available unless there's a specific reason to gate.
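One way to keep the default open is to register domain tools alongside the primitives they compose rather than in place of them. A minimal sketch (the registry and tool names are illustrative):

```typescript
// Primitives and domain shortcuts live in the same registry; nothing is gated.
type Tool = (args: Record<string, string>) => string;
const registry = new Map<string, Tool>();

// Primitives stay available to the agent...
registry.set("read_file", ({ path }) => `<contents of ${path}>`);
registry.set("write_file", ({ path }) => `wrote ${path}`);

// ...and the domain tool is a convenience that composes them.
registry.set("publish_insight", ({ title }) => {
  const draft = registry.get("read_file")!({ path: `drafts/${title}.md` });
  registry.get("write_file")!({ path: `feed/${title}.md`, body: draft });
  return `published ${title}`;
});
```

The agent can take the shortcut when it fits, or fall back to the primitives when the request doesn't match the shortcut's shape.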

**Capability hiding** — Users can't discover what the agent can do.

```
User: "Help me with my reading"
Agent: "What would you like help with?"
// Agent doesn't mention it can publish to feed, research books, etc.
```

Fix: Include capability hints in agent responses or provide onboarding.

**Artificial capability limits** — Restricting what the agent can do out of vague safety concerns rather than specific risks. Be thoughtful about restricting capabilities. The agent should generally be able to do what users can do.

</anti_patterns>

<success_criteria>
## Success Criteria

You've built an agent-native application when:

### Architecture
- [ ] The agent can achieve anything users can achieve through the UI (parity)
- [ ] Tools are atomic primitives; domain tools are shortcuts, not gates (granularity)
- [ ] New features can be added by writing new prompts (composability)
- [ ] The agent can accomplish tasks you didn't explicitly design for (emergent capability)
- [ ] Changing behavior means editing prompts, not refactoring code

### Implementation
- [ ] System prompt includes dynamic context about app state
- [ ] Every UI action has a corresponding agent tool (action parity)
- [ ] Agent tools are documented in system prompt with user vocabulary
- [ ] Agent and user work in the same data space (shared workspace)
- [ ] The "write something to [app location]" test passes for all locations
- [ ] Users can discover what the agent can do (capability hints, onboarding)
- [ ] Context refreshes for long sessions (or `refresh_context` tool exists)
- [ ] Agent actions are immediately reflected in the UI
- [ ] Every entity has full CRUD (Create, Read, Update, Delete)
- [ ] Agents explicitly signal completion (no heuristic detection)
- [ ] context.md or equivalent for accumulated knowledge

### Product
- [ ] Simple requests work immediately with no learning curve
- [ ] Power users can push the system in unexpected directions
- [ ] You're learning what users want by observing what they ask the agent to do
- [ ] Approval requirements match stakes and reversibility

### Mobile (if applicable)
- [ ] Checkpoint/resume handles app interruption
- [ ] iCloud-first storage with local fallback
- [ ] Background execution uses available time wisely
- [ ] Model tier matched to task complexity

---

### The Ultimate Test

**Describe an outcome to the agent that's within your application's domain but that you didn't build a specific feature for.**

Can it figure out how to accomplish it, operating in a loop until it succeeds?

If yes, you've built something agent-native.

If it says "I don't have a feature for that"—your architecture is still too constrained.
</success_criteria>

@@ -0,0 +1,467 @@
<overview>
Agent execution patterns for building robust agent loops. This covers how agents signal completion, track partial progress for resume, select appropriate model tiers, and handle context limits.
</overview>

<completion_signals>
## Completion Signals

Agents need an explicit way to say "I'm done."

### Anti-Pattern: Heuristic Detection

Detecting completion through heuristics is fragile:

- Consecutive iterations without tool calls
- Checking for expected output files
- Tracking "no progress" states
- Time-based timeouts

These break in edge cases and create unpredictable behavior.

### Pattern: Explicit Completion Tool

Provide a `complete_task` tool that:
- Takes a summary of what was accomplished
- Returns a signal that stops the loop
- Works identically across all agent types

```typescript
tool("complete_task", {
  summary: z.string().describe("Summary of what was accomplished"),
  status: z.enum(["success", "partial", "blocked"]).optional(),
}, async ({ summary, status = "success" }) => {
  return {
    text: summary,
    shouldContinue: false, // Key: signals loop should stop
  };
});
```

### The ToolResult Pattern

Structure tool results to separate success from continuation:

```swift
struct ToolResult {
    let success: Bool        // Did tool succeed?
    let output: String       // What happened?
    let shouldContinue: Bool // Should agent loop continue?
}

// Three common cases:
extension ToolResult {
    static func success(_ output: String) -> ToolResult {
        // Tool succeeded, keep going
        ToolResult(success: true, output: output, shouldContinue: true)
    }

    static func error(_ message: String) -> ToolResult {
        // Tool failed but recoverable, agent can try something else
        ToolResult(success: false, output: message, shouldContinue: true)
    }

    static func complete(_ summary: String) -> ToolResult {
        // Task done, stop the loop
        ToolResult(success: true, output: summary, shouldContinue: false)
    }
}
```

### Key Insight

**This is different from success/failure:**

- A tool can **succeed** AND signal **stop** (task complete)
- A tool can **fail** AND signal **continue** (recoverable error, try something else)

```typescript
// Examples:
read_file("/missing.txt")
// → { success: false, output: "File not found", shouldContinue: true }
// Agent can try a different file or ask for clarification

complete_task("Organized all downloads into folders")
// → { success: true, output: "...", shouldContinue: false }
// Agent is done

write_file("/output.md", content)
// → { success: true, output: "Wrote file", shouldContinue: true }
// Agent keeps working toward the goal
```

### System Prompt Guidance

Tell the agent when to complete:

```markdown
## Completing Tasks

When you've accomplished the user's request:
1. Verify your work (read back files you created, check results)
2. Call `complete_task` with a summary of what you did
3. Don't keep working after the goal is achieved

If you're blocked and can't proceed:
- Call `complete_task` with status "blocked" and explain why
- Don't loop forever trying the same thing
```
</completion_signals>

<partial_completion>
## Partial Completion

For multi-step tasks, track progress at the task level for resume capability.

### Task State Tracking

```swift
enum TaskStatus {
    case pending     // Not yet started
    case inProgress  // Currently working on
    case completed   // Finished successfully
    case failed      // Couldn't complete (with reason)
    case skipped     // Intentionally not done
}

struct AgentTask {
    let id: String
    let description: String
    var status: TaskStatus
    var notes: String? // Why it failed, what was done
}

struct AgentSession {
    var tasks: [AgentTask]

    var isComplete: Bool {
        tasks.allSatisfy { $0.status == .completed || $0.status == .skipped }
    }

    var progress: (completed: Int, total: Int) {
        let done = tasks.filter { $0.status == .completed }.count
        return (done, tasks.count)
    }
}
```

### UI Progress Display

Show users what's happening:

```
Progress: 3/5 tasks complete (60%)
✅ [1] Find source materials
✅ [2] Download full text
✅ [3] Extract key passages
❌ [4] Generate summary - Error: context limit exceeded
⏳ [5] Create outline - Pending
```

### Partial Completion Scenarios

**Agent hits max iterations before finishing:**
- Some tasks completed, some pending
- Checkpoint saved with current state
- Resume continues from where it left off, not from beginning

**Agent fails on one task:**
- Task marked `.failed` with error in notes
- Other tasks may continue (agent decides)
- Orchestrator doesn't automatically abort entire session

**Network error mid-task:**
- Current iteration throws
- Session marked `.failed`
- Checkpoint preserves messages up to that point
- Resume possible from checkpoint

### Checkpoint Structure

```swift
struct AgentCheckpoint: Codable {
    let sessionId: String
    let agentType: String
    let messages: [Message]           // Full conversation history
    let iterationCount: Int
    let tasks: [AgentTask]            // Task state
    let customState: [String: String] // Agent-specific state ([String: Any] wouldn't conform to Codable)
    let timestamp: Date

    var isValid: Bool {
        // Checkpoints expire (default 1 hour)
        Date().timeIntervalSince(timestamp) < 3600
    }
}
```

### Resume Flow

1. On app launch, scan for valid checkpoints
2. Show user: "You have an incomplete session. Resume?"
3. On resume:
   - Restore messages to conversation
   - Restore task states
   - Continue agent loop from where it left off
4. On dismiss:
   - Delete checkpoint
   - Start fresh if user tries again
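The flow above can be sketched end to end; an in-memory map stands in for the checkpoint directory, and the one-hour expiry mirrors `isValid` (names are illustrative):

```typescript
interface Checkpoint {
  sessionId: string;
  messages: string[];
  iterationCount: number;
  timestamp: number; // ms since epoch
}

const MAX_AGE_MS = 60 * 60 * 1000; // checkpoints expire after 1 hour
const checkpointStore = new Map<string, Checkpoint>();

function saveCheckpoint(cp: Checkpoint): void {
  checkpointStore.set(cp.sessionId, cp);
}

// 1. On launch, scan for valid (unexpired) checkpoints to offer for resume.
function scanValidCheckpoints(now: number): Checkpoint[] {
  return [...checkpointStore.values()].filter(cp => now - cp.timestamp < MAX_AGE_MS);
}

// 3. On resume, restore history and continue from the saved iteration.
function resumeFrom(cp: Checkpoint): { messages: string[]; nextIteration: number } {
  return { messages: cp.messages, nextIteration: cp.iterationCount };
}

// 4. On dismiss, delete the checkpoint so a retry starts fresh.
function dismissCheckpoint(sessionId: string): void {
  checkpointStore.delete(sessionId);
}
```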
</partial_completion>

<model_tier_selection>
## Model Tier Selection

Different agents need different intelligence levels. Use the cheapest model that achieves the outcome.

### Tier Guidelines

| Agent Type | Recommended Tier | Reasoning |
|------------|-----------------|-----------|
| Chat/Conversation | Balanced (Sonnet) | Fast responses, good reasoning |
| Research | Balanced (Sonnet) | Tool loops, not ultra-complex synthesis |
| Content Generation | Balanced (Sonnet) | Creative but not synthesis-heavy |
| Complex Analysis | Powerful (Opus) | Multi-document synthesis, nuanced judgment |
| Profile Generation | Powerful (Opus) | Photo analysis, complex pattern recognition |
| Quick Queries | Fast (Haiku) | Simple lookups, quick transformations |
| Simple Classification | Fast (Haiku) | High volume, simple decisions |

### Implementation

```swift
enum ModelTier {
    case fast      // claude-3-haiku: Quick, cheap, simple tasks
    case balanced  // claude-sonnet: Good balance for most tasks
    case powerful  // claude-opus: Complex reasoning, synthesis

    var modelId: String {
        switch self {
        case .fast: return "claude-3-haiku-20240307"
        case .balanced: return "claude-sonnet-4-20250514"
        case .powerful: return "claude-opus-4-20250514"
        }
    }
}

struct AgentConfig {
    let name: String
    let modelTier: ModelTier
    let tools: [AgentTool]
    let systemPrompt: String
    let maxIterations: Int
}

// Examples
let researchConfig = AgentConfig(
    name: "research",
    modelTier: .balanced,
    tools: researchTools,
    systemPrompt: researchPrompt,
    maxIterations: 20
)

let quickLookupConfig = AgentConfig(
    name: "lookup",
    modelTier: .fast,
    tools: [readLibrary],
    systemPrompt: "Answer quick questions about the user's library.",
    maxIterations: 3
)
```

### Cost Optimization Strategies

1. **Start with balanced, upgrade if quality insufficient**
2. **Use fast tier for tool-heavy loops** where each turn is simple
3. **Reserve powerful tier for synthesis tasks** (comparing multiple sources)
4. **Consider token limits per turn** to control costs
5. **Cache expensive operations** to avoid repeated calls
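Strategy 1 can be expressed as a retry ladder. This synchronous sketch (names and quality check are illustrative) runs the task at the cheapest tier in the ladder and only escalates when the output fails a quality check:

```typescript
type Tier = "fast" | "balanced" | "powerful";

function runWithEscalation(
  runTask: (tier: Tier) => string,
  isGoodEnough: (output: string) => boolean,
  ladder: Tier[] = ["balanced", "powerful"],
): { tier: Tier; output: string } {
  let last: { tier: Tier; output: string } | null = null;
  for (const tier of ladder) {
    const output = runTask(tier);
    last = { tier, output };
    if (isGoodEnough(output)) return last; // cheapest acceptable tier wins
  }
  return last!; // best effort from the strongest tier tried
}
```

In a real app `runTask` would call the model API and `isGoodEnough` might be a cheap heuristic or a fast-tier grading pass.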
</model_tier_selection>

<context_limits>
## Context Limits

Agent sessions can extend indefinitely, but context windows don't. Design for bounded context from the start.

### The Problem

```
Turn 1: User asks question → 500 tokens
Turn 2: Agent reads file → 10,000 tokens
Turn 3: Agent reads another file → 10,000 tokens
Turn 4: Agent researches → 20,000 tokens
...
Turn 10: Context window exceeded
```

### Design Principles

**1. Tools should support iterative refinement**

Instead of all-or-nothing, design for summary → detail → full:

```typescript
// Good: Supports iterative refinement
tool("read_file", {
  path: z.string(),
  preview: z.boolean().default(true), // Return first 1000 chars by default
  full: z.boolean().default(false),   // Opt-in to full content
}, ...);

tool("search_files", {
  query: z.string(),
  summaryOnly: z.boolean().default(true), // Return matches, not full files
}, ...);
```

**2. Provide consolidation tools**

Give agents a way to consolidate learnings mid-session:

```typescript
tool("summarize_and_continue", {
  keyPoints: z.array(z.string()),
  nextSteps: z.array(z.string()),
}, async ({ keyPoints, nextSteps }) => {
  // Store summary, potentially truncate earlier messages
  await saveSessionSummary({ keyPoints, nextSteps });
  return { text: "Summary saved. Continuing with focus on: " + nextSteps.join(", ") };
});
```

**3. Design for truncation**

Assume the orchestrator may truncate early messages. Important context should be:
- In the system prompt (always present)
- In files (can be re-read)
- Summarized in context.md

### Implementation Strategies

```swift
class AgentOrchestrator {
    let maxContextTokens = 100_000
    let targetContextTokens = 80_000 // Leave headroom

    func shouldTruncate() -> Bool {
        estimateTokens(messages) > targetContextTokens
    }

    func truncateIfNeeded() {
        if shouldTruncate() {
            // Keep system prompt + recent messages
            // Summarize or drop older messages
            messages = [systemMessage] + summarizeOldMessages() + recentMessages
        }
    }
}
```

### System Prompt Guidance

```markdown
## Managing Context

For long tasks, periodically consolidate what you've learned:
1. If you've gathered a lot of information, summarize key points
2. Save important findings to files (they persist beyond context)
3. Use `summarize_and_continue` if the conversation is getting long

Don't try to hold everything in memory. Write it down.
```
</context_limits>

<orchestrator_pattern>
## Unified Agent Orchestrator

One execution engine, many agent types. All agents use the same orchestrator with different configurations.

```swift
class AgentOrchestrator {
    static let shared = AgentOrchestrator()

    func run(config: AgentConfig, userMessage: String) async -> AgentResult {
        var messages: [Message] = [
            .system(config.systemPrompt),
            .user(userMessage)
        ]

        var iteration = 0

        while iteration < config.maxIterations {
            // Get agent response
            let response = await claude.message(
                model: config.modelTier.modelId,
                messages: messages,
                tools: config.tools
            )

            messages.append(.assistant(response))

            // Process tool calls
            for toolCall in response.toolCalls {
                let result = await executeToolCall(toolCall, config: config)
                messages.append(.toolResult(result))

                // Check for completion signal
                if !result.shouldContinue {
                    return AgentResult(
                        status: .completed,
                        output: result.output,
                        iterations: iteration + 1
                    )
                }
            }

            // No tool calls = agent is responding, might be done
            if response.toolCalls.isEmpty {
                // Could be done, or waiting for user
                break
            }

            iteration += 1
        }

        return AgentResult(
            status: iteration >= config.maxIterations ? .maxIterations : .responded,
            output: messages.last?.content ?? "",
            iterations: iteration
        )
    }
}
```

### Benefits

- Consistent lifecycle management across all agent types
- Automatic checkpoint/resume (critical for mobile)
- Shared tool protocol
- Easy to add new agent types
- Centralized error handling and logging
</orchestrator_pattern>

<checklist>
## Agent Execution Checklist

### Completion Signals
- [ ] `complete_task` tool provided (explicit completion)
- [ ] No heuristic completion detection
- [ ] Tool results include `shouldContinue` flag
- [ ] System prompt guides when to complete

### Partial Completion
- [ ] Tasks tracked with status (pending, in_progress, completed, failed)
- [ ] Checkpoints saved for resume
- [ ] Progress visible to user
- [ ] Resume continues from where left off

### Model Tiers
- [ ] Tier selected based on task complexity
- [ ] Cost optimization considered
- [ ] Fast tier for simple operations
- [ ] Powerful tier reserved for synthesis

### Context Limits
- [ ] Tools support iterative refinement (preview vs full)
- [ ] Consolidation mechanism available
- [ ] Important context persisted to files
- [ ] Truncation strategy defined
</checklist>

@@ -1,5 +1,12 @@
<overview>
Architectural patterns for building agent-native systems. These patterns emerge from the five core principles: Parity, Granularity, Composability, Emergent Capability, and Improvement Over Time.

Features are outcomes achieved by agents operating in a loop, not functions you write. Tools are atomic primitives. The agent applies judgment; the prompt defines the outcome.

See also:
- [files-universal-interface.md](./files-universal-interface.md) for file organization and context.md patterns
- [agent-execution-patterns.md](./agent-execution-patterns.md) for completion signals and partial completion
- [product-implications.md](./product-implications.md) for progressive disclosure and approval patterns
</overview>

<pattern name="event-driven-agent">

@@ -0,0 +1,301 @@
<overview>
Files are the universal interface for agent-native applications. Agents are naturally fluent with file operations—they already know how to read, write, and organize files. This document covers why files work so well, how to organize them, and the context.md pattern for accumulated knowledge.
</overview>

<why_files>
## Why Files

Agents are naturally good at files. Claude Code works because bash + filesystem is the most battle-tested agent interface. When building agent-native apps, lean into this.

### Agents Already Know How

You don't need to teach the agent your API—it already knows `cat`, `grep`, `mv`, `mkdir`. File operations are the primitives it's most fluent with.

### Files Are Inspectable

Users can see what the agent created, edit it, move it, delete it. No black box. Complete transparency into agent behavior.

### Files Are Portable

Export is trivial. Backup is trivial. Users own their data. No vendor lock-in, no complex migration paths.

### App State Stays in Sync

On mobile, if you use the file system with iCloud, all devices share the same file system. The agent's work on one device appears on all devices—without you having to build a server.

### Directory Structure Is Information Architecture

The filesystem gives you hierarchy for free. `/projects/acme/notes/` is self-documenting in a way that `SELECT * FROM notes WHERE project_id = 123` isn't.
</why_files>

<file_organization>
## File Organization Patterns

> **Needs validation:** These conventions are one approach that's worked so far, not a prescription. Better solutions should be considered.

A general principle of agent-native design: **Design for what agents can reason about.** The best proxy for that is what would make sense to a human. If a human can look at your file structure and understand what's going on, an agent probably can too.

### Entity-Scoped Directories

Organize files around entities, not actors or file types:

```
{entity_type}/{entity_id}/
├── primary content
├── metadata
└── related materials
```

**Example:** `Research/books/{bookId}/` contains everything about one book—full text, notes, sources, agent logs.

### Naming Conventions

| File Type | Naming Pattern | Example |
|-----------|---------------|---------|
| Entity data | `{entity}.json` | `library.json`, `status.json` |
| Human-readable content | `{content_type}.md` | `introduction.md`, `profile.md` |
| Agent reasoning | `agent_log.md` | Per-entity agent history |
| Primary content | `full_text.txt` | Downloaded/extracted text |
| Multi-volume | `volume{N}.txt` | `volume1.txt`, `volume2.txt` |
| External sources | `{source_name}.md` | `wikipedia.md`, `sparknotes.md` |
| Checkpoints | `{sessionId}.checkpoint` | UUID-based |
| Configuration | `config.json` | Feature settings |

### Directory Naming

- **Entity-scoped:** `{entityType}/{entityId}/` (e.g., `Research/books/{bookId}/`)
- **Type-scoped:** `{type}/` (e.g., `AgentCheckpoints/`, `AgentLogs/`)
- **Convention:** Lowercase with underscores, not camelCase

### Ephemeral vs. Durable Separation

Separate agent working files from the user's permanent data:

```
Documents/
├── AgentCheckpoints/   # Ephemeral (can delete)
│   └── {sessionId}.checkpoint
├── AgentLogs/          # Ephemeral (debugging)
│   └── {type}/{sessionId}.md
└── Research/           # Durable (user's work)
    └── books/{bookId}/
```

### The Split: Markdown vs JSON

- **Markdown:** For content users might read or edit
- **JSON:** For structured data the app queries
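As a concrete illustration of the split, the same entity can be serialized both ways; the `Book` shape here is hypothetical:

```typescript
interface Book {
  id: string;
  title: string;
  notesCount: number;
}

// library.json: structured data the app queries
function toJson(book: Book): string {
  return JSON.stringify(book);
}

// profile.md: prose the user (and the agent) can read and edit
function toMarkdown(book: Book): string {
  return `# ${book.title}\n\nNotes so far: ${book.notesCount}\n`;
}
```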
</file_organization>
|
||||
|
||||
<context_md_pattern>
|
||||
## The context.md Pattern
|
||||
|
||||
A file the agent reads at the start of each session and updates as it learns:
|
||||
|
||||
```markdown
|
||||
# Context
|
||||
|
||||
## Who I Am
|
||||
Reading assistant for the Every app.
|
||||
|
||||
## What I Know About This User
|
||||
- Interested in military history and Russian literature
|
||||
- Prefers concise analysis
|
||||
- Currently reading War and Peace
|
||||
|
||||
## What Exists
|
||||
- 12 notes in /notes
|
||||
- 3 active projects
|
||||
- User preferences at /preferences.md
|
||||
|
||||
## Recent Activity
|
||||
- User created "Project kickoff" (2 hours ago)
|
||||
- Analyzed passage about Austerlitz (yesterday)
|
||||
|
||||
## My Guidelines
|
||||
- Don't spoil books they're reading
|
||||
- Use their interests to personalize insights
|
||||
|
||||
## Current State
|
||||
- No pending tasks
|
||||
- Last sync: 10 minutes ago
|
||||
```
|
||||
|
||||
### Benefits
|
||||
|
||||
- **Agent behavior evolves without code changes** - Update the context, behavior changes
|
||||
- **Users can inspect and modify** - Complete transparency
|
||||
- **Natural place for accumulated context** - Learnings persist across sessions
|
||||
- **Portable across sessions** - Restart agent, knowledge preserved
|
||||
|
||||
### How It Works
|
||||
|
||||
1. Agent reads `context.md` at session start
|
||||
2. Agent updates it when learning something important
|
||||
3. System can also update it (recent activity, new resources)
|
||||
4. Context persists across sessions
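The four steps above fit in a few lines. This is a sketch, not a prescribed API: `ContextStore` and `runAgent` are hypothetical stand-ins for whatever file tools and agent loop the app already has.

```typescript
// Sketch of the context.md lifecycle. ContextStore and runAgent are
// hypothetical stand-ins for the app's file tools and agent loop.
type ContextStore = {
  read(): Promise<string>;
  write(text: string): Promise<void>;
};

async function runSession(
  store: ContextStore,
  runAgent: (context: string) => Promise<string[]>, // returns learnings
): Promise<string> {
  // 1. Read context.md at session start
  let context = await store.read();

  // 2. Run the agent; it reports anything worth remembering
  const learnings = await runAgent(context);

  // 3-4. Append learnings so they persist across sessions
  if (learnings.length > 0) {
    context +=
      "\n## Session Learnings\n" +
      learnings.map((l) => `- ${l}`).join("\n") +
      "\n";
    await store.write(context);
  }
  return context;
}
```

The system side (recent activity, new resources) can write to the same file through the same store, which is what keeps the pattern transparent.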

### What to Include

| Section | Purpose |
|---------|---------|
| Who I Am | Agent identity and role |
| What I Know About This User | Learned preferences, interests |
| What Exists | Available resources, data |
| Recent Activity | Context for continuity |
| My Guidelines | Learned rules and constraints |
| Current State | Session status, pending items |
</context_md_pattern>

<files_vs_database>
## Files vs. Database

> **Needs validation:** This framing is informed by mobile development. For web apps, the tradeoffs are different.

| Use files for... | Use database for... |
|------------------|---------------------|
| Content users should read/edit | High-volume structured data |
| Configuration that benefits from version control | Data that needs complex queries |
| Agent-generated content | Ephemeral state (sessions, caches) |
| Anything that benefits from transparency | Data with relationships |
| Large text content | Data that needs indexing |

**The principle:** Files for legibility, databases for structure. When in doubt, files—they're more transparent and users can always inspect them.

### When Files Work Best

- Scale is small (one user's library, not millions of records)
- Transparency is valued over query speed
- Cloud sync (iCloud, Dropbox) works well with files

### Hybrid Approach

Even if you need a database for performance, consider maintaining a file-based "source of truth" that the agent works with, synced to the database for the UI:

```
Files (agent workspace):
  Research/book_123/introduction.md

Database (UI queries):
  research_index: { bookId, path, title, createdAt }
```
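One way to keep the two in sync, sketched below: derive the index row from the file rather than writing both independently, so the file stays authoritative. The field names follow the `research_index` example above; the "first heading becomes the title" rule is an assumption for illustration.

```typescript
// Sketch: derive a research_index row from a markdown file so the file
// remains the source of truth. The heading-to-title rule is an assumption.
interface ResearchIndexRow {
  bookId: string;
  path: string;
  title: string;
  createdAt: string;
}

function indexRowForFile(
  bookId: string,
  path: string,
  markdown: string,
): ResearchIndexRow {
  // Use the first markdown heading as the title, else the filename
  const heading = markdown.match(/^#\s+(.+)$/m);
  const title = heading ? heading[1].trim() : path.split("/").pop() ?? path;
  return { bookId, path, title, createdAt: new Date().toISOString() };
}
```

Re-running the derivation after the agent (or user) edits a file is cheap, which is what makes the file-first direction workable.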
</files_vs_database>

<conflict_model>
## Conflict Model

If agents and users write to the same files, you need a conflict model.

### Current Reality

Most implementations use **last-write-wins** via atomic writes:

```swift
try data.write(to: url, options: [.atomic])
```

This is simple but can lose changes.

### Options

| Strategy | Pros | Cons |
|----------|------|------|
| **Last write wins** | Simple | Changes can be lost |
| **Agent checks before writing** | Preserves user edits | More complexity |
| **Separate spaces** | No conflicts | Less collaboration |
| **Append-only logs** | Never overwrites | Files grow forever |
| **File locking** | Safe concurrent access | Complexity, can block |

### Recommended Approaches

**For files agents write frequently (logs, status):** Last-write-wins is fine. Conflicts are rare.

**For files users edit (profiles, notes):** Consider explicit handling:
- Agent checks modification time before overwriting
- Or keep agent output separate from user-editable content
- Or use append-only pattern
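The first option can be sketched with a recorded-mtime check. Node's `fs/promises` is assumed; in a real app the recorded timestamps would live in the agent's persistent state rather than a module-level map.

```typescript
import { stat, writeFile } from "node:fs/promises";

// mtime observed at the agent's last write, per path (sketch: in-memory)
const lastWrites = new Map<string, number>();

async function safeWrite(
  path: string,
  content: string,
): Promise<"written" | "conflict"> {
  try {
    const info = await stat(path);
    const recorded = lastWrites.get(path);
    if (recorded !== undefined && info.mtimeMs > recorded) {
      // The file changed since the agent last wrote it: a user edit.
      // Surface this instead of silently overwriting.
      return "conflict";
    }
  } catch {
    // File doesn't exist yet; safe to create
  }
  await writeFile(path, content);
  lastWrites.set(path, (await stat(path)).mtimeMs);
  return "written";
}
```

On `"conflict"`, the agent can re-read the file, merge, or ask the user, matching the system-prompt guidance below.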

### iCloud Considerations

iCloud sync adds complexity. It creates `{filename} (conflict).md` files when sync conflicts occur. Monitor for these:

```swift
NotificationCenter.default.addObserver(
    forName: .NSMetadataQueryDidUpdate,
    ...
)
```

### System Prompt Guidance

Tell the agent about the conflict model:

```markdown
## Working with User Content

When you create content, the user may edit it afterward. Always read
existing files before modifying them—the user may have made improvements
you should preserve.

If a file has been modified since you last wrote it, ask before overwriting.
```
</conflict_model>

<examples>
## Example: Reading App File Structure

```
Documents/
├── Library/
│   └── library.json               # Book metadata
├── Research/
│   └── books/
│       └── {bookId}/
│           ├── full_text.txt      # Downloaded content
│           ├── introduction.md    # Agent-generated, user-editable
│           ├── notes.md           # User notes
│           └── sources/
│               ├── wikipedia.md   # Research gathered by agent
│               └── reviews.md
├── Chats/
│   └── {conversationId}.json      # Chat history
├── Profile/
│   └── profile.md                 # User reading profile
└── context.md                     # Agent's accumulated knowledge
```

**How it works:**

1. User adds book → creates entry in `library.json`
2. Agent downloads text → saves to `Research/books/{id}/full_text.txt`
3. Agent researches → saves to `sources/`
4. Agent generates intro → saves to `introduction.md`
5. User edits intro → agent sees changes on next read
6. Agent updates `context.md` with learnings
</examples>

<checklist>
## Files as Universal Interface Checklist

### Organization
- [ ] Entity-scoped directories (`{type}/{id}/`)
- [ ] Consistent naming conventions
- [ ] Ephemeral vs. durable separation
- [ ] Markdown for human content, JSON for structured data

### context.md
- [ ] Agent reads context at session start
- [ ] Agent updates context when learning
- [ ] Includes: identity, user knowledge, what exists, guidelines
- [ ] Persists across sessions

### Conflict Handling
- [ ] Conflict model defined (last-write-wins, check-before-write, etc.)
- [ ] Agent guidance in system prompt
- [ ] iCloud conflict monitoring (if applicable)

### Integration
- [ ] UI observes file changes (or shared service)
- [ ] Agent can read user edits
- [ ] User can inspect agent output
</checklist>

@@ -0,0 +1,359 @@
<overview>
Start with pure primitives: bash, file operations, basic storage. This proves the architecture works and reveals what the agent actually needs. As patterns emerge, add domain-specific tools deliberately. This document covers when and how to evolve from primitives to domain tools, and when to graduate to optimized code.
</overview>

<start_with_primitives>
## Start with Pure Primitives

Begin every agent-native system with the most atomic tools possible:

- `read_file` / `write_file` / `list_files`
- `bash` (for everything else)
- Basic storage (`store_item` / `get_item`)
- HTTP requests (`fetch_url`)

**Why start here:**

1. **Proves the architecture** - If it works with primitives, your prompts are doing their job
2. **Reveals actual needs** - You'll discover what domain concepts matter
3. **Maximum flexibility** - Agent can do anything, not just what you anticipated
4. **Forces good prompts** - You can't lean on tool logic as a crutch

### Example: Starting Primitive

```typescript
// Start with just these
const tools = [
  tool("read_file", { path: z.string() }, ...),
  tool("write_file", { path: z.string(), content: z.string() }, ...),
  tool("list_files", { path: z.string() }, ...),
  tool("bash", { command: z.string() }, ...),
];

// Prompt handles the domain logic
const prompt = `
When processing feedback:
1. Read existing feedback from data/feedback.json
2. Add the new feedback with your assessment of importance (1-5)
3. Write the updated file
4. If importance >= 4, create a notification file in data/alerts/
`;
```
</start_with_primitives>

<when_to_add_domain_tools>
## When to Add Domain Tools

As patterns emerge, you'll want to add domain-specific tools. This is good—but do it deliberately.

### Vocabulary Anchoring

**Add a domain tool when:** The agent needs to understand domain concepts.

A `create_note` tool teaches the agent what "note" means in your system better than "write a file to the notes directory with this format."

```typescript
// Without domain tool - agent must infer structure
await agent.chat("Create a note about the meeting");
// Agent: writes to... notes/? documents/? what format?

// With domain tool - vocabulary is anchored
tool("create_note", {
  title: z.string(),
  content: z.string(),
  tags: z.array(z.string()).optional(),
}, async ({ title, content, tags }) => {
  // Tool enforces structure, agent understands "note"
});
```

### Guardrails

**Add a domain tool when:** Some operations need validation or constraints that shouldn't be left to agent judgment.

```typescript
// publish_to_feed might enforce format requirements or content policies
tool("publish_to_feed", {
  bookId: z.string(),
  content: z.string(),
  headline: z.string().max(100), // Enforce headline length
}, async ({ bookId, content, headline }) => {
  // Validate content meets guidelines
  if (containsProhibitedContent(content)) {
    return { text: "Content doesn't meet guidelines", isError: true };
  }
  // Enforce proper structure
  await feedService.publish({ bookId, content, headline, publishedAt: new Date() });
});
```

### Efficiency

**Add a domain tool when:** Common operations would take many primitive calls.

```typescript
// Primitive approach: multiple calls
await agent.chat("Get book details");
// Agent: read library.json, parse, find book, read full_text.txt, read introduction.md...

// Domain tool: one call for common operation
tool("get_book_with_content", { bookId: z.string() }, async ({ bookId }) => {
  const book = await library.getBook(bookId);
  const fullText = await readFile(`Research/${bookId}/full_text.txt`);
  const intro = await readFile(`Research/${bookId}/introduction.md`);
  return { text: JSON.stringify({ book, fullText, intro }) };
});
```
</when_to_add_domain_tools>

<the_rule>
## The Rule for Domain Tools

**Domain tools should represent one conceptual action from the user's perspective.**

They can include mechanical validation, but **judgment about what to do or whether to do it belongs in the prompt**.

### Wrong: Bundles Judgment

```typescript
// WRONG - analyze_and_publish bundles judgment into the tool
tool("analyze_and_publish", async ({ input }) => {
  const analysis = analyzeContent(input);     // Tool decides how to analyze
  const shouldPublish = analysis.score > 0.7; // Tool decides whether to publish
  if (shouldPublish) {
    await publish(analysis.summary);          // Tool decides what to publish
  }
});
```

### Right: One Action, Agent Decides

```typescript
// RIGHT - separate tools, agent decides
tool("analyze_content", { content: z.string() }, ...); // Returns analysis
tool("publish", { content: z.string() }, ...);         // Publishes what agent provides

// Prompt: "Analyze the content. If it's high quality, publish a summary."
// Agent decides what "high quality" means and what summary to write.
```

### The Test

Ask: "Who is making the decision here?"

- If the answer is "the tool code" → you've encoded judgment, refactor
- If the answer is "the agent based on the prompt" → good
</the_rule>

<keep_primitives_available>
## Keep Primitives Available

**Domain tools are shortcuts, not gates.**

Unless there's a specific reason to restrict access (security, data integrity), the agent should still be able to use underlying primitives for edge cases.

```typescript
// Domain tool for common case
tool("create_note", { title, content }, ...);

// But primitives still available for edge cases
tool("read_file", { path }, ...);
tool("write_file", { path, content }, ...);

// Agent can use create_note normally, but for weird edge case:
// "Create a note in a non-standard location with custom metadata"
// → Agent uses write_file directly
```

### When to Gate

Gating (making the domain tool the only way) is appropriate for:

- **Security:** User authentication, payment processing
- **Data integrity:** Operations that must maintain invariants
- **Audit requirements:** Actions that must be logged in specific ways

**The default is open.** When you do gate something, make it a conscious decision with a clear reason.
</keep_primitives_available>

<graduating_to_code>
## Graduating to Code

Some operations will need to move from agent-orchestrated to optimized code for performance or reliability.

### The Progression

```
Stage 1: Agent uses primitives in a loop
  → Flexible, proves the concept
  → Slow, potentially expensive

Stage 2: Add domain tools for common operations
  → Faster, still agent-orchestrated
  → Agent still decides when/whether to use

Stage 3: For hot paths, implement in optimized code
  → Fast, deterministic
  → Agent can still trigger, but execution is code
```

### Example Progression

**Stage 1: Pure primitives**
```markdown
Prompt: "When user asks for a summary, read all notes in /notes,
analyze them, and write a summary to /summaries/{date}.md"

Agent: Calls read_file 20 times, reasons about content, writes summary
Time: 30 seconds, 50k tokens
```

**Stage 2: Domain tool**
```typescript
tool("get_all_notes", {}, async () => {
  const notes = await readAllNotesFromDirectory();
  return { text: JSON.stringify(notes) };
});

// Agent still decides how to summarize, but retrieval is faster
// Time: 10 seconds, 30k tokens
```

**Stage 3: Optimized code**
```typescript
tool("generate_weekly_summary", {}, async () => {
  // Entire operation in code for hot path
  const notes = await getNotes({ since: oneWeekAgo });
  const summary = await generateSummary(notes); // Could use cheaper model
  await writeSummary(summary);
  return { text: "Summary generated" };
});

// Agent just triggers it
// Time: 2 seconds, 5k tokens
```

### The Caveat

**Even when an operation graduates to code, the agent should be able to:**

1. Trigger the optimized operation itself
2. Fall back to primitives for edge cases the optimized path doesn't handle

Graduation is about efficiency. **Parity still holds.** The agent doesn't lose capability when you optimize.
</graduating_to_code>

<decision_framework>
## Decision Framework

### Should I Add a Domain Tool?

| Question | If Yes |
|----------|--------|
| Is the agent confused about what this concept means? | Add for vocabulary anchoring |
| Does this operation need validation the agent shouldn't decide? | Add with guardrails |
| Is this a common multi-step operation? | Add for efficiency |
| Would changing behavior require code changes? | Keep as prompt instead |

### Should I Graduate to Code?

| Question | If Yes |
|----------|--------|
| Is this operation called very frequently? | Consider graduating |
| Does latency matter significantly? | Consider graduating |
| Are token costs problematic? | Consider graduating |
| Do you need deterministic behavior? | Graduate to code |
| Does the operation need complex state management? | Graduate to code |

### Should I Gate Access?

| Question | If Yes |
|----------|--------|
| Is there a security requirement? | Gate appropriately |
| Must this operation maintain data integrity? | Gate appropriately |
| Is there an audit/compliance requirement? | Gate appropriately |
| Is it just "safer" with no specific risk? | Keep primitives available |
</decision_framework>

<examples>
## Examples

### Feedback Processing Evolution

**Stage 1: Primitives only**
```typescript
tools: [read_file, write_file, bash]
prompt: "Store feedback in data/feedback.json, notify if important"
// Agent figures out JSON structure, importance criteria, notification method
```

**Stage 2: Domain tools for vocabulary**
```typescript
tools: [
  store_feedback,    // Anchors "feedback" concept with proper structure
  send_notification, // Anchors "notify" with correct channels
  read_file,         // Still available for edge cases
  write_file,
]
prompt: "Store feedback using store_feedback. Notify if importance >= 4."
// Agent still decides importance, but vocabulary is anchored
```

**Stage 3: Graduated hot path**
```typescript
tools: [
  process_feedback_batch, // Optimized for high-volume processing
  store_feedback,         // For individual items
  send_notification,
  read_file,
  write_file,
]
// Batch processing is code, but agent can still use store_feedback for special cases
```

### When NOT to Add Domain Tools

**Don't add a domain tool just to make things "cleaner":**
```typescript
// Unnecessary - agent can compose primitives
tool("organize_files_by_date", ...) // Just use move_file + judgment

// Unnecessary - puts decision in wrong place
tool("decide_file_importance", ...) // This is prompt territory
```

**Don't add a domain tool if behavior might change:**
```typescript
// Bad - locked into code
tool("generate_standard_report", ...) // What if report format evolves?

// Better - keep in prompt
prompt: "Generate a report covering X, Y, Z. Format for readability."
// Can adjust format by editing prompt
```
</examples>

<checklist>
## Checklist: Primitives to Domain Tools

### Starting Out
- [ ] Begin with pure primitives (read, write, list, bash)
- [ ] Write behavior in prompts, not tool logic
- [ ] Let patterns emerge from actual usage

### Adding Domain Tools
- [ ] Clear reason: vocabulary anchoring, guardrails, or efficiency
- [ ] Tool represents one conceptual action
- [ ] Judgment stays in prompts, not tool code
- [ ] Primitives remain available alongside domain tools

### Graduating to Code
- [ ] Hot path identified (frequent, latency-sensitive, or expensive)
- [ ] Optimized version doesn't remove agent capability
- [ ] Fallback to primitives for edge cases still works

### Gating Decisions
- [ ] Specific reason for each gate (security, integrity, audit)
- [ ] Default is open access
- [ ] Gates are conscious decisions, not defaults
</checklist>

@@ -1,10 +1,188 @@
<overview>
Mobile is a first-class platform for agent-native apps. It has unique constraints and opportunities. This guide covers why mobile matters, iOS storage architecture, checkpoint/resume patterns, and cost-aware design.
</overview>

<why_mobile>
## Why Mobile Matters

Mobile devices offer unique advantages for agent-native apps:

### A File System
Agents can work with files naturally, using the same primitives that work everywhere else. The filesystem is the universal interface.

### Rich Context
A walled garden you get access to. Health data, location, photos, calendars—context that doesn't exist on desktop or web. This enables deeply personalized agent experiences.

### Local Apps
Everyone has their own copy of the app. This opens opportunities that aren't fully realized yet: apps that modify themselves, fork themselves, evolve per-user. App Store policies constrain some of this today, but the foundation is there.

### Cross-Device Sync
If you use the file system with iCloud, all devices share the same file system. The agent's work on one device appears on all devices—without you having to build a server.

### The Challenge

**Agents are long-running. Mobile apps are not.**

An agent might need 30 seconds, 5 minutes, or an hour to complete a task. But iOS will background your app after seconds of inactivity, and may kill it entirely to reclaim memory. The user might switch apps, take a call, or lock their phone mid-task.

This means mobile agent apps need:
- **Checkpointing** — Saving state so work isn't lost
- **Resuming** — Picking up where you left off after interruption
- **Background execution** — Using the limited time iOS gives you wisely
- **On-device vs. cloud decisions** — What runs locally vs. what needs a server
</why_mobile>

<ios_storage>
## iOS Storage Architecture

> **Needs validation:** This is an approach that works well, but better solutions may exist.

For agent-native iOS apps, use iCloud Drive's Documents folder for your shared workspace. This gives you **free, automatic multi-device sync** without building a sync layer or running a server.

### Why iCloud Documents?

| Approach | Cost | Complexity | Offline | Multi-Device |
|----------|------|------------|---------|--------------|
| Custom backend + sync | $$$ | High | Manual | Yes |
| CloudKit database | Free tier limits | Medium | Manual | Yes |
| **iCloud Documents** | Free (user's storage) | Low | Automatic | Automatic |

iCloud Documents:
- Uses user's existing iCloud storage (free 5GB, most users have more)
- Automatic sync across all user's devices
- Works offline, syncs when online
- Files visible in Files.app for transparency
- No server costs, no sync code to maintain

### Implementation: iCloud-First with Local Fallback

```swift
// Get the iCloud Documents container
func iCloudDocumentsURL() -> URL? {
    FileManager.default.url(forUbiquityContainerIdentifier: nil)?
        .appendingPathComponent("Documents")
}

// Your shared workspace lives in iCloud
class SharedWorkspace {
    let rootURL: URL

    init() {
        // Use iCloud if available, fall back to local
        if let iCloudURL = iCloudDocumentsURL() {
            self.rootURL = iCloudURL
        } else {
            // Fallback to local Documents (user not signed into iCloud)
            self.rootURL = FileManager.default.urls(
                for: .documentDirectory,
                in: .userDomainMask
            ).first!
        }
    }

    // All file operations go through this root
    func researchPath(for bookId: String) -> URL {
        rootURL.appendingPathComponent("Research/\(bookId)")
    }

    func journalPath() -> URL {
        rootURL.appendingPathComponent("Journal")
    }
}
```

### Directory Structure in iCloud

```
iCloud Drive/
└── YourApp/                       # Your app's container
    └── Documents/                 # Visible in Files.app
        ├── Journal/
        │   ├── user/
        │   │   └── 2025-01-15.md  # Syncs across devices
        │   └── agent/
        │       └── 2025-01-15.md  # Agent observations sync too
        ├── Research/
        │   └── {bookId}/
        │       ├── full_text.txt
        │       └── sources/
        ├── Chats/
        │   └── {conversationId}.json
        └── context.md             # Agent's accumulated knowledge
```

### Handling iCloud File States

iCloud files may not be downloaded locally. Handle this:

```swift
func readFile(at url: URL) throws -> String {
    // iCloud may create .icloud placeholder files
    if url.pathExtension == "icloud" {
        // Trigger download
        try FileManager.default.startDownloadingUbiquitousItem(at: url)
        throw FileNotYetAvailableError()
    }

    return try String(contentsOf: url, encoding: .utf8)
}

// For writes, use coordinated file access
func writeFile(_ content: String, to url: URL) throws {
    let coordinator = NSFileCoordinator()
    var error: NSError?

    coordinator.coordinate(
        writingItemAt: url,
        options: .forReplacing,
        error: &error
    ) { newURL in
        try? content.write(to: newURL, atomically: true, encoding: .utf8)
    }

    if let error = error { throw error }
}
```

### What iCloud Enables

1. **User starts experiment on iPhone** → Agent creates config file
2. **User opens app on iPad** → Same experiment visible, no sync code needed
3. **Agent logs observation on iPhone** → Syncs to iPad automatically
4. **User edits journal on iPad** → iPhone sees the edit

### Entitlements Required

Add to your app's entitlements:

```xml
<key>com.apple.developer.icloud-container-identifiers</key>
<array>
    <string>iCloud.com.yourcompany.yourapp</string>
</array>
<key>com.apple.developer.icloud-services</key>
<array>
    <string>CloudDocuments</string>
</array>
<key>com.apple.developer.ubiquity-container-identifiers</key>
<array>
    <string>iCloud.com.yourcompany.yourapp</string>
</array>
```

### When NOT to Use iCloud Documents

- **Sensitive data** - Use Keychain or encrypted local storage instead
- **High-frequency writes** - iCloud sync has latency; use local + periodic sync
- **Large media files** - Consider CloudKit Assets or on-demand resources
- **Shared between users** - iCloud Documents is single-user; use CloudKit for sharing
</ios_storage>

<background_execution>
## Background Execution & Resumption

> **Needs validation:** These patterns work, but better solutions may exist.

Mobile apps can be suspended or terminated at any time. Agents must handle this gracefully.

### The Challenge

@@ -623,13 +801,48 @@ class AgentOrchestrator {
```
</battery_awareness>

<on_device_vs_cloud>
## On-Device vs. Cloud

Understanding what runs where in a mobile agent-native app:

| Component | On-Device | Cloud |
|-----------|-----------|-------|
| Orchestration | ✅ | |
| Tool execution | ✅ (file ops, photo access, HealthKit) | |
| LLM calls | | ✅ (Anthropic API) |
| Checkpoints | ✅ (local files) | Optional via iCloud |
| Long-running agents | Limited by iOS | Possible with server |

### Implications

**Network required for reasoning:**
- The app needs network connectivity for LLM calls
- Design tools to degrade gracefully when network is unavailable
- Consider offline caching for common queries

**Data stays local:**
- File operations happen on device
- Sensitive data never leaves the device unless explicitly synced
- Privacy is preserved by default

**Long-running agents:**
For truly long-running agents (hours), consider a server-side orchestrator that can run indefinitely, with the mobile app as a viewer and input mechanism.
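The graceful-degradation point above can be sketched as a thin client wrapper that queues work while offline and flushes when connectivity returns. `isOnline` and `callLLM` are placeholders for real connectivity checks and an API client, not an actual SDK surface.

```typescript
// Sketch: queue LLM requests while offline, flush when back online.
// isOnline and callLLM are hypothetical placeholders.
type PendingRequest = { prompt: string; enqueuedAt: number };

class OfflineAwareClient {
  private queue: PendingRequest[] = [];

  constructor(
    private isOnline: () => boolean,
    private callLLM: (prompt: string) => Promise<string>,
  ) {}

  async send(prompt: string): Promise<string | "queued"> {
    if (!this.isOnline()) {
      this.queue.push({ prompt, enqueuedAt: Date.now() });
      return "queued"; // surface this to the user instead of failing
    }
    return this.callLLM(prompt);
  }

  async flush(): Promise<string[]> {
    const results: string[] = [];
    while (this.queue.length > 0 && this.isOnline()) {
      const next = this.queue.shift()!;
      results.push(await this.callLLM(next.prompt));
    }
    return results;
  }
}
```

The same wrapper is a natural place for the offline cache of common queries mentioned above.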
</on_device_vs_cloud>

<checklist>
## Mobile Agent-Native Checklist

**iOS Storage:**
- [ ] iCloud Documents as primary storage (or conscious alternative)
- [ ] Local Documents fallback when iCloud unavailable
- [ ] Handle `.icloud` placeholder files (trigger download)
- [ ] Use NSFileCoordinator for conflict-safe writes

**Background Execution:**
- [ ] Checkpoint/resume implemented for all agent sessions
- [ ] State machine for agent lifecycle (idle, running, backgrounded, etc.)
- [ ] Background task extension for critical saves (30-second window)
- [ ] User-visible status for backgrounded agents

**Permissions:**

@@ -0,0 +1,443 @@
<overview>
Agent-native architecture has consequences for how products feel, not just how they're built. This document covers progressive disclosure of complexity, discovering latent demand through agent usage, and designing approval flows that match stakes and reversibility.
</overview>

<progressive_disclosure>
## Progressive Disclosure of Complexity

The best agent-native applications are simple to start but endlessly powerful.

### The Excel Analogy

Excel is the canonical example: you can use it for a grocery list, or you can build complex financial models. The same tool, radically different depths of use.

Claude Code has this quality: fix a typo, or refactor an entire codebase. The interface is the same—natural language—but the capability scales with the ask.

### The Pattern

Agent-native applications should aspire to this:

**Simple entry:** Basic requests work immediately with no learning curve
```
User: "Organize my downloads"
Agent: [Does it immediately, no configuration needed]
```

**Discoverable depth:** Users find they can do more as they explore
```
User: "Organize my downloads by project"
Agent: [Adapts to preference]

User: "Every Monday, review last week's downloads"
Agent: [Sets up recurring workflow]
```

**No ceiling:** Power users can push the system in ways you didn't anticipate
```
User: "Cross-reference my downloads with my calendar and flag
       anything I downloaded during a meeting that I haven't
       followed up on"
Agent: [Composes capabilities to accomplish this]
```

### How This Emerges

This isn't something you design directly. It **emerges naturally from the architecture:**

1. When features are prompts and tools are composable...
2. Users can start simple ("organize my downloads")...
3. And gradually discover complexity ("every Monday, review last week's...")...
4. Without you having to build each level explicitly

The agent meets users where they are.

### Design Implications

- **Don't force configuration upfront** - Let users start immediately
- **Don't hide capabilities** - Make them discoverable through use
- **Don't cap complexity** - If the agent can do it, let users ask for it
- **Do provide hints** - Help users discover what's possible
</progressive_disclosure>

<latent_demand_discovery>
## Latent Demand Discovery

Traditional product development: imagine what users want, build it, see if you're right.

Agent-native product development: build a capable foundation, observe what users ask the agent to do, formalize the patterns that emerge.

### The Shift

**Traditional approach:**
```
1. Imagine features users might want
2. Build them
3. Ship
4. Hope you guessed right
5. If wrong, rebuild
```

**Agent-native approach:**
```
1. Build capable foundation (atomic tools, parity)
2. Ship
3. Users ask agent for things
4. Observe what they're asking for
5. Patterns emerge
6. Formalize patterns into domain tools or prompts
7. Repeat
```

### The Flywheel

```
Build with atomic tools and parity
  ↓
Users ask for things you didn't anticipate
  ↓
Agent composes tools to accomplish them
(or fails, revealing a capability gap)
  ↓
You observe patterns in what's being requested
  ↓
Add domain tools or prompts to optimize common patterns
  ↓
(Repeat)
```

### What You Learn

**When users ask and the agent succeeds:**
- This is a real need
- Your architecture supports it
- Consider optimizing with a domain tool if it's common

**When users ask and the agent fails:**
- This is a real need
- You have a capability gap
- Fix the gap: add tool, fix parity, improve context

**When users don't ask for something:**
- Maybe they don't need it
- Or maybe they don't know it's possible (capability hiding)

### Implementation

**Log agent requests:**
```typescript
async function handleAgentRequest(request: string) {
  // Log what users are asking for
  await analytics.log({
    type: 'agent_request',
    request: request,
    timestamp: Date.now(),
  });

  // Process request...
}
```

**Track success/failure:**
```typescript
async function completeAgentSession(session: AgentSession) {
  await analytics.log({
    type: 'agent_session',
    request: session.initialRequest,
    succeeded: session.status === 'completed',
    toolsUsed: session.toolCalls.map(t => t.name),
    iterations: session.iterationCount,
  });
}
```

**Review patterns:**
- What are users asking for most?
- What's failing? Why?
- What would benefit from a domain tool?
- What needs better context injection?

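The review step can itself be lightly automated. A sketch of mining the logged requests for recurring themes; the record shape is assumed, and naive keyword counting stands in for whatever grouping you actually use (embeddings, an LLM pass over the log, etc.).

```typescript
// Sketch: surface common request patterns from logged sessions.
// SessionRecord and the keyword heuristic are assumptions for
// illustration, not a prescribed schema or algorithm.
interface SessionRecord {
  request: string;
  succeeded: boolean;
}

function topPatterns(records: SessionRecord[], minCount = 2): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const r of records) {
    // Crude normalization: lowercase, keep words of 4+ letters
    for (const word of r.request.toLowerCase().match(/[a-z]{4,}/g) ?? []) {
      counts.set(word, (counts.get(word) ?? 0) + 1);
    }
  }
  // Keep only recurring keywords, most frequent first
  return [...counts.entries()]
    .filter(([, n]) => n >= minCount)
    .sort((a, b) => b[1] - a[1]);
}

const log: SessionRecord[] = [
  { request: "summarize my activity this week", succeeded: true },
  { request: "weekly summary of my reading", succeeded: true },
  { request: "summarize last week's downloads", succeeded: false },
];

console.log(topPatterns(log)); // recurring keywords with their counts
```

Pairing this with the `succeeded` flag separates "common and working" (candidate for a domain tool) from "common and failing" (capability gap to fix).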
### Example: Discovering "Weekly Review"

```
Week 1: Users start asking "summarize my activity this week"
        Agent: Composes list_files + read_file, works but slow

Week 2: More users asking similar things
        Pattern emerges: weekly review is common

Week 3: Add prompt section for weekly review
        Faster, more consistent, still flexible

Week 4: If still common and performance matters
        Add domain tool: generate_weekly_summary
```

You didn't have to guess that weekly review would be popular. You discovered it.
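Week 4's graduation step might look something like the sketch below. This is a hypothetical shape for the `generate_weekly_summary` domain tool; the `FileEntry` type and the injected `readFile` primitive are assumptions for illustration, not the app's real API.

```typescript
// Hypothetical domain tool formalizing the observed pattern.
// It does in one call what the agent was composing from
// list_files + read_file, trading flexibility for speed.
interface FileEntry {
  path: string;
  modified: Date;
}

function generateWeeklySummary(
  files: FileEntry[],
  readFile: (path: string) => string,
  now: Date = new Date()
): string {
  const weekAgo = new Date(now.getTime() - 7 * 24 * 60 * 60 * 1000);
  // Filtering and truncation happen in code, so the agent no
  // longer spends loop iterations (or tokens) on them.
  const recent = files.filter(f => f.modified.getTime() >= weekAgo.getTime());
  const lines = recent.map(f => `- ${f.path}: ${readFile(f.path).slice(0, 80)}`);
  return `# Weekly Summary (${recent.length} items)\n${lines.join("\n")}`;
}
```

The atomic tools stay available; the domain tool is an optimization of a proven pattern, not a replacement for composition.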
</latent_demand_discovery>

<approval_and_agency>
## Approval and User Agency

When agents take unsolicited actions—doing things on their own rather than responding to explicit requests—you need to decide how much autonomy to grant.

> **Note:** This framework applies to unsolicited agent actions. If the user explicitly asks the agent to do something ("send that email"), that's already approval—the agent just does it.

### The Stakes/Reversibility Matrix

Consider two dimensions:
- **Stakes:** How much does it matter if this goes wrong?
- **Reversibility:** How easy is it to undo?

| Stakes | Reversibility | Pattern | Example |
|--------|---------------|---------|---------|
| Low | Easy | **Auto-apply** | Organizing files |
| Low | Hard | **Quick confirm** | Publishing to a private feed |
| High | Easy | **Suggest + apply** | Code changes with undo |
| High | Hard | **Explicit approval** | Sending emails, payments |

### Patterns in Detail

**Auto-apply (low stakes, easy reversal):**
```
Agent: [Organizes files into folders]
Agent: "I organized your downloads into folders by type.
        You can undo with Cmd+Z or move them back."
```
User doesn't need to approve—it's easy to undo and doesn't matter much.

**Quick confirm (low stakes, hard reversal):**
```
Agent: "I've drafted a post about your reading insights.
        Publish to your feed?"
[Publish] [Edit first] [Cancel]
```
One-tap confirm because stakes are low, but it's hard to un-publish.

**Suggest + apply (high stakes, easy reversal):**
```
Agent: "I recommend these code changes to fix the bug:
        [Shows diff]
        Apply? Changes can be reverted with git."
[Apply] [Modify] [Cancel]
```
Shows what will happen, makes reversal clear.

**Explicit approval (high stakes, hard reversal):**
```
Agent: "I've drafted this email to your team about the deadline change:
        [Shows full email]
        This will send immediately and cannot be unsent.
        Type 'send' to confirm."
```
Requires explicit action, makes consequences clear.

### Implementation

```swift
enum ApprovalLevel {
    case autoApply        // Just do it
    case quickConfirm     // One-tap approval
    case suggestApply     // Show preview, ask to apply
    case explicitApproval // Require explicit confirmation
}

func approvalLevelFor(action: AgentAction) -> ApprovalLevel {
    let stakes = assessStakes(action)
    let reversibility = assessReversibility(action)

    switch (stakes, reversibility) {
    case (.low, .easy): return .autoApply
    case (.low, .hard): return .quickConfirm
    case (.high, .easy): return .suggestApply
    case (.high, .hard): return .explicitApproval
    }
}

func assessStakes(_ action: AgentAction) -> Stakes {
    switch action {
    case .organizeFiles: return .low
    case .publishToFeed: return .low
    case .modifyCode: return .high
    case .sendEmail: return .high
    case .makePayment: return .high
    }
}

func assessReversibility(_ action: AgentAction) -> Reversibility {
    switch action {
    case .organizeFiles: return .easy // Can move back
    case .publishToFeed: return .hard // People might see it
    case .modifyCode: return .easy    // Git revert
    case .sendEmail: return .hard     // Can't unsend
    case .makePayment: return .hard   // Money moved
    }
}
```

### Self-Modification Considerations

When agents can modify their own behavior—changing prompts, updating preferences, adjusting workflows—the goals are:

1. **Visibility:** User can see what changed
2. **Understanding:** User understands the effects
3. **Rollback:** User can undo changes

Approval flows are one way to achieve this. Audit logs with easy rollback could be another. **The principle is: make it legible.**

```swift
// When agent modifies its own prompt
func agentSelfModify(change: PromptChange) async {
    // Log the change
    await auditLog.record(change)

    // Create checkpoint for rollback
    await createCheckpoint(currentState)

    // Notify user (could be async/batched)
    await notifyUser("I've adjusted my approach: \(change.summary)")

    // Apply change
    await applyChange(change)
}
```
</approval_and_agency>

<capability_visibility>
## Capability Visibility

Users need to discover what the agent can do. Hidden capabilities lead to underutilization.

### The Problem

```
User: "Help me with my reading"
Agent: "What would you like help with?"
// Agent doesn't mention it can publish to feed, research books,
// generate introductions, analyze themes...
```

The agent can do these things, but the user doesn't know.

### Solutions

**Onboarding hints:**
```
Agent: "I can help you with your reading in several ways:
        - Research any book (web search + save findings)
        - Generate personalized introductions
        - Publish insights to your reading feed
        - Analyze themes across your library
        What interests you?"
```

**Contextual suggestions:**
```
User: "I just finished reading 1984"
Agent: "Great choice! Would you like me to:
        - Research historical context?
        - Compare it to other books in your library?
        - Publish an insight about it to your feed?"
```

**Progressive revelation:**
```
// After user uses basic features
Agent: "By the way, you can also ask me to set up
        recurring tasks, like 'every Monday, review my
        reading progress.' Just let me know!"
```

### Balance

- **Don't overwhelm** with all capabilities upfront
- **Do reveal** capabilities naturally through use
- **Don't assume** users will discover things on their own
- **Do make** capabilities visible when relevant
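Progressive revelation can be driven by simple usage tracking. A minimal sketch: the threshold, capability names, and hint copy are all assumptions, and a real app might choose hints contextually rather than in a fixed order.

```typescript
// Sketch of progressive revelation: surface one undiscovered
// capability after the user has settled into basic usage.
// CAPABILITIES and the threshold are illustrative assumptions.
interface HintState {
  basicUses: number;
  revealed: Set<string>;
}

const CAPABILITIES = ["recurring tasks", "feed publishing", "library analysis"];

function nextHint(state: HintState, threshold = 3): string | null {
  // Only hint once the user is comfortable with the basics,
  // and never repeat a hint they've already seen.
  if (state.basicUses < threshold) return null;
  const unseen = CAPABILITIES.find(c => !state.revealed.has(c));
  if (!unseen) return null;
  state.revealed.add(unseen);
  return `By the way, you can also ask me about ${unseen}.`;
}

const state: HintState = { basicUses: 3, revealed: new Set<string>() };
console.log(nextHint(state)); // "By the way, you can also ask me about recurring tasks."
```

This keeps the balance above mechanical: nothing is dumped upfront, but discovery isn't left to chance.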
</capability_visibility>

<designing_for_trust>
## Designing for Trust

Agent-native apps require trust. Users are giving an AI significant capability. Build trust through:

### Transparency

- Show what the agent is doing (tool calls, progress)
- Explain reasoning when it matters
- Make all agent work inspectable (files, logs)

### Predictability

- Consistent behavior for similar requests
- Clear patterns for when approval is needed
- No surprises in what the agent can access

### Reversibility

- Easy undo for agent actions
- Checkpoints before significant changes
- Clear rollback paths

### Control

- User can stop agent at any time
- User can adjust agent behavior (prompts, preferences)
- User can restrict capabilities if desired

### Implementation

```swift
struct AgentTransparency {
    // Show what's happening
    func onToolCall(_ tool: ToolCall) {
        showInUI("Using \(tool.name)...")
    }

    // Explain reasoning
    func onDecision(_ decision: AgentDecision) {
        if decision.needsExplanation {
            showInUI("I chose this because: \(decision.reasoning)")
        }
    }

    // Make work inspectable
    func onOutput(_ output: AgentOutput) {
        // All output is in files user can see
        // Or in visible UI state
    }
}
```
</designing_for_trust>

<checklist>
## Product Design Checklist

### Progressive Disclosure
- [ ] Basic requests work immediately (no config)
- [ ] Depth is discoverable through use
- [ ] No artificial ceiling on complexity
- [ ] Capability hints provided

### Latent Demand Discovery
- [ ] Agent requests are logged
- [ ] Success/failure is tracked
- [ ] Patterns are reviewed regularly
- [ ] Common patterns formalized into tools/prompts

### Approval & Agency
- [ ] Stakes assessed for each action type
- [ ] Reversibility assessed for each action type
- [ ] Approval pattern matches stakes/reversibility
- [ ] Self-modification is legible (visible, understandable, reversible)

### Capability Visibility
- [ ] Onboarding reveals key capabilities
- [ ] Contextual suggestions provided
- [ ] Users aren't expected to guess what's possible

### Trust
- [ ] Agent actions are transparent
- [ ] Behavior is predictable
- [ ] Actions are reversible
- [ ] User has control
</checklist>