feat: make skills platform-agnostic across coding agents (#330)

Author: Trevin Chow
Date: 2026-03-21 17:35:22 -07:00
Committed by: GitHub
Parent: cfbfb6710a
Commit: 52df90a166
9 changed files with 215 additions and 580 deletions

View File

@@ -84,6 +84,18 @@ When adding or modifying skills, verify compliance with the skill spec:
- [ ] When a skill needs to ask the user a question, instruct use of the platform's blocking question tool and name the known equivalents (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini)
- [ ] Include a fallback for environments without a question tool (e.g., present numbered options and wait for the user's reply before proceeding)
### Cross-Platform Task Tracking
- [ ] When a skill needs to create or track tasks, describe the intent (e.g., "create a task list") and name the known equivalents (`TaskCreate`/`TaskUpdate`/`TaskList` in Claude Code, `update_plan` in Codex)
- [ ] Do not reference `TodoWrite` or `TodoRead` — these are legacy Claude Code tools replaced by `TaskCreate`/`TaskUpdate`/`TaskList`
- [ ] When a skill dispatches sub-agents, prefer parallel execution but include a sequential fallback for platforms that do not support parallel dispatch
### Script Path References in Skills
- [ ] In bash code blocks, reference co-located scripts using relative paths (e.g., `bash scripts/my-script ARG`) — not `${CLAUDE_PLUGIN_ROOT}` or other platform-specific variables
- [ ] All platforms resolve script paths relative to the skill's directory; no env var prefix is needed
- [ ] Always also include a markdown link to the script (e.g., `[scripts/my-script](scripts/my-script)`) so the agent can locate and read it
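The relative-path rule can be sketched end-to-end; the script below is a hypothetical stand-in for a skill's co-located helper:

```shell
# Create a stand-in for a co-located skill script (hypothetical name)
mkdir -p scripts
printf '#!/bin/sh\necho "checked: $1"\n' > scripts/my-script
chmod +x scripts/my-script

# Reference it by relative path, with no ${CLAUDE_PLUGIN_ROOT} or other
# platform-specific prefix; platforms resolve this from the skill directory
bash scripts/my-script docs/   # → checked: docs/
```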
### Cross-Platform Reference Rules
This plugin is authored once, then converted for other agent platforms. Commands and agents are transformed during that conversion, but `plugin.skills` are usually copied almost exactly as written.

View File

@@ -94,16 +94,15 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou
| `/changelog` | Create engaging changelogs for recent merges |
| `/create-agent-skill` | Create or edit Claude Code skills |
| `/generate_command` | Generate new slash commands |
| `/sync` | Sync Claude Code config across machines |
| `/report-bug-ce` | Report a bug in the compound-engineering plugin |
| `/reproduce-bug` | Reproduce bugs using logs and console |
| `/resolve_parallel` | Resolve TODO comments in parallel |
| `/resolve_pr_parallel` | Resolve PR comments in parallel |
| `/resolve-todo-parallel` | Resolve todos in parallel |
| `/triage` | Triage and prioritize issues |
| `/test-browser` | Run browser tests on PR-affected pages |
| `/test-xcode` | Build and test iOS apps on simulator |
| `/feature-video` | Record video walkthroughs and add to PR description |
## Skills

View File

@@ -1,143 +0,0 @@
---
name: heal-skill
description: Fix incorrect SKILL.md files when a skill has wrong instructions or outdated API references
argument-hint: "[optional: specific issue to fix]"
allowed-tools: [Read, Edit, Bash(ls:*), Bash(git:*)]
disable-model-invocation: true
---
<objective>
Update a skill's SKILL.md and related files based on corrections discovered during execution.
Analyze the conversation to detect which skill is running, reflect on what went wrong, propose specific fixes, get user approval, then apply changes with optional commit.
</objective>
<context>
Skill detection: !`ls -1 ./skills/*/SKILL.md | head -5`
</context>
<quick_start>
<workflow>
1. **Detect skill** from conversation context (invocation messages, recent SKILL.md references)
2. **Reflect** on what went wrong and how you discovered the fix
3. **Present** proposed changes with before/after diffs
4. **Get approval** before making any edits
5. **Apply** changes and optionally commit
</workflow>
</quick_start>
<process>
<step_1 name="detect_skill">
Identify the skill from conversation context:
- Look for skill invocation messages
- Check which SKILL.md was recently referenced
- Examine current task context
Set: `SKILL_NAME=[skill-name]` and `SKILL_DIR=./skills/$SKILL_NAME`
If unclear, ask the user.
</step_1>
<step_2 name="reflection_and_analysis">
Focus on $ARGUMENTS if provided, otherwise analyze broader context.
Determine:
- **What was wrong**: Quote specific sections from SKILL.md that are incorrect
- **Discovery method**: Context7, error messages, trial and error, documentation lookup
- **Root cause**: Outdated API, incorrect parameters, wrong endpoint, missing context
- **Scope of impact**: Single section or multiple? Related files affected?
- **Proposed fix**: Which files, which sections, before/after for each
</step_2>
<step_3 name="scan_affected_files">
```bash
ls -la $SKILL_DIR/
ls -la $SKILL_DIR/references/ 2>/dev/null
ls -la $SKILL_DIR/scripts/ 2>/dev/null
```
</step_3>
<step_4 name="present_proposed_changes">
Present changes in this format:
```
**Skill being healed:** [skill-name]
**Issue discovered:** [1-2 sentence summary]
**Root cause:** [brief explanation]
**Files to be modified:**
- [ ] SKILL.md
- [ ] references/[file].md
- [ ] scripts/[file].py
**Proposed changes:**
### Change 1: SKILL.md - [Section name]
**Location:** Line [X] in SKILL.md
**Current (incorrect):**
```
[exact text from current file]
```
**Corrected:**
```
[new text]
```
**Reason:** [why this fixes the issue]
[repeat for each change across all files]
**Impact assessment:**
- Affects: [authentication/API endpoints/parameters/examples/etc.]
**Verification:**
These changes will prevent: [specific error that prompted this]
```
</step_4>
<step_5 name="request_approval">
```
Should I apply these changes?
1. Yes, apply and commit all changes
2. Apply but don't commit (let me review first)
3. Revise the changes (I'll provide feedback)
4. Cancel (don't make changes)
Choose (1-4):
```
**Wait for user response. Do not proceed without approval.**
</step_5>
<step_6 name="apply_changes">
Only after approval (option 1 or 2):
1. Use Edit tool for each correction across all files
2. Read back modified sections to verify
3. If option 1, commit with structured message showing what was healed
4. Confirm completion with file list
</step_6>
</process>
<success_criteria>
- Skill correctly detected from conversation context
- All incorrect sections identified with before/after
- User approved changes before application
- All edits applied across SKILL.md and related files
- Changes verified by reading back
- Commit created if user chose option 1
- Completion confirmed with file list
</success_criteria>
<verification>
Before completing:
- Read back each modified section to confirm changes applied
- Ensure cross-file consistency (SKILL.md examples match references/)
- Verify git commit created if option 1 was selected
- Check no unintended files were modified
</verification>

View File

@@ -1,17 +1,17 @@
---
name: report-bug-ce
description: Report a bug in the compound-engineering plugin
argument-hint: "[optional: brief description of the bug]"
disable-model-invocation: true
---
# Report a Compound Engineering Plugin Bug
Report bugs encountered while using the compound-engineering plugin. This skill gathers structured information and creates a GitHub issue for the maintainer.
## Step 1: Gather Bug Information
Ask the user the following questions (using the platform's blocking question tool — e.g., `AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini — or present numbered options and wait for a reply):
**Question 1: Bug Category**
- What type of issue are you experiencing?
@@ -39,18 +39,25 @@ Use the AskUserQuestion tool to collect the following information:
## Step 2: Collect Environment Information
Automatically gather environment details. Detect the coding agent platform and collect what is available:
**OS info (all platforms):**
```bash
uname -a
```
**Plugin version:** Read the plugin manifest or installed plugin metadata. Common locations:
- Claude Code: `~/.claude/plugins/installed_plugins.json`
- Codex: `.codex/plugins/` or project config
- Other platforms: check the platform's plugin registry
**Agent CLI version:** Run the platform's version command:
- Claude Code: `claude --version`
- Codex: `codex --version`
- Other platforms: use the appropriate CLI version flag
If any of these fail, note "unknown" and continue — do not block the report.
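The degrade-gracefully collection above can be sketched as follows (the tool names are examples; any missing command yields "unknown" instead of blocking the report):

```shell
# Best-effort environment collection: never fail the report
collect() { "$@" 2>/dev/null || echo "unknown"; }

OS_INFO=$(collect uname -a)
AGENT_VERSION=$(collect claude --version)   # try the Claude Code CLI first
if [ "$AGENT_VERSION" = "unknown" ]; then
  AGENT_VERSION=$(collect codex --version)  # then fall back to other agent CLIs
fi

echo "OS: $OS_INFO"
echo "Agent: $AGENT_VERSION"
```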
## Step 3: Format the Bug Report
Create a well-structured bug report with:
@@ -63,8 +70,9 @@ Create a well-structured bug report with:
## Environment
- **Plugin Version:** [from plugin manifest/registry]
- **Agent Platform:** [e.g., Claude Code, Codex, Copilot, Pi, Kilo]
- **Agent Version:** [from CLI version command]
- **OS:** [from uname]
## What Happened
@@ -83,16 +91,14 @@ Create a well-structured bug report with:
## Error Messages
[Any error output]
## Additional Context
[Any other relevant information]
---
*Reported via `/report-bug-ce` skill*
```
## Step 4: Create GitHub Issue
@@ -125,7 +131,7 @@ After the issue is created:
## Output Format
```
Bug report submitted successfully!
Issue: https://github.com/EveryInc/compound-engineering-plugin/issues/[NUMBER]
Title: [compound-engineering] Bug: [description]
@@ -136,16 +142,16 @@ The maintainer will review your report and respond as soon as possible.
## Error Handling
- If `gh` CLI is not installed or not authenticated: prompt the user to install/authenticate first
- If issue creation fails: display the formatted report so the user can manually create the issue
- If required information is missing: re-prompt for that specific field
## Privacy Notice
This skill does NOT collect:
- Personal information
- API keys or credentials
- Private code from projects
- File paths beyond basic OS info
Only technical information about the bug is included in the report.

View File

@@ -12,7 +12,7 @@ Resolve all unresolved PR review comments by spawning parallel agents for each t
## Context Detection
Detect git context from the current working directory:
- Current branch and associated PR
- All PR comments and review threads
- Works with any PR by specifying the number
@@ -21,10 +21,10 @@ Claude Code automatically detects git context:
### 1. Analyze
Fetch unresolved review threads using the GraphQL script at [scripts/get-pr-comments](scripts/get-pr-comments):
```bash
bash scripts/get-pr-comments PR_NUMBER
```
This returns only **unresolved, non-outdated** threads with file paths, line numbers, and comment bodies.
@@ -37,7 +37,7 @@ gh api repos/{owner}/{repo}/pulls/PR_NUMBER/comments
### 2. Plan
Create a task list of all unresolved items grouped by type (e.g., `TaskCreate` in Claude Code, `update_plan` in Codex):
- Code changes requested
- Questions to answer
- Style/convention fixes
@@ -45,23 +45,17 @@ Create a TodoWrite list of all unresolved items grouped by type:
### 3. Implement (PARALLEL)
Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each unresolved item.
If there are 3 comments, spawn 3 agents — one per comment. Prefer running all agents in parallel; if the platform does not support parallel dispatch, run them sequentially.
### 4. Commit & Resolve
- Commit changes with a clear message referencing the PR feedback
- Resolve each thread programmatically using [scripts/resolve-pr-thread](scripts/resolve-pr-thread):
```bash
bash scripts/resolve-pr-thread THREAD_ID
```
- Push to remote
@@ -71,7 +65,7 @@ bash ${CLAUDE_PLUGIN_ROOT}/skills/resolve-pr-parallel/scripts/resolve-pr-thread
Re-fetch comments to confirm all threads are resolved:
```bash
bash scripts/get-pr-comments PR_NUMBER
```
Should return an empty array `[]`. If threads remain, repeat from step 1.
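The verification step above can be sketched as a simple check; `fetch_threads` here is a hypothetical stand-in for the `scripts/get-pr-comments` call, which is assumed to print a JSON array:

```shell
# Stand-in for: bash scripts/get-pr-comments PR_NUMBER
fetch_threads() { echo "[]"; }

# "[]" means every review thread is resolved; anything else means loop again
if [ "$(fetch_threads)" = "[]" ]; then
  echo "All review threads resolved"
else
  echo "Unresolved threads remain; repeat from step 1"
fi
```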

View File

@@ -16,19 +16,13 @@ If any todo recommends deleting, removing, or gitignoring files in `docs/brainst
### 2. Plan
Create a task list of all unresolved items grouped by type (e.g., `TaskCreate` in Claude Code, `update_plan` in Codex). Analyze dependencies and prioritize items that others depend on. For example, if a rename is needed, it must complete before dependent items. Output a mermaid flow diagram showing execution order — what can run in parallel, and what must run first.
### 3. Implement (PARALLEL)
Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each unresolved item.
If there are 3 items, spawn 3 agents — one per item. Prefer running all agents in parallel; if the platform does not support parallel dispatch, run them sequentially respecting the dependency order from step 2.
### 4. Commit & Resolve
@@ -40,7 +34,7 @@ GATE: STOP. Verify that todos have been resolved and changes committed. Do NOT p
### 5. Compound on Lessons Learned
Load the `ce:compound` skill to document what was learned from resolving the todos.
The todo resolutions often surface patterns, recurring issues, or architectural insights worth capturing. This step ensures that knowledge compounds rather than being lost.

View File

@@ -8,26 +8,20 @@ disable-model-invocation: true
## Interaction Method
Ask the user each question below using the platform's blocking question tool (e.g., `AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). If no structured question tool is available, present each question as a numbered list and wait for a reply before proceeding. For multiSelect questions, accept comma-separated numbers (e.g. `1, 3`). Never skip or auto-configure.
Interactive setup for `compound-engineering.local.md` — configures which agents run during `ce:review` and `ce:work`.
## Step 1: Check Existing Config
Read `compound-engineering.local.md` in the project root. If it exists, display current settings and ask:
```
Settings file already exists. What would you like to do?
1. Reconfigure - Run the interactive setup again from scratch
2. View current - Show the file contents, then stop
3. Cancel - Keep current settings
```
If "View current": read and display the file, then stop.
@@ -47,16 +41,13 @@ test -f requirements.txt && echo "python" || \
echo "general"
```
Ask:
```
Detected {type} project. How would you like to configure?
1. Auto-configure (Recommended) - Use smart defaults for {type}. Done in one click.
2. Customize - Choose stack, focus areas, and review depth.
```
### If Auto-configure → Skip to Step 4 with defaults:
@@ -73,50 +64,35 @@ options:
**a. Stack** — confirm or override:
```
Which stack should we optimize for?
1. {detected_type} (Recommended) - Auto-detected from project files
2. Rails - Ruby on Rails, adds DHH-style and Rails-specific reviewers
3. Python - Adds Pythonic pattern reviewer
4. TypeScript - Adds type safety reviewer
```
Only show options that differ from the detected type.
**b. Focus areas** — multiSelect (user picks one or more):
```
Which review areas matter most? (comma-separated, e.g. 1, 3)
1. Security - Vulnerability scanning, auth, input validation (security-sentinel)
2. Performance - N+1 queries, memory leaks, complexity (performance-oracle)
3. Architecture - Design patterns, SOLID, separation of concerns (architecture-strategist)
4. Code simplicity - Over-engineering, YAGNI violations (code-simplicity-reviewer)
```
**c. Depth:**
```
How thorough should reviews be?
1. Thorough (Recommended) - Stack reviewers + all selected focus agents.
2. Fast - Stack reviewers + code simplicity only. Less context, quicker.
3. Comprehensive - All above + git history, data integrity, agent-native checks.
```
## Step 4: Build Agent List and Write File
@@ -151,7 +127,7 @@ plan_review_agents: [{computed plan agent list}]
# Review Context
Add project-specific review instructions here.
These notes are passed to all review agents during ce:review and ce:work.
Examples:
- "We use Turbo Frames heavily — check for frame-busting issues"

View File

@@ -4,56 +4,45 @@ description: Run browser tests on pages affected by current PR or branch
argument-hint: "[PR number, branch name, 'current', or --port PORT]"
---
# Browser Test Skill
Run end-to-end browser tests on pages affected by a PR or branch changes using the `agent-browser` CLI.
## Use `agent-browser` Only For Browser Automation
This workflow uses the `agent-browser` CLI exclusively. Do not use any alternative browser automation system, browser MCP integration, or built-in browser-control tool. If the platform offers multiple ways to control a browser, always choose `agent-browser`.
Use `agent-browser` for: opening pages, clicking elements, filling forms, taking screenshots, and scraping rendered content.
Platform-specific hints:
- In Claude Code, do not use Chrome MCP tools (`mcp__claude-in-chrome__*`).
- In Codex, do not substitute unrelated browsing tools.
## Prerequisites
- Local development server running (e.g., `bin/dev`, `rails server`, `npm run dev`)
- `agent-browser` CLI installed (see Setup below)
- Git repository with changes to test
## Setup
**Check installation:**
```bash
command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED"
```
Install if needed:
```bash
npm install -g agent-browser
agent-browser install
```
See the `agent-browser` skill for detailed usage.
## Workflow
### 1. Verify Installation
Before starting, verify `agent-browser` is available:
```bash
command -v agent-browser >/dev/null 2>&1 && echo "Ready" || (echo "Installing..." && npm install -g agent-browser && agent-browser install)
@@ -61,27 +50,20 @@ command -v agent-browser >/dev/null 2>&1 && echo "Ready" || (echo "Installing...
If installation fails, inform the user and stop.
### 2. Ask Browser Mode
Ask the user whether to run headed or headless (using the platform's question tool — e.g., `AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini — or present options and wait for a reply):
```
Do you want to watch the browser tests run?
1. Headed (watch) - Opens visible browser window so you can see tests run
2. Headless (faster) - Runs in background, faster but invisible
```
Store the choice and use the `--headed` flag when the user selects option 1.
### 3. Determine Test Scope
**If PR number provided:**
```bash
@@ -98,11 +80,7 @@ git diff --name-only main...HEAD
git diff --name-only main...[branch]
```
### 4. Map Files to Routes
Map changed files to testable routes:
@@ -120,43 +98,17 @@ Map changed files to testable routes:
Build a list of URLs to test based on the mapping.
### 5. Detect Dev Server Port
Determine the dev server port using this priority:
1. **Explicit argument** — if the user passed `--port 5000`, use that directly
2. **Project instructions** — check `AGENTS.md`, `CLAUDE.md`, or other instruction files for port references
3. **package.json** — check dev/start scripts for `--port` flags
4. **Environment files** — check `.env`, `.env.local`, `.env.development` for `PORT=`
5. **Default** — fall back to `3000`
```bash
PORT="${EXPLICIT_PORT:-}"
if [ -z "$PORT" ]; then
PORT=$(grep -Eio '(port\s*[:=]\s*|localhost:)([0-9]{4,5})' AGENTS.md 2>/dev/null | grep -Eo '[0-9]{4,5}' | head -1)
@@ -174,77 +126,64 @@ PORT="${PORT:-3000}"
echo "Using dev server port: $PORT" echo "Using dev server port: $PORT"
``` ```
### 6. Verify Server is Running

Before testing, verify the local server is accessible using the detected port:

```bash
agent-browser open http://localhost:${PORT}
agent-browser snapshot -i
```
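If `agent-browser` is unavailable in the environment, a plain `curl` probe is a reasonable stand-in for this check (the `check_server` helper is illustrative, not part of the skill):

```shell
# Fallback sketch: probe the dev server with curl instead of agent-browser
check_server() {
  local port="$1"
  # -s silent, -f fail on HTTP errors, bounded wait so a dead port returns fast
  if curl -sf -o /dev/null --max-time 5 "http://localhost:${port}/"; then
    echo "up"
  else
    echo "down"
  fi
}

check_server "${PORT:-3000}"
```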
If the server is not running, inform the user:

```
Server not running on port ${PORT}

Please start your development server:
- Rails: `bin/dev` or `rails server`
- Node/Next.js: `npm run dev`
- Custom port: run this skill again with `--port <your-port>`

Then re-run this skill.
```
### 7. Test Each Affected Page

For each affected route:

**Navigate and capture snapshot:**

```bash
agent-browser open "http://localhost:${PORT}/[route]"
agent-browser snapshot -i
```
**For headed mode:**

```bash
agent-browser --headed open "http://localhost:${PORT}/[route]"
agent-browser --headed snapshot -i
```
**Verify key elements:**
- Use `agent-browser snapshot -i` to get interactive elements with refs
- Page title/heading present
- Primary content rendered
- No error messages visible
- Forms have expected fields
**Test critical interactions:**

```bash
agent-browser click @e1   # Use a ref from the snapshot
agent-browser snapshot -i
```
**Take screenshots:**

```bash
agent-browser screenshot page-name.png
agent-browser screenshot --full page-name-full.png   # Full page
```
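When looping over several routes, a screenshot filename can be derived from each route path; a small sketch (the route list is illustrative):

```shell
# Sketch: derive one screenshot filename per route
routes="/ /login /settings/profile"
for route in $routes; do
  name="page$(printf '%s' "$route" | tr '/' '-')"
  [ "$name" = "page-" ] && name="page-home"   # the root route gets a readable name
  echo "${name}.png"
done
```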
### 8. Human Verification (When Required)

Pause for human input when testing touches flows that require external interaction:

| Flow Type | What to Ask |
|-----------|-------------|
| SMS | "Verify you received the SMS code" |
| External APIs | "Confirm the [service] integration is working" |

Ask the user (using the platform's question tool, or present numbered options and wait):
```
Human Verification Needed

This test touches [flow type]. Please:
1. [Action to take]
2. [What to verify]

Did it work correctly?
1. Yes - continue testing
2. No - describe the issue
```
### 9. Handle Failures

When a test fails:
1. **Document the failure:**
   - Screenshot the error state: `agent-browser screenshot error.png`
   - Note the exact reproduction steps
2. **Ask the user how to proceed:**

```
Test Failed: [route]

Issue: [description]
Console errors: [if any]

How to proceed?
1. Fix now
2. Create todo
3. Skip - Continue testing other pages
```
3. **If "Fix now":** investigate, propose a fix, apply it, re-run the failing test
4. **If "Create todo":** create `{id}-pending-p1-browser-test-{description}.md`, continue testing
5. **If "Skip":** log as skipped, continue testing

### 10. Test Summary

After all tests complete, present a summary:
```markdown
## Browser Test Results

### Result: [PASS / FAIL / PARTIAL]
```
## Quick Usage Examples

## agent-browser CLI Reference

```bash
# Navigation
agent-browser open <url>          # Navigate to URL
```

---
argument-hint: "[scheme name or 'current' to use default]"
disable-model-invocation: true
---

# Xcode Test Skill

Build, install, and test iOS apps on the simulator using XcodeBuildMCP. Captures screenshots, logs, and verifies app behavior.
## Prerequisites

- Xcode installed with command-line tools
- XcodeBuildMCP MCP server connected
- Valid Xcode project or workspace
- At least one iOS Simulator available
## Workflow

### 0. Verify XcodeBuildMCP is Available

Check that the XcodeBuildMCP MCP server is connected by calling its `list_simulators` tool. MCP tool names vary by platform:

- Claude Code: `mcp__xcodebuildmcp__list_simulators`
- Other platforms: use the equivalent MCP tool call for the `XcodeBuildMCP` server's `list_simulators` method

If the tool is not found or errors, inform the user they need to add the XcodeBuildMCP MCP server:

```
XcodeBuildMCP not installed

Install via Homebrew:
brew tap getsentry/xcodebuildmcp && brew install xcodebuildmcp

Or via npx (no global install needed):
npx -y xcodebuildmcp@latest mcp

Then add "XcodeBuildMCP" as an MCP server in your agent configuration
and restart your agent.
```

Do NOT proceed until XcodeBuildMCP is confirmed working.
### 1. Discover Project and Scheme

Call XcodeBuildMCP's `discover_projs` tool to find available projects, then `list_schemes` with the project path to get available schemes.

If an argument was provided, use that scheme name. If "current", use the default/last-used scheme.
### 2. Boot Simulator

Call `list_simulators` to find available simulators. Boot the preferred simulator (iPhone 15 Pro recommended) using `boot_simulator` with the simulator's UUID.

Wait for the simulator to be ready before proceeding.
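In Claude Code, for example, booting looks like this (the `mcp__xcodebuildmcp__` prefix is Claude Code-specific, and `[uuid]` is a placeholder for the simulator's UUID):

```
mcp__xcodebuildmcp__list_simulators({})
mcp__xcodebuildmcp__boot_simulator({ simulator_id: "[uuid]" })
```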
### 3. Build the App

Call `build_ios_sim_app` with the project path and scheme name.

**On failure:**
- Capture build errors
- Create a P1 todo for each build error
- Report to the user with specific error details

**On success:**
- Note the built app path for installation
- Proceed to step 4
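As a Claude Code-flavored illustration (other agents expose the same XcodeBuildMCP tool under a different prefix; the path and scheme are placeholders):

```
mcp__xcodebuildmcp__build_ios_sim_app({
  project_path: "/path/to/Project.xcodeproj",
  scheme: "[scheme_name]"
})
```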
### 4. Install and Launch

1. Call `install_app_on_simulator` with the built app path and simulator UUID
2. Call `launch_app_on_simulator` with the bundle ID and simulator UUID
3. Call `capture_sim_logs` with the simulator UUID and bundle ID to start log capture
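In Claude Code, for instance, this three-call sequence looks like the following (the `mcp__xcodebuildmcp__` prefix is platform-specific; paths, bundle ID, and UUID are placeholders):

```
mcp__xcodebuildmcp__install_app_on_simulator({
  app_path: "/path/to/built/App.app",
  simulator_id: "[uuid]"
})
mcp__xcodebuildmcp__launch_app_on_simulator({
  bundle_id: "[app.bundle.id]",
  simulator_id: "[uuid]"
})
mcp__xcodebuildmcp__capture_sim_logs({
  simulator_id: "[uuid]",
  bundle_id: "[app.bundle.id]"
})
```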
### 5. Test Key Screens

For each key screen in the app:

**Take screenshot:**
Call `take_screenshot` with the simulator UUID and a descriptive filename (e.g., `screen-home.png`).

**Review screenshot for:**
- UI elements rendered correctly
- Layout looks correct
**Check logs for errors:**
Call `get_sim_logs` with the simulator UUID. Look for:
- Crashes
- Exceptions
- Error-level log messages
- Failed network requests
### 6. Human Verification (When Required)

Pause for human input when testing touches flows that require device interaction.

| Flow Type | What to Ask |
|-----------|-------------|
| Camera/Photos | "Grant permissions and verify camera works" |
| Location | "Allow location access and verify map updates" |

Ask the user (using the platform's question tool — e.g., `AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini — or present numbered options and wait):
```
Human Verification Needed

This test requires [flow type]. Please:
1. [Action to take on simulator]
2. [What to verify]

Did it work correctly?
1. Yes - continue testing
2. No - describe the issue
```
### 7. Handle Failures

When a test fails:

1. **Document the failure:**
   - Capture console logs
   - Note reproduction steps
2. **Ask the user how to proceed:**

```
Test Failed: [screen/feature]

Issue: [description]
Logs: [relevant error messages]

How to proceed?
1. Fix now
2. Create todo
3. Skip - Continue testing other screens
```
3. **If "Fix now":** investigate, propose a fix, rebuild and retest
4. **If "Create todo":** create `{id}-pending-p1-xcode-{description}.md`, continue testing
5. **If "Skip":** log as skipped, continue testing

### 8. Test Summary

After all tests complete, present a summary:
```markdown
## Xcode Test Results

**Project:** [project name]
**Scheme:** [scheme name]
**Simulator:** [simulator name]

### Build: Success / Failed

### Screens Tested: [count]

| Screen | Status | Notes |
|--------|--------|-------|
| Launch | Pass | |
| Home | Pass | |
| Settings | Fail | Crash on tap |
| Profile | Skip | Requires login |

### Console Errors: [count]
- [List any errors found]

### Human Verifications: [count]
- Sign in with Apple: Confirmed
- Push notifications: Confirmed

### Failures: [count]
- Settings screen - crash on navigation

### Result: [PASS / FAIL / PARTIAL]
```
### 9. Cleanup

After testing:

1. Call `stop_log_capture` with the simulator UUID
2. Optionally call `shutdown_simulator` with the simulator UUID
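In Claude Code, for example, cleanup is two calls (prefix is platform-specific; `[uuid]` is a placeholder):

```
mcp__xcodebuildmcp__stop_log_capture({ simulator_id: "[uuid]" })
mcp__xcodebuildmcp__shutdown_simulator({ simulator_id: "[uuid]" })
```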
## Quick Usage Examples

```bash
# Test with default scheme
/test-xcode

# Test specific scheme
/test-xcode MyApp-Debug

# Test after making changes
/test-xcode current
```
## Integration with ce:review

When reviewing PRs that touch iOS code, the `ce:review` workflow can spawn an agent to run this skill, build on the simulator, test key screens, and check for crashes.