feat(ce-demo-reel): add demo reel skill with Python capture pipeline (#541)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author: Trevin Chow
Date: 2026-04-09 21:29:51 -07:00
Committed by: GitHub
parent f3cc7545e5
commit b979143ad0
18 changed files with 1796 additions and 751 deletions

View File

@@ -25,7 +25,10 @@ bun run release:validate # check plugin/marketplace consistency
- **Release versioning:** Releases are prepared by release automation, not normal feature PRs. The repo now has multiple release components (`cli`, `compound-engineering`, `coding-tutor`, `marketplace`). GitHub release PRs and GitHub Releases are the canonical release-notes surface for new releases; root `CHANGELOG.md` is only a pointer to that history. Use conventional titles such as `feat:` and `fix:` so release automation can classify change intent, but do not hand-bump release-owned versions or hand-author release notes in routine PRs.
- **Linked versions (cli + compound-engineering):** The `linked-versions` release-please plugin keeps `cli` and `compound-engineering` at the same version. This is intentional -- it simplifies version tracking across the CLI and the plugin it ships. A consequence is that a release with only plugin changes will still bump the CLI version (and vice versa). The CLI changelog may also include commits that `exclude-paths` would normally filter, because `linked-versions` overrides exclusion logic when forcing a synced bump. This is a known upstream release-please limitation, not a misconfiguration. Do not flag linked-version bumps as unnecessary.
- **Output Paths:** Keep OpenCode output at `opencode.json` and `.opencode/{agents,skills,plugins}`. For OpenCode, commands go to `~/.config/opencode/commands/<name>.md`; `opencode.json` is deep-merged (never overwritten wholesale).
- **Scratch Space:** Two options depending on what the files are for:
- **Workflow state** (`.context/`): Files that other skills or agents in the same session may need to read — plans in progress, gate files, inter-skill handoff artifacts. Namespace under `.context/compound-engineering/<workflow-or-skill-name>/`, add a per-run subdirectory when concurrent runs are plausible, and clean up after successful completion unless the user asked to inspect them or another agent still needs them.
- **Throwaway artifacts** (`mktemp -d`): Files consumed once and discarded — captured screenshots, stitched GIFs, intermediate build outputs, recordings. Use OS temp (`mktemp -d -t <prefix>-XXXXXX`) so they live outside the repo tree entirely. No `.gitignore` needed, no risk of accidental commits, OS handles cleanup.
- **Rule of thumb:** If another skill might read it, `.context/`. If it gets uploaded/consumed and thrown away, OS temp. Durable outputs like plans, specs, learnings, and docs belong in neither — they go in `docs/`.
- **Character encoding:**
- **Identifiers** (file names, agent names, command names): ASCII only -- converters and regex patterns depend on it.
- **Markdown tables:** Use pipe-delimited (`| col | col |`), never box-drawing characters.

View File

@@ -0,0 +1,123 @@
---
title: "Prefer Python over bash for multi-step pipeline scripts"
date: 2026-04-09
category: best-practices
module: "skill scripting / ce-demo-reel"
problem_type: best_practice
component: tooling
severity: medium
applies_when:
- Script orchestrates 2+ external CLI tools (ffmpeg, curl, silicon, vhs)
- Script needs retry logic or graceful degradation on tool failure
- Script will run on macOS where bash 3.2 is the default
- Script needs to be tested from a non-shell test runner (Bun, Jest, pytest)
- Script has conditional failure paths where some errors should be caught and others should abort
tags:
- bash-vs-python
- pipeline-scripts
- skill-scripting
- set-e-footguns
- error-handling
- ce-demo-reel
---
# Prefer Python over bash for multi-step pipeline scripts
## Context
When building the `ce-demo-reel` skill, the initial implementation used a bash script (`capture-evidence.sh`) to orchestrate ffmpeg stitching, frame normalization, and catbox.moe upload. Over 4 review rounds, the script hit 4 distinct bug classes that are inherent to bash's execution model rather than simple coding mistakes.
## Guidance
Use Python for agent pipeline scripts that chain multiple CLI tools with error handling. Bash `set -euo pipefail` works for simple sequential scripts but becomes a footgun when you need controlled failure paths.
**Python subprocess model (explicit error handling):**
```python
result = subprocess.run(
["curl", "-s", "-F", f"fileToUpload=@{file_path}", url],
capture_output=True, text=True, timeout=30, check=False
)
if result.returncode != 0:
# Retry logic runs normally
attempts += 1
continue
```
**Python timeout handling (explicit catch):**
```python
try:
result = subprocess.run(cmd, timeout=60)
except subprocess.TimeoutExpired:
# Controlled failure, not a crash
return subprocess.CompletedProcess(cmd, returncode=1, stdout="", stderr="Timed out")
```
**Bash equivalent (the footgun):**
```bash
set -euo pipefail
# Exits the entire script before retry logic runs
url=$(curl -s -F "fileToUpload=@${file}" "$endpoint")
# Never reaches here on curl failure
# Workaround: || true on every line that might fail
url=$(curl -s -F "fileToUpload=@${file}" "$endpoint") || true
# Works but fragile and easy to forget
```
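Putting the subprocess pattern together, a minimal retry wrapper might look like this (a sketch with a hypothetical `run_with_retry` helper, not code from the skill):

```python
import subprocess

def run_with_retry(cmd, max_attempts=3, timeout=30):
    """Run cmd, retrying on non-zero exit or timeout.

    Returns the CompletedProcess on success, None after exhausting retries.
    """
    for _ in range(max_attempts):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True,
                timeout=timeout, check=False,
            )
        except subprocess.TimeoutExpired:
            continue  # controlled failure: move on to the next attempt
        if result.returncode == 0:
            return result
        # non-zero exit: fall through to the next attempt
    return None
```

Every failure path is an ordinary value or exception the script can branch on; nothing aborts the interpreter the way `set -e` aborts the shell.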
## Why This Matters
Agent pipeline scripts run in environments the skill author does not control: different macOS versions (bash 3.2 vs 5.x), CI containers, worktrees. Each bash portability issue requires a non-obvious workaround that reviewers must catch. Python's subprocess model makes error handling explicit and testable rather than implicit and version-dependent.
The 4 bugs found were not unusual. They are the predictable consequence of using bash for scripts that exceed its sweet spot.
## When to Apply
Use Python when:
- The script orchestrates 2+ external CLI tools
- The script needs retry logic or graceful degradation on tool failure
- The script will run on macOS where bash 3.2 is the default
- The script needs to be tested from a non-shell test runner
- The script has more than ~3 subcommands
Bash is still the right choice for:
- Simple sequential scripts with no error recovery (`set -e` is fine)
- One-liner wrappers around a single tool
- Scripts using only POSIX features with no array manipulation
- Git hooks and CI steps where the only failure mode is "abort the pipeline"
## Examples
**Before (bash, 4 bugs across 4 review rounds):**
| Bug | Cause | Workaround needed |
|---|---|---|
| `url=$(curl ...)` exits on network failure | `set -e` + command substitution | `\|\| true` on every line |
| `${array[-1]}` fails | Bash 3.2 lacks negative indexing | `${array[${#array[@]}-1]}` |
| Frame reduction keeps all frames for n=3,4 | Integer math: `step=(n-1)/2` with min 1 | Minimum step of 2 |
| `command -v ffmpeg` in Bun tests | `command` is a shell builtin, not spawnable | Use `which` instead |
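For reference, the bash 3.2 last-element workaround from the table looks like this (a minimal sketch):

```bash
# Bash 3.2 has no ${frames[-1]}; compute the last index explicitly
frames=(frame-01.png frame-02.png frame-03.png)
last="${frames[${#frames[@]}-1]}"
echo "$last"
```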
**After (Python, all 4 bug classes eliminated):**
```python
# Negative indexing just works
last = frames[-1]
# Timeout handling is explicit
try:
result = subprocess.run(cmd, timeout=30)
except subprocess.TimeoutExpired:
return None
# Tool detection is a regular function
if not shutil.which("ffmpeg"):
sys.exit("ffmpeg not found")
# Math is straightforward
step = max(2, (len(frames) - 1) // 2)
```
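The frame-reduction row deserves a closer look; a simplified illustration of the integer-math bug (not the actual script logic):

```python
frames = ["f1.png", "f2.png", "f3.png"]

# Buggy: minimum step of 1 -- frames[::1] keeps every frame, so n=3,4 never shrink
buggy_step = max(1, (len(frames) - 1) // 2)
# Fixed: minimum step of 2 guarantees reduction for any frame count above 2
fixed_step = max(2, (len(frames) - 1) // 2)

assert frames[::buggy_step] == frames           # no reduction happened
assert len(frames[::fixed_step]) < len(frames)  # reduction happened
```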
## Related
- `docs/solutions/skill-design/script-first-skill-architecture.md`: covers when to use scripts vs agent logic (complementary: that doc answers "should a script do this?", this doc answers "which language?")
- `docs/solutions/agent-friendly-cli-principles.md`: CLI design from the consumer side (overlaps on exit code and stderr patterns)

View File

@@ -1,147 +0,0 @@
---
title: "Persistent GitHub authentication for agent-browser using named sessions"
category: integrations
date: 2026-03-22
tags:
- agent-browser
- github
- authentication
- chrome
- session-persistence
- lightpanda
related_to:
- plugins/compound-engineering/skills/feature-video/SKILL.md
- plugins/compound-engineering/skills/agent-browser/SKILL.md
- plugins/compound-engineering/skills/agent-browser/references/authentication.md
- plugins/compound-engineering/skills/agent-browser/references/session-management.md
---
# agent-browser Chrome Authentication for GitHub
## Problem
agent-browser needs authenticated access to GitHub for workflows like the native video
upload in the feature-video skill. Multiple authentication approaches were evaluated
before finding one that works reliably with 2FA, SSO, and OAuth.
## Investigation
| Approach | Result |
|---|---|
| `--profile` flag | Lightpanda (default engine on some installs) throws "Profiles are not supported with Lightpanda". Must use `--engine chrome`. |
| Fresh Chrome profile | No GitHub cookies. Shows "Sign up for free" instead of comment form. |
| `--auto-connect` | Requires Chrome pre-launched with `--remote-debugging-port`. Error: "No running Chrome instance found" in normal use. Impractical. |
| Auth vault (`auth save`/`auth login`) | Cannot handle 2FA, SSO, or OAuth redirects. Only works for simple username/password forms. |
| `--session-name` with Chrome engine | Cookies auto-save/restore. One-time headed login handles any auth method. **This works.** |
## Working Solution
### One-time setup (headed, user logs in manually)
```bash
# Close any running daemon (ignores engine/option changes when reused)
agent-browser close
# Open GitHub login in headed Chrome with a named session
agent-browser --engine chrome --headed --session-name github open https://github.com/login
# User logs in manually -- handles 2FA, SSO, OAuth, any method
# Verify auth
agent-browser open https://github.com/settings/profile
# If profile page loads, auth is confirmed
```
### Session validity check (before each workflow)
```bash
agent-browser close
agent-browser --engine chrome --session-name github open https://github.com/settings/profile
agent-browser get title
# Title contains username or "Profile" -> session valid, proceed
# Title contains "Sign in" or URL is github.com/login -> session expired, re-auth
```
### All subsequent runs (headless, cookies persist)
```bash
agent-browser --engine chrome --session-name github open https://github.com/...
```
## Key Findings
### Engine requirement
MUST use `--engine chrome`. Lightpanda does not support profiles, session persistence,
or state files. Any workflow that uses `--session-name`, `--profile`, `--state`, or
`state save/load` requires the Chrome engine.
Include `--engine chrome` explicitly in every command that uses an authenticated session.
Do not rely on environment defaults -- `AGENT_BROWSER_ENGINE` may be set to `lightpanda`
in some environments.
### Daemon restart
Must run `agent-browser close` before switching engine or session options. A running
daemon ignores new flags like `--engine`, `--headed`, or `--session-name`.
### Session lifetime
Cookies expire when GitHub invalidates them (typically weeks). Periodic re-authentication
is required. The feature-video skill handles this by checking session validity before
the upload step and prompting for re-auth only when needed.
### Auth vault limitations
The auth vault (`agent-browser auth save`/`auth login`) can only handle login forms with
visible username and password fields. It cannot handle:
- 2FA (TOTP, SMS, push notification)
- SSO with identity provider redirect
- OAuth consent flows
- CAPTCHA
- Device verification prompts
For GitHub and most modern services, use the one-time headed login approach instead.
### `--auto-connect` viability
Impractical for automated workflows. Requires Chrome to be pre-launched with
`--remote-debugging-port=9222`, which is not how users normally run Chrome.
## Prevention
### Skills requiring auth must declare engine
State the engine requirement in the Prerequisites section of any skill that needs
browser auth. Include `--engine chrome` in every `agent-browser` command that touches
an authenticated session.
### Session check timing
Perform the session check immediately before the step that needs auth, not at skill
start. A session valid at start may expire during a long workflow (video encoding can
take minutes).
### Recovery without restart
When expiry is detected at upload time, the video file is already encoded. Recovery:
re-authenticate, then retry only the upload step. Do not restart from the beginning.
### Concurrent sessions
Use `--session-name` with a semantically descriptive name (e.g., `github`) when multiple
skills or agents may run concurrently. Two concurrent runs sharing the default session
will interfere with each other.
### State file security
Session state files in `~/.agent-browser/sessions/` contain cookies in plaintext.
Do not commit to repositories. Add to `.gitignore` if the session directory is inside
a repo tree.
## Integration Points
This pattern is used by:
- `feature-video` skill (GitHub native video upload)
- Any future skill requiring authenticated GitHub browser access
- Potential use for other OAuth-protected services (same pattern, different session name)

View File

@@ -1,141 +0,0 @@
---
title: "GitHub inline video embedding via programmatic browser upload"
category: integrations
date: 2026-03-22
tags:
- github
- video-embedding
- agent-browser
- playwright
- feature-video
- pr-description
related_to:
- plugins/compound-engineering/skills/feature-video/SKILL.md
- plugins/compound-engineering/skills/agent-browser/SKILL.md
- plugins/compound-engineering/skills/agent-browser/references/authentication.md
---
# GitHub Native Video Upload for PRs
## Problem
Embedding video demos in GitHub PR descriptions required external storage (R2/rclone)
or GitHub Release assets. Release asset URLs render as plain download links, not inline
video players. Only `user-attachments/assets/` URLs render with GitHub's native inline
video player -- the same result as pasting a video into the PR editor manually.
The distinction is absolute:
| URL namespace | Rendering |
|---|---|
| `github.com/releases/download/...` | Plain download link (bad UX, triggers download on mobile) |
| `github.com/user-attachments/assets/...` | Native inline `<video>` player with controls |
## Investigation
1. **Public upload API** -- No public API exists. The `/upload/policies/assets` endpoint
requires browser session cookies and is not exposed via REST or GraphQL. GitHub CLI
(`gh`) has no support; issues cli/cli#1895, #4228, and #4465 are all closed as
"not planned". GitHub keeps this private to limit abuse surface (malware hosting,
spam CDN, DMCA liability).
2. **Release asset approach (Strategy B)** -- URLs render as download links, not video
players. Clickable GIF previews trigger downloads on mobile. Unacceptable UX.
3. **Claude-in-Chrome JavaScript injection with base64** -- Blocked by CSP/mixed-content
policy. HTTPS github.com cannot fetch from HTTP localhost. Base64 chunking is possible
but does not scale for larger videos.
4. **`tonkotsuboy/github-upload-image-to-pr`** -- Open-source reference confirming
browser automation is the only working approach for producing native URLs.
5. **agent-browser `upload` command** -- Works. Playwright sets files directly on hidden
file inputs without base64 encoding or fetch requests. CSP is not a factor because
Playwright's `setInputFiles` operates at the browser engine level, not via JavaScript.
## Working Solution
### Upload flow
```bash
# Navigate to PR page (authenticated Chrome session)
agent-browser --engine chrome --session-name github \
open "https://github.com/[owner]/[repo]/pull/[number]"
agent-browser scroll down 5000
# Upload video via the hidden file input
agent-browser upload '#fc-new_comment_field' tmp/videos/feature-demo.mp4
# Wait for GitHub to process the upload (typically 3-5 seconds)
agent-browser wait 5000
# Extract the URL GitHub injected into the textarea
agent-browser eval "document.getElementById('new_comment_field').value"
# Returns: https://github.com/user-attachments/assets/[uuid]
# Clear the textarea without submitting (upload already persisted server-side)
agent-browser eval "const ta = document.getElementById('new_comment_field'); \
ta.value = ''; ta.dispatchEvent(new Event('input', { bubbles: true }))"
# Embed in PR description (URL on its own line renders as inline video player)
gh pr edit [number] --body "[body with video URL on its own line]"
```
### Key selectors (validated March 2026)
| Selector | Element | Purpose |
|---|---|---|
| `#fc-new_comment_field` | Hidden `<input type="file">` | Target for `agent-browser upload`. Accepts `.mp4`, `.mov`, `.webm` and many other types. |
| `#new_comment_field` | `<textarea>` | GitHub injects the `user-attachments/assets/` URL here after processing the upload. |
GitHub's comment form contains the hidden file input. After Playwright sets the file,
GitHub uploads it server-side and injects a markdown URL into the textarea. The upload
is persisted even if the form is never submitted.
## What Was Removed
The following approaches were removed from the feature-video skill:
- R2/rclone setup and configuration
- Release asset upload flow (`gh release upload`)
- GIF preview generation (unnecessary with native inline video player)
- Strategy B fallback logic
Total: approximately 100 lines of SKILL.md content removed. The skill is now simpler
and has zero external storage dependencies.
## Prevention
### URL validation
After any upload step, confirm the extracted URL contains `user-attachments/assets/`
before writing it into the PR description. If the URL does not match, the upload failed
or used the wrong method.
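A minimal guard, sketched with a placeholder URL (the real value comes from the textarea extraction step):

```bash
url="https://github.com/user-attachments/assets/placeholder-uuid"  # placeholder
case "$url" in
  *user-attachments/assets/*)
    echo "native inline URL confirmed" ;;
  *)
    echo "upload failed or produced the wrong URL namespace" >&2
    exit 1 ;;
esac
```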
### Upload failure handling
If the textarea is empty after the wait, check:
1. Session validity (did GitHub redirect to login?)
2. Wait time (processing can be slow under load -- retry after 3-5 more seconds)
3. File size (10MB free, 100MB paid accounts)
Do not silently substitute a release asset URL. Report the failure and offer to retry.
### DOM selector fragility
`#fc-new_comment_field` and `#new_comment_field` are GitHub's internal element IDs and
may change in future UI updates. If the upload stops working, snapshot the PR page and
inspect the current comment form structure for updated selectors.
### Size limits
- Free accounts: 10MB per file
- Paid (Pro, Team, Enterprise): 100MB per file
Check file size before attempting upload. Re-encode at lower quality if needed.
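The size check can be sketched as a small helper (hypothetical, not part of the skill):

```bash
# Returns 0 if the file fits GitHub's inline limit for free accounts, 1 otherwise
fits_inline_limit() {
  limit=$((10 * 1024 * 1024))   # 10 MB; use 100 MB for paid accounts
  size=$(wc -c < "$1")          # wc -c is portable; stat flags differ on macOS vs Linux
  [ "$size" -le "$limit" ]
}
```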
## References
- GitHub CLI issues: cli/cli#1895, #4228, #4465 (all closed "not planned")
- `tonkotsuboy/github-upload-image-to-pr` -- reference implementation
- GitHub Community Discussions: #29993, #46951, #28219

View File

@@ -45,7 +45,7 @@ The primary entry points for engineering work, invoked as slash commands:
| Skill | Description |
|-------|-------------|
| `/changelog` | Create engaging changelogs for recent merges |
| `/ce-demo-reel` | Capture a visual demo reel (GIF demos, terminal recordings, screenshots) for PRs with project-type-aware tier selection |
| `/reproduce-bug` | Reproduce bugs using logs and console |
| `/report-bug-ce` | Report a bug in the compound-engineering plugin |
| `/resolve-pr-feedback` | Resolve PR review feedback in parallel |

View File

@@ -0,0 +1,168 @@
---
name: ce-demo-reel
description: "Capture a visual demo reel (GIF, terminal recording, screenshots) for PR descriptions. Use when shipping UI changes, CLI features, or any work with observable behavior that benefits from visual proof. Also use when asked to add a demo, record a GIF, screenshot a feature, show what changed visually, create a demo reel, capture evidence, add proof to a PR, or create a before/after comparison."
argument-hint: "[what to capture, e.g. 'the new settings page' or 'CLI output of the migrate command']"
---
# Demo Reel
Detect project type, recommend a capture tier, record visual evidence, upload to a public URL, and return an evidence URL and description for PR inclusion.
**Evidence means USING THE PRODUCT, not running tests.** "I ran npm test" is test evidence. Evidence capture is running the actual CLI command, opening the web app, making the API call, or triggering the feature. The distinction is absolute -- test output is never labeled "Demo" or "Screenshots."
If real product usage is impractical (requires API keys, cloud deploy, paid services, bot tokens), say so explicitly: "Real evidence would require [X]. Recommending [fallback approach] instead." Do not silently skip to "no evidence needed" or substitute test output.
Never generate fake or placeholder image/GIF URLs. If upload fails, report the failure.
## Arguments
Parse `$ARGUMENTS`:
- **What to capture**: A description of the feature or behavior to demonstrate. If provided, use it to guide which pages to visit, commands to run, or states to capture.
- If blank, infer what to capture from recoverable branch or PR context. If the target remains ambiguous after that, ask the user what they want to demonstrate before proceeding.
## Step 0: Discover Capture Target
Treat target discovery as stateless and branch-aware. The agent may be invoked in a fresh session after the work was already done, so do not rely on conversation history or assume the caller knows the right artifact.
If invoked by another skill, treat the caller-provided target as a hint, not proof. Rerun target discovery and validation before capturing anything.
Use the lightest available context to identify the best evidence target:
- Current branch name
- Open PR title and description, if one exists
- Changed files and diff against the base branch
- Recent commits
- A plan file only when it is obviously referenced by the branch, PR, arguments, or caller context
Form a capture hypothesis: "The best evidence appears to be [behavior]."
Proceed without asking only when there is exactly one high-confidence observable behavior and a plausible way to exercise it from the workspace. Ask the user what to demonstrate when multiple behaviors are plausible, the diff does not reveal how to exercise the behavior, or the requested target cannot be mapped to a product surface.
Skip evidence with a clear reason when the diff is docs-only, markdown-only, config-only, CI-only, test-only, or a pure internal refactor with no observable output change.
## Step 1: Exercise the Feature
Before capturing anything, verify the feature works by actually using it:
- **CLI tool**: Run the new/changed command and confirm the output is correct
- **Web app**: Navigate to the new/changed page and confirm it renders correctly
- **Library**: Run example code using the new/changed API
- **Bug fix**: Reproduce the original bug scenario and confirm it's fixed
Use the workspace where the feature was built. Do not reinstall from scratch. If setup requires credentials or services, use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini) to ask the user.
## Step 2: Detect Project Type
Use the capture target from Step 0 to decide which directory to classify. If the diff touches a specific subdirectory with its own package manifest (e.g., `packages/cli/`, `apps/web/`), pass that as the root. Otherwise use the repo root.
```bash
python3 scripts/capture-demo.py detect --repo-root [TARGET_DIR]
```
This outputs JSON with `type` and `reason`. The result is a signal, not a gate. If the agent's understanding from Step 0 contradicts the script's classification (e.g., the diff clearly changes CLI behavior but the repo root classifies as `web-app` because of a sibling Next.js app), the agent's judgment wins.
## Step 3: Assess Change Type
Step 0 already handled the "no observable behavior" early exit. This step classifies changes that DO have observable behavior into `motion` or `states` to guide tier selection.
If arguments describe what to capture, classify based on the description. Otherwise, use the diff context from Step 0.
**Change classification:**
1. **Involves motion or interaction?** (animations, typing flows, drag-and-drop, real-time updates, continuous CLI output) -> classify as `motion`.
2. **Involves discrete states?** (before/after UI, new page, command with output, API response) -> classify as `states`.
| Change characteristic | Classification |
|---|---|
| Animations, typing, drag-and-drop, streaming output | `motion` |
| New UI, before/after, command output, API responses | `states` |
**Feature vs bug fix -- what to demonstrate:**
- **New feature (`feat`)**: Demonstrate the feature working. Show the hero moment -- the feature doing its thing.
- **Bug fix (`fix`)**: Show before AND after. Reproduce the original broken state (if possible) then show the fix. If the broken state can't be reproduced (already fixed in the workspace), capture the fixed state and describe what was broken.
Infer feat vs fix from commit messages, branch name, or plan file frontmatter (`type: feat` or `type: fix`). If unclear, ask.
## Step 4: Tool Preflight
Run the preflight check:
```bash
python3 scripts/capture-demo.py preflight
```
This outputs JSON with boolean availability for each tool: `agent_browser`, `vhs`, `silicon`, `ffmpeg`, `ffprobe`. Print a human-readable summary for the user based on the result, noting install commands for missing tools (e.g., `brew install charmbracelet/tap/vhs` for vhs, `brew install silicon` for silicon, `brew install ffmpeg` for ffmpeg).
## Step 5: Create Run Directory
Create a per-run scratch directory in the OS temp location:
```bash
mktemp -d -t demo-reel-XXXXXX
```
Use the output as `RUN_DIR`. Pass this concrete run directory to every tier reference. Evidence artifacts are ephemeral — they get uploaded to a public URL and then discarded. The OS temp directory is the right place for them, not the repo tree.
## Step 6: Recommend Tier and Ask User
Run the recommendation script with the project type from Step 2, change classification from Step 3, and preflight JSON from Step 4:
```bash
python3 scripts/capture-demo.py recommend --project-type [TYPE] --change-type [motion|states] --tools '[PREFLIGHT_JSON]'
```
This outputs JSON with `recommended` (the best tier), `available` (list of tiers whose tools are present), and `reasoning`.
Present the available tiers to the user via the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Mark the recommended tier. Always include "No evidence needed" as a final option.
**Question:** "How should evidence be captured for this change?"
**Options** (show only tiers from the `available` list, order by recommendation):
1. **Browser reel** -- Agent-browser screenshots stitched into animated GIF. Best for web apps.
2. **Terminal recording** -- VHS terminal recording to GIF. Best for CLI tools with interaction/motion.
3. **Screenshot reel** -- Styled terminal frames stitched into animated GIF. Best for discrete CLI steps.
4. **Static screenshots** -- Individual PNGs. Fallback when other tools are unavailable.
5. **No evidence needed** -- The diff speaks for itself. Best for text-only or config changes.
If the question tool is unavailable (background agent, batch mode), present the numbered options and wait for the user's reply before proceeding.
## Step 7: Execute Selected Tier
Carry the capture hypothesis from Step 0 and the feature exercise results from Step 1 into tier execution — these determine which specific pages to visit, commands to run, or states to screenshot. Substitute `[RUN_DIR]` in the tier reference with the concrete path from Step 5.
Load the appropriate reference file for the selected tier:
- **Browser reel** -> Read `references/tier-browser-reel.md`
- **Terminal recording** -> Read `references/tier-terminal-recording.md`
- **Screenshot reel** -> Read `references/tier-screenshot-reel.md`
- **Static screenshots** -> Read `references/tier-static-screenshots.md`
- **No evidence needed** -> Skip to output. Set `evidence_url` to null, `evidence_label` to null.
**Runtime failure fallback:** If the selected tier fails during execution (tool crashes, server not accessible, recording produces empty output), fall back to the next available tier rather than failing entirely. The fallback order is: browser reel -> static screenshots, terminal recording -> screenshot reel -> static screenshots, screenshot reel -> static screenshots. Static screenshots is the terminal fallback -- if even that fails, report the failure and let the user decide.
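One way to encode the fallback chain above (the skill states it in prose; this mapping is illustrative, not part of the scripts):

```python
# Fallback order when a tier fails at runtime; static screenshots is terminal
FALLBACKS = {
    "browser-reel": ["static-screenshots"],
    "terminal-recording": ["screenshot-reel", "static-screenshots"],
    "screenshot-reel": ["static-screenshots"],
    "static-screenshots": [],  # nothing left: report the failure to the user
}
```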
## Step 8: Upload and Approval
After the selected tier produces an artifact, read `references/upload-and-approval.md` for upload to a public host, user approval gate, and markdown embed generation.
## Output
Return these values to the caller (e.g., git-commit-push-pr):
```
=== Evidence Capture Complete ===
Tier: [browser-reel / terminal-recording / screenshot-reel / static / skipped]
Description: [1 sentence describing what the evidence shows]
URL: [public URL or "none" (multiple URLs comma-separated for static screenshots)]
=== End Evidence ===
```
The `Description` is a 1-line summary derived from the capture hypothesis in Step 0 (e.g., "CLI detect command classifying 3 project types and recommending capture tiers"). The caller decides how to format the URL(s) into the PR description.
- `Tier: skipped` or `URL: "none"` means no evidence was captured.
**Label convention:**
- Browser reel, terminal recording, screenshot reel: label as "Demo"
- Static screenshots: label as "Screenshots"
- The caller applies the label when formatting. ce-demo-reel does not generate markdown.
- Test output is never labeled "Demo" or "Screenshots"

View File

@@ -0,0 +1,105 @@
# Tier: Browser Reel
Capture 3-5 browser screenshots at key UI states and stitch into an animated GIF.
**Best for:** Web apps, desktop apps accessible via localhost or CDP.
**Output:** GIF (PNG screenshots stitched via ffmpeg two-pass palette)
**Label:** "Demo"
**Required tools:** agent-browser, ffmpeg
## Step 1: Connect to the Application
**For web apps** -- verify the dev server is accessible:
- Read `package.json` `scripts` for `dev`, `start`, `serve` commands
- Check `Procfile`, `Procfile.dev`, or `bin/dev` if they exist
- Check `Gemfile` for Rails (`bin/rails server`) or Sinatra
- Check for running processes on common ports (3000, 5000, 8080)
If the server is not running, tell the user what start command was detected and ask them to start it. Do not start it automatically (it may require environment variables, database setup, etc.).
If the server cannot be reached after the user confirms it should be running, fall back to static screenshots tier.
Once accessible, note the base URL (e.g., `http://localhost:3000`).
**For Electron/desktop apps** -- connect via Chrome DevTools Protocol (CDP):
1. Check if the app is already running with CDP enabled by probing common ports:
```bash
curl -s http://localhost:9222/json/version
```
If that returns a JSON response, the app is ready -- connect agent-browser to it:
```bash
agent-browser connect 9222
```
2. If not running, the app needs to be launched with `--remote-debugging-port`. Detect the entry point from `package.json` (look for the `main` field or `electron` in scripts), then ask the user to launch it with:
```
your-electron-app --remote-debugging-port=9222
```
If port 9222 is busy, try 9223-9230.
3. Poll until CDP is ready (timeout after 30 seconds):
```bash
curl -s http://localhost:9222/json/version
```
4. Connect agent-browser:
```bash
agent-browser connect 9222
```
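Steps 3-4 can be sketched as a small polling helper. This is an illustrative sketch, not part of the skill's pipeline: the 30-second timeout and `/json/version` endpoint come from the steps above, while the `wait_for_cdp` helper name is hypothetical.

```python
import json
import time
import urllib.request

def wait_for_cdp(port=9222, timeout=30):
    """Poll the CDP version endpoint until the app responds or the timeout expires."""
    url = f"http://localhost:{port}/json/version"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return json.load(resp)  # CDP is ready; returns browser version metadata
        except OSError:
            time.sleep(1)  # not listening yet; retry
    return None  # caller falls back to static screenshots
```

If this returns metadata, run `agent-browser connect 9222`; if it returns `None`, fall back to the static screenshots tier.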
**CDP advantages:** Screenshots come from the renderer's frame buffer, not macOS screen capture -- no Accessibility or Screen Recording permissions needed.
**If CDP connection fails:** Fall back to static screenshots tier. Tell the user: "Could not connect to the app via CDP. Falling back to static screenshots."
## Step 2: Capture Screenshots
Navigate to the relevant pages and capture 3-5 screenshots at key UI states:
1. **Initial/empty state** -- Before the feature is used
2. **Navigation** -- How the user reaches the feature (if not the landing page)
3. **Feature in action** -- The hero shot showing the feature working
4. **Result state** -- After interaction (data present, items created, success message)
5. **Detail view** (optional) -- Expanded item, settings panel, modal
For each screenshot, write to the actual `RUN_DIR` path created by the parent skill (substitute it for the placeholder below):
```bash
agent-browser open [URL]
```
```bash
agent-browser wait 2000
```
```bash
agent-browser screenshot [RUN_DIR]/frame-01-initial.png
```
**Capture tips:**
- Use URL navigation (`agent-browser open URL`) rather than clicking SPA elements (clicks often fail on React/Vue/Svelte SPAs)
- Wait 2-3 seconds after navigation for the page to settle
- Capture the full viewport (the sidebar and header give reviewers context)
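The capture loop above can be sketched as a thin wrapper over the agent-browser CLI. The `plan_capture` helper and the `pages` mapping are hypothetical names assumed for illustration; the CLI subcommands are the ones shown above.

```python
import subprocess

def plan_capture(base_url, run_dir, pages):
    """Build the agent-browser command sequence for a set of frames.

    `pages` maps a frame slug to a URL path, e.g. {"01-initial": "/"}.
    """
    cmds = []
    for slug, path in pages.items():
        cmds.append(["agent-browser", "open", base_url + path])
        cmds.append(["agent-browser", "wait", "2000"])  # let the page settle
        cmds.append(["agent-browser", "screenshot", f"{run_dir}/frame-{slug}.png"])
    return cmds

def capture_frames(base_url, run_dir, pages):
    """Execute the planned commands in order, failing fast on any error."""
    for cmd in plan_capture(base_url, run_dir, pages):
        subprocess.run(cmd, check=True)
```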
## Step 3: Stitch into GIF
Use the capture pipeline script to normalize frame dimensions, stitch with two-pass palette, and auto-reduce if over 10 MB:
```bash
python3 scripts/capture-demo.py stitch [RUN_DIR]/demo.gif [RUN_DIR]/frame-*.png
```
The script handles dimension normalization (via ffprobe + ffmpeg padding), concat demuxer stitching, palette generation, and automatic frame reduction if the GIF exceeds GitHub's 10 MB inline limit. Default is 3 seconds per frame. To adjust:
```bash
python3 scripts/capture-demo.py stitch --duration 2.0 [RUN_DIR]/demo.gif [RUN_DIR]/frame-*.png
```
**If stitching fails:** Fall back to static screenshots tier using the individual PNGs already captured. If no PNGs were captured, report the failure.
## Step 4: Cleanup
After successful GIF creation, remove individual PNG frames. Keep only the final GIF for upload.
Proceed to `references/upload-and-approval.md`.


@@ -0,0 +1,61 @@
# Tier: Screenshot Reel
Render styled terminal frames from text and stitch into an animated GIF. Each frame shows one step of a CLI demo (command + output).
**Best for:** CLI tools shown as discrete steps (command -> output -> next command -> output). Also useful when VHS breaks on quoting or special characters.
**Output:** GIF (silicon PNGs stitched via ffmpeg)
**Label:** "Demo"
**Required tools:** silicon, ffmpeg
## Step 1: Write Demo Content
Create a text file with `---` delimiters between frames. Each frame shows the terminal state for one step:
Write to `[RUN_DIR]/demo-steps.txt`:
```
$ your-cli-command --flag value
Output line 1
Output line 2
Success: feature works correctly
---
$ your-cli-command --another-flag
Different output showing another aspect
Result: 42 items processed
---
$ your-cli-command --verify
All checks passed
```
**Tips:**
- Include the `$` prompt to show what the user types
- Keep each frame under ~80 characters wide for readability
- 3-5 frames is ideal -- enough to tell the story, not so many the GIF is huge
- Strip unicode characters that silicon's default font can't render (checkmarks, fancy arrows)
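One way to strip glyphs that silicon's default font may not render is a small sanitizer. This is a sketch with an assumed replacement table, not an exhaustive mapping; the `sanitize_frame_text` helper name is hypothetical.

```python
def sanitize_frame_text(text):
    """Replace common non-ASCII glyphs with ASCII stand-ins, then drop the rest."""
    replacements = {
        "\u2713": "ok",   # check mark
        "\u2714": "ok",   # heavy check mark
        "\u2192": "->",   # rightwards arrow
        "\u2026": "...",  # ellipsis
    }
    for ch, ascii_sub in replacements.items():
        text = text.replace(ch, ascii_sub)
    return "".join(c for c in text if ord(c) < 128)
```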
## Step 2: Split into Frame Files
Split the demo content on `---` lines into separate text files, one per frame:
- `[RUN_DIR]/frame-001.txt`
- `[RUN_DIR]/frame-002.txt`
- `[RUN_DIR]/frame-003.txt`
- etc.
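The split can be done with a few lines of Python. The `split_frames` helper is illustrative; the frame numbering follows the file names above.

```python
from pathlib import Path

def split_frames(steps_file, run_dir):
    """Split demo-steps.txt on lines that are exactly '---' into frame-NNN.txt files."""
    frames, current = [], []
    for line in Path(steps_file).read_text(encoding="utf-8").splitlines():
        if line.strip() == "---":
            frames.append("\n".join(current))
            current = []
        else:
            current.append(line)
    frames.append("\n".join(current))
    paths = []
    for i, frame in enumerate((f.strip() for f in frames if f.strip()), start=1):
        out = Path(run_dir) / f"frame-{i:03d}.txt"
        out.write_text(frame + "\n", encoding="utf-8")
        paths.append(out)
    return paths
```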
## Step 3: Render and Stitch
Use the capture pipeline script to render each text frame through silicon and stitch into an animated GIF in a single call:
```bash
python3 scripts/capture-demo.py screenshot-reel --output [RUN_DIR]/demo.gif --duration 2.5 --text [RUN_DIR]/frame-001.txt [RUN_DIR]/frame-002.txt [RUN_DIR]/frame-003.txt
```
The script handles silicon rendering, dimension normalization, two-pass palette generation, and automatic frame reduction if the GIF exceeds limits. Default duration is 2.5 seconds per frame (faster than browser reels since terminal frames are quicker to read).
**If the script fails** (silicon rendering error, stitching error, empty output): fall back to static screenshots tier. Include the raw terminal output as a code block in the PR description instead. Label as "Terminal output", not "Screenshots".
## Step 4: Cleanup
Remove individual PNGs and text files. Keep only the final GIF for upload.
Proceed to `references/upload-and-approval.md`.


@@ -0,0 +1,55 @@
# Tier: Static Screenshots
Capture individual PNG screenshots. No animation, no stitching.
**Best for:** Fallback when other tools are unavailable, library demos, or features where animation doesn't add value.
**Output:** PNG files
**Label:** "Screenshots"
**Required tools:** Varies (agent-browser for web, silicon for CLI, or native screenshot)
## Capture by Project Type
### Web app or desktop app (agent-browser available)
```bash
agent-browser open [URL]
```
```bash
agent-browser wait 2000
```
```bash
agent-browser screenshot [RUN_DIR]/screenshot-01.png
```
Capture 1-3 screenshots: before state, feature in action, result state.
### CLI tool (silicon available)
Run the command, capture its output to a text file, then render with silicon:
```bash
silicon [RUN_DIR]/output.txt -o [RUN_DIR]/screenshot-01.png --theme Dracula -l bash --pad-horiz 20 --pad-vert 20
```
### CLI tool (no silicon)
Run the command and capture the raw terminal output. Include the output as a code block in the PR description instead of an image. Label it as "Terminal output", never "Screenshot".
### Library
Run example code that exercises the new API. Capture the output as above (silicon if available, code block if not).
## Upload
Each PNG is uploaded individually. Follow `references/upload-and-approval.md` for each file, or upload them all and present the results together for a single approval.
For multiple screenshots, the markdown embed uses multiple image lines:
```markdown
## Screenshots
![Before](url-1)
![After](url-2)
```
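As a sketch of what the caller might do with the uploaded URLs (ce-demo-reel itself does not generate markdown; the `screenshots_markdown` helper is hypothetical):

```python
def screenshots_markdown(urls_with_labels):
    """Format uploaded screenshot URLs as a PR-description markdown section."""
    lines = ["## Screenshots"]
    for label, url in urls_with_labels:
        lines.append(f"![{label}]({url})")
    return "\n".join(lines)
```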


@@ -0,0 +1,88 @@
# Tier: Terminal Recording
Record a terminal session using VHS (charmbracelet/vhs) to produce a GIF demo.
**Best for:** CLI tools, scripts, command-line features with interaction or motion (typing, streaming output, progressive rendering).
**Output:** GIF (direct from VHS)
**Label:** "Demo"
**Required tools:** vhs
## Step 1: Plan the Recording
Before generating a .tape file, determine:
- **What command(s) to run** -- The actual product command, not test commands. "I ran npm test" is test evidence, not a demo.
- **Expected output** -- What the terminal should show when the command succeeds.
- **Terminal dimensions** -- Wide enough for the longest output line, tall enough to avoid scrolling.
- **Timing** -- Target 5-10 seconds total. Enough sleep after each command for output to render.
## Step 2: Generate .tape File
Write a VHS tape file to `[RUN_DIR]/demo.tape`:
```tape
Output [RUN_DIR]/demo.gif
Set FontSize 16
Set Width 800
Set Height 500
Set Theme "Catppuccin Mocha"
Set TypingSpeed 40ms
# Hide boring setup
Hide
Type "cd /path/to/project"
Enter
Sleep 500ms
Show
# The demo
Type "your-cli-command --flag value"
Sleep 500ms
Enter
Sleep 3s
# Let viewer read the output
Sleep 2s
```
**Key .tape directives:**
- `Output [path]` -- Where to write the GIF (must be first line)
- `Set FontSize [14-18]` -- Larger for readability
- `Set Width/Height [pixels]` -- Match content needs
- `Set Theme [name]` -- "Catppuccin Mocha" or "Dracula" are readable defaults
- `Set TypingSpeed [ms]` -- 30-50ms feels natural
- `Hide`/`Show` -- Skip boring setup (cd, source, npm install)
- `Type [text]` -- Types characters (does not execute)
- `Enter` -- Presses enter (executes the typed command)
- `Sleep [duration]` -- Wait for output to render
**Avoid:**
- Non-deterministic output (random IDs, timestamps that change between runs)
- Commands that require interactive input (prompts, password entry)
- Very long output that scrolls off screen
## Step 3: Run VHS
Use the capture pipeline script to execute the tape file and validate output:
```bash
python3 scripts/capture-demo.py terminal-recording --output [RUN_DIR]/demo.gif --tape [RUN_DIR]/demo.tape
```
The script runs VHS, validates the output exists, and reports the file size. If the GIF exceeds 10 MB, reduce by adjusting the .tape: smaller terminal dimensions (`Set Width/Height`), shorter recording (fewer sleeps), or lower font size. Re-run.
## Step 4: Quality Check
Read the generated GIF to verify:
1. Commands are visible and readable
2. Output renders completely (not cut off)
3. The feature being demonstrated is clearly shown
4. No secrets, credentials, or sensitive paths are visible
If quality is poor, revise the .tape file and re-record.
**If VHS fails** (crashes, produces empty GIF, or the command being demonstrated fails): fall back to the screenshot reel tier. Write the same commands and expected output as text frames and stitch via silicon + ffmpeg. If silicon is also unavailable, fall back to static screenshots.
Proceed to `references/upload-and-approval.md`.


@@ -0,0 +1,40 @@
# Upload and Approval
Get user approval for the local artifact, upload evidence to a public URL, and generate markdown for PR inclusion.
## Step 1: Local Approval Gate
Before uploading anywhere public, present the local artifact path to the user for approval. Use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini).
**Question:** "Evidence captured at [RUN_DIR]/[artifact]. Review it locally and decide:"
**Options:**
1. **Looks good, upload for PR** -- proceed to upload
2. **Not good enough, try again** -- return to the tier execution step and re-capture
3. **Skip evidence for this PR** -- set evidence to null and proceed
If the question tool is unavailable (headless/background mode), present the numbered options and wait for the user's reply before proceeding.
## Step 2: Upload to catbox.moe
After the user approves the local artifact, upload the evidence file (GIF or PNG) using the capture pipeline script. Set `ARTIFACT_PATH` to the approved GIF or PNG path:
```bash
python3 scripts/capture-demo.py upload [ARTIFACT_PATH]
```
The script uploads to catbox.moe, validates the response starts with `https://`, and retries once on failure. The last line of output is the public URL (e.g., `https://files.catbox.moe/abc123.gif`).
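A caller can recover the URL by taking the last non-empty line of stdout. A minimal parsing sketch (the `parse_upload_url` helper name is hypothetical):

```python
def parse_upload_url(stdout):
    """Extract the public URL printed as the last line of the upload subcommand."""
    lines = [ln for ln in stdout.strip().splitlines() if ln.strip()]
    if not lines or not lines[-1].startswith("https://"):
        raise ValueError(f"no URL in upload output: {stdout!r}")
    return lines[-1]
```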
**If upload fails** after retry, report the failure and the local artifact path. Do not commit evidence files to the repo -- they are ephemeral artifacts, not source material. Tell the user: "Upload failed. Local artifact preserved at [ARTIFACT_PATH]. You can upload it manually or retry later."
For multiple files (static screenshots tier), upload each file separately.
## Step 3: Return Output
Return the structured output defined in the SKILL.md Output section: `Tier`, `Description`, and `URL`. The caller formats the evidence into the PR description. ce-demo-reel does not generate markdown.
## Step 4: Cleanup
Remove the `[RUN_DIR]` scratch directory and all temporary files. Preserve nothing -- the evidence lives at the public URL now.
If the upload failed and the user has not yet manually uploaded, preserve `[RUN_DIR]` so the artifact is still accessible.


@@ -0,0 +1,653 @@
#!/usr/bin/env python3
"""
Evidence capture pipeline — deterministic helpers for the demo-reel skill.
Subcommands:
preflight Check tool availability (JSON output)
detect [--repo-root PATH] Detect project type from manifests (JSON output)
recommend --project-type T --change-type T --tools JSON Recommend capture tier (JSON output)
stitch [--duration N] OUTPUT FRAME [FRAME ...] Stitch frames into animated GIF
screenshot-reel --output OUT [--duration N] [--lang L] [--theme T] --text F [F ...] Render text frames via silicon + stitch
terminal-recording --output OUT --tape TAPE Run VHS tape file
upload FILE Upload to catbox.moe (retries once)
"""
import argparse
import json
import os
import shutil
import subprocess
import sys
import tempfile
import time
from pathlib import Path
# --- Config ---
MAX_GIF_SIZE = 10 * 1024 * 1024 # 10 MB — GitHub inline render limit
TARGET_GIF_SIZE = 5 * 1024 * 1024 # 5 MB — preferred target
CATBOX_API = "https://catbox.moe/user/api.php"
# --- Helpers ---
def die(msg):
print(f"ERROR: {msg}", file=sys.stderr)
sys.exit(1)
def check_tool(name):
return shutil.which(name) is not None
def run_cmd(cmd, timeout=120):
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, check=False)
except subprocess.TimeoutExpired:
print(f"ERROR: Command timed out after {timeout}s: {' '.join(cmd)}", file=sys.stderr)
return subprocess.CompletedProcess(cmd, returncode=1, stdout="", stderr=f"Timed out after {timeout}s")
if result.returncode != 0:
print(f"ERROR: Command failed (exit {result.returncode}): {' '.join(cmd)}", file=sys.stderr)
if result.stderr:
print(result.stderr.strip(), file=sys.stderr)
return result
def file_size_mb(path):
return Path(path).stat().st_size / (1024 * 1024)
# --- Preflight ---
def cmd_preflight(_args):
tools = {
"agent_browser": check_tool("agent-browser"),
"vhs": check_tool("vhs"),
"silicon": check_tool("silicon"),
"ffmpeg": check_tool("ffmpeg"),
"ffprobe": check_tool("ffprobe"),
}
print(json.dumps(tools))
# --- Detect ---
ELECTRON_DEPS = {"electron", "electron-builder", "electron-forge", "electron-vite", "electron-packager"}
WEB_NODE_DEPS = {
"react", "vue", "svelte", "astro", "next", "nuxt", "@angular/core", "solid-js",
"@remix-run/react", "gatsby", "express", "fastify", "koa", "hono", "@hono/node-server",
}
WEB_RUBY_DEPS = {"rails", "sinatra", "hanami", "roda"}
WEB_GO_DEPS = {
"github.com/gin-gonic/gin", "github.com/labstack/echo", "github.com/gofiber/fiber",
"github.com/go-chi/chi", "github.com/gorilla/mux",
}
# Note: net/http is stdlib and won't appear in go.mod. The agent detects stdlib web
# servers from source imports in the diff and overrides the classification (Step 2).
WEB_PYTHON_DEPS = {"flask", "django", "fastapi", "starlette", "tornado", "sanic", "litestar"}
WEB_RUST_DEPS = {"actix-web", "axum", "rocket", "warp", "poem", "tide"}
CLI_RUBY_DEPS = {"thor", "gli", "dry-cli"}
CLI_PYTHON_DEPS = {"click", "typer", "argparse"}
def _read_file(path):
try:
return Path(path).read_text(encoding="utf-8", errors="replace")
except (OSError, IOError):
return None
def _has_any_dep(pkg_json, dep_names):
deps = set(pkg_json.get("dependencies", {}).keys())
dev_deps = set(pkg_json.get("devDependencies", {}).keys())
all_deps = deps | dev_deps
return bool(all_deps & dep_names)
def _detect_project_type(repo_root):
root = Path(repo_root)
# Try package.json first (used by multiple checks)
pkg_json = None
pkg_text = _read_file(root / "package.json")
if pkg_text:
try:
pkg_json = json.loads(pkg_text)
except json.JSONDecodeError:
pass
# 1. Desktop app (Electron)
if pkg_json and _has_any_dep(pkg_json, ELECTRON_DEPS):
return {"type": "desktop-app", "reason": "package.json contains Electron dependency"}
# 2. Web app
if pkg_json and _has_any_dep(pkg_json, WEB_NODE_DEPS):
return {"type": "web-app", "reason": "package.json contains web framework dependency"}
# Note: vite alone is not a web signal (it could bundle anything). A vite project
# is classified as web-app only via the framework-dependency check above.
gemfile = _read_file(root / "Gemfile")
if gemfile:
for dep in WEB_RUBY_DEPS:
if dep in gemfile:
return {"type": "web-app", "reason": f"Gemfile contains {dep}"}
go_mod = _read_file(root / "go.mod")
if go_mod:
for dep in WEB_GO_DEPS:
if dep in go_mod:
return {"type": "web-app", "reason": f"go.mod contains {dep}"}
for pyfile in ["pyproject.toml", "requirements.txt"]:
content = _read_file(root / pyfile)
if content:
for dep in WEB_PYTHON_DEPS:
if dep in content:
return {"type": "web-app", "reason": f"{pyfile} contains {dep}"}
cargo = _read_file(root / "Cargo.toml")
if cargo:
for dep in WEB_RUST_DEPS:
if dep in cargo:
return {"type": "web-app", "reason": f"Cargo.toml contains {dep}"}
# 3. CLI tool
if pkg_json:
if "bin" in pkg_json:
return {"type": "cli-tool", "reason": "package.json has bin field"}
if (root / "bin").is_dir():
return {"type": "cli-tool", "reason": "bin/ directory exists"}
if go_mod and (root / "cmd").is_dir():
return {"type": "cli-tool", "reason": "go.mod with cmd/ directory"}
if cargo and "[[bin]]" in cargo:
return {"type": "cli-tool", "reason": "Cargo.toml has [[bin]] section"}
pyproject = _read_file(root / "pyproject.toml")
if pyproject:
if "[project.scripts]" in pyproject or "[tool.poetry.scripts]" in pyproject:
return {"type": "cli-tool", "reason": "pyproject.toml has script entry points"}
for dep in CLI_PYTHON_DEPS:
if dep in pyproject:
return {"type": "cli-tool", "reason": f"pyproject.toml contains {dep}"}
if gemfile:
for dep in CLI_RUBY_DEPS:
if dep in gemfile:
return {"type": "cli-tool", "reason": f"Gemfile contains {dep}"}
if (root / "bin").is_dir() or (root / "exe").is_dir():
return {"type": "cli-tool", "reason": "Ruby project with bin/ or exe/ directory"}
if go_mod and (root / "main.go").exists():
return {"type": "cli-tool", "reason": "main.go exists without web framework"}
# 4. Library
manifests = ["package.json", "Gemfile", "go.mod", "Cargo.toml", "pyproject.toml", "setup.py"]
has_manifest = any((root / m).exists() for m in manifests)
if not has_manifest:
# Check for gemspec
has_manifest = bool(list(root.glob("*.gemspec")))
if has_manifest:
return {"type": "library", "reason": "package manifest exists but no web/CLI signals"}
# 5. Text-only
return {"type": "text-only", "reason": "no recognized package manifest"}
def cmd_detect(args):
repo_root = args.repo_root or os.getcwd()
result = _detect_project_type(repo_root)
print(json.dumps(result))
# --- Recommend ---
def _recommend_tier(project_type, change_type, tools):
has_browser = tools.get("agent_browser", False)
has_vhs = tools.get("vhs", False)
has_silicon = tools.get("silicon", False)
has_ffmpeg = tools.get("ffmpeg", False)
has_ffprobe = tools.get("ffprobe", False)
has_stitch = has_ffmpeg and has_ffprobe # stitching requires both
recommended = None
reasoning = ""
if project_type == "web-app":
if has_browser and has_stitch:
recommended = "browser-reel"
reasoning = "Web app with agent-browser and ffmpeg available"
elif has_browser:
recommended = "static-screenshots"
reasoning = "Web app with agent-browser but no ffmpeg/ffprobe for stitching"
else:
recommended = "static-screenshots"
reasoning = "Web app without agent-browser"
elif project_type == "cli-tool":
if change_type == "motion":
if has_vhs:
recommended = "terminal-recording"
reasoning = "CLI tool with motion, VHS available"
elif has_silicon and has_stitch:
recommended = "screenshot-reel"
reasoning = "CLI tool with motion, silicon + ffmpeg available (no VHS)"
else:
recommended = "static-screenshots"
reasoning = "CLI tool with no capture tools available"
else: # states
if has_silicon and has_stitch:
recommended = "screenshot-reel"
reasoning = "CLI tool with discrete states, silicon + ffmpeg available"
elif has_vhs:
recommended = "terminal-recording"
reasoning = "CLI tool with discrete states, VHS available (no silicon)"
else:
recommended = "static-screenshots"
reasoning = "CLI tool with no capture tools available"
elif project_type == "desktop-app":
if has_browser and has_stitch:
recommended = "browser-reel"
reasoning = "Desktop app with agent-browser and ffmpeg (via localhost/CDP)"
else:
recommended = "static-screenshots"
reasoning = "Desktop app without agent-browser"
elif project_type == "library":
recommended = "static-screenshots"
reasoning = "Library projects use static screenshots"
else: # text-only or unknown
recommended = "static-screenshots"
reasoning = "Fallback to static screenshots"
# Build available tiers list
available = []
if has_browser and has_stitch:
available.append("browser-reel")
if has_vhs:
available.append("terminal-recording")
if has_silicon and has_stitch:
available.append("screenshot-reel")
available.append("static-screenshots") # always available
return {
"recommended": recommended,
"available": available,
"reasoning": reasoning,
}
def cmd_recommend(args):
try:
tools = json.loads(args.tools)
except json.JSONDecodeError:
die("--tools must be valid JSON")
result = _recommend_tier(args.project_type, args.change_type, tools)
print(json.dumps(result))
# --- Stitch ---
def _get_frame_dimensions(path):
result = run_cmd([
"ffprobe", "-v", "error", "-select_streams", "v:0",
"-show_entries", "stream=width,height", "-of", "csv=p=0", str(path),
])
if result.returncode != 0:
die(f"ffprobe failed on {path}")
parts = result.stdout.strip().split(",")
return int(parts[0]), int(parts[1])
def _stitch_frames(output, frames, duration=3.0):
if not frames:
die("No input frames provided")
for f in frames:
if not Path(f).exists():
die(f"Frame not found: {f}")
if not check_tool("ffmpeg"):
die("ffmpeg is not installed. Install with: brew install ffmpeg")
if not check_tool("ffprobe"):
die("ffprobe is not installed. Install with: brew install ffmpeg")
print(f"Stitching {len(frames)} frames into GIF ({duration}s per frame)...")
tmpdir = tempfile.mkdtemp(prefix="evidence-stitch-")
try:
# Detect max dimensions
max_w, max_h = 0, 0
for f in frames:
w, h = _get_frame_dimensions(f)
max_w = max(max_w, w)
max_h = max(max_h, h)
# Even dimensions
if max_w % 2 != 0:
max_w += 1
if max_h % 2 != 0:
max_h += 1
print(f" Target dimensions: {max_w}x{max_h}")
# Normalize frames
normalized = []
for i, f in enumerate(frames):
out = os.path.join(tmpdir, f"frame_{i:03d}.png")
result = run_cmd([
"ffmpeg", "-y", "-v", "error", "-i", f,
"-vf", f"scale={max_w}:{max_h}:force_original_aspect_ratio=decrease,"
f"pad={max_w}:{max_h}:(ow-iw)/2:0:color=#0d1117",
out,
])
if result.returncode != 0:
die(f"ffmpeg failed to normalize frame: {f}")
normalized.append(out)
print(f" Normalized {len(normalized)} frames")
# Write concat file
concat_file = os.path.join(tmpdir, "concat.txt")
with open(concat_file, "w") as fh:
for f in normalized:
fh.write(f"file '{os.path.basename(f)}'\n")
fh.write(f"duration {duration}\n")
# Last file repeated without duration (concat demuxer requirement)
fh.write(f"file '{os.path.basename(normalized[-1])}'\n")
# Two-pass palette generation
palette = os.path.join(tmpdir, "palette.png")
result = run_cmd([
"ffmpeg", "-y", "-v", "error",
"-f", "concat", "-safe", "0", "-i", concat_file,
"-vf", "palettegen=stats_mode=diff",
palette,
])
if result.returncode != 0:
die("ffmpeg palette generation failed")
# Generate GIF with palette
result = run_cmd([
"ffmpeg", "-y", "-v", "error",
"-f", "concat", "-safe", "0", "-i", concat_file,
"-i", palette,
"-lavfi", "paletteuse=dither=bayer:bayer_scale=3",
"-loop", "0",
output,
])
if result.returncode != 0:
die("ffmpeg GIF encoding failed")
if not Path(output).exists():
die("GIF creation failed: no output file")
size = Path(output).stat().st_size
size_mb = size / (1024 * 1024)
print(f" Created: {output} ({size_mb:.1f} MB, {len(frames)} frames)")
# Auto-reduce if over limit
if size > MAX_GIF_SIZE:
print(" GIF exceeds 10 MB limit. Reducing...")
if len(frames) > 2:
print(" Dropping middle frame(s) and re-stitching...")
reduced = [frames[0]]
step = max(2, (len(frames) - 1) // 2)
for j in range(step, len(frames) - 1, step):
reduced.append(frames[j])
reduced.append(frames[-1])
if len(reduced) < len(frames):
print(f" Reduced from {len(frames)} to {len(reduced)} frames")
shutil.rmtree(tmpdir, ignore_errors=True)
_stitch_frames(output, reduced, duration)
return
print(" WARNING: Could not reduce below 10 MB. GIF may not render inline on GitHub.")
elif size > TARGET_GIF_SIZE:
print(" Note: GIF is over 5 MB preferred target but under 10 MB limit. Acceptable.")
finally:
shutil.rmtree(tmpdir, ignore_errors=True)
def cmd_stitch(args):
_stitch_frames(args.output, args.frames, args.duration)
# --- Screenshot Reel ---
def cmd_screenshot_reel(args):
if not check_tool("silicon"):
die("silicon is not installed. Install with: brew install silicon")
if not check_tool("ffmpeg"):
die("ffmpeg is not installed. Install with: brew install ffmpeg")
tmpdir = tempfile.mkdtemp(prefix="evidence-reel-")
try:
frame_pngs = []
for i, text_file in enumerate(args.text):
if not Path(text_file).exists():
die(f"Text file not found: {text_file}")
out_png = os.path.join(tmpdir, f"frame_{i:03d}.png")
result = run_cmd([
"silicon", text_file,
"-o", out_png,
"--theme", args.theme,
"-l", args.lang,
"--pad-horiz", "20",
"--pad-vert", "40",
"--no-line-number",
"--no-round-corner",
"--background", args.background,
])
if result.returncode != 0 or not Path(out_png).exists():
die(f"silicon failed to render {text_file}")
frame_pngs.append(out_png)
print(f"Rendered {len(frame_pngs)} frames via silicon")
_stitch_frames(args.output, frame_pngs, args.duration)
finally:
shutil.rmtree(tmpdir, ignore_errors=True)
# --- Terminal Recording ---
def cmd_terminal_recording(args):
if not check_tool("vhs"):
die("vhs is not installed. Install with: brew install charmbracelet/tap/vhs")
tape_path = args.tape
if not Path(tape_path).exists():
die(f"Tape file not found: {tape_path}")
# Parse Output directive from tape file
output_path = args.output
tape_content = Path(tape_path).read_text()
tape_has_output = False
for line in tape_content.splitlines():
stripped = line.strip()
if stripped.startswith("Output "):
tape_has_output = True
if not output_path:
output_path = stripped.split(None, 1)[1].strip().strip('"').strip("'")
break
if not output_path:
die("No output path: use --output or set Output in the tape file")
# If an output path is set and the tape has an Output directive, rewrite the
# tape to a temp copy (a no-op rewrite when the paths already match)
actual_tape = tape_path
tmp_tape = None
if output_path and tape_has_output:
# Rewrite the Output line to use the requested path
lines = tape_content.splitlines()
rewritten = []
for line in lines:
if line.strip().startswith("Output "):
rewritten.append(f'Output "{output_path}"')
else:
rewritten.append(line)
fd, tmp_tape = tempfile.mkstemp(suffix=".tape", prefix="vhs-")
os.close(fd)
Path(tmp_tape).write_text("\n".join(rewritten) + "\n")
actual_tape = tmp_tape
elif output_path and not tape_has_output:
# No Output in tape — prepend one
fd, tmp_tape = tempfile.mkstemp(suffix=".tape", prefix="vhs-")
os.close(fd)
Path(tmp_tape).write_text(f'Output "{output_path}"\n{tape_content}')
actual_tape = tmp_tape
print(f"Running VHS tape: {tape_path}")
result = run_cmd(["vhs", actual_tape], timeout=300)
if tmp_tape and Path(tmp_tape).exists():
Path(tmp_tape).unlink()
if result.returncode != 0:
die(f"VHS failed (exit {result.returncode})")
if not Path(output_path).exists():
die(f"VHS produced no output at {output_path}")
size = Path(output_path).stat().st_size
size_mb = size / (1024 * 1024)
print(f"Recording: {output_path} ({size_mb:.1f} MB)")
print(json.dumps({"gif_path": str(output_path), "size_mb": round(size_mb, 1)}))
# --- Upload ---
def cmd_upload(args):
file_path = args.file
if not Path(file_path).exists():
die(f"File not found: {file_path}")
if not check_tool("curl"):
die("curl is not installed")
size_mb = file_size_mb(file_path)
print(f"Uploading {file_path} ({size_mb:.1f} MB) to catbox.moe...")
def _try_upload():
try:
result = subprocess.run(
["curl", "-s", "--connect-timeout", "10",
"-F", "reqtype=fileupload",
"-F", f"fileToUpload=@{file_path}", CATBOX_API],
capture_output=True, text=True, timeout=30, check=False,
)
return result.stdout.strip()
except subprocess.TimeoutExpired:
print("ERROR: Upload timed out after 30s", file=sys.stderr)
return ""
url = _try_upload()
if url.startswith("https://"):
print(f"Uploaded: {url}")
print(url)
return
print(f"ERROR: Upload failed. Response: {url[:200]}", file=sys.stderr)
print(f"Local file preserved at: {file_path}", file=sys.stderr)
# Retry once
print("Retrying in 2 seconds...", file=sys.stderr)
time.sleep(2)
url = _try_upload()
if url.startswith("https://"):
print(f"Uploaded (retry): {url}")
print(url)
else:
print("ERROR: Retry also failed. Local file preserved; upload it manually or retry later.", file=sys.stderr)
sys.exit(1)
# --- Main ---
def main():
parser = argparse.ArgumentParser(
description="Evidence capture pipeline",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Commands:
preflight Check tool availability (JSON)
detect [--repo-root PATH] Detect project type (JSON)
recommend --project-type T ... Recommend capture tier (JSON)
stitch [--duration N] OUTPUT FRAMES Stitch frames into animated GIF
screenshot-reel --output O --text F Render text via silicon + stitch
terminal-recording --output O --tape T Run VHS tape
upload FILE Upload to catbox.moe
""",
)
sub = parser.add_subparsers(dest="command")
# preflight
sub.add_parser("preflight", help="Check tool availability")
# detect
p_detect = sub.add_parser("detect", help="Detect project type")
p_detect.add_argument("--repo-root", help="Repository root (default: cwd)")
# recommend
p_rec = sub.add_parser("recommend", help="Recommend capture tier")
p_rec.add_argument("--project-type", required=True,
choices=["web-app", "cli-tool", "library", "desktop-app", "text-only"])
p_rec.add_argument("--change-type", required=True, choices=["motion", "states"])
p_rec.add_argument("--tools", required=True, help="JSON object of tool availability")
# stitch
p_stitch = sub.add_parser("stitch", help="Stitch frames into animated GIF")
p_stitch.add_argument("--duration", type=float, default=3.0, help="Seconds per frame")
p_stitch.add_argument("output", help="Output GIF path")
p_stitch.add_argument("frames", nargs="+", help="Input frame PNGs")
# screenshot-reel
p_reel = sub.add_parser("screenshot-reel", help="Render text frames via silicon + stitch")
p_reel.add_argument("--output", required=True, help="Output GIF path")
p_reel.add_argument("--duration", type=float, default=2.5, help="Seconds per frame")
p_reel.add_argument("--lang", default="bash", help="Language for syntax highlighting")
p_reel.add_argument("--theme", default="Dracula", help="Silicon theme")
p_reel.add_argument("--background", default="#0d1117", help="Background color for frame border")
p_reel.add_argument("--text", nargs="+", required=True, help="Text files (one per frame)")
# terminal-recording
p_term = sub.add_parser("terminal-recording", help="Run VHS tape file")
p_term.add_argument("--output", help="Output GIF path (overrides tape Output directive)")
p_term.add_argument("--tape", required=True, help="VHS tape file path")
# upload
p_upload = sub.add_parser("upload", help="Upload to catbox.moe")
p_upload.add_argument("file", help="File to upload")
args = parser.parse_args()
if not args.command:
parser.print_help()
sys.exit(1)
dispatch = {
"preflight": cmd_preflight,
"detect": cmd_detect,
"recommend": cmd_recommend,
"stitch": cmd_stitch,
"screenshot-reel": cmd_screenshot_reel,
"terminal-recording": cmd_terminal_recording,
"upload": cmd_upload,
}
dispatch[args.command](args)
if __name__ == "__main__":
main()


@@ -50,35 +50,11 @@ This file contains the shipping workflow (Phase 3-4). Load it only when all Phas
## Phase 4: Ship It
1. **Capture and Upload Screenshots for UI Changes** (REQUIRED for any UI work)
1. **Prepare Evidence Context**
For **any** design changes, new views, or UI modifications, capture and upload screenshots before creating the PR:
Do not invoke `ce-demo-reel` directly in this step. Evidence capture belongs to the PR creation or PR description update flow, where the final PR diff and description context are available.
**Step 1: Start dev server** (if not running)
```bash
bin/dev # Run in background
```
**Step 2: Capture screenshots with agent-browser CLI**
```bash
agent-browser open http://localhost:3000/[route]
agent-browser snapshot -i
agent-browser screenshot output.png
```
See the `agent-browser` skill for detailed usage.
**Step 3: Upload using imgup skill**
```bash
skill: imgup
# Then upload each screenshot:
imgup -h pixhost screenshot.png # pixhost works without API key
# Alternative hosts: catbox, imagebin, beeimg
```
**What to capture:**
- **New screens**: Screenshot of the new UI
- **Modified screens**: Before AND after screenshots
- **Design implementation**: Screenshot showing Figma design match
Note whether the completed work has observable behavior (UI rendering, CLI output, API/library behavior with a runnable example, generated artifacts, or workflow output). The `git-commit-push-pr` skill will ask whether to capture evidence only when evidence is possible.
2. **Update Plan Status**
@@ -94,7 +70,7 @@ This file contains the shipping workflow (Phase 3-4). Load it only when all Phas
When providing context for the PR description, include:
- The plan's summary and key decisions
- Testing notes (tests added/modified, manual testing performed)
- Screenshot URLs from step 1 (if applicable)
- Evidence context from step 1, so `git-commit-push-pr` can decide whether to ask about capturing evidence
- Figma design link (if applicable)
- The Post-Deploy Monitoring & Validation section (see Phase 3 Step 4)
@@ -116,11 +92,11 @@ Before creating PR, verify:
- [ ] Linting passes (use linting-agent)
- [ ] Code follows existing patterns
- [ ] Figma designs match implementation (if applicable)
- [ ] Before/after screenshots captured and uploaded (for UI changes)
- [ ] Evidence decision handled by `git-commit-push-pr` when the change has observable behavior
- [ ] Commit messages follow conventional format
- [ ] PR description includes Post-Deploy Monitoring & Validation section (or explicit no-impact rationale)
- [ ] Code review completed (inline self-review or full `ce:review`)
- [ ] PR description includes summary, testing notes, and screenshots
- [ ] PR description includes summary, testing notes, and evidence when captured
- [ ] PR description includes Compound Engineered badge with accurate model and harness
## Code Review Tiers

View File

@@ -1,382 +0,0 @@
---
name: feature-video
description: Record a video walkthrough of a feature and add it to the PR description. Use when a PR needs a visual demo for reviewers, when the user asks to demo a feature, create a PR video, record a walkthrough, show what changed visually, or add a video to a pull request.
argument-hint: "[PR number or 'current' or path/to/video.mp4] [optional: base URL, default localhost:3000]"
---
# Feature Video Walkthrough
Record browser interactions demonstrating a feature, stitch screenshots into an MP4 video, upload natively to GitHub, and embed in the PR description as an inline video player.
## Prerequisites
- Local development server running (e.g., `bin/dev`, `npm run dev`, `rails server`)
- `agent-browser` CLI installed (load the `agent-browser` skill for details)
- `ffmpeg` installed (for video conversion)
- `gh` CLI authenticated with push access to the repo
- Git repository on a feature branch (PR optional -- skill can create a draft or record-only)
- One-time GitHub browser auth (see Step 6 auth check)
## Main Tasks
### 1. Parse Arguments & Resolve PR
**Arguments:** $ARGUMENTS
Parse the input:
- First argument: PR number, "current" (defaults to current branch's PR), or path to an existing `.mp4` file (upload-only resume mode)
- Second argument: Base URL (defaults to `http://localhost:3000`)
**Upload-only resume:** If the first argument ends in `.mp4` and the file exists, skip Steps 2-5 and proceed directly to Step 6 using that file. Resolve the PR number from the current branch (`gh pr view --json number -q '.number'`).
If an explicit PR number was provided, verify it exists and use it directly:
```bash
gh pr view [number] --json number -q '.number'
```
If no explicit PR number was provided (or "current" was specified), check if a PR exists for the current branch:
```bash
gh pr view --json number -q '.number'
```
If no PR exists for the current branch, ask the user how to proceed. **Use the platform's blocking question tool** (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini):
```
No PR found for the current branch.
1. Create a draft PR now and continue (recommended)
2. Record video only -- save locally and upload later when a PR exists
3. Cancel
```
If option 1: create a draft PR with a placeholder title derived from the branch name, then continue with the new PR number:
```bash
gh pr create --draft --title "[branch-name-humanized]" --body "Draft PR for video walkthrough"
```
If option 2: set `RECORD_ONLY=true`. Proceed through Steps 2-5 (record and encode), skip Steps 6-7 (upload and PR update), and report the local video path and `[RUN_ID]` at the end.
**Upload-only resume example:** `/feature-video .context/compound-engineering/feature-video/1711234567/videos/feature-demo.mp4` uploads a previously recorded video, skipping Steps 2-5 as described above.
### 1b. Verify Required Tools
Before proceeding, check that required CLI tools are installed. Fail early with a clear message rather than failing mid-workflow after screenshots have been recorded:
```bash
command -v ffmpeg
```
```bash
command -v agent-browser
```
```bash
command -v gh
```
If any tool is missing, stop and report which tools need to be installed:
- `ffmpeg`: `brew install ffmpeg` (macOS) or equivalent
- `agent-browser`: load the `agent-browser` skill for installation instructions
- `gh`: `brew install gh` (macOS) or see https://cli.github.com
Do not proceed to Step 2 until all tools are available.
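The three checks above can be combined into one preflight pass that reports every missing tool at once instead of failing on the first. A minimal sketch (the tool list mirrors the prerequisites; nothing else is assumed):

```shell
# Check all required tools in one pass and report every missing one
check_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    echo "Missing required tools:$missing" >&2
    return 1
  fi
}

check_tools ffmpeg agent-browser gh || echo "Install the tools listed above before continuing."
```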
### 2. Gather Feature Context
**If a PR is available**, get PR details and changed files:
```bash
gh pr view [number] --json title,body,files,headRefName -q '.'
```
```bash
gh pr view [number] --json files -q '.files[].path'
```
**If in record-only mode (no PR)**, detect the default branch and derive context from the branch diff. Run both commands in a single block so the variable persists:
```bash
DEFAULT_BRANCH=$(gh repo view --json defaultBranchRef -q '.defaultBranchRef.name') && git diff --name-only "$DEFAULT_BRANCH"...HEAD && git log --oneline "$DEFAULT_BRANCH"...HEAD
```
Map changed files to routes/pages that should be demonstrated. Examine the project's routing configuration (e.g., `routes.rb`, `next.config.js`, `app/` directory structure) to determine which URLs correspond to the changed files.
### 3. Plan the Video Flow
Before recording, create a shot list:
1. **Opening shot**: Homepage or starting point (2-3 seconds)
2. **Navigation**: How user gets to the feature
3. **Feature demonstration**: Core functionality (main focus)
4. **Edge cases**: Error states, validation, etc. (if applicable)
5. **Success state**: Completed action/result
Present the proposed flow to the user for confirmation before recording.
**Use the platform's blocking question tool when available** (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present numbered options and wait for the user's reply before proceeding:
```
Proposed Video Flow for PR #[number]: [title]
1. Start at: /[starting-route]
2. Navigate to: /[feature-route]
3. Demonstrate:
- [Action 1]
- [Action 2]
- [Action 3]
4. Show result: [success state]
Estimated duration: ~[X] seconds
1. Start recording
2. Modify the flow (describe changes)
3. Add specific interactions to demonstrate
```
### 4. Record the Walkthrough
Generate a unique run ID (e.g., timestamp) and create per-run output directories. This prevents stale screenshots from prior runs from being spliced into the new video.
**Important:** Shell variables do not persist across separate code blocks. After generating the run ID, substitute the concrete value into all subsequent commands in this workflow. For example, if the timestamp is `1711234567`, use that literal value in all paths below -- do not rely on `[RUN_ID]` expanding in later blocks.
```bash
date +%s
```
Use the output as RUN_ID. Create the directories with the concrete value:
```bash
mkdir -p .context/compound-engineering/feature-video/[RUN_ID]/screenshots
mkdir -p .context/compound-engineering/feature-video/[RUN_ID]/videos
```
Execute the planned flow, capturing each step with agent-browser. Number screenshots sequentially for correct frame ordering:
```bash
agent-browser open "[base-url]/[start-route]"
agent-browser wait 2000
agent-browser screenshot .context/compound-engineering/feature-video/[RUN_ID]/screenshots/01-start.png
```
```bash
agent-browser snapshot -i
agent-browser click @e1
agent-browser wait 1000
agent-browser screenshot .context/compound-engineering/feature-video/[RUN_ID]/screenshots/02-navigate.png
```
```bash
agent-browser snapshot -i
agent-browser click @e2
agent-browser wait 1000
agent-browser screenshot .context/compound-engineering/feature-video/[RUN_ID]/screenshots/03-feature.png
```
```bash
agent-browser wait 2000
agent-browser screenshot .context/compound-engineering/feature-video/[RUN_ID]/screenshots/04-result.png
```
### 5. Create Video
Stitch screenshots into an MP4 using the same `[RUN_ID]` from Step 4:
```bash
ffmpeg -y -framerate 0.5 -pattern_type glob -i ".context/compound-engineering/feature-video/[RUN_ID]/screenshots/*.png" \
-c:v libx264 -pix_fmt yuv420p -vf "scale=1280:-2" \
".context/compound-engineering/feature-video/[RUN_ID]/videos/feature-demo.mp4"
```
Notes:
- `-framerate 0.5` = 2 seconds per frame. Adjust for faster/slower playback.
- `-2` in scale ensures height is divisible by 2 (required for H.264).
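If a different pacing is needed, the `-framerate` value is simply the reciprocal of the desired seconds per frame. A quick way to compute it (assumes only `awk`, which is available on any POSIX system):

```shell
# 2.5 seconds per frame -> -framerate 0.4
secs_per_frame=2.5
framerate=$(awk -v s="$secs_per_frame" 'BEGIN { printf "%.4f", 1/s }')
printf '%s\n' "-framerate $framerate"
```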
### 6. Authenticate & Upload to GitHub
Upload produces a `user-attachments/assets/` URL that GitHub renders as a native inline video player -- the same result as pasting a video into the PR editor manually.
The approach: close any existing agent-browser session, start a Chrome-engine session with saved GitHub auth, navigate to the PR page, set the video file on the comment form's hidden file input, wait for GitHub to process the upload, extract the resulting URL, then clear the textarea without submitting.
#### Check for existing session
First, check if a saved GitHub session already exists:
```bash
agent-browser close
agent-browser --engine chrome --session-name github open https://github.com/settings/profile
agent-browser get title
```
If the page title contains the user's GitHub username or "Profile", the session is still valid -- skip to "Upload the video" below. If it redirects to the login page, the session has expired or was never created -- proceed to "Auth setup".
#### Auth setup (one-time)
Establish an authenticated GitHub session. This only needs to happen once -- session cookies persist across runs via the `--session-name` flag.
Close the current session and open the GitHub login page in a headed Chrome window:
```bash
agent-browser close
agent-browser --engine chrome --headed --session-name github open https://github.com/login
```
The user must log in manually in the browser window (handles 2FA, SSO, OAuth -- any login method). **Use the platform's blocking question tool** (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present the message and wait for the user's reply before proceeding:
```
GitHub login required for video upload.
A Chrome window has opened to github.com/login. Please log in manually
(this handles 2FA/SSO/OAuth automatically). Reply when done.
```
After login, verify the session works:
```bash
agent-browser open https://github.com/settings/profile
```
If the profile page loads, auth is confirmed. The `github` session is now saved and reusable.
#### Upload the video
Navigate to the PR page and scroll to the comment form:
```bash
agent-browser open "https://github.com/[owner]/[repo]/pull/[number]"
agent-browser scroll down 5000
```
Save any existing textarea content before uploading (the comment box may contain an unsent draft):
```bash
agent-browser eval "document.getElementById('new_comment_field').value"
```
Store this value as `SAVED_TEXTAREA`. If non-empty, it will be restored after extracting the upload URL.
Upload the video via the hidden file input. Use the caller-provided `.mp4` path if in upload-only resume mode, otherwise use the current run's encoded video:
```bash
agent-browser upload '#fc-new_comment_field' [VIDEO_FILE_PATH]
```
Where `[VIDEO_FILE_PATH]` is either:
- The `.mp4` path passed as the first argument (upload-only resume mode)
- `.context/compound-engineering/feature-video/[RUN_ID]/videos/feature-demo.mp4` (normal recording flow)
Wait for GitHub to process the upload (typically 3-5 seconds), then read the textarea value:
```bash
agent-browser wait 5000
agent-browser eval "document.getElementById('new_comment_field').value"
```
**Validate the extracted URL.** The value must contain `user-attachments/assets/` to confirm a successful native upload. If the textarea is empty, contains only placeholder text, or the URL does not match, do not proceed to Step 7. Instead:
1. Check `agent-browser get url` -- if it shows `github.com/login`, the session expired. Re-run auth setup.
2. If still on the PR page, wait an additional 5 seconds and re-read the textarea (GitHub processing can be slow).
3. If validation still fails after retry, report the failure and the local video path so the user can upload manually.
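The URL validation above boils down to one substring test. A small sketch of the predicate; the retry loop around it assumes `agent-browser` is installed and a PR page is open, so it is shown only as a comment:

```shell
# Predicate: does the textarea value contain a successful native-upload URL?
has_upload_url() {
  case "$1" in
    *user-attachments/assets/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Retry sketch (requires an authenticated agent-browser session on the PR page):
#   for i in 1 2 3; do
#     val=$(agent-browser eval "document.getElementById('new_comment_field').value")
#     has_upload_url "$val" && break
#     sleep 5
#   done
```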
Restore the original textarea content (or clear if it was empty). A JSON-encoded string is also a valid JavaScript string literal, so assign it directly without `JSON.parse`:
```bash
agent-browser eval "const ta = document.getElementById('new_comment_field'); ta.value = [SAVED_TEXTAREA_AS_JS_STRING]; ta.dispatchEvent(new Event('input', { bubbles: true }))"
```
To prepare the value: take the SAVED_TEXTAREA string and produce a JS string literal from it -- escape backslashes, double quotes, and newlines (e.g., `"text with \"quotes\" and\nnewlines"`). If SAVED_TEXTAREA was empty, use `""`. The result is embedded directly as the right-hand side of the assignment -- no `JSON.parse` call needed.
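Doing that escaping by hand is error-prone. If `jq` happens to be available (an assumption; it is not listed in the prerequisites), `jq -Rs .` produces the JSON-encoded string, which is also a valid JS string literal, in one step:

```shell
# JSON-encode the saved textarea content; the result doubles as a JS string literal
SAVED_TEXTAREA='draft with "quotes" and
a newline'
JS_LITERAL=$(printf '%s' "$SAVED_TEXTAREA" | jq -Rs .)
printf '%s\n' "$JS_LITERAL"   # "draft with \"quotes\" and\na newline"
```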
### 7. Update PR Description
Get the current PR body:
```bash
gh pr view [number] --json body -q '.body'
```
Append a Demo section (or replace an existing one). The video URL renders as an inline player when placed on its own line:
```markdown
## Demo
https://github.com/user-attachments/assets/[uuid]
*Automated video walkthrough*
```
Update the PR:
```bash
gh pr edit [number] --body "[updated body with demo section]"
```
### 8. Cleanup
Ask the user before removing temporary files. If confirmed, clean up only the current run's scratch directory (other runs may still be in progress or awaiting upload).
**If the video was successfully uploaded**, remove the entire run directory:
```bash
rm -r .context/compound-engineering/feature-video/[RUN_ID]
```
**If in record-only mode or upload failed**, remove only the screenshots but preserve the video so the user can upload later:
```bash
rm -r .context/compound-engineering/feature-video/[RUN_ID]/screenshots
```
Present a completion summary:
```
Feature Video Complete
PR: #[number] - [title]
Video: [VIDEO_URL]
Shots captured:
1. [description]
2. [description]
3. [description]
4. [description]
PR description updated with demo section.
```
## Usage Examples
```bash
# Record video for current branch's PR
/feature-video
# Record video for specific PR
/feature-video 847
# Record with custom base URL
/feature-video 847 http://localhost:5000
# Record for staging environment
/feature-video current https://staging.example.com
```
## Tips
- Keep it short: 10-30 seconds is ideal for PR demos
- Focus on the change: don't include unrelated UI
- Show before/after: if fixing a bug, show the broken state first (if possible)
- The `--session-name github` session expires when GitHub invalidates the cookies (typically weeks). If upload fails with a login redirect, re-run the auth setup.
- GitHub DOM selectors (`#fc-new_comment_field`, `#new_comment_field`) may change if GitHub updates its UI. If the upload silently fails, inspect the PR page for updated selectors.
## Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| `ffmpeg: command not found` | ffmpeg not installed | Install via `brew install ffmpeg` (macOS) or equivalent |
| `agent-browser: command not found` | agent-browser not installed | Load the `agent-browser` skill for installation instructions |
| Textarea empty after upload wait | Session expired, or GitHub processing slow | Check session validity (Step 6 auth check). If valid, increase wait time and retry. |
| Textarea empty, URL is `github.com/login` | Session expired | Re-run auth setup (Step 6) |
| `gh pr view` fails | No PR for current branch | Step 1 handles this -- choose to create a draft PR or record-only mode |
| Video file too large for upload | Exceeds GitHub's 10MB (free) or 100MB (paid) limit | Re-encode: lower framerate (`-framerate 0.33`), reduce resolution (`scale=960:-2`), or increase CRF (`-crf 28`) |
| Upload URL does not contain `user-attachments/assets/` | Wrong upload method or GitHub change | Verify the file input selector is still correct by inspecting the PR page |

View File

@@ -71,9 +71,16 @@ Read the current PR description:
gh pr view --json body --jq '.body'
```
Follow the "Detect the base branch and remote" and "Gather the branch scope" sections of Step 6 to get the full branch diff. Use the PR found in DU-2 as the existing PR for base branch detection. Classify commits per the "Classify commits before writing" section -- this is especially important for description updates, where the recent commits that prompted the update are often fix-up work (code review fixes, lint fixes) rather than feature work. Then write a new description following the writing principles in Step 6, driven by the feature commits and the final diff. If the user provided a focus, incorporate it into the description alongside the branch diff context.
Build the updated description in this order:
Compare the new description against the current one and summarize the substantial changes for the user (e.g., "Added coverage of the new caching layer, updated test plan, removed outdated migration notes"). If the user provided a focus, confirm it was addressed. Ask the user to confirm before applying. Use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). If no question tool is available, present the summary and wait for the user's reply.
1. **Get the full branch diff** -- follow "Detect the base branch and remote" and "Gather the branch scope" in Step 6. Use the PR found in DU-2 as the existing PR for base branch detection.
2. **Classify commits** -- follow "Classify commits before writing" in Step 6. This matters especially for description updates, where the recent commits that prompted the update are often fix-up work (code review fixes, lint fixes) rather than feature work.
3. **Decide on evidence** -- check if the current PR description already contains evidence (a `## Demo` or `## Screenshots` section with image embeds). If evidence exists, preserve it unless the user's focus specifically asks to refresh or remove it. If no evidence exists, follow "Evidence for PR descriptions" in Step 6. Description-only updates may be specifically intended to add evidence.
4. **Write the new description** -- follow the writing principles in Step 6, driven by feature commits, final diff, and evidence decision.
- If the user provided a focus, incorporate it alongside the branch diff context.
5. **Compare and confirm** -- summarize the substantial changes vs the current description (e.g., "Added coverage of the new caching layer, updated test plan, removed outdated migration notes").
- If the user provided a focus, confirm it was addressed.
- Ask the user to confirm before applying. Use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). If no question tool is available, present the summary and wait for the user's reply.
If confirmed, apply:
@@ -107,19 +114,27 @@ If the current branch from the context above is empty, the repository is in deta
- If the user agrees, derive a descriptive branch name from the change content, create it with `git checkout -b <branch-name>`, then run `git branch --show-current` again and use that result as the current branch name for the rest of the workflow.
- If the user declines, stop.
If the git status from the context above shows a clean working tree (no staged, modified, or untracked files), check whether there are unpushed commits or a missing PR before stopping. The current branch and existing PR check are already available from the context above. Additionally:
If the git status from the context above shows a clean working tree (no staged, modified, or untracked files), determine the next action based on upstream state and PR status. The current branch and existing PR check are already available from the context above. Additionally:
1. Run `git rev-parse --abbrev-ref --symbolic-full-name @{u}` to check whether an upstream is configured.
2. If the command succeeds, run `git log <upstream>..HEAD --oneline` using the upstream name from the previous command.
- If the current branch is `main`, `master`, or the resolved default branch from Step 1 and there is **no upstream** or there are **unpushed commits**, explain that pushing now would use the default branch directly. Ask whether to create a feature branch first. Use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). If no question tool is available, present the options and wait for the user's reply.
- If the user agrees, derive a descriptive branch name from the change content, create it with `git checkout -b <branch-name>`, then continue from Step 5 (push).
- If the user declines, report that this workflow cannot open a PR from the default branch directly and stop.
- If there is **no upstream**, treat the branch as needing its first push. Skip Step 4 (commit) and continue from Step 5 (push).
- If there are **unpushed commits**, skip Step 4 (commit) and continue from Step 5 (push).
- If all commits are pushed but **no open PR exists** and the current branch is `main`, `master`, or the resolved default branch from Step 1, report that there is no feature branch work to open as a PR and stop.
- If all commits are pushed but **no open PR exists**, skip Steps 4-5 and continue from Step 6 (write the PR description) and Step 7 (create the PR).
- If all commits are pushed **and an open PR exists**, report that and stop -- there is nothing to do.
Then follow this decision tree:
- **On default branch** (`main`, `master`, or the resolved default branch) with no upstream or unpushed commits:
- Ask whether to create a feature branch first (pushing the default branch directly is not supported by this workflow). Use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). If no question tool is available, present the options and wait for the user's reply.
- If yes -> create branch with `git checkout -b <branch-name>`, continue from Step 5 (push)
- If no -> stop
- **On default branch**, all commits pushed, no open PR:
- Report there is no feature branch work to open as a PR. Stop.
- **No upstream configured** (feature branch, never pushed):
- Skip Step 4 (commit), continue from Step 5 (push)
- **Unpushed commits exist** (feature branch, upstream configured):
- Skip Step 4 (commit), continue from Step 5 (push)
- **All commits pushed, no open PR** (feature branch):
- Skip Steps 4-5, continue from Step 6 (PR description) and Step 7 (create PR)
- **All commits pushed, open PR exists**:
- Report that everything is up to date. Stop.
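The two probe commands feeding this decision tree can be run together. A sketch of the gathering step (run inside the repo; nothing beyond `git` is assumed, and the branching on the result stays with the tree above):

```shell
# Gather the inputs for the clean-tree decision tree
upstream=$(git rev-parse --abbrev-ref --symbolic-full-name '@{u}' 2>/dev/null || true)
if [ -n "$upstream" ]; then
  unpushed=$(git log "$upstream"..HEAD --oneline | wc -l | tr -d ' ')
else
  unpushed="n/a"   # no upstream configured: the branch needs its first push
fi
echo "upstream=${upstream:-none} unpushed=$unpushed"
```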
### Step 2: Determine conventions
@@ -204,6 +219,27 @@ MERGE_BASE=$(git merge-base <base-remote>/<base-branch> HEAD) && echo "MERGE_BAS
Use the full branch diff and commit list as the basis for the PR description -- not the working-tree diff from Step 1.
#### Evidence for PR descriptions
Decide whether evidence capture is possible from the full branch diff before writing the PR description.
**Evidence is possible** when the final diff changes observable product behavior that can be demonstrated from the workspace: UI rendering or interactions, CLI commands and output, API/library behavior with runnable example code, generated artifacts, or workflow behavior with visible output.
**Evidence is not possible** for docs-only, markdown-only, changelog-only, release metadata, CI/config-only, test-only, or pure internal refactors with no observable output change. It is also not possible when the behavior requires unavailable credentials, paid/cloud services, bot tokens, deploy-only infrastructure, or hardware the user has not provided. In those cases, do not ask about capturing evidence; omit the evidence section and, if relevant, mention the skip reason in the final user report.
When evidence is possible, ask whether to include it in the PR description. Use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). If no question tool is available, present the options and wait for the user's reply.
**Question:** "This PR has observable behavior. Capture evidence for the PR description?"
**Options:**
1. **Capture evidence now**: load the `ce-demo-reel` skill with a target description as the argument (e.g., "the new settings page" or "CLI output of the migrate command"). Infer the target from the branch diff. ce-demo-reel returns `Tier`, `Description`, and `URL`. Use the URL and description to build a `## Demo` or `## Screenshots` section (browser-reel/terminal-recording/screenshot-reel use "Demo", static uses "Screenshots").
2. **Use existing evidence**: ask the user for the URL or markdown embed, then include it in the PR body.
3. **Skip evidence**: write the PR description without an evidence section.
If the user chooses capture, check ce-demo-reel's output for failure: `Tier: skipped` or `URL: "none"` means no evidence was captured. Do not add a placeholder section. Summarize the skip reason in the final user report.
Place the evidence embed before the Compound Engineering badge, typically after the summary or within the changes section. Do not label test output as "Demo" or "Screenshots".
#### Classify commits before writing
Before writing the description, scan the commit list and classify each commit:
@@ -213,10 +249,25 @@ Before writing the description, scan the commit list and classify each commit:
Only feature commits inform the description. Fix-up commits are noise -- they describe the iteration process, not the end result. The full diff already includes whatever the fix-up commits changed, so their intent is captured without narrating them. When sizing and writing the description, mentally subtract fix-up commits: a branch with 12 commits but 9 fix-ups is a 3-commit PR in terms of description weight.
This is the most important step. The description must be **adaptive** -- its depth should match the complexity of the change. A one-line bugfix does not need a table of performance results. A large architectural change should not be a bullet list.
#### Frame the narrative before sizing
After classifying commits, articulate the PR's narrative frame:
1. **Before**: What was broken, limited, or impossible? (One sentence.)
2. **After**: What's now possible or improved? (One sentence.)
3. **Scope rationale** (only if the PR touches 2+ separable-looking concerns): Why do these ship together? (One sentence.)
This frame becomes the opening of the description. For small+simple PRs (the sizing table routes to 1-2 sentences), the "after" sentence alone may be the entire description -- that's fine.
Example:
- Before: "CLI and library PRs got no visual evidence because the capture flow assumed a web app with a dev server."
- After: "Evidence capture now works for any project type -- CLI tools, libraries, desktop apps."
- Scope: "Shipped with git-commit-push-pr restructuring because ce-demo-reel integrates into the PR description flow."
#### Sizing the change
The description must be **adaptive** -- its depth should match the complexity of the change. A one-line bugfix does not need a table of performance results. A large architectural change should not be a bullet list.
Assess the PR along two axes before writing, based on the full branch diff:
- **Size**: How many files changed? How large is the diff?
@@ -228,15 +279,26 @@ Use this to select the right description depth:
|---|---|
| Small + simple (typo, config, dep bump) | 1-2 sentences, no headers. Total body under ~300 characters. |
| Small + non-trivial (targeted bugfix, behavioral change) | Short "Problem / Fix" narrative, ~3-5 sentences. Enough for a reviewer to understand *why* without reading the diff. No headers needed unless there are two distinct concerns. |
| Medium feature or refactor | Summary paragraph, then a section explaining what changed and why. Call out design decisions. |
| Medium feature or refactor | Open with the narrative frame (before/after/scope), then a section explaining what changed and why. Call out design decisions. |
| Large or architecturally significant | Full narrative: problem context, approach chosen (and why), key decisions, migration notes or rollback considerations if relevant. |
| Performance improvement | Include before/after measurements if available. A markdown table is effective here. |
**Brevity matters for small changes.** A 3-line bugfix with a 20-line PR description signals the author didn't calibrate. Match the weight of the description to the weight of the change. When in doubt, shorter is better -- reviewers can read the diff.
#### Writing voice
If the user has documented writing style preferences (in CLAUDE.md, project instructions, or prior feedback), follow those. Otherwise, apply these defaults:
- Use active voice throughout. No em dashes or double-hyphen (`--`) substitutes. Use periods, commas, colons, or parentheses instead.
- Vary sentence length deliberately -- mix short punchy sentences with longer ones. Never write three sentences of similar length in a row.
- Do not make a claim and then immediately explain it in the next sentence. Trust the reader.
- Write in plain English. If there's a simpler word, that's preferable. Never use business jargon when a common word will do. Technical jargon is fine when it's the clearest term for a developer audience.
- No filler phrases: "it's worth noting", "importantly", "essentially", "in order to", "leverage", "utilize."
- Use digits for numbers ("3 files", "7 subcommands"), not words ("three files", "seven subcommands").
#### Writing principles
- **Lead with value**: The first sentence should tell the reviewer *what's now possible or fixed*, not what was moved around. "Fixes timeout errors during batch exports" beats "Updated export_handler.py and config.yaml." The subtler failure is leading with the mechanism: "Replace the hardcoded capture block with a tiered skill" is technically purposeful but still doesn't tell the reviewer what changed for users. "Evidence capture now works for CLI tools and libraries, not just web apps" does.
- **No orphaned opening paragraphs**: If the description uses `##` section headings anywhere, the opening summary must also be under a heading (e.g., `## Summary`). An untitled paragraph followed by titled sections looks like a missing heading. For short descriptions with no sections, a bare paragraph is fine.
- **Describe the net result, not the journey**: The PR description is about the end state -- what changed and why. Do not include work-product details like bugs found and fixed during development, intermediate failures, debugging steps, iteration history, or refactoring done along the way. Those are part of getting the work done, not part of the result. If a bug fix happened during development, the fix is already in the diff -- mentioning it in the description implies it's a separate concern the reviewer should evaluate, when really it's just part of the final implementation. Exception: include process details only when they are critical for a reviewer to understand a design choice (e.g., "tried approach X first but it caused Y, so went with Z instead").
- **When commits conflict, trust the final diff**: The commit list is supporting context, not the source of truth for the final PR description. If commit messages describe intermediate steps that were later revised or reverted (for example, "switch to gh pr list" followed by a later change back to `gh pr view`), describe the end state shown by the full branch diff. Do not narrate contradictory commit history as if all of it shipped.
- **No empty sections**: If a section (like "Breaking Changes" or "Migration Guide") doesn't apply, omit it entirely. Do not include it with "N/A" or "None".
- **Test plan -- only when it adds value**: Include a test plan section when the testing approach is non-obvious: edge cases the reviewer might not think of, verification steps for behavior that's hard to see in the diff, or scenarios that require specific setup. Omit it for straightforward changes where the tests are self-explanatory or where "run the tests" is the only useful guidance. A test plan for "verify the typo is fixed" is noise. When the branch adds new test files, name them with what they cover -- "`tests/capture-evidence.test.ts` -- 8 tests covering arg validation and ffmpeg stitch integration" is more useful than "bun test passes."
#### Visual communication
The new commits are already on the PR from the push in Step 5. Report the PR URL, then ask the user whether they want the PR description updated to reflect the new changes. Use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). If no question tool is available, present the option and wait for the user's reply before proceeding.
- If **yes**:
1. Classify commits per "Classify commits before writing" -- the new commits since the last push are often fix-up work (code review fixes, lint fixes) and should not appear as distinct items
2. Size the full PR (not just the new commits) using the sizing table in Step 6
3. Write the description as if fresh, following Step 6's writing principles -- describe the PR's net result
4. Include the Compound Engineering badge unless one is already present
5. Apply:
```bash
gh pr edit --body "$(cat <<'EOF'
<updated PR description>
EOF
)"
```
CRITICAL: You MUST execute every step below IN ORDER. Do NOT skip any required steps.
6. `/compound-engineering:test-browser`
7. Output `<promise>DONE</promise>` when complete
Start with step 2 now (or step 1 if ralph-loop is available). Remember: plan FIRST, then work. Never skip the plan.
tests/ce-demo-reel.test.ts (new file, 402 lines)
import { describe, expect, test, beforeAll, afterAll } from "bun:test"
import { promises as fs } from "fs"
import path from "path"
import os from "os"
const SCRIPT = path.join(
process.cwd(),
"plugins",
"compound-engineering",
"skills",
"ce-demo-reel",
"scripts",
"capture-demo.py",
)
async function run(
...args: string[]
): Promise<{ exitCode: number; stdout: string; stderr: string }> {
const proc = Bun.spawn(["python3", SCRIPT, ...args], {
stdout: "pipe",
stderr: "pipe",
})
const exitCode = await proc.exited
const stdout = await new Response(proc.stdout).text()
const stderr = await new Response(proc.stderr).text()
return { exitCode, stdout, stderr }
}
/** Create a minimal valid PNG (1x1 pixel, solid color). */
function createTestPng(color: [number, number, number]): Buffer {
const [r, g, b] = color
// Raw RGB pixel data: 1 row, filter byte 0, then RGB
const rawData = Buffer.from([0, r, g, b])
// Raw DEFLATE stream (Bun.deflateSync emits no zlib wrapper); the zlib header and Adler-32 are added manually below
const compressed = Bun.deflateSync(rawData, { level: 0 })
const cmf = 0x78
const flg = 0x01
let s1 = 1
let s2 = 0
for (const byte of rawData) {
s1 = (s1 + byte) % 65521
s2 = (s2 + s1) % 65521
}
const adler32 = Buffer.alloc(4)
adler32.writeUInt32BE((s2 << 16) | s1)
const zlibData = Buffer.concat([Buffer.from([cmf, flg]), compressed, adler32])
const signature = Buffer.from([137, 80, 78, 71, 13, 10, 26, 10])
function chunk(type: string, data: Buffer): Buffer {
const len = Buffer.alloc(4)
len.writeUInt32BE(data.length)
const typeB = Buffer.from(type, "ascii")
const body = Buffer.concat([typeB, data])
const crc = crc32(body)
const crcB = Buffer.alloc(4)
crcB.writeUInt32BE(crc >>> 0)
return Buffer.concat([len, body, crcB])
}
// IHDR: 1x1, 8-bit RGB (color type 2)
const ihdr = Buffer.alloc(13)
ihdr.writeUInt32BE(1, 0)
ihdr.writeUInt32BE(1, 4)
ihdr[8] = 8 // bit depth
ihdr[9] = 2 // color type: RGB
ihdr[10] = 0
ihdr[11] = 0
ihdr[12] = 0
return Buffer.concat([
signature,
chunk("IHDR", ihdr),
chunk("IDAT", zlibData),
chunk("IEND", Buffer.alloc(0)),
])
}
function crc32(data: Buffer): number {
let crc = 0xffffffff
for (const byte of data) {
crc ^= byte
for (let j = 0; j < 8; j++) {
crc = crc & 1 ? (crc >>> 1) ^ 0xedb88320 : crc >>> 1
}
}
return (crc ^ 0xffffffff) >>> 0
}
// --- Preflight ---
describe("capture-demo.py", () => {
describe("preflight", () => {
test("returns JSON with tool availability", async () => {
const { exitCode, stdout } = await run("preflight")
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result).toHaveProperty("agent_browser")
expect(result).toHaveProperty("vhs")
expect(result).toHaveProperty("silicon")
expect(result).toHaveProperty("ffmpeg")
expect(result).toHaveProperty("ffprobe")
expect(typeof result.ffmpeg).toBe("boolean")
})
})
// --- Detect ---
describe("detect", () => {
let tmpDir: string
beforeAll(async () => {
tmpDir = await fs.mkdtemp(path.join(os.tmpdir(), "evidence-detect-"))
})
afterAll(async () => {
if (tmpDir) await fs.rm(tmpDir, { recursive: true, force: true })
})
test("detects web-app from package.json with react", async () => {
const dir = path.join(tmpDir, "webapp")
await fs.mkdir(dir)
await fs.writeFile(
path.join(dir, "package.json"),
JSON.stringify({ dependencies: { react: "^18.0.0" } }),
)
const { exitCode, stdout } = await run("detect", "--repo-root", dir)
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result.type).toBe("web-app")
})
test("detects cli-tool from package.json with bin field", async () => {
const dir = path.join(tmpDir, "clitool")
await fs.mkdir(dir)
await fs.writeFile(
path.join(dir, "package.json"),
JSON.stringify({ bin: { mycli: "./cli.js" } }),
)
const { exitCode, stdout } = await run("detect", "--repo-root", dir)
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result.type).toBe("cli-tool")
})
test("detects desktop-app from electron dependency", async () => {
const dir = path.join(tmpDir, "electron")
await fs.mkdir(dir)
await fs.writeFile(
path.join(dir, "package.json"),
JSON.stringify({ devDependencies: { electron: "^28.0.0", react: "^18.0.0" } }),
)
const { exitCode, stdout } = await run("detect", "--repo-root", dir)
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result.type).toBe("desktop-app")
})
test("detects library when manifest exists but no web/CLI signals", async () => {
const dir = path.join(tmpDir, "lib")
await fs.mkdir(dir)
await fs.writeFile(
path.join(dir, "package.json"),
JSON.stringify({ name: "my-utils", version: "1.0.0" }),
)
const { exitCode, stdout } = await run("detect", "--repo-root", dir)
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result.type).toBe("library")
})
test("detects text-only when no manifest exists", async () => {
const dir = path.join(tmpDir, "textonly")
await fs.mkdir(dir)
await fs.writeFile(path.join(dir, "README.md"), "# Hello")
const { exitCode, stdout } = await run("detect", "--repo-root", dir)
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result.type).toBe("text-only")
})
test("electron takes priority over web-app", async () => {
const dir = path.join(tmpDir, "electron-react")
await fs.mkdir(dir)
await fs.writeFile(
path.join(dir, "package.json"),
JSON.stringify({ dependencies: { react: "^18.0.0" }, devDependencies: { electron: "^28.0.0" } }),
)
const { exitCode, stdout } = await run("detect", "--repo-root", dir)
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result.type).toBe("desktop-app")
})
test("detects web-app from Gemfile with rails", async () => {
const dir = path.join(tmpDir, "rails")
await fs.mkdir(dir)
await fs.writeFile(path.join(dir, "Gemfile"), 'gem "rails", "~> 7.0"')
const { exitCode, stdout } = await run("detect", "--repo-root", dir)
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result.type).toBe("web-app")
})
test("detects cli-tool from go.mod with cmd/ directory", async () => {
const dir = path.join(tmpDir, "gocli")
await fs.mkdir(dir)
await fs.writeFile(path.join(dir, "go.mod"), "module example.com/mycli\n\ngo 1.21")
await fs.mkdir(path.join(dir, "cmd"))
const { exitCode, stdout } = await run("detect", "--repo-root", dir)
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result.type).toBe("cli-tool")
})
})
// --- Recommend ---
describe("recommend", () => {
const allTools = '{"agent_browser":true,"vhs":true,"silicon":true,"ffmpeg":true,"ffprobe":true}'
const noTools = '{"agent_browser":false,"vhs":false,"silicon":false,"ffmpeg":false,"ffprobe":false}'
test("web-app with browser + ffmpeg recommends browser-reel", async () => {
const { exitCode, stdout } = await run(
"recommend", "--project-type", "web-app", "--change-type", "states", "--tools", allTools,
)
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result.recommended).toBe("browser-reel")
})
test("cli-tool with motion + vhs recommends terminal-recording", async () => {
const { exitCode, stdout } = await run(
"recommend", "--project-type", "cli-tool", "--change-type", "motion", "--tools", allTools,
)
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result.recommended).toBe("terminal-recording")
})
test("cli-tool with states + silicon recommends screenshot-reel", async () => {
const tools = '{"agent_browser":false,"vhs":false,"silicon":true,"ffmpeg":true,"ffprobe":true}'
const { exitCode, stdout } = await run(
"recommend", "--project-type", "cli-tool", "--change-type", "states", "--tools", tools,
)
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result.recommended).toBe("screenshot-reel")
})
test("library always recommends static-screenshots", async () => {
const { exitCode, stdout } = await run(
"recommend", "--project-type", "library", "--change-type", "states", "--tools", allTools,
)
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result.recommended).toBe("static-screenshots")
})
test("no tools always falls back to static-screenshots", async () => {
const { exitCode, stdout } = await run(
"recommend", "--project-type", "cli-tool", "--change-type", "motion", "--tools", noTools,
)
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result.recommended).toBe("static-screenshots")
})
test("available list includes only tiers with tools present", async () => {
const tools = '{"agent_browser":false,"vhs":true,"silicon":false,"ffmpeg":true,"ffprobe":true}'
const { exitCode, stdout } = await run(
"recommend", "--project-type", "cli-tool", "--change-type", "motion", "--tools", tools,
)
expect(exitCode).toBe(0)
const result = JSON.parse(stdout.trim())
expect(result.available).toContain("terminal-recording")
expect(result.available).toContain("static-screenshots")
expect(result.available).not.toContain("browser-reel")
expect(result.available).not.toContain("screenshot-reel")
})
})
// --- Stitch arg validation ---
describe("stitch arg validation", () => {
test("stitch with no args fails", async () => {
const { exitCode } = await run("stitch")
expect(exitCode).not.toBe(0)
})
test("stitch fails on missing frame file", async () => {
const { exitCode, stderr } = await run(
"stitch", "out.gif", "/tmp/nonexistent-frame-abc123.png",
)
expect(exitCode).toBe(1)
expect(stderr).toContain("Frame not found")
})
test("upload fails on missing file", async () => {
const { exitCode, stderr } = await run(
"upload", "/tmp/nonexistent-file-abc123.gif",
)
expect(exitCode).toBe(1)
expect(stderr).toContain("File not found")
})
})
// --- Stitch integration (requires ffmpeg) ---
describe("stitch integration", () => {
let tmpDir: string
let hasFFmpeg: boolean
beforeAll(async () => {
const proc = Bun.spawn(["which", "ffmpeg"], {
stdout: "pipe",
stderr: "pipe",
})
hasFFmpeg = (await proc.exited) === 0
if (!hasFFmpeg) return
tmpDir = await fs.mkdtemp(path.join(os.tmpdir(), "evidence-test-"))
const red = createTestPng([255, 0, 0])
const green = createTestPng([0, 255, 0])
const blue = createTestPng([0, 0, 255])
await fs.writeFile(path.join(tmpDir, "frame1.png"), red)
await fs.writeFile(path.join(tmpDir, "frame2.png"), green)
await fs.writeFile(path.join(tmpDir, "frame3.png"), blue)
})
afterAll(async () => {
if (tmpDir) await fs.rm(tmpDir, { recursive: true, force: true })
})
test("stitches frames into a GIF", async () => {
if (!hasFFmpeg) {
console.log("Skipping: ffmpeg not available")
return
}
const output = path.join(tmpDir, "output.gif")
const { exitCode, stdout } = await run(
"stitch", "--duration", "0.5", output,
path.join(tmpDir, "frame1.png"),
path.join(tmpDir, "frame2.png"),
)
expect(exitCode).toBe(0)
expect(stdout).toContain("Stitching 2 frames")
expect(stdout).toContain("Created:")
const stat = await fs.stat(output)
expect(stat.size).toBeGreaterThan(0)
const header = Buffer.alloc(6)
const fh = await fs.open(output, "r")
await fh.read(header, 0, 6)
await fh.close()
expect(header.toString("ascii").startsWith("GIF")).toBe(true)
})
test("stitches 3 frames into a GIF", async () => {
if (!hasFFmpeg) {
console.log("Skipping: ffmpeg not available")
return
}
const output = path.join(tmpDir, "output3.gif")
const { exitCode, stdout } = await run(
"stitch", "--duration", "0.5", output,
path.join(tmpDir, "frame1.png"),
path.join(tmpDir, "frame2.png"),
path.join(tmpDir, "frame3.png"),
)
expect(exitCode).toBe(0)
expect(stdout).toContain("Stitching 3 frames")
})
test("default duration is used when --duration not specified", async () => {
if (!hasFFmpeg) {
console.log("Skipping: ffmpeg not available")
return
}
const output = path.join(tmpDir, "output-default-dur.gif")
const { exitCode, stdout } = await run(
"stitch", output,
path.join(tmpDir, "frame1.png"),
path.join(tmpDir, "frame2.png"),
)
expect(exitCode).toBe(0)
expect(stdout).toContain("Created:")
})
})
})