feat: rewrite frontend-design skill with layered architecture and visual verification (#343)

Trevin Chow
2026-03-22 18:55:58 -07:00
committed by GitHub
parent 341c379168
commit 423e692726
4 changed files with 634 additions and 28 deletions

@@ -0,0 +1,187 @@
# Frontend Design Skill Improvement
**Date:** 2026-03-22
**Status:** Design approved, pending implementation plan
**Scope:** Rewrite `frontend-design` skill + surgical addition to `ce:work-beta`
## Context
The current `frontend-design` skill (43 lines) is a brief aesthetic manifesto forked from the Anthropic official skill. It emphasizes bold design and avoiding AI slop but lacks practical structure, concrete constraints, context-specific guidance, and any verification mechanism.
Two external sources informed this redesign:
- **Anthropic's official frontend-design skill** -- nearly identical to ours, same gaps
- **OpenAI's frontend skill** (from their "Designing Delightful Frontends with GPT-5.4" article, March 2026) -- dramatically more comprehensive with composition rules, context modules, card philosophy, copy guidelines, motion specifics, and litmus checks
Additionally, the beta workflow (`ce:plan-beta` -> `deepen-plan-beta` -> `ce:work-beta`) has no mechanism to invoke the frontend-design skill. The old `deepen-plan` discovered and applied it dynamically; `deepen-plan-beta` uses deterministic agent mapping and skips skill discovery entirely. The skill is effectively orphaned in the beta workflow.
## Design Decisions
### Authority Hierarchy
Every rule in the skill is a default, not a mandate:
1. **Existing design system / codebase patterns** -- highest priority, always respected
2. **User's explicit instructions** -- override skill defaults
3. **Skill defaults** -- only fully apply in greenfield or when user asks for design guidance
This addresses a key weakness in OpenAI's approach: their rules read as absolutes ("No cards by default", "Full-bleed hero only") without escape hatches. Users who want cards in the hero shouldn't fight their own tooling.
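As a minimal sketch of the hierarchy above (the function name `resolve_rule` and its parameters are hypothetical, not part of the skill), a rule resolves to the first source that has an opinion:

```python
from typing import Optional


def resolve_rule(
    design_system: Optional[str],
    user_instruction: Optional[str],
    skill_default: str,
) -> str:
    """Pick the winning value: design system > user instruction > skill default.

    All names here are illustrative; the skill expresses this as prose guidance.
    """
    if design_system is not None:
        return design_system
    if user_instruction is not None:
        return user_instruction
    return skill_default


# A user who wants cards in the hero overrides the "cardless" default.
print(resolve_rule(None, "cards in hero", "cardless hero"))
```

The skill default only surfaces when neither higher-priority source is present, which is what makes every rule a default rather than a mandate.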
### Layered Architecture
The skill is structured as layers:
- **Layer 0: Context Detection** -- examine codebase for existing design signals before doing anything. Short-circuits opinionated guidance when established patterns exist.
- **Layer 1: Pre-Build Planning** -- visual thesis + content plan + interaction plan (3 short statements). Adapts to greenfield vs existing codebase.
- **Layer 2: Design Guidance Core** -- always-applicable principles (typography, color, composition, motion, accessibility, imagery). All yield to existing systems.
- **Context Modules** -- agent selects one based on what's being built:
  - Module A: Landing pages & marketing (greenfield)
  - Module B: Apps & dashboards (greenfield)
  - Module C: Components & features (default when working inside an existing app, regardless of what's being built)
### Layer 0: Detection Signals (Concrete Checklist)
The agent looks for these specific signals when classifying the codebase:
- **Design tokens / CSS variables**: `--color-*`, `--spacing-*`, `--font-*` custom properties, theme files
- **Component libraries**: shadcn/ui, Material UI, Chakra, Ant Design, Radix, or project-specific component directories
- **CSS frameworks**: `tailwind.config.*`, `styled-components` theme, Bootstrap imports, CSS modules with consistent naming
- **Typography**: Font imports in HTML/CSS, `@font-face` declarations, Google Fonts links
- **Color palette**: Defined color scales, brand color files, design token exports
- **Animation libraries**: Framer Motion, GSAP, anime.js, Motion One, Vue Transition imports
- **Spacing / layout patterns**: Consistent spacing scale usage, grid systems, layout components
**Mode classification:**
- **Existing system**: 4+ signals detected across multiple categories. Defer to it.
- **Partial system**: 1-3 signals detected. Apply skill defaults where no convention was detected; yield to detected conventions where they exist.
- **Greenfield**: No signals detected. Full skill guidance applies.
- **Ambiguous**: Signals are contradictory or unclear. Ask the user.
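The thresholds above can be sketched as a small classifier. This is illustrative only: the skill is judgment-based and ships no script, and the names `Mode` and `classify_mode` are hypothetical. The handling of 4+ signals that all fall in a single category is an assumption, since the brainstorm does not specify it.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    EXISTING = "existing"
    PARTIAL = "partial"
    GREENFIELD = "greenfield"
    AMBIGUOUS = "ambiguous"


def classify_mode(signal_categories: List[str], contradictory: bool = False) -> Mode:
    """Classify the codebase from detected signals.

    signal_categories holds one category name per detected signal,
    e.g. ["tokens", "tokens", "css-framework"].
    """
    if contradictory:
        return Mode.AMBIGUOUS  # the agent should ask the user
    if not signal_categories:
        return Mode.GREENFIELD
    spans_categories = len(set(signal_categories)) > 1
    if len(signal_categories) >= 4 and spans_categories:
        return Mode.EXISTING
    # Assumption: 4+ signals confined to one category fall back to partial.
    return Mode.PARTIAL


print(classify_mode(["tokens", "css-framework", "typography", "animation"]))
```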
### Interaction Method for User Questions
When Layer 0 needs to ask the user (ambiguous detection), use the platform's blocking question tool:
- Claude Code: `AskUserQuestion`
- Codex: `request_user_input`
- Gemini CLI: `ask_user`
- Fallback: If no question tool is available, assume "partial" mode and proceed conservatively.
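A rough sketch of that lookup with its fallback follows. The platform keys (`"claude-code"`, `"codex"`, `"gemini-cli"`) and the `ask` callback are hypothetical stand-ins; only the tool names come from the list above.

```python
from typing import Callable, Optional

# Tool names are from the brainstorm; the dictionary keys are assumed labels.
QUESTION_TOOLS = {
    "claude-code": "AskUserQuestion",
    "codex": "request_user_input",
    "gemini-cli": "ask_user",
}


def resolve_ambiguity(platform: Optional[str], ask: Callable[[str], str]) -> str:
    """Ask the user via the platform's question tool, or fall back to 'partial'."""
    tool = QUESTION_TOOLS.get(platform or "")
    if tool is None:
        # No blocking question tool available: proceed conservatively.
        return "partial"
    return ask(tool)


# The lambda stands in for a real tool invocation.
print(resolve_ambiguity("unknown-platform", lambda tool: "greenfield"))
```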
### Where We Improve Beyond OpenAI
1. **Accessibility as a first-class concern** -- OpenAI's skill is pure aesthetics. We include semantic HTML, contrast ratios, focus states as peers of typography and color.
2. **Existing codebase integration** -- OpenAI has one exception line buried in the rules. We make context detection the first step and add Module C specifically for "adding a feature to an existing app" -- the most common real-world case that both OpenAI and Anthropic ignore entirely.
3. **Defaults with escape hatches** -- Two-tier anti-pattern system: "default against" (overridable preferences) vs "always avoid" (genuine quality failures). OpenAI mixes these in a flat list.
4. **Framework-aware animation defaults** -- OpenAI assumes Framer Motion. We detect existing animation libraries first. When no existing library is found, the default is framework-conditional: CSS animations as the universal baseline, Framer Motion for React, Vue Transition / Motion One for Vue, Svelte transitions for Svelte.
5. **Visual self-verification** -- Neither OpenAI nor Anthropic has any verification step. We add a browser-based screenshot + assessment step with a tool preference cascade:
   1. Existing project browser tooling (Playwright, Puppeteer, etc.)
   2. Browser MCP tools (claude-in-chrome, etc.)
   3. agent-browser CLI (default when nothing else exists -- load the `agent-browser` skill for setup)
   4. Mental review against litmus checks (last resort)
6. **Responsive guidance** -- kept light (trust smart models) but present, unlike OpenAI's single mention.
7. **Performance awareness** -- careful balance, noting that heavy animations and multiple font imports have costs, without being prescriptive about specific thresholds.
8. **Copy guidance without arbitrary thresholds** -- OpenAI says "if deleting 30% of the copy improves the page, keep deleting." We use: "Every sentence should earn its place. Default to less copy, not more."
### Scope Control on Verification
Visual verification is a sanity check, not a pixel-perfect review. One pass. If there's a glaring issue, fix it. If it looks solid, move on. The goal is catching "this clearly doesn't work" before the user sees it.
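The tool preference cascade and the single-pass scope could be sketched as follows. The identifiers (`CASCADE`, `pick_verification_tool`, the tool labels) are hypothetical; the ordering is the one given in the brainstorm.

```python
from typing import Set

# Ordered by preference; the last entry needs no tooling at all.
CASCADE = [
    "project-browser-tooling",  # e.g. Playwright or Puppeteer already in the repo
    "browser-mcp",
    "agent-browser-cli",
    "mental-review",
]


def pick_verification_tool(available: Set[str]) -> str:
    """Return the first available tool in cascade order, else mental review."""
    for tool in CASCADE[:-1]:
        if tool in available:
            return tool
    return CASCADE[-1]


# One pass only: pick a tool, take a screenshot, fix glaring issues, move on.
print(pick_verification_tool({"browser-mcp", "agent-browser-cli"}))
```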
### ce:work-beta Integration
A small addition to Phase 2 (Execute), after the existing Figma Design Sync section:
**UI task detection heuristic:** A task is a "UI task" if any of these are true:
- The task's implementation files include view, template, component, layout, or page files
- The task creates new user-visible routes or pages
- The plan text contains explicit "UI", "frontend", "design", "layout", or "styling" language
- The task references building or modifying something the user will see in a browser
The agent uses judgment -- these are heuristics, not a rigid classifier.
**What ce:work-beta adds:**
> For UI tasks without a Figma design, load the `frontend-design` skill before implementing. Follow its detection, guidance, and verification flow.
This is intentionally minimal:
- Doesn't duplicate skill content into ce:work-beta
- Doesn't load the skill for non-UI tasks
- Doesn't load the skill when Figma designs exist (Figma sync covers that)
- Doesn't change any other phase
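To make the trigger concrete, here is a hedged sketch of the heuristic and the load decision. The `Task` shape, its field names, and the substring matching are assumptions for illustration; the agent is expected to use judgment rather than a rigid classifier like this one.

```python
import re
from dataclasses import dataclass, field
from typing import List

UI_FILE_HINTS = ("view", "template", "component", "layout", "page")
UI_PLAN_PATTERN = re.compile(r"\b(ui|frontend|design|layout|styling)\b", re.IGNORECASE)


@dataclass
class Task:
    # Hypothetical task record; ce:work-beta does not define this structure.
    files: List[str] = field(default_factory=list)
    plan_text: str = ""
    creates_user_visible_route: bool = False
    browser_visible: bool = False
    has_figma_design: bool = False


def is_ui_task(task: Task) -> bool:
    """True if any of the four heuristic conditions holds."""
    touches_ui_file = any(
        hint in path.lower() for path in task.files for hint in UI_FILE_HINTS
    )
    mentions_ui = bool(UI_PLAN_PATTERN.search(task.plan_text))
    return (
        touches_ui_file
        or task.creates_user_visible_route
        or mentions_ui
        or task.browser_visible
    )


def should_load_frontend_design(task: Task) -> bool:
    """Load the skill only for UI tasks that have no Figma design."""
    return is_ui_task(task) and not task.has_figma_design


print(should_load_frontend_design(Task(files=["app/components/Button.tsx"])))
```

Substring matching on file paths will produce false positives (any path containing "page", for instance), which is one reason the brainstorm treats these as heuristics.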
**Verification screenshot reuse:** The frontend-design skill's visual verification screenshot satisfies ce:work-beta Phase 4's screenshot requirement. The agent does not need to screenshot twice -- the skill's verification output is reused for the PR.
**Relationship to design-iterator agent:** The frontend-design skill's verification is a single sanity-check pass. For iterative refinement beyond that (multiple rounds of screenshot-assess-fix), see the `design-iterator` agent. The skill does not invoke design-iterator automatically.
## Files Changed
| File | Change |
|------|--------|
| `plugins/compound-engineering/skills/frontend-design/SKILL.md` | Full rewrite |
| `plugins/compound-engineering/skills/ce-work-beta/SKILL.md` | Add ~5 lines to Phase 2 |
## Skill Description (Optimized)
```yaml
name: frontend-design
description: Build web interfaces with genuine design quality, not AI slop. Use for
any frontend work: landing pages, web apps, dashboards, admin panels, components,
interactive experiences. Activates for both greenfield builds and modifications to
existing applications. Detects existing design systems and respects them. Covers
composition, typography, color, motion, and copy. Verifies results via screenshots
before declaring done.
```
## Skill Structure (frontend-design/SKILL.md)
```
Frontmatter (name, description)
Preamble (what, authority hierarchy, workflow preview)
Layer 0: Context Detection
- Detect existing design signals
- Choose mode: existing / partial / greenfield
- Ask user if ambiguous
Layer 1: Pre-Build Planning
- Visual thesis (one sentence)
- Content plan (what goes where)
- Interaction plan (2-3 motion ideas)
Layer 2: Design Guidance Core
- Typography (2 typefaces max, distinctive choices, yields to existing)
- Color & Theme (CSS variables, one accent, no purple bias, yields to existing)
- Composition (poster mindset, cardless default, whitespace before chrome)
- Motion (2-3 intentional motions, use existing library, framework-conditional defaults)
- Accessibility (semantic HTML, WCAG AA contrast, focus states)
- Imagery (real photos, stable tonal areas, image generation when available)
Context Modules (select one)
- A: Landing Pages & Marketing (greenfield -- hero rules, section sequence, copy as product language)
- B: Apps & Dashboards (greenfield -- calm surfaces, utility copy, minimal chrome)
- C: Components & Features (default in existing apps -- match existing, inherit tokens, focus on states)
Hard Rules & Anti-Patterns
- Default against (overridable): generic card grids, purple bias, overused fonts, etc.
- Always avoid (quality floor): prompt language in UI, broken contrast, missing focus states
Litmus Checks
- Context-sensitive self-review questions
Visual Verification
- Tool cascade: existing > MCP > agent-browser > mental review
- One iteration, sanity check scope
- Include screenshot in deliverable
```
## What We Keep From Current Skill
- Strong anti-AI-slop identity and messaging
- Creative energy / encouragement to be bold in greenfield work
- Tone-picking exercise (brutally minimal, maximalist chaos, retro-futuristic...)
- "Differentiation" prompt: what makes this unforgettable?
- Framework-agnostic approach (HTML/CSS/JS, React, Vue, etc.)
## Cross-Agent Compatibility
Per AGENTS.md rules:
- Describe tools by capability class with platform hints, not Claude-specific names alone
- Use platform-agnostic question patterns (name known equivalents + fallback)
- No shell recipes for routine exploration
- Reference co-located scripts with relative paths
- Skill is written once, copied as-is to other platforms

@@ -0,0 +1,190 @@
---
title: "feat: Rewrite frontend-design skill with layered architecture and visual verification"
type: feat
status: completed
date: 2026-03-22
origin: docs/brainstorms/2026-03-22-frontend-design-skill-improvement.md
---
# feat: Rewrite frontend-design skill with layered architecture and visual verification
## Overview
Rewrite the `frontend-design` skill from a 43-line aesthetic manifesto into a structured, layered skill that detects existing design systems, provides context-specific guidance, and verifies its own output via browser screenshots. Add a surgical trigger in `ce-work-beta` to load the skill for UI tasks without Figma designs.
## Problem Frame
The current skill provides vague creative encouragement ("be bold", "choose a BOLD aesthetic direction") but lacks practical structure. It has no mechanism to detect existing design systems, no context-specific guidance (landing pages vs dashboards vs components in existing apps), no concrete constraints, no accessibility guidance, and no verification step. The beta workflow (`ce:plan-beta` -> `deepen-plan-beta` -> `ce:work-beta`) has no way to invoke it -- the skill is effectively orphaned.
Two external sources informed the redesign: Anthropic's official frontend-design skill (nearly identical to ours, same gaps) and OpenAI's comprehensive frontend skill from March 2026 (see origin: `docs/brainstorms/2026-03-22-frontend-design-skill-improvement.md`).
## Requirements Trace
- R1. Detect existing design systems before applying opinionated guidance (Layer 0)
- R2. Enforce authority hierarchy: existing design system > user instructions > skill defaults
- R3. Provide pre-build planning step (visual thesis, content plan, interaction plan)
- R4. Cover typography, color, composition, motion, accessibility, and imagery with concrete constraints
- R5. Provide context-specific modules: landing pages, apps/dashboards, components/features
- R6. Module C (components/features) is the default when working in an existing app
- R7. Two-tier anti-pattern system: overridable defaults vs quality floor
- R8. Visual self-verification via browser screenshot with tool cascade
- R9. Cross-agent compatibility (Claude Code, Codex, Gemini CLI)
- R10. ce-work-beta loads the skill for UI tasks without Figma designs
- R11. Verification screenshot reuse -- skill's screenshot satisfies ce-work-beta Phase 4's requirement
## Scope Boundaries
- The `frontend-design` skill itself handles all design guidance and verification. ce-work-beta gets only a trigger.
- ce-work (non-beta) is not modified.
- The design-iterator agent is not modified. The skill does not invoke it.
- The agent-browser skill is upstream-vendored and not modified.
- The design-iterator's `<frontend_aesthetics>` block (which duplicates current skill content) is not cleaned up in this plan -- that is a separate follow-up.
## Context & Research
### Relevant Code and Patterns
- `plugins/compound-engineering/skills/frontend-design/SKILL.md` -- target for full rewrite (43 lines currently)
- `plugins/compound-engineering/skills/ce-work-beta/SKILL.md` -- target for surgical Phase 2 addition (lines 210-219, between Figma Design Sync and Track Progress)
- `plugins/compound-engineering/skills/ce-plan-beta/SKILL.md` -- reference for cross-agent interaction patterns (Pattern A: platform's blocking question tool with named equivalents)
- `plugins/compound-engineering/skills/reproduce-bug/SKILL.md` -- reference for cross-agent patterns
- `plugins/compound-engineering/skills/agent-browser/SKILL.md` -- upstream-vendored, reference for browser automation CLI
- `plugins/compound-engineering/agents/design/design-iterator.md` -- contains `<frontend_aesthetics>` block that overlaps with current skill; new skill will supersede this when both are loaded
- `plugins/compound-engineering/AGENTS.md` -- skill compliance checklist (cross-platform interaction, tool selection, reference rules)
### Institutional Learnings
- **Cross-platform tool references** (`docs/solutions/skill-design/compound-refresh-skill-improvements.md`): Never hardcode a single tool name with an escape hatch. Use capability-first language with platform examples and plain-text fallback. Anti-pattern table directly applicable.
- **Beta skills framework** (`docs/solutions/skill-design/beta-skills-framework.md`): frontend-design is NOT a beta skill -- it is a stable skill being improved. ce-work-beta should reference it by its stable name.
- **Codex skill conversion** (`docs/solutions/codex-skill-prompt-entrypoints.md`): Skills are copied as-is to Codex. Slash references inside SKILL.md are NOT rewritten. Use semantic wording ("load the `agent-browser` skill") rather than slash syntax.
- **Context token budget** (`docs/plans/2026-02-08-refactor-reduce-plugin-context-token-usage-plan.md`): Description field's only job is discovery. The proposed 6-line description is well-sized for the budget.
- **Script-first architecture** (`docs/solutions/skill-design/script-first-skill-architecture.md`): When a skill's core value IS the model's judgment, script-first does not apply. Frontend-design is judgment-based. Detection checklist should be inline, not in reference files.
## Key Technical Decisions
- **No `disable-model-invocation`**: The skill should auto-invoke when the model detects frontend work. Current skill does not have it; the rewrite preserves this.
- **Drop `license` frontmatter field**: The current frontend-design skill is the only skill with this field. Drop it for consistency.
- **Inline everything in SKILL.md**: No reference files or scripts directory. The skill is pure guidance (~300-400 lines of markdown). The detection checklist, context modules, anti-patterns, litmus checks, and verification cascade all live in one file.
- **Fix ce-work-beta duplicate numbering**: The current Phase 2 has two items numbered "6." (Figma Design Sync and Track Progress). Fix this while inserting the new section.
- **Framework-conditional animation defaults**: CSS animations as universal baseline. Framer Motion for React, Vue Transition / Motion One for Vue, Svelte transitions for Svelte. Only when no existing animation library is detected.
- **Semantic skill references only**: Reference agent-browser as "load the `agent-browser` skill" not `/agent-browser`. Per AGENTS.md and Codex conversion learnings.
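The framework-conditional animation decision above can be sketched like this. The function name, the lowercase framework keys, and the list return shape are assumptions; the library choices are the ones listed in the decision.

```python
from typing import List, Optional

CSS_BASELINE = "CSS animations"

# Defaults apply only when no existing animation library is detected.
FRAMEWORK_DEFAULTS = {
    "react": ["Framer Motion"],
    "vue": ["Vue Transition", "Motion One"],
    "svelte": ["Svelte transitions"],
}


def animation_options(
    framework: Optional[str], detected_library: Optional[str] = None
) -> List[str]:
    """Return candidate animation approaches, most universal first."""
    if detected_library:
        # An existing library always wins over skill defaults.
        return [detected_library]
    extras = FRAMEWORK_DEFAULTS.get((framework or "").lower(), [])
    return [CSS_BASELINE] + extras


print(animation_options("React"))
```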
## Open Questions
### Resolved During Planning
- **Should the skill have `disable-model-invocation: true`?** No. It should auto-invoke for frontend work. The current skill does not have it.
- **Should Module A/B ever apply in an existing app?** No. When working inside an existing app, always default to Module C regardless of what's being built. Modules A and B are for greenfield work.
- **Should the `license` field be kept?** No. It is unique to this skill and inconsistent with all other skills.
### Deferred to Implementation
- **Exact line count of the rewritten skill**: Estimated 300-400 lines. The implementer should prioritize clarity over brevity but avoid bloat.
- **Whether the design-iterator's `<frontend_aesthetics>` block needs updating**: Out of scope. The new skill supersedes it when loaded. Cleanup is a separate follow-up.
## Implementation Units
- [x] **Unit 1: Rewrite frontend-design SKILL.md**
**Goal:** Replace the 43-line aesthetic manifesto with the full layered skill covering detection, planning, guidance, context modules, anti-patterns, litmus checks, and visual verification.
**Requirements:** R1, R2, R3, R4, R5, R6, R7, R8, R9
**Dependencies:** None
**Files:**
- Modify: `plugins/compound-engineering/skills/frontend-design/SKILL.md`
**Approach:**
- Full rewrite preserving only the `name` field from current frontmatter
- Use the optimized description from the brainstorm doc (see origin: Section "Skill Description (Optimized)")
- Structure as: Frontmatter -> Preamble (authority hierarchy, workflow preview) -> Layer 0 (context detection with concrete checklist, mode classification, cross-platform question pattern) -> Layer 1 (pre-build planning) -> Layer 2 (design guidance core with subsections for typography, color, composition, motion, accessibility, imagery) -> Context Modules (A/B/C) -> Hard Rules & Anti-Patterns (two tiers) -> Litmus Checks -> Visual Verification (tool cascade with scope control)
- Carry forward from current skill: anti-AI-slop identity, creative energy for greenfield, tone-picking exercise, differentiation prompt
- Apply AGENTS.md skill compliance checklist: imperative voice, capability-first tool references with platform examples, semantic skill references, no shell recipes for exploration, cross-platform question patterns with fallback
- All rules framed as defaults that yield to existing design systems and user instructions
- Copy guidance uses "Every sentence should earn its place. Default to less copy, not more." (not arbitrary percentage thresholds)
- Animation defaults are framework-conditional: CSS baseline, then Framer Motion (React), Vue Transition/Motion One (Vue), Svelte transitions (Svelte)
- Visual verification cascade: existing project tooling -> browser MCP tools -> agent-browser CLI (load the `agent-browser` skill for setup) -> mental review as last resort
- One verification pass with scope control ("sanity check, not pixel-perfect review")
- Note relationship to design-iterator: "For iterative refinement beyond a single pass, see the `design-iterator` agent"
**Patterns to follow:**
- `plugins/compound-engineering/skills/ce-plan-beta/SKILL.md` -- cross-agent interaction pattern (Pattern A)
- `plugins/compound-engineering/skills/reproduce-bug/SKILL.md` -- cross-agent tool reference pattern
- `plugins/compound-engineering/AGENTS.md` -- skill compliance checklist
- `docs/solutions/skill-design/compound-refresh-skill-improvements.md` -- anti-pattern table for tool references
**Test scenarios:**
- Skill passes all items in the AGENTS.md skill compliance checklist
- Description field is present and follows "what + when" format
- No hardcoded Claude-specific tool names without platform equivalents
- No slash references to other skills (uses semantic wording)
- No `TodoWrite`/`TodoRead` references
- No shell commands for routine file exploration
- Cross-platform question pattern includes AskUserQuestion, request_user_input, ask_user, and a fallback
- All design rules explicitly framed as defaults (not absolutes)
- Layer 0 detection checklist is concrete (specific file patterns and config names)
- Mode classification has clear thresholds (4+ signals = existing, 1-3 = partial, 0 = greenfield)
- Visual verification section references agent-browser semantically ("load the `agent-browser` skill")
**Verification:**
- `grep -E 'description:' plugins/compound-engineering/skills/frontend-design/SKILL.md` returns the optimized description
- ``grep -E '^`(references|assets|scripts)/[^`]+`' plugins/compound-engineering/skills/frontend-design/SKILL.md`` returns nothing (no unlinked references)
- Manual review confirms the layered structure matches the brainstorm doc's "Skill Structure" outline
- `bun run release:validate` passes
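A few of the test scenarios above lend themselves to a mechanical check. The sketch below is an assumption-laden illustration, not the project's validator: `check_skill` is a hypothetical helper, it covers only four of the scenarios, and the slash-reference pattern looks for just two skill names.

```python
import re
from typing import List

REQUIRED_QUESTION_TOOLS = ("AskUserQuestion", "request_user_input", "ask_user")


def check_skill(text: str) -> List[str]:
    """Return a list of compliance problems found in SKILL.md text."""
    problems = []
    if not re.search(r"^description:", text, re.MULTILINE):
        problems.append("missing description field")
    if re.search(r"\bTodo(Write|Read)\b", text):
        problems.append("TodoWrite/TodoRead reference")
    # A slash reference is "/skill-name" not preceded by a path segment.
    if re.search(r"(?<![\w/])/(agent-browser|frontend-design)\b", text):
        problems.append("slash reference to another skill")
    missing = [tool for tool in REQUIRED_QUESTION_TOOLS if tool not in text]
    if missing:
        problems.append("missing question tools: " + ", ".join(missing))
    return problems


sample = (
    "---\nname: frontend-design\ndescription: Build web interfaces\n---\n"
    "Use AskUserQuestion, request_user_input, or ask_user. "
    "Load the `agent-browser` skill for setup.\n"
)
print(check_skill(sample))
```

Everything else in the scenario list (defaults framed as defaults, concreteness of the checklist) still needs the manual review called for above.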
- [x] **Unit 2: Add frontend-design trigger to ce-work-beta Phase 2**
**Goal:** Insert a conditional section in ce-work-beta Phase 2 that loads the `frontend-design` skill for UI tasks without Figma designs, and fix the duplicate item numbering.
**Requirements:** R10, R11
**Dependencies:** Unit 1 (the skill must exist in its new form for the reference to be meaningful)
**Files:**
- Modify: `plugins/compound-engineering/skills/ce-work-beta/SKILL.md`
**Approach:**
- Insert new section after Figma Design Sync (line 217) and before Track Progress (line 219)
- New section titled "Frontend Design Guidance (if applicable)", following the same conditional pattern as Figma Design Sync
- Content: UI task detection heuristic (implementation files include views/templates/components/layouts/pages, creates user-visible routes, plan text contains UI/frontend/design language, or task builds something user-visible in browser) + instruction to load the `frontend-design` skill + note that the skill's verification screenshot satisfies Phase 4's screenshot requirement
- Fix duplicate "6." numbering: Figma Design Sync = 6, Frontend Design Guidance = 7, Track Progress = 8
- Keep the addition to ~10 lines including the heuristic and the verification-reuse note
- Use semantic skill reference: "load the `frontend-design` skill" (not slash syntax)
**Patterns to follow:**
- The existing Figma Design Sync section (lines 210-217) -- same conditional "(if applicable)" pattern, same level of brevity
**Test scenarios:**
- New section follows same formatting as Figma Design Sync section
- No duplicate item numbers in Phase 2
- Semantic skill reference used (no slash syntax for frontend-design)
- Verification screenshot reuse is explicit
- `bun run release:validate` passes
**Verification:**
- Phase 2 items are numbered sequentially without duplicates
- The new section references `frontend-design` skill semantically
- The verification-reuse note is present
- `bun run release:validate` passes
## System-Wide Impact
- **Interaction graph:** The frontend-design skill is auto-invocable (no `disable-model-invocation`). When loaded, it may interact with: agent-browser CLI (for verification screenshots), browser MCP tools, or existing project browser tooling. ce-work-beta Phase 2 will conditionally trigger the skill load. The design-iterator agent's `<frontend_aesthetics>` block will be superseded when both the skill and agent are active in the same context.
- **Error propagation:** If browser tooling is unavailable for verification, the skill falls back to mental review. No hard failure path.
- **State lifecycle risks:** None. This is markdown document work -- no runtime state, no data, no migrations.
- **API surface parity:** The skill description change affects how Claude discovers and triggers the skill. The new description is broader (covers existing app modifications) which may increase trigger rate.
- **Integration coverage:** The primary integration is ce-work-beta -> frontend-design skill -> agent-browser. This flow should be manually tested end-to-end with a UI task in the beta workflow.
## Risks & Dependencies
- **Trigger rate change:** The broader description may cause the skill to trigger for borderline cases (e.g., a task that touches one CSS class). Mitigated by the Layer 0 detection step which will quickly identify "existing system" mode and short-circuit most opinionated guidance.
- **Skill length:** Estimated 300-400 lines is substantial for a skill body. Mitigated by the layered architecture -- an agent in "existing system" mode can skip Layer 2's opinionated sections entirely.
- **design-iterator overlap:** The design-iterator's `<frontend_aesthetics>` block now partially duplicates the skill's Layer 2 content. Not a functional problem (the skill supersedes when loaded) but creates maintenance overhead. Flagged for follow-up cleanup.
## Sources & References
- **Origin document:** [docs/brainstorms/2026-03-22-frontend-design-skill-improvement.md](docs/brainstorms/2026-03-22-frontend-design-skill-improvement.md)
- Related code: `plugins/compound-engineering/skills/frontend-design/SKILL.md`, `plugins/compound-engineering/skills/ce-work-beta/SKILL.md`
- External inspiration: Anthropic official frontend-design skill, OpenAI "Designing Delightful Frontends with GPT-5.4" skill (March 2026)
- Institutional learnings: `docs/solutions/skill-design/compound-refresh-skill-improvements.md`, `docs/solutions/skill-design/beta-skills-framework.md`, `docs/solutions/codex-skill-prompt-entrypoints.md`