---
title: Offload data processing to bundled scripts to reduce token consumption
category: skill-design
date: 2026-03-17
tags:
  - token-optimization
  - skill-architecture
  - bundled-scripts
  - data-processing
severity: high
component: plugins/compound-engineering/skills
---

Script-First Skill Architecture

When a skill processes large datasets (session transcripts, log files, configuration inventories), having the model do the processing is a token-expensive anti-pattern. Moving data processing into a bundled Node.js script and having the model present the results cuts token usage by 60-75%.

Origin

Learned while building the claude-permissions-optimizer skill, which analyzes Claude Code session transcripts to find safe Bash commands to auto-allow. Initial iterations had the model reading JSONL session files, classifying commands against a 370-line reference doc, and normalizing patterns -- averaging 85-115k tokens per run. After moving all processing into the extraction script, runs dropped to ~40k tokens with equivalent output quality.

The Anti-Pattern: Model-as-Processor

The default instinct when building a skill that touches data is to have the model read everything into context, parse it, classify it, and reason about it. This works for small inputs but scales terribly:

  • Token usage grows linearly with data volume
  • Most tokens are spent on mechanical work (parsing JSON, matching patterns, counting frequencies)
  • Loading reference docs for classification rules inflates context further
  • The model's actual judgment contributes almost nothing to the classification output

The Pattern: Script Produces, Model Presents

```
skills/<skill-name>/
  SKILL.md              # Instructions: run script, present output
  scripts/
    process.mjs         # Does ALL data processing, outputs JSON
```

  1. Script does all mechanical work. Reading files, parsing structured formats, applying classification rules (regex, keyword lists), normalizing results, computing counts. Outputs pre-classified JSON to stdout.

  2. SKILL.md instructs presentation only. Run the script, read the JSON, format it for the user. Explicitly prohibit re-classifying, re-parsing, or loading reference files.

  3. Single source of truth for rules. Classification logic lives exclusively in the script. The SKILL.md references the script's output categories as given facts but does not define them.
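The three responsibilities above can be sketched as a single script. This is a minimal illustration, not the actual plugin's code: the rule set, the categories, and the JSONL field names (`tool`, `command`) are all assumptions for the sake of the example.

```javascript
// Hypothetical sketch of a scripts/process.mjs core. Rules, categories, and
// JSONL field names are illustrative assumptions, not the real plugin's.

// Single source of truth: deterministic classification rules.
const RULES = [
  { category: 'readonly', pattern: /^(git (status|log|diff)|ls|cat|grep)\b/ },
  { category: 'build',    pattern: /^(npm (test|run)|node|make)\b/ },
];

function classifyCommand(cmd) {
  const rule = RULES.find((r) => r.pattern.test(cmd));
  return rule ? rule.category : 'unclassified'; // never handed to the model
}

// All mechanical work: parse JSONL, filter Bash events, classify, count.
function processTranscript(jsonlText) {
  const counts = {};
  for (const line of jsonlText.split('\n')) {
    if (!line.trim()) continue;
    const event = JSON.parse(line); // JSONL: one JSON object per line
    if (event.tool !== 'Bash') continue;
    const category = classifyCommand(event.command);
    counts[category] ??= {};
    counts[category][event.command] = (counts[category][event.command] ?? 0) + 1;
  }
  return counts;
}
```

A real script would take the transcript path from `process.argv`, call `processTranscript` on the file contents, and `console.log(JSON.stringify(result, null, 2))` so that SKILL.md can instruct the model to present the JSON as-is.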

Token Impact

| Approach | Tokens | Reduction |
|---|---|---|
| Model does everything (read, parse, classify, present) | ~100k | baseline |
| Added "do NOT grep session files" instruction | ~84k | 16% |
| Script classifies; model still loads reference doc | ~38k | 62% |
| Script classifies; model presents only | ~35k | 65% |

The biggest single win was moving classification into the script. The second was removing the instruction to load the reference file -- once the script handles classification, the reference file is maintenance documentation only.
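In the winning configuration, the SKILL.md body reduces to a few presentation-only instructions. A hypothetical sketch of what such a file might say (the wording and skill name are illustrative):

```markdown
# <skill-name>

As your FIRST action, run `scripts/process.mjs <transcript-path>` and read
its JSON output.

Present the output to the user as a summary grouped by category.

Do NOT:
- re-classify or second-guess the script's categories
- read or grep session files directly
- load reference docs -- they exist for maintainers only
```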

When to Apply

Apply script-first architecture when a skill meets any of these:

  • Processes more than ~50 items or reads files larger than a few KB
  • Classification rules are deterministic (regex, keyword lists, lookup tables)
  • Input data follows a consistent schema (JSONL, CSV, structured logs)
  • The skill runs frequently or feeds into further analysis

Do not apply when:

  • The skill's core value is the model's judgment (code review, architectural analysis)
  • Input is unstructured natural language
  • The dataset is small enough that processing costs are negligible

Anti-Patterns to Avoid

  • Instruction-only optimization. Adding "don't do X" to SKILL.md without providing a script alternative. The model will find other token-expensive paths to the same result.

  • Hybrid classification. Having the script classify some items and the model classify the rest. This still loads context and reference docs. Go all-in on the script. Items the script can't classify should be dropped as "unclassified," not handed to the model.

  • Dual rule definitions. Classification rules in both the script AND the SKILL.md. They drift apart, the model may override the script's decisions, and tokens are wasted on re-evaluation. One source of truth.
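Going all-in means the script's fall-through branch, not the model, owns ambiguity. A hypothetical sketch of the output shape: every item lands in exactly one bucket, and unmatched items surface as an `unclassified` list the model reports but never re-examines (rules and categories are illustrative):

```javascript
// Illustrative: unmatched commands are surfaced, not escalated to the model.
const KNOWN = [
  { category: 'readonly', pattern: /^(ls|cat|git (status|log))\b/ },
];

function bucket(commands) {
  const out = { readonly: [], unclassified: [] };
  for (const cmd of commands) {
    const hit = KNOWN.find((r) => r.pattern.test(cmd));
    (hit ? out[hit.category] : out.unclassified).push(cmd);
  }
  return out;
}
```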

Checklist for Skill Authors

  • All data processing can be expressed as deterministic logic (regex, keyword matching, field checks)
  • Script is the single owner of all classification rules
  • SKILL.md instructs the model to run the script as its first action
  • SKILL.md does not restate or duplicate the script's classification logic
  • Script output is structured JSON the model can present directly
  • Reference docs exist for maintainers but are never loaded at runtime
  • After building, verify the model is not doing any mechanical parsing or rule-application work