---
title: Offload data processing to bundled scripts to reduce token consumption
category: skill-design
date: 2026-03-17
tags:
  - token-optimization
  - skill-architecture
  - bundled-scripts
  - data-processing
severity: high
component: plugins/compound-engineering/skills
---

Script-First Skill Architecture

When a skill processes large datasets (session transcripts, log files, configuration inventories), having the model do the processing is a token-expensive anti-pattern. Moving data processing into a bundled Node.js script and having the model present the results cuts token usage by 60-75%.

Origin

Learned while building the claude-permissions-optimizer skill, which analyzes Claude Code session transcripts to find safe Bash commands to auto-allow. Initial iterations had the model reading JSONL session files, classifying commands against a 370-line reference doc, and normalizing patterns -- averaging 85-115k tokens per run. After moving all processing into the extraction script, runs dropped to ~40k tokens with equivalent output quality.

The Anti-Pattern: Model-as-Processor

The default instinct when building a skill that touches data is to have the model read everything into context, parse it, classify it, and reason about it. This works for small inputs but scales terribly:

  • Token usage grows linearly with data volume
  • Most tokens are spent on mechanical work (parsing JSON, matching patterns, counting frequencies)
  • Loading reference docs for classification rules inflates context further
  • The model's actual judgment contributes almost nothing to the classification output

The Pattern: Script Produces, Model Presents

```
skills/<skill-name>/
  SKILL.md              # Instructions: run script, present output
  scripts/
    process.mjs         # Does ALL data processing, outputs JSON
```

  1. Script does all mechanical work. Reading files, parsing structured formats, applying classification rules (regex, keyword lists), normalizing results, computing counts. Outputs pre-classified JSON to stdout.

  2. SKILL.md instructs presentation only. Run the script, read the JSON, format it for the user. Explicitly prohibit re-classifying, re-parsing, or loading reference files.

  3. Single source of truth for rules. Classification logic lives exclusively in the script. The SKILL.md references the script's output categories as given facts but does not define them.
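The three responsibilities above can be sketched as a single script. This is a minimal illustration, not the actual plugin's code: the rule set, the categories, and the JSONL field names (`tool`, `command`) are all assumptions for the sake of the example.

```javascript
// Hypothetical sketch of a scripts/process.mjs core. Rules, categories, and
// JSONL field names are illustrative assumptions, not the real plugin's.

// Single source of truth: deterministic classification rules.
const RULES = [
  { category: 'readonly', pattern: /^(git (status|log|diff)|ls|cat|grep)\b/ },
  { category: 'build',    pattern: /^(npm (test|run)|node|make)\b/ },
];

function classifyCommand(cmd) {
  const rule = RULES.find((r) => r.pattern.test(cmd));
  return rule ? rule.category : 'unclassified'; // never handed to the model
}

// All mechanical work: parse JSONL, filter Bash events, classify, count.
function processTranscript(jsonlText) {
  const counts = {};
  for (const line of jsonlText.split('\n')) {
    if (!line.trim()) continue;
    const event = JSON.parse(line); // JSONL: one JSON object per line
    if (event.tool !== 'Bash') continue;
    const category = classifyCommand(event.command);
    counts[category] ??= {};
    counts[category][event.command] = (counts[category][event.command] ?? 0) + 1;
  }
  return counts;
}
```

A real script would take the transcript path from `process.argv`, call `processTranscript` on the file contents, and `console.log(JSON.stringify(result, null, 2))` so that SKILL.md can instruct the model to present the JSON as-is.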

Token Impact

| Approach | Tokens | Reduction |
|---|---|---|
| Model does everything (read, parse, classify, present) | ~100k | baseline |
| Added "do NOT grep session files" instruction | ~84k | 16% |
| Script classifies; model still loads reference doc | ~38k | 62% |
| Script classifies; model presents only | ~35k | 65% |

The biggest single win was moving classification into the script. The second was removing the instruction to load the reference file -- once the script handles classification, the reference file is maintenance documentation only.
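In the winning configuration, the SKILL.md body reduces to a few presentation-only instructions. A hypothetical sketch of what such a file might say (the wording and skill name are illustrative):

```markdown
# <skill-name>

As your FIRST action, run `scripts/process.mjs <transcript-path>` and read
its JSON output.

Present the output to the user as a summary grouped by category.

Do NOT:
- re-classify or second-guess the script's categories
- read or grep session files directly
- load reference docs -- they exist for maintainers only
```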

When to Apply

Apply script-first architecture when a skill meets any of these:

  • Processes more than ~50 items or reads files larger than a few KB
  • Classification rules are deterministic (regex, keyword lists, lookup tables)
  • Input data follows a consistent schema (JSONL, CSV, structured logs)
  • The skill runs frequently or feeds into further analysis

Do not apply when:

  • The skill's core value is the model's judgment (code review, architectural analysis)
  • Input is unstructured natural language
  • The dataset is small enough that processing costs are negligible

Anti-Patterns to Avoid

  • Instruction-only optimization. Adding "don't do X" to SKILL.md without providing a script alternative. The model will find other token-expensive paths to the same result.

  • Hybrid classification. Having the script classify some items and the model classify the rest. This still loads context and reference docs. Go all-in on the script. Items the script can't classify should be dropped as "unclassified," not handed to the model.

  • Dual rule definitions. Classification rules in both the script AND the SKILL.md. They drift apart, the model may override the script's decisions, and tokens are wasted on re-evaluation. One source of truth.
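Going all-in means the script's fall-through branch, not the model, owns ambiguity. A hypothetical sketch of the output shape: every item lands in exactly one bucket, and unmatched items surface as an `unclassified` list the model reports but never re-examines (rules and categories are illustrative):

```javascript
// Illustrative: unmatched commands are surfaced, not escalated to the model.
const KNOWN = [
  { category: 'readonly', pattern: /^(ls|cat|git (status|log))\b/ },
];

function bucket(commands) {
  const out = { readonly: [], unclassified: [] };
  for (const cmd of commands) {
    const hit = KNOWN.find((r) => r.pattern.test(cmd));
    (hit ? out[hit.category] : out.unclassified).push(cmd);
  }
  return out;
}
```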

Checklist for Skill Authors

  • All data processing can be expressed as deterministic logic (regex, keyword matching, field checks)
  • Script is the single owner of all classification rules
  • SKILL.md instructs the model to run the script as its first action
  • SKILL.md does not restate or duplicate the script's classification logic
  • Script output is structured JSON the model can present directly
  • Reference docs exist for maintainers but are never loaded at runtime
  • After building, verify the model is not doing any mechanical parsing or rule-application work