| title | category | date | tags | severity | component |
|---|---|---|---|---|---|
| Offload data processing to bundled scripts to reduce token consumption | skill-design | 2026-03-17 | | high | plugins/compound-engineering/skills |
Script-First Skill Architecture
When a skill processes large datasets (session transcripts, log files, configuration inventories), having the model do the processing is a token-expensive anti-pattern. Moving data processing into a bundled Node.js script and having the model present the results cuts token usage by 60-75%.
Origin
Learned while building the claude-permissions-optimizer skill, which analyzes Claude Code session transcripts to find safe Bash commands to auto-allow. Initial iterations had the model reading JSONL session files, classifying commands against a 370-line reference doc, and normalizing patterns -- averaging 85-115k tokens per run. After moving all processing into the extraction script, runs dropped to ~40k tokens with equivalent output quality.
The Anti-Pattern: Model-as-Processor
The default instinct when building a skill that touches data is to have the model read everything into context, parse it, classify it, and reason about it. This works for small inputs but scales terribly:
- Token usage grows linearly with data volume
- Most tokens are spent on mechanical work (parsing JSON, matching patterns, counting frequencies)
- Loading reference docs for classification rules inflates context further
- The model's actual judgment contributes almost nothing to the classification output
The Pattern: Script Produces, Model Presents

    skills/<skill-name>/
      SKILL.md            # Instructions: run script, present output
      scripts/
        process.mjs       # Does ALL data processing, outputs JSON

- Script does all mechanical work: reading files, parsing structured formats, applying classification rules (regex, keyword lists), normalizing results, and computing counts. It outputs pre-classified JSON to stdout.
- SKILL.md instructs presentation only: run the script, read the JSON, format it for the user. It explicitly prohibits re-classifying, re-parsing, or loading reference files.
- Single source of truth for rules. Classification logic lives exclusively in the script. SKILL.md treats the script's output categories as given facts but does not define them.
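A minimal sketch of what such a script might look like, written as plain JavaScript that runs under Node.js. The transcript schema (one JSON object per line, with Bash calls shaped like `{ type: "tool_use", name: "Bash", input: { command } }`), the `RULES` table, and every function name here are hypothetical illustrations, not the claude-permissions-optimizer's actual code. What it shows is the division of labor: parsing, normalizing, counting, and classifying all happen in the script, and anything no rule matches lands in an `unclassified` bucket instead of going back to the model.

```javascript
// Minimal script-first processor, in the spirit of scripts/process.mjs.
// ASSUMPTION: input is JSONL, and Bash tool calls look like
//   { "type": "tool_use", "name": "Bash", "input": { "command": "..." } }
// The real Claude Code transcript schema may differ; adapt extractCommands().

// Single source of truth: classification rules live only in this table.
// These rules are hypothetical examples; the first matching rule wins.
const RULES = [
  { category: "safe-read", pattern: /^(ls|cat|pwd|git (status|log|diff))\b/ },
  { category: "safe-test", pattern: /^(npm test|npx jest|pytest)\b/ },
  { category: "risky", pattern: /^(rm|sudo|curl|git push)\b/ },
];

// Parse JSONL text, skipping blank and malformed lines instead of failing.
function parseJsonl(text) {
  const records = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    try {
      records.push(JSON.parse(line));
    } catch {
      // Malformed line: skip it rather than escalate it to the model.
    }
  }
  return records;
}

// Pull Bash command strings out of parsed records (assumed schema above).
function extractCommands(records) {
  return records
    .filter((r) => r && r.type === "tool_use" && r.name === "Bash")
    .map((r) => String(r.input?.command ?? "").trim())
    .filter((cmd) => cmd.length > 0);
}

// Normalize a command to a coarse pattern: collapse whitespace, keep two tokens.
function normalize(cmd) {
  return cmd.trim().split(/\s+/).slice(0, 2).join(" ");
}

// Classify a pattern; anything no rule matches is "unclassified",
// never handed back to the model for a judgment call.
function classify(pattern) {
  const rule = RULES.find((r) => r.pattern.test(pattern));
  return rule ? rule.category : "unclassified";
}

// Full pipeline: parse, extract, normalize, count, classify.
function processTranscript(text) {
  const counts = new Map();
  for (const cmd of extractCommands(parseJsonl(text))) {
    const pattern = normalize(cmd);
    counts.set(pattern, (counts.get(pattern) ?? 0) + 1);
  }
  const items = [...counts]
    .map(([pattern, count]) => ({ pattern, count, category: classify(pattern) }))
    .sort((a, b) => b.count - a.count || a.pattern.localeCompare(b.pattern));
  const total = items.reduce((sum, item) => sum + item.count, 0);
  return { total, items };
}

// Example run on an in-memory transcript; a real script would read files via node:fs.
const sample = [
  { type: "tool_use", name: "Bash", input: { command: "git status" } },
  { type: "tool_use", name: "Bash", input: { command: "rm -rf build" } },
  { type: "tool_use", name: "Bash", input: { command: "git   status" } },
  { type: "text", text: "not a tool call" },
].map((r) => JSON.stringify(r)).join("\n") + "\nnot json";

console.log(JSON.stringify(processTranscript(sample), null, 2));
```

In an actual `process.mjs`, the text would come from `fs.readFileSync` over the session files, and the final `console.log(JSON.stringify(...))` is the only thing the model ever sees. The two-token normalization and first-match-wins ordering are simplifying choices for the sketch; a real rule set would likely need per-tool normalization.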
Token Impact
| Approach | Tokens | Reduction |
|---|---|---|
| Model does everything (read, parse, classify, present) | ~100k | baseline |
| Added "do NOT grep session files" instruction | ~84k | 16% |
| Script classifies; model still loads reference doc | ~38k | 62% |
| Script classifies; model presents only | ~35k | 65% |
The biggest single win was moving classification into the script, which cut roughly 46k tokens (from ~84k to ~38k). Removing the instruction to load the reference file saved a further ~3k: once the script handles classification, the reference file is maintenance documentation only.
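The Reduction column is measured against the ~100k baseline, not against the previous row. A short calculation over the table's approximate figures makes both views explicit; the row labels and token counts are copied from the table, and `reductionPercent` and `summarize` are illustrative names.

```javascript
// Approximate token counts from the Token Impact table.
const BASELINE = 100_000; // Model does everything
const runs = [
  { approach: 'Added "do NOT grep session files" instruction', tokens: 84_000 },
  { approach: "Script classifies; model still loads reference doc", tokens: 38_000 },
  { approach: "Script classifies; model presents only", tokens: 35_000 },
];

// Percentage saved relative to the baseline (the table's Reduction column).
function reductionPercent(tokens, baseline = BASELINE) {
  return Math.round((1 - tokens / baseline) * 100);
}

// Walk the rows in order, reporting cumulative and step-by-step savings.
function summarize(rows, baseline = BASELINE) {
  let previous = baseline;
  return rows.map(({ approach, tokens }) => {
    const stepSaved = previous - tokens;
    previous = tokens;
    return { approach, reduction: reductionPercent(tokens, baseline), stepSaved };
  });
}

for (const row of summarize(runs)) {
  console.log(`${row.approach}: ${row.reduction}% vs baseline, ${row.stepSaved} tokens saved vs previous row`);
}
```

The step savings come out to 16000, 46000, and 3000 tokens, which is why moving classification into the script dominates everything else.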
When to Apply
Apply script-first architecture when a skill meets any of these:
- Processes more than ~50 items or reads files larger than a few KB
- Classification rules are deterministic (regex, keyword lists, lookup tables)
- Input data follows a consistent schema (JSONL, CSV, structured logs)
- The skill runs frequently or feeds into further analysis
Do not apply when:
- The skill's core value is the model's judgment (code review, architectural analysis)
- Input is unstructured natural language
- The dataset is small enough that processing costs are negligible
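The two lists above can be read as a small decision function. This is a hypothetical helper: the field names are invented for illustration, and reading "a few KB" as 4 KB is an assumption, not a rule from this document.

```javascript
// Hypothetical helper encoding the "When to Apply" / "Do not apply" lists.
// ASSUMPTION: "a few KB" is read as 4 KB; field names are illustrative.
const ITEM_THRESHOLD = 50;
const FILE_BYTES_THRESHOLD = 4 * 1024;

function scriptFirstDecision(skill) {
  // "Do not apply" conditions take precedence over everything else.
  const blockers = [];
  if (skill.coreValueIsJudgment) blockers.push("core value is model judgment");
  if (skill.inputIsUnstructured) blockers.push("input is unstructured natural language");
  if (skill.processingCostNegligible) blockers.push("processing cost is negligible");
  if (blockers.length > 0) return { apply: false, reasons: blockers };

  // Any single "apply" condition is enough.
  const reasons = [];
  if (skill.itemCount > ITEM_THRESHOLD || skill.largestFileBytes > FILE_BYTES_THRESHOLD) {
    reasons.push("large input");
  }
  if (skill.rulesAreDeterministic) reasons.push("deterministic rules");
  if (skill.inputHasConsistentSchema) reasons.push("consistent schema");
  if (skill.runsFrequently) reasons.push("runs frequently or feeds further analysis");
  return { apply: reasons.length > 0, reasons };
}

// Example with made-up values for a transcript-analysis skill.
const decision = scriptFirstDecision({
  itemCount: 2000,
  largestFileBytes: 2_000_000,
  rulesAreDeterministic: true,
  inputHasConsistentSchema: true,
  runsFrequently: false,
});
console.log(decision);
```

Letting the "do not apply" list override the "apply" list is one reading of the section; it does not say how to resolve a skill that matches both.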
Anti-Patterns to Avoid
- Instruction-only optimization. Adding "don't do X" to SKILL.md without providing a script alternative. The model will find other token-expensive paths to the same result.
- Hybrid classification. Having the script classify some items and the model classify the rest. This still loads context and reference docs. Go all-in on the script. Items the script can't classify should be dropped as "unclassified," not handed to the model.
- Dual rule definitions. Classification rules in both the script AND the SKILL.md. They drift apart, the model may override the script's decisions, and tokens are wasted on re-evaluation. One source of truth.
Checklist for Skill Authors
- Can the data processing be expressed as deterministic logic (regex, keyword matching, field checks)?
- Script is the single owner of all classification rules
- SKILL.md instructs the model to run the script as its first action
- SKILL.md does not restate or duplicate the script's classification logic
- Script output is structured JSON the model can present directly
- Reference docs exist for maintainers but are never loaded at runtime
- After building, verify the model is not doing any mechanical parsing or rule-application work
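The last checklist item can be partly automated by scanning the transcript of a test run for tool calls that touch raw data or the reference doc. This is a minimal sketch assuming the transcript has already been parsed into records shaped like `{ type: "tool_use", name, input }`; `findMechanicalWork`, the file names, and the schema are all hypothetical.

```javascript
// Hypothetical post-build check: flag tool calls in a skill run that read raw
// data or the reference doc. ASSUMPTION: tool calls look like
//   { type: "tool_use", name, input }
// Pass rawDataPattern without the g flag so .test() has no lastIndex state.
function findMechanicalWork(records, { scriptName, rawDataPattern, referenceDocName }) {
  return records.filter((r) => {
    if (!r || r.type !== "tool_use") return false;
    // Serialize the arguments so paths in any field (command, file_path, ...) are caught.
    const args = JSON.stringify(r.input ?? {});
    // Running the bundled script is the expected first action, not a violation.
    if (args.includes(scriptName)) return false;
    return rawDataPattern.test(args) || args.includes(referenceDocName);
  });
}

// Illustrative run: one script call, one raw-data read, one reference-doc read.
const runRecords = [
  { type: "tool_use", name: "Bash", input: { command: "node scripts/process.mjs" } },
  { type: "tool_use", name: "Read", input: { file_path: "/tmp/session.jsonl" } },
  { type: "tool_use", name: "Read", input: { file_path: "references/rules.md" } },
  { type: "text", text: "Found 12 safe commands." },
];

const violations = findMechanicalWork(runRecords, {
  scriptName: "process.mjs",
  rawDataPattern: /\.jsonl\b/,
  referenceDocName: "rules.md",
});
console.log(violations.map((v) => `${v.name}: ${JSON.stringify(v.input)}`));
```

An empty result does not prove the model did no mechanical work (it could still re-derive categories in its reply), but any hit is a clear sign the script is not the sole processor.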
Related
- Reduce plugin context token usage -- established the principle that descriptions are for discovery, detailed content belongs in the body
- Compound refresh skill improvements -- patterns for autonomous skill execution and subagent architecture
- Beta skills framework -- skill organization and rollout conventions