Files
claude-engineering-plugin/docs/plans/2026-02-08-refactor-reduce-plugin-context-token-usage-plan.md
Kieran Klaassen f744b797ef Reduce context token usage by 79% — fix silent component exclusion (#161)
* Update create-agent-skills to match 2026 official docs, add /triage-prs command

- Rewrite SKILL.md to document that commands and skills are now merged
- Add new frontmatter fields: disable-model-invocation, user-invocable, context, agent
- Add invocation control table and dynamic context injection docs
- Fix skill-structure.md: was incorrectly recommending XML tags over markdown headings
- Update official-spec.md with complete 2026 specification
- Add local /triage-prs command for PR triage workflow
- Add PR triage plan document

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [2.31.0] Reduce context token usage by 79%, include recent community contributions

The plugin was consuming 316% of Claude Code's description character budget
(~50,500 chars vs 16,000 limit), causing components to be silently excluded.
Now at 65% (~10,400 chars) with all components visible.

Changes:
- Trim all 29 agent descriptions (move examples to body)
- Add disable-model-invocation to 18 manual commands
- Add disable-model-invocation to 6 manual skills
- Include recent community contributions in changelog
- Fix component counts (29 agents, 24 commands, 18 skills)

Contributors: @trevin, @terryli, @robertomello, @zacwilliams,
@aarnikoskela, @samxie, @davidalley

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix: keep disable-model-invocation off commands called by /lfg, rename xcode-test

- Remove disable-model-invocation from test-browser, feature-video,
  resolve_todo_parallel — these are called programmatically by /lfg and /slfg
- Rename xcode-test to test-xcode to match test-browser naming convention

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix: keep git-worktree skill auto-invocable (used by /workflows:work)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(converter): support disable-model-invocation frontmatter

Parse disable-model-invocation from command and skill frontmatter.
Commands/skills with this flag are excluded from OpenCode command maps
and Codex prompt/skill generation, matching Claude Code behavior where
these components are user-only invocable.

Bump converter version to 0.3.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 22:28:51 -06:00

9.4 KiB

title, type, date
title type date
Reduce compound-engineering plugin context token usage refactor 2026-02-08

Reduce compound-engineering Plugin Context Token Usage

Overview

The compound-engineering plugin is overflowing the default context budget by ~3x, causing Claude Code to silently drop components. The plugin consumes ~50,500 characters in always-loaded descriptions against a default budget of 16,000 characters (2% of context window). This means Claude literally doesn't know some agents/skills exist during sessions.

Problem Statement

How Context Loading Works

Claude Code uses progressive disclosure for plugin content:

Level What Loads When
Always in context description frontmatter from skills, commands, and agents Session startup (unless disable-model-invocation: true)
On invocation Full SKILL.md / command body / agent body When triggered
On demand Reference files in skill directories When Claude reads them

The total budget for ALL descriptions combined is 2% of context window (~16,000 chars fallback). When exceeded, components are silently excluded.

Current State: 316% of Budget

Component Count Always-Loaded Chars % of 16K Budget
Agent descriptions 29 ~41,400 259%
Skill descriptions 16 ~5,450 34%
Command descriptions 24 ~3,700 23%
Total 69 ~50,500 316%

Root Cause: Bloated Agent Descriptions

Agent description fields contain full <example> blocks with user/assistant dialog. These examples belong in the agent body (system prompt), not the description. The description's only job is discovery — helping Claude decide whether to delegate.

Examples of the problem:

  • design-iterator.md: 2,488 chars in description (should be ~200)
  • spec-flow-analyzer.md: 2,289 chars in description
  • security-sentinel.md: 1,986 chars in description
  • kieran-rails-reviewer.md: 1,822 chars in description
  • Average agent description: ~1,400 chars (should be 100-250)

Compare to Anthropic's official examples at 100-200 chars:

# Official (140 chars)
description: Expert code review specialist. Proactively reviews code for quality, security, and maintainability. Use immediately after writing or modifying code.

# Current plugin (1,822 chars)
description: "Use this agent when you need to review Rails code changes with an extremely high quality bar...\n\nExamples:\n- <example>\n  Context: The user has just implemented..."

Secondary Cause: No disable-model-invocation on Manual Commands

Zero commands set disable-model-invocation: true. Commands like /deploy-docs, /lfg, /slfg, /triage, /feature-video, /test-browser, /xcode-test are manual workflows with side effects. Their descriptions consume budget unnecessarily.

The official docs explicitly state:

Use disable-model-invocation: true for workflows with side effects: /deploy, /commit, /triage-prs. You don't want Claude deciding to deploy because your code looks ready.


Proposed Solution

Three changes, ordered by impact:

Phase 1: Trim Agent Descriptions (saves ~35,600 chars)

For all 29 agents: move <example> blocks from the description field into the agent body markdown. Keep descriptions to 1-2 sentences (100-250 chars).

Before (agent frontmatter):

---
name: kieran-rails-reviewer
description: "Use this agent when you need to review Rails code changes with an extremely high quality bar. This agent should be invoked after implementing features, modifying existing code, or creating new Rails components. The agent applies Kieran's strict Rails conventions and taste preferences to ensure code meets exceptional standards.\n\nExamples:\n- <example>\n  Context: The user has just implemented a new controller action with turbo streams.\n  user: \"I've added a new update action to the posts controller\"\n  ..."
---

Detailed system prompt...

After (agent frontmatter):

---
name: kieran-rails-reviewer
description: Review Rails code with Kieran's strict conventions. Use after implementing features, modifying code, or creating new Rails components.
---

<examples>
<example>
Context: The user has just implemented a new controller action with turbo streams.
user: "I've added a new update action to the posts controller"
...
</example>
</examples>

Detailed system prompt...

The examples move into the body (which only loads when the agent is actually invoked).

Impact: ~41,400 chars → ~5,800 chars (86% reduction)

Phase 2: Add disable-model-invocation: true to Manual Commands (saves ~3,100 chars)

Commands that should only run when explicitly invoked by the user:

Command Reason
/deploy-docs Side effect: deploys
/release-docs Side effect: regenerates docs
/changelog Side effect: generates changelog
/lfg Side effect: autonomous workflow
/slfg Side effect: swarm workflow
/triage Side effect: categorizes findings
/resolve_parallel Side effect: resolves TODOs
/resolve_todo_parallel Side effect: resolves todos
/resolve_pr_parallel Side effect: resolves PR comments
/feature-video Side effect: records video
/test-browser Side effect: runs browser tests
/xcode-test Side effect: builds/tests iOS
/reproduce-bug Side effect: runs reproduction
/report-bug Side effect: creates bug report
/agent-native-audit Side effect: runs audit
/heal-skill Side effect: modifies skill files
/generate_command Side effect: creates files
/create-agent-skill Side effect: creates files

Keep these without the flag (Claude should know about them):

  • /workflows:plan — Claude might suggest planning
  • /workflows:work — Claude might suggest starting work
  • /workflows:review — Claude might suggest review
  • /workflows:brainstorm — Claude might suggest brainstorming
  • /workflows:compound — Claude might suggest documenting
  • /deepen-plan — Claude might suggest deepening a plan

Impact: ~3,700 chars → ~600 chars for commands in context

Phase 3: Add disable-model-invocation: true to Manual Skills (saves ~1,000 chars)

Skills that are manual workflows:

Skill Reason
skill-creator Only invoked manually
orchestrating-swarms Only invoked manually
git-worktree Only invoked manually
resolve-pr-parallel Side effect
compound-docs Only invoked manually
file-todos Only invoked manually

Keep without the flag (Claude should auto-invoke):

  • dhh-rails-style — Claude should use when writing Rails code
  • frontend-design — Claude should use when building UI
  • brainstorming — Claude should suggest before implementation
  • agent-browser — Claude should use for browser tasks
  • gemini-imagegen — Claude should use for image generation
  • create-agent-skills — Claude should use when creating skills
  • every-style-editor — Claude should use for editing
  • dspy-ruby — Claude should use for DSPy.rb
  • agent-native-architecture — Claude should use for agent-native design
  • andrew-kane-gem-writer — Claude should use for gem writing
  • rclone — Claude should use for cloud uploads
  • document-review — Claude should use for doc review

Impact: ~5,450 chars → ~4,000 chars for skills in context


Projected Result

Component Before (chars) After (chars) Reduction
Agent descriptions ~41,400 ~5,800 -86%
Command descriptions ~3,700 ~600 -84%
Skill descriptions ~5,450 ~4,000 -27%
Total ~50,500 ~10,400 -79%
% of 16K budget 316% 65% --

From 316% of budget (components silently dropped) to 65% of budget (room for growth).


Acceptance Criteria

  • All 29 agent description fields are under 250 characters
  • All <example> blocks moved from description to agent body
  • 18 manual commands have disable-model-invocation: true
  • 6 manual skills have disable-model-invocation: true
  • Total always-loaded description content is under 16,000 characters
  • Run /context to verify no "excluded skills" warnings
  • All agents still function correctly (examples are in body, not lost)
  • All commands still invocable via /command-name
  • Update plugin version in plugin.json and marketplace.json
  • Update CHANGELOG.md

Implementation Notes

  • Agent examples should use <examples><example>...</example></examples> tags in the body — Claude understands these natively
  • Description format: "[What it does]. Use [when/trigger condition]." — two sentences max
  • The lint agent at 115 words shows compact agents work great
  • Test with claude --plugin-dir ./plugins/compound-engineering after changes
  • The SLASH_COMMAND_TOOL_CHAR_BUDGET env var can override the default budget for testing

References

  • Skills docs — "Skill descriptions are loaded into context... If you have many skills, they may exceed the character budget"
  • Subagents docs — description field used for automatic delegation
  • Skills troubleshooting — "The budget scales dynamically at 2% of the context window, with a fallback of 16,000 characters"