Files

Trevin Chow 12aaad31eb feat(ce-ideate): mode-aware v2 ideation (#588 )

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-17 11:40:54 -07:00

17 KiB

Raw Blame History

Codex Delegation Workflow

When delegation_active is true, code implementation is delegated to the Codex CLI (codex exec) instead of being implemented directly. The orchestrating Claude Code agent retains control of planning, review, git operations, and orchestration.

Delegation Decision

If work_delegate_decision is ask, present the recommendation and wait for the user's choice before proceeding.

When recommending Codex delegation:

"Codex delegation active. [N] implementation units -- delegating in one batch."

Delegate to Codex (recommended)

Execute with Claude Code instead

When recommending Codex delegation, multiple batches:

"Codex delegation active. [N] implementation units -- delegating in [X] batches."

Delegate to Codex (recommended)

Execute with Claude Code instead

When recommending Claude Code (all units are trivial):

"Codex delegation active, but these are small changes where the cost of delegating outweighs having Claude Code do them."

Execute with Claude Code (recommended)

Delegate to Codex anyway

If the user chooses the delegation option, proceed to Pre-Delegation Checks below. If the user chooses the Claude Code option, set delegation_active to false and return to standard execution in the parent skill.

If work_delegate_decision is auto (the default), state the execution plan in one line and proceed without waiting: "Codex delegation active. Delegating [N] units in [X] batch(es)." If all units are trivial, set delegation_active to false and proceed: "Codex delegation active. All units are trivial -- executing with Claude Code."

Pre-Delegation Checks

Run these checks once before the first batch. If any check fails, fall back to standard mode for the remainder of the plan execution. Do not re-run on subsequent batches.

0. Platform Gate

Codex delegation is only supported when the orchestrating agent is running in Claude Code. If the current session is Codex, Gemini CLI, OpenCode, or any other platform, set delegation_active to false and proceed in standard mode.

1. Environment Guard

Check whether the current agent is already running inside a Codex sandbox:

if [ -n "$CODEX_SANDBOX" ] || [ -n "$CODEX_SESSION_ID" ]; then
  echo "inside_sandbox=true"
else
  echo "inside_sandbox=false"
fi

If inside_sandbox is true, delegation would recurse or fail.

If delegation_source is argument: emit "Already inside Codex sandbox -- using standard mode." and set delegation_active to false.
If delegation_source is config or default: set delegation_active to false silently.

2. Availability Check

Codex availability (pre-resolved): !command -v codex >/dev/null 2>&1 && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_FOUND"

If the line above shows CODEX_AVAILABLE, proceed to the next check. If it shows CODEX_NOT_FOUND, the Codex CLI is not installed. Emit "Codex CLI not found (install via npm install -g @openai/codex or brew install codex) -- using standard mode." and set delegation_active to false. If it shows an unresolved command string, run command -v codex using a shell tool. If the command prints a path, proceed. If it fails or prints nothing, emit the same message and set delegation_active to false.

3. Consent Flow

If consent_granted is not true (from config work_delegate_consent):

Present a one-time consent warning using the platform's blocking question tool (AskUserQuestion in Claude Code). The consent warning explains:

Delegation sends implementation units to codex exec as a structured prompt
yolo mode (--yolo): Full system access including network. Required for verification steps that run tests or install dependencies. Recommended.
full-auto mode (--full-auto): Workspace-write sandbox, no network access.

Present the sandbox mode choice: (1) yolo (recommended), (2) full-auto.

On acceptance:

Resolve the repo root: git rev-parse --show-toplevel. Write work_delegate_consent: true and work_delegate_sandbox: <chosen-mode> to <repo-root>/.compound-engineering/config.local.yaml
To write: (1) if file or directory does not exist, create <repo-root>/.compound-engineering/ and write the YAML file; (2) if file exists, merge new keys preserving existing keys
Update consent_granted and sandbox_mode in the resolved state

On decline:

Ask whether to disable delegation entirely for this project
If yes: write work_delegate: false to <repo-root>/.compound-engineering/config.local.yaml (using the same repo root resolved above). To write: (1) if file or directory does not exist, create <repo-root>/.compound-engineering/ and write the YAML file; (2) if file exists, merge new keys preserving existing keys. Set delegation_active to false, proceed in standard mode
If no: set delegation_active to false for this invocation only, proceed in standard mode

Headless consent: If running in a headless or non-interactive context, delegation proceeds only if work_delegate_consent is already true in the config file. If consent is not recorded, set delegation_active to false silently.

Batching

Delegate all units in one batch. If the plan exceeds 5 units, split into batches at the plan's own phase boundaries, or in groups of roughly 5 -- never splitting units that share files. Skip delegation entirely if every unit is trivial.

Prompt Template

At the start of delegated execution, create a per-run OS-temp scratch directory via mktemp -d and capture its absolute path for all downstream use. All scratch files for this invocation live under that directory. Do not use .context/ — these scratch files are per-run throwaway that get cleaned up when delegated execution ends (see Cleanup below), matching the repo Scratch Space convention for one-shot artifacts. Do not pass unresolved shell-variable strings to non-shell tools (Write, Read); use the absolute path returned by mktemp -d.

SCRATCH_DIR="$(mktemp -d -t ce-work-codex-XXXXXX)"
echo "$SCRATCH_DIR"

Refer to the echoed absolute path as <scratch-dir> throughout the rest of this workflow.

Before each batch, write a prompt file to <scratch-dir>/prompt-batch-<batch-num>.md.

Build the prompt from the batch's implementation units using these XML-tagged sections:

<task>
[For a single-unit batch: Goal from the implementation unit.
For a multi-unit batch: list each unit with its Goal, stating the concrete
job, repository context, and expected end state for each.]
</task>

<files>
[Combined file list from all units in the batch -- files to create, modify, or read.]
</files>

<patterns>
[File paths from all units' "Patterns to follow" fields. If no patterns:
"No explicit patterns referenced -- follow existing conventions in the
modified files."]
</patterns>

<approach>
[For a single-unit batch: Approach from the unit.
For a multi-unit batch: list each unit's approach, noting dependencies
and suggested ordering.]
</approach>

<constraints>
- Do NOT run git commit, git push, or create PRs -- the orchestrating agent handles all git operations
- Restrict all modifications to files within the repository root
- Keep changes tightly scoped to the stated task -- avoid unrelated refactors, renames, or cleanup
- Resolve the task fully before stopping -- do not stop at the first plausible answer
- If you discover mid-execution that you need to modify files outside the repo root, complete what you can within the repo and report what you could not do via the result schema issues field
</constraints>

<testing>
Before writing tests, check whether the plan's test scenarios cover all
categories that apply to each unit. Supplement gaps before writing tests:
- Happy path: core input/output pairs from each unit's goal
- Edge cases: boundary values, empty/nil inputs, type mismatches
- Error/failure paths: invalid inputs, permission denials, downstream failures
- Integration: cross-layer scenarios that mocks alone won't prove

Write tests that name specific inputs and expected outcomes. If your changes
touch code with callbacks, middleware, or event handlers, verify the
interaction chain works end-to-end.
</testing>

<verify>
After implementing, run ALL test files together in a single command (not
per-file). Cross-file contamination (e.g., mocked globals leaking between
test files) only surfaces when tests run in the same process. If tests
fail, fix the issues and re-run until they pass. Do not report status
"completed" unless verification passes. This is your responsibility --
the orchestrator will not re-run verification independently.

[Test and lint commands from the project. Use the union of all units'
verification commands as a single combined invocation.]
</verify>

<output_contract>
Report your result via the --output-schema mechanism. Fill in every field:
- status: "completed" ONLY if all changes were made AND verification passes,
  "partial" if incomplete, "failed" if no meaningful progress
- files_modified: array of file paths you changed
- issues: array of strings describing any problems, gaps, or out-of-scope
  work discovered
- summary: one-paragraph description of what was done
- verification_summary: what you ran to verify (command and outcome).
  Example: "Ran `bun test` -- 14 tests passed, 0 failed."
  If no verification was possible, say why.
</output_contract>

Result Schema

Write the result schema to <scratch-dir>/result-schema.json (using the absolute path captured at the start) once at the start of delegated execution:

{
  "type": "object",
  "properties": {
    "status": { "enum": ["completed", "partial", "failed"] },
    "files_modified": { "type": "array", "items": { "type": "string" } },
    "issues": { "type": "array", "items": { "type": "string" } },
    "summary": { "type": "string" },
    "verification_summary": { "type": "string" }
  },
  "required": ["status", "files_modified", "issues", "summary", "verification_summary"],
  "additionalProperties": false
}

Each batch's result is written to <scratch-dir>/result-batch-<batch-num>.json via the -o flag. On plan failure, files are left in place for debugging.

If the result JSON is absent or malformed after a successful exit code, classify as task failure.

Execution Loop

Initialize a consecutive_failures counter at 0 before the first batch.

Clean-baseline preflight: Before the first batch, verify there are no uncommitted changes to tracked files:

git diff --quiet HEAD

This intentionally ignores untracked files. Only staged or unstaged modifications to tracked files make rollback unsafe. However, if untracked files exist at paths in the batch's planned Files list, rollback (git clean -fd -- <paths>) would delete them. If such overlaps are detected, warn the user and recommend committing or stashing those files before proceeding.

If tracked files are dirty, stop and present options: (1) commit current changes, (2) stash explicitly (git stash push -m "pre-delegation"), (3) continue in standard mode (sets delegation_active to false). Do not auto-stash user changes.

Delegation invocation: For each batch, execute these as separate Bash tool calls (not combined into one):

Step A — Launch (background, separate Bash call):

Write the prompt file, then make a single Bash tool call with run_in_background: true set on the tool parameter. This call returns immediately and has no timeout ceiling.

Substitute the literal absolute path captured at setup for every <scratch-dir> below. Each Bash tool call starts a fresh shell, so the $SCRATCH_DIR variable from the setup snippet is not preserved — an unresolved $SCRATCH_DIR would expand empty and break result detection.

# Substitute the resolved sandbox_mode value (yolo or full-auto) from the skill state
SANDBOX_MODE="<sandbox_mode>"

# Resolve sandbox flag
if [ "$SANDBOX_MODE" = "full-auto" ]; then
  SANDBOX_FLAG="--full-auto"
else
  SANDBOX_FLAG="--dangerously-bypass-approvals-and-sandbox"
fi

codex exec \
  -m "<delegate_model>" \
  -c 'model_reasoning_effort="<delegate_effort>"' \
  $SANDBOX_FLAG \
  --output-schema "<scratch-dir>/result-schema.json" \
  -o "<scratch-dir>/result-batch-<batch-num>.json" \
  - < "<scratch-dir>/prompt-batch-<batch-num>.md"

Critical: run_in_background: true must be set as a Bash tool parameter, not as a shell & suffix. The tool parameter is what removes the timeout ceiling. A shell & inside a foreground Bash call still hits the 2-minute default timeout.

Quoting is critical for the -c flag: use single quotes around the entire key=value and double quotes around the TOML string value inside. Example: -c 'model_reasoning_effort="high"'.

Do not improvise CLI flags or modify this invocation template.

Step B — Poll (foreground, separate Bash calls):

After the launch call returns, make a new, separate foreground Bash tool call that polls for the result file. This keeps the agent's turn active so the user cannot interfere with the working tree.

Substitute the literal absolute path captured at setup for <scratch-dir>. The shell variable from Step A does not survive across separate Bash tool calls.

RESULT_FILE="<scratch-dir>/result-batch-<batch-num>.json"
for i in $(seq 1 6); do
  test -s "$RESULT_FILE" && echo "DONE" && exit 0
  sleep 10
done
echo "Waiting for Codex..."

If the output is "Waiting for Codex...", issue the same polling command again as another separate Bash call. Repeat until the output is "DONE", then read the result file and proceed to classification.

Polling termination conditions: Stop polling when any of these conditions is met:

Result file appears (output is "DONE") -- proceed to result classification normally.
Background process exits with non-zero code -- classify as CLI failure (row 1). Rollback and fall back to standard mode.
Background process exits with zero code but result file is absent -- classify as task failure (row 2: exit 0, result JSON missing). Rollback and increment consecutive_failures.
5 polling rounds elapse (~5 minutes) without the result file appearing and without a background process notification -- treat as a hung process. Classify as CLI failure (row 1). Rollback and fall back to standard mode.

Result classification: Codex is responsible for running verification internally and fixing failures before reporting -- the orchestrator does not re-run verification independently.

#	Signal	Classification	Action
1	Exit code != 0	CLI failure	Rollback to HEAD. Fall back to standard mode for ALL remaining work.
2	Exit code 0, result JSON missing or malformed	Task failure	Rollback to HEAD. Increment `consecutive_failures`.
3	Exit code 0, `status: "failed"`	Task failure	Rollback to HEAD. Increment `consecutive_failures`.
4	Exit code 0, `status: "partial"`	Partial success	Keep the diff. Complete remaining work locally, verify, and commit. Increment `consecutive_failures`.
5	Exit code 0, `status: "completed"`	Success	Commit changes. Reset `consecutive_failures` to 0.

Result handoff — surface to user: After reading the result JSON and before committing or rolling back, display a summary so the user sees what happened. Format:

Codex batch —

Files: Verification: <verification_summary from result JSON> Issues: <issues list, or "None">

On failure or partial results, include the classification reason (e.g., "status: failed", "result JSON missing") so the user understands why the orchestrator is rolling back or completing locally.

Keep this brief — the goal is transparency, not a wall of text. One short block per batch.

Rollback procedure:

git checkout -- .
git clean -fd -- <paths from the batch's combined Files list>

Do NOT use bare git clean -fd without path arguments.

Commit on success:

git add $(git diff --name-only HEAD; git ls-files --others --exclude-standard)
git commit -m "feat(<scope>): <batch summary>"

Between batches (plans split into multiple batches): Report what completed, test results, and what's next. Continue immediately unless the user intervenes -- the checkpoint exists so the user can steer, not so they must.

Circuit breaker: After 3 consecutive failures, set delegation_active to false and emit: "Codex delegation disabled after 3 consecutive failures -- completing remaining units in standard mode."

Scratch cleanup: No explicit cleanup needed — OS temp handles eventual cleanup (macOS $TMPDIR periodic purge; Linux/WSL /tmp reboot or periodic cleanup). Leaving <scratch-dir> in place after the run also preserves intermediate artifacts for debugging if anything went wrong.

Mixed-Model Attribution

When some units are executed by Codex and others locally:

If all units used delegation: attribute to the Codex model
If all units used standard mode: attribute to the current agent's model
If mixed: note which units were delegated in the PR description and credit both models

17 KiB Raw Blame History