From a9fd8421f42d598e8d85c4cb50cbec0fa3d6af46 Mon Sep 17 00:00:00 2001
From: Trevin Chow <trevin@trevinchow.com>
Date: Wed, 22 Apr 2026 19:52:02 -0700
Subject: [PATCH] fix(ce-proof): correct op shapes and add retry/batch
 discipline (#658)

---
 .../skills/ce-proof/SKILL.md                  | 45 +++++++---
 .../skills/ce-proof/references/hitl-review.md | 89 +++++++++++++++----
 2 files changed, 105 insertions(+), 29 deletions(-)

diff --git a/plugins/compound-engineering/skills/ce-proof/SKILL.md b/plugins/compound-engineering/skills/ce-proof/SKILL.md
index 7098932..0d091a3 100644
--- a/plugins/compound-engineering/skills/ce-proof/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-proof/SKILL.md
@@ -79,9 +79,21 @@ All operations go to `POST https://www.proofeditor.ai/api/agent/{slug}/ops`
 
 **Wire-format reminder.** `/api/agent/{slug}/ops` uses a top-level `type` field; `/api/agent/{slug}/edit/v2` uses an `operations` array where each entry has `op`. Do not mix — sending `op` to `/ops` returns 422.
 
-**Every mutation requires a `baseToken`.** Read it from `/state.mutationBase.token` (or `/snapshot.mutationBase.token`) immediately before each write, and include it in the request body. On `BASE_TOKEN_REQUIRED` or `STALE_BASE`, re-read and retry once. See the baseToken recipe in `references/hitl-review.md`.
+**Every mutation requires a `baseToken`.** Reuse the `mutationBase.token` from the most recent `/state` or `/snapshot` read — tokens don't go stale in seconds, and `STALE_BASE` is a recoverable error. On `BASE_TOKEN_REQUIRED` or `STALE_BASE`, re-read and retry once. Only do a pre-mutation read if no prior read has happened in this session. See the baseToken recipe in `references/hitl-review.md`.
 
-**`Idempotency-Key` header** is recommended on every mutation for safe automation retries; required when `/state.contract.idempotencyRequired` is true.
+`/edit/v2` block refs are a separate concern: they can drift across revisions, so re-fetch `/snapshot` for fresh refs before a block edit if any writes have landed since your last snapshot.
+
+**Retry discipline after mutation errors — verify before retrying.** An error response is not proof that nothing was written.
+
+- `STALE_BASE`, `BASE_TOKEN_REQUIRED`, `MISSING_BASE`, `INVALID_BASE_TOKEN` — pre-commit, token-related. Re-read `/state` and retry once with the same payload and a fresh `baseToken`. A generic mutate helper can auto-retry these.
+- `ANCHOR_NOT_FOUND`, `ANCHOR_AMBIGUOUS` — pre-commit, but the `quote` no longer uniquely matches content. Re-reading does not help by itself; the caller must tighten or regenerate the anchor before retrying. Do not auto-retry blindly.
+- `INVALID_OPERATIONS`, `INVALID_REQUEST`, `INVALID_REF`, `INVALID_BLOCK_MARKDOWN`, `INVALID_RANGE`, `INVALID_MARKDOWN`, 422 — pre-commit, but the payload is wrong. Do not retry blindly; fix the payload first.
+- `COLLAB_SYNC_FAILED`, `REWRITE_BARRIER_FAILED`, `PROJECTION_STALE`, `INTERNAL_ERROR`, 5xx, network timeout, and any **202 with `collab.status: "pending"`** — the canonical doc may have been written even though the call looks like a failure. Before any retry, re-read `/state` and check whether the intended mark/edit is already present; only retry if it isn't.
+- `Idempotency-Key` (see below) protects against double-apply *on the same request* (e.g., TCP-level retry). It does not help if you build a new request body and send a second call — that is a new logical write with a new key.
+
+Duplicate-mark incidents usually come from retrying a `comment.add` or `suggestion.add` after a timeout without verifying. When in doubt: re-read, diff, then decide.
+
+**`Idempotency-Key` header** is recommended on every mutation for safe automation retries; required when `/state.contract.idempotencyRequired` is true. Use the same key when retrying the exact same logical write (same payload) so the server can collapse the retry. A new key means a new write — even if the payload is identical.
 
 **Comment on text:**
 ```json
@@ -136,12 +148,23 @@ curl -X POST "https://www.proofeditor.ai/api/agent/{slug}/edit/v2" \
     "baseToken": "mt1:<token>",
     "operations": [
       {"op": "replace_block", "ref": "b3", "block": {"markdown": "Updated paragraph."}},
-      {"op": "insert_after", "ref": "b3", "block": {"markdown": "## New section"}}
+      {"op": "insert_after", "ref": "b3", "blocks": [{"markdown": "## New section"}]}
     ]
   }'
 ```
 
-Supported `op` kinds inside `operations`: `replace_block`, `insert_before`, `insert_after`, `delete_block`, `replace_range` (uses `fromRef` + `toRef`), `find_replace_in_block` (takes `occurrence: "first" | "all"`). Read `/snapshot` to get stable block `ref` IDs and the `mutationBase.token`.
+Per-op body shape (singular `block` vs plural `blocks` is load-bearing — sending the wrong one returns 422):
+
+| op | body fields |
+|---|---|
+| `replace_block` | `ref`, `block: {markdown}` |
+| `insert_after` | `ref`, `blocks: [{markdown}, ...]` |
+| `insert_before` | `ref`, `blocks: [{markdown}, ...]` |
+| `delete_block` | `ref` |
+| `replace_range` | `fromRef`, `toRef`, `blocks: [{markdown}, ...]` |
+| `find_replace_in_block` | `ref`, `find`, `replace`, `occurrence: "first" \| "all"` |
+
+Read `/snapshot` to get stable block `ref` IDs. `operations` commits atomically — either every op lands or none do — so one `/edit/v2` call can batch dozens of block edits safely and efficiently (see the bulk-sweep guidance in `references/hitl-review.md` Phase 2.4).
 
 **Editing while a client is connected is fine.** `/edit/v2`, `suggestion.add` (including `status: "accepted"`), and all comment ops work during active collab. Only `rewrite.apply` is blocked by `LIVE_CLIENTS_PRESENT` — it would clobber in-flight Yjs edits.
 
@@ -193,13 +216,11 @@ When given a Proof URL like `https://www.proofeditor.ai/d/abc123?token=xxx`:
 4. The author sees changes in real-time
 
 ```bash
-# Read
-curl -s "https://www.proofeditor.ai/api/agent/abc123/state" \
-  -H "x-share-token: xxx"
-
-# Get baseToken for the next mutation
-BASE=$(curl -s "https://www.proofeditor.ai/api/agent/abc123/state" \
-  -H "x-share-token: xxx" | jq -r '.mutationBase.token')
+# Read once — the same response yields both the doc content and the baseToken for every mutation below.
+STATE=$(curl -s "https://www.proofeditor.ai/api/agent/abc123/state" \
+  -H "x-share-token: xxx")
+BASE=$(printf '%s' "$STATE" | jq -r '.mutationBase.token')
+# Inspect doc fields as needed: printf '%s' "$STATE" | jq '.markdown, .revision'
 
 # Comment
 curl -X POST "https://www.proofeditor.ai/api/agent/abc123/ops" \
@@ -291,4 +312,4 @@ rm "$STATE_TMP"
 - During active collab use `edit/v2` (direct block changes) or `suggestion.add` (tracked changes); reserve `rewrite.apply` for no-client scenarios since it's blocked by `LIVE_CLIENTS_PRESENT` when anyone is connected
 - Don't span table cells in a single replace
 - Always include `by: "ai:compound-engineering"` on every op and `X-Agent-Id: ai:compound-engineering` in headers for consistent attribution
-- Read a fresh `baseToken` before every mutation; on `STALE_BASE`, re-read and retry once
+- Reuse `baseToken` from your most recent `/state` or `/snapshot` read; on `STALE_BASE`, re-read and retry once
diff --git a/plugins/compound-engineering/skills/ce-proof/references/hitl-review.md b/plugins/compound-engineering/skills/ce-proof/references/hitl-review.md
index 4a56053..5fe0a9a 100644
--- a/plugins/compound-engineering/skills/ce-proof/references/hitl-review.md
+++ b/plugins/compound-engineering/skills/ce-proof/references/hitl-review.md
@@ -101,6 +101,8 @@ Real feedback blends types — "this is wrong, rename to Y" is both objection an
 
 **Invariant:** every attention-needing mark ends the pass with an agent reply in its thread. Unreplied = "still to do" — the next pass re-classifies it. This is what makes the loop idempotent without a sidecar: mark state *is* the state. Even when the agent disagrees or can't decide, reply (with reasoning or a question) rather than silently skip.
 
+**Parallelize independent thread ops.** `comment.reply` and `comment.resolve` across different marks don't conflict — they touch different thread state and a stale `baseToken` on one doesn't poison another (retry-on-`STALE_BASE` is cheap, per-mark, and local). When there are more than ~3 attention-needing marks that classify as plain replies or resolves, dispatch them in parallel — either via multiple tool calls in one turn or via sub-agents (`Agent`/`Task` in Claude Code, `spawn_agent` in Codex, `subagent` in Pi). Keep block-mutating edits (`suggestion.add`, `/edit/v2`) sequential or batch them through one `/edit/v2` call — concurrent block edits can stale one another's `baseToken` and force retries, and they interact in ways that are easier to reason about as an ordered sequence.
+
 ### 2.4 Apply edits
 
 The user is collaborating in the doc, not waiting on approval. Every mutation works with live clients — only whole-doc `rewrite.apply` is gated. Pick the tool that matches intent:
@@ -130,19 +132,20 @@ curl -X POST "https://www.proofeditor.ai/api/agent/{slug}/edit/v2" \
   -d '{"by":"ai:compound-engineering","baseToken":"<token>","operations":[...]}'
 ```
 
-Supported `op` kinds: `replace_block`, `insert_before`, `insert_after`, `delete_block`, `replace_range` (`fromRef`+`toRef`), `find_replace_in_block` (`occurrence: "first"|"all"`).
-
-Op body shapes (block content must be wrapped in `block: {markdown}` — the server rejects flat `{op, ref, markdown}` shapes):
+Per-op body shape (singular `block` for `replace_block`; plural `blocks:[{markdown},...]` for anything that can add content; the server returns 422 on the wrong shape):
 
 ```json
 {"op":"replace_block","ref":"b8","block":{"markdown":"new content"}}
-{"op":"insert_after","ref":"b3","block":{"markdown":"new block"}}
+{"op":"insert_after","ref":"b3","blocks":[{"markdown":"new block"}]}
+{"op":"insert_before","ref":"b3","blocks":[{"markdown":"new block"}]}
 {"op":"delete_block","ref":"b6"}
 {"op":"find_replace_in_block","ref":"b4","find":"old","replace":"new","occurrence":"first"}
-{"op":"replace_range","fromRef":"b2","toRef":"b5","block":{"markdown":"..."}}
+{"op":"replace_range","fromRef":"b2","toRef":"b5","blocks":[{"markdown":"..."}]}
 ```
 
-Block `ref` values drift across revisions — always re-fetch `/snapshot` for fresh refs before each `/edit/v2` call.
+Block `ref` values drift across revisions — re-fetch `/snapshot` for fresh refs before each `/edit/v2` call if any writes have landed since the last snapshot.
+
+**Bulk mechanical sweep — prefer one `/edit/v2` call over N `suggestion.add` calls.** When a uniform change hits more than ~5 blocks (emdash sweep, terminology rename across a doc, heading-style normalization), batch it as a single `/edit/v2` with many `operations`. One round-trip, one atomic commit, one audit entry — versus N separate ops-endpoint calls, N baseToken reads (without the caching below), and N tracked marks for what is one logical change. Use `suggestion.add` + `accepted` when edits are distinct and anchored (each deserves its own reject-to-revert trail); use `/edit/v2` batch when they're variations of the same mechanical rule.
 
 **Use pending `suggestion.add` (no status)** when the change is judgment-sensitive enough that the agent wants explicit user approval before commit — rare in HITL, since the point of auto-applied edits is to reduce round-trips. Most judgment-sensitive cases are better handled by leaving the thread open with a clarifying question.
 
@@ -151,12 +154,19 @@ Block `ref` values drift across revisions — always re-fetch `/snapshot` for fr
 **Mutation requirements (every write, including replies and resolves):**
 
 - Top-level field is `type` on `/ops`; `operations[].op` on `/edit/v2`. Do not mix.
-- Include `baseToken` from `/state.mutationBase.token` (or `/snapshot.mutationBase.token` for `/edit/v2`). On `STALE_BASE` or `BASE_TOKEN_REQUIRED`, re-read and retry once.
+- Include `baseToken` from `/state.mutationBase.token` (or `/snapshot.mutationBase.token` for `/edit/v2`).
 - Set `by: "ai:compound-engineering"` and header `X-Agent-Id: ai:compound-engineering`.
-- Include an `Idempotency-Key` header (fresh UUID per logical write) so retries stay safe.
+- Include an `Idempotency-Key` header (fresh UUID per logical write). Reuse the same key on a proven retry of the same payload; use a new key for a new logical write.
 - Reply: `{"type":"comment.reply","markId":"<id>","by":"ai:compound-engineering","text":"..."}`. Resolve: `{"type":"comment.resolve","markId":"<id>","by":"ai:compound-engineering"}`. Reopen if needed: `{"type":"comment.unresolve", ...}`.
 
-**When the loop breaks.** If a mutation keeps failing after a fresh read and one retry, or two reads disagree about state, call `POST https://www.proofeditor.ai/api/bridge/report_bug` with the request ID, slug, and raw response body before falling back. Don't silently skip — that loses the audit trail the user is relying on.
+**Retry after any error is verify-first, not retry-first.** The Proof API can commit canonically and still return a non-2xx or a 202 with `collab.status: "pending"`; network timeouts can hit after the server has already written. Retrying without verifying is the most common cause of duplicate marks (same comment twice, same suggestion twice) that then need a manual cleanup pass.
+
+- On `STALE_BASE` / `BASE_TOKEN_REQUIRED` / `MISSING_BASE` / `INVALID_BASE_TOKEN`: pre-commit, token-related. Re-read `/state`, send the same payload with a fresh `baseToken`, retry once. The `mutate()` helper below auto-retries these.
+- On `ANCHOR_NOT_FOUND` / `ANCHOR_AMBIGUOUS`: pre-commit, but the `quote` no longer matches uniquely. Re-read is not enough; the caller must tighten or regenerate the anchor before retrying. The helper surfaces the error instead of auto-retrying.
+- On `INVALID_OPERATIONS` / `INVALID_REQUEST` / `INVALID_REF` / `INVALID_BLOCK_MARKDOWN` / `INVALID_RANGE` / `INVALID_MARKDOWN` / 422: the payload is wrong. Do not retry — fix the payload and send a new write.
+- On `COLLAB_SYNC_FAILED` / `REWRITE_BARRIER_FAILED` / `PROJECTION_STALE` / `INTERNAL_ERROR` / 5xx / network error / timeout / **202 with `collab.status: "pending"`**: the write may have landed. Re-read `/state`, diff against the intended change (mark exists? suggestion applied? quote replaced?), and only retry if the server did not actually commit it. If the diff shows the write did land, treat the call as successful even though the response said otherwise.
+
+**When the loop breaks.** If a mutation keeps failing after a fresh read and a verified-needed retry, or two reads disagree about state, call `POST https://www.proofeditor.ai/api/bridge/report_bug` with the request ID, slug, and raw response body before falling back. Don't silently skip — that loses the audit trail the user is relying on.
 
 ---
 
@@ -277,28 +287,73 @@ Do **not** delete the Proof doc. It remains the durable review record; the calle
 
 ### BaseToken-aware mutation
 
+Reuse `baseToken` from the most recent `/state` read. Only re-read on `STALE_BASE` / `BASE_TOKEN_REQUIRED`. For an ingest pass this means one `/state` read at Phase 2.1 feeds every subsequent mutation, not N reads for N mutations.
+
+Two retry classes, and they behave differently. The helper below only covers the safe class; the ambiguous class needs a caller-supplied verifier because "did this write land?" depends on what the payload was (look for a markId, a quote replacement, a thread reply, etc.).
+
 ```bash
 SLUG=<slug>
 TOKEN=<accessToken>
 AGENT_ID=ai:compound-engineering
+BASE=<cached from most recent /state or /snapshot read>
 
 mutate() {
   local PAYLOAD="$1"  # jq template without baseToken
-  local BASE
-  BASE=$(curl -s "https://www.proofeditor.ai/api/agent/$SLUG/state" \
-    -H "x-share-token: $TOKEN" | jq -r '.mutationBase.token')
-  curl -s -X POST "https://www.proofeditor.ai/api/agent/$SLUG/ops" \
+  local IDEM_KEY BODY RESP CODE
+  # Fresh key per logical write (per mutate() call). The same key is then
+  # reused below only within the retry of this single logical write — NOT
+  # across different payloads, which would make the server collapse
+  # distinct writes as duplicates and silently drop later edits.
+  IDEM_KEY=$(uuidgen)
+  BODY=$(jq -n --arg base "$BASE" --argjson payload "$PAYLOAD" '$payload + {baseToken: $base}')
+  RESP=$(curl -s -X POST "https://www.proofeditor.ai/api/agent/$SLUG/ops" \
     -H "Content-Type: application/json" \
     -H "x-share-token: $TOKEN" \
     -H "X-Agent-Id: $AGENT_ID" \
-    -H "Idempotency-Key: $(uuidgen)" \
-    -d "$(jq -n --arg base "$BASE" --argjson payload "$PAYLOAD" '$payload + {baseToken: $base}')"
+    -H "Idempotency-Key: $IDEM_KEY" \
+    -d "$BODY")
+  CODE=$(printf '%s' "$RESP" | jq -r '.code // .error // empty')
+  # Pre-commit token-related errors — safe to auto-retry with the same
+  # payload and a fresh baseToken. Anchor errors (ANCHOR_NOT_FOUND,
+  # ANCHOR_AMBIGUOUS) are also pre-commit but require a tighter quote,
+  # so they are surfaced instead of auto-retried.
+  if [ "$CODE" = "STALE_BASE" ] \
+    || [ "$CODE" = "BASE_TOKEN_REQUIRED" ] \
+    || [ "$CODE" = "MISSING_BASE" ] \
+    || [ "$CODE" = "INVALID_BASE_TOKEN" ]; then
+    BASE=$(curl -s "https://www.proofeditor.ai/api/agent/$SLUG/state" \
+      -H "x-share-token: $TOKEN" | jq -r '.mutationBase.token')
+    BODY=$(jq -n --arg base "$BASE" --argjson payload "$PAYLOAD" '$payload + {baseToken: $base}')
+    RESP=$(curl -s -X POST "https://www.proofeditor.ai/api/agent/$SLUG/ops" \
+      -H "Content-Type: application/json" \
+      -H "x-share-token: $TOKEN" \
+      -H "X-Agent-Id: $AGENT_ID" \
+      -H "Idempotency-Key: $IDEM_KEY" \
+      -d "$BODY")
+  fi
+  printf '%s' "$RESP"
 }
 ```
 
-Every mutation sends a fresh `Idempotency-Key` so retries on network hiccups do not double-apply the op. This is required when `/state.contract.idempotencyRequired` is true and harmless otherwise.
+The `Idempotency-Key` is minted fresh at the top of each call (one key per logical write). The retry inside the same call reuses that key so the server can collapse a duplicate TCP-level send, but a later `mutate()` for a different payload gets its own key. Minting outside the function would make every call share one key, and the server would treat the second and subsequent writes as duplicates of the first and silently drop them.
 
-On `STALE_BASE` in the response, re-run — the state read picks up the fresh token automatically.
+**Ambiguous failures (anything outside the pre-commit set above — `COLLAB_SYNC_FAILED`, `INTERNAL_ERROR`, 5xx, network timeout, 202 with `collab.status: "pending"`):** do not retry from this helper. Re-read `/state` in the caller, diff the marks/content against the intended change, and only re-issue the write if the diff proves nothing landed. Pattern:
+
+```bash
+# After an ambiguous failure on comment.add with quote "X" and text "Y":
+STATE=$(curl -s "https://www.proofeditor.ai/api/agent/$SLUG/state" \
+  -H "x-share-token: $TOKEN")
+LANDED=$(printf '%s' "$STATE" | jq --arg q "X" --arg t "Y" \
+  '[.marks[]? | select(.by == "ai:compound-engineering" and .quote == $q and (.thread[0].text // .text) == $t)] | length')
+if [ "$LANDED" -gt 0 ]; then
+  echo "Already applied — skipping retry."
+else
+  # Safe to retry with a fresh BASE.
+  ...
+fi
+```
+
+Match on a payload-identifying field (quote + text for `comment.add`; quote + content for `suggestion.add`; markId + text for `comment.reply`; block ordinal + markdown for `/edit/v2`). When no such invariant is available, prefer leaving the write undone and surfacing it in the terminal report over risking a duplicate.
 
 ### jq gotcha when inspecting responses