Files
claude-engineering-plugin/docs/solutions/best-practices/prefer-python-over-bash-for-pipeline-scripts-2026-04-09.md
2026-04-09 21:29:51 -07:00

4.6 KiB

title, date, category, module, problem_type, component, severity, applies_when, tags
title date category module problem_type component severity applies_when tags
Prefer Python over bash for multi-step pipeline scripts 2026-04-09 best-practices skill scripting / ce-demo-reel best_practice tooling medium
Script orchestrates 2+ external CLI tools (ffmpeg, curl, silicon, vhs)
Script needs retry logic or graceful degradation on tool failure
Script will run on macOS where bash 3.2 is the default
Script needs to be tested from a non-shell test runner (Bun, Jest, pytest)
Script has conditional failure paths where some errors should be caught and others should abort
bash-vs-python
pipeline-scripts
skill-scripting
set-e-footguns
error-handling
ce-demo-reel

Prefer Python over bash for multi-step pipeline scripts

Context

When building the ce-demo-reel skill, the initial implementation used a bash script (capture-evidence.sh) to orchestrate ffmpeg stitching, frame normalization, and catbox.moe upload. Over 4 review rounds, the script hit 4 distinct bug classes that are inherent to bash's execution model rather than simple coding mistakes.

Guidance

Use Python for agent pipeline scripts that chain multiple CLI tools with error handling. Bash set -euo pipefail works for simple sequential scripts but becomes a footgun when you need controlled failure paths.

Python subprocess model (explicit error handling):

result = subprocess.run(
    ["curl", "-s", "-F", f"fileToUpload=@{file_path}", url],
    capture_output=True, text=True, timeout=30, check=False
)
if result.returncode != 0:
    # Retry logic runs normally
    attempts += 1
    continue

Python timeout handling (explicit catch):

try:
    result = subprocess.run(cmd, timeout=60)
except subprocess.TimeoutExpired:
    # Controlled failure, not a crash
    return subprocess.CompletedProcess(cmd, returncode=1, stdout="", stderr="Timed out")

Bash equivalent (the footgun):

set -euo pipefail

# Exits the entire script before retry logic runs
url=$(curl -s -F "fileToUpload=@${file}" "$endpoint")
# Never reaches here on curl failure

# Workaround: || true on every line that might fail
url=$(curl -s -F "fileToUpload=@${file}" "$endpoint") || true
# Works but fragile and easy to forget

Why This Matters

Agent pipeline scripts run in environments the skill author does not control: different macOS versions (bash 3.2 vs 5.x), CI containers, worktrees. Each bash portability issue requires a non-obvious workaround that reviewers must catch. Python's subprocess model makes error handling explicit and testable rather than implicit and version-dependent.

The 4 bugs found were not unusual. They are the predictable consequence of using bash for scripts that exceed its sweet spot.

When to Apply

Use Python when:

  • The script orchestrates 2+ external CLI tools
  • The script needs retry logic or graceful degradation on tool failure
  • The script will run on macOS where bash 3.2 is the default
  • The script needs to be tested from a non-shell test runner
  • The script has more than ~3 subcommands

Bash is still the right choice when:

  • Simple sequential scripts with no error recovery (set -e is fine)
  • One-liner wrappers around a single tool
  • Scripts using only POSIX features with no array manipulation
  • Git hooks and CI steps where the only failure mode is "abort the pipeline"

Examples

Before (bash, 4 bugs across 4 review rounds):

Bug Cause Workaround needed
url=$(curl ...) exits on network failure set -e + command substitution || true on every line
${array[-1]} fails Bash 3.2 lacks negative indexing ${array[${#array[@]}-1]}
Frame reduction keeps all frames for n=3,4 Integer math: step=(n-1)/2 with min 1 Minimum step of 2
command -v ffmpeg in Bun tests command is a shell builtin, not spawnable Use which instead

After (Python, all 4 bug classes eliminated):

# Negative indexing just works
last = frames[-1]

# Timeout handling is explicit
try:
    result = subprocess.run(cmd, timeout=30)
except subprocess.TimeoutExpired:
    return None

# Tool detection is a regular function
if not shutil.which("ffmpeg"):
    sys.exit("ffmpeg not found")

# Math is straightforward
step = max(2, (len(frames) - 1) // 2)
  • docs/solutions/skill-design/script-first-skill-architecture.md: covers when to use scripts vs agent logic (complementary: that doc answers "should a script do this?", this doc answers "which language?")
  • docs/solutions/agent-friendly-cli-principles.md: CLI design from the consumer side (overlaps on exit code and stderr patterns)