---
name: dspy-python
description: This skill should be used when working with DSPy, the Python framework for programming language models instead of prompting them. Use this when implementing LLM-powered features, creating DSPy signatures and modules, configuring language model providers (OpenAI, Anthropic, Gemini, Ollama), building agent systems with tools, optimizing prompts with teleprompters, integrating with FastAPI endpoints, or testing DSPy modules with pytest.
---
# DSPy Expert (Python)
## Overview
DSPy is a Python framework that enables developers to **program language models, not prompt them**. Instead of manually crafting prompts, define application requirements through composable, optimizable modules that can be tested, improved, and version-controlled like regular code.
This skill provides comprehensive guidance on:
- Creating signatures for LLM operations
- Building composable modules and workflows
- Configuring multiple LLM providers
- Implementing agents with tools (ReAct)
- Testing with pytest
- Optimizing with teleprompters (MIPROv2, BootstrapFewShot)
- Integrating with FastAPI for production APIs
- Production deployment patterns
## Core Capabilities
### 1. Signatures
Create input/output specifications for LLM operations using inline or class-based signatures.
**When to use**: Defining any LLM task, from simple classification to complex analysis.
**Quick reference**:
```python
import dspy
# Inline signature (simple tasks)
classify = dspy.Predict("email: str -> category: str, priority: str")
# Class-based signature (complex tasks with documentation)
class EmailClassification(dspy.Signature):
    """Classify customer support emails into categories."""

    email_subject: str = dspy.InputField(desc="Subject line of the email")
    email_body: str = dspy.InputField(desc="Full body content of the email")
    category: str = dspy.OutputField(desc="One of: Technical, Billing, General")
    priority: str = dspy.OutputField(desc="One of: Low, Medium, High")
```
**Templates**: See [signature-template.py](./assets/signature-template.py) for comprehensive examples including:
- Inline signatures for quick tasks
- Class-based signatures with type hints
- Signatures with Pydantic model outputs
- Multi-field complex signatures
**Best practices**:
- Always provide clear docstrings for class-based signatures
- Use `desc` parameter for field documentation
- Prefer specific descriptions over generic ones (see the sketch after this list)
- Use Pydantic models for structured complex outputs
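As one way to keep output values specific, recent DSPy versions also accept `typing.Literal` annotations on output fields, so the allowed labels are part of the signature itself rather than only the `desc` text. A minimal sketch (the field names are illustrative):
```python
from typing import Literal
import dspy

class ConstrainedClassification(dspy.Signature):
    """Classify a support email into a fixed set of categories."""

    email_body: str = dspy.InputField(desc="Full body content of the email")
    category: Literal["Technical", "Billing", "General"] = dspy.OutputField(
        desc="Exactly one of the listed categories"
    )
    priority: Literal["Low", "Medium", "High"] = dspy.OutputField(
        desc="Urgency inferred from the email"
    )
```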
**Full documentation**: See [core-concepts.md](./references/core-concepts.md) sections on Signatures and Type Safety.
### 2. Modules
Build reusable, composable modules that encapsulate LLM operations.
**When to use**: Implementing any LLM-powered feature, especially complex multi-step workflows.
**Quick reference**:
```python
import dspy
class EmailProcessor(dspy.Module):
    def __init__(self):
        super().__init__()
        self.classifier = dspy.ChainOfThought(EmailClassification)

    def forward(self, email_subject: str, email_body: str) -> dspy.Prediction:
        return self.classifier(
            email_subject=email_subject,
            email_body=email_body
        )
```
**Templates**: See [module-template.py](./assets/module-template.py) for comprehensive examples including:
- Basic modules with single predictors
- Multi-step pipelines that chain modules
- Modules with conditional logic
- Error handling and retry patterns
- Async modules for FastAPI
- Caching implementations
**Module composition**: Chain modules together to create complex workflows:
```python
class Pipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.step1 = Classifier()
        self.step2 = Analyzer()
        self.step3 = Responder()

    def forward(self, input_text):
        result1 = self.step1(text=input_text)
        result2 = self.step2(classification=result1.category)
        return self.step3(analysis=result2.analysis)
```
**Full documentation**: See [core-concepts.md](./references/core-concepts.md) sections on Modules and Module Composition.
### 3. Predictor Types
Choose the right predictor for your task:
**Predict**: Basic LLM inference
```python
predictor = dspy.Predict(TaskSignature)
result = predictor(input="data")
```
**ChainOfThought**: Adds automatic step-by-step reasoning
```python
predictor = dspy.ChainOfThought(TaskSignature)
result = predictor(input="data")
# result.reasoning contains the thought process
```
**ReAct**: Tool-using agents with iterative reasoning
```python
predictor = dspy.ReAct(
    TaskSignature,
    tools=[search_tool, calculator_tool],
    max_iters=5
)
```
**ProgramOfThought**: Generates and executes Python code
```python
predictor = dspy.ProgramOfThought(TaskSignature)
result = predictor(task="Calculate factorial of 10")
```
**When to use each**:
- **Predict**: Simple tasks, classification, extraction
- **ChainOfThought**: Complex reasoning, analysis, multi-step thinking
- **ReAct**: Tasks requiring external tools (search, calculation, API calls)
- **ProgramOfThought**: Tasks best solved with generated code
**Full documentation**: See [core-concepts.md](./references/core-concepts.md) section on Predictors.
### 4. LLM Provider Configuration
Support for OpenAI, Anthropic Claude, Google, Ollama, and many more via LiteLLM.
**Quick configuration examples**:
```python
import dspy
import os
# OpenAI
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'])
dspy.configure(lm=lm)
# Anthropic Claude
lm = dspy.LM('anthropic/claude-3-5-sonnet-20241022', api_key=os.environ['ANTHROPIC_API_KEY'])
dspy.configure(lm=lm)
# Google Gemini
lm = dspy.LM('google/gemini-1.5-pro', api_key=os.environ['GOOGLE_API_KEY'])
dspy.configure(lm=lm)
# Local Ollama (free, private)
lm = dspy.LM('ollama_chat/llama3.1', api_base='http://localhost:11434')
dspy.configure(lm=lm)
```
**Templates**: See [config-template.py](./assets/config-template.py) for comprehensive examples including:
- Environment-based configuration
- Multi-model setups for different tasks
- Async LM configuration
- Retry logic and fallback strategies
- Caching with dspy.cache
**Provider compatibility matrix**:
| Feature | OpenAI | Anthropic | Google | Ollama |
|---------|--------|-----------|--------|--------|
| Structured Output | Full | Full | Full | Partial |
| Vision (Images) | Full | Full | Full | Limited |
| Tool Calling | Full | Full | Full | Varies |
| Streaming | Full | Full | Full | Full |
**Cost optimization strategy** (a configuration sketch follows this list):
- Development: Ollama (free) or gpt-4o-mini (cheap)
- Testing: gpt-4o-mini with temperature=0.0
- Production simple tasks: gpt-4o-mini, claude-3-haiku, gemini-1.5-flash
- Production complex tasks: gpt-4o, claude-3-5-sonnet, gemini-1.5-pro
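One way to apply this strategy is to pick the model from an environment variable at startup. A minimal sketch, assuming a hypothetical `APP_ENV` variable (the model choices are illustrative):
```python
import os
import dspy

# Hypothetical environment-driven model selection (names are illustrative)
MODELS = {
    "development": "ollama_chat/llama3.1",  # also needs api_base='http://localhost:11434'
    "test": "openai/gpt-4o-mini",
    "production": "openai/gpt-4o",
}

def configure_for_env() -> None:
    env = os.environ.get("APP_ENV", "development")
    lm = dspy.LM(MODELS[env], temperature=0.0 if env == "test" else 0.7)
    dspy.configure(lm=lm)
```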
**Full documentation**: See [providers.md](./references/providers.md) for all configuration options.
### 5. FastAPI Integration
Serve DSPy modules as production API endpoints.
**Quick reference**:
```python
from fastapi import FastAPI
from pydantic import BaseModel
import dspy
app = FastAPI()
# Initialize DSPy
lm = dspy.LM('openai/gpt-4o-mini')
dspy.configure(lm=lm)
# Load optimized module
classifier = EmailProcessor()
class EmailRequest(BaseModel):
    subject: str
    body: str

class EmailResponse(BaseModel):
    category: str
    priority: str

@app.post("/classify", response_model=EmailResponse)
async def classify_email(request: EmailRequest):
    result = classifier(
        email_subject=request.subject,
        email_body=request.body
    )
    return EmailResponse(
        category=result.category,
        priority=result.priority
    )
```
**Production patterns**:
- Load optimized modules at startup
- Use Pydantic models for request/response validation
- Implement proper error handling (see the sketch after this list)
- Add observability with OpenTelemetry
- Use async where possible
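As an illustration of the error-handling and async points, here is a hedged variant of the `/classify` endpoint above, reusing `app`, `classifier`, and the Pydantic models from the quick reference. It runs the synchronous DSPy call off the event loop and maps failures to an HTTP error (the exception handling is deliberately broad for brevity):
```python
import asyncio
from fastapi import HTTPException

@app.post("/classify", response_model=EmailResponse)
async def classify_email(request: EmailRequest):
    try:
        # Run the synchronous DSPy module in a worker thread
        result = await asyncio.to_thread(
            classifier, email_subject=request.subject, email_body=request.body
        )
    except Exception as exc:  # narrow to provider-specific errors in real code
        raise HTTPException(status_code=502, detail=f"LLM call failed: {exc}") from exc
    return EmailResponse(category=result.category, priority=result.priority)
```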
**Full documentation**: See [fastapi-integration.md](./references/fastapi-integration.md) for complete patterns.
### 6. Testing DSPy Modules
Write standard pytest tests for LLM logic.
**Quick reference**:
```python
import os

import pytest
import dspy

@pytest.fixture(scope="module")
def configure_dspy():
    lm = dspy.LM('openai/gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'])
    dspy.configure(lm=lm)

def test_email_classifier(configure_dspy):
    classifier = EmailProcessor()
    result = classifier(
        email_subject="Can't log in",
        email_body="Unable to access account"
    )
    assert result.category in ['Technical', 'Billing', 'General']
    assert result.priority in ['High', 'Medium', 'Low']

def test_technical_email_classification(configure_dspy):
    classifier = EmailProcessor()
    result = classifier(
        email_subject="Error 500 on checkout",
        email_body="Getting server error when trying to complete purchase"
    )
    assert result.category == 'Technical'
```
**Testing patterns**:
- Use pytest fixtures for DSPy configuration
- Test type correctness of outputs
- Test edge cases (empty inputs, special characters, long texts); see the parametrized sketch after this list
- Use VCR/responses for deterministic API testing
- Integration test complete workflows
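For example, edge cases can be covered with a parametrized test that reuses the `configure_dspy` fixture and `EmailProcessor` module above (the inputs are illustrative):
```python
import pytest

@pytest.mark.parametrize(
    "subject, body",
    [
        ("", ""),                                           # empty inputs
        ("¿Problema con facturación?", "Señal de error"),   # non-ASCII text
        ("Long email", "word " * 2000),                     # very long body
    ],
)
def test_classifier_edge_cases(configure_dspy, subject, body):
    result = EmailProcessor()(email_subject=subject, email_body=body)
    # Outputs should still come from the expected label sets
    assert result.category in ["Technical", "Billing", "General"]
    assert result.priority in ["Low", "Medium", "High"]
```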
**Full documentation**: See [optimization.md](./references/optimization.md) section on Testing.
### 7. Optimization with Teleprompters
Automatically improve prompts and modules using optimization techniques.
**MIPROv2 optimization**:
```python
import dspy
from dspy.teleprompt import MIPROv2
# Define evaluation metric
def accuracy_metric(example, pred, trace=None):
    return example.category == pred.category

# Prepare training data
trainset = [
    dspy.Example(
        email_subject="Can't log in",
        email_body="Password reset not working",
        category="Technical"
    ).with_inputs("email_subject", "email_body"),
    # More examples...
]

# Run optimization
optimizer = MIPROv2(
    metric=accuracy_metric,
    num_candidates=10,
    init_temperature=0.7
)
optimized_module = optimizer.compile(
    EmailProcessor(),
    trainset=trainset,
    max_bootstrapped_demos=3,
    max_labeled_demos=5
)
# Save optimized module
optimized_module.save("optimized_classifier.json")
```
**BootstrapFewShot** (simpler, faster):
```python
from dspy.teleprompt import BootstrapFewShot
optimizer = BootstrapFewShot(
    metric=accuracy_metric,
    max_bootstrapped_demos=4
)
optimized = optimizer.compile(
    EmailProcessor(),
    trainset=trainset
)
```
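**Loading an optimized module**: optimized programs can be persisted and reloaded (for example at API startup) instead of re-optimizing on every run. A minimal sketch using the file saved above:
```python
# Rebuild the program structure, then load the optimized prompts/demos into it
classifier = EmailProcessor()
classifier.load("optimized_classifier.json")

result = classifier(
    email_subject="Invoice question",
    email_body="I was charged twice this month"
)
print(result.category, result.priority)
```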
**Full documentation**: See [optimization.md](./references/optimization.md) section on Teleprompters.
### 8. Caching and Performance
Optimize performance with built-in caching.
**Enable caching**:
```python
import dspy
# Enable global caching
dspy.configure(
    lm=lm,
    cache=True  # Uses SQLite by default
)

# Or with custom cache directory
dspy.configure(
    lm=lm,
    cache_dir="/path/to/cache"
)
```
**Cache control**:
```python
# Clear cache
dspy.cache.clear()
# Disable cache for specific call
with dspy.settings.context(cache=False):
    result = module(input="data")
```
**Full documentation**: See [optimization.md](./references/optimization.md) section on Caching.
## Quick Start Workflow
### For New Projects
1. **Install DSPy**:
```bash
pip install dspy-ai
```
2. **Configure LLM provider** (see [config-template.py](./assets/config-template.py)):
```python
import dspy
import os
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.environ['OPENAI_API_KEY'])
dspy.configure(lm=lm)
```
3. **Create a signature** (see [signature-template.py](./assets/signature-template.py)):
```python
class MySignature(dspy.Signature):
    """Clear description of task."""

    input_field: str = dspy.InputField(desc="Description")
    output_field: str = dspy.OutputField(desc="Description")
```
4. **Create a module** (see [module-template.py](./assets/module-template.py)):
```python
class MyModule(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.Predict(MySignature)

    def forward(self, input_field: str):
        return self.predictor(input_field=input_field)
```
5. **Use the module**:
```python
module = MyModule()
result = module(input_field="test")
print(result.output_field)
```
6. **Add tests** (see [optimization.md](./references/optimization.md)):
```python
def test_my_module():
    result = MyModule()(input_field="test")
    assert isinstance(result.output_field, str)
```
### For FastAPI Applications
1. **Install dependencies**:
```bash
pip install dspy-ai fastapi uvicorn pydantic
```
2. **Create app structure**:
```
my_app/
├── app/
│   ├── __init__.py
│   ├── main.py            # FastAPI app
│   ├── dspy_modules/      # DSPy modules
│   │   ├── __init__.py
│   │   └── classifier.py
│   ├── models/            # Pydantic models
│   │   └── __init__.py
│   └── config.py          # DSPy configuration
├── tests/
│   └── test_classifier.py
└── requirements.txt
```
3. **Configure DSPy** in `config.py`:
```python
import dspy
import os
def configure_dspy():
    lm = dspy.LM(
        'openai/gpt-4o-mini',
        api_key=os.environ['OPENAI_API_KEY']
    )
    dspy.configure(lm=lm, cache=True)
```
4. **Create FastAPI app** in `main.py`:
```python
from fastapi import FastAPI
from contextlib import asynccontextmanager
from app.config import configure_dspy
from app.dspy_modules.classifier import EmailProcessor
from app.models import EmailRequest  # request schema (assumed to live under app/models/)
@asynccontextmanager
async def lifespan(app: FastAPI):
    configure_dspy()
    yield

app = FastAPI(lifespan=lifespan)
classifier = EmailProcessor()

@app.post("/classify")
async def classify(request: EmailRequest):
    result = classifier(
        email_subject=request.subject,
        email_body=request.body
    )
    return {"category": result.category, "priority": result.priority}
```
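The `EmailRequest` import above assumes the request schema lives under `app/models/`; a minimal sketch of `app/models/__init__.py`:
```python
from pydantic import BaseModel

class EmailRequest(BaseModel):
    subject: str
    body: str

class EmailResponse(BaseModel):
    category: str
    priority: str
```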
## Common Patterns
### Pattern: Multi-Step Analysis Pipeline
```python
class AnalysisPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.extract = dspy.Predict(ExtractSignature)
        self.analyze = dspy.ChainOfThought(AnalyzeSignature)
        self.summarize = dspy.Predict(SummarizeSignature)

    def forward(self, text: str):
        extracted = self.extract(text=text)
        analyzed = self.analyze(data=extracted.data)
        return self.summarize(analysis=analyzed.result)
```
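The three signatures referenced by this pipeline are not defined in this skill; one plausible shape for them, with field names chosen to match the attribute access above:
```python
import dspy

class ExtractSignature(dspy.Signature):
    """Extract the key facts from raw text."""
    text: str = dspy.InputField()
    data: str = dspy.OutputField(desc="Key facts, one per line")

class AnalyzeSignature(dspy.Signature):
    """Analyze extracted facts for patterns and implications."""
    data: str = dspy.InputField()
    result: str = dspy.OutputField(desc="Analysis of the facts")

class SummarizeSignature(dspy.Signature):
    """Summarize an analysis in a short paragraph."""
    analysis: str = dspy.InputField()
    summary: str = dspy.OutputField()
```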
### Pattern: Agent with Tools
```python
import dspy
def search_web(query: str) -> str:
    """Search the web for information."""
    # Implementation here
    return f"Results for: {query}"

def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    # Note: eval is unsafe on untrusted input; use a proper expression parser in production
    return str(eval(expression))

class ResearchAgent(dspy.Module):
    def __init__(self):
        super().__init__()
        self.agent = dspy.ReAct(
            ResearchSignature,
            tools=[search_web, calculate],
            max_iters=10
        )

    def forward(self, question: str):
        return self.agent(question=question)
```
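`ResearchSignature` is assumed here; a minimal sketch of what it might look like:
```python
import dspy

class ResearchSignature(dspy.Signature):
    """Answer a research question, using tools when needed."""
    question: str = dspy.InputField(desc="The question to investigate")
    answer: str = dspy.OutputField(desc="A concise, well-supported answer")
```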
### Pattern: Conditional Routing
```python
class SmartRouter(dspy.Module):
    def __init__(self):
        super().__init__()
        self.classifier = dspy.Predict(ClassifyComplexity)
        self.simple_handler = SimpleModule()
        self.complex_handler = ComplexModule()

    def forward(self, input_text: str):
        classification = self.classifier(text=input_text)
        if classification.complexity == "Simple":
            return self.simple_handler(input=input_text)
        else:
            return self.complex_handler(input=input_text)
```
### Pattern: Retry with Fallback
```python
import dspy
from tenacity import retry, stop_after_attempt, wait_exponential
class RobustModule(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.Predict(TaskSignature)

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10)
    )
    def forward(self, input_text: str):
        result = self.predictor(input=input_text)
        self._validate(result)
        return result

    def _validate(self, result):
        if not result.output:
            raise ValueError("Empty output from LLM")
```
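The decorator above retries the same model; a true fallback can switch to a different LM once the primary attempt fails. A sketch using the `dspy.settings.context` mechanism shown earlier, reusing `TaskSignature` (the fallback model is illustrative):
```python
import dspy

class FallbackModule(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.Predict(TaskSignature)
        self.fallback_lm = dspy.LM('openai/gpt-4o')  # stronger model for the retry

    def forward(self, input_text: str):
        try:
            return self.predictor(input=input_text)
        except Exception:
            # Temporarily swap the configured LM, then retry once
            with dspy.settings.context(lm=self.fallback_lm):
                return self.predictor(input=input_text)
```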
### Pattern: Pydantic Output Models
```python
from pydantic import BaseModel, Field
import dspy
class ClassificationResult(BaseModel):
    category: str = Field(description="Category: Technical, Billing, or General")
    priority: str = Field(description="Priority: Low, Medium, or High")
    confidence: float = Field(ge=0.0, le=1.0, description="Confidence score")

class TypedClassifier(dspy.Signature):
    """Classify with structured output."""

    text: str = dspy.InputField()
    result: ClassificationResult = dspy.OutputField()
```
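With a signature like this, recent DSPy versions parse the `result` field into a `ClassificationResult` instance rather than returning raw text. A brief usage sketch reusing `TypedClassifier` from above:
```python
classify = dspy.Predict(TypedClassifier)
prediction = classify(text="My card was charged twice and I need a refund")

parsed = prediction.result  # a ClassificationResult instance
print(parsed.category, parsed.priority, parsed.confidence)
```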
## Resources
This skill includes comprehensive reference materials and templates:
### References (load as needed for detailed information)
- [core-concepts.md](./references/core-concepts.md): Complete guide to signatures, modules, predictors, and best practices
- [providers.md](./references/providers.md): All LLM provider configurations, compatibility matrix, and troubleshooting
- [optimization.md](./references/optimization.md): Testing patterns, teleprompters, caching, and monitoring
- [fastapi-integration.md](./references/fastapi-integration.md): Production patterns for serving DSPy with FastAPI
### Assets (templates for quick starts)
- [signature-template.py](./assets/signature-template.py): Examples of signatures including inline, class-based, and Pydantic outputs
- [module-template.py](./assets/module-template.py): Module patterns including pipelines, agents, async, and caching
- [config-template.py](./assets/config-template.py): Configuration examples for all providers and environments
## When to Use This Skill
Trigger this skill when:
- Implementing LLM-powered features in Python applications
- Creating programmatic interfaces for AI operations
- Building agent systems with tool usage
- Setting up or troubleshooting LLM providers with DSPy
- Optimizing prompts using teleprompters
- Testing LLM functionality with pytest
- Integrating DSPy with FastAPI
- Converting from manual prompt engineering to programmatic approach
- Debugging DSPy code or configuration issues