diff --git a/docs/plans/2026-02-09-refactor-dspy-ruby-skill-update-plan.md b/docs/plans/2026-02-09-refactor-dspy-ruby-skill-update-plan.md new file mode 100644 index 0000000..59192ee --- /dev/null +++ b/docs/plans/2026-02-09-refactor-dspy-ruby-skill-update-plan.md @@ -0,0 +1,104 @@ +--- +title: "refactor: Update dspy-ruby skill to DSPy.rb v0.34.3 API" +type: refactor +date: 2026-02-09 +--- + +# Update dspy-ruby Skill to DSPy.rb v0.34.3 API + +## Problem + +The `dspy-ruby` skill uses outdated API patterns (`.forward()`, `result[:field]`, inline `T.enum([...])`, `DSPy::Tool`) and is missing 10+ features (events, lifecycle callbacks, GEPA, evaluation framework, BAML/TOON, storage, etc.). + +## Solution + +Use the engineering skill as base (already has correct API), enhance with official docs content, rewrite all reference files and templates. + +### Source Priority (when conflicts arise) + +1. **Official docs** (`../dspy.rb/docs/src/`) — source of truth for API correctness +2. **Engineering skill** (`../engineering/.../dspy-rb/SKILL.md`) — source of truth for structure/style +3. **NavigationContext brainstorm** — for Typed Context pattern only + +## Files to Update + +### Core (SKILL.md) + +1. **`skills/dspy-ruby/SKILL.md`** — Copy from engineering base, then: + - Fix frontmatter: `name: dspy-rb` → `name: dspy-ruby`, keep long description format + - Add sections before "Guidelines for Claude": Events System, Lifecycle Callbacks, Fiber-Local LM Context, Evaluation Framework, GEPA Optimization, Typed Context Pattern, Schema Formats (BAML/TOON) + - Update Resources section with 5 references + 3 assets using markdown links + - Fix any backtick references to markdown link format + +### References (rewrite from themed doc batches) + +2. 
**`references/core-concepts.md`** — Rewrite + - Source: `core-concepts/signatures.md`, `modules.md`, `predictors.md`, `advanced/complex-types.md` + - Cover: signatures (Date/Time types, T::Enum, defaults, field descriptions, BAML/TOON, recursive types), modules (.call() API, lifecycle callbacks, instruction update contract), predictors (all 4 types, concurrent predictions), type system (discriminators, union types) + +3. **`references/toolsets.md`** — NEW + - Source: `core-concepts/toolsets.md`, `toolsets-guide.md` + - Cover: Tools::Base, Tools::Toolset DSL, type safety with Sorbet sigs, schema generation, built-in toolsets, testing + +4. **`references/providers.md`** — Rewrite + - Source: `llms.txt.erb`, engineering SKILL.md, `core-concepts/module-runtime-context.md` + - Cover: per-provider adapters, RubyLLM unified adapter, Rails initializer, fiber-local LM context (`DSPy.with_lm`), feature-flagged model selection, compatibility matrix + +5. **`references/optimization.md`** — Rewrite + - Source: `optimization/miprov2.md`, `gepa.md`, `evaluation.md`, `production/storage.md` + - Cover: MIPROv2 (dspy-miprov2 gem, AutoMode presets), GEPA (dspy-gepa gem, feedback maps), Evaluation (DSPy::Evals, built-in metrics, DSPy::Example), Storage (ProgramStorage) + +6. **`references/observability.md`** — NEW + - Source: `production/observability.md`, `core-concepts/events.md`, `advanced/observability-interception.md` + - Cover: event system (module-scoped + global), dspy-o11y gems, Langfuse (env vars), score reporting (DSPy.score()), observation types, DSPy::Context.with_span + +### Assets (rewrite to current API) + +7. **`assets/signature-template.rb`** — T::Enum classes, `description:` kwarg, Date/Time types, defaults, union types, `.call()` / `result.field` usage examples + +8. **`assets/module-template.rb`** — `.call()` API, `result.field`, Tools::Base, lifecycle callbacks, `DSPy.with_lm`, `configure_predictor` + +9. 
**`assets/config-template.rb`** — RubyLLM adapter, `structured_outputs: true`, `after_initialize` Rails pattern, dspy-o11y env vars, feature-flagged model selection + +### Metadata + +10. **`.claude-plugin/plugin.json`** — Version `2.31.0` → `2.31.1` + +11. **`CHANGELOG.md`** — Add `[2.31.1] - 2026-02-09` entry under `### Changed` + +## Verification + +```bash +# No old API patterns +grep -n '\.forward(\|result\[:\|T\.enum(\[\|DSPy::Tool[^s]' plugins/compound-engineering/skills/dspy-ruby/SKILL.md + +# No backtick references +grep -E '`(references|assets|scripts)/' plugins/compound-engineering/skills/dspy-ruby/SKILL.md + +# Frontmatter correct +head -4 plugins/compound-engineering/skills/dspy-ruby/SKILL.md + +# JSON valid +cat plugins/compound-engineering/.claude-plugin/plugin.json | jq . + +# All files exist +ls plugins/compound-engineering/skills/dspy-ruby/{references,assets}/ +``` + +## Success Criteria + +- [x] All API patterns updated (`.call()`, `result.field`, `T::Enum`, `Tools::Base`) +- [x] New features covered: events, callbacks, fiber-local LM, GEPA, evals, BAML/TOON, storage, score API, RubyLLM, typed context +- [x] 5 reference files present (core-concepts, toolsets, providers, optimization, observability) +- [x] 3 asset templates updated to current API +- [x] YAML frontmatter: `name: dspy-ruby`, description has "what" and "when" +- [x] All reference links use `[file.md](./references/file.md)` format +- [x] Writing style: imperative form, no "you should" +- [x] Version bumped to `2.31.1`, CHANGELOG updated +- [x] Verification commands all pass + +## Source Materials + +- Engineering skill: `/Users/vicente/Workspaces/vicente.services/engineering/plugins/engineering-skills/skills/dspy-rb/SKILL.md` +- Official docs: `/Users/vicente/Workspaces/vicente.services/dspy.rb/docs/src/` +- NavigationContext brainstorm: `/Users/vicente/Workspaces/vicente.services/observo/observo-server/docs/brainstorms/2026-02-09-typed-navigation-context-brainstorm.md` diff --git 
a/plugins/compound-engineering/.claude-plugin/plugin.json b/plugins/compound-engineering/.claude-plugin/plugin.json index f84b1a8..382bb8a 100644 --- a/plugins/compound-engineering/.claude-plugin/plugin.json +++ b/plugins/compound-engineering/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "compound-engineering", - "version": "2.31.0", + "version": "2.31.1", "description": "AI-powered development tools. 29 agents, 24 commands, 18 skills, 1 MCP server for code review, research, design, and workflow automation.", "author": { "name": "Kieran Klaassen", diff --git a/plugins/compound-engineering/CHANGELOG.md b/plugins/compound-engineering/CHANGELOG.md index ec00291..d9390a9 100644 --- a/plugins/compound-engineering/CHANGELOG.md +++ b/plugins/compound-engineering/CHANGELOG.md @@ -5,6 +5,12 @@ All notable changes to the compound-engineering plugin will be documented in thi The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [2.31.1] - 2026-02-09 + +### Changed + +- **`dspy-ruby` skill** — Complete rewrite to DSPy.rb v0.34.3 API: `.call()` / `result.field` patterns, `T::Enum` classes, `DSPy::Tools::Base` / `Toolset`. Added events system, lifecycle callbacks, fiber-local LM context, GEPA optimization, evaluation framework, typed context pattern, BAML/TOON schema formats, storage system, score reporting, RubyLLM adapter. 5 reference files (2 new: toolsets, observability), 3 asset templates rewritten. 
+ ## [2.31.0] - 2026-02-08 ### Added diff --git a/plugins/compound-engineering/skills/dspy-ruby/SKILL.md b/plugins/compound-engineering/skills/dspy-ruby/SKILL.md index 359a642..577c72c 100644 --- a/plugins/compound-engineering/skills/dspy-ruby/SKILL.md +++ b/plugins/compound-engineering/skills/dspy-ruby/SKILL.md @@ -1,594 +1,737 @@ --- name: dspy-ruby -description: This skill should be used when working with DSPy.rb, a Ruby framework for building type-safe, composable LLM applications. Use this when implementing predictable AI features, creating LLM signatures and modules, configuring language model providers (OpenAI, Anthropic, Gemini, Ollama), building agent systems with tools, optimizing prompts, or testing LLM-powered functionality in Ruby applications. +description: Build type-safe LLM applications with DSPy.rb — Ruby's programmatic prompt framework with signatures, modules, agents, and optimization. Use when implementing predictable AI features, creating LLM signatures and modules, configuring language model providers, building agent systems with tools, optimizing prompts, or testing LLM-powered functionality in Ruby applications. --- -# DSPy.rb Expert +# DSPy.rb + +> Build LLM apps like you build software. Type-safe, modular, testable. + +DSPy.rb brings software engineering best practices to LLM development. Instead of tweaking prompts, define what you want with Ruby types and let DSPy handle the rest. ## Overview -DSPy.rb is a Ruby framework that enables developers to **program LLMs, not prompt them**. Instead of manually crafting prompts, define application requirements through type-safe, composable modules that can be tested, optimized, and version-controlled like regular code. +DSPy.rb is a Ruby framework for building language model applications with programmatic prompts. 
It provides: -This skill provides comprehensive guidance on: -- Creating type-safe signatures for LLM operations -- Building composable modules and workflows -- Configuring multiple LLM providers -- Implementing agents with tools -- Testing and optimizing LLM applications -- Production deployment patterns +- **Type-safe signatures** — Define inputs/outputs with Sorbet types +- **Modular components** — Compose and reuse LLM logic +- **Automatic optimization** — Use data to improve prompts, not guesswork +- **Production-ready** — Built-in observability, testing, and error handling -## Core Capabilities +## Core Concepts -### 1. Type-Safe Signatures +### 1. Signatures -Create input/output contracts for LLM operations with runtime type checking. +Define interfaces between your app and LLMs using Ruby types: -**When to use**: Defining any LLM task, from simple classification to complex analysis. - -**Quick reference**: ```ruby -class EmailClassificationSignature < DSPy::Signature - description "Classify customer support emails" +class EmailClassifier < DSPy::Signature + description "Classify customer support emails by category and priority" - input do - const :email_subject, String - const :email_body, String - end - - output do - const :category, T.enum(["Technical", "Billing", "General"]) - const :priority, T.enum(["Low", "Medium", "High"]) - end -end -``` - -**Templates**: See `assets/signature-template.rb` for comprehensive examples including: -- Basic signatures with multiple field types -- Vision signatures for multimodal tasks -- Sentiment analysis signatures -- Code generation signatures - -**Best practices**: -- Always provide clear, specific descriptions -- Use enums for constrained outputs -- Include field descriptions with `desc:` parameter -- Prefer specific types over generic String when possible - -**Full documentation**: See `references/core-concepts.md` sections on Signatures and Type Safety. - -### 2. 
Composable Modules - -Build reusable, chainable modules that encapsulate LLM operations. - -**When to use**: Implementing any LLM-powered feature, especially complex multi-step workflows. - -**Quick reference**: -```ruby -class EmailProcessor < DSPy::Module - def initialize - super - @classifier = DSPy::Predict.new(EmailClassificationSignature) - end - - def forward(email_subject:, email_body:) - @classifier.forward( - email_subject: email_subject, - email_body: email_body - ) - end -end -``` - -**Templates**: See `assets/module-template.rb` for comprehensive examples including: -- Basic modules with single predictors -- Multi-step pipelines that chain modules -- Modules with conditional logic -- Error handling and retry patterns -- Stateful modules with history -- Caching implementations - -**Module composition**: Chain modules together to create complex workflows: -```ruby -class Pipeline < DSPy::Module - def initialize - super - @step1 = Classifier.new - @step2 = Analyzer.new - @step3 = Responder.new - end - - def forward(input) - result1 = @step1.forward(input) - result2 = @step2.forward(result1) - @step3.forward(result2) - end -end -``` - -**Full documentation**: See `references/core-concepts.md` sections on Modules and Module Composition. - -### 3. Multiple Predictor Types - -Choose the right predictor for your task: - -**Predict**: Basic LLM inference with type-safe inputs/outputs -```ruby -predictor = DSPy::Predict.new(TaskSignature) -result = predictor.forward(input: "data") -``` - -**ChainOfThought**: Adds automatic reasoning for improved accuracy -```ruby -predictor = DSPy::ChainOfThought.new(TaskSignature) -result = predictor.forward(input: "data") -# Returns: { reasoning: "...", output: "..." 
} -``` - -**ReAct**: Tool-using agents with iterative reasoning -```ruby -predictor = DSPy::ReAct.new( - TaskSignature, - tools: [SearchTool.new, CalculatorTool.new], - max_iterations: 5 -) -``` - -**CodeAct**: Dynamic code generation (requires `dspy-code_act` gem) -```ruby -predictor = DSPy::CodeAct.new(TaskSignature) -result = predictor.forward(task: "Calculate factorial of 5") -``` - -**When to use each**: -- **Predict**: Simple tasks, classification, extraction -- **ChainOfThought**: Complex reasoning, analysis, multi-step thinking -- **ReAct**: Tasks requiring external tools (search, calculation, API calls) -- **CodeAct**: Tasks best solved with generated code - -**Full documentation**: See `references/core-concepts.md` section on Predictors. - -### 4. LLM Provider Configuration - -Support for OpenAI, Anthropic Claude, Google Gemini, Ollama, and OpenRouter. - -**Quick configuration examples**: -```ruby -# OpenAI -DSPy.configure do |c| - c.lm = DSPy::LM.new('openai/gpt-4o-mini', - api_key: ENV['OPENAI_API_KEY']) -end - -# Anthropic Claude -DSPy.configure do |c| - c.lm = DSPy::LM.new('anthropic/claude-3-5-sonnet-20241022', - api_key: ENV['ANTHROPIC_API_KEY']) -end - -# Google Gemini -DSPy.configure do |c| - c.lm = DSPy::LM.new('gemini/gemini-1.5-pro', - api_key: ENV['GOOGLE_API_KEY']) -end - -# Local Ollama (free, private) -DSPy.configure do |c| - c.lm = DSPy::LM.new('ollama/llama3.1') -end -``` - -**Templates**: See `assets/config-template.rb` for comprehensive examples including: -- Environment-based configuration -- Multi-model setups for different tasks -- Configuration with observability (OpenTelemetry, Langfuse) -- Retry logic and fallback strategies -- Budget tracking -- Rails initializer patterns - -**Provider compatibility matrix**: - -| Feature | OpenAI | Anthropic | Gemini | Ollama | -|---------|--------|-----------|--------|--------| -| Structured Output | ✅ | ✅ | ✅ | ✅ | -| Vision (Images) | ✅ | ✅ | ✅ | ⚠️ Limited | -| Image URLs | ✅ | ❌ | ❌ | ❌ | 
-| Tool Calling | ✅ | ✅ | ✅ | Varies | - -**Cost optimization strategy**: -- Development: Ollama (free) or gpt-4o-mini (cheap) -- Testing: gpt-4o-mini with temperature=0.0 -- Production simple tasks: gpt-4o-mini, claude-3-haiku, gemini-1.5-flash -- Production complex tasks: gpt-4o, claude-3-5-sonnet, gemini-1.5-pro - -**Full documentation**: See `references/providers.md` for all configuration options, provider-specific features, and troubleshooting. - -### 5. Multimodal & Vision Support - -Process images alongside text using the unified `DSPy::Image` interface. - -**Quick reference**: -```ruby -class VisionSignature < DSPy::Signature - description "Analyze image and answer questions" - - input do - const :image, DSPy::Image - const :question, String - end - - output do - const :answer, String - end -end - -predictor = DSPy::Predict.new(VisionSignature) -result = predictor.forward( - image: DSPy::Image.from_file("path/to/image.jpg"), - question: "What objects are visible?" -) -``` - -**Image loading methods**: -```ruby -# From file -DSPy::Image.from_file("path/to/image.jpg") - -# From URL (OpenAI only) -DSPy::Image.from_url("https://example.com/image.jpg") - -# From base64 -DSPy::Image.from_base64(base64_data, mime_type: "image/jpeg") -``` - -**Provider support**: -- OpenAI: Full support including URLs -- Anthropic, Gemini: Base64 or file loading only -- Ollama: Limited multimodal depending on model - -**Full documentation**: See `references/core-concepts.md` section on Multimodal Support. - -### 6. Testing LLM Applications - -Write standard RSpec tests for LLM logic. 
- -**Quick reference**: -```ruby -RSpec.describe EmailClassifier do - before do - DSPy.configure do |c| - c.lm = DSPy::LM.new('openai/gpt-4o-mini', - api_key: ENV['OPENAI_API_KEY']) + class Priority < T::Enum + enums do + Low = new('low') + Medium = new('medium') + High = new('high') + Urgent = new('urgent') end end - it 'classifies technical emails correctly' do - classifier = EmailClassifier.new - result = classifier.forward( - email_subject: "Can't log in", - email_body: "Unable to access account" - ) - - expect(result[:category]).to eq('Technical') - expect(result[:priority]).to be_in(['High', 'Medium', 'Low']) - end -end -``` - -**Testing patterns**: -- Mock LLM responses for unit tests -- Use VCR for deterministic API testing -- Test type safety and validation -- Test edge cases (empty inputs, special characters, long texts) -- Integration test complete workflows - -**Full documentation**: See `references/optimization.md` section on Testing. - -### 7. Optimization & Improvement - -Automatically improve prompts and modules using optimization techniques. - -**MIPROv2 optimization**: -```ruby -require 'dspy/mipro' - -# Define evaluation metric -def accuracy_metric(example, prediction) - example[:expected_output][:category] == prediction[:category] ? 1.0 : 0.0 -end - -# Prepare training data -training_examples = [ - { - input: { email_subject: "...", email_body: "..." }, - expected_output: { category: 'Technical' } - }, - # More examples... -] - -# Run optimization -optimizer = DSPy::MIPROv2.new( - metric: method(:accuracy_metric), - num_candidates: 10 -) - -optimized_module = optimizer.compile( - EmailClassifier.new, - trainset: training_examples -) -``` - -**A/B testing different approaches**: -```ruby -# Test ChainOfThought vs ReAct -approach_a_score = evaluate_approach(ChainOfThoughtModule, test_set) -approach_b_score = evaluate_approach(ReActModule, test_set) -``` - -**Full documentation**: See `references/optimization.md` section on Optimization. - -### 8. 
Observability & Monitoring - -Track performance, token usage, and behavior in production. - -**OpenTelemetry integration**: -```ruby -require 'opentelemetry/sdk' - -OpenTelemetry::SDK.configure do |c| - c.service_name = 'my-dspy-app' - c.use_all -end - -# DSPy automatically creates traces -``` - -**Langfuse tracing**: -```ruby -DSPy.configure do |c| - c.lm = DSPy::LM.new('openai/gpt-4o-mini', - api_key: ENV['OPENAI_API_KEY']) - - c.langfuse = { - public_key: ENV['LANGFUSE_PUBLIC_KEY'], - secret_key: ENV['LANGFUSE_SECRET_KEY'] - } -end -``` - -**Custom monitoring**: -- Token tracking -- Performance monitoring -- Error rate tracking -- Custom logging - -**Full documentation**: See `references/optimization.md` section on Observability. - -## Quick Start Workflow - -### For New Projects - -1. **Install DSPy.rb and provider gems**: -```bash -gem install dspy dspy-openai # or dspy-anthropic, dspy-gemini -``` - -2. **Configure LLM provider** (see `assets/config-template.rb`): -```ruby -require 'dspy' - -DSPy.configure do |c| - c.lm = DSPy::LM.new('openai/gpt-4o-mini', - api_key: ENV['OPENAI_API_KEY']) -end -``` - -3. **Create a signature** (see `assets/signature-template.rb`): -```ruby -class MySignature < DSPy::Signature - description "Clear description of task" - input do - const :input_field, String, desc: "Description" + const :email_content, String + const :sender, String end output do - const :output_field, String, desc: "Description" + const :category, String + const :priority, Priority # Type-safe enum with defined values + const :confidence, Float end end ``` -4. **Create a module** (see `assets/module-template.rb`): +### 2. Modules + +Build complex workflows from simple building blocks: + +- **Predict** — Basic LLM calls with signatures +- **ChainOfThought** — Step-by-step reasoning +- **ReAct** — Tool-using agents +- **CodeAct** — Dynamic code generation agents (install the `dspy-code_act` gem) + +### 3. 
Tools & Toolsets + +Create type-safe tools for agents with comprehensive Sorbet support: + ```ruby -class MyModule < DSPy::Module - def initialize - super - @predictor = DSPy::Predict.new(MySignature) +# Enum-based tool with automatic type conversion +class CalculatorTool < DSPy::Tools::Base + tool_name 'calculator' + tool_description 'Performs arithmetic operations with type-safe enum inputs' + + class Operation < T::Enum + enums do + Add = new('add') + Subtract = new('subtract') + Multiply = new('multiply') + Divide = new('divide') + end end - def forward(input_field:) - @predictor.forward(input_field: input_field) + sig { params(operation: Operation, num1: Float, num2: Float).returns(T.any(Float, String)) } + def call(operation:, num1:, num2:) + case operation + when Operation::Add then num1 + num2 + when Operation::Subtract then num1 - num2 + when Operation::Multiply then num1 * num2 + when Operation::Divide + return "Error: Division by zero" if num2 == 0 + num1 / num2 + end + end +end + +# Multi-tool toolset with rich types +class DataToolset < DSPy::Tools::Toolset + toolset_name "data_processing" + + class Format < T::Enum + enums do + JSON = new('json') + CSV = new('csv') + XML = new('xml') + end + end + + tool :convert, description: "Convert data between formats" + tool :validate, description: "Validate data structure" + + sig { params(data: String, from: Format, to: Format).returns(String) } + def convert(data:, from:, to:) + "Converted from #{from.serialize} to #{to.serialize}" + end + + sig { params(data: String, format: Format).returns(T::Hash[String, T.any(String, Integer, T::Boolean)]) } + def validate(data:, format:) + { valid: true, format: format.serialize, row_count: 42, message: "Data validation passed" } end end ``` -5. **Use the module**: -```ruby -module_instance = MyModule.new -result = module_instance.forward(input_field: "test") -puts result[:output_field] -``` +### 4. Type System & Discriminators -6. 
**Add tests** (see `references/optimization.md`): -```ruby -RSpec.describe MyModule do - it 'produces expected output' do - result = MyModule.new.forward(input_field: "test") - expect(result[:output_field]).to be_a(String) - end -end -``` +DSPy.rb uses sophisticated type discrimination for complex data structures: -### For Rails Applications +- **Automatic `_type` field injection** — DSPy adds discriminator fields to structs for type safety +- **Union type support** — `T.any()` types automatically disambiguated by `_type` +- **Reserved field name** — Avoid defining your own `_type` fields in structs +- **Recursive filtering** — `_type` fields filtered during deserialization at all nesting levels + +### 5. Optimization + +Improve accuracy with real data: + +- **MIPROv2** — Advanced multi-prompt optimization with bootstrap sampling and Bayesian optimization +- **GEPA** — Genetic-Pareto Reflective Prompt Evolution with feedback maps, experiment tracking, and telemetry +- **Evaluation** — Comprehensive framework with built-in and custom metrics, error handling, and batch processing + +## Quick Start -1. **Add to Gemfile**: ```ruby +# Install gem 'dspy' -gem 'dspy-openai' # or other provider -``` - -2. 
**Create initializer** at `config/initializers/dspy.rb` (see `assets/config-template.rb` for full example): -```ruby -require 'dspy' +# Configure DSPy.configure do |c| - c.lm = DSPy::LM.new('openai/gpt-4o-mini', - api_key: ENV['OPENAI_API_KEY']) + c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']) +end + +# Define a task +class SentimentAnalysis < DSPy::Signature + description "Analyze sentiment of text" + + input do + const :text, String + end + + output do + const :sentiment, String # positive, negative, neutral + const :score, Float # 0.0 to 1.0 + end +end + +# Use it +analyzer = DSPy::Predict.new(SentimentAnalysis) +result = analyzer.call(text: "This product is amazing!") +puts result.sentiment # => "positive" +puts result.score # => 0.92 +``` + +## Provider Adapter Gems + +Two strategies for connecting to LLM providers: + +### Per-provider adapters (direct SDK access) + +```ruby +# Gemfile +gem 'dspy' +gem 'dspy-openai' # OpenAI, OpenRouter, Ollama +gem 'dspy-anthropic' # Claude +gem 'dspy-gemini' # Gemini +``` + +Each adapter gem pulls in the official SDK (`openai`, `anthropic`, `gemini-ai`). + +### Unified adapter via RubyLLM (recommended for multi-provider) + +```ruby +# Gemfile +gem 'dspy' +gem 'dspy-ruby_llm' # Routes to any provider via ruby_llm +gem 'ruby_llm' +``` + +RubyLLM handles provider routing based on the model name. Use the `ruby_llm/` prefix: + +```ruby +DSPy.configure do |c| + c.lm = DSPy::LM.new('ruby_llm/gemini-2.5-flash', structured_outputs: true) + # c.lm = DSPy::LM.new('ruby_llm/claude-sonnet-4-20250514', structured_outputs: true) + # c.lm = DSPy::LM.new('ruby_llm/gpt-4o-mini', structured_outputs: true) end ``` -3. **Create modules in** `app/llm/` directory: -```ruby -# app/llm/email_classifier.rb -class EmailClassifier < DSPy::Module - # Implementation here -end -``` +## Events System + +DSPy.rb ships with a structured event bus for observing runtime behavior. 
+ +### Module-Scoped Subscriptions (preferred for agents)

```ruby
class MyAgent < DSPy::Module
  subscribe 'lm.tokens', :track_tokens, scope: :descendants

  def track_tokens(_event, attrs)
    # .to_i guards the first event, when @total_tokens is still nil
    @total_tokens = @total_tokens.to_i + attrs.fetch(:total_tokens, 0)
  end
end
```

### Global Subscriptions (for observability/integrations)

```ruby
subscription_id = DSPy.events.subscribe('score.create') do |event, attrs|
  Langfuse.export_score(attrs)
end

# Wildcards supported
DSPy.events.subscribe('llm.*') { |name, attrs| puts "[#{name}] tokens=#{attrs[:total_tokens]}" }
```

Event names use dot-separated namespaces (`llm.generate`, `react.iteration_complete`). Every event includes module metadata (`module_path`, `module_leaf`, `module_scope.ancestry_token`) for filtering.
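The metadata attached to each event makes it possible to scope a wildcard subscription to a single agent subtree. A minimal, plain-Ruby sketch of the filtering predicate — the exact shape of `module_path` is an assumption here (an array of class-name strings), and `ResearchAgent` / `PlannerAgent` are hypothetical module names:

```ruby
# Hypothetical event payload, shaped like the module metadata DSPy.rb
# attaches to events (assumed: module_path is an array of class names).
attrs = {
  module_path: ['ResearchAgent', 'DSPy::ReAct'],
  module_leaf: 'DSPy::ReAct',
  total_tokens: 1480
}

# Predicate for use inside a DSPy.events.subscribe('llm.*') block:
# ignore events emitted outside the subtree you care about.
scoped_to = ->(event_attrs, agent) { Array(event_attrs[:module_path]).include?(agent) }

puts scoped_to.call(attrs, 'ResearchAgent') # => true
puts scoped_to.call(attrs, 'PlannerAgent')  # => false
```

Inside a real subscription block this becomes a guard such as `next unless scoped_to.call(attrs, 'ResearchAgent')` before forwarding the event to a metrics backend.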
+ +## Lifecycle Callbacks + +Rails-style lifecycle hooks ship with every `DSPy::Module`: + +- **`before`** — Runs ahead of `forward` for setup (metrics, context loading) +- **`around`** — Wraps `forward`, calls `yield`, and lets you pair setup/teardown logic +- **`after`** — Fires after `forward` returns for cleanup or persistence ```ruby -class ResearchAgent < DSPy::Module - def initialize - super - @agent = DSPy::ReAct.new( - ResearchSignature, - tools: [ - WebSearchTool.new, - DatabaseQueryTool.new, - SummarizerTool.new - ], - max_iterations: 10 - ) - end +class InstrumentedModule < DSPy::Module + before :setup_metrics + around :manage_context + after :log_metrics def forward(question:) - @agent.forward(question: question) + @predictor.call(question: question) + end + + private + + def setup_metrics + @start_time = Time.now + end + + def manage_context + load_context + result = yield + save_context + result + end + + def log_metrics + duration = Time.now - @start_time + Rails.logger.info "Prediction completed in #{duration}s" end end +``` -class WebSearchTool < DSPy::Tool +Execution order: before → around (before yield) → forward → around (after yield) → after. Callbacks are inherited from parent classes and execute in registration order. + +## Fiber-Local LM Context + +Override the language model temporarily using fiber-local storage: + +```ruby +fast_model = DSPy::LM.new("openai/gpt-4o-mini", api_key: ENV['OPENAI_API_KEY']) + +DSPy.with_lm(fast_model) do + result = classifier.call(text: "test") # Uses fast_model inside this block +end +# Back to global LM outside the block +``` + +**LM resolution hierarchy**: Instance-level LM → Fiber-local LM (`DSPy.with_lm`) → Global LM (`DSPy.configure`). 
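The hierarchy can be pictured as a simple fallback chain. The sketch below is plain Ruby modeling only the resolution rule, not DSPy internals; the model strings are placeholders:

```ruby
# Instance-level LM beats fiber-local LM, which beats the global LM.
def resolve_lm(instance_lm: nil, fiber_lm: nil, global_lm: nil)
  instance_lm || fiber_lm || global_lm
end

global_lm   = 'openai/gpt-4o'             # set once via DSPy.configure
fiber_lm    = 'openai/gpt-4o-mini'        # active inside a DSPy.with_lm block
instance_lm = 'anthropic/claude-sonnet-4' # set via predictor.configure

puts resolve_lm(fiber_lm: fiber_lm, global_lm: global_lm)
# => openai/gpt-4o-mini (fiber-local wins over global)
puts resolve_lm(instance_lm: instance_lm, fiber_lm: fiber_lm, global_lm: global_lm)
# => anthropic/claude-sonnet-4 (instance-level always wins)
```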
+ +Use `configure_predictor` for fine-grained control over agent internals: + +```ruby +agent = DSPy::ReAct.new(MySignature, tools: tools) +agent.configure { |c| c.lm = default_model } +agent.configure_predictor('thought_generator') { |c| c.lm = powerful_model } +``` + +## Evaluation Framework + +Systematically test LLM application performance with `DSPy::Evals`: + +```ruby +metric = DSPy::Metrics.exact_match(field: :answer, case_sensitive: false) +evaluator = DSPy::Evals.new(predictor, metric: metric) +result = evaluator.evaluate(test_examples, display_table: true) +puts "Pass Rate: #{(result.pass_rate * 100).round(1)}%" +``` + +Built-in metrics: `exact_match`, `contains`, `numeric_difference`, `composite_and`. Custom metrics return `true`/`false` or a `DSPy::Prediction` with `score:` and `feedback:` fields. + +Use `DSPy::Example` for typed test data and `export_scores: true` to push results to Langfuse. + +## GEPA Optimization + +GEPA (Genetic-Pareto Reflective Prompt Evolution) uses reflection-driven instruction rewrites: + +```ruby +gem 'dspy-gepa' + +teleprompter = DSPy::Teleprompt::GEPA.new( + metric: metric, + reflection_lm: DSPy::ReflectionLM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']), + feedback_map: feedback_map, + config: { max_metric_calls: 600, minibatch_size: 6 } +) + +result = teleprompter.compile(program, trainset: train, valset: val) +optimized_program = result.optimized_program +``` + +The metric must return `DSPy::Prediction.new(score:, feedback:)` so the reflection model can reason about failures. Use `feedback_map` to target individual predictors in composite modules. + +## Typed Context Pattern + +Replace opaque string context blobs with `T::Struct` inputs. 
Each field gets its own `description:` annotation in the JSON schema the LLM sees: + +```ruby +class NavigationContext < T::Struct + const :workflow_hint, T.nilable(String), + description: "Current workflow phase guidance for the agent" + const :action_log, T::Array[String], default: [], + description: "Compact one-line-per-action history of research steps taken" + const :iterations_remaining, Integer, + description: "Budget remaining. Each tool call costs 1 iteration." +end + +class ToolSelectionSignature < DSPy::Signature + input do + const :query, String + const :context, NavigationContext # Structured, not an opaque string + end + + output do + const :tool_name, String + const :tool_args, String, description: "JSON-encoded arguments" + end +end +``` + +Benefits: type safety at compile time, per-field descriptions in the LLM schema, easy to test as value objects, extensible by adding `const` declarations. + +## Schema Formats (BAML / TOON) + +Control how DSPy describes signature structure to the LLM: + +- **JSON Schema** (default) — Standard format, works with `structured_outputs: true` +- **BAML** (`schema_format: :baml`) — 84% token reduction for Enhanced Prompting mode. Requires `sorbet-baml` gem. +- **TOON** (`schema_format: :toon, data_format: :toon`) — Table-oriented format for both schemas and data. Enhanced Prompting mode only. + +BAML and TOON apply only when `structured_outputs: false`. With `structured_outputs: true`, the provider receives JSON Schema directly. + +## Storage System + +Persist and reload optimized programs with `DSPy::Storage::ProgramStorage`: + +```ruby +storage = DSPy::Storage::ProgramStorage.new(storage_path: "./dspy_storage") +storage.save_program(result.optimized_program, result, metadata: { optimizer: 'MIPROv2' }) +``` + +Supports checkpoint management, optimization history tracking, and import/export between environments. 
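Both the evaluation framework and GEPA above consume metrics that pair a numeric score with textual feedback. The scoring logic itself is ordinary Ruby; below is a sketch of a graded answer metric — the partial-credit rule is an illustrative assumption, and in real use the hash would be wrapped in `DSPy::Prediction.new(score:, feedback:)` rather than returned bare:

```ruby
# Graded metric: full credit for an exact match, partial credit when the
# prediction merely contains the expected answer, and feedback text the
# GEPA reflection model can reason about on failures.
def answer_metric(expected, predicted)
  want = expected.strip.downcase
  got  = predicted.strip.downcase

  if got == want
    { score: 1.0, feedback: 'Exact match.' }
  elsif got.include?(want)
    { score: 0.5, feedback: "Answer present but padded: #{predicted.inspect}" }
  else
    { score: 0.0, feedback: "Expected #{expected.inspect}, got #{predicted.inspect}" }
  end
end

puts answer_metric('Paris', 'Paris')[:score]                # => 1.0
puts answer_metric('Paris', 'The answer is Paris.')[:score] # => 0.5
```

The feedback strings matter as much as the scores: GEPA's reflection model uses them to decide how to rewrite instructions.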
+
+## Rails Integration
+
+### Directory Structure
+
+Organize DSPy components using Rails conventions:
+
+```
+app/
+  entities/    # T::Struct types shared across signatures
+  signatures/  # DSPy::Signature definitions
+  tools/       # DSPy::Tools::Base implementations
+  concerns/    # Shared tool behaviors (error handling, etc.)
+  modules/     # DSPy::Module orchestrators
+  services/    # Plain Ruby services that compose DSPy modules
+config/
+  initializers/
+    dspy.rb           # DSPy + provider configuration
+    feature_flags.rb  # Model selection per role
+spec/
+  signatures/      # Schema validation tests
+  tools/           # Tool unit tests
+  modules/         # Integration tests with VCR
+  vcr_cassettes/   # Recorded HTTP interactions
+```
+
+### Initializer
+
+```ruby
+# config/initializers/dspy.rb
+Rails.application.config.after_initialize do
+  next if Rails.env.test? && ENV["DSPY_ENABLE_IN_TEST"].blank?
+
+  RubyLLM.configure do |config|
+    config.gemini_api_key = ENV["GEMINI_API_KEY"] if ENV["GEMINI_API_KEY"].present?
+    config.anthropic_api_key = ENV["ANTHROPIC_API_KEY"] if ENV["ANTHROPIC_API_KEY"].present?
+    config.openai_api_key = ENV["OPENAI_API_KEY"] if ENV["OPENAI_API_KEY"].present?
+  end
+
+  model = ENV.fetch("DSPY_MODEL", "ruby_llm/gemini-2.5-flash")
+  DSPy.configure do |config|
+    config.lm = DSPy::LM.new(model, structured_outputs: true)
+    config.logger = Rails.logger
+  end
+
+  # Langfuse observability (optional)
+  if ENV["LANGFUSE_PUBLIC_KEY"].present? && ENV["LANGFUSE_SECRET_KEY"].present?
+    DSPy::Observability.configure!
+  end
+end
+```
+
+### Feature-Flagged Model Selection
+
+Use different models for different roles (fast/cheap for classification, powerful for synthesis):
+
+```ruby
+# config/initializers/feature_flags.rb
+module FeatureFlags
+  SELECTOR_MODEL = ENV.fetch("DSPY_SELECTOR_MODEL", "ruby_llm/gemini-2.5-flash-lite")
+  SYNTHESIZER_MODEL = ENV.fetch("DSPY_SYNTHESIZER_MODEL", "ruby_llm/gemini-2.5-flash")
+end
+```
+
+Then override per-tool or per-predictor:
+
+```ruby
 class ClassifyTool < DSPy::Tools::Base
   def call(query:)
-    results = perform_search(query)
-    { results: results }
+    predictor = DSPy::Predict.new(ClassifyQuery)
+    predictor.configure { |c| c.lm = DSPy::LM.new(FeatureFlags::SELECTOR_MODEL, structured_outputs: true) }
+    predictor.call(query: query)
   end
 end
 ```
 
-### Pattern: Conditional Routing
+## Schema-Driven Signatures
+
+**Prefer typed schemas over string descriptions.** Let the type system communicate structure to the LLM rather than prose in the signature description.
+
+### Entities as Shared Types
+
+Define reusable `T::Struct` and `T::Enum` types in `app/entities/` and reference them across signatures:
 
 ```ruby
-class SmartRouter < DSPy::Module
-  def initialize
-    super
-    @classifier = DSPy::Predict.new(ClassifySignature)
-    @simple_handler = SimpleModule.new
-    @complex_handler = ComplexModule.new
+# app/entities/search_strategy.rb
+class SearchStrategy < T::Enum
+  enums do
+    SingleSearch = new("single_search")
+    DateDecomposition = new("date_decomposition")
   end
+end
 
-  def forward(input:)
-    classification = @classifier.forward(text: input)
+# app/entities/scored_item.rb
+class ScoredItem < T::Struct
+  const :id, String
+  const :score, Float, description: "Relevance score 0.0-1.0"
+  const :verdict, String, description: "relevant, maybe, or irrelevant"
+  const :reason, String, default: ""
+end
+```
 
-    if classification[:complexity] == 'Simple'
-      @simple_handler.forward(input: input)
-    else
-      @complex_handler.forward(input: input)
-    end
+### Schema vs Description: When to Use Each
+
+**Use schemas (T::Struct/T::Enum)** for:
+- Multi-field outputs with specific types
+- Enums with defined values the LLM must pick from
+- Nested structures, arrays of typed objects
+- Outputs consumed by code (not displayed to users)
+
+**Use string descriptions** for:
+- Simple single-field outputs where the type is `String`
+- Natural language generation (summaries, answers)
+- Fields where constraint guidance helps (e.g., `description: "YYYY-MM-DD format"`)
+
+**Rule of thumb**: If you'd write a `case` statement on the output, it should be a `T::Enum`. If you'd call `.each` on it, it should be `T::Array[SomeStruct]`.
+
+## Tool Patterns
+
+### Tools That Wrap Predictions
+
+A common pattern: tools encapsulate a DSPy prediction, adding error handling, model selection, and serialization:
+
+```ruby
+class RerankTool < DSPy::Tools::Base
+  tool_name "rerank"
+  tool_description "Score and rank search results by relevance"
+
+  MAX_ITEMS = 200
+  MIN_ITEMS_FOR_LLM = 5
+
+  sig { params(query: String, items: T::Array[T::Hash[Symbol, T.untyped]]).returns(T::Hash[Symbol, T.untyped]) }
+  def call(query:, items: [])
+    return { scored_items: items, reranked: false } if items.size < MIN_ITEMS_FOR_LLM
+
+    capped_items = items.first(MAX_ITEMS)
+    predictor = DSPy::Predict.new(RerankSignature)
+    predictor.configure { |c| c.lm = DSPy::LM.new(FeatureFlags::SYNTHESIZER_MODEL, structured_outputs: true) }
+
+    result = predictor.call(query: query, items: capped_items)
+    { scored_items: result.scored_items, reranked: true }
+  rescue => e
+    Rails.logger.warn "[RerankTool] LLM rerank failed: #{e.message}"
+    { error: "Rerank failed: #{e.message}", scored_items: items, reranked: false }
   end
 end
 ```
 
-### Pattern: Retry with Fallback
+**Key patterns:**
+- Short-circuit LLM calls when unnecessary (small data, trivial cases)
+- Cap input size to prevent token overflow
+- Per-tool model selection via `configure`
+- Graceful error handling with fallback data
+
+### Error Handling Concern
 
 ```ruby
-class RobustModule < DSPy::Module
-  MAX_RETRIES = 3
+module ErrorHandling
+  extend ActiveSupport::Concern
 
-  def forward(input, retry_count: 0)
-    begin
-      @predictor.forward(input)
-    rescue DSPy::ValidationError => e
-      if retry_count < MAX_RETRIES
-        sleep(2 ** retry_count)
-        forward(input, retry_count: retry_count + 1)
-      else
-        # Fallback to default or raise
-        raise
-      end
-    end
+  private
+
+  def safe_predict(signature_class, **inputs)
+    predictor = DSPy::Predict.new(signature_class)
+    yield predictor if block_given?
+    predictor.call(**inputs)
+  rescue Faraday::Error, Net::HTTPError => e
+    Rails.logger.error "[#{self.class.name}] API error: #{e.message}"
+    nil
+  rescue JSON::ParserError => e
+    Rails.logger.error "[#{self.class.name}] Invalid LLM output: #{e.message}"
+    nil
+  end
+end
+```
+
+## Observability
+
+### Tracing with DSPy::Context
+
+Wrap operations in spans for Langfuse/OpenTelemetry visibility:
+
+```ruby
+result = DSPy::Context.with_span(
+  operation: "tool_selector.select",
+  "dspy.module" => "ToolSelector",
+  "tool_selector.tools" => tool_names.join(",")
+) do
+  @predictor.call(query: query, context: context, available_tools: schemas)
+end
+```
+
+### Setup for Langfuse
+
+```ruby
+# Gemfile
+gem 'dspy-o11y'
+gem 'dspy-o11y-langfuse'
+
+# .env
+# LANGFUSE_PUBLIC_KEY=pk-...
+# LANGFUSE_SECRET_KEY=sk-...
+# DSPY_TELEMETRY_BATCH_SIZE=5
+```
+
+Every `DSPy::Predict`, `DSPy::ReAct`, and tool call is automatically traced when observability is configured.
+
+### Score Reporting
+
+Report evaluation scores to Langfuse:
+
+```ruby
+DSPy.score(name: "relevance", value: 0.85, trace_id: current_trace_id)
+```
+
+## Testing
+
+### VCR Setup for Rails
+
+```ruby
+VCR.configure do |config|
+  config.cassette_library_dir = "spec/vcr_cassettes"
+  config.hook_into :webmock
+  config.configure_rspec_metadata!
+  config.filter_sensitive_data('<GEMINI_API_KEY>') { ENV['GEMINI_API_KEY'] }
+  config.filter_sensitive_data('<OPENAI_API_KEY>') { ENV['OPENAI_API_KEY'] }
+end
+```
+
+### Signature Schema Tests
+
+Test that signatures produce valid schemas without calling any LLM:
+
+```ruby
+RSpec.describe ClassifyResearchQuery do
+  it "has required input fields" do
+    schema = described_class.input_json_schema
+    expect(schema[:required]).to include("query")
+  end
+
+  it "has typed output fields" do
+    schema = described_class.output_json_schema
+    expect(schema[:properties]).to have_key(:search_strategy)
+  end
+end
+```
+
+### Tool Tests with Mocked Predictions
+
+```ruby
+RSpec.describe RerankTool do
+  let(:tool) { described_class.new }
+
+  it "skips LLM for small result sets" do
+    expect(DSPy::Predict).not_to receive(:new)
+    result = tool.call(query: "test", items: [{ id: "1" }])
+    expect(result[:reranked]).to be false
+  end
+
+  it "calls LLM for large result sets", :vcr do
+    items = 10.times.map { |i| { id: i.to_s, title: "Item #{i}" } }
+    result = tool.call(query: "relevant items", items: items)
+    expect(result[:reranked]).to be true
   end
 end
 ```
 
 ## Resources
 
-This skill includes comprehensive reference materials and templates:
+- [core-concepts.md](./references/core-concepts.md) — Signatures, modules, predictors, type system deep-dive
+- [toolsets.md](./references/toolsets.md) — Tools::Base, Tools::Toolset DSL, type safety, testing
+- [providers.md](./references/providers.md) — Provider adapters, RubyLLM, fiber-local LM context, compatibility matrix
+- [optimization.md](./references/optimization.md) — MIPROv2, GEPA, evaluation framework, storage system
+- [observability.md](./references/observability.md) — Event system, dspy-o11y gems, Langfuse, score reporting
+- [signature-template.rb](./assets/signature-template.rb) — Signature scaffold with T::Enum, Date/Time, defaults, union types
+- [module-template.rb](./assets/module-template.rb) — Module scaffold with .call(), lifecycle callbacks, fiber-local LM
+- [config-template.rb](./assets/config-template.rb) — Rails initializer with RubyLLM, observability, feature flags
 
-### References (load as needed for detailed information)
+## Key URLs
 
-- [core-concepts.md](./references/core-concepts.md): Complete guide to signatures, modules, predictors, multimodal support, and best practices
-- [providers.md](./references/providers.md): All LLM provider configurations, compatibility matrix, cost optimization, and troubleshooting
-- [optimization.md](./references/optimization.md): Testing patterns, optimization techniques, observability setup, and monitoring
+- Homepage: https://oss.vicente.services/dspy.rb/
+- GitHub: https://github.com/vicentereig/dspy.rb
+- Documentation: https://oss.vicente.services/dspy.rb/getting-started/
 
-### Assets (templates for quick starts)
+## Guidelines for Claude
 
-- [signature-template.rb](./assets/signature-template.rb): Examples of signatures including basic, vision, sentiment analysis, and code generation
-- [module-template.rb](./assets/module-template.rb): Module patterns including pipelines, agents, error handling, caching, and state management
-- [config-template.rb](./assets/config-template.rb): Configuration examples for all providers, environments, observability, and production patterns
+When helping users with DSPy.rb:
 
-## When to Use This Skill
+1. **Schema over prose** — Define output structure with `T::Struct` and `T::Enum` types, not string descriptions
+2. **Entities in `app/entities/`** — Extract shared types so signatures stay thin
+3. **Per-tool model selection** — Use `predictor.configure { |c| c.lm = ... }` to pick the right model per task
+4. **Short-circuit LLM calls** — Skip the LLM for trivial cases (small data, cached results)
+5. **Cap input sizes** — Prevent token overflow by limiting array sizes before sending to LLM
+6. **Test schemas without LLM** — Validate `input_json_schema` and `output_json_schema` in unit tests
+7. **VCR for integration tests** — Record real HTTP interactions, never mock LLM responses by hand
+8. **Trace with spans** — Wrap tool calls in `DSPy::Context.with_span` for observability
+9. **Graceful degradation** — Always rescue LLM errors and return fallback data
 
-Trigger this skill when:
-- Implementing LLM-powered features in Ruby applications
-- Creating type-safe interfaces for AI operations
-- Building agent systems with tool usage
-- Setting up or troubleshooting LLM providers
-- Optimizing prompts and improving accuracy
-- Testing LLM functionality
-- Adding observability to AI applications
-- Converting from manual prompt engineering to programmatic approach
-- Debugging DSPy.rb code or configuration issues
+### Signature Best Practices
+
+**Keep the description concise** — The signature `description` should state the goal, not the field details:
+
+```ruby
+# Good — concise goal
+class ParseOutline < DSPy::Signature
+  description 'Extract block-level structure from HTML as a flat list of skeleton sections.'
+
+  input do
+    const :html, String, description: 'Raw HTML to parse'
+  end
+
+  output do
+    const :sections, T::Array[Section], description: 'Block elements: headings, paragraphs, code blocks, lists'
+  end
+end
+```
+
+**Use defaults over nilable arrays** — For OpenAI structured outputs compatibility:
+
+```ruby
+# Good — works with OpenAI structured outputs
+class ASTNode < T::Struct
+  const :children, T::Array[ASTNode], default: []
+end
+```
+
+### Recursive Types with `$defs`
+
+DSPy.rb supports recursive types in structured outputs using JSON Schema `$defs`:
+
+```ruby
+class TreeNode < T::Struct
+  const :value, String
+  const :children, T::Array[TreeNode], default: []  # Self-reference
+end
+```
+
+The schema generator automatically creates `#/$defs/TreeNode` references for recursive types, compatible with OpenAI and Gemini structured outputs.
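For a `TreeNode` struct like the one above, the generated schema has roughly this shape. This is a hand-written sketch of the `$defs`/`$ref` pattern; DSPy's actual schema generator may emit additional keys or different ordering:

```ruby
require 'json'

# Hand-written sketch of a recursive JSON Schema: the TreeNode definition
# lives under $defs, and the children array points back into it via $ref,
# so nesting depth is unbounded.
tree_node_schema = {
  "$defs" => {
    "TreeNode" => {
      "type" => "object",
      "properties" => {
        "value" => { "type" => "string" },
        "children" => {
          "type" => "array",
          "items" => { "$ref" => "#/$defs/TreeNode" }
        }
      },
      "required" => ["value"]
    }
  },
  "$ref" => "#/$defs/TreeNode"
}

puts JSON.pretty_generate(tree_node_schema)
```

Because the self-reference lives behind `$ref` rather than being inlined, the schema stays finite even though the type is recursive, which is what makes it acceptable to OpenAI and Gemini structured-output endpoints.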
+
+### Field Descriptions for T::Struct
+
+DSPy.rb extends T::Struct to support field-level `description:` kwargs that flow to JSON Schema:
+
+```ruby
+class ASTNode < T::Struct
+  const :node_type, NodeType, description: 'The type of node (heading, paragraph, etc.)'
+  const :text, String, default: "", description: 'Text content of the node'
+  const :level, Integer, default: 0  # No description — field is self-explanatory
+  const :children, T::Array[ASTNode], default: []
+end
+```
+
+**When to use field descriptions**: complex field semantics, enum-like strings, constrained values, nested structs with ambiguous names. **When to skip**: self-explanatory fields like `name`, `id`, `url`, or boolean flags.
+
+## Version
+
+Current: 0.34.3
diff --git a/plugins/compound-engineering/skills/dspy-ruby/assets/config-template.rb b/plugins/compound-engineering/skills/dspy-ruby/assets/config-template.rb
index 16a01d2..6c19633 100644
--- a/plugins/compound-engineering/skills/dspy-ruby/assets/config-template.rb
+++ b/plugins/compound-engineering/skills/dspy-ruby/assets/config-template.rb
@@ -1,359 +1,187 @@
 # frozen_string_literal: true
 
-# DSPy.rb Configuration Examples
-# This file demonstrates various configuration patterns for different use cases
-
-require 'dspy'
-
-# ============================================================================
-# Basic Configuration
-# ============================================================================
-
-# Simple OpenAI configuration
-DSPy.configure do |c|
-  c.lm = DSPy::LM.new('openai/gpt-4o-mini',
-                      api_key: ENV['OPENAI_API_KEY'])
-end
-
-# ============================================================================
-# Multi-Provider Configuration
-# ============================================================================
-
-# Anthropic Claude
-DSPy.configure do |c|
-  c.lm = DSPy::LM.new('anthropic/claude-3-5-sonnet-20241022',
-                      api_key: ENV['ANTHROPIC_API_KEY'])
-end
-
-# Google Gemini
-DSPy.configure do |c|
-  c.lm = DSPy::LM.new('gemini/gemini-1.5-pro',
-                      api_key: ENV['GOOGLE_API_KEY'])
-end
-
-# Local Ollama
-DSPy.configure do |c|
-  c.lm = DSPy::LM.new('ollama/llama3.1',
-                      base_url: 'http://localhost:11434')
-end
-
-# OpenRouter (access to 200+ models)
-DSPy.configure do |c|
-  c.lm = DSPy::LM.new('openrouter/anthropic/claude-3.5-sonnet',
-                      api_key: ENV['OPENROUTER_API_KEY'],
-                      base_url: 'https://openrouter.ai/api/v1')
-end
-
-# ============================================================================
-# Environment-Based Configuration
-# ============================================================================
-
-# Different models for different environments
-if Rails.env.development?
-  # Use local Ollama for development (free, private)
-  DSPy.configure do |c|
-    c.lm = DSPy::LM.new('ollama/llama3.1')
-  end
-elsif Rails.env.test?
-  # Use cheap model for testing
-  DSPy.configure do |c|
-    c.lm = DSPy::LM.new('openai/gpt-4o-mini',
-                        api_key: ENV['OPENAI_API_KEY'])
-  end
-else
-  # Use powerful model for production
-  DSPy.configure do |c|
-    c.lm = DSPy::LM.new('anthropic/claude-3-5-sonnet-20241022',
-                        api_key: ENV['ANTHROPIC_API_KEY'])
-  end
-end
-
-# ============================================================================
-# Configuration with Custom Parameters
-# ============================================================================
-
-DSPy.configure do |c|
-  c.lm = DSPy::LM.new('openai/gpt-4o',
-                      api_key: ENV['OPENAI_API_KEY'],
-                      temperature: 0.7,        # Creativity (0.0-2.0, default: 1.0)
-                      max_tokens: 2000,        # Maximum response length
-                      top_p: 0.9,              # Nucleus sampling
-                      frequency_penalty: 0.0,  # Reduce repetition (-2.0 to 2.0)
-                      presence_penalty: 0.0    # Encourage new topics (-2.0 to 2.0)
-  )
-end
-
-# ============================================================================
-# Multiple Model Configuration (Task-Specific)
-# ============================================================================
-
-# Create different language models for different tasks
-module MyApp
-  # Fast model for simple tasks
-  FAST_LM = DSPy::LM.new('openai/gpt-4o-mini',
-                         api_key: ENV['OPENAI_API_KEY'],
-                         temperature: 0.3  # More deterministic
-  )
-
-  # Powerful model for complex tasks
-  POWERFUL_LM = DSPy::LM.new('anthropic/claude-3-5-sonnet-20241022',
-                             api_key: ENV['ANTHROPIC_API_KEY'],
-                             temperature: 0.7
-  )
-
-  # Creative model for content generation
-  CREATIVE_LM = DSPy::LM.new('openai/gpt-4o',
-                             api_key: ENV['OPENAI_API_KEY'],
-                             temperature: 1.2,  # More creative
-                             top_p: 0.95
-  )
-
-  # Vision-capable model
-  VISION_LM = DSPy::LM.new('openai/gpt-4o',
-                           api_key: ENV['OPENAI_API_KEY'])
-end
-
-# Use in modules
-class SimpleClassifier < DSPy::Module
-  def initialize
-    super
-    DSPy.configure { |c| c.lm = MyApp::FAST_LM }
-    @predictor = DSPy::Predict.new(SimpleSignature)
-  end
-end
-
-class ComplexAnalyzer < DSPy::Module
-  def initialize
-    super
-    DSPy.configure { |c| c.lm = MyApp::POWERFUL_LM }
-    @predictor = DSPy::ChainOfThought.new(ComplexSignature)
-  end
-end
-
-# ============================================================================
-# Configuration with Observability (OpenTelemetry)
-# ============================================================================
-
-require 'opentelemetry/sdk'
-
-# Configure OpenTelemetry
-OpenTelemetry::SDK.configure do |c|
-  c.service_name = 'my-dspy-app'
-  c.use_all
-end
-
-# Configure DSPy (automatically integrates with OpenTelemetry)
-DSPy.configure do |c|
-  c.lm = DSPy::LM.new('openai/gpt-4o-mini',
-                      api_key: ENV['OPENAI_API_KEY'])
-end
-
-# ============================================================================
-# Configuration with Langfuse Tracing
-# ============================================================================
-
-require 'dspy/langfuse'
-
-DSPy.configure do |c|
-  c.lm = DSPy::LM.new('openai/gpt-4o-mini',
-                      api_key: ENV['OPENAI_API_KEY'])
-
-  # Enable Langfuse tracing
-  c.langfuse = {
-    public_key: ENV['LANGFUSE_PUBLIC_KEY'],
-    secret_key: ENV['LANGFUSE_SECRET_KEY'],
-    host: ENV['LANGFUSE_HOST'] || 'https://cloud.langfuse.com'
-  }
-end
-
-# ============================================================================
-# Configuration with Retry Logic
-# ============================================================================
-
-class RetryableConfig
-  MAX_RETRIES = 3
-
-  def self.configure
-    DSPy.configure do |c|
-      c.lm = create_lm_with_retry
-    end
-  end
-
-  def self.create_lm_with_retry
-    lm = DSPy::LM.new('openai/gpt-4o-mini',
-                      api_key: ENV['OPENAI_API_KEY'])
-
-    # Wrap with retry logic
-    lm.extend(RetryBehavior)
-    lm
-  end
-
-  module RetryBehavior
-    def forward(input, retry_count: 0)
-      super(input)
-    rescue RateLimitError, TimeoutError => e
-      if retry_count < MAX_RETRIES
-        sleep(2 ** retry_count)  # Exponential backoff
-        forward(input, retry_count: retry_count + 1)
-      else
-        raise
-      end
-    end
-  end
-end
-
-RetryableConfig.configure
-
-# ============================================================================
-# Configuration with Fallback Models
-# ============================================================================
-
-class FallbackConfig
-  def self.configure
-    DSPy.configure do |c|
-      c.lm = create_lm_with_fallback
-    end
-  end
-
-  def self.create_lm_with_fallback
-    primary = DSPy::LM.new('anthropic/claude-3-5-sonnet-20241022',
-                           api_key: ENV['ANTHROPIC_API_KEY'])
-
-    fallback = DSPy::LM.new('openai/gpt-4o',
-                            api_key: ENV['OPENAI_API_KEY'])
-
-    FallbackLM.new(primary, fallback)
-  end
-
-  class FallbackLM
-    def initialize(primary, fallback)
-      @primary = primary
-      @fallback = fallback
-    end
-
-    def forward(input)
-      @primary.forward(input)
-    rescue => e
-      puts "Primary model failed: #{e.message}. Falling back..."
-      @fallback.forward(input)
-    end
-  end
-end
-
-FallbackConfig.configure
-
-# ============================================================================
-# Configuration with Budget Tracking
-# ============================================================================
-
-class BudgetTrackedConfig
-  def self.configure(monthly_budget_usd:)
-    DSPy.configure do |c|
-      c.lm = BudgetTracker.new(
-        DSPy::LM.new('openai/gpt-4o',
-                     api_key: ENV['OPENAI_API_KEY']),
-        monthly_budget_usd: monthly_budget_usd
-      )
-    end
-  end
-
-  class BudgetTracker
-    def initialize(lm, monthly_budget_usd:)
-      @lm = lm
-      @monthly_budget_usd = monthly_budget_usd
-      @monthly_cost = 0.0
-    end
-
-    def forward(input)
-      result = @lm.forward(input)
-
-      # Track cost (simplified - actual costs vary by model)
-      tokens = result.metadata[:usage][:total_tokens]
-      cost = estimate_cost(tokens)
-      @monthly_cost += cost
-
-      if @monthly_cost > @monthly_budget_usd
-        raise "Monthly budget of $#{@monthly_budget_usd} exceeded!"
-      end
-
-      result
-    end
-
-    private
-
-    def estimate_cost(tokens)
-      # Simplified cost estimation (check provider pricing)
-      (tokens / 1_000_000.0) * 5.0  # $5 per 1M tokens
-    end
-  end
-end
-
-BudgetTrackedConfig.configure(monthly_budget_usd: 100)
-
-# ============================================================================
-# Configuration Initializer for Rails
-# ============================================================================
-
-# Save this as config/initializers/dspy.rb
+# =============================================================================
+# DSPy.rb Configuration Template — v0.34.3 API
 #
-# require 'dspy'
+# Rails initializer patterns for DSPy.rb with RubyLLM, observability,
+# and feature-flagged model selection.
 #
-# DSPy.configure do |c|
-#   # Environment-specific configuration
-#   model_config = case Rails.env.to_sym
-#                  when :development
-#                    { provider: 'ollama', model: 'llama3.1' }
-#                  when :test
-#                    { provider: 'openai', model: 'gpt-4o-mini', temperature: 0.0 }
-#                  when :production
-#                    { provider: 'anthropic', model: 'claude-3-5-sonnet-20241022' }
-#                  end
-#
-#   # Configure language model
-#   c.lm = DSPy::LM.new(
-#     "#{model_config[:provider]}/#{model_config[:model]}",
-#     api_key: ENV["#{model_config[:provider].upcase}_API_KEY"],
-#     **model_config.except(:provider, :model)
-#   )
-#
-#   # Optional: Add observability
-#   if Rails.env.production?
-#     c.langfuse = {
-#       public_key: ENV['LANGFUSE_PUBLIC_KEY'],
-#       secret_key: ENV['LANGFUSE_SECRET_KEY']
-#     }
-#   end
-# end
+# Key patterns:
+# - Use after_initialize for Rails setup
+# - Use dspy-ruby_llm for multi-provider routing
+# - Use structured_outputs: true for reliable parsing
+# - Use dspy-o11y + dspy-o11y-langfuse for observability
+# - Use ENV-based feature flags for model selection
+# =============================================================================
 
-# ============================================================================
-# Testing Configuration
-# ============================================================================
-
-# In spec/spec_helper.rb or test/test_helper.rb
+# =============================================================================
+# Gemfile Dependencies
+# =============================================================================
 #
-# RSpec.configure do |config|
-#   config.before(:suite) do
-#     DSPy.configure do |c|
-#       c.lm = DSPy::LM.new('openai/gpt-4o-mini',
-#                           api_key: ENV['OPENAI_API_KEY'],
-#                           temperature: 0.0  # Deterministic for testing
-#       )
+# # Core
+# gem 'dspy'
+#
+# # Provider adapter (choose one strategy):
+#
+# # Strategy A: Unified adapter via RubyLLM (recommended)
+# gem 'dspy-ruby_llm'
+# gem 'ruby_llm'
+#
+# # Strategy B: Per-provider adapters (direct SDK access)
+# gem 'dspy-openai'     # OpenAI, OpenRouter, Ollama
+# gem 'dspy-anthropic'  # Claude
+# gem 'dspy-gemini'     # Gemini
+#
+# # Observability (optional)
+# gem 'dspy-o11y'
+# gem 'dspy-o11y-langfuse'
+#
+# # Optimization (optional)
+# gem 'dspy-miprov2'  # MIPROv2 optimizer
+# gem 'dspy-gepa'     # GEPA optimizer
+#
+# # Schema formats (optional)
+# gem 'sorbet-baml'  # BAML schema format (84% token reduction)

+# =============================================================================
+# Rails Initializer — config/initializers/dspy.rb
+# =============================================================================
+
+Rails.application.config.after_initialize do
+  # Skip in test unless explicitly enabled
+  next if Rails.env.test? && ENV["DSPY_ENABLE_IN_TEST"].blank?
+
+  # Configure RubyLLM provider credentials
+  RubyLLM.configure do |config|
+    config.gemini_api_key = ENV["GEMINI_API_KEY"] if ENV["GEMINI_API_KEY"].present?
+    config.anthropic_api_key = ENV["ANTHROPIC_API_KEY"] if ENV["ANTHROPIC_API_KEY"].present?
+    config.openai_api_key = ENV["OPENAI_API_KEY"] if ENV["OPENAI_API_KEY"].present?
+  end
+
+  # Configure DSPy with unified RubyLLM adapter
+  model = ENV.fetch("DSPY_MODEL", "ruby_llm/gemini-2.5-flash")
+  DSPy.configure do |config|
+    config.lm = DSPy::LM.new(model, structured_outputs: true)
+    config.logger = Rails.logger
+  end
+
+  # Enable Langfuse observability (optional)
+  if ENV["LANGFUSE_PUBLIC_KEY"].present? && ENV["LANGFUSE_SECRET_KEY"].present?
+    DSPy::Observability.configure!
+  end
+end
+
+# =============================================================================
+# Feature Flags — config/initializers/feature_flags.rb
+# =============================================================================
+
+# Use different models for different roles:
+# - Fast/cheap for classification, routing, simple tasks
+# - Powerful for synthesis, reasoning, complex analysis
+
+module FeatureFlags
+  SELECTOR_MODEL = ENV.fetch("DSPY_SELECTOR_MODEL", "ruby_llm/gemini-2.5-flash-lite")
+  SYNTHESIZER_MODEL = ENV.fetch("DSPY_SYNTHESIZER_MODEL", "ruby_llm/gemini-2.5-flash")
+  REASONING_MODEL = ENV.fetch("DSPY_REASONING_MODEL", "ruby_llm/claude-sonnet-4-20250514")
+end
+
+# Usage in tools/modules:
+#
+# class ClassifyTool < DSPy::Tools::Base
+#   def call(query:)
+#     predictor = DSPy::Predict.new(ClassifySignature)
+#     predictor.configure { |c| c.lm = DSPy::LM.new(FeatureFlags::SELECTOR_MODEL, structured_outputs: true) }
+#     predictor.call(query: query)
 #   end
 # end
+
+# =============================================================================
+# Environment Variables — .env
+# =============================================================================
+#
+# # Provider API keys (set the ones you need)
+# GEMINI_API_KEY=...
+# ANTHROPIC_API_KEY=...
+# OPENAI_API_KEY=...
+#
+# # DSPy model configuration
+# DSPY_MODEL=ruby_llm/gemini-2.5-flash
+# DSPY_SELECTOR_MODEL=ruby_llm/gemini-2.5-flash-lite
+# DSPY_SYNTHESIZER_MODEL=ruby_llm/gemini-2.5-flash
+# DSPY_REASONING_MODEL=ruby_llm/claude-sonnet-4-20250514
+#
+# # Langfuse observability (optional)
+# LANGFUSE_PUBLIC_KEY=pk-...
+# LANGFUSE_SECRET_KEY=sk-...
+# DSPY_TELEMETRY_BATCH_SIZE=5
+#
+# # Test environment
+# DSPY_ENABLE_IN_TEST=1  # Set to enable DSPy in test env
+
+# =============================================================================
+# Per-Provider Configuration (without RubyLLM)
+# =============================================================================
+
+# OpenAI (dspy-openai gem)
+# DSPy.configure do |c|
+#   c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
 # end
 
-# ============================================================================
-# Configuration Best Practices
-# ============================================================================
+# Anthropic (dspy-anthropic gem)
+# DSPy.configure do |c|
+#   c.lm = DSPy::LM.new('anthropic/claude-sonnet-4-20250514', api_key: ENV['ANTHROPIC_API_KEY'])
+# end
 
-# 1. Use environment variables for API keys (never hardcode)
-# 2. Use different models for different environments
-# 3. Use cheaper/faster models for development and testing
-# 4. Configure temperature based on use case:
-#    - 0.0-0.3: Deterministic, factual tasks
-#    - 0.7-1.0: Balanced creativity
-#    - 1.0-2.0: High creativity, content generation
-# 5. Add observability in production (OpenTelemetry, Langfuse)
-# 6. Implement retry logic and fallbacks for reliability
-# 7. Track costs and set budgets for production
-# 8. Use max_tokens to control response length and costs
+# Gemini (dspy-gemini gem)
+# DSPy.configure do |c|
+#   c.lm = DSPy::LM.new('gemini/gemini-2.5-flash', api_key: ENV['GEMINI_API_KEY'])
+# end
+
+# Ollama (dspy-openai gem, local models)
+# DSPy.configure do |c|
+#   c.lm = DSPy::LM.new('ollama/llama3.2', base_url: 'http://localhost:11434')
+# end
+
+# OpenRouter (dspy-openai gem, 200+ models)
+# DSPy.configure do |c|
+#   c.lm = DSPy::LM.new('openrouter/anthropic/claude-3.5-sonnet',
+#                       api_key: ENV['OPENROUTER_API_KEY'],
+#                       base_url: 'https://openrouter.ai/api/v1')
+# end
+
+# =============================================================================
+# VCR Test Configuration — spec/support/dspy.rb
+# =============================================================================
+
+# VCR.configure do |config|
+#   config.cassette_library_dir = "spec/vcr_cassettes"
+#   config.hook_into :webmock
+#   config.configure_rspec_metadata!
+#   config.filter_sensitive_data('<GEMINI_API_KEY>') { ENV['GEMINI_API_KEY'] }
+#   config.filter_sensitive_data('<OPENAI_API_KEY>') { ENV['OPENAI_API_KEY'] }
+#   config.filter_sensitive_data('<ANTHROPIC_API_KEY>') { ENV['ANTHROPIC_API_KEY'] }
+# end
+
+# =============================================================================
+# Schema Format Configuration (optional)
+# =============================================================================
+
+# BAML schema format — 84% token reduction for Enhanced Prompting mode
+# DSPy.configure do |c|
+#   c.lm = DSPy::LM.new('openai/gpt-4o-mini',
+#                       api_key: ENV['OPENAI_API_KEY'],
+#                       schema_format: :baml  # Requires sorbet-baml gem
+#   )
+# end
+
+# TOON schema + data format — table-oriented format
+# DSPy.configure do |c|
+#   c.lm = DSPy::LM.new('openai/gpt-4o-mini',
+#                       api_key: ENV['OPENAI_API_KEY'],
+#                       schema_format: :toon,  # How DSPy describes the signature
+#                       data_format: :toon     # How inputs/outputs are rendered in prompts
+#   )
+# end
+#
+# Note: BAML and TOON apply only when structured_outputs: false.
+# With structured_outputs: true, the provider receives JSON Schema directly. diff --git a/plugins/compound-engineering/skills/dspy-ruby/assets/module-template.rb b/plugins/compound-engineering/skills/dspy-ruby/assets/module-template.rb index cc76edb..c7f1122 100644 --- a/plugins/compound-engineering/skills/dspy-ruby/assets/module-template.rb +++ b/plugins/compound-engineering/skills/dspy-ruby/assets/module-template.rb @@ -1,326 +1,300 @@ # frozen_string_literal: true -# Example DSPy Module Template -# This template demonstrates best practices for creating composable modules +# ============================================================================= +# DSPy.rb Module Template — v0.34.3 API +# +# Modules orchestrate predictors, tools, and business logic. +# +# Key patterns: +# - Use .call() to invoke (not .forward()) +# - Access results with result.field (not result[:field]) +# - Use DSPy::Tools::Base for tools (not DSPy::Tool) +# - Use lifecycle callbacks (before/around/after) for cross-cutting concerns +# - Use DSPy.with_lm for temporary model overrides +# - Use configure_predictor for fine-grained agent control +# ============================================================================= -# Basic module with single predictor -class BasicModule < DSPy::Module +# --- Basic Module --- + +class BasicClassifier < DSPy::Module def initialize super - # Initialize predictor with signature - @predictor = DSPy::Predict.new(ExampleSignature) + @predictor = DSPy::Predict.new(ClassificationSignature) end - def forward(input_hash) - # Forward pass through the predictor - @predictor.forward(input_hash) + def forward(text:) + @predictor.call(text: text) end end -# Module with Chain of Thought reasoning -class ChainOfThoughtModule < DSPy::Module +# Usage: +# classifier = BasicClassifier.new +# result = classifier.call(text: "This is a test") +# result.category # => "technical" +# result.confidence # => 0.95 + +# --- Module with Chain of Thought --- + +class 
ReasoningClassifier < DSPy::Module def initialize super - # ChainOfThought automatically adds reasoning to output - @predictor = DSPy::ChainOfThought.new(EmailClassificationSignature) + @predictor = DSPy::ChainOfThought.new(ClassificationSignature) end - def forward(email_subject:, email_body:) - result = @predictor.forward( - email_subject: email_subject, - email_body: email_body - ) + def forward(text:) + result = @predictor.call(text: text) + # ChainOfThought adds result.reasoning automatically + result + end +end - # Result includes :reasoning field automatically - { - category: result[:category], - priority: result[:priority], - reasoning: result[:reasoning], - confidence: calculate_confidence(result) - } +# --- Module with Lifecycle Callbacks --- + +class InstrumentedModule < DSPy::Module + before :setup_metrics + around :manage_context + after :log_completion + + def initialize + super + @predictor = DSPy::Predict.new(AnalysisSignature) + @start_time = nil + end + + def forward(query:) + @predictor.call(query: query) end private - def calculate_confidence(result) - # Add custom logic to calculate confidence - # For example, based on reasoning length or specificity - result[:confidence] || 0.8 + # Runs before forward + def setup_metrics + @start_time = Time.now + Rails.logger.info "Starting prediction" + end + + # Wraps forward — must call yield + def manage_context + load_user_context + result = yield + save_updated_context(result) + result + end + + # Runs after forward completes + def log_completion + duration = Time.now - @start_time + Rails.logger.info "Prediction completed in #{duration}s" + end + + def load_user_context = nil + def save_updated_context(_result) = nil +end + +# Execution order: before → around (before yield) → forward → around (after yield) → after +# Callbacks are inherited from parent classes and execute in registration order. 
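
The before/around/after ordering described above can be sketched in plain Ruby, without the dspy gem. This `MiniModule` base class is a hypothetical illustration of the sequencing contract, not DSPy.rb's actual implementation:

```ruby
# Plain-Ruby sketch of the callback ordering described above.
# NOT DSPy.rb internals — just an illustration of:
# before -> around (pre-yield) -> forward -> around (post-yield) -> after
class MiniModule
  def self.callbacks
    @callbacks ||= { before: [], around: [], after: [] }
  end

  def self.before(name) = callbacks[:before] << name
  def self.around(name) = callbacks[:around] << name
  def self.after(name)  = callbacks[:after] << name

  def call(**kwargs)
    self.class.callbacks[:before].each { |cb| send(cb) }
    # Nest around callbacks so each one wraps the forward call
    chain = self.class.callbacks[:around].reverse.reduce(-> { forward(**kwargs) }) do |inner, cb|
      -> { send(cb) { inner.call } }
    end
    result = chain.call
    self.class.callbacks[:after].each { |cb| send(cb) }
    result
  end
end

class Traced < MiniModule
  before :log_start
  around :wrap
  after :log_done

  attr_reader :events

  def initialize
    @events = []
  end

  def forward(x:)
    @events << :forward
    x * 2
  end

  private

  def log_start = @events << :before

  def wrap
    @events << :around_pre
    result = yield
    @events << :around_post
    result
  end

  def log_done = @events << :after
end

t = Traced.new
t.call(x: 21)  # => 42
t.events       # => [:before, :around_pre, :forward, :around_post, :after]
```

The key design point is that around callbacks nest: each wraps the next via `yield`, so state set before `yield` is still in scope after the forward call returns.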
+ +# --- Module with Tools --- + +class SearchTool < DSPy::Tools::Base + tool_name "search" + tool_description "Search for information by query" + + sig { params(query: String, max_results: Integer).returns(T::Array[T::Hash[Symbol, String]]) } + def call(query:, max_results: 5) + # Implementation here + [{ title: "Result 1", url: "https://example.com" }] end end -# Composable module that chains multiple steps -class MultiStepPipeline < DSPy::Module - def initialize - super - # Initialize multiple predictors for different steps - @step1 = DSPy::Predict.new(Step1Signature) - @step2 = DSPy::ChainOfThought.new(Step2Signature) - @step3 = DSPy::Predict.new(Step3Signature) - end +class FinishTool < DSPy::Tools::Base + tool_name "finish" + tool_description "Submit the final answer" - def forward(input) - # Chain predictors together - result1 = @step1.forward(input) - result2 = @step2.forward(result1) - result3 = @step3.forward(result2) - - # Combine results as needed - { - step1_output: result1, - step2_output: result2, - final_result: result3 - } + sig { params(answer: String).returns(String) } + def call(answer:) + answer end end -# Module with conditional logic -class ConditionalModule < DSPy::Module +class ResearchAgent < DSPy::Module def initialize super - @simple_classifier = DSPy::Predict.new(SimpleClassificationSignature) - @complex_analyzer = DSPy::ChainOfThought.new(ComplexAnalysisSignature) + tools = [SearchTool.new, FinishTool.new] + @agent = DSPy::ReAct.new( + ResearchSignature, + tools: tools, + max_iterations: 5 + ) end - def forward(text:, complexity_threshold: 100) - # Use different predictors based on input characteristics - if text.length < complexity_threshold - @simple_classifier.forward(text: text) - else - @complex_analyzer.forward(text: text) - end + def forward(question:) + @agent.call(question: question) end end -# Module with error handling and retry logic -class RobustModule < DSPy::Module - MAX_RETRIES = 3 +# --- Module with Per-Task Model 
Selection --- +class SmartRouter < DSPy::Module def initialize super - @predictor = DSPy::Predict.new(RobustSignature) - @logger = Logger.new(STDOUT) + @classifier = DSPy::Predict.new(RouteSignature) + @analyzer = DSPy::ChainOfThought.new(AnalysisSignature) end - def forward(input, retry_count: 0) - @logger.info "Processing input: #{input.inspect}" + def forward(text:) + # Use fast model for classification + DSPy.with_lm(fast_model) do + route = @classifier.call(text: text) - begin - result = @predictor.forward(input) - validate_result!(result) - result - rescue DSPy::ValidationError => e - @logger.error "Validation error: #{e.message}" - - if retry_count < MAX_RETRIES - @logger.info "Retrying (#{retry_count + 1}/#{MAX_RETRIES})..." - sleep(2 ** retry_count) # Exponential backoff - forward(input, retry_count: retry_count + 1) + if route.requires_deep_analysis + # Switch to powerful model for analysis + DSPy.with_lm(powerful_model) do + @analyzer.call(text: text) + end else - @logger.error "Max retries exceeded" - raise + route end end end private - def validate_result!(result) - # Add custom validation logic - raise DSPy::ValidationError, "Invalid result" unless result[:category] - raise DSPy::ValidationError, "Low confidence" if result[:confidence] && result[:confidence] < 0.5 + def fast_model + @fast_model ||= DSPy::LM.new( + ENV.fetch("DSPY_SELECTOR_MODEL", "ruby_llm/gemini-2.5-flash-lite"), + structured_outputs: true + ) + end + + def powerful_model + @powerful_model ||= DSPy::LM.new( + ENV.fetch("DSPY_SYNTHESIZER_MODEL", "ruby_llm/gemini-2.5-flash"), + structured_outputs: true + ) end end -# Module with ReAct agent and tools -class AgentModule < DSPy::Module +# --- Module with configure_predictor --- + +class ConfiguredAgent < DSPy::Module def initialize super + tools = [SearchTool.new, FinishTool.new] + @agent = DSPy::ReAct.new(ResearchSignature, tools: tools) - # Define tools for the agent - tools = [ - SearchTool.new, - CalculatorTool.new, - 
DatabaseQueryTool.new - ] + # Set default model for all internal predictors + @agent.configure { |c| c.lm = DSPy::LM.new('ruby_llm/gemini-2.5-flash', structured_outputs: true) } - # ReAct provides iterative reasoning and tool usage - @agent = DSPy::ReAct.new( - AgentSignature, - tools: tools, - max_iterations: 5 - ) - end - - def forward(task:) - # Agent will autonomously use tools to complete the task - @agent.forward(task: task) - end -end - -# Tool definition example -class SearchTool < DSPy::Tool - def call(query:) - # Implement search functionality - results = perform_search(query) - { results: results } - end - - private - - def perform_search(query) - # Actual search implementation - # Could call external API, database, etc. - ["result1", "result2", "result3"] - end -end - -# Module with state management -class StatefulModule < DSPy::Module - attr_reader :history - - def initialize - super - @predictor = DSPy::ChainOfThought.new(StatefulSignature) - @history = [] - end - - def forward(input) - # Process with context from history - context = build_context_from_history - result = @predictor.forward( - input: input, - context: context - ) - - # Store in history - @history << { - input: input, - result: result, - timestamp: Time.now - } - - result - end - - def reset! 
- @history.clear - end - - private - - def build_context_from_history - @history.last(5).map { |h| h[:result][:summary] }.join("\n") - end -end - -# Module that uses different LLMs for different tasks -class MultiModelModule < DSPy::Module - def initialize - super - - # Fast, cheap model for simple classification - @fast_predictor = create_predictor( - 'openai/gpt-4o-mini', - SimpleClassificationSignature - ) - - # Powerful model for complex analysis - @powerful_predictor = create_predictor( - 'anthropic/claude-3-5-sonnet-20241022', - ComplexAnalysisSignature - ) - end - - def forward(input, use_complex: false) - if use_complex - @powerful_predictor.forward(input) - else - @fast_predictor.forward(input) + # Override specific predictor with a more capable model + @agent.configure_predictor('thought_generator') do |c| + c.lm = DSPy::LM.new('ruby_llm/claude-sonnet-4-20250514', structured_outputs: true) end end - private - - def create_predictor(model, signature) - lm = DSPy::LM.new(model, api_key: ENV["#{model.split('/').first.upcase}_API_KEY"]) - DSPy::Predict.new(signature, lm: lm) + def forward(question:) + @agent.call(question: question) end end -# Module with caching -class CachedModule < DSPy::Module +# Available internal predictors by agent type: +# DSPy::ReAct → thought_generator, observation_processor +# DSPy::CodeAct → code_generator, observation_processor +# DSPy::DeepSearch → seed_predictor, search_predictor, reader_predictor, reason_predictor + +# --- Module with Event Subscriptions --- + +class TokenTrackingModule < DSPy::Module + subscribe 'lm.tokens', :track_tokens, scope: :descendants + def initialize super - @predictor = DSPy::Predict.new(CachedSignature) - @cache = {} + @predictor = DSPy::Predict.new(AnalysisSignature) + @total_tokens = 0 end - def forward(input) - # Create cache key from input - cache_key = create_cache_key(input) - - # Return cached result if available - if @cache.key?(cache_key) - puts "Cache hit for #{cache_key}" - return 
@cache[cache_key] - end - - # Compute and cache result - result = @predictor.forward(input) - @cache[cache_key] = result - result + def forward(query:) + @predictor.call(query: query) end - def clear_cache! - @cache.clear + def track_tokens(_event, attrs) + @total_tokens += attrs.fetch(:total_tokens, 0) end - private - - def create_cache_key(input) - # Create deterministic hash from input - Digest::MD5.hexdigest(input.to_s) + def token_usage + @total_tokens end end -# Usage Examples: -# -# Basic usage: -# module = BasicModule.new -# result = module.forward(field_name: "value") -# -# Chain of Thought: -# module = ChainOfThoughtModule.new -# result = module.forward( -# email_subject: "Can't log in", -# email_body: "I'm unable to access my account" -# ) -# puts result[:reasoning] -# -# Multi-step pipeline: -# pipeline = MultiStepPipeline.new -# result = pipeline.forward(input_data) -# -# With error handling: -# module = RobustModule.new -# begin -# result = module.forward(input_data) -# rescue DSPy::ValidationError => e -# puts "Failed after retries: #{e.message}" -# end -# -# Agent with tools: -# agent = AgentModule.new -# result = agent.forward(task: "Find the population of Tokyo") -# -# Stateful processing: -# module = StatefulModule.new -# result1 = module.forward("First input") -# result2 = module.forward("Second input") # Has context from first -# module.reset! # Clear history -# -# With caching: -# module = CachedModule.new -# result1 = module.forward(input) # Computes result -# result2 = module.forward(input) # Returns cached result +# Module-scoped subscriptions automatically scope to the module instance and descendants. +# Use scope: :self_only to restrict delivery to the module itself (ignoring children). 
+ +# --- Tool That Wraps a Prediction --- + +class RerankTool < DSPy::Tools::Base + tool_name "rerank" + tool_description "Score and rank search results by relevance" + + MAX_ITEMS = 200 + MIN_ITEMS_FOR_LLM = 5 + + sig { params(query: String, items: T::Array[T::Hash[Symbol, T.untyped]]).returns(T::Hash[Symbol, T.untyped]) } + def call(query:, items: []) + # Short-circuit: skip LLM for small sets + return { scored_items: items, reranked: false } if items.size < MIN_ITEMS_FOR_LLM + + # Cap to prevent token overflow + capped_items = items.first(MAX_ITEMS) + + predictor = DSPy::Predict.new(RerankSignature) + predictor.configure { |c| c.lm = DSPy::LM.new("ruby_llm/gemini-2.5-flash", structured_outputs: true) } + + result = predictor.call(query: query, items: capped_items) + { scored_items: result.scored_items, reranked: true } + rescue => e + Rails.logger.warn "[RerankTool] LLM rerank failed: #{e.message}" + { error: "Rerank failed: #{e.message}", scored_items: items, reranked: false } + end +end + +# Key patterns for tools wrapping predictions: +# - Short-circuit LLM calls when unnecessary (small data, trivial cases) +# - Cap input size to prevent token overflow +# - Per-tool model selection via configure +# - Graceful error handling with fallback data + +# --- Multi-Step Pipeline --- + +class AnalysisPipeline < DSPy::Module + def initialize + super + @classifier = DSPy::Predict.new(ClassifySignature) + @analyzer = DSPy::ChainOfThought.new(AnalyzeSignature) + @summarizer = DSPy::Predict.new(SummarizeSignature) + end + + def forward(text:) + classification = @classifier.call(text: text) + analysis = @analyzer.call(text: text, category: classification.category) + @summarizer.call(analysis: analysis.reasoning, category: classification.category) + end +end + +# --- Observability with Spans --- + +class TracedModule < DSPy::Module + def initialize + super + @predictor = DSPy::Predict.new(AnalysisSignature) + end + + def forward(query:) + DSPy::Context.with_span( + 
operation: "traced_module.analyze", + "dspy.module" => self.class.name, + "query.length" => query.length.to_s + ) do + @predictor.call(query: query) + end + end +end diff --git a/plugins/compound-engineering/skills/dspy-ruby/assets/signature-template.rb b/plugins/compound-engineering/skills/dspy-ruby/assets/signature-template.rb index ea13f81..bff2af6 100644 --- a/plugins/compound-engineering/skills/dspy-ruby/assets/signature-template.rb +++ b/plugins/compound-engineering/skills/dspy-ruby/assets/signature-template.rb @@ -1,143 +1,221 @@ # frozen_string_literal: true -# Example DSPy Signature Template -# This template demonstrates best practices for creating type-safe signatures +# ============================================================================= +# DSPy.rb Signature Template — v0.34.3 API +# +# Signatures define the interface between your application and LLMs. +# They specify inputs, outputs, and task descriptions using Sorbet types. +# +# Key patterns: +# - Use T::Enum classes for controlled outputs (not inline T.enum([...])) +# - Use description: kwarg on fields to guide the LLM +# - Use default values for optional fields +# - Use Date/DateTime/Time for temporal data (auto-converted) +# - Access results with result.field (not result[:field]) +# - Invoke with predictor.call() (not predictor.forward()) +# ============================================================================= -class ExampleSignature < DSPy::Signature - # Clear, specific description of what this signature does - # Good: "Classify customer support emails into Technical, Billing, or General categories" - # Avoid: "Classify emails" - description "Describe what this signature accomplishes and what output it produces" +# --- Basic Signature --- - # Input fields: Define what data the LLM receives - input do - # Basic field with description - const :field_name, String, desc: "Clear description of this input field" +class SentimentAnalysis < DSPy::Signature + description "Analyze sentiment 
of text" - # Numeric fields - const :count, Integer, desc: "Number of items to process" - const :score, Float, desc: "Confidence score between 0.0 and 1.0" - - # Boolean fields - const :is_active, T::Boolean, desc: "Whether the item is currently active" - - # Array fields - const :tags, T::Array[String], desc: "List of tags associated with the item" - - # Optional: Enum for constrained values - const :priority, T.enum(["Low", "Medium", "High"]), desc: "Priority level" + class Sentiment < T::Enum + enums do + Positive = new('positive') + Negative = new('negative') + Neutral = new('neutral') + end + end + + input do + const :text, String end - # Output fields: Define what data the LLM produces output do - # Primary output - const :result, String, desc: "The main result of the operation" - - # Classification result with enum - const :category, T.enum(["Technical", "Billing", "General"]), - desc: "Category classification - must be one of: Technical, Billing, General" - - # Confidence/metadata - const :confidence, Float, desc: "Confidence score (0.0-1.0) for this classification" - - # Optional reasoning (automatically added by ChainOfThought) - # const :reasoning, String, desc: "Step-by-step reasoning for the classification" + const :sentiment, Sentiment + const :score, Float, description: "Confidence score from 0.0 to 1.0" end end -# Example with multimodal input (vision) -class VisionExampleSignature < DSPy::Signature +# Usage: +# predictor = DSPy::Predict.new(SentimentAnalysis) +# result = predictor.call(text: "This product is amazing!") +# result.sentiment # => Sentiment::Positive +# result.score # => 0.92 + +# --- Signature with Date/Time Types --- + +class EventScheduler < DSPy::Signature + description "Schedule events based on requirements" + + input do + const :event_name, String + const :start_date, Date # ISO 8601: YYYY-MM-DD + const :end_date, T.nilable(Date) # Optional date + const :preferred_time, DateTime # ISO 8601 with timezone + const :deadline, Time # 
Stored as UTC + end + + output do + const :scheduled_date, Date # LLM returns ISO string, auto-converted + const :event_datetime, DateTime # Preserves timezone + const :created_at, Time # Converted to UTC + end +end + +# Date/Time format handling: +# Date → ISO 8601 (YYYY-MM-DD) +# DateTime → ISO 8601 with timezone (YYYY-MM-DDTHH:MM:SS+00:00) +# Time → ISO 8601, automatically converted to UTC + +# --- Signature with Default Values --- + +class SmartSearch < DSPy::Signature + description "Search with intelligent defaults" + + input do + const :query, String + const :max_results, Integer, default: 10 + const :language, String, default: "English" + const :include_metadata, T::Boolean, default: false + end + + output do + const :results, T::Array[String] + const :total_found, Integer + const :search_time_ms, Float, default: 0.0 # Fallback if LLM omits + const :cached, T::Boolean, default: false + end +end + +# Input defaults reduce boilerplate: +# search = DSPy::Predict.new(SmartSearch) +# result = search.call(query: "Ruby programming") +# # max_results=10, language="English", include_metadata=false are applied + +# --- Signature with Nested Structs and Field Descriptions --- + +class EntityExtraction < DSPy::Signature + description "Extract named entities from text" + + class EntityType < T::Enum + enums do + Person = new('person') + Organization = new('organization') + Location = new('location') + DateEntity = new('date') + end + end + + class Entity < T::Struct + const :name, String, description: "The entity text as it appears in the source" + const :type, EntityType + const :confidence, Float, description: "Extraction confidence from 0.0 to 1.0" + const :start_offset, Integer, default: 0 + end + + input do + const :text, String + const :entity_types, T::Array[EntityType], default: [], + description: "Filter to these entity types; empty means all types" + end + + output do + const :entities, T::Array[Entity] + const :total_found, Integer + end +end + +# --- 
Signature with Union Types --- + +class FlexibleClassification < DSPy::Signature + description "Classify input with flexible result type" + + class Category < T::Enum + enums do + Technical = new('technical') + Business = new('business') + Personal = new('personal') + end + end + + input do + const :text, String + end + + output do + const :category, Category + const :result, T.any(Float, String), + description: "Numeric score or text explanation depending on classification" + const :confidence, Float + end +end + +# --- Signature with Recursive Types --- + +class DocumentParser < DSPy::Signature + description "Parse document into tree structure" + + class NodeType < T::Enum + enums do + Heading = new('heading') + Paragraph = new('paragraph') + List = new('list') + CodeBlock = new('code_block') + end + end + + class TreeNode < T::Struct + const :node_type, NodeType, description: "The type of document element" + const :text, String, default: "", description: "Text content of the node" + const :level, Integer, default: 0 + const :children, T::Array[TreeNode], default: [] # Self-reference → $defs in JSON Schema + end + + input do + const :html, String, description: "Raw HTML to parse" + end + + output do + const :root, TreeNode + const :word_count, Integer + end +end + +# The schema generator creates #/$defs/TreeNode references for recursive types, +# compatible with OpenAI and Gemini structured outputs. +# Use `default: []` instead of `T.nilable(T::Array[...])` for OpenAI compatibility. 
+ +# --- Vision Signature --- + +class ImageAnalysis < DSPy::Signature description "Analyze an image and answer questions about its content" input do - const :image, DSPy::Image, desc: "The image to analyze" - const :question, String, desc: "Question about the image content" + const :image, DSPy::Image, description: "The image to analyze" + const :question, String, description: "Question about the image content" end output do - const :answer, String, desc: "Detailed answer to the question about the image" - const :confidence, Float, desc: "Confidence in the answer (0.0-1.0)" + const :answer, String + const :confidence, Float, description: "Confidence in the answer (0.0-1.0)" end end -# Example for complex analysis task -class SentimentAnalysisSignature < DSPy::Signature - description "Analyze the sentiment of text with nuanced emotion detection" - - input do - const :text, String, desc: "The text to analyze for sentiment" - const :context, String, desc: "Additional context about the text source or situation" - end - - output do - const :sentiment, T.enum(["Positive", "Negative", "Neutral", "Mixed"]), - desc: "Overall sentiment - must be Positive, Negative, Neutral, or Mixed" - - const :emotions, T::Array[String], - desc: "List of specific emotions detected (e.g., joy, anger, sadness, fear)" - - const :intensity, T.enum(["Low", "Medium", "High"]), - desc: "Intensity of the detected sentiment" - - const :confidence, Float, - desc: "Confidence in the sentiment classification (0.0-1.0)" - end -end - -# Example for code generation task -class CodeGenerationSignature < DSPy::Signature - description "Generate Ruby code based on natural language requirements" - - input do - const :requirements, String, - desc: "Natural language description of what the code should do" - - const :constraints, String, - desc: "Any specific requirements or constraints (e.g., libraries to use, style preferences)" - end - - output do - const :code, String, - desc: "Complete, working Ruby code 
that fulfills the requirements" - - const :explanation, String, - desc: "Brief explanation of how the code works and any important design decisions" - - const :dependencies, T::Array[String], - desc: "List of required gems or dependencies" - end -end - -# Usage Examples: -# -# Basic usage with Predict: -# predictor = DSPy::Predict.new(ExampleSignature) -# result = predictor.forward( -# field_name: "example value", -# count: 5, -# score: 0.85, -# is_active: true, -# tags: ["tag1", "tag2"], -# priority: "High" -# ) -# puts result[:result] -# puts result[:category] -# puts result[:confidence] -# -# With Chain of Thought reasoning: -# predictor = DSPy::ChainOfThought.new(SentimentAnalysisSignature) -# result = predictor.forward( -# text: "I absolutely love this product! It exceeded all my expectations.", -# context: "Product review on e-commerce site" -# ) -# puts result[:reasoning] # See the LLM's step-by-step thinking -# puts result[:sentiment] -# puts result[:emotions] -# -# With Vision: -# predictor = DSPy::Predict.new(VisionExampleSignature) -# result = predictor.forward( +# Vision usage: +# predictor = DSPy::Predict.new(ImageAnalysis) +# result = predictor.call( # image: DSPy::Image.from_file("path/to/image.jpg"), -# question: "What objects are visible in this image?" +# question: "What objects are visible?" # ) -# puts result[:answer] +# result.answer # => "The image shows..." + +# --- Accessing Schemas Programmatically --- +# +# SentimentAnalysis.input_json_schema # => { type: "object", properties: { ... } } +# SentimentAnalysis.output_json_schema # => { type: "object", properties: { ... 
} } +# +# # Field descriptions propagate to JSON Schema +# Entity.field_descriptions[:name] # => "The entity text as it appears in the source" +# Entity.field_descriptions[:confidence] # => "Extraction confidence from 0.0 to 1.0" diff --git a/plugins/compound-engineering/skills/dspy-ruby/references/core-concepts.md b/plugins/compound-engineering/skills/dspy-ruby/references/core-concepts.md index 66f0b02..f8fb006 100644 --- a/plugins/compound-engineering/skills/dspy-ruby/references/core-concepts.md +++ b/plugins/compound-engineering/skills/dspy-ruby/references/core-concepts.md @@ -1,265 +1,674 @@ # DSPy.rb Core Concepts -## Philosophy - -DSPy.rb enables developers to **program LLMs, not prompt them**. Instead of manually crafting prompts, define application requirements through code using type-safe, composable modules. - ## Signatures -Signatures define type-safe input/output contracts for LLM operations. They specify what data goes in and what data comes out, with runtime type checking. +Signatures define the interface between application code and language models. They specify inputs, outputs, and a task description using Sorbet types for compile-time and runtime type safety. 
-### Basic Signature Structure +### Structure ```ruby -class TaskSignature < DSPy::Signature - description "Brief description of what this signature does" +class ClassifyEmail < DSPy::Signature + description "Classify customer support emails by urgency and category" input do - const :field_name, String, desc: "Description of this input field" - const :another_field, Integer, desc: "Another input field" + const :subject, String + const :body, String end output do - const :result_field, String, desc: "Description of the output" - const :confidence, Float, desc: "Confidence score (0.0-1.0)" + const :category, String + const :urgency, String end end ``` -### Type Safety +### Supported Types -Signatures support Sorbet types including: -- `String` - Text data -- `Integer`, `Float` - Numeric data -- `T::Boolean` - Boolean values -- `T::Array[Type]` - Arrays of specific types -- Custom enums and classes +| Type | JSON Schema | Notes | +|------|-------------|-------| +| `String` | `string` | Required string | +| `Integer` | `integer` | Whole numbers | +| `Float` | `number` | Decimal numbers | +| `T::Boolean` | `boolean` | true/false | +| `T::Array[X]` | `array` | Typed arrays | +| `T::Hash[K, V]` | `object` | Typed key-value maps | +| `T.nilable(X)` | nullable | Optional fields | +| `Date` | `string` (ISO 8601) | Auto-converted | +| `DateTime` | `string` (ISO 8601) | Preserves timezone | +| `Time` | `string` (ISO 8601) | Converted to UTC | + +### Date and Time Types + +Date, DateTime, and Time fields serialize to ISO 8601 strings and auto-convert back to Ruby objects on output. 
+
+```ruby
+class EventScheduler < DSPy::Signature
+  description "Schedule events based on requirements"
+
+  input do
+    const :start_date, Date            # ISO 8601: YYYY-MM-DD
+    const :preferred_time, DateTime    # ISO 8601 with timezone
+    const :deadline, Time              # Converted to UTC
+    const :end_date, T.nilable(Date)   # Optional date
+  end
+
+  output do
+    const :scheduled_date, Date        # String from LLM, auto-converted to Date
+    const :event_datetime, DateTime    # Preserves timezone info
+    const :created_at, Time            # Converted to UTC
+  end
+end
+
+predictor = DSPy::Predict.new(EventScheduler)
+result = predictor.call(
+  start_date: "2024-01-15",
+  preferred_time: "2024-01-15T10:30:45Z",
+  deadline: Time.now,
+  end_date: nil
+)
+
+result.scheduled_date.class   # => Date
+result.event_datetime.class   # => DateTime
+```
+
+Timezone conventions follow ActiveRecord: Time objects convert to UTC, DateTime objects preserve timezone, Date objects are timezone-agnostic.
+
+### Enums with T::Enum
+
+Define constrained output values using `T::Enum` classes. Do not use inline `T.enum([...])` syntax.
+
+```ruby
+class SentimentAnalysis < DSPy::Signature
+  description "Analyze sentiment of text"
+
+  class Sentiment < T::Enum
+    enums do
+      Positive = new('positive')
+      Negative = new('negative')
+      Neutral = new('neutral')
+    end
+  end
+
+  input do
+    const :text, String
+  end
+
+  output do
+    const :sentiment, Sentiment
+    const :confidence, Float
+  end
+end
+
+predictor = DSPy::Predict.new(SentimentAnalysis)
+result = predictor.call(text: "This product is amazing!")
+
+result.sentiment            # => #<SentimentAnalysis::Sentiment::Positive>
+result.sentiment.serialize  # => "positive"
+result.confidence           # => 0.92
+```
+
+Enum matching is case-insensitive. The LLM returning `"POSITIVE"` matches `new('positive')`.
+
+### Default Values
+
+Default values work on both inputs and outputs. Input defaults reduce caller boilerplate. Output defaults provide fallbacks when the LLM omits optional fields.
+ +```ruby +class SmartSearch < DSPy::Signature + description "Search with intelligent defaults" + + input do + const :query, String + const :max_results, Integer, default: 10 + const :language, String, default: "English" + end + + output do + const :results, T::Array[String] + const :total_found, Integer + const :cached, T::Boolean, default: false + end +end + +search = DSPy::Predict.new(SmartSearch) +result = search.call(query: "Ruby programming") +# max_results defaults to 10, language defaults to "English" +# If LLM omits `cached`, it defaults to false +``` ### Field Descriptions -Always provide clear field descriptions using the `desc:` parameter. These descriptions: -- Guide the LLM on expected input/output format -- Serve as documentation for developers -- Improve prediction accuracy +Add `description:` to any field to guide the LLM on expected content. These descriptions appear in the generated JSON schema sent to the model. + +```ruby +class ASTNode < T::Struct + const :node_type, String, description: "The type of AST node (heading, paragraph, code_block)" + const :text, String, default: "", description: "Text content of the node" + const :level, Integer, default: 0, description: "Heading level 1-6, only for heading nodes" + const :children, T::Array[ASTNode], default: [] +end + +ASTNode.field_descriptions[:node_type] # => "The type of AST node ..." 
+ASTNode.field_descriptions[:children] # => nil (no description set) +``` + +Field descriptions also work inside signature `input` and `output` blocks: + +```ruby +class ExtractEntities < DSPy::Signature + description "Extract named entities from text" + + input do + const :text, String, description: "Raw text to analyze" + const :language, String, default: "en", description: "ISO 639-1 language code" + end + + output do + const :entities, T::Array[String], description: "List of extracted entity names" + const :count, Integer, description: "Total number of unique entities found" + end +end +``` + +### Schema Formats + +DSPy.rb supports three schema formats for communicating type structure to LLMs. + +#### JSON Schema (default) + +Verbose but universally supported. Access via `YourSignature.output_json_schema`. + +#### BAML Schema + +Compact format that reduces schema tokens by 80-85%. Requires the `sorbet-baml` gem. + +```ruby +DSPy.configure do |c| + c.lm = DSPy::LM.new('openai/gpt-4o-mini', + api_key: ENV['OPENAI_API_KEY'], + schema_format: :baml + ) +end +``` + +BAML applies only in Enhanced Prompting mode (`structured_outputs: false`). When `structured_outputs: true`, the provider receives JSON Schema directly. + +#### TOON Schema + Data Format + +Table-oriented text format that shrinks both schema definitions and prompt values. + +```ruby +DSPy.configure do |c| + c.lm = DSPy::LM.new('openai/gpt-4o-mini', + api_key: ENV['OPENAI_API_KEY'], + schema_format: :toon, + data_format: :toon + ) +end +``` + +`schema_format: :toon` replaces the schema block in the system prompt. `data_format: :toon` renders input values and output templates inside `toon` fences. Only works with Enhanced Prompting mode. The `sorbet-toon` gem is included automatically as a dependency. + +### Recursive Types + +Structs that reference themselves produce `$defs` entries in the generated JSON schema, using `$ref` pointers to avoid infinite recursion. 
+ +```ruby +class ASTNode < T::Struct + const :node_type, String + const :text, String, default: "" + const :children, T::Array[ASTNode], default: [] +end +``` + +The schema generator detects the self-reference in `T::Array[ASTNode]` and emits: + +```json +{ + "$defs": { + "ASTNode": { "type": "object", "properties": { ... } } + }, + "properties": { + "children": { + "type": "array", + "items": { "$ref": "#/$defs/ASTNode" } + } + } +} +``` + +Access the schema with accumulated definitions via `YourSignature.output_json_schema_with_defs`. + +### Union Types with T.any() + +Specify fields that accept multiple types: + +```ruby +output do + const :result, T.any(Float, String) +end +``` + +For struct unions, DSPy.rb automatically adds a `_type` discriminator field to each struct's JSON schema. The LLM returns `_type` in its response, and DSPy converts the hash to the correct struct instance. + +```ruby +class CreateTask < T::Struct + const :title, String + const :priority, String +end + +class DeleteTask < T::Struct + const :task_id, String + const :reason, T.nilable(String) +end + +class TaskRouter < DSPy::Signature + description "Route user request to the appropriate task action" + + input do + const :request, String + end + + output do + const :action, T.any(CreateTask, DeleteTask) + end +end + +result = DSPy::Predict.new(TaskRouter).call(request: "Create a task for Q4 review") +result.action.class # => CreateTask +result.action.title # => "Q4 Review" +``` + +Pattern matching works on the result: + +```ruby +case result.action +when CreateTask then puts "Creating: #{result.action.title}" +when DeleteTask then puts "Deleting: #{result.action.task_id}" +end +``` + +Union types also work inside arrays for heterogeneous collections: + +```ruby +output do + const :events, T::Array[T.any(LoginEvent, PurchaseEvent)] +end +``` + +Limit unions to 2-4 types for reliable LLM comprehension. Use clear struct names since they become the `_type` discriminator values. 
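The `_type` dispatch described above can be pictured with a small pure-Ruby sketch. This is illustrative only -- the structs and helper below are hypothetical stand-ins, not DSPy's internal deserializer:

```ruby
# Hypothetical union members -- plain Struct stands in for T::Struct here.
CreateTask = Struct.new(:title, :priority, keyword_init: true)
DeleteTask = Struct.new(:task_id, :reason, keyword_init: true)

UNION_MEMBERS = [CreateTask, DeleteTask].freeze

# Resolve the `_type` discriminator to a struct class, then build an
# instance from the remaining keys -- the same idea DSPy applies when
# converting an LLM hash back into a T.any() member.
def build_union_member(hash)
  type_name = hash.fetch("_type")
  klass = UNION_MEMBERS.find { |c| c.name == type_name }
  raise ArgumentError, "unknown _type: #{type_name}" unless klass

  attrs = hash.reject { |k, _| k == "_type" }.transform_keys(&:to_sym)
  klass.new(**attrs)
end

action = build_union_member(
  "_type" => "CreateTask", "title" => "Review Q4 Report", "priority" => "high"
)
action.class # => CreateTask
action.title # => "Review Q4 Report"
```

In real DSPy.rb code none of this is needed -- `result.action` already arrives as the correct struct instance.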
+ +--- ## Modules -Modules are composable building blocks that use signatures to perform LLM operations. They can be chained together to create complex workflows. +Modules are composable building blocks that wrap predictors. Define a `forward` method; invoke the module with `.call()`. -### Basic Module Structure +### Basic Structure ```ruby -class MyModule < DSPy::Module +class SentimentAnalyzer < DSPy::Module def initialize super - @predictor = DSPy::Predict.new(MySignature) + @predictor = DSPy::Predict.new(SentimentSignature) end - def forward(input_hash) - @predictor.forward(input_hash) + def forward(text:) + @predictor.call(text: text) end end + +analyzer = SentimentAnalyzer.new +result = analyzer.call(text: "I love this product!") + +result.sentiment # => "positive" +result.confidence # => 0.9 ``` +**API rules:** +- Invoke modules and predictors with `.call()`, not `.forward()`. +- Access result fields with `result.field`, not `result[:field]`. + ### Module Composition -Modules can call other modules to create pipelines: +Combine multiple modules through explicit method calls in `forward`: ```ruby -class ComplexWorkflow < DSPy::Module +class DocumentProcessor < DSPy::Module def initialize super - @step1 = FirstModule.new - @step2 = SecondModule.new + @classifier = DocumentClassifier.new + @summarizer = DocumentSummarizer.new end - def forward(input) - result1 = @step1.forward(input) - result2 = @step2.forward(result1) - result2 + def forward(document:) + classification = @classifier.call(content: document) + summary = @summarizer.call(content: document) + + { + document_type: classification.document_type, + summary: summary.summary + } end end ``` +### Lifecycle Callbacks + +Modules support `before`, `after`, and `around` callbacks on `forward`. Declare them as class-level macros referencing private methods. + +#### Execution order + +1. `before` callbacks (in registration order) +2. `around` callbacks (before `yield`) +3. `forward` method +4. 
`around` callbacks (after `yield`) +5. `after` callbacks (in registration order) + +```ruby +class InstrumentedModule < DSPy::Module + before :setup_metrics + after :log_metrics + around :manage_context + + def initialize + super + @predictor = DSPy::Predict.new(MySignature) + @metrics = {} + end + + def forward(question:) + @predictor.call(question: question) + end + + private + + def setup_metrics + @metrics[:start_time] = Time.now + end + + def manage_context + load_context + result = yield + save_context + result + end + + def log_metrics + @metrics[:duration] = Time.now - @metrics[:start_time] + end +end +``` + +Multiple callbacks of the same type execute in registration order. Callbacks inherit from parent classes; parent callbacks run first. + +#### Around callbacks + +Around callbacks must call `yield` to execute the wrapped method and return the result: + +```ruby +def with_retry + retries = 0 + begin + yield + rescue StandardError => e + retries += 1 + retry if retries < 3 + raise e + end +end +``` + +### Instruction Update Contract + +Teleprompters (GEPA, MIPROv2) require modules to expose immutable update hooks. Include `DSPy::Mixins::InstructionUpdatable` and implement `with_instruction` and `with_examples`, each returning a new instance: + +```ruby +class SentimentPredictor < DSPy::Module + include DSPy::Mixins::InstructionUpdatable + + def initialize + super + @predictor = DSPy::Predict.new(SentimentSignature) + end + + def with_instruction(instruction) + clone = self.class.new + clone.instance_variable_set(:@predictor, @predictor.with_instruction(instruction)) + clone + end + + def with_examples(examples) + clone = self.class.new + clone.instance_variable_set(:@predictor, @predictor.with_examples(examples)) + clone + end +end +``` + +If a module omits these hooks, teleprompters raise `DSPy::InstructionUpdateError` instead of silently mutating state. 
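The functional-update style these hooks require can be shown without DSPy at all. A minimal sketch with a hypothetical predictor class -- each `with_*` method returns a fresh object and leaves the receiver untouched:

```ruby
# Hypothetical predictor holding an instruction string and example list.
class TinyPredictor
  attr_reader :instruction, :examples

  def initialize(instruction: "Answer concisely.", examples: [])
    @instruction = instruction
    @examples = examples
  end

  # Functional update: never mutate, always build a new instance.
  def with_instruction(instruction)
    self.class.new(instruction: instruction, examples: @examples)
  end

  def with_examples(examples)
    self.class.new(instruction: @instruction, examples: examples)
  end
end

base  = TinyPredictor.new
tuned = base.with_instruction("Cite your sources.")

base.instruction  # => "Answer concisely."  (original untouched)
tuned.instruction # => "Cite your sources."
```

Because every candidate instruction lives in its own instance, a teleprompter can evaluate many candidates concurrently without them clobbering each other's state.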
+
+---
+
## Predictors

-Predictors are the core execution engines that take signatures and perform LLM inference. DSPy.rb provides several predictor types.
+Predictors are execution engines that take a signature and produce structured results from a language model. DSPy.rb provides four predictor types.

### Predict

-Basic LLM inference with type-safe inputs and outputs.
+Direct LLM call with typed input/output. Fastest option, lowest token usage.

```ruby
-predictor = DSPy::Predict.new(TaskSignature)
-result = predictor.forward(field_name: "value", another_field: 42)
-# Returns: { result_field: "...", confidence: 0.85 }
+classifier = DSPy::Predict.new(ClassifyText)
+result = classifier.call(text: "Technical document about APIs")
+
+result.sentiment # => #<Sentiment::Neutral>
+result.topics # => ["APIs", "technical"]
+result.confidence # => 0.92
```

### ChainOfThought

-Automatically adds a reasoning field to the output, improving accuracy for complex tasks.
+Adds a `reasoning` field to the output automatically. The model generates step-by-step reasoning before the final answer. Do not define a `:reasoning` field in the signature output when using ChainOfThought.

```ruby
-class EmailClassificationSignature < DSPy::Signature
-  description "Classify customer support emails"
+class SolveMathProblem < DSPy::Signature
+  description "Solve mathematical word problems step by step"

  input do
-    const :email_subject, String
-    const :email_body, String
+    const :problem, String
  end

  output do
-    const :category, String # "Technical", "Billing", or "General"
-    const :priority, String # "High", "Medium", or "Low"
+    const :answer, String
+    # :reasoning is added automatically by ChainOfThought
  end
end

-predictor = DSPy::ChainOfThought.new(EmailClassificationSignature)
-result = predictor.forward(
-  email_subject: "Can't log in to my account",
-  email_body: "I've been trying to access my account for hours..."
-) -# Returns: { -# reasoning: "This appears to be a technical issue...", -# category: "Technical", -# priority: "High" -# } +solver = DSPy::ChainOfThought.new(SolveMathProblem) +result = solver.call(problem: "Sarah has 15 apples. She gives 7 away and buys 12 more.") + +result.reasoning # => "Step by step: 15 - 7 = 8, then 8 + 12 = 20" +result.answer # => "20 apples" ``` +Use ChainOfThought for complex analysis, multi-step reasoning, or when explainability matters. + ### ReAct -Tool-using agents with iterative reasoning. Enables autonomous problem-solving by allowing the LLM to use external tools. +Reasoning + Action agent that uses tools in an iterative loop. Define tools by subclassing `DSPy::Tools::Base`. Group related tools with `DSPy::Tools::Toolset`. ```ruby -class SearchTool < DSPy::Tool - def call(query:) - # Perform search and return results - { results: search_database(query) } +class WeatherTool < DSPy::Tools::Base + extend T::Sig + + tool_name "weather" + tool_description "Get weather information for a location" + + sig { params(location: String).returns(String) } + def call(location:) + { location: location, temperature: 72, condition: "sunny" }.to_json end end -predictor = DSPy::ReAct.new( - TaskSignature, - tools: [SearchTool.new], +class TravelSignature < DSPy::Signature + description "Help users plan travel" + + input do + const :destination, String + end + + output do + const :recommendations, String + end +end + +agent = DSPy::ReAct.new( + TravelSignature, + tools: [WeatherTool.new], max_iterations: 5 ) + +result = agent.call(destination: "Tokyo, Japan") +result.recommendations # => "Visit Senso-ji Temple early morning..." 
+result.history # => Array of reasoning steps, actions, observations +result.iterations # => 3 +result.tools_used # => ["weather"] +``` + +Use toolsets to expose multiple tool methods from a single class: + +```ruby +text_tools = DSPy::Tools::TextProcessingToolset.to_tools +agent = DSPy::ReAct.new(MySignature, tools: text_tools) ``` ### CodeAct -Dynamic code generation for solving problems programmatically. Requires the optional `dspy-code_act` gem. +Think-Code-Observe agent that synthesizes and executes Ruby code. Ships as a separate gem. ```ruby -predictor = DSPy::CodeAct.new(TaskSignature) -result = predictor.forward(task: "Calculate the factorial of 5") -# The LLM generates and executes Ruby code to solve the task +# Gemfile +gem 'dspy-code_act', '~> 0.29' ``` -## Multimodal Support +```ruby +programmer = DSPy::CodeAct.new(ProgrammingSignature, max_iterations: 10) +result = programmer.call(task: "Calculate the factorial of 20") +``` -DSPy.rb supports vision capabilities across compatible models using the unified `DSPy::Image` interface. 
+### Predictor Comparison + +| Predictor | Speed | Token Usage | Best For | +|-----------|-------|-------------|----------| +| Predict | Fastest | Low | Classification, extraction | +| ChainOfThought | Moderate | Medium-High | Complex reasoning, analysis | +| ReAct | Slower | High | Multi-step tasks with tools | +| CodeAct | Slowest | Very High | Dynamic programming, calculations | + +### Concurrent Predictions + +Process multiple independent predictions simultaneously using `Async::Barrier`: ```ruby -class VisionSignature < DSPy::Signature - description "Describe what's in an image" +require 'async' +require 'async/barrier' - input do - const :image, DSPy::Image - const :question, String +analyzer = DSPy::Predict.new(ContentAnalyzer) +documents = ["Text one", "Text two", "Text three"] + +Async do + barrier = Async::Barrier.new + + tasks = documents.map do |doc| + barrier.async { analyzer.call(content: doc) } end - output do - const :description, String - end -end + barrier.wait + predictions = tasks.map(&:wait) -predictor = DSPy::Predict.new(VisionSignature) -result = predictor.forward( - image: DSPy::Image.from_file("path/to/image.jpg"), - question: "What objects are visible in this image?" -) -``` - -### Image Input Methods - -```ruby -# From file path -DSPy::Image.from_file("path/to/image.jpg") - -# From URL (OpenAI only) -DSPy::Image.from_url("https://example.com/image.jpg") - -# From base64-encoded data -DSPy::Image.from_base64(base64_string, mime_type: "image/jpeg") -``` - -## Best Practices - -### 1. Clear Signature Descriptions - -Always provide clear, specific descriptions for signatures and fields: - -```ruby -# Good -description "Classify customer support emails into Technical, Billing, or General categories" - -# Avoid -description "Classify emails" -``` - -### 2. 
Type Safety - -Use specific types rather than generic String when possible: - -```ruby -# Good - Use enums for constrained outputs -output do - const :category, T.enum(["Technical", "Billing", "General"]) -end - -# Less ideal - Generic string -output do - const :category, String, desc: "Must be Technical, Billing, or General" + predictions.each { |p| puts p.sentiment } end ``` -### 3. Composable Architecture - -Build complex workflows from simple, reusable modules: +Add `gem 'async', '~> 2.29'` to the Gemfile. Handle errors within each `barrier.async` block to prevent one failure from cancelling others: ```ruby -class EmailPipeline < DSPy::Module - def initialize - super - @classifier = EmailClassifier.new - @prioritizer = EmailPrioritizer.new - @responder = EmailResponder.new - end - - def forward(email) - classification = @classifier.forward(email) - priority = @prioritizer.forward(classification) - @responder.forward(classification.merge(priority)) +barrier.async do + begin + analyzer.call(content: doc) + rescue StandardError => e + nil end end ``` -### 4. Error Handling - -Always handle potential type validation errors: +### Few-Shot Examples and Instruction Tuning ```ruby -begin - result = predictor.forward(input_data) -rescue DSPy::ValidationError => e - # Handle validation error - logger.error "Invalid output from LLM: #{e.message}" +classifier = DSPy::Predict.new(SentimentAnalysis) + +examples = [ + DSPy::FewShotExample.new( + input: { text: "Love it!" 
}, + output: { sentiment: "positive", confidence: 0.95 } + ) +] + +optimized = classifier.with_examples(examples) +tuned = classifier.with_instruction("Be precise and confident.") +``` + +--- + +## Type System + +### Automatic Type Conversion + +DSPy.rb v0.9.0+ automatically converts LLM JSON responses to typed Ruby objects: + +- **Enums**: String values become `T::Enum` instances (case-insensitive) +- **Structs**: Nested hashes become `T::Struct` objects +- **Arrays**: Elements convert recursively +- **Defaults**: Missing fields use declared defaults + +### Discriminators for Union Types + +When a field uses `T.any()` with struct types, DSPy adds a `_type` field to each struct's schema. On deserialization, `_type` selects the correct struct class: + +```json +{ + "action": { + "_type": "CreateTask", + "title": "Review Q4 Report" + } +} +``` + +DSPy matches `"CreateTask"` against the union members and instantiates the correct struct. No manual discriminator field is needed. + +### Recursive Types + +Structs referencing themselves are supported. The schema generator tracks visited types and produces `$ref` pointers under `$defs`: + +```ruby +class TreeNode < T::Struct + const :label, String + const :children, T::Array[TreeNode], default: [] end ``` -## Limitations +The generated schema uses `"$ref": "#/$defs/TreeNode"` for the children array items, preventing infinite schema expansion. -Current constraints to be aware of: -- No streaming support (single-request processing only) -- Limited multimodal support through Ollama for local deployments -- Vision capabilities vary by provider (see providers.md for compatibility matrix) +### Nesting Depth + +- 1-2 levels: reliable across all providers. +- 3-4 levels: works but increases schema complexity. +- 5+ levels: may trigger OpenAI depth validation warnings and reduce LLM accuracy. Flatten deeply nested structures or split into multiple signatures. 
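One way to catch depth problems before a provider rejects a schema is to measure raw hash nesting. The helper below is a hypothetical utility, not part of DSPy.rb, and raw hash depth only loosely tracks the struct nesting levels discussed above -- treat it as a cheap early warning:

```ruby
# Hypothetical helper: measure raw hash/array nesting of a
# JSON-Schema-style structure.
def schema_depth(node)
  case node
  when Hash
    1 + (node.values.map { |v| schema_depth(v) }.max || 0)
  when Array
    node.map { |v| schema_depth(v) }.max || 0
  else
    0
  end
end

shallow = { "type" => "object", "properties" => { "name" => { "type" => "string" } } }
schema_depth(shallow) # => 3
```

Run it against `YourSignature.output_json_schema` during development and flag schemas whose depth creeps past your provider's comfort zone.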
+ +### Tips + +- Prefer `T::Array[X], default: []` over `T.nilable(T::Array[X])` -- the nilable form causes schema issues with OpenAI structured outputs. +- Use clear struct names for union types since they become `_type` discriminator values. +- Limit union types to 2-4 members for reliable model comprehension. +- Check schema compatibility with `DSPy::OpenAI::LM::SchemaConverter.validate_compatibility(schema)`. diff --git a/plugins/compound-engineering/skills/dspy-ruby/references/observability.md b/plugins/compound-engineering/skills/dspy-ruby/references/observability.md new file mode 100644 index 0000000..76bd83f --- /dev/null +++ b/plugins/compound-engineering/skills/dspy-ruby/references/observability.md @@ -0,0 +1,366 @@ +# DSPy.rb Observability + +DSPy.rb provides an event-driven observability system built on OpenTelemetry. The system replaces monkey-patching with structured event emission, pluggable listeners, automatic span creation, and non-blocking Langfuse export. + +## Event System + +### Emitting Events + +Emit structured events with `DSPy.event`: + +```ruby +DSPy.event('lm.tokens', { + 'gen_ai.system' => 'openai', + 'gen_ai.request.model' => 'gpt-4', + input_tokens: 150, + output_tokens: 50, + total_tokens: 200 +}) +``` + +Event names are **strings** with dot-separated namespaces (e.g., `'llm.generate'`, `'react.iteration_complete'`, `'chain_of_thought.reasoning_complete'`). Do not use symbols for event names. + +Attributes must be JSON-serializable. DSPy automatically merges context (trace ID, module stack) and creates OpenTelemetry spans. + +### Global Subscriptions + +Subscribe to events across the entire application with `DSPy.events.subscribe`: + +```ruby +# Exact event name +subscription_id = DSPy.events.subscribe('lm.tokens') do |event_name, attrs| + puts "Tokens used: #{attrs[:total_tokens]}" +end + +# Wildcard pattern -- matches llm.generate, llm.stream, etc. 
+DSPy.events.subscribe('llm.*') do |event_name, attrs| + track_llm_usage(attrs) +end + +# Catch-all wildcard +DSPy.events.subscribe('*') do |event_name, attrs| + log_everything(event_name, attrs) +end +``` + +Use global subscriptions for cross-cutting concerns: observability exporters (Langfuse, Datadog), centralized logging, metrics collection. + +### Module-Scoped Subscriptions + +Declare listeners inside a `DSPy::Module` subclass. Subscriptions automatically scope to the module instance and its descendants: + +```ruby +class ResearchReport < DSPy::Module + subscribe 'lm.tokens', :track_tokens, scope: :descendants + + def initialize + super + @outliner = DSPy::Predict.new(OutlineSignature) + @writer = DSPy::Predict.new(SectionWriterSignature) + @token_count = 0 + end + + def forward(question:) + outline = @outliner.call(question: question) + outline.sections.map do |title| + draft = @writer.call(question: question, section_title: title) + { title: title, body: draft.paragraph } + end + end + + def track_tokens(_event, attrs) + @token_count += attrs.fetch(:total_tokens, 0) + end +end +``` + +The `scope:` parameter accepts: +- `:descendants` (default) -- receives events from the module **and** every nested module invoked inside it. +- `DSPy::Module::SubcriptionScope::SelfOnly` -- restricts delivery to events emitted by the module instance itself; ignores descendants. + +Inspect active subscriptions with `registered_module_subscriptions`. Tear down with `unsubscribe_module_events`. 
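To build intuition for how wildcard patterns such as `'llm.*'` and `'*'` select events, here is a minimal matching sketch. It assumes a single-segment `*` wildcard; DSPy's actual matching rules may differ, so treat this as illustration rather than a specification:

```ruby
# Match a dot-separated event name against a wildcard pattern.
# "*" alone matches any event; "llm.*" matches one trailing segment.
def event_pattern_matches?(pattern, event_name)
  return true if pattern == "*"

  regexp = Regexp.new("\\A" + Regexp.escape(pattern).gsub("\\*", "[^.]+") + "\\z")
  regexp.match?(event_name)
end

event_pattern_matches?("llm.*", "llm.generate") # => true
event_pattern_matches?("llm.*", "lm.tokens")    # => false
event_pattern_matches?("*", "score.create")     # => true
```

Thinking of patterns as anchored globs explains why `'llm.*'` never catches `'lm.tokens'`: the prefix must match exactly up to the wildcard.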
+ +### Unsubscribe and Cleanup + +Remove a global listener by subscription ID: + +```ruby +id = DSPy.events.subscribe('llm.*') { |name, attrs| } +DSPy.events.unsubscribe(id) +``` + +Build tracker classes that manage their own subscription lifecycle: + +```ruby +class TokenBudgetTracker + def initialize(budget:) + @budget = budget + @usage = 0 + @subscriptions = [] + @subscriptions << DSPy.events.subscribe('lm.tokens') do |_event, attrs| + @usage += attrs.fetch(:total_tokens, 0) + warn("Budget hit") if @usage >= @budget + end + end + + def unsubscribe + @subscriptions.each { |id| DSPy.events.unsubscribe(id) } + @subscriptions.clear + end +end +``` + +### Clearing Listeners in Tests + +Call `DSPy.events.clear_listeners` in `before`/`after` blocks to prevent cross-contamination between test cases: + +```ruby +RSpec.configure do |config| + config.after(:each) { DSPy.events.clear_listeners } +end +``` + +## dspy-o11y Gems + +Three gems compose the observability stack: + +| Gem | Purpose | +|---|---| +| `dspy` | Core event bus (`DSPy.event`, `DSPy.events`) -- always available | +| `dspy-o11y` | OpenTelemetry spans, `AsyncSpanProcessor`, `DSPy::Context.with_span` helpers | +| `dspy-o11y-langfuse` | Langfuse adapter -- configures OTLP exporter targeting Langfuse endpoints | + +### Installation + +```ruby +# Gemfile +gem 'dspy' +gem 'dspy-o11y' # core spans + helpers +gem 'dspy-o11y-langfuse' # Langfuse/OpenTelemetry adapter (optional) +``` + +If the optional gems are absent, DSPy falls back to logging-only mode with no errors. 
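The graceful fallback follows a common Ruby idiom: attempt the optional `require` and degrade when it fails. A generic sketch of that idiom (not DSPy's actual loading code -- gem names here are only examples):

```ruby
# Try to load an optional gem; report whether the richer backend is usable.
def optional_gem_available?(gem_name)
  require gem_name
  true
rescue LoadError
  false
end

# With the o11y gems absent, fall back to logging-only behavior.
if optional_gem_available?("dspy-o11y")
  puts "OpenTelemetry spans enabled"
else
  puts "logging-only mode"
end
```

The key property is that the `LoadError` path leaves the application fully functional, which is exactly the behavior described above.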
+ +## Langfuse Integration + +### Environment Variables + +```bash +# Required +export LANGFUSE_PUBLIC_KEY=pk-lf-your-public-key +export LANGFUSE_SECRET_KEY=sk-lf-your-secret-key + +# Optional (defaults to https://cloud.langfuse.com) +export LANGFUSE_HOST=https://us.cloud.langfuse.com + +# Tuning (optional) +export DSPY_TELEMETRY_BATCH_SIZE=100 # spans per export batch (default 100) +export DSPY_TELEMETRY_QUEUE_SIZE=1000 # max queued spans (default 1000) +export DSPY_TELEMETRY_EXPORT_INTERVAL=60 # seconds between timed exports (default 60) +export DSPY_TELEMETRY_SHUTDOWN_TIMEOUT=10 # seconds to drain on shutdown (default 10) +``` + +### Automatic Configuration + +Call `DSPy::Observability.configure!` once at boot (it is already called automatically when `require 'dspy'` runs and Langfuse env vars are present): + +```ruby +require 'dspy' +# If LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are set, +# DSPy::Observability.configure! runs automatically and: +# 1. Configures the OpenTelemetry SDK with an OTLP exporter +# 2. Creates dual output: structured logs AND OpenTelemetry spans +# 3. Exports spans to Langfuse using proper authentication +# 4. Falls back gracefully if gems are missing +``` + +Verify status with `DSPy::Observability.enabled?`. + +### Automatic Tracing + +With observability enabled, every `DSPy::Module#forward` call, LM request, and tool invocation creates properly nested spans. 
Langfuse receives hierarchical traces: + +``` +Trace: abc-123-def ++-- ChainOfThought.forward [2000ms] (observation type: chain) + +-- llm.generate [1000ms] (observation type: generation) + Model: gpt-4-0613 + Tokens: 100 in / 50 out / 150 total +``` + +DSPy maps module classes to Langfuse observation types automatically via `DSPy::ObservationType.for_module_class`: + +| Module | Observation Type | +|---|---| +| `DSPy::LM` (raw chat) | `generation` | +| `DSPy::ChainOfThought` | `chain` | +| `DSPy::ReAct` | `agent` | +| Tool invocations | `tool` | +| Memory/retrieval | `retriever` | +| Embedding engines | `embedding` | +| Evaluation modules | `evaluator` | +| Generic operations | `span` | + +## Score Reporting + +### DSPy.score API + +Report evaluation scores with `DSPy.score`: + +```ruby +# Numeric (default) +DSPy.score('accuracy', 0.95) + +# With comment +DSPy.score('relevance', 0.87, comment: 'High semantic similarity') + +# Boolean +DSPy.score('is_valid', 1, data_type: DSPy::Scores::DataType::Boolean) + +# Categorical +DSPy.score('sentiment', 'positive', data_type: DSPy::Scores::DataType::Categorical) + +# Explicit trace binding +DSPy.score('accuracy', 0.95, trace_id: 'custom-trace-id') +``` + +Available data types: `DSPy::Scores::DataType::Numeric`, `::Boolean`, `::Categorical`. + +### score.create Events + +Every `DSPy.score` call emits a `'score.create'` event. 
Subscribe to react: + +```ruby +DSPy.events.subscribe('score.create') do |event_name, attrs| + puts "#{attrs[:score_name]} = #{attrs[:score_value]}" + # Also available: attrs[:score_id], attrs[:score_data_type], + # attrs[:score_comment], attrs[:trace_id], attrs[:observation_id], + # attrs[:timestamp] +end +``` + +### Async Langfuse Export with DSPy::Scores::Exporter + +Configure the exporter to send scores to Langfuse in the background: + +```ruby +exporter = DSPy::Scores::Exporter.configure( + public_key: ENV['LANGFUSE_PUBLIC_KEY'], + secret_key: ENV['LANGFUSE_SECRET_KEY'], + host: 'https://cloud.langfuse.com' +) + +# Scores are now exported automatically via a background Thread::Queue +DSPy.score('accuracy', 0.95) + +# Shut down gracefully (waits up to 5 seconds by default) +exporter.shutdown +``` + +The exporter subscribes to `'score.create'` events internally, queues them for async processing, and retries with exponential backoff on failure. + +### Automatic Export with DSPy::Evals + +Pass `export_scores: true` to `DSPy::Evals` to export per-example scores and an aggregate batch score automatically: + +```ruby +evaluator = DSPy::Evals.new( + program, + metric: my_metric, + export_scores: true, + score_name: 'qa_accuracy' +) + +result = evaluator.evaluate(test_examples) +``` + +## DSPy::Context.with_span + +Create manual spans for custom operations. Requires `dspy-o11y`. + +```ruby +DSPy::Context.with_span(operation: 'custom.retrieval', 'retrieval.source' => 'pinecone') do |span| + results = pinecone_client.query(embedding) + span&.set_attribute('retrieval.count', results.size) if span + results +end +``` + +Pass semantic attributes as keyword arguments alongside `operation:`. The block receives an OpenTelemetry span object (or `nil` when observability is disabled). The span automatically nests under the current parent span and records `duration.ms`, `langfuse.observation.startTime`, and `langfuse.observation.endTime`. 
+ +Assign a Langfuse observation type to custom spans: + +```ruby +DSPy::Context.with_span( + operation: 'evaluate.batch', + **DSPy::ObservationType::Evaluator.langfuse_attributes, + 'batch.size' => examples.length +) do |span| + run_evaluation(examples) +end +``` + +Scores reported inside a `with_span` block automatically inherit the current trace context. + +## Module Stack Metadata + +When `DSPy::Module#forward` runs, the context layer maintains a module stack. Every event includes: + +```ruby +{ + module_path: [ + { id: "root_uuid", class: "DeepSearch", label: nil }, + { id: "planner_uuid", class: "DSPy::Predict", label: "planner" } + ], + module_root: { id: "root_uuid", class: "DeepSearch", label: nil }, + module_leaf: { id: "planner_uuid", class: "DSPy::Predict", label: "planner" }, + module_scope: { + ancestry_token: "root_uuid>planner_uuid", + depth: 2 + } +} +``` + +| Key | Meaning | +|---|---| +| `module_path` | Ordered array of `{id, class, label}` entries from root to leaf | +| `module_root` | The outermost module in the current call chain | +| `module_leaf` | The innermost (currently executing) module | +| `module_scope.ancestry_token` | Stable string of joined UUIDs representing the nesting path | +| `module_scope.depth` | Integer depth of the current module in the stack | + +Labels are set via `module_scope_label=` on a module instance or derived automatically from named predictors. Use this metadata to power Langfuse filters, scoped metrics, or custom event routing. + +## Dedicated Export Worker + +The `DSPy::Observability::AsyncSpanProcessor` (from `dspy-o11y`) keeps telemetry export off the hot path: + +- Runs on a `Concurrent::SingleThreadExecutor` -- LLM workflows never compete with OTLP networking. +- Buffers finished spans in a `Thread::Queue` (max size configurable via `DSPY_TELEMETRY_QUEUE_SIZE`). +- Drains spans in batches of `DSPY_TELEMETRY_BATCH_SIZE` (default 100). When the queue reaches batch size, an immediate async export fires. 
+- A background timer thread triggers periodic export every `DSPY_TELEMETRY_EXPORT_INTERVAL` seconds (default 60). +- Applies exponential backoff (`0.1 * 2^attempt` seconds) on export failures, up to `DEFAULT_MAX_RETRIES` (3). +- On shutdown, flushes all remaining spans within `DSPY_TELEMETRY_SHUTDOWN_TIMEOUT` seconds, then terminates the executor. +- Drops the oldest span when the queue is full, logging `'observability.span_dropped'`. + +No application code interacts with the processor directly. Configure it entirely through environment variables. + +## Built-in Events Reference + +| Event Name | Emitted By | Key Attributes | +|---|---|---| +| `lm.tokens` | `DSPy::LM` | `gen_ai.system`, `gen_ai.request.model`, `input_tokens`, `output_tokens`, `total_tokens` | +| `chain_of_thought.reasoning_complete` | `DSPy::ChainOfThought` | `dspy.signature`, `cot.reasoning_steps`, `cot.reasoning_length`, `cot.has_reasoning` | +| `react.iteration_complete` | `DSPy::ReAct` | `iteration`, `thought`, `action`, `observation` | +| `codeact.iteration_complete` | `dspy-code_act` gem | `iteration`, `code_executed`, `execution_result` | +| `optimization.trial_complete` | Teleprompters (MIPROv2) | `trial_number`, `score` | +| `score.create` | `DSPy.score` | `score_name`, `score_value`, `score_data_type`, `trace_id` | +| `span.start` | `DSPy::Context.with_span` | `trace_id`, `span_id`, `parent_span_id`, `operation` | + +## Best Practices + +- Use dot-separated string names for events. Follow OpenTelemetry `gen_ai.*` conventions for LLM attributes. +- Always call `unsubscribe` (or `unsubscribe_module_events` for scoped subscriptions) when a tracker is no longer needed to prevent memory leaks. +- Call `DSPy.events.clear_listeners` in test teardown to avoid cross-contamination. +- Wrap risky listener logic in a rescue block. The event system isolates listener failures, but explicit rescue prevents silent swallowing of domain errors. +- Prefer module-scoped `subscribe` for agent internals. 
Reserve global `DSPy.events.subscribe` for infrastructure-level concerns.
diff --git a/plugins/compound-engineering/skills/dspy-ruby/references/optimization.md b/plugins/compound-engineering/skills/dspy-ruby/references/optimization.md
index 7ff5466..0f2e8e7 100644
--- a/plugins/compound-engineering/skills/dspy-ruby/references/optimization.md
+++ b/plugins/compound-engineering/skills/dspy-ruby/references/optimization.md
@@ -1,623 +1,603 @@
-# DSPy.rb Testing, Optimization & Observability
+# DSPy.rb Optimization

-## Testing
+## MIPROv2

-DSPy.rb enables standard RSpec testing patterns for LLM logic, making your AI applications testable and maintainable.
+MIPROv2 (Multiprompt Instruction PRoposal Optimizer, version 2) is the primary instruction tuner in DSPy.rb. It proposes new instructions and few-shot demonstrations per predictor, evaluates them on mini-batches, and retains candidates that improve the metric. It ships as a separate gem to keep the Gaussian Process dependency tree out of apps that do not need it.

-### Basic Testing Setup
+### Installation

```ruby
-require 'rspec'
-require 'dspy'
+# Gemfile
+gem "dspy"
+gem "dspy-miprov2"
+```

-RSpec.describe EmailClassifier do
-  before do
-    DSPy.configure do |c|
-      c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
-    end
-  end
+Bundler auto-requires `dspy/miprov2`. No additional `require` statement is needed.
- describe '#classify' do - it 'classifies technical support emails correctly' do - classifier = EmailClassifier.new - result = classifier.forward( - email_subject: "Can't log in", - email_body: "I'm unable to access my account" - ) +### AutoMode presets - expect(result[:category]).to eq('Technical') - expect(result[:priority]).to be_in(['High', 'Medium', 'Low']) - end - end +Use `DSPy::Teleprompt::MIPROv2::AutoMode` for preconfigured optimizers: + +```ruby +light = DSPy::Teleprompt::MIPROv2::AutoMode.light(metric: metric) # 6 trials, greedy +medium = DSPy::Teleprompt::MIPROv2::AutoMode.medium(metric: metric) # 12 trials, adaptive +heavy = DSPy::Teleprompt::MIPROv2::AutoMode.heavy(metric: metric) # 18 trials, Bayesian +``` + +| Preset | Trials | Strategy | Use case | +|----------|--------|------------|-----------------------------------------------------| +| `light` | 6 | `:greedy` | Quick wins on small datasets or during prototyping. | +| `medium` | 12 | `:adaptive`| Balanced exploration vs. runtime for most pilots. | +| `heavy` | 18 | `:bayesian`| Highest accuracy targets or multi-stage programs. | + +### Manual configuration with dry-configurable + +`DSPy::Teleprompt::MIPROv2` includes `Dry::Configurable`. Configure at the class level (defaults for all instances) or instance level (overrides class defaults). 
+ +**Class-level defaults:** + +```ruby +DSPy::Teleprompt::MIPROv2.configure do |config| + config.optimization_strategy = :bayesian + config.num_trials = 30 + config.bootstrap_sets = 10 end ``` -### Mocking LLM Responses - -Test your modules without making actual API calls: +**Instance-level overrides:** ```ruby -RSpec.describe MyModule do - it 'handles mock responses correctly' do - # Create a mock predictor that returns predetermined results - mock_predictor = instance_double(DSPy::Predict) - allow(mock_predictor).to receive(:forward).and_return({ - category: 'Technical', - priority: 'High', - confidence: 0.95 - }) - - # Inject mock into your module - module_instance = MyModule.new - module_instance.instance_variable_set(:@predictor, mock_predictor) - - result = module_instance.forward(input: 'test data') - expect(result[:category]).to eq('Technical') - end +optimizer = DSPy::Teleprompt::MIPROv2.new(metric: metric) +optimizer.configure do |config| + config.num_trials = 15 + config.num_instruction_candidates = 6 + config.bootstrap_sets = 5 + config.max_bootstrapped_examples = 4 + config.max_labeled_examples = 16 + config.optimization_strategy = :adaptive # :greedy, :adaptive, :bayesian + config.early_stopping_patience = 3 + config.init_temperature = 1.0 + config.final_temperature = 0.1 + config.minibatch_size = nil # nil = auto + config.auto_seed = 42 end ``` -### Testing Type Safety +The `optimization_strategy` setting accepts symbols (`:greedy`, `:adaptive`, `:bayesian`) and coerces them internally to `DSPy::Teleprompt::OptimizationStrategy` T::Enum values. -Verify that signatures enforce type constraints: +The old `config:` constructor parameter is removed. Passing `config:` raises `ArgumentError`. 
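The class-default / instance-override precedence can be pictured with a small plain-Ruby stand-in. This is an illustration of the pattern only, not the `Dry::Configurable` gem itself; `MiniConfigurable` and its setting names are invented for the sketch:

```ruby
# Illustrative stand-in for the Dry::Configurable precedence rules:
# instances snapshot the class-level defaults at creation time, then
# apply their own overrides on top. Not the real gem.
class MiniConfigurable
  @defaults = { num_trials: 12, optimization_strategy: :adaptive }

  class << self
    attr_reader :defaults

    def configure
      yield @defaults # mutate class-level defaults in place
    end
  end

  attr_reader :config

  def initialize
    @config = self.class.defaults.dup # start from the class defaults
  end

  def configure
    yield @config # instance-level override, invisible to other instances
  end
end

MiniConfigurable.configure { |c| c[:num_trials] = 30 }

a = MiniConfigurable.new
a.configure { |c| c[:optimization_strategy] = :bayesian }

b = MiniConfigurable.new
# a.config => { num_trials: 30, optimization_strategy: :bayesian }
# b.config => { num_trials: 30, optimization_strategy: :adaptive }
```

Note the ordering consequence: instances created *before* a class-level `configure` call would not see the new defaults in this sketch, which is why class-level configuration belongs in an initializer.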
+ +### Auto presets via configure + +Instead of `AutoMode`, set the preset through the configure block: ```ruby -RSpec.describe EmailClassificationSignature do - it 'validates output types' do - predictor = DSPy::Predict.new(EmailClassificationSignature) - - # This should work - result = predictor.forward( - email_subject: 'Test', - email_body: 'Test body' - ) - expect(result[:category]).to be_a(String) - - # Test that invalid types are caught - expect { - # Simulate LLM returning invalid type - predictor.send(:validate_output, { category: 123 }) - }.to raise_error(DSPy::ValidationError) - end +optimizer = DSPy::Teleprompt::MIPROv2.new(metric: metric) +optimizer.configure do |config| + config.auto_preset = DSPy::Teleprompt::AutoPreset.deserialize("medium") end ``` -### Testing Edge Cases - -Always test boundary conditions and error scenarios: +### Compile and inspect ```ruby -RSpec.describe EmailClassifier do - it 'handles empty emails' do - classifier = EmailClassifier.new - result = classifier.forward( - email_subject: '', - email_body: '' - ) - # Define expected behavior for edge case - expect(result[:category]).to eq('General') - end +program = DSPy::Predict.new(MySignature) - it 'handles very long emails' do - long_body = 'word ' * 10000 - classifier = EmailClassifier.new - - expect { - classifier.forward( - email_subject: 'Test', - email_body: long_body - ) - }.not_to raise_error - end - - it 'handles special characters' do - classifier = EmailClassifier.new - result = classifier.forward( - email_subject: 'Test ', - email_body: 'Body with émojis 🎉 and spëcial çharacters' - ) - - expect(result[:category]).to be_in(['Technical', 'Billing', 'General']) - end -end -``` - -### Integration Testing - -Test complete workflows end-to-end: - -```ruby -RSpec.describe EmailProcessingPipeline do - it 'processes email through complete pipeline' do - pipeline = EmailProcessingPipeline.new - - result = pipeline.forward( - email_subject: 'Billing question', - email_body: 'How 
do I update my payment method?' - ) - - # Verify the complete pipeline output - expect(result[:classification]).to eq('Billing') - expect(result[:priority]).to eq('Medium') - expect(result[:suggested_response]).to include('payment') - expect(result[:assigned_team]).to eq('billing_support') - end -end -``` - -### VCR for Deterministic Tests - -Use VCR to record and replay API responses: - -```ruby -require 'vcr' - -VCR.configure do |config| - config.cassette_library_dir = 'spec/vcr_cassettes' - config.hook_into :webmock - config.filter_sensitive_data('') { ENV['OPENAI_API_KEY'] } -end - -RSpec.describe EmailClassifier do - it 'classifies emails consistently', :vcr do - VCR.use_cassette('email_classification') do - classifier = EmailClassifier.new - result = classifier.forward( - email_subject: 'Test subject', - email_body: 'Test body' - ) - - expect(result[:category]).to eq('Technical') - end - end -end -``` - -## Optimization - -DSPy.rb provides powerful optimization capabilities to automatically improve your prompts and modules. - -### MIPROv2 Optimization - -MIPROv2 is an advanced multi-prompt optimization technique that uses bootstrap sampling, instruction generation, and Bayesian optimization. - -```ruby -require 'dspy/mipro' - -# Define your module to optimize -class EmailClassifier < DSPy::Module - def initialize - super - @predictor = DSPy::ChainOfThought.new(EmailClassificationSignature) - end - - def forward(input) - @predictor.forward(input) - end -end - -# Prepare training data -training_examples = [ - { - input: { email_subject: "Can't log in", email_body: "Password reset not working" }, - expected_output: { category: 'Technical', priority: 'High' } - }, - { - input: { email_subject: "Billing question", email_body: "How much does premium cost?" }, - expected_output: { category: 'Billing', priority: 'Medium' } - }, - # Add more examples... 
-] - -# Define evaluation metric -def accuracy_metric(example, prediction) - (example[:expected_output][:category] == prediction[:category]) ? 1.0 : 0.0 -end - -# Run optimization -optimizer = DSPy::MIPROv2.new( - metric: method(:accuracy_metric), - num_candidates: 10, - num_threads: 4 +result = optimizer.compile( + program, + trainset: train_examples, + valset: val_examples ) -optimized_module = optimizer.compile( - EmailClassifier.new, - trainset: training_examples -) - -# Use optimized module -result = optimized_module.forward( - email_subject: "New email", - email_body: "New email content" -) +optimized_program = result.optimized_program +puts "Best score: #{result.best_score_value}" ``` -### Bootstrap Few-Shot Learning +The `result` object exposes: +- `optimized_program` -- ready-to-use predictor with updated instruction and demos. +- `optimization_trace[:trial_logs]` -- per-trial record of instructions, demos, and scores. +- `metadata[:optimizer]` -- `"MIPROv2"`, useful when persisting experiments from multiple optimizers. -Automatically generate few-shot examples from your training data: +### Multi-stage programs + +MIPROv2 generates dataset summaries for each predictor and proposes per-stage instructions. For a ReAct agent with `thought_generator` and `observation_processor` predictors, the optimizer handles credit assignment internally. The metric only needs to evaluate the final output. + +### Bootstrap sampling + +During the bootstrap phase MIPROv2: +1. Generates dataset summaries from the training set. +2. Bootstraps few-shot demonstrations by running the baseline program. +3. Proposes candidate instructions grounded in the summaries and bootstrapped examples. +4. Evaluates each candidate on mini-batches drawn from the validation set. + +Control the bootstrap phase with `bootstrap_sets`, `max_bootstrapped_examples`, and `max_labeled_examples`. 
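The demo-selection idea in step 2 can be sketched in plain Ruby. This is a teaching illustration under stated assumptions, not MIPROv2's actual internals: `baseline` and `metric` are hypothetical callables, and the example shape is simplified:

```ruby
# Illustrative sketch of bootstrapping few-shot demonstrations: run the
# baseline program over the training set and keep only the traces the
# metric accepts, up to a cap. Not DSPy.rb source code.
def bootstrap_demos(train_set, baseline:, metric:, max_demos: 4)
  demos = []
  train_set.each do |example|
    break if demos.size >= max_demos

    prediction = baseline.call(example[:input])
    # Keep only demonstrations the metric scores as correct -- a wrong
    # label or a baseline failure never becomes a few-shot example.
    demos << { input: example[:input], output: prediction } if metric.call(example, prediction) >= 1.0
  end
  demos
end

train = [
  { input: "2 + 2", expected: "4" },
  { input: "3 + 5", expected: "8" },
  { input: "1 + 1", expected: "3" } # mislabeled row: baseline output won't match
]

baseline = ->(input) { input.split(" + ").map(&:to_i).sum.to_s }
metric   = ->(example, prediction) { prediction == example[:expected] ? 1.0 : 0.0 }

demos = bootstrap_demos(train, baseline: baseline, metric: metric, max_demos: 2)
# demos contains the first two rows, where the baseline matched the label
```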
+ +### Bayesian optimization + +When `optimization_strategy` is `:bayesian` (or when using the `heavy` preset), MIPROv2 fits a Gaussian Process surrogate over past trial scores to select the next candidate. This replaces random search with informed exploration, reducing the number of trials needed to find high-scoring instructions. + +--- + +## GEPA + +GEPA (Genetic-Pareto Reflective Prompt Evolution) is a feedback-driven optimizer. It runs the program on a small batch, collects scores and textual feedback, and asks a reflection LM to rewrite the instruction. Improved candidates are retained on a Pareto frontier. + +### Installation ```ruby -require 'dspy/teleprompt' - -# Create a teleprompter for few-shot optimization -teleprompter = DSPy::BootstrapFewShot.new( - metric: method(:accuracy_metric), - max_bootstrapped_demos: 5, - max_labeled_demos: 3 -) - -# Compile the optimized module -optimized = teleprompter.compile( - MyModule.new, - trainset: training_examples -) +# Gemfile +gem "dspy" +gem "dspy-gepa" ``` -### Custom Optimization Metrics +The `dspy-gepa` gem depends on the `gepa` core optimizer gem automatically. -Define custom metrics for your specific use case: +### Metric contract + +GEPA metrics return `DSPy::Prediction` with both a numeric score and a feedback string. Do not return a plain boolean. 
```ruby -def custom_metric(example, prediction) - score = 0.0 +metric = lambda do |example, prediction| + expected = example.expected_values[:label] + predicted = prediction.label - # Category accuracy (60% weight) - score += 0.6 if example[:expected_output][:category] == prediction[:category] - - # Priority accuracy (40% weight) - score += 0.4 if example[:expected_output][:priority] == prediction[:priority] - - score -end - -# Use in optimization -optimizer = DSPy::MIPROv2.new( - metric: method(:custom_metric), - num_candidates: 10 -) -``` - -### A/B Testing Different Approaches - -Compare different module implementations: - -```ruby -# Approach A: ChainOfThought -class ApproachA < DSPy::Module - def initialize - super - @predictor = DSPy::ChainOfThought.new(EmailClassificationSignature) - end - - def forward(input) - @predictor.forward(input) - end -end - -# Approach B: ReAct with tools -class ApproachB < DSPy::Module - def initialize - super - @predictor = DSPy::ReAct.new( - EmailClassificationSignature, - tools: [KnowledgeBaseTool.new] - ) - end - - def forward(input) - @predictor.forward(input) - end -end - -# Evaluate both approaches -def evaluate_approach(approach_class, test_set) - approach = approach_class.new - scores = test_set.map do |example| - prediction = approach.forward(example[:input]) - accuracy_metric(example, prediction) - end - scores.sum / scores.size -end - -approach_a_score = evaluate_approach(ApproachA, test_examples) -approach_b_score = evaluate_approach(ApproachB, test_examples) - -puts "Approach A accuracy: #{approach_a_score}" -puts "Approach B accuracy: #{approach_b_score}" -``` - -## Observability - -Track your LLM application's performance, token usage, and behavior in production. 
- -### OpenTelemetry Integration - -DSPy.rb automatically integrates with OpenTelemetry when configured: - -```ruby -require 'opentelemetry/sdk' -require 'dspy' - -# Configure OpenTelemetry -OpenTelemetry::SDK.configure do |c| - c.service_name = 'my-dspy-app' - c.use_all # Use all available instrumentation -end - -# DSPy automatically creates traces for predictions -predictor = DSPy::Predict.new(MySignature) -result = predictor.forward(input: 'data') -# Traces are automatically sent to your OpenTelemetry collector -``` - -### Langfuse Integration - -Track detailed LLM execution traces with Langfuse: - -```ruby -require 'dspy/langfuse' - -# Configure Langfuse -DSPy.configure do |c| - c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']) - c.langfuse = { - public_key: ENV['LANGFUSE_PUBLIC_KEY'], - secret_key: ENV['LANGFUSE_SECRET_KEY'], - host: ENV['LANGFUSE_HOST'] || 'https://cloud.langfuse.com' - } -end - -# All predictions are automatically traced -predictor = DSPy::Predict.new(MySignature) -result = predictor.forward(input: 'data') -# View detailed traces in Langfuse dashboard -``` - -### Manual Token Tracking - -Track token usage without external services: - -```ruby -class TokenTracker - def initialize - @total_tokens = 0 - @request_count = 0 - end - - def track_prediction(predictor, input) - start_time = Time.now - result = predictor.forward(input) - duration = Time.now - start_time - - # Get token usage from response metadata - tokens = result.metadata[:usage][:total_tokens] rescue 0 - @total_tokens += tokens - @request_count += 1 - - puts "Request ##{@request_count}: #{tokens} tokens in #{duration}s" - puts "Total tokens used: #{@total_tokens}" - - result - end -end - -# Usage -tracker = TokenTracker.new -predictor = DSPy::Predict.new(MySignature) - -result = tracker.track_prediction(predictor, { input: 'data' }) -``` - -### Custom Logging - -Add detailed logging to your modules: - -```ruby -class EmailClassifier < DSPy::Module - def 
initialize - super - @predictor = DSPy::ChainOfThought.new(EmailClassificationSignature) - @logger = Logger.new(STDOUT) - end - - def forward(input) - @logger.info "Classifying email: #{input[:email_subject]}" - - start_time = Time.now - result = @predictor.forward(input) - duration = Time.now - start_time - - @logger.info "Classification: #{result[:category]} (#{duration}s)" - - if result[:reasoning] - @logger.debug "Reasoning: #{result[:reasoning]}" - end - - result - rescue => e - @logger.error "Classification failed: #{e.message}" - raise - end -end -``` - -### Performance Monitoring - -Monitor latency and performance metrics: - -```ruby -class PerformanceMonitor - def initialize - @metrics = { - total_requests: 0, - total_duration: 0.0, - errors: 0, - success_count: 0 - } - end - - def monitor_request - start_time = Time.now - @metrics[:total_requests] += 1 - - begin - result = yield - @metrics[:success_count] += 1 - result - rescue => e - @metrics[:errors] += 1 - raise - ensure - duration = Time.now - start_time - @metrics[:total_duration] += duration - - if @metrics[:total_requests] % 10 == 0 - print_stats - end - end - end - - def print_stats - avg_duration = @metrics[:total_duration] / @metrics[:total_requests] - success_rate = @metrics[:success_count].to_f / @metrics[:total_requests] - - puts "\n=== Performance Stats ===" - puts "Total requests: #{@metrics[:total_requests]}" - puts "Average duration: #{avg_duration.round(3)}s" - puts "Success rate: #{(success_rate * 100).round(2)}%" - puts "Errors: #{@metrics[:errors]}" - puts "========================\n" - end -end - -# Usage -monitor = PerformanceMonitor.new -predictor = DSPy::Predict.new(MySignature) - -result = monitor.monitor_request do - predictor.forward(input: 'data') -end -``` - -### Error Rate Tracking - -Monitor and alert on error rates: - -```ruby -class ErrorRateMonitor - def initialize(alert_threshold: 0.1) - @alert_threshold = alert_threshold - @recent_results = [] - @window_size = 100 - 
end - - def track_result(success:) - @recent_results << success - @recent_results.shift if @recent_results.size > @window_size - - error_rate = calculate_error_rate - alert_if_needed(error_rate) - - error_rate - end - - private - - def calculate_error_rate - failures = @recent_results.count(false) - failures.to_f / @recent_results.size - end - - def alert_if_needed(error_rate) - if error_rate > @alert_threshold - puts "⚠️ ALERT: Error rate #{(error_rate * 100).round(2)}% exceeds threshold!" - # Send notification, page oncall, etc. - end - end -end -``` - -## Best Practices - -### 1. Start with Tests - -Write tests before optimizing: - -```ruby -# Define test cases first -test_cases = [ - { input: {...}, expected: {...} }, - # More test cases... -] - -# Ensure baseline functionality -test_cases.each do |tc| - result = module.forward(tc[:input]) - assert result[:category] == tc[:expected][:category] -end - -# Then optimize -optimized = optimizer.compile(module, trainset: test_cases) -``` - -### 2. Use Meaningful Metrics - -Define metrics that align with business goals: - -```ruby -def business_aligned_metric(example, prediction) - # High-priority errors are more costly - if example[:expected_output][:priority] == 'High' - return prediction[:priority] == 'High' ? 1.0 : 0.0 + score = predicted == expected ? 1.0 : 0.0 + feedback = if score == 1.0 + "Correct (#{expected}) for: \"#{example.input_values[:text][0..60]}\"" else - return prediction[:category] == example[:expected_output][:category] ? 0.8 : 0.0 + "Misclassified (expected #{expected}, got #{predicted}) for: \"#{example.input_values[:text][0..60]}\"" end + + DSPy::Prediction.new(score: score, feedback: feedback) end ``` -### 3. Monitor in Production +Keep the score in `[0, 1]`. Always include a short feedback message explaining what happened -- GEPA hands this text to the reflection model so it can reason about failures. 
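To see why the feedback string matters, here is a hedged plain-Ruby sketch of how minibatch feedback might be assembled for the reflection model. The prompt structure and helper name are assumptions for illustration, not the gem's actual reflection format:

```ruby
# Illustrative sketch: collect per-example scores and feedback from a
# minibatch and build a reflection prompt. Not GEPA's real prompt.
def build_reflection_prompt(current_instruction, results)
  failures = results.reject { |r| r[:score] >= 1.0 }
  <<~PROMPT
    Current instruction:
    #{current_instruction}

    Feedback from #{results.size} examples (#{failures.size} failed):
    #{results.map { |r| "- [#{r[:score]}] #{r[:feedback]}" }.join("\n")}

    Propose an improved instruction inside triple backticks.
  PROMPT
end

results = [
  { score: 1.0, feedback: "Correct (positive)" },
  { score: 0.0, feedback: "Misclassified (expected negative, got positive)" }
]

prompt = build_reflection_prompt("Classify the sentiment.", results)
```

A metric that returned only a boolean would leave the `feedback` slots empty, giving the reflection model nothing concrete to reason about.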
-Always track production performance: +### Feedback maps + +`feedback_map` targets individual predictors inside a composite module. Each entry receives keyword arguments and returns a `DSPy::Prediction`: ```ruby -class ProductionModule < DSPy::Module - def initialize - super - @predictor = DSPy::ChainOfThought.new(MySignature) - @monitor = PerformanceMonitor.new - @error_tracker = ErrorRateMonitor.new - end +feedback_map = { + 'self' => lambda do |predictor_output:, predictor_inputs:, module_inputs:, module_outputs:, captured_trace:| + expected = module_inputs.expected_values[:label] + predicted = predictor_output.label - def forward(input) - @monitor.monitor_request do - result = @predictor.forward(input) - @error_tracker.track_result(success: true) - result - rescue => e - @error_tracker.track_result(success: false) - raise - end + DSPy::Prediction.new( + score: predicted == expected ? 1.0 : 0.0, + feedback: "Classifier saw \"#{predictor_inputs[:text][0..80]}\" -> #{predicted} (expected #{expected})" + ) end -end +} ``` -### 4. Version Your Modules +For single-predictor programs, key the map with `'self'`. For multi-predictor chains, add entries per component so the reflection LM sees localized context at each step. Omit `feedback_map` entirely if the top-level metric already covers the basics. 
-Track which version of your module is deployed: +### Configuring the teleprompter ```ruby -class EmailClassifierV2 < DSPy::Module - VERSION = '2.1.0' +teleprompter = DSPy::Teleprompt::GEPA.new( + metric: metric, + reflection_lm: DSPy::ReflectionLM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']), + feedback_map: feedback_map, + config: { + max_metric_calls: 600, + minibatch_size: 6, + skip_perfect_score: false + } +) +``` - def initialize - super - @predictor = DSPy::ChainOfThought.new(EmailClassificationSignature) - end +Key configuration knobs: - def forward(input) - result = @predictor.forward(input) - result.merge(model_version: VERSION) - end +| Knob | Purpose | +|----------------------|-------------------------------------------------------------------------------------------| +| `max_metric_calls` | Hard budget on evaluation calls. Set to at least the validation set size plus a few minibatches. | +| `minibatch_size` | Examples per reflective replay batch. Smaller = cheaper iterations, noisier scores. | +| `skip_perfect_score` | Set `true` to stop early when a candidate reaches score `1.0`. | + +### Minibatch sizing + +| Goal | Suggested size | Rationale | +|-------------------------------------------------|----------------|------------------------------------------------------------| +| Explore many candidates within a tight budget | 3--6 | Cheap iterations, more prompt variants, noisier metrics. | +| Stable metrics when each rollout is costly | 8--12 | Smoother scores, fewer candidates unless budget is raised. | +| Investigate specific failure modes | 3--4 then 8+ | Start with breadth, increase once patterns emerge. 
| + +### Compile and evaluate + +```ruby +program = DSPy::Predict.new(MySignature) + +result = teleprompter.compile(program, trainset: train, valset: val) +optimized_program = result.optimized_program + +test_metrics = evaluate(optimized_program, test) +``` + +The `result` object exposes: +- `optimized_program` -- predictor with updated instruction and few-shot examples. +- `best_score_value` -- validation score for the best candidate. +- `metadata` -- candidate counts, trace hashes, and telemetry IDs. + +### Reflection LM + +Swap `DSPy::ReflectionLM` for any callable object that accepts the reflection prompt hash and returns a string. The default reflection signature extracts the new instruction from triple backticks in the response. + +### Experiment tracking + +Plug `GEPA::Logging::ExperimentTracker` into a persistence layer: + +```ruby +tracker = GEPA::Logging::ExperimentTracker.new +tracker.with_subscriber { |event| MyModel.create!(payload: event) } + +teleprompter = DSPy::Teleprompt::GEPA.new( + metric: metric, + reflection_lm: reflection_lm, + experiment_tracker: tracker, + config: { max_metric_calls: 900 } +) +``` + +The tracker emits Pareto update events, merge decisions, and candidate evolution records as JSONL. + +### Pareto frontier + +GEPA maintains a diverse candidate pool and samples from the Pareto frontier instead of mutating only the top-scoring program. This balances exploration and prevents the search from collapsing onto a single lineage. + +Enable the merge proposer after multiple strong lineages emerge: + +```ruby +config: { + max_metric_calls: 900, + enable_merge_proposer: true +} +``` + +Premature merges eat budget without meaningful gains. Gate merge on having several validated candidates first. + +### Advanced options + +- `acceptance_strategy:` -- plug in bespoke Pareto filters or early-stop heuristics. +- Telemetry spans emit via `GEPA::Telemetry`. 
Enable global observability with `DSPy.configure { |c| c.observability = true }` to stream spans to an OpenTelemetry exporter. + +--- + +## Evaluation Framework + +`DSPy::Evals` provides batch evaluation of predictors against test datasets with built-in and custom metrics. + +### Basic usage + +```ruby +metric = proc do |example, prediction| + prediction.answer == example.expected_values[:answer] +end + +evaluator = DSPy::Evals.new(predictor, metric: metric) + +result = evaluator.evaluate( + test_examples, + display_table: true, + display_progress: true +) + +puts "Pass rate: #{(result.pass_rate * 100).round(1)}%" +puts "Passed: #{result.passed_examples}/#{result.total_examples}" +``` + +### DSPy::Example + +Convert raw data into `DSPy::Example` instances before passing to optimizers or evaluators. Each example carries `input_values` and `expected_values`: + +```ruby +examples = rows.map do |row| + DSPy::Example.new( + input_values: { text: row[:text] }, + expected_values: { label: row[:label] } + ) +end + +train, val, test = split_examples(examples, train_ratio: 0.6, val_ratio: 0.2, seed: 42) +``` + +Hold back a test set from the optimization loop. Optimizers work on train/val; only the test set proves generalization. 
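`split_examples` above is a helper you write yourself, not a DSPy.rb API. A minimal deterministic implementation might look like this (assumes plain arrays; the seed makes the shuffle reproducible across runs):

```ruby
# Hypothetical helper for the snippet above -- not part of DSPy.rb.
# Shuffles deterministically, then slices into train/val/test partitions.
def split_examples(examples, train_ratio:, val_ratio:, seed:)
  shuffled = examples.shuffle(random: Random.new(seed))
  n_train  = (shuffled.size * train_ratio).floor
  n_val    = (shuffled.size * val_ratio).floor

  train = shuffled[0...n_train]
  val   = shuffled[n_train...(n_train + n_val)]
  test  = shuffled[(n_train + n_val)..] || [] # remainder becomes the held-out test set
  [train, val, test]
end

examples = (1..10).to_a
train, val, test = split_examples(examples, train_ratio: 0.6, val_ratio: 0.2, seed: 42)
# sizes: 6 / 2 / 2; the same seed always yields the same split
```

Fixing the seed matters: if the split changes between optimization runs, score differences may reflect the shuffle rather than the optimizer.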
+ +### Built-in metrics + +```ruby +# Exact match -- prediction must exactly equal expected value +metric = DSPy::Metrics.exact_match(field: :answer, case_sensitive: true) + +# Contains -- prediction must contain expected substring +metric = DSPy::Metrics.contains(field: :answer, case_sensitive: false) + +# Numeric difference -- numeric output within tolerance +metric = DSPy::Metrics.numeric_difference(field: :answer, tolerance: 0.01) + +# Composite AND -- all sub-metrics must pass +metric = DSPy::Metrics.composite_and( + DSPy::Metrics.exact_match(field: :answer), + DSPy::Metrics.contains(field: :reasoning) +) +``` + +### Custom metrics + +```ruby +quality_metric = lambda do |example, prediction| + return false unless prediction + + score = 0.0 + score += 0.5 if prediction.answer == example.expected_values[:answer] + score += 0.3 if prediction.explanation && prediction.explanation.length > 50 + score += 0.2 if prediction.confidence && prediction.confidence > 0.8 + score >= 0.7 +end + +evaluator = DSPy::Evals.new(predictor, metric: quality_metric) +``` + +Access prediction fields with dot notation (`prediction.answer`), not hash notation. + +### Observability hooks + +Register callbacks without editing the evaluator: + +```ruby +DSPy::Evals.before_example do |payload| + example = payload[:example] + DSPy.logger.info("Evaluating example #{example.id}") if example.respond_to?(:id) +end + +DSPy::Evals.after_batch do |payload| + result = payload[:result] + Langfuse.event( + name: 'eval.batch', + metadata: { + total: result.total_examples, + passed: result.passed_examples, + score: result.score + } + ) end ``` + +Available hooks: `before_example`, `after_example`, `before_batch`, `after_batch`. 
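The hooks follow the familiar callback-registry pattern. A self-contained plain-Ruby sketch of that pattern (illustrative only; `HookedEvaluator` is invented here and is not `DSPy::Evals` internals):

```ruby
# Illustrative callback registry: class-level methods collect blocks,
# and the evaluate loop fires them with payload hashes at each stage.
class HookedEvaluator
  @hooks = Hash.new { |h, k| h[k] = [] }

  class << self
    attr_reader :hooks

    %i[before_example after_example before_batch after_batch].each do |name|
      define_method(name) { |&block| hooks[name] << block }
    end
  end

  def run_hooks(name, payload)
    self.class.hooks[name].each { |hook| hook.call(payload) }
  end

  def evaluate(examples)
    run_hooks(:before_batch, { examples: examples })
    results = examples.map do |example|
      run_hooks(:before_example, { example: example })
      result = yield(example)
      run_hooks(:after_example, { example: example, result: result })
      result
    end
    run_hooks(:after_batch, { result: results })
    results
  end
end

seen = []
HookedEvaluator.before_example { |payload| seen << payload[:example] }

doubled = HookedEvaluator.new.evaluate([1, 2, 3]) { |ex| ex * 2 }
# seen == [1, 2, 3]; doubled == [2, 4, 6]
```

Because the registry lives on the class, hooks registered in an initializer apply to every evaluator instance -- which is exactly what makes them useful for cross-cutting logging and telemetry.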
+ +### Langfuse score export + +Enable `export_scores: true` to emit `score.create` events for each evaluated example and a batch score at the end: + +```ruby +evaluator = DSPy::Evals.new( + predictor, + metric: metric, + export_scores: true, + score_name: 'qa_accuracy' # default: 'evaluation' +) + +result = evaluator.evaluate(test_examples) +# Emits per-example scores + overall batch score via DSPy::Scores::Exporter +``` + +Scores attach to the current trace context automatically and flow to Langfuse asynchronously. + +### Evaluation results + +```ruby +result = evaluator.evaluate(test_examples) + +result.score # Overall score (0.0 to 1.0) +result.passed_count # Examples that passed +result.failed_count # Examples that failed +result.error_count # Examples that errored + +result.results.each do |r| + r.passed # Boolean + r.score # Numeric score + r.error # Error message if the example errored +end +``` + +### Integration with optimizers + +```ruby +metric = proc do |example, prediction| + expected = example.expected_values[:answer].to_s.strip.downcase + predicted = prediction.answer.to_s.strip.downcase + !expected.empty? && predicted.include?(expected) +end + +optimizer = DSPy::Teleprompt::MIPROv2::AutoMode.medium(metric: metric) + +result = optimizer.compile( + DSPy::Predict.new(QASignature), + trainset: train_examples, + valset: val_examples +) + +evaluator = DSPy::Evals.new(result.optimized_program, metric: metric) +test_result = evaluator.evaluate(test_examples, display_table: true) +puts "Test accuracy: #{(test_result.pass_rate * 100).round(2)}%" +``` + +--- + +## Storage System + +`DSPy::Storage` persists optimization results, tracks history, and manages multiple versions of optimized programs. 
+ +### ProgramStorage (low-level) + +```ruby +storage = DSPy::Storage::ProgramStorage.new(storage_path: "./dspy_storage") + +# Save +saved = storage.save_program( + result.optimized_program, + result, + metadata: { + signature_class: 'ClassifyText', + optimizer: 'MIPROv2', + examples_count: examples.size + } +) +puts "Stored with ID: #{saved.program_id}" + +# Load +saved = storage.load_program(program_id) +predictor = saved.program +score = saved.optimization_result[:best_score_value] + +# List +storage.list_programs.each do |p| + puts "#{p[:program_id]} -- score: #{p[:best_score]} -- saved: #{p[:saved_at]}" +end +``` + +### StorageManager (recommended) + +```ruby +manager = DSPy::Storage::StorageManager.new + +# Save with tags +saved = manager.save_optimization_result( + result, + tags: ['production', 'sentiment-analysis'], + description: 'Optimized sentiment classifier v2' +) + +# Find programs +programs = manager.find_programs( + optimizer: 'MIPROv2', + min_score: 0.85, + tags: ['production'] +) + +recent = manager.find_programs( + max_age_days: 7, + signature_class: 'ClassifyText' +) + +# Get best program for a signature +best = manager.get_best_program('ClassifyText') +predictor = best.program +``` + +Global shorthand: + +```ruby +DSPy::Storage::StorageManager.save(result, metadata: { version: '2.0' }) +DSPy::Storage::StorageManager.load(program_id) +DSPy::Storage::StorageManager.best('ClassifyText') +``` + +### Checkpoints + +Create and restore checkpoints during long-running optimizations: + +```ruby +# Save a checkpoint +manager.create_checkpoint( + current_result, + 'iteration_50', + metadata: { iteration: 50, current_score: 0.87 } +) + +# Restore +restored = manager.restore_checkpoint('iteration_50') +program = restored.program + +# Auto-checkpoint every N iterations +if iteration % 10 == 0 + manager.create_checkpoint(current_result, "auto_checkpoint_#{iteration}") +end +``` + +### Import and export + +Share programs between environments: + +```ruby 
+storage = DSPy::Storage::ProgramStorage.new + +# Export +storage.export_programs(['abc123', 'def456'], './export_backup.json') + +# Import +imported = storage.import_programs('./export_backup.json') +puts "Imported #{imported.size} programs" +``` + +### Optimization history + +```ruby +history = manager.get_optimization_history + +history[:summary][:total_programs] +history[:summary][:avg_score] + +history[:optimizer_stats].each do |optimizer, stats| + puts "#{optimizer}: #{stats[:count]} programs, best: #{stats[:best_score]}" +end + +history[:trends][:improvement_percentage] +``` + +### Program comparison + +```ruby +comparison = manager.compare_programs(id_a, id_b) +comparison[:comparison][:score_difference] +comparison[:comparison][:better_program] +comparison[:comparison][:age_difference_hours] +``` + +### Storage configuration + +```ruby +config = DSPy::Storage::StorageManager::StorageConfig.new +config.storage_path = Rails.root.join('dspy_storage') +config.auto_save = true +config.save_intermediate_results = false +config.max_stored_programs = 100 + +manager = DSPy::Storage::StorageManager.new(config: config) +``` + +### Cleanup + +Remove old programs. Cleanup retains the best performing and most recent programs using a weighted score (70% performance, 30% recency): + +```ruby +deleted_count = manager.cleanup_old_programs +``` + +### Storage events + +The storage system emits structured log events for monitoring: +- `dspy.storage.save_start`, `dspy.storage.save_complete`, `dspy.storage.save_error` +- `dspy.storage.load_start`, `dspy.storage.load_complete`, `dspy.storage.load_error` +- `dspy.storage.delete`, `dspy.storage.export`, `dspy.storage.import`, `dspy.storage.cleanup` + +### File layout + +``` +dspy_storage/ + programs/ + abc123def456.json + 789xyz012345.json + history.json +``` + +--- + +## API rules + +- Call predictors with `.call()`, not `.forward()`. 
+- Access prediction fields with dot notation (`result.answer`), not hash notation (`result[:answer]`). +- GEPA metrics return `DSPy::Prediction.new(score:, feedback:)`, not a boolean. +- MIPROv2 metrics may return `true`/`false`, a numeric score, or `DSPy::Prediction`. diff --git a/plugins/compound-engineering/skills/dspy-ruby/references/providers.md b/plugins/compound-engineering/skills/dspy-ruby/references/providers.md index 5dd56f3..31bf1a1 100644 --- a/plugins/compound-engineering/skills/dspy-ruby/references/providers.md +++ b/plugins/compound-engineering/skills/dspy-ruby/references/providers.md @@ -1,338 +1,418 @@ # DSPy.rb LLM Providers -## Supported Providers +## Adapter Architecture -DSPy.rb provides unified support across multiple LLM providers through adapter gems that automatically load when installed. - -### Provider Overview - -- **OpenAI**: GPT-4, GPT-4o, GPT-4o-mini, GPT-3.5-turbo -- **Anthropic**: Claude 3 family (Sonnet, Opus, Haiku), Claude 3.5 Sonnet -- **Google Gemini**: Gemini 1.5 Pro, Gemini 1.5 Flash, other versions -- **Ollama**: Local model support via OpenAI compatibility layer -- **OpenRouter**: Unified multi-provider API for 200+ models - -## Configuration - -### Basic Setup +DSPy.rb ships provider SDKs as separate adapter gems. Install only the adapters the project needs. Each adapter gem depends on the official SDK for its provider and auto-loads when present -- no explicit `require` necessary. 
```ruby -require 'dspy' - -DSPy.configure do |c| - c.lm = DSPy::LM.new('provider/model-name', api_key: ENV['API_KEY']) -end +# Gemfile +gem 'dspy' # core framework (no provider SDKs) +gem 'dspy-openai' # OpenAI, OpenRouter, Ollama +gem 'dspy-anthropic' # Claude +gem 'dspy-gemini' # Gemini +gem 'dspy-ruby_llm' # RubyLLM unified adapter (12+ providers) ``` -### OpenAI Configuration +--- -**Required gem**: `dspy-openai` +## Per-Provider Adapters + +### dspy-openai + +Covers any endpoint that speaks the OpenAI chat-completions protocol: OpenAI itself, OpenRouter, and Ollama. + +**SDK dependency:** `openai ~> 0.17` ```ruby -DSPy.configure do |c| - # GPT-4o Mini (recommended for development) - c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']) +# OpenAI +lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']) - # GPT-4o (more capable) - c.lm = DSPy::LM.new('openai/gpt-4o', api_key: ENV['OPENAI_API_KEY']) +# OpenRouter -- access 200+ models behind a single key +lm = DSPy::LM.new('openrouter/x-ai/grok-4-fast:free', + api_key: ENV['OPENROUTER_API_KEY'] +) - # GPT-4 Turbo - c.lm = DSPy::LM.new('openai/gpt-4-turbo', api_key: ENV['OPENAI_API_KEY']) -end -``` +# Ollama -- local models, no API key required +lm = DSPy::LM.new('ollama/llama3.2') -**Environment variable**: `OPENAI_API_KEY` - -### Anthropic Configuration - -**Required gem**: `dspy-anthropic` - -```ruby -DSPy.configure do |c| - # Claude 3.5 Sonnet (latest, most capable) - c.lm = DSPy::LM.new('anthropic/claude-3-5-sonnet-20241022', - api_key: ENV['ANTHROPIC_API_KEY']) - - # Claude 3 Opus (most capable in Claude 3 family) - c.lm = DSPy::LM.new('anthropic/claude-3-opus-20240229', - api_key: ENV['ANTHROPIC_API_KEY']) - - # Claude 3 Sonnet (balanced) - c.lm = DSPy::LM.new('anthropic/claude-3-sonnet-20240229', - api_key: ENV['ANTHROPIC_API_KEY']) - - # Claude 3 Haiku (fast, cost-effective) - c.lm = DSPy::LM.new('anthropic/claude-3-haiku-20240307', - api_key: ENV['ANTHROPIC_API_KEY']) 
-end -``` - -**Environment variable**: `ANTHROPIC_API_KEY` - -### Google Gemini Configuration - -**Required gem**: `dspy-gemini` - -```ruby -DSPy.configure do |c| - # Gemini 1.5 Pro (most capable) - c.lm = DSPy::LM.new('gemini/gemini-1.5-pro', - api_key: ENV['GOOGLE_API_KEY']) - - # Gemini 1.5 Flash (faster, cost-effective) - c.lm = DSPy::LM.new('gemini/gemini-1.5-flash', - api_key: ENV['GOOGLE_API_KEY']) -end -``` - -**Environment variable**: `GOOGLE_API_KEY` or `GEMINI_API_KEY` - -### Ollama Configuration - -**Required gem**: None (uses OpenAI compatibility layer) - -```ruby -DSPy.configure do |c| - # Local Ollama instance - c.lm = DSPy::LM.new('ollama/llama3.1', - base_url: 'http://localhost:11434') - - # Other Ollama models - c.lm = DSPy::LM.new('ollama/mistral') - c.lm = DSPy::LM.new('ollama/codellama') -end -``` - -**Note**: Ensure Ollama is running locally: `ollama serve` - -### OpenRouter Configuration - -**Required gem**: `dspy-openai` (uses OpenAI adapter) - -```ruby -DSPy.configure do |c| - # Access 200+ models through OpenRouter - c.lm = DSPy::LM.new('openrouter/anthropic/claude-3.5-sonnet', - api_key: ENV['OPENROUTER_API_KEY'], - base_url: 'https://openrouter.ai/api/v1') - - # Other examples - c.lm = DSPy::LM.new('openrouter/google/gemini-pro') - c.lm = DSPy::LM.new('openrouter/meta-llama/llama-3.1-70b-instruct') -end -``` - -**Environment variable**: `OPENROUTER_API_KEY` - -## Provider Compatibility Matrix - -### Feature Support - -| Feature | OpenAI | Anthropic | Gemini | Ollama | -|---------|--------|-----------|--------|--------| -| Structured Output | ✅ | ✅ | ✅ | ✅ | -| Vision (Images) | ✅ | ✅ | ✅ | ⚠️ Limited | -| Image URLs | ✅ | ❌ | ❌ | ❌ | -| Tool Calling | ✅ | ✅ | ✅ | Varies | -| Streaming | ❌ | ❌ | ❌ | ❌ | -| Function Calling | ✅ | ✅ | ✅ | Varies | - -**Legend**: ✅ Full support | ⚠️ Partial support | ❌ Not supported - -### Vision Capabilities - -**Image URLs**: Only OpenAI supports direct URL references. 
For other providers, load images as base64 or from files. - -```ruby -# OpenAI - supports URLs -DSPy::Image.from_url("https://example.com/image.jpg") - -# Anthropic, Gemini - use file or base64 -DSPy::Image.from_file("path/to/image.jpg") -DSPy::Image.from_base64(base64_data, mime_type: "image/jpeg") -``` - -**Ollama**: Limited multimodal functionality. Check specific model capabilities. - -## Advanced Configuration - -### Custom Parameters - -Pass provider-specific parameters during configuration: - -```ruby -DSPy.configure do |c| - c.lm = DSPy::LM.new('openai/gpt-4o', - api_key: ENV['OPENAI_API_KEY'], - temperature: 0.7, - max_tokens: 2000, - top_p: 0.9 - ) -end -``` - -### Multiple Providers - -Use different models for different tasks: - -```ruby -# Fast model for simple tasks -fast_lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']) - -# Powerful model for complex tasks -powerful_lm = DSPy::LM.new('anthropic/claude-3-5-sonnet-20241022', - api_key: ENV['ANTHROPIC_API_KEY']) - -# Use different models in different modules -class SimpleClassifier < DSPy::Module - def initialize - super - DSPy.configure { |c| c.lm = fast_lm } - @predictor = DSPy::Predict.new(SimpleSignature) - end -end - -class ComplexAnalyzer < DSPy::Module - def initialize - super - DSPy.configure { |c| c.lm = powerful_lm } - @predictor = DSPy::ChainOfThought.new(ComplexSignature) - end -end -``` - -### Per-Request Configuration - -Override configuration for specific predictions: - -```ruby -predictor = DSPy::Predict.new(MySignature) - -# Use default configuration -result1 = predictor.forward(input: "data") - -# Override temperature for this request -result2 = predictor.forward( - input: "data", - config: { temperature: 0.2 } # More deterministic +# Remote Ollama instance +lm = DSPy::LM.new('ollama/llama3.2', + base_url: 'https://my-ollama.example.com/v1', + api_key: 'optional-auth-token' ) ``` -## Cost Optimization +All three sub-adapters share the same request handling, 
structured-output support, and error reporting. Swap providers without changing higher-level DSPy code. -### Model Selection Strategy - -1. **Development**: Use cheaper, faster models (gpt-4o-mini, claude-3-haiku, gemini-1.5-flash) -2. **Production Simple Tasks**: Continue with cheaper models if quality is sufficient -3. **Production Complex Tasks**: Upgrade to more capable models (gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) -4. **Local Development**: Use Ollama for privacy and zero API costs - -### Example Cost-Conscious Setup +For OpenRouter models that lack native structured-output support, disable it explicitly: ```ruby -# Development environment -if Rails.env.development? - DSPy.configure do |c| - c.lm = DSPy::LM.new('ollama/llama3.1') # Free, local - end -elsif Rails.env.test? - DSPy.configure do |c| - c.lm = DSPy::LM.new('openai/gpt-4o-mini', # Cheap for testing - api_key: ENV['OPENAI_API_KEY']) - end -else # production - DSPy.configure do |c| - c.lm = DSPy::LM.new('anthropic/claude-3-5-sonnet-20241022', - api_key: ENV['ANTHROPIC_API_KEY']) - end +lm = DSPy::LM.new('openrouter/deepseek/deepseek-chat-v3.1:free', + api_key: ENV['OPENROUTER_API_KEY'], + structured_outputs: false +) +``` + +### dspy-anthropic + +Provides the Claude adapter. Install it for any `anthropic/*` model id. + +**SDK dependency:** `anthropic ~> 1.12` + +```ruby +lm = DSPy::LM.new('anthropic/claude-sonnet-4-20250514', + api_key: ENV['ANTHROPIC_API_KEY'] +) +``` + +Structured outputs default to tool-based JSON extraction (`structured_outputs: true`). Set `structured_outputs: false` to use enhanced-prompting extraction instead. 
+ +```ruby +# Tool-based extraction (default, most reliable) +lm = DSPy::LM.new('anthropic/claude-sonnet-4-20250514', + api_key: ENV['ANTHROPIC_API_KEY'], + structured_outputs: true +) + +# Enhanced prompting extraction +lm = DSPy::LM.new('anthropic/claude-sonnet-4-20250514', + api_key: ENV['ANTHROPIC_API_KEY'], + structured_outputs: false +) +``` + +### dspy-gemini + +Provides the Gemini adapter. Install it for any `gemini/*` model id. + +**SDK dependency:** `gemini-ai ~> 4.3` + +```ruby +lm = DSPy::LM.new('gemini/gemini-2.5-flash', + api_key: ENV['GEMINI_API_KEY'] +) +``` + +**Environment variable:** `GEMINI_API_KEY` (also accepts `GOOGLE_API_KEY`). + +--- + +## RubyLLM Unified Adapter + +The `dspy-ruby_llm` gem provides a single adapter that routes to 12+ providers through [RubyLLM](https://rubyllm.com). Use it when a project talks to multiple providers or needs access to Bedrock, VertexAI, DeepSeek, or Mistral without dedicated adapter gems. + +**SDK dependency:** `ruby_llm ~> 1.3` + +### Model ID Format + +Prefix every model id with `ruby_llm/`: + +```ruby +lm = DSPy::LM.new('ruby_llm/gpt-4o-mini') +lm = DSPy::LM.new('ruby_llm/claude-sonnet-4-20250514') +lm = DSPy::LM.new('ruby_llm/gemini-2.5-flash') +``` + +The adapter detects the provider from RubyLLM's model registry automatically. For models not in the registry, pass `provider:` explicitly: + +```ruby +lm = DSPy::LM.new('ruby_llm/llama3.2', provider: 'ollama') +lm = DSPy::LM.new('ruby_llm/anthropic/claude-3-opus', + api_key: ENV['OPENROUTER_API_KEY'], + provider: 'openrouter' +) +``` + +### Using Existing RubyLLM Configuration + +When RubyLLM is already configured globally, omit the `api_key:` argument. 
DSPy reuses the global config automatically: + +```ruby +RubyLLM.configure do |config| + config.openai_api_key = ENV['OPENAI_API_KEY'] + config.anthropic_api_key = ENV['ANTHROPIC_API_KEY'] +end + +# No api_key needed -- picks up the global config +DSPy.configure do |c| + c.lm = DSPy::LM.new('ruby_llm/gpt-4o-mini') end ``` -## Provider-Specific Best Practices +When an `api_key:` (or any of `base_url:`, `timeout:`, `max_retries:`) is passed, DSPy creates a **scoped context** instead of reusing the global config. -### OpenAI +### Cloud-Hosted Providers (Bedrock, VertexAI) -- Use `gpt-4o-mini` for development and simple tasks -- Use `gpt-4o` for production complex tasks -- Best vision support including URL loading -- Excellent function calling capabilities - -### Anthropic - -- Claude 3.5 Sonnet is currently the most capable model -- Excellent for complex reasoning and analysis -- Strong safety features and helpful outputs -- Requires base64 for images (no URL support) - -### Google Gemini - -- Gemini 1.5 Pro for complex tasks, Flash for speed -- Strong multimodal capabilities -- Good balance of cost and performance -- Requires base64 for images - -### Ollama - -- Best for privacy-sensitive applications -- Zero API costs -- Requires local hardware resources -- Limited multimodal support depending on model -- Good for development and testing - -## Troubleshooting - -### API Key Issues +Configure RubyLLM globally first, then reference the model: ```ruby -# Verify API key is set -if ENV['OPENAI_API_KEY'].nil? 
- raise "OPENAI_API_KEY environment variable not set" +# AWS Bedrock +RubyLLM.configure do |c| + c.bedrock_api_key = ENV['AWS_ACCESS_KEY_ID'] + c.bedrock_secret_key = ENV['AWS_SECRET_ACCESS_KEY'] + c.bedrock_region = 'us-east-1' end +lm = DSPy::LM.new('ruby_llm/anthropic.claude-3-5-sonnet', provider: 'bedrock') -# Test connection -begin - DSPy.configure { |c| c.lm = DSPy::LM.new('openai/gpt-4o-mini', - api_key: ENV['OPENAI_API_KEY']) } - predictor = DSPy::Predict.new(TestSignature) - predictor.forward(test: "data") - puts "✅ Connection successful" -rescue => e - puts "❌ Connection failed: #{e.message}" +# Google VertexAI +RubyLLM.configure do |c| + c.vertexai_project_id = 'your-project-id' + c.vertexai_location = 'us-central1' end +lm = DSPy::LM.new('ruby_llm/gemini-pro', provider: 'vertexai') ``` -### Rate Limiting +### Supported Providers Table -Handle rate limits gracefully: +| Provider | Example Model ID | Notes | +|-------------|--------------------------------------------|---------------------------------| +| OpenAI | `ruby_llm/gpt-4o-mini` | Auto-detected from registry | +| Anthropic | `ruby_llm/claude-sonnet-4-20250514` | Auto-detected from registry | +| Gemini | `ruby_llm/gemini-2.5-flash` | Auto-detected from registry | +| DeepSeek | `ruby_llm/deepseek-chat` | Auto-detected from registry | +| Mistral | `ruby_llm/mistral-large` | Auto-detected from registry | +| Ollama | `ruby_llm/llama3.2` | Use `provider: 'ollama'` | +| AWS Bedrock | `ruby_llm/anthropic.claude-3-5-sonnet` | Configure RubyLLM globally | +| VertexAI | `ruby_llm/gemini-pro` | Configure RubyLLM globally | +| OpenRouter | `ruby_llm/anthropic/claude-3-opus` | Use `provider: 'openrouter'` | +| Perplexity | `ruby_llm/llama-3.1-sonar-large` | Use `provider: 'perplexity'` | +| GPUStack | `ruby_llm/model-name` | Use `provider: 'gpustack'` | + +--- + +## Rails Initializer Pattern + +Configure DSPy inside an `after_initialize` block so Rails credentials and environment are fully loaded: ```ruby -def 
call_with_retry(predictor, input, max_retries: 3) - retries = 0 - begin - predictor.forward(input) - rescue RateLimitError => e - retries += 1 - if retries < max_retries - sleep(2 ** retries) # Exponential backoff - retry +# config/initializers/dspy.rb +Rails.application.config.after_initialize do + next if Rails.env.test? # skip in test -- use VCR cassettes instead + + DSPy.configure do |config| + config.lm = DSPy::LM.new( + 'openai/gpt-4o-mini', + api_key: Rails.application.credentials.openai_api_key, + structured_outputs: true + ) + + config.logger = if Rails.env.production? + Dry.Logger(:dspy, formatter: :json) do |logger| + logger.add_backend(stream: Rails.root.join("log/dspy.log")) + end else - raise + Dry.Logger(:dspy) do |logger| + logger.add_backend(level: :debug, stream: $stdout) + end end end end ``` -### Model Not Found +Key points: -Ensure the correct gem is installed: +- Wrap in `after_initialize` so `Rails.application.credentials` is available. +- Skip the test environment with `next` rather than `return` (a bare `return` raises LocalJumpError inside the block). Rely on VCR cassettes for deterministic LLM responses. +- Set `structured_outputs: true` (the default) for provider-native JSON extraction. +- Use `Dry.Logger` with `:json` formatter in production for structured log parsing. + +--- + +## Fiber-Local LM Context + +`DSPy.with_lm` sets a temporary language-model override scoped to the current Fiber. Every predictor call inside the block uses the override; outside the block the previous LM takes effect again. 
+ +```ruby +fast = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']) +powerful = DSPy::LM.new('anthropic/claude-sonnet-4-20250514', api_key: ENV['ANTHROPIC_API_KEY']) + +classifier = Classifier.new + +# Uses the global LM +result = classifier.call(text: "Hello") + +# Temporarily switch to the fast model +DSPy.with_lm(fast) do + result = classifier.call(text: "Hello") # uses gpt-4o-mini +end + +# Temporarily switch to the powerful model +DSPy.with_lm(powerful) do + result = classifier.call(text: "Hello") # uses claude-sonnet-4 +end +``` + +### LM Resolution Hierarchy + +DSPy resolves the active language model in this order: + +1. **Instance-level LM** -- set directly on a module instance via `configure` +2. **Fiber-local LM** -- set via `DSPy.with_lm` +3. **Global LM** -- set via `DSPy.configure` + +Instance-level configuration always wins, even inside a `DSPy.with_lm` block: + +```ruby +classifier = Classifier.new +classifier.configure { |c| c.lm = DSPy::LM.new('anthropic/claude-sonnet-4-20250514', api_key: ENV['ANTHROPIC_API_KEY']) } + +fast = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']) + +DSPy.with_lm(fast) do + classifier.call(text: "Test") # still uses claude-sonnet-4 (instance-level wins) +end +``` + +### configure_predictor for Fine-Grained Agent Control + +Complex agents (`ReAct`, `CodeAct`, `DeepResearch`, `DeepSearch`) contain internal predictors. 
Use `configure` for a blanket override and `configure_predictor` to target a specific sub-predictor: + +```ruby +agent = DSPy::ReAct.new(MySignature, tools: tools) + +# Set a default LM for the agent and all its children +agent.configure { |c| c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']) } + +# Override just the reasoning predictor with a more capable model +agent.configure_predictor('thought_generator') do |c| + c.lm = DSPy::LM.new('anthropic/claude-sonnet-4-20250514', api_key: ENV['ANTHROPIC_API_KEY']) +end + +result = agent.call(question: "Summarize the report") +``` + +Both methods support chaining: + +```ruby +agent + .configure { |c| c.lm = cheap_model } + .configure_predictor('thought_generator') { |c| c.lm = expensive_model } +``` + +#### Available Predictors by Agent Type + +| Agent | Internal Predictors | +|----------------------|------------------------------------------------------------------| +| `DSPy::ReAct` | `thought_generator`, `observation_processor` | +| `DSPy::CodeAct` | `code_generator`, `observation_processor` | +| `DSPy::DeepResearch` | `planner`, `synthesizer`, `qa_reviewer`, `reporter` | +| `DSPy::DeepSearch` | `seed_predictor`, `search_predictor`, `reader_predictor`, `reason_predictor` | + +#### Propagation Rules + +- Configuration propagates recursively to children and grandchildren. +- Children with an already-configured LM are **not** overwritten by a later parent `configure` call. +- Configure the parent first, then override specific children. + +--- + +## Feature-Flagged Model Selection + +Use a `FeatureFlags` module backed by ENV vars to centralize model selection. Each tool or agent reads its model from the flags, falling back to a global default. 
+ +```ruby +module FeatureFlags + module_function + + def default_model + ENV.fetch('DSPY_DEFAULT_MODEL', 'openai/gpt-4o-mini') + end + + def default_api_key + ENV.fetch('DSPY_DEFAULT_API_KEY') { ENV.fetch('OPENAI_API_KEY', nil) } + end + + def model_for(tool_name) + env_key = "DSPY_MODEL_#{tool_name.upcase}" + ENV.fetch(env_key, default_model) + end + + def api_key_for(tool_name) + env_key = "DSPY_API_KEY_#{tool_name.upcase}" + ENV.fetch(env_key, default_api_key) + end +end +``` + +### Per-Tool Model Override + +Override an individual tool's model without touching application code: ```bash -# For OpenAI -gem install dspy-openai +# .env +DSPY_DEFAULT_MODEL=openai/gpt-4o-mini +DSPY_DEFAULT_API_KEY=sk-... -# For Anthropic -gem install dspy-anthropic +# Override the classifier to use Claude +DSPY_MODEL_CLASSIFIER=anthropic/claude-sonnet-4-20250514 +DSPY_API_KEY_CLASSIFIER=sk-ant-... -# For Gemini -gem install dspy-gemini +# Override the summarizer to use Gemini +DSPY_MODEL_SUMMARIZER=gemini/gemini-2.5-flash +DSPY_API_KEY_SUMMARIZER=... ``` + +Wire each agent to its flag at initialization: + +```ruby +class ClassifierAgent < DSPy::Module + def initialize + super + model = FeatureFlags.model_for('classifier') + api_key = FeatureFlags.api_key_for('classifier') + + @predictor = DSPy::Predict.new(ClassifySignature) + configure { |c| c.lm = DSPy::LM.new(model, api_key: api_key) } + end + + def forward(text:) + @predictor.call(text: text) + end +end +``` + +This pattern keeps model routing declarative and avoids scattering `DSPy::LM.new` calls across the codebase. + +--- + +## Compatibility Matrix + +Feature support across direct adapter gems. All features listed assume `structured_outputs: true` (the default). 
+ +| Feature | OpenAI | Anthropic | Gemini | Ollama | OpenRouter | RubyLLM | +|----------------------|--------|-----------|--------|----------|------------|-------------| +| Structured Output | Native JSON mode | Tool-based extraction | Native JSON schema | OpenAI-compatible JSON | Varies by model | Via `with_schema` | +| Vision (Images) | File + URL | File + Base64 | File + Base64 | Limited | Varies | Delegates to underlying provider | +| Image URLs | Yes | No | No | No | Varies | Depends on provider | +| Tool Calling | Yes | Yes | Yes | Varies | Varies | Yes | +| Streaming | Yes | Yes | Yes | Yes | Yes | Yes | + +**Notes:** + +- **Structured Output** is enabled by default on every adapter. Set `structured_outputs: false` to fall back to enhanced-prompting extraction. +- **Vision / Image URLs:** Only OpenAI supports passing a URL directly. For Anthropic and Gemini, load images from file or Base64: + ```ruby + DSPy::Image.from_url("https://example.com/img.jpg") # OpenAI only + DSPy::Image.from_file("path/to/image.jpg") # all providers + DSPy::Image.from_base64(data, mime_type: "image/jpeg") # all providers + ``` +- **RubyLLM** delegates to the underlying provider, so feature support matches the provider column in the table. 
+ +### Choosing an Adapter Strategy + +| Scenario | Recommended Adapter | +|-------------------------------------------|--------------------------------| +| Single provider (OpenAI, Claude, or Gemini) | Dedicated gem (`dspy-openai`, `dspy-anthropic`, `dspy-gemini`) | +| Multi-provider with per-agent model routing | `dspy-ruby_llm` | +| AWS Bedrock or Google VertexAI | `dspy-ruby_llm` | +| Local development with Ollama | `dspy-openai` (Ollama sub-adapter) or `dspy-ruby_llm` | +| OpenRouter for cost optimization | `dspy-openai` (OpenRouter sub-adapter) | + +### Current Recommended Models + +| Provider | Model ID | Use Case | +|-----------|---------------------------------------|-----------------------| +| OpenAI | `openai/gpt-4o-mini` | Fast, cost-effective | +| Anthropic | `anthropic/claude-sonnet-4-20250514` | Balanced reasoning | +| Gemini | `gemini/gemini-2.5-flash` | Fast, cost-effective | +| Ollama | `ollama/llama3.2` | Local, zero API cost | diff --git a/plugins/compound-engineering/skills/dspy-ruby/references/toolsets.md b/plugins/compound-engineering/skills/dspy-ruby/references/toolsets.md new file mode 100644 index 0000000..8c41dcd --- /dev/null +++ b/plugins/compound-engineering/skills/dspy-ruby/references/toolsets.md @@ -0,0 +1,502 @@ +# DSPy.rb Toolsets + +## Tools::Base + +`DSPy::Tools::Base` is the base class for single-purpose tools. Each subclass exposes one operation to an LLM agent through a `call` method. + +### Defining a Tool + +Set the tool's identity with the `tool_name` and `tool_description` class-level DSL methods. Define the `call` instance method with a Sorbet `sig` declaration so DSPy.rb can generate the JSON schema the LLM uses to invoke the tool. 
+ +```ruby +class WeatherLookup < DSPy::Tools::Base + extend T::Sig + + tool_name "weather_lookup" + tool_description "Look up current weather for a given city" + + sig { params(city: String, units: T.nilable(String)).returns(String) } + def call(city:, units: nil) + # Fetch weather data and return a string summary + "72F and sunny in #{city}" + end +end +``` + +Key points: + +- Inherit from `DSPy::Tools::Base`, not `DSPy::Tool`. +- Use `tool_name` (class method) to set the name the LLM sees. Without it, the class name is lowercased as a fallback. +- Use `tool_description` (class method) to set the human-readable description surfaced in the tool schema. +- Prefer **keyword arguments** for `call`. Positional arguments work, but keyword arguments produce clearer parameter schemas. +- Always attach a Sorbet `sig` to `call`. Without a signature, the generated schema has empty properties and the LLM cannot determine parameter types. + +### Schema Generation + +`call_schema_object` introspects the Sorbet signature on `call` and returns a hash representing the JSON Schema `parameters` object: + +```ruby +WeatherLookup.call_schema_object +# => { +# type: "object", +# properties: { +# city: { type: "string", description: "Parameter city" }, +# units: { type: "string", description: "Parameter units (optional)" } +# }, +# required: ["city"] +# } +``` + +`call_schema` wraps this in the full LLM tool-calling format: + +```ruby +WeatherLookup.call_schema +# => { +# type: "function", +# function: { +# name: "call", +# description: "Call the WeatherLookup tool", +# parameters: { ... } +# } +# } +``` + +### Using Tools with ReAct + +Pass tool instances in an array to `DSPy::ReAct`: + +```ruby +agent = DSPy::ReAct.new( + MySignature, + tools: [WeatherLookup.new, AnotherTool.new] +) + +result = agent.call(question: "What is the weather in Berlin?") +puts result.answer +``` + +Access output fields with dot notation (`result.answer`), not hash access (`result[:answer]`). 
+ +--- + +## Tools::Toolset + +`DSPy::Tools::Toolset` groups multiple related methods into a single class. Each exposed method becomes an independent tool from the LLM's perspective. + +### Defining a Toolset + +```ruby +class DatabaseToolset < DSPy::Tools::Toolset + extend T::Sig + + toolset_name "db" + + tool :query, description: "Run a read-only SQL query" + tool :insert, description: "Insert a record into a table" + tool :delete, description: "Delete a record by ID" + + sig { params(sql: String).returns(String) } + def query(sql:) + # Execute read query + end + + sig { params(table: String, data: T::Hash[String, String]).returns(String) } + def insert(table:, data:) + # Insert record + end + + sig { params(table: String, id: Integer).returns(String) } + def delete(table:, id:) + # Delete record + end +end +``` + +### DSL Methods + +**`toolset_name(name)`** -- Set the prefix for all generated tool names. If omitted, the class name minus `Toolset` suffix is lowercased (e.g., `DatabaseToolset` becomes `database`). + +```ruby +toolset_name "db" +# tool :query produces a tool named "db_query" +``` + +**`tool(method_name, tool_name:, description:)`** -- Expose a method as a tool. + +- `method_name` (Symbol, required) -- the instance method to expose. +- `tool_name:` (String, optional) -- override the default `<toolset_name>_<method_name>` naming. +- `description:` (String, optional) -- description shown to the LLM. Defaults to a humanized version of the method name. 
+ +```ruby +tool :word_count, tool_name: "text_wc", description: "Count lines, words, and characters" +# Produces a tool named "text_wc" instead of "text_word_count" +``` + +### Converting to a Tool Array + +Call `to_tools` on the class (not an instance) to get an array of `ToolProxy` objects compatible with `DSPy::Tools::Base`: + +```ruby +agent = DSPy::ReAct.new( + AnalyzeText, + tools: DatabaseToolset.to_tools +) +``` + +Each `ToolProxy` wraps one method, delegates `call` to the underlying toolset instance, and generates its own JSON schema from the method's Sorbet signature. + +### Shared State + +All tool proxies from a single `to_tools` call share one toolset instance. Store shared state (connections, caches, configuration) in the toolset's `initialize`: + +```ruby +class ApiToolset < DSPy::Tools::Toolset + extend T::Sig + + toolset_name "api" + + tool :get, description: "Make a GET request" + tool :post, description: "Make a POST request" + + sig { params(base_url: String).void } + def initialize(base_url:) + @base_url = base_url + @client = HTTP.persistent(base_url) + end + + sig { params(path: String).returns(String) } + def get(path:) + @client.get("#{@base_url}#{path}").body.to_s + end + + sig { params(path: String, body: String).returns(String) } + def post(path:, body:) + @client.post("#{@base_url}#{path}", body: body).body.to_s + end +end +``` + +--- + +## Type Safety + +Sorbet signatures on tool methods drive both JSON schema generation and automatic type coercion of LLM responses. + +### Basic Types + +```ruby +sig { params( + text: String, + count: Integer, + score: Float, + enabled: T::Boolean, + threshold: Numeric +).returns(String) } +def analyze(text:, count:, score:, enabled:, threshold:) + # ... 
+end +``` + +| Sorbet Type | JSON Schema | +|------------------|----------------------------------------------------| +| `String` | `{"type": "string"}` | +| `Integer` | `{"type": "integer"}` | +| `Float` | `{"type": "number"}` | +| `Numeric` | `{"type": "number"}` | +| `T::Boolean` | `{"type": "boolean"}` | +| `T::Enum` | `{"type": "string", "enum": [...]}` | +| `T::Struct` | `{"type": "object", "properties": {...}}` | +| `T::Array[Type]` | `{"type": "array", "items": {...}}` | +| `T::Hash[K, V]` | `{"type": "object", "additionalProperties": {...}}`| +| `T.nilable(Type)`| `{"type": [original, "null"]}` | +| `T.any(T1, T2)` | `{"oneOf": [{...}, {...}]}` | +| `T.class_of(X)` | `{"type": "string"}` | + +### T::Enum Parameters + +Define a `T::Enum` and reference it in a tool signature. DSPy.rb generates a JSON Schema `enum` constraint and automatically deserializes the LLM's string response into the correct enum instance. + +```ruby +class Priority < T::Enum + enums do + Low = new('low') + Medium = new('medium') + High = new('high') + Critical = new('critical') + end +end + +class Status < T::Enum + enums do + Pending = new('pending') + InProgress = new('in-progress') + Completed = new('completed') + end +end + +sig { params(priority: Priority, status: Status).returns(String) } +def update_task(priority:, status:) + "Updated to #{priority.serialize} / #{status.serialize}" +end +``` + +The generated schema constrains the parameter to valid values: + +```json +{ + "priority": { + "type": "string", + "enum": ["low", "medium", "high", "critical"] + } +} +``` + +**Case-insensitive matching**: When the LLM returns `"HIGH"` or `"High"` instead of `"high"`, DSPy.rb first tries an exact `try_deserialize`, then falls back to a case-insensitive lookup. This prevents failures caused by LLM casing variations. + +### T::Struct Parameters + +Use `T::Struct` for complex nested objects. 
DSPy.rb generates nested JSON Schema properties and recursively coerces the LLM's hash response into struct instances. + +```ruby +class TaskMetadata < T::Struct + prop :id, String + prop :priority, Priority + prop :tags, T::Array[String] + prop :estimated_hours, T.nilable(Float), default: nil +end + +class TaskRequest < T::Struct + prop :title, String + prop :description, String + prop :status, Status + prop :metadata, TaskMetadata + prop :assignees, T::Array[String] +end + +sig { params(task: TaskRequest).returns(String) } +def create_task(task:) + "Created: #{task.title} (#{task.status.serialize})" +end +``` + +The LLM sees the full nested object schema and DSPy.rb reconstructs the struct tree from the JSON response, including enum fields inside nested structs. + +### Nilable Parameters + +Mark optional parameters with `T.nilable(...)` and provide a default value of `nil` in the method signature. These parameters are excluded from the JSON Schema `required` array. + +```ruby +sig { params( + query: String, + max_results: T.nilable(Integer), + filter: T.nilable(String) +).returns(String) } +def search(query:, max_results: nil, filter: nil) + # query is required; max_results and filter are optional +end +``` + +### Collections + +Typed arrays and hashes generate precise item/value schemas: + +```ruby +sig { params( + tags: T::Array[String], + priorities: T::Array[Priority], + config: T::Hash[String, T.any(String, Integer, Float)] +).returns(String) } +def configure(tags:, priorities:, config:) + # Array elements and hash values are validated and coerced +end +``` + +### Union Types + +`T.any(...)` generates a `oneOf` JSON Schema. When one of the union members is a `T::Struct`, DSPy.rb uses the `_type` discriminator field to select the correct struct class during coercion. 
+ +```ruby +sig { params(value: T.any(String, Integer, Float)).returns(String) } +def handle_flexible(value:) + # Accepts multiple types +end +``` + +--- + +## Built-in Toolsets + +### TextProcessingToolset + +`DSPy::Tools::TextProcessingToolset` provides Unix-style text analysis and manipulation operations. Toolset name prefix: `text`. + +| Tool Name | Method | Description | +|-----------------------------------|-------------------|--------------------------------------------| +| `text_grep` | `grep` | Search for patterns with optional case-insensitive and count-only modes | +| `text_wc` | `word_count` | Count lines, words, and characters | +| `text_rg` | `ripgrep` | Fast pattern search with context lines | +| `text_extract_lines` | `extract_lines` | Extract a range of lines by number | +| `text_filter_lines` | `filter_lines` | Keep or reject lines matching a regex | +| `text_unique_lines` | `unique_lines` | Deduplicate lines, optionally preserving order | +| `text_sort_lines` | `sort_lines` | Sort lines alphabetically or numerically | +| `text_summarize_text` | `summarize_text` | Produce a statistical summary (counts, averages, frequent words) | + +Usage: + +```ruby +agent = DSPy::ReAct.new( + AnalyzeText, + tools: DSPy::Tools::TextProcessingToolset.to_tools +) + +result = agent.call(text: log_contents, question: "How many error lines are there?") +puts result.answer +``` + +### GitHubCLIToolset + +`DSPy::Tools::GitHubCLIToolset` wraps the `gh` CLI for read-oriented GitHub operations. Toolset name prefix: `github`. 
+ +| Tool Name | Method | Description | +|------------------------|-------------------|---------------------------------------------------| +| `github_list_issues` | `list_issues` | List issues filtered by state, labels, assignee | +| `github_list_prs` | `list_prs` | List pull requests filtered by state, author, base| +| `github_get_issue` | `get_issue` | Retrieve details of a single issue | +| `github_get_pr` | `get_pr` | Retrieve details of a single pull request | +| `github_api_request` | `api_request` | Make an arbitrary GET request to the GitHub API | +| `github_traffic_views` | `traffic_views` | Fetch repository traffic view counts | +| `github_traffic_clones`| `traffic_clones` | Fetch repository traffic clone counts | + +This toolset uses `T::Enum` parameters (`IssueState`, `PRState`, `ReviewState`) for state filters, demonstrating enum-based tool signatures in practice. + +```ruby +agent = DSPy::ReAct.new( + RepoAnalysis, + tools: DSPy::Tools::GitHubCLIToolset.to_tools +) +``` + +--- + +## Testing + +### Unit Testing Individual Tools + +Test `DSPy::Tools::Base` subclasses by instantiating and calling `call` directly: + +```ruby +RSpec.describe WeatherLookup do + subject(:tool) { described_class.new } + + it "returns weather for a city" do + result = tool.call(city: "Berlin") + expect(result).to include("Berlin") + end + + it "exposes the correct tool name" do + expect(tool.name).to eq("weather_lookup") + end + + it "generates a valid schema" do + schema = described_class.call_schema_object + expect(schema[:required]).to include("city") + expect(schema[:properties]).to have_key(:city) + end +end +``` + +### Unit Testing Toolsets + +Test toolset methods directly on an instance. 
Verify tool generation with `to_tools`:
+
+```ruby
+RSpec.describe DatabaseToolset do
+  subject(:toolset) { described_class.new }
+
+  it "executes a query" do
+    result = toolset.query(sql: "SELECT 1")
+    expect(result).to be_a(String)
+  end
+
+  it "generates tools with correct names" do
+    tools = described_class.to_tools
+    names = tools.map(&:name)
+    expect(names).to contain_exactly("db_query", "db_insert", "db_delete")
+  end
+
+  it "generates tool descriptions" do
+    tools = described_class.to_tools
+    query_tool = tools.find { |t| t.name == "db_query" }
+    expect(query_tool.description).to eq("Run a read-only SQL query")
+  end
+end
+```
+
+### Mocking Predictions Inside Tools
+
+When a tool calls a DSPy predictor internally, stub the predictor to isolate tool logic from LLM calls:
+
+```ruby
+class SmartSearchTool < DSPy::Tools::Base
+  extend T::Sig
+
+  tool_name "smart_search"
+  tool_description "Search with query expansion"
+
+  sig { void }
+  def initialize
+    @expander = DSPy::Predict.new(QueryExpansionSignature)
+  end
+
+  sig { params(query: String).returns(String) }
+  def call(query:)
+    expanded = @expander.call(query: query)
+    perform_search(expanded.expanded_query)
+  end
+
+  private
+
+  def perform_search(query)
+    # actual search logic
+  end
+end
+
+RSpec.describe SmartSearchTool do
+  subject(:tool) { described_class.new }
+
+  before do
+    expansion_result = double("result", expanded_query: "expanded test query")
+    allow_any_instance_of(DSPy::Predict).to receive(:call).and_return(expansion_result)
+  end
+
+  it "expands the query before searching" do
+    allow(tool).to receive(:perform_search).with("expanded test query").and_return("found 3 results")
+    result = tool.call(query: "test")
+    expect(result).to eq("found 3 results")
+  end
+end
+```
+
+### Testing Enum Coercion
+
+Verify that string values from LLM responses deserialize into the correct enum instances:
+
+```ruby
+RSpec.describe "enum coercion" do
+  it "deserializes string state values into enum instances" do
+    # "open" is the serialized value of IssueState::Open; DSPy coerces
+    # the LLM's string output into the enum before invoking the tool
+    expect(IssueState.deserialize("open")).to eq(IssueState::Open)
+
+    toolset = GitHubCLIToolset.new
+    result = toolset.list_issues(state: IssueState::Open)
+    expect(result).to be_a(String)
+  end
+end
+```
+
+---
+
+## Constraints
+
+- All exposed tool methods should use **keyword arguments**. Positional parameters still generate schemas, but keyword arguments map far more reliably onto the JSON arguments the LLM produces.
+- Each exposed method becomes a **separate, independent tool**. Method chaining or multi-step sequences within a single tool call are not supported.
+- Shared state across tool proxies is scoped to a single `to_tools` call. Separate `to_tools` invocations create separate toolset instances, each with its own state.
+- Methods without a Sorbet `sig` produce an empty parameter schema, so the LLM will not know what arguments to pass.
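
The shared-state constraint can be illustrated without DSPy at all. The sketch below uses a hypothetical `SessionToolset` and a minimal stand-in for `to_tools` (not DSPy.rb's actual implementation) to show why every tool proxy from one `to_tools` call sees the same instance variables, while a second call starts from a fresh instance:

```ruby
# Plain-Ruby sketch of the shared-state rule; SessionToolset and its
# methods are invented names for illustration only.
class SessionToolset
  def initialize
    @facts = []
  end

  def remember(fact:)
    @facts << fact
    "Stored: #{fact}"
  end

  def recall
    @facts.join("; ")
  end

  # Stand-in for Toolset.to_tools: ONE shared instance, one callable
  # proxy per exposed method.
  def self.to_tools
    instance = new
    %i[remember recall].map do |name|
      ->(**kwargs) { instance.public_send(name, **kwargs) }
    end
  end
end

remember, recall = SessionToolset.to_tools
remember.call(fact: "deploys run at 09:00")
puts recall.call              # state written via one proxy is visible to the other

_, fresh_recall = SessionToolset.to_tools
puts fresh_recall.call.empty? # separate to_tools call => separate, empty instance
```

This mirrors the constraint above: state persists across tool invocations within one `to_tools` batch, but never across batches.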