SuperagenticAI/gepa-eval

GEPA Evaluation: Advanced AI Agent Optimization

Demonstration repository showcasing GEPA with SuperOptiX - improving agent accuracy through reflective prompt evolution.

License: MIT | Python 3.11+ | SuperOptiX

What is GEPA?

GEPA is an optimization technique that uses reflective prompt evolution to improve AI agent performance. Rather than relying on trial and error alone, GEPA acts like an expert tutor - a reflection model analyzes what went wrong and writes better instructions.

⚠️ Hardware Requirements & Cost Warning

MINIMUM SYSTEM REQUIREMENTS:

  • RAM: 8GB minimum, 16GB+ recommended, 32GB+ for production
  • CPU: 4+ cores recommended
  • Storage: 20GB+ free space for models
  • Time: 2-10 minutes for the demos, up to 30 minutes with auto: heavy

CLOUD COST WARNING:

  • Local execution strongly recommended for cost control
  • Cloud instances require 8GB+ RAM minimum
  • Estimated cost: $0.50-$5.00 per optimization run depending on instance size
  • Extended optimization sessions can become expensive quickly

RESPONSIBLE AI USAGE:

  • GEPA is computationally intensive - monitor resource usage
  • Consider environmental impact of extended optimization runs
  • Start with lightweight demos before full optimization
  • Use appropriate hardware tier for your use case

🚀 Quick Start Options

Option 1: Lightweight Demo (Low-end Machines)

Perfect for testing on laptops or systems with 8GB+ RAM:

# 1. Clone and setup
git clone https://github.com/SuperagenticAI/gepa-eval.git
cd gepa-eval
./scripts/setup.sh

# 2. Run lightweight demo (2-3 minutes, 8GB+ RAM)
./scripts/run_light_demo.sh

# 3. Quick results with minimal resource usage!

Option 2: Full Demo (High-end Machines)

For systems with 16GB+ RAM and better hardware:

# 1. Clone and setup
git clone https://github.com/SuperagenticAI/gepa-eval.git
cd gepa-eval
./scripts/setup.sh

# 2. Run complete demo (5-10 minutes, 16GB+ RAM)
./scripts/run_demo.sh

# 3. See the full GEPA transformation!

What You'll See

Before GEPA Optimization:

Input: "Solve x² - 5x + 6 = 0"
Output: "Using the quadratic formula: x = 2 or x = 3"

After GEPA Optimization:

Input: "Solve x² - 5x + 6 = 0"
Output: "**Method 1: Factoring**
Step 1: Look for two numbers that multiply to 6 and add to -5
These numbers are -2 and -3
Factor: (x - 2)(x - 3) = 0
Solutions: x = 2 or x = 3

**Method 2: Quadratic Formula**
For ax² + bx + c = 0, x = (-b ± √(b² - 4ac)) / (2a)
Here: a=1, b=-5, c=6
x = (5 ± √(25-24)) / 2 = (5 ± 1) / 2
x = 3 or x = 2

**Verification:**
x = 2: (2)² - 5(2) + 6 = 4 - 10 + 6 = 0 ✓
x = 3: (3)² - 5(3) + 6 = 9 - 15 + 6 = 0 ✓"
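
The verification step in the optimized answer can be reproduced mechanically. The snippet below is a minimal sketch, not part of SuperOptiX: `solve_quadratic` is an illustrative helper that applies the quadratic formula and then substitutes each root back into the equation, mirroring the "Verification" section above.

```python
import math


def solve_quadratic(a: float, b: float, c: float) -> tuple[float, float]:
    """Return the real roots of ax^2 + bx + c = 0 in ascending order."""
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        raise ValueError("no real roots")
    sqrt_d = math.sqrt(discriminant)
    r1 = (-b - sqrt_d) / (2 * a)
    r2 = (-b + sqrt_d) / (2 * a)
    return (min(r1, r2), max(r1, r2))


roots = solve_quadratic(1, -5, 6)
# Substitute each root back into x^2 - 5x + 6; a correct root leaves a residual of 0.
residuals = [x * x - 5 * x + 6 for x in roots]
print(roots, residuals)
```

A check like this is one way to confirm that a longer, multi-method answer is still correct and not merely more verbose.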

🎓 Featured Demonstrations

1. Advanced Math Agent

Domain: Mathematical problem solving
Scenarios: Quadratic equations, geometry, calculus optimization
Key Feature: Multi-method solutions with verification

cd gepa-eval
super agent compile advanced_math_gepa
super agent optimize advanced_math_gepa
super agent run advanced_math_gepa --goal "Find the maximum area of a rectangle with perimeter 20"

🔬 GEPA vs. Other Optimization

GEPA Advantages

Aspect            | Other Optimizers        | GEPA
Sample Efficiency | Needs 100+ examples     | Works with 3-10 examples
Domain Adaptation | Generic optimization    | Domain-specific feedback
Interpretability  | Black box improvements  | Human-readable prompt evolution
Quality Focus     | Quantity-driven         | Quality-driven with reflection

When to Use GEPA

Perfect for:

  • Specialized domains (math, medicine, law, security)
  • Limited training data
  • Quality over speed requirements
  • Interpretable improvements needed

⚠️ Consider alternatives for:

  • Simple, general-purpose tasks
  • Large datasets (>100 examples)
  • Tight resource constraints
  • Speed-critical applications

🔧 Hardware Tier Configurations

Tier 1: Lightweight (8GB+ RAM)

  • Model: llama3.2:1b (lightweight)
  • Optimization: auto: minimal
  • Time: 2-3 minutes
  • Use case: Testing, learning, low-end machines

Tier 2: Standard (16GB+ RAM)

  • Model: llama3.1:8b + qwen3:8b
  • Optimization: auto: light
  • Time: 5-8 minutes
  • Use case: Development, good balance of speed/quality

Tier 3: Production (32GB+ RAM)

  • Model: llama3.1:8b + qwen3:8b
  • Optimization: auto: heavy
  • Time: 15-30 minutes
  • Use case: Production deployments, best quality
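
The tier list above can be turned into a small lookup that picks a configuration from the amount of RAM available. This is only a sketch: the `Tier` dataclass and `pick_tier` function are illustrative names, not part of SuperOptiX, and the values are copied from the three tiers above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    min_ram_gb: int
    models: tuple[str, ...]
    auto: str


# Ordered from most to least demanding; values copied from the tier list.
TIERS = (
    Tier("Tier 3: Production", 32, ("llama3.1:8b", "qwen3:8b"), "heavy"),
    Tier("Tier 2: Standard", 16, ("llama3.1:8b", "qwen3:8b"), "light"),
    Tier("Tier 1: Lightweight", 8, ("llama3.2:1b",), "minimal"),
)


def pick_tier(ram_gb: float) -> Tier:
    """Return the highest tier whose RAM floor fits the given memory."""
    for tier in TIERS:
        if ram_gb >= tier.min_ram_gb:
            return tier
    raise ValueError(f"{ram_gb} GB is below the 8 GB minimum")


tier = pick_tier(16)
print(tier.name, tier.models, tier.auto)
```

Picking the highest tier that fits is a design choice for this sketch; on a shared machine it may be safer to step down one tier.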

🛠️ Manual Setup (Alternative)

Prerequisites

  • Python 3.11+
  • 8GB+ RAM minimum (16GB+ recommended)
  • SuperOptiX framework

Installation

Option 1: Conda (Recommended)

conda env create -f environment.yml
conda activate gepa-eval
pip install -e .

Option 2: UV (Fastest)

uv venv .venv
source .venv/bin/activate
uv pip install -r requirements.txt
uv pip install -e .

Local Model Setup

# Install required models
ollama pull llama3.1:8b      # Main processing
ollama pull qwen3:8b         # GEPA reflection
ollama pull llama3.2:1b      # Lightweight testing

Command-Line Exploration

# Step-by-step agent optimization
super agent evaluate advanced_math_gepa    # Baseline
super agent optimize advanced_math_gepa    # GEPA optimization
super agent evaluate advanced_math_gepa    # Measure improvement

# Test with custom problems
super agent run advanced_math_gepa --goal "Solve the system: x + 2y = 7, 3x - y = 4"
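
When testing custom problems, it helps to have an independently computed reference answer to compare against the agent's output. The sketch below solves the system from the example above exactly, using Cramer's rule; `solve_2x2` is an illustrative helper and not part of the `super` CLI.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 exactly via Cramer's rule."""
    det = Fraction(a1 * b2 - a2 * b1)
    if det == 0:
        raise ValueError("system has no unique solution")
    x = Fraction(c1 * b2 - c2 * b1) / det
    y = Fraction(a1 * c2 - a2 * c1) / det
    return x, y


# x + 2y = 7, 3x - y = 4
x, y = solve_2x2(1, 2, 7, 3, -1, 4)
print(x, y)  # prints: 15/7 17/7
```

Using `Fraction` keeps the answer exact, which makes it easier to compare against an agent response that reports fractions instead of decimals.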

🧠 Understanding GEPA

How GEPA Works

  1. Execute & Analyze: Run agent on training examples
  2. Reflect: Reflection LM analyzes failures and successes
  3. Evolve: Generate improved prompt candidates
  4. Select: Choose best performers using Pareto optimization
  5. Iterate: Repeat to build a tree of improvements
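
The five steps above can be sketched as a toy loop. This is an illustration of the control flow only, under heavy simplifying assumptions: a "prompt" is a set of hint strings, `run_agent` and `reflect` are stand-ins for the real agent and reflection LM, and parent and minibatch choices are deterministic so the run is reproducible (real implementations typically sample them). None of these names come from SuperOptiX or DSPy.

```python
# Toy task: an example is solved only if the prompt contains the hint it needs.
TRAINSET = ["factor", "verify", "show steps"]


def run_agent(prompt: frozenset, example: str) -> float:
    """Stand-in for executing the agent: 1.0 if the prompt covers the example."""
    return 1.0 if example in prompt else 0.0


def reflect(prompt: frozenset, minibatch: list) -> frozenset:
    """Stand-in for the reflection LM: find failures and propose a revised prompt."""
    failures = [ex for ex in minibatch if run_agent(prompt, ex) == 0.0]
    return prompt | {failures[0]} if failures else prompt


def scores(prompt: frozenset) -> tuple:
    """Per-example scores of a prompt over the whole training set."""
    return tuple(run_agent(prompt, ex) for ex in TRAINSET)


def pareto_front(candidates: list) -> list:
    """Keep candidates that no other candidate matches everywhere and beats somewhere."""
    def dominated(c):
        return any(
            all(o >= s for o, s in zip(scores(other), scores(c)))
            and scores(other) != scores(c)
            for other in candidates
        )
    return [c for c in candidates if not dominated(c)]


def optimize(rounds: int = 5, minibatch_size: int = 2) -> frozenset:
    candidates = [frozenset()]  # seed prompt with no hints
    for i in range(rounds):
        front = pareto_front(candidates)             # 4. Select
        parent = front[i % len(front)]
        minibatch = [TRAINSET[(i + k) % len(TRAINSET)] for k in range(minibatch_size)]
        child = reflect(parent, minibatch)           # 1-3. Execute, reflect, evolve
        if child not in candidates:
            candidates.append(child)                 # 5. Iterate: grow the tree
    return max(candidates, key=lambda c: sum(scores(c)))


best = optimize()
print(sorted(best), scores(best))
```

In this toy, each round adds at most one missing hint, so the seed prompt reaches full coverage of the three examples after three rounds.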

Key Components

# GEPA Configuration Example
optimization:
  optimizer:
    name: GEPA
    params:
      metric: advanced_math_feedback    # Domain-specific feedback
      auto: light                       # Budget control
      reflection_lm: qwen3:8b          # Reflection model
      reflection_minibatch_size: 3     # Efficiency tuning
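
Before launching a long optimization run, it can be worth sanity-checking the `params` block. The following is a hypothetical validator, not a SuperOptiX API: which keys are mandatory is an assumption made for illustration, and the allowed `auto` values are simply the ones that appear in this README.

```python
ALLOWED_AUTO = {"minimal", "light", "medium", "heavy"}  # values used in this README


def validate_gepa_params(params: dict) -> list[str]:
    """Return a list of problems found in a GEPA `params` mapping (empty if none)."""
    problems = []
    # Assumption: these three keys are treated as required in this sketch.
    for key in ("metric", "auto", "reflection_lm"):
        if not params.get(key):
            problems.append(f"missing required key: {key}")
    if params.get("auto") and params["auto"] not in ALLOWED_AUTO:
        problems.append(f"unknown auto budget: {params['auto']}")
    size = params.get("reflection_minibatch_size", 3)
    if not isinstance(size, int) or size < 1:
        problems.append("reflection_minibatch_size must be a positive integer")
    return problems


params = {
    "metric": "advanced_math_feedback",
    "auto": "light",
    "reflection_lm": "qwen3:8b",
    "reflection_minibatch_size": 3,
}
print(validate_gepa_params(params))  # prints: []
```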

Advanced Feedback Metrics

GEPA includes specialized feedback metrics for different domains (refer to the docs for more details):

  • advanced_math_feedback - Mathematical problem solving
  • multi_component_enterprise_feedback - Business document analysis
  • vulnerability_detection_feedback - Security analysis
  • privacy_preservation_feedback - Data privacy protection
  • medical_accuracy_feedback - Healthcare applications
  • legal_analysis_feedback - Legal document processing

🔧 Customization Guide

Creating Custom Agents

# Create new GEPA-enabled agent
super agent design my_custom_agent

# Add GEPA optimization
# Edit playbook to include GEPA configuration
super agent compile my_custom_agent
super agent optimize my_custom_agent

Custom Feedback Metrics

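from dspy.primitives import Prediction


def custom_domain_feedback(example, pred, trace=None, *args, **kwargs):
    """Implement domain-specific feedback for GEPA."""
    # analyze_prediction and generate_domain_feedback are placeholders:
    # supply your own domain logic for the score and the textual feedback.
    score = analyze_prediction(example, pred)           # numeric quality score
    feedback = generate_domain_feedback(example, pred)  # text for the reflection LM

    # Return both, so GEPA can use the feedback text as well as the score
    return Prediction(score=score, feedback=feedback)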
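
The two helpers called in the metric above (`analyze_prediction` and `generate_domain_feedback`) are not defined in this README. As a concrete illustration, their logic could look like the following for a math domain. This is a sketch under assumptions: the `Feedback` dataclass stands in for `dspy.Prediction` so the snippet runs without DSPy installed, the field names `answer` and `solution` are hypothetical, and the 0.7/0.3 weighting is arbitrary.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class Feedback:
    """Stand-in for dspy.Prediction(score=..., feedback=...)."""
    score: float
    feedback: str


def math_feedback(example, pred, trace=None, *args, **kwargs) -> Feedback:
    """Score a math answer and say, in plain language, what is missing."""
    notes = []
    score = 0.0
    if example.answer in pred.solution:
        score += 0.7  # correct final answer is present
    else:
        notes.append(f"Expected answer '{example.answer}' not found.")
    if "verif" in pred.solution.lower():
        score += 0.3  # solution includes a verification step
    else:
        notes.append("Add a verification step that substitutes the result back in.")
    return Feedback(score=round(score, 2), feedback=" ".join(notes) or "Correct and verified.")


example = SimpleNamespace(answer="x = 2 or x = 3")
pred = SimpleNamespace(solution="Using the quadratic formula: x = 2 or x = 3")
result = math_feedback(example, pred)
print(result.score, "|", result.feedback)
```

The feedback string is the part the reflection model reads, so it should name the specific gap and not just restate the score.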

📈 Performance Optimization

Memory-Efficient Setup

For systems with limited resources:

# Optimized for 16GB+ systems
language_model:
  model: llama3.1:8b           # ~8GB
optimization:
  optimizer:
    name: GEPA
    params:
      reflection_lm: qwen3:8b  # ~8GB
      auto: light              # Conservative budget

Budget Control

# Budget options for different needs
auto: light    # 3-5 minutes, good results
auto: medium   # 8-12 minutes, better results
auto: heavy    # 15-30 minutes, best results
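
One way to choose among these budgets is to start from the time you can spare. The helper below is a sketch, not part of SuperOptiX: it uses the upper end of each time range listed above as a worst case, and `pick_budget` and `timeout_seconds` are illustrative names.

```python
# Worst-case run times in minutes, taken from the budget comments above.
BUDGET_MAX_MINUTES = {"light": 5, "medium": 12, "heavy": 30}


def pick_budget(available_minutes: float) -> str:
    """Return the largest `auto` budget whose worst-case time fits the window."""
    fitting = [name for name, mins in BUDGET_MAX_MINUTES.items() if mins <= available_minutes]
    if not fitting:
        raise ValueError("less than 5 minutes available; no budget fits")
    return max(fitting, key=BUDGET_MAX_MINUTES.get)


def timeout_seconds(budget: str) -> int:
    """Worst-case run time in seconds for the given budget."""
    return BUDGET_MAX_MINUTES[budget] * 60


print(pick_budget(10), timeout_seconds("light"))  # prints: light 300
```

The 300 seconds computed for `light` matches the `--timeout 300` value suggested in the Troubleshooting section.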

🐛 Troubleshooting

Common Issues

GEPA Timeout (Normal Behavior)

Error: Command timed out after 2m 0.0s

Solution: GEPA typically needs 3-5 minutes, so hitting a 2-minute timeout is expected behavior. Raise the timeout:

super agent optimize agent_name --timeout 300  # 5 minutes

Memory Issues

# Reduce memory usage
# Edit playbook: reflection_minibatch_size: 2
# Edit playbook: auto: light

Model Availability

# Ensure models are available
ollama list
ollama pull llama3.1:8b
ollama pull qwen3:8b

📚 Documentation & Resources

🤝 Contributing

We welcome contributions! See our contribution guide for details.

Ways to contribute:

  • Add new domain-specific agents
  • Implement custom feedback metrics
  • Improve benchmark coverage
  • Enhance documentation
  • Report issues and bugs

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • GEPA Research Team - Original algorithm development
  • DSPy Framework - Core optimization infrastructure

Ready to see GEPA in action? Run ./scripts/run_demo.sh and watch GEPA optimize an agent! 🚀
