GEPA Evaluation: Advanced AI Agent Optimization

A demonstration repository showcasing GEPA with SuperOptiX: improving agent accuracy through reflective prompt evolution.

What is GEPA?

GEPA is a breakthrough optimization technique that uses reflective prompt evolution to dramatically improve AI agent performance. Unlike traditional optimizers that rely on trial-and-error, GEPA acts like an expert tutor - analyzing what went wrong and writing better instructions.

⚠️ Hardware Requirements & Cost Warning

MINIMUM SYSTEM REQUIREMENTS:

RAM: 8GB minimum, 16GB+ recommended, 32GB+ for production
CPU: 4+ cores recommended
Storage: 20GB+ free space for models
Time: 2-10 minutes for the demos, 15-30 minutes for auto: heavy runs

CLOUD COST WARNING:

Local execution strongly recommended for cost control
Cloud instances require 8GB+ RAM minimum
Estimated cost: $0.50-$5.00 per optimization run depending on instance size
Extended optimization sessions can become expensive quickly

RESPONSIBLE AI USAGE:

GEPA is computationally intensive - monitor resource usage
Consider environmental impact of extended optimization runs
Start with lightweight demos before full optimization
Use appropriate hardware tier for your use case

🚀 Quick Start Options

Option 1: Lightweight Demo (Low-end Machines)

Perfect for testing on laptops or systems with 8GB+ RAM:

# 1. Clone and setup
git clone https://github.com/SuperagenticAI/gepa_eval.git
cd gepa_eval
./scripts/setup.sh

# 2. Run lightweight demo (2-3 minutes, 8GB+ RAM)
./scripts/run_light_demo.sh

# 3. Quick results with minimal resource usage!

Option 2: Full Demo (High-end Machines)

For systems with 16GB+ RAM and better hardware:

# 1. Clone and setup
git clone https://github.com/SuperagenticAI/gepa_eval.git
cd gepa_eval
./scripts/setup.sh

# 2. Run complete demo (5-10 minutes, 16GB+ RAM)
./scripts/run_demo.sh

# 3. See the full GEPA transformation!

What You'll See

Before GEPA Optimization:

Input: "Solve x² - 5x + 6 = 0"
Output: "Using the quadratic formula: x = 2 or x = 3"

After GEPA Optimization:

Input: "Solve x² - 5x + 6 = 0"
Output: "**Method 1: Factoring**
Step 1: Look for two numbers that multiply to 6 and add to -5
These numbers are -2 and -3
Factor: (x - 2)(x - 3) = 0
Solutions: x = 2 or x = 3

**Method 2: Quadratic Formula**
For ax² + bx + c = 0, x = (-b ± √(b²-4ac)) / 2a
Here: a=1, b=-5, c=6
x = (5 ± √(25-24)) / 2 = (5 ± 1) / 2
x = 3 or x = 2

**Verification:**
x = 2: (2)² - 5(2) + 6 = 4 - 10 + 6 = 0 ✓
x = 3: (3)² - 5(3) + 6 = 9 - 15 + 6 = 0 ✓"

🎓 Featured Demonstrations

1. Advanced Math Agent

Domain: Mathematical problem solving
Scenarios: Quadratic equations, geometry, calculus optimization
Key Feature: Multi-method solutions with verification

cd gepa_eval
super agent compile advanced_math_gepa
super agent optimize advanced_math_gepa
super agent run advanced_math_gepa --goal "Find the maximum area of a rectangle with perimeter 20"

🔬 GEPA vs. Other Optimization

GEPA Advantages

Aspect	Other Optimizers	GEPA
Sample Efficiency	Needs 100+ examples	Works with 3-10 examples
Domain Adaptation	Generic optimization	Domain-specific feedback
Interpretability	Black box improvements	Human-readable prompt evolution
Quality Focus	Quantity-driven	Quality-driven with reflection

When to Use GEPA

✅ Perfect for:

Specialized domains (math, medicine, law, security)
Limited training data
Quality over speed requirements
Interpretable improvements needed

⚠️ Consider alternatives for:

Simple, general-purpose tasks
Large datasets (>100 examples)
Tight resource constraints
Speed-critical applications

🔧 Hardware Tier Configurations

Tier 1: Lightweight (8GB+ RAM)

Model: llama3.2:1b (lightweight)
Optimization: auto: minimal
Time: 2-3 minutes
Use case: Testing, learning, low-end machines

Tier 2: Standard (16GB+ RAM)

Model: llama3.1:8b + qwen3:8b
Optimization: auto: light
Time: 5-8 minutes
Use case: Development, good balance of speed/quality

Tier 3: Production (32GB+ RAM)

Model: llama3.1:8b + qwen3:8b
Optimization: auto: heavy
Time: 15-30 minutes
Use case: Production deployments, best quality
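
The tier table above can be turned into a small lookup when a script needs to choose a configuration automatically. The following is a minimal sketch, not part of SuperOptiX: the `Tier` class and `pick_tier` function are hypothetical names, and the values are copied from the three tiers listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One hardware tier from this README (hypothetical helper type)."""
    name: str
    min_ram_gb: int
    models: tuple
    auto: str


# Ordered from most to least demanding so the first match is the best fit.
TIERS = (
    Tier("Production", 32, ("llama3.1:8b", "qwen3:8b"), "heavy"),
    Tier("Standard", 16, ("llama3.1:8b", "qwen3:8b"), "light"),
    Tier("Lightweight", 8, ("llama3.2:1b",), "minimal"),
)


def pick_tier(ram_gb: float) -> Tier:
    """Return the most demanding tier whose RAM floor the machine meets."""
    for tier in TIERS:
        if ram_gb >= tier.min_ram_gb:
            return tier
    raise ValueError(f"{ram_gb} GB is below the 8 GB minimum")
```

For example, `pick_tier(16).auto` yields the budget to write into the playbook for a 16GB machine.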

🛠️ Manual Setup (Alternative)

Prerequisites

Python 3.11+
8GB+ RAM minimum (16GB+ recommended)
SuperOptiX framework

Installation

Option 1: Conda (Recommended)

conda env create -f environment.yml
conda activate gepa-eval
pip install -e .

Option 2: UV (Fastest)

uv venv .venv
source .venv/bin/activate
uv pip install -r requirements.txt
uv pip install -e .

Local Model Setup

# Install required models
ollama pull llama3.1:8b      # Main processing
ollama pull qwen3:8b         # GEPA reflection
ollama pull llama3.2:1b      # Lightweight testing

Command-Line Exploration

# Step-by-step agent optimization
super agent evaluate advanced_math_gepa    # Baseline
super agent optimize advanced_math_gepa    # GEPA optimization
super agent evaluate advanced_math_gepa    # Measure improvement

# Test with custom problems
super agent run advanced_math_gepa --goal "Solve the system: x + 2y = 7, 3x - y = 4"
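
When testing with custom problems it helps to have an exact reference answer to compare the agent's output against. The sketch below solves the 2x2 system from the command above with Cramer's rule and exact fractions. `solve_2x2` is a hypothetical helper written for this README, not something the repository provides.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 exactly (Cramer's rule)."""
    det = Fraction(a1 * b2 - a2 * b1)
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = Fraction(c1 * b2 - c2 * b1) / det
    y = Fraction(a1 * c2 - a2 * c1) / det
    return x, y


# x + 2y = 7 and 3x - y = 4
x, y = solve_2x2(1, 2, 7, 3, -1, 4)
print(x, y)
```

Substituting the result back into both equations is the same verification step the optimized agent is expected to show.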

🧠 Understanding GEPA

How GEPA Works

Execute & Analyze: Run agent on training examples
Reflect: Reflection LM analyzes failures and successes
Evolve: Generate improved prompt candidates
Select: Choose best performers using Pareto optimization
Iterate: Repeat to build a tree of improvements
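
The five steps above can be sketched as a toy loop. This is a simplified illustration under stated assumptions, not the GEPA implementation: prompts are plain strings, `score` and `reflect` are caller-supplied stand-ins for running the agent and calling the reflection LM, and the Pareto step simply keeps every prompt whose per-example scores are not dominated by another prompt.

```python
import random


def dominates(a, b):
    """True if scores a are at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(pool):
    """Prompts whose per-example score vectors no other prompt dominates."""
    return [
        p for p, s in pool.items()
        if not any(dominates(t, s) for q, t in pool.items() if q != p)
    ]


def evolve(seed, examples, score, reflect, rounds=5, rng=None):
    """Toy reflective evolution loop; returns (best_prompt, all_scores)."""
    rng = rng or random.Random(0)
    pool = {seed: [score(seed, ex) for ex in examples]}           # Execute & Analyze
    for _ in range(rounds):                                        # Iterate
        parent = rng.choice(pareto_front(pool))                    # Select
        failures = [ex for ex, s in zip(examples, pool[parent]) if s < 1.0]
        if not failures:
            break
        child = reflect(parent, failures)                          # Reflect + Evolve
        if child not in pool:
            pool[child] = [score(child, ex) for ex in examples]
    return max(pool, key=lambda p: sum(pool[p])), pool


# Stand-ins: an example "passes" when the prompt mentions its keyword, and
# reflection appends an instruction addressing the first failure.
def toy_score(prompt, example):
    return 1.0 if example in prompt else 0.0


def toy_reflect(prompt, failures):
    return f"{prompt} Always {failures[0]}."


best, pool = evolve("Solve the equation.", ["factor", "verify"], toy_score, toy_reflect)
print(best)
```

Each accepted child stays in the pool alongside its parent, which is what lets later rounds branch from any non-dominated prompt rather than only the most recent one.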

Key Components

# GEPA Configuration Example
optimization:
  optimizer:
    name: GEPA
    params:
      metric: advanced_math_feedback    # Domain-specific feedback
      auto: light                       # Budget control
      reflection_lm: qwen3:8b          # Reflection model
      reflection_minibatch_size: 3     # Efficiency tuning
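
A quick sanity check on this fragment can catch typos before a long optimization run. The sketch below mirrors the YAML above as a Python dict and validates it. The rules are assumptions drawn only from this README (the budget names it mentions and the keys shown above), not from the SuperOptiX schema, and `check_gepa_params` is a hypothetical helper.

```python
# Budget names that appear in this README; the real framework may accept others.
ALLOWED_AUTO = {"minimal", "light", "medium", "heavy"}

# The YAML fragment above, written as a dict to avoid a YAML dependency.
playbook = {
    "optimization": {
        "optimizer": {
            "name": "GEPA",
            "params": {
                "metric": "advanced_math_feedback",
                "auto": "light",
                "reflection_lm": "qwen3:8b",
                "reflection_minibatch_size": 3,
            },
        }
    }
}


def check_gepa_params(config):
    """Return a list of problems found; an empty list means the fragment looks sane."""
    problems = []
    optimizer = config.get("optimization", {}).get("optimizer", {})
    params = optimizer.get("params", {})
    if optimizer.get("name") != "GEPA":
        problems.append("optimizer.name should be GEPA")
    for key in ("metric", "auto", "reflection_lm"):
        if not params.get(key):
            problems.append(f"params.{key} is missing")
    if params.get("auto") and params["auto"] not in ALLOWED_AUTO:
        problems.append(f"params.auto={params['auto']!r} is not a known budget")
    size = params.get("reflection_minibatch_size", 1)
    if not isinstance(size, int) or size < 1:
        problems.append("params.reflection_minibatch_size must be a positive integer")
    return problems


print(check_gepa_params(playbook))
```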

Advanced Feedback Metrics

GEPA includes specialized metrics for different domains. Refer to the docs for more details.

advanced_math_feedback - Mathematical problem solving
multi_component_enterprise_feedback - Business document analysis
vulnerability_detection_feedback - Security analysis
privacy_preservation_feedback - Data privacy protection
medical_accuracy_feedback - Healthcare applications
legal_analysis_feedback - Legal document processing

🔧 Customization Guide

Creating Custom Agents

# Create new GEPA-enabled agent
super agent design my_custom_agent

# Add GEPA optimization
# Edit playbook to include GEPA configuration
super agent compile my_custom_agent
super agent optimize my_custom_agent

Custom Feedback Metrics

from dspy.primitives import Prediction


def custom_domain_feedback(example, pred, trace=None, *args, **kwargs):
    """Implement domain-specific feedback for GEPA."""
    # analyze_prediction and generate_domain_feedback are placeholders:
    # supply your own scoring and feedback logic for your domain.
    score = analyze_prediction(example, pred)           # numeric score
    feedback = generate_domain_feedback(example, pred)  # text for the reflection model

    return Prediction(score=score, feedback=feedback)
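
To make the shape of such a metric concrete, here is a minimal self-contained sketch for the quadratic example used earlier. Everything in it is an assumption for illustration: the `Feedback` dataclass stands in for the `Prediction(score=..., feedback=...)` return value so the snippet runs without DSPy, and the 0.6/0.2/0.2 weights and regex checks are arbitrary choices, not the scoring used by `advanced_math_feedback`.

```python
import re
from dataclasses import dataclass


@dataclass
class Feedback:
    """Stand-in for Prediction(score=..., feedback=...); hypothetical type."""
    score: float
    feedback: str


def quadratic_feedback(expected_roots, answer_text):
    """Score an answer and explain, in words, what it is missing."""
    # Collect every "x = <number>" that appears in the answer.
    found = {float(m) for m in re.findall(r"x\s*=\s*(-?\d+(?:\.\d+)?)", answer_text)}
    missing = [r for r in expected_roots if float(r) not in found]

    score, notes = 0.0, []
    if not missing:
        score += 0.6  # all expected roots are stated
    else:
        notes.append(f"Missing roots: {missing}.")
    if re.search(r"verif|check", answer_text, re.IGNORECASE):
        score += 0.2  # the answer mentions a verification step
    else:
        notes.append("Substitute each root back into the equation to verify it.")
    if re.search(r"step\s*\d|factor", answer_text, re.IGNORECASE):
        score += 0.2  # the answer shows intermediate work
    else:
        notes.append("Show the intermediate steps, not just the final roots.")
    return Feedback(round(score, 2), " ".join(notes) or "Complete, verified solution.")


print(quadratic_feedback([2, 3], "Using the quadratic formula: x = 2 or x = 3"))
```

The textual feedback matters as much as the score: it is what gives the reflection step something specific to act on when rewriting the prompt.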

📈 Performance Optimization

Memory-Efficient Setup

For systems with limited resources:

# Optimized for 16GB+ systems
language_model:
  model: llama3.1:8b          # ~8GB
optimization:
  optimizer:
    name: GEPA
    params:
      reflection_lm: qwen3:8b  # ~8GB
      auto: light              # Conservative budget

Budget Control

# Budget options for different needs
auto: light    # 3-5 minutes, good results
auto: medium   # 8-12 minutes, better results
auto: heavy    # 15-30 minutes, best results

🐛 Troubleshooting

Common Issues

GEPA Timeout (Normal Behavior)

Error: Command timed out after 2m 0.0s

Solution: GEPA typically needs 3-5 minutes. This is expected behavior.

super agent optimize agent_name --timeout 300  # 5 minutes
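
If the optimizer is driven from a Python script rather than the shell, the same idea applies: give the subprocess a generous timeout and treat expiry as a signal to retry with a larger limit. This is a minimal sketch using only the standard library; `run_with_timeout` is a hypothetical helper name, and the 300-second value simply mirrors the command above.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd and return (finished, stdout); finished is False on timeout."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
        return True, done.stdout
    except subprocess.TimeoutExpired:
        # The child process has been killed; the caller can retry with more time.
        return False, ""


# Demonstrated with a harmless command; for a real run one would pass something
# like ["super", "agent", "optimize", "agent_name"] with seconds=300.
print(run_with_timeout([sys.executable, "-c", "print('ok')"], 30))
```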

Memory Issues

# Reduce memory usage
# Edit playbook: reflection_minibatch_size: 2
# Edit playbook: auto: light

Model Availability

# Ensure models are available
ollama list
ollama pull llama3.1:8b
ollama pull qwen3:8b
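
The availability check can also be scripted. The sketch below parses the text printed by `ollama list` and reports which required models are absent. It assumes the output is a header row followed by one line per model with the model name in the first column. `missing_models` is a hypothetical helper, and the sample text is illustrative rather than captured output.

```python
REQUIRED = ("llama3.1:8b", "qwen3:8b")


def missing_models(ollama_list_output, required=REQUIRED):
    """Return the required model names not present in `ollama list` output."""
    lines = ollama_list_output.strip().splitlines()
    # Skip the header row; the first whitespace-separated field is the name.
    installed = {line.split()[0] for line in lines[1:] if line.strip()}
    return [name for name in required if name not in installed]


# Illustrative sample in the assumed column layout.
sample = """NAME            ID              SIZE      MODIFIED
llama3.1:8b     0123456789ab    4.9 GB    2 days ago
llama3.2:1b     ba9876543210    1.3 GB    5 days ago
"""
print(missing_models(sample))
```

In a real script the input would come from `subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout`, and each missing name can then be passed to `ollama pull`.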

📚 Documentation & Resources

GEPA Paper - Original research
DSPy GEPA Tutorial - Technical guide
SuperOptiX Docs - Framework documentation
GEPA Optimization Guide - Comprehensive guide

🤝 Contributing

We welcome contributions! See our contribution guide for details.

Ways to contribute:

Add new domain-specific agents
Implement custom feedback metrics
Improve benchmark coverage
Enhance documentation
Report issues and bugs

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

GEPA Research Team - Original algorithm development
DSPy Framework - Core optimization infrastructure

Ready to see GEPA in action? Run ./scripts/run_demo.sh and watch GEPA optimize an agent step by step! 🚀
