A minimal multi-agent research system, built with DSPy, inspired by Anthropic’s multi-agent research system.
Documentation:
- architecture-agent.md — flexible, subagent-spawning design (primary direction)
- AGENTS.md — contributor setup, coding style, and testing checklist
- Separation of concerns: Lead plans, tracks To-Dos, reads artifacts, synthesizes. Subagents research and optionally write a report.
- Least privilege: Subagents use a fixed tool set: `web_search` and optional `fs_write_report`. No filesystem reads for subagents.
- Minimal contracts: The subagent return surface is a single field (see the sketch after this list):
  `{"summary": "<brief findings; may include 'artifact: <path>'>"}`
  Citations (refs) can be added later without changing this surface.
- To-Do–driven planning: High-level plans are written and tracked via a To-Do List tool to manage decomposition and progress.
- Flexibility: Loops and tactics are examples, not prescriptions. Prompting guidelines will evolve through optimization (e.g., SIMBA/GEPA).
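A minimal sketch of what this contract could look like in DSPy. The tool bodies below are placeholders, not this repo's implementations:

```python
import dspy

# Placeholder tools; the real web_search / fs_write_report live in this repo.
def web_search(query: str) -> str:
    """Search the web and return raw results."""
    return f"results for: {query}"

def fs_write_report(path: str, content: str) -> str:
    """Write a report artifact and return its path."""
    return path

class ResearchTask(dspy.Signature):
    """Research the task and report back briefly."""
    task: str = dspy.InputField()
    summary: str = dspy.OutputField(desc="brief findings; may include 'artifact: <path>'")

# Least privilege: the subagent sees exactly these two tools and nothing else.
subagent = dspy.ReAct(ResearchTask, tools=[web_search, fs_write_report])
```

Because the return surface is one `summary` string, citations could later become a second output field without breaking callers.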
The default workflow:
- Create a To-Do of high-level plans.
- Spawn Subagents for tasks (a subagent may decompose a single To-Do into multiple tasks or handle multiple To-Dos in parallel).
- Subagents use `web_search` and may call `fs_write_report(...)`; each returns a summary.
- Lead reads artifacts, updates the To-Do, and synthesizes the answer; iterate if gaps remain (a runnable sketch of this loop follows).
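A compressed sketch of that loop; `TodoList` and `spawn_subagent` are illustrative stand-ins, not this repo's API:

```python
from dataclasses import dataclass

@dataclass
class TodoList:
    """Illustrative stand-in for the lead's To-Do List tool."""
    items: list[str]

    def has_open_items(self) -> bool:
        return bool(self.items)

    def next_tasks(self, k: int = 3) -> list[str]:
        batch, self.items = self.items[:k], self.items[k:]
        return batch

def spawn_subagent(task: str) -> str:
    # In the real system this runs the web_search-equipped subagent
    # and returns its single summary field.
    return f"summary for: {task}"

def research(query: str) -> str:
    todos = TodoList(items=[f"investigate: {query}"])  # 1. high-level plan
    findings = []
    while todos.has_open_items():                      # 4. iterate if gaps remain
        for task in todos.next_tasks():                # 2. spawn subagents per task
            findings.append(spawn_subagent(task))      # 3. lead collects summaries
    return "\n".join(findings)                         # ...and synthesizes
```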
The default efficiency metric drives lean correctness — right answers with minimal waste:
```
efficiency = accuracy / (time × cost)
```
This penalizes over-search (20 queries vs 5), over-decomposition (10 subagents vs 3), and slow synthesis. Wrong answers score zero regardless of speed.
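In code, the metric is just this ratio. A minimal sketch (the repo's eval.py may guard edge cases differently):

```python
def efficiency(accuracy: float, elapsed_s: float, cost_usd: float) -> float:
    """efficiency = accuracy / (time × cost); wrong answers score zero."""
    if accuracy == 0.0:
        return 0.0
    return accuracy / (elapsed_s * cost_usd)

# A lean correct run beats a wasteful correct run:
efficiency(1.0, 30.0, 0.05)   # 0.667  (few queries, quick synthesis)
efficiency(1.0, 120.0, 0.20)  # 0.042  (over-search, over-decomposition)
efficiency(0.0, 10.0, 0.01)   # 0.0    (fast but wrong still scores zero)
```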
GEPA optimization discovers prompts that maximize efficiency. Patterns like "use 3-5 focused tasks" and "stop when returns diminish" emerge naturally from optimizing accuracy / (time × cost) — they're not hardcoded.
Trade-off: optimizing for accuracy alone ignores cost and produces verbose reports.
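A hedged sketch of handing such a metric to DSPy's GEPA optimizer; the field names (`answer`, `elapsed_s`, `cost_usd`) and the bookkeeping are assumptions, not eval.py's actual interface:

```python
import dspy

def efficiency_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    # Assumed fields: `answer` on both records, per-run time/cost on the prediction.
    accuracy = float(pred.answer == gold.answer)
    return accuracy / max(pred.elapsed_s * pred.cost_usd, 1e-9)  # avoid divide-by-zero

optimizer = dspy.GEPA(
    metric=efficiency_metric,
    auto="light",                           # budget preset
    reflection_lm=dspy.LM("openai/gpt-4o")  # model that proposes prompt edits
)
# optimized_agent = optimizer.compile(agent, trainset=train, valset=val)
```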
Run the lead agent from the repository root.
```bash
uv run python agent.py --query "Summarize recent innovations in AI agent collaboration frameworks."
```

Swap models with the preset flag when you want to test different providers.
```bash
uv run python agent.py --model kimi-k2 --query "Summarize recent innovations in AI agent collaboration frameworks."
```

CLI runs emit structured traces under `logs/` by default.
Use `TRACE_LOG_FILENAME` to pick an easy-to-remember name.

```bash
TRACE_LOG_FILENAME=trace-ai-collab.log LOG_LEVEL=DEBUG \
  uv run python agent.py --query "Summarize recent innovations in AI agent collaboration frameworks."
```

The example above writes `logs/trace-ai-collab.log`. Create directories ahead of time if you specify a path.
Run BrowseComp evaluation with efficiency metrics and GEPA optimization.
```bash
# Basic evaluation
uv run python eval.py

# Custom settings
uv run python eval.py --num-examples 20 --metric accuracy

# GEPA optimization (auto train/test split)
uv run python eval.py --optimize --optimize-steps 10

# Save results
uv run python eval.py --save-metrics results.json
```

Metrics:
- `accuracy`: binary correctness (1.0 or 0.0)
- `efficiency`: accuracy / (time × cost) (the default)
Cost config (`.env`):

```bash
WEBSEARCH_COST_PER_CALL_USD=0.005
LM_COST_PER_1K_TOKENS_JSON='{"openai/gpt-4o": 0.005}'
```

See `.env.template` for a full example.
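A sketch of how these two settings might combine into a per-run dollar cost; the accounting here is illustrative, not eval.py's exact logic:

```python
import json
import os

search_rate = float(os.getenv("WEBSEARCH_COST_PER_CALL_USD", "0.005"))
lm_rates = json.loads(os.getenv("LM_COST_PER_1K_TOKENS_JSON", "{}"))

def run_cost_usd(model: str, tokens: int, searches: int) -> float:
    """Dollar cost of one run: LM tokens plus web-search calls."""
    return tokens / 1000 * lm_rates.get(model, 0.0) + searches * search_rate

# With the example rates above: 12k tokens at $0.005/1k plus 5 searches at $0.005 each.
run_cost_usd("openai/gpt-4o", tokens=12_000, searches=5)  # 0.085 when the rate is set
```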
License: MIT