Pull requests: eval-protocol/python-sdk

Pull requests list

Klavis Sandbox on Fireworks EP
#388 opened Dec 25, 2025 by zihaolin96
enforce single evaluator upload per command
#387 opened Dec 24, 2025 by benjibc
updated tests
#382 opened Dec 18, 2025 by shreymodi1
Trail proxy setup
#380 opened Dec 17, 2025 by xiaoyifan
support extra headers
#373 opened Dec 15, 2025 by benjibc
warn if large datasets + force 1 run
#365 opened Dec 12, 2025 by xzrderek
Shrey/modelquality
#353 opened Dec 2, 2025 by shreymodi1
18 tasks
support for tokenids logprobs
#350 opened Nov 26, 2025 by shreymodi1
18 tasks
calibration evaluator
#345 opened Nov 24, 2025 by benjibc Draft
18 tasks
adding response quality validation for retry
#344 opened Nov 24, 2025 by morgendave
10 tasks
tests fix
#341 opened Nov 21, 2025 by shreymodi1
18 tasks
Shrey/trl
#335 opened Nov 17, 2025 by shreymodi1
18 tasks
Update Klavis MCP use case
#330 opened Nov 14, 2025 by LLiuZheng
Text to SQL RFT example
#324 opened Nov 10, 2025 by benjibc
swe-bench
#280 opened Oct 15, 2025 by shreymodi1
reasoning effort string change
#267 opened Oct 10, 2025 by shreymodi1
18 tasks
reuse pydantic example for local model picking
#251 opened Oct 5, 2025 by benjibc
pyyaml removal step 1
#247 opened Oct 3, 2025 by benjibc
directly hit enter to select
#245 opened Oct 2, 2025 by benjibc
auto convert from dict
#239 opened Sep 30, 2025 by mayinghan
18 tasks
Route benchmark datasets through data loaders codex
#229 opened Sep 27, 2025 by benjibc