@benjibc benjibc commented Dec 16, 2025

Summary

  • point the local filesystem dataset logger at the shared directory utils so it can be imported without sqlite
  • update the GEval logprob example to run with a filesystem logger and Fireworks/OpenAI completion parameters
  • capture JSONL artifacts from running the example with OpenAI and Fireworks models

Testing

  • HOME=/workspace/python-sdk/.ep_home_openai EP_COMPLETION_PARAMS='[{"model":"gpt-3.5-turbo","logprobs":true,"top_logprobs":3}]' python -m pytest examples/deepeval/test_geval_with_logprobs.py -k test_geval_with_logprobs -vv --disable-warnings --maxfail=1
  • HOME=/workspace/python-sdk/.ep_home_fireworks EP_COMPLETION_PARAMS='[{"model":"accounts/fireworks/models/qwen3-8b","logprobs":true,"api_base":"https://api.fireworks.ai/inference/v1","custom_llm_provider":"fireworks_ai"}]' python -m pytest examples/deepeval/test_geval_with_logprobs.py -k test_geval_with_logprobs -vv --disable-warnings --maxfail=1
  • pre-commit run --files eval_protocol/dataset_logger/local_fs_dataset_logger_adapter.py examples/deepeval/test_geval_with_logprobs.py
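
For readers unfamiliar with the `EP_COMPLETION_PARAMS` values used in the commands above: each value is a JSON array of completion-parameter objects. The sketch below only illustrates that shape. `parse_completion_params` is a hypothetical helper written for this note, and how the SDK itself reads the variable is not shown in this PR description.

```python
import json
from typing import Any, Dict, List


def parse_completion_params(raw: str) -> List[Dict[str, Any]]:
    """Parse a JSON string shaped like the EP_COMPLETION_PARAMS values above.

    Hypothetical helper for illustration only; not the SDK's own parser.
    """
    params = json.loads(raw)
    # Tolerate a single object by wrapping it in a list (an assumption).
    if isinstance(params, dict):
        params = [params]
    if not isinstance(params, list) or not all(isinstance(p, dict) for p in params):
        raise ValueError("expected a JSON array of objects")
    return params


# Same JSON payload as the OpenAI test command above.
openai_params = parse_completion_params(
    '[{"model":"gpt-3.5-turbo","logprobs":true,"top_logprobs":3}]'
)
```

JSON `true` becomes Python `True`, so `openai_params[0]["logprobs"]` is a real boolean rather than a string.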

Codex Task


Note

Captures provider logprobs on assistant messages during rollouts, updates models and logger import, and adds a GEval example with JSONL artifacts plus tests.

  • Evaluation pipeline:
    • Attach provider logprobs to the assistant Message during single-turn rollouts (eval_protocol/pytest/default_single_turn_rollout_process.py), using _serialize_logprobs to produce JSON-safe payloads.
    • Extend the Message model with an optional logprobs field that is excluded from request payloads (eval_protocol/models.py).
  • Dataset logger:
    • Point local FS logger to shared directory utils (eval_protocol/dataset_logger/local_fs_dataset_logger_adapter.py).
  • Examples & Artifacts:
    • Add GEval example that logs to local FS and forwards final-message logprobs into metric data (examples/deepeval/test_geval_with_logprobs.py).
    • Include JSONL artifacts for OpenAI, Fireworks, and combined runs (examples/deepeval/artifacts/*.jsonl).
  • Tests:
    • Add unit test ensuring rollout captures and stores logprobs (tests/test_rollout_logprobs.py).
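
The two evaluation-pipeline changes listed above can be pictured with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: `to_json_safe` stands in for the idea behind `_serialize_logprobs`, `SketchMessage` is a plain dataclass rather than the real `Message` model in eval_protocol/models.py, and `FakeTokenLogprob` imitates a provider object. The actual implementations in this PR may differ.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


def to_json_safe(value: Any) -> Any:
    """Recursively convert a provider logprobs object into plain JSON types."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, dict):
        return {str(k): to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(v) for v in value]
    if hasattr(value, "model_dump"):  # pydantic-style objects
        return to_json_safe(value.model_dump())
    if hasattr(value, "__dict__"):  # plain attribute-bearing objects
        return to_json_safe(vars(value))
    return str(value)  # last resort: keep a readable representation


@dataclass
class SketchMessage:
    """Stand-in for a chat message carrying optional logprobs."""

    role: str
    content: str
    logprobs: Optional[Any] = None

    def to_request_payload(self) -> Dict[str, Any]:
        # logprobs is response metadata, so it is left out of outgoing requests.
        return {"role": self.role, "content": self.content}


class FakeTokenLogprob:
    """Imitates a provider token-logprob object (hypothetical)."""

    def __init__(self, token: str, logprob: float) -> None:
        self.token = token
        self.logprob = logprob


raw_logprobs = {"content": [FakeTokenLogprob("Hi", -0.12)]}
msg = SketchMessage(role="assistant", content="Hi", logprobs=to_json_safe(raw_logprobs))

# The stored row keeps logprobs; the request payload does not.
stored_row = json.dumps({"role": msg.role, "content": msg.content, "logprobs": msg.logprobs})
request_payload = msg.to_request_payload()
```

Keeping the serialized logprobs on the stored message while dropping them from the request payload means logged rows stay JSON-serializable and follow-up requests do not echo response metadata back to the provider.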

Written by Cursor Bugbot for commit ff210d4. This will update automatically on new commits.