Skip to content

Conversation

@vincentkoc
Copy link
Member

Warning

Large base64 data such as video should ideally be moved to a blob store and appropriate routes, handlers and retention configured. For test purposes this will work but in large volumes could be unstable.

Note

The UI rendering in the FE is part of #3988

Details

Add support for video based online evals in SDK (LLMaaj).

Change checklist

  • User facing
  • Documentation update

Issues

  • OPIK-2880

Testing

Testing requires a local inference model, for this I have been using Ollama with vLLM endpoint. For cloud testing you can tunnel using ngrok. Ollama support added here ollama/ollama#12962 local build can be provided if not merged.

Documentation

Updated in #3988 (OPIK-2940)

Examples

In the UI:
Screenshot 2025-11-07 at 14 59 13

In the CLI:
Screenshot 2025-11-07 at 15 08 37

vincentkoc and others added 30 commits November 7, 2025 10:06
…to feat/video-eval

* 'feat/video-eval' of https://github.com/comet-ml/opik: (31 commits)
  [NA] Add global error handler for unhandled exceptions and return JSON response (#3835)
  [NA] [BE] Add configurable HTTP ports for application and admin connectors (#3984)
  Update base version to 1.9.4
  Update TypeScript SDK version to 1.9.3
  [NA] Update base version to 1.9.3 (#3979)
  [OPIK-2833] [FE] Revert recent changes (#3978)
  [OPIK-2714] Opik REST API parity with Python SDK - find_dataset_items_with_experiment_items (#3966)
  Update TypeScript SDK version to 1.9.2
  Update version.txt (#3976)
  [NA] [CLI] Added opik usage-report (#3968)
  Disable metrics loggign in pydantic-ai configuration guide (#3975)
  Update TypeScript SDK version to 1.9.1
  [NA] [BE] Update model prices file (#3971)
  [NA] [SDK] [DOCS] Update automatically OpenAPI spec and Fern code (#3970)
  [NA] Add infrastructure check to --quick-restart in dev-runner scripts (#3963)
  [OPIK-0000] [P SDK] Alexkuzmik / make litellm chat model use new decorator instead of callback (#3950)
  [OPIK-2496] [BE] [SDK] Add provider info to token usage section in traces metadata (#3731)
  [OPIK-2794] [FE] Add tooltips to Annotation Queues UI elements (#3949)
  Update base version to 1.9.1
  [OPIK-2856] [BE] Add UUIDv7 time-based filtering for trace threads (#3953)
  ...
@vincentkoc vincentkoc changed the title [OPIK-2880] [FE][BE][DOCS] Add support for Video LLM-as-a-judge Python SDK [OPIK-2880] [FE][BE][DOCS] Video: Add support for Video LLM-as-a-judge Python SDK Nov 8, 2025
@comet-ml comet-ml deleted a comment from github-actions bot Nov 8, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Nov 8, 2025

Backend Tests Results

  283 files    283 suites   49m 40s ⏱️
5 424 tests 5 416 ✅ 8 💤 0 ❌
5 423 runs  5 415 ✅ 8 💤 0 ❌

Results for commit 6af3039.

@vincentkoc vincentkoc changed the title [OPIK-2880] [FE][BE][DOCS] Video: Add support for Video LLM-as-a-judge Python SDK [OPIK-2880] [SDK][DOCS] Video: Add support for Video LLM-as-a-judge Python SDK Nov 8, 2025
@vincentkoc vincentkoc marked this pull request as ready for review November 8, 2025 01:04
@vincentkoc vincentkoc requested a review from a team as a code owner November 8, 2025 01:04
Copilot AI review requested due to automatic review settings November 8, 2025 01:04
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for video-based LLM-as-a-judge evaluations in the Python SDK. The implementation extends the existing multimodal content system to handle video URLs alongside image URLs, enabling video-capable models to process video content in prompts.

Key changes:

  • Added video capability detection for models (similar to existing vision detection)
  • Extended prompt template system to handle video_url content parts
  • Added video modality support throughout the rendering pipeline

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
model_capabilities.py Added video_capability_detector function and registry integration for detecting video-capable models
evaluator.py Extended prompt evaluation to include video capability detection alongside vision
types.py Added "video" to the ModalityName literal type
chat_prompt_template.py Added render_video_url_part function and registered it as a default renderer
chat_content_renderer_registry.py Added video placeholder configuration and extraction logic for video URLs
test_model_capabilities.py Added unit tests for video capability detection
test_message_renderer.py Added comprehensive tests for video content rendering and flattening

@vincentkoc vincentkoc changed the title [OPIK-2880] [SDK][DOCS] Video: Add support for Video LLM-as-a-judge Python SDK [OPIK-2880] [SDK] Video: Add support for Video LLM-as-a-judge Python SDK Nov 8, 2025
@comet-ml comet-ml deleted a comment from github-actions bot Nov 8, 2025
@alexkuzmik alexkuzmik merged commit bf72eb1 into main Nov 12, 2025
99 checks passed
@alexkuzmik alexkuzmik deleted the feat/video-eval-sdk branch November 12, 2025 11:06
vincentkoc added a commit that referenced this pull request Nov 13, 2025
…/opik into feat/video-costtracking

* 'feat/video-costtracking' of https://github.com/comet-ml/opik: (60 commits)
  Update base version to 1.9.10
  [OPIK-2856] [FE] Hide All time option in metrics tab and support optional date filtering (#4052)
  Update TypeScript SDK version to 1.9.9
  Issue-4033: add option to override secretStoreRef name and kind (#4049)
  [OPIK-2987][FE] Alexkuzmik/change default experiments grouping (#3962)
  [NA] [SDK] Relax litellm dependencies (#4045)
  Update base version to 1.9.9
  [OPIK-2856] [BE] Use batch calls to reduce test duration (#4034)
  Update TypeScript SDK version to 1.9.8
  [OPIK-3050] [FE] Update LangGraph and LangChain message prettification to use last human message (#4043)
  [OPIK-2880] [SDK] Video: Add support for Video LLM-as-a-judge Python SDK (#3993)
  Bump minimum version for Opik SDK (#4042)
  bump optimizer version (#4041)
  [NA] [BE] Migrate to Jetty CrossOriginHandler and fix CORS behavior parity (#4040)
  [OPIK-2856] [FE] Add datetime picker to traces, spans, and threads tabs (#3977)
  [OPIK-2856] [BE] Implement UUIDv7-based time filtering for project metrics (#3969)
  [NA] [SDK] [DOCS] Update automatically OpenAPI spec and Fern code (#4037)
  [NA] [BE] Update model prices file (#4038)
  Update base version to 1.9.8
  Update doc links (#4035)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants