[OPIK-2880] [SDK] Video: Add support for Video LLM-as-a-judge Python SDK #3993

vincentkoc · 2025-11-08T00:04:54Z

Warning

Large base64 data such as video should ideally be moved to a blob store and appropriate routes, handlers and retention configured. For test purposes this will work but in large volumes could be unstable.

Note

The UI rendering in the FE is part of #3988

Details

Add support for video based online evals in SDK (LLMaaj).

Change checklist

User facing
Documentation update

Issues

OPIK-2880

Testing

Testing requires a local inference model, for this I have been using Ollama with vLLM endpoint. For cloud testing you can tunnel using ngrok. Ollama support added here ollama/ollama#12962 local build can be provided if not merged.

Documentation

Updated in #3988 (OPIK-2940)

Examples

In the UI:

In the CLI:

…to feat/video-eval * 'feat/video-eval' of https://github.com/comet-ml/opik: (31 commits) [NA] Add global error handler for unhandled exceptions and return JSON response (#3835) [NA] [BE] Add configurable HTTP ports for application and admin connectors (#3984) Update base version to 1.9.4 Update TypeScript SDK version to 1.9.3 [NA] Update base version to 1.9.3 (#3979) [OPIK-2833] [FE] Revert recent changes (#3978) [OPIK-2714] Opik REST API parity with Python SDK - find_dataset_items_with_experiment_items (#3966) Update TypeScript SDK version to 1.9.2 Update version.txt (#3976) [NA] [CLI] Added opik usage-report (#3968) Disable metrics loggign in pydantic-ai configuration guide (#3975) Update TypeScript SDK version to 1.9.1 [NA] [BE] Update model prices file (#3971) [NA] [SDK] [DOCS] Update automatically OpenAPI spec and Fern code (#3970) [NA] Add infrastructure check to --quick-restart in dev-runner scripts (#3963) [OPIK-0000] [P SDK] Alexkuzmik / make litellm chat model use new decorator instead of callback (#3950) [OPIK-2496] [BE] [SDK] Add provider info to token usage section in traces metadata (#3731) [OPIK-2794] [FE] Add tooltips to Annotation Queues UI elements (#3949) Update base version to 1.9.1 [OPIK-2856] [BE] Add UUIDv7 time-based filtering for trace threads (#3953) ...

Co-authored-by: Copilot <[email protected]>

…to feat/video-eval * 'feat/video-eval' of https://github.com/comet-ml/opik: Update apps/opik-frontend/src/lib/modelCapabilities.ts

github-actions · 2025-11-08T00:34:12Z

Backend Tests Results

283 files 283 suites 49m 40s ⏱️
5 424 tests 5 416 ✅ 8 💤 0 ❌
5 423 runs 5 415 ✅ 8 💤 0 ❌

Results for commit 6af3039.

Copilot

Pull Request Overview

This PR adds support for video-based LLM-as-a-judge evaluations in the Python SDK. The implementation extends the existing multimodal content system to handle video URLs alongside image URLs, enabling video-capable models to process video content in prompts.

Key changes:

Added video capability detection for models (similar to existing vision detection)
Extended prompt template system to handle video_url content parts
Added video modality support throughout the rendering pipeline

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`model_capabilities.py`	Added `video_capability_detector` function and registry integration for detecting video-capable models
`evaluator.py`	Extended prompt evaluation to include video capability detection alongside vision
`types.py`	Added "video" to the `ModalityName` literal type
`chat_prompt_template.py`	Added `render_video_url_part` function and registered it as a default renderer
`chat_content_renderer_registry.py`	Added video placeholder configuration and extraction logic for video URLs
`test_model_capabilities.py`	Added unit tests for video capability detection
`test_message_renderer.py`	Added comprehensive tests for video content rendering and flattening

sdks/python/src/opik/evaluation/models/model_capabilities.py

sdks/python/src/opik/api_objects/prompt/chat_prompt_template.py

…/opik into feat/video-costtracking * 'feat/video-costtracking' of https://github.com/comet-ml/opik: (60 commits) Update base version to 1.9.10 [OPIK-2856] [FE] Hide All time option in metrics tab and support optional date filtering (#4052) Update TypeScript SDK version to 1.9.9 Issue-4033: add option to override secretStoreRef name and kind (#4049) [OPIK-2987][FE] Alexkuzmik/change default experiments grouping (#3962) [NA] [SDK] Relax litellm dependencies (#4045) Update base version to 1.9.9 [OPIK-2856] [BE] Use batch calls to reduce test duration (#4034) Update TypeScript SDK version to 1.9.8 [OPIK-3050] [FE] Update LangGraph and LangChain message prettification to use last human message (#4043) [OPIK-2880] [SDK] Video: Add support for Video LLM-as-a-judge Python SDK (#3993) Bump minimum version for Opik SDK (#4042) bump optimizer version (#4041) [NA] [BE] Migrate to Jetty CrossOriginHandler and fix CORS behavior parity (#4040) [OPIK-2856] [FE] Add datetime picker to traces, spans, and threads tabs (#3977) [OPIK-2856] [BE] Implement UUIDv7-based time filtering for project metrics (#3969) [NA] [SDK] [DOCS] Update automatically OpenAPI spec and Fern code (#4037) [NA] [BE] Update model prices file (#4038) Update base version to 1.9.8 Update doc links (#4035) ...

vincentkoc and others added 30 commits November 7, 2025 10:06

Update ModelCostData.java

1bf7f07

feat: BE support online eval with video

3240de3

Update ModelCapabilitiesTest.java

3c6e831

Update MessageContentNormalizerTest.java

7f075ec

feat: BE llm message type for video

2af6e18

chore: type for images in FE

620cb49

Update modelCapabilities.ts

914764e

feat: FE llm video type

0521e58

Update useMessageContent.ts

40765d9

chore: media on prompt improvement panel

1e4e7eb

feat: video on prompts page

248d7e2

feat: video on playground

fcd3f62

chore: video datasets

1895068

chore: video on experiments

fee3276

feat: attachments for video

1919f1e

feat: judge using video

03276d5

Update chat_content_renderer_registry.py

405c099

Update chat_prompt_template.py

63e3715

Update types.py

dcbdf97

Update evaluator.py

7e82ba9

feat: SDK model capabilities

e56867b

Update OnlineScoringEngine.java

6420e9f

Update MessageContentNormalizer.java

6ab3021

chore: supports vision is mapped

d059f51

Merge branch 'main' into feat/video-eval

2eb230a

Update MessageContentNormalizer.java

8b2c358

chore: lint

5cd8534

Update model_capabilities.py

a48c229

Update apps/opik-frontend/src/lib/modelCapabilities.ts

4fdbf20

Co-authored-by: Copilot <[email protected]>

vincentkoc added 3 commits November 7, 2025 15:37

chore: copilot fixes

081f456

Merge branch 'feat/video-eval' of https://github.com/comet-ml/opik in…

0d6ab81

…to feat/video-eval * 'feat/video-eval' of https://github.com/comet-ml/opik: Update apps/opik-frontend/src/lib/modelCapabilities.ts

Update evaluate_multimodal.mdx

6af3039

github-actions bot assigned vincentkoc Nov 8, 2025

vincentkoc changed the title ~~[OPIK-2880] [FE][BE][DOCS] Add support for Video LLM-as-a-judge Python SDK~~ [OPIK-2880] [FE][BE][DOCS] Video: Add support for Video LLM-as-a-judge Python SDK Nov 8, 2025

vincentkoc mentioned this pull request Nov 8, 2025

[OPIK-3045] [BE] Video: Add Cost Tracking for Video Models #3995

Merged

2 tasks

comet-ml deleted a comment from github-actions bot Nov 8, 2025

vincentkoc changed the title ~~[OPIK-2880] [FE][BE][DOCS] Video: Add support for Video LLM-as-a-judge Python SDK~~ [OPIK-2880] [SDK][DOCS] Video: Add support for Video LLM-as-a-judge Python SDK Nov 8, 2025

vincentkoc and others added 2 commits November 7, 2025 17:00

chore: reset apps from main

6bf83dd

Merge branch 'main' into feat/video-eval-sdk

4c1a263

vincentkoc marked this pull request as ready for review November 8, 2025 01:04

vincentkoc requested a review from a team as a code owner November 8, 2025 01:04

Copilot AI review requested due to automatic review settings November 8, 2025 01:04

Copilot AI reviewed Nov 8, 2025

View reviewed changes

sdks/python/src/opik/evaluation/models/model_capabilities.py Show resolved Hide resolved

sdks/python/src/opik/api_objects/prompt/chat_prompt_template.py Show resolved Hide resolved

vincentkoc changed the title ~~[OPIK-2880] [SDK][DOCS] Video: Add support for Video LLM-as-a-judge Python SDK~~ [OPIK-2880] [SDK] Video: Add support for Video LLM-as-a-judge Python SDK Nov 8, 2025

comet-ml deleted a comment from github-actions bot Nov 8, 2025

docs: explain video metadata usage

3e53782

vincentkoc mentioned this pull request Nov 10, 2025

[NA] [SDK] Multimodal Opik Optimizer #4003

Closed

2 tasks

alexkuzmik approved these changes Nov 12, 2025

View reviewed changes

alexkuzmik merged commit bf72eb1 into main Nov 12, 2025
99 checks passed

alexkuzmik deleted the feat/video-eval-sdk branch November 12, 2025 11:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[OPIK-2880] [SDK] Video: Add support for Video LLM-as-a-judge Python SDK #3993

[OPIK-2880] [SDK] Video: Add support for Video LLM-as-a-judge Python SDK #3993

Uh oh!

vincentkoc commented Nov 8, 2025

Uh oh!

github-actions bot commented Nov 8, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[OPIK-2880] [SDK] Video: Add support for Video LLM-as-a-judge Python SDK #3993

[OPIK-2880] [SDK] Video: Add support for Video LLM-as-a-judge Python SDK #3993

Uh oh!

Conversation

vincentkoc commented Nov 8, 2025

Details

Change checklist

Issues

Testing

Documentation

Examples

Uh oh!

github-actions bot commented Nov 8, 2025

Backend Tests Results

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants