[OPIK-2880] [SDK] Video: Add support for Video LLM-as-a-judge Python SDK #3993
Backend Tests Results: 283 files, 283 suites, 49m 40s ⏱️. Results for commit 6af3039.
Pull Request Overview
This PR adds support for video-based LLM-as-a-judge evaluations in the Python SDK. The implementation extends the existing multimodal content system to handle video URLs alongside image URLs, enabling video-capable models to process video content in prompts.
Key changes:
- Added video capability detection for models (similar to existing vision detection)
- Extended the prompt template system to handle `video_url` content parts
- Added video modality support throughout the rendering pipeline
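To make the second bullet concrete, a `video_url` content part would presumably sit alongside text parts in a multimodal message, mirroring the existing `image_url` convention. This is a hypothetical sketch of that shape; the exact field names in the Opik SDK may differ.

```python
# Hypothetical multimodal chat message mixing a text part and a
# video_url part, patterned on the image_url content-part convention.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Does this clip show unsafe behavior?"},
        {"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}},
    ],
}

# A renderer/flattener would pick out the video parts for video-capable models.
video_parts = [p for p in message["content"] if p["type"] == "video_url"]
print(len(video_parts))  # 1
```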
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Summary per file:

| File | Description |
|---|---|
| `model_capabilities.py` | Added `video_capability_detector` function and registry integration for detecting video-capable models |
| `evaluator.py` | Extended prompt evaluation to include video capability detection alongside vision |
| `types.py` | Added `"video"` to the `ModalityName` literal type |
| `chat_prompt_template.py` | Added `render_video_url_part` function and registered it as a default renderer |
| `chat_content_renderer_registry.py` | Added video placeholder configuration and extraction logic for video URLs |
| `test_model_capabilities.py` | Added unit tests for video capability detection |
| `test_message_renderer.py` | Added comprehensive tests for video content rendering and flattening |
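The detector-plus-registry pattern described for `model_capabilities.py` can be sketched roughly as below. The registry API and the model-name patterns here are assumptions for illustration, not the SDK's actual implementation or model list.

```python
from typing import Callable, Dict

CapabilityDetector = Callable[[str], bool]
_DETECTORS: Dict[str, CapabilityDetector] = {}

def register_capability(name: str, detector: CapabilityDetector) -> None:
    """Register a detector for a named capability (e.g. 'vision', 'video')."""
    _DETECTORS[name] = detector

def video_capability_detector(model_name: str) -> bool:
    # Rough heuristic: match known video-capable model families.
    # The real detector would use a maintained model list.
    return any(hint in model_name.lower() for hint in ("gemini", "qwen2-vl"))

register_capability("video", video_capability_detector)

def supports(model_name: str, capability: str) -> bool:
    detector = _DETECTORS.get(capability)
    return detector(model_name) if detector else False

print(supports("gemini-1.5-pro", "video"))  # True under this heuristic
print(supports("gpt-3.5-turbo", "video"))   # False under this heuristic
```

Keeping detection behind a registry lets the evaluator ask a single `supports(model, "video")` question without hard-coding modality checks at each call site.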
Warning
Large base64 payloads such as video should ideally be moved to a blob store, with appropriate routes, handlers, and retention configured. This works for test purposes, but at large volumes it could be unstable.
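The mitigation suggested in the warning could look roughly like this sketch: decode the inline base64 payload, persist it under a content-addressed key (a local directory stands in for the blob store here), and keep only a reference in the trace. Function and URL-scheme names are hypothetical.

```python
import base64
import hashlib
import pathlib
import tempfile

def offload_base64_video(data_url: str, store: pathlib.Path) -> str:
    """Persist an inline base64 video and return a lightweight reference.

    data_url looks like "data:video/mp4;base64,<payload>".
    """
    _header, payload = data_url.split(",", 1)
    raw = base64.b64decode(payload)
    # Content-addressed key: same bytes always map to the same blob.
    key = hashlib.sha256(raw).hexdigest()[:16] + ".mp4"
    store.mkdir(parents=True, exist_ok=True)
    (store / key).write_bytes(raw)
    return f"blob://{key}"  # store this reference instead of the raw bytes

store = pathlib.Path(tempfile.mkdtemp())
ref = offload_base64_video(
    "data:video/mp4;base64," + base64.b64encode(b"fake-bytes").decode(),
    store,
)
print(ref)
```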
Note
The UI rendering in the FE is part of #3988
Details
Add support for video-based online evals in the SDK (LLM-as-a-judge).
Change checklist
Issues
Testing
Testing requires a local inference model; for this I have been using Ollama with a vLLM endpoint. For cloud testing you can tunnel using ngrok. Ollama support was added in ollama/ollama#12962; a local build can be provided if it has not been merged.
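The setup described above amounts to choosing which base URL the judge model calls: the local server directly, or an ngrok tunnel when testing from the cloud. A minimal sketch, assuming Ollama's default port (11434) and a placeholder tunnel hostname:

```python
# Assumption: a local OpenAI-compatible server (e.g. Ollama) on port 11434,
# optionally exposed via an ngrok tunnel for cloud-side testing.
LOCAL_BASE_URL = "http://localhost:11434/v1"

def judge_base_url(use_tunnel: bool, tunnel_host: str = "example.ngrok.app") -> str:
    """Pick the endpoint the LLM-as-a-judge client should target."""
    return f"https://{tunnel_host}/v1" if use_tunnel else LOCAL_BASE_URL

print(judge_base_url(False))  # local development
print(judge_base_url(True))   # cloud testing through the tunnel
```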
Documentation
Updated in #3988 (OPIK-2940)
Examples
In the UI:

In the CLI:
