OVFM is a foundation model for understanding ophthalmic surgical videos. It supports the following downstream tasks:
- 🔍 Surgical Step Recognition
- 🔧 Tool Existence Recognition
- ⚠️ Complication Detection
- 📊 Surgical Skill Assessment
- 🎨 Surgical Scene Segmentation
- 🎯 Limbus Boundary Segmentation
- 📍 Nucleus Block Localization
Create the conda environment using the provided configuration:
conda env create -f environment.yaml
conda activate ovfm

Compress original videos using FFmpeg for efficient storage:
python data/pretrain/video_compression.py
Configuration:
- src_main_folder: Path to original videos
- dst_main_folder: Path to save compressed videos
- ffmpeg_path: Path to ffmpeg.exe
💡 Note: Install FFmpeg from https://www.ffmpeg.org/
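The three settings above map naturally onto an FFmpeg invocation. The sketch below is not the repository's `video_compression.py`. It only illustrates how such a script might walk `src_main_folder` and write to `dst_main_folder`. The codec (`libx264`), the CRF value, the `*.mp4` pattern, and the helper names `build_ffmpeg_command` and `compress_all` are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(ffmpeg_path, src, dst, crf=28):
    """Return the argument list for re-encoding one video with H.264.

    The codec and CRF value are illustrative assumptions, not the
    settings used by the OVFM authors.
    """
    return [
        str(ffmpeg_path),
        "-y",                # overwrite the output file if it exists
        "-i", str(src),      # input video
        "-c:v", "libx264",   # assumed video codec
        "-crf", str(crf),    # higher CRF means smaller files, lower quality
        str(dst),
    ]


def compress_all(src_main_folder, dst_main_folder, ffmpeg_path, pattern="*.mp4"):
    """Compress every matching video, mirroring the source folder layout."""
    src_root, dst_root = Path(src_main_folder), Path(dst_main_folder)
    for src in sorted(src_root.rglob(pattern)):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_ffmpeg_command(ffmpeg_path, src, dst), check=True)
```

Keeping the command construction separate from the folder walk makes it possible to inspect the exact FFmpeg arguments before running a long batch job.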
Downsample surgical videos into clips:
python data/pretrain/generate_video_clips.py
Configuration:
- input_folders: List of all compressed video paths
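To make the idea of downsampling a video into clips concrete, the following sketch computes which frame indices would end up in each clip. It is an illustration only: the clip length, the sampling stride, and the function name `clip_frame_indices` are hypothetical, and `generate_video_clips.py` may sample differently.

```python
def clip_frame_indices(total_frames, clip_len=16, stride=4):
    """Split a video into non-overlapping clips of temporally downsampled frames.

    Every `stride`-th frame is kept, and the kept frames are grouped into
    clips of `clip_len` frames. A trailing group shorter than `clip_len`
    is dropped. Both parameters are assumptions for illustration.
    """
    kept = list(range(0, total_frames, stride))
    clips = []
    for start in range(0, len(kept) - clip_len + 1, clip_len):
        clips.append(kept[start:start + clip_len])
    return clips
```

For example, `clip_frame_indices(20, clip_len=2, stride=4)` returns `[[0, 4], [8, 12]]`: frame 16 is kept by the stride but dropped because it cannot fill a complete clip.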
Create the pretraining index file:
python data/pretrain/generate_pretraining_csv.py
This generates train.csv, which indexes the video clip paths.
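The exact column layout of `train.csv` is not documented here. As a rough sketch, assuming one clip path per row, no header, and `.mp4` clips, an index file of this kind could be built as follows. The function name `write_clip_index` is hypothetical.

```python
import csv
from pathlib import Path


def write_clip_index(clip_root, csv_path, pattern="*.mp4"):
    """Write one clip path per row to `csv_path` and return the row count.

    The single-column, header-free layout is an assumption; check the
    file produced by generate_pretraining_csv.py for the real format.
    """
    clip_paths = sorted(str(p) for p in Path(clip_root).rglob(pattern))
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for path in clip_paths:
            writer.writerow([path])
    return len(clip_paths)
```

Sorting the paths keeps the index stable across runs, which makes it easier to compare two generated files.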
Detailed data preparation instructions for each downstream task are provided in the Downstream Tasks section.
Download the initial pretrained weights:
# Download from: https://github.com/kahnchana/svt/releases/download/v1.0/kinetics400_vitb_ssl.pth
# Place in: checkpoints/kinetics400_vitb_ssl.pth

Choose the appropriate model size:
OVFM Base Model:
bash scripts/pretrain_train_base.sh
OVFM Small Model:
bash scripts/pretrain_train_small.sh
OVFM Tiny Model:
bash scripts/pretrain_train_tiny.sh

📦 Pretrained Models: Download our pretrained OVFM weights from Google Drive

Dataset Setup:
Create three folders under data/downstream/:
- Aier_Cata
- Cataract101
- Xinhua_Cata
For each dataset:
- Place resized video frames in data_resize_phase_recognition/[video_id]/
- Place label data in phase_annotations_phase_recognition/video[video_id].csv
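Before running the data preparation scripts, it can help to confirm that the folder layout described above is in place. The sketch below is a hypothetical helper, not part of the repository. It takes `data/downstream` as the default root, as stated above; pass a different `root` if your checkout nests the dataset folders elsewhere (for example under `step_recognition/`).

```python
from pathlib import Path

DATASETS = ["Aier_Cata", "Cataract101", "Xinhua_Cata"]
SUBFOLDERS = [
    "data_resize_phase_recognition",        # resized video frames
    "phase_annotations_phase_recognition",  # per-video label CSVs
]


def missing_step_recognition_dirs(root="data/downstream"):
    """Return the expected per-dataset folders that do not exist yet."""
    missing = []
    for name in DATASETS:
        for sub in SUBFOLDERS:
            folder = Path(root) / name / sub
            if not folder.is_dir():
                missing.append(str(folder))
    return missing
```

An empty return value means every expected folder exists; it does not verify the contents of those folders.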
Data Preparation:
# Aier_Cata dataset
python data/downstream/step_recognition/Aier_Cata/generate_video_paths_and_label.py
# Cataract101 dataset
python data/downstream/step_recognition/Cataract101/generate_video_paths_and_label.py
# Xinhua_Cata dataset
python data/downstream/step_recognition/Xinhua_Cata/generate_video_paths_and_label.py
Fine-tuning:
bash scripts/Finetune_step_recognition.sh

Dataset Setup:
- Place downsampled video frames in data/downstream/surgical_tool_existence_recognition/data_resize/
- Place label files in data/downstream/surgical_tool_existence_recognition/label/
Data Preparation:
python data/downstream/surgical_tool_existence_recognition/generate_paths.py
Fine-tuning:
bash scripts/Finetune_tool_existence.sh

Dataset Setup:
Place videos in:
data/downstream/complication_detection/videos/
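The data preparation step for this task produces train and test sets from the videos in this folder. How the repository's script performs the split is not described here, so the following is only a sketch of one common approach: a reproducible split at the video level, which keeps all frames of a given surgery in a single partition. The test fraction, the seed, and the function name `split_videos` are hypothetical.

```python
import random


def split_videos(video_names, test_fraction=0.2, seed=0):
    """Shuffle video names reproducibly and split them into train and test lists.

    Returns (train, test). With a non-empty input, at least one video is
    assigned to the test list. The fraction and seed are assumptions.
    """
    names = sorted(video_names)
    random.Random(seed).shuffle(names)
    n_test = max(1, round(len(names) * test_fraction)) if names else 0
    return names[n_test:], names[:n_test]
```

Sorting before shuffling makes the result independent of the order in which the file system lists the videos.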
Data Preparation:
python data/downstream/complication_detection/generate_train_and_test.py
Fine-tuning:
bash scripts/Finetune_complication_detection.sh

Dataset Setup:
Place videos in:
data/downstream/surgical_skill_assessment/videos/
Data Preparation:
python data/downstream/surgical_skill_assessment/generate_train_and_test.py
Fine-tuning:
bash scripts/Finetune_skill_assesement.sh

Dataset Setup:
Place Cataract dataset scene segmentation data in:
data/downstream/surgical_scene_segmentation/Images-and-Supervisely-Annotations/
Data Preparation:
python data/downstream/surgical_scene_segmentation/Translate_into_label_images.py
python data/downstream/surgical_scene_segmentation/generate_train_test_path.py
Fine-tuning:
python Downstream_tasks/Downstream_scene_segmentation/train.py

Dataset Setup:
Create dataset folders:
data/downstream/segmentation/Cataract101/
data/downstream/segmentation/Xinhua_Cata/
For each dataset:
- Place video frames in data_resize_feature_extractor/
- Place labels in segmentation_labels_test_path/
Data Preparation:
# For Cataract101
python data/downstream/segmentation/Cataract101/generate_train_test_path.py
# For Xinhua_Cata
python data/downstream/segmentation/Xinhua_Cata/generate_train_test_path.py
Fine-tuning:
python Downstream_tasks/Downstream_limbus_segmentation/train.py

Dataset Setup:
Create folder:
data/downstream/nucleus_tracking/frames_with_json_labels/
Place all video frames and corresponding .json annotation files in this folder.
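Because frames and annotations share one folder, a quick pairing check can catch frames that were copied without their labels. The sketch below assumes that each annotation file has the same stem as its frame (for example `0001.jpg` and `0001.json`) and that frames are JPEG or PNG images. Both are assumptions, and `pair_frames_with_labels` is a hypothetical helper rather than a script from the repository.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed frame formats


def pair_frames_with_labels(folder):
    """Return (pairs, unlabeled) for the frames found in `folder`.

    `pairs` lists (frame_name, json_name) for frames with a same-stem
    .json file; `unlabeled` lists frames that have no such file.
    """
    folder = Path(folder)
    pairs, unlabeled = [], []
    for frame in sorted(folder.iterdir()):
        if frame.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label = frame.with_suffix(".json")
        if label.exists():
            pairs.append((frame.name, label.name))
        else:
            unlabeled.append(frame.name)
    return pairs, unlabeled
```

Running it on `data/downstream/nucleus_tracking/frames_with_json_labels/` and inspecting `unlabeled` shows which frames would have no annotation to load.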
Data Preparation:
python data/downstream/necleus_tracking/generate_labels_and_visualization.py
python data/downstream/necleus_tracking/split_dataset.py
Fine-tuning:
bash scripts/Finetune_nucleus_tracking.sh

To perform knowledge distillation for OVFM:
bash scripts/distillation.sh

This project builds upon the excellent work of:
We thank the authors for their valuable contributions to the community.
