OVFM: Ophthalmic Video Foundation Model for Surgical Recognition and Navigation

🎯 Overview

OVFM is a foundation model for understanding ophthalmic surgical videos. It supports the following downstream tasks:

🔍 Surgical Step Recognition
🔧 Tool Existence Recognition
⚠️ Complication Detection
📊 Surgical Skill Assessment
🎨 Surgical Scene Segmentation
🎯 Limbus Boundary Segmentation
📍 Nucleus Block Localization

🚀 Installation

Create the conda environment using the provided configuration:

conda env create -f environment.yaml
conda activate ovfm

📊 Dataset Preparation

Pretraining Data

Step 1: Video Compression

Compress original videos using FFmpeg for efficient storage:

python data/pretrain/video_compression.py

Configuration:

src_main_folder: Path to original videos
dst_main_folder: Path to save compressed videos
ffmpeg_path: Path to the FFmpeg executable (ffmpeg.exe on Windows)

💡 Note: Install FFmpeg from https://www.ffmpeg.org/
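
The compression step can be pictured as one FFmpeg call per source video. The sketch below is not the contents of video_compression.py; it only shows how the three configuration values might fit together. The codec (libx264), the CRF value, the .mp4 suffix, and the helper names build_ffmpeg_command and compress_folder are assumptions chosen for illustration.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(ffmpeg_path, src, dst, crf=28):
    """Return the argument list for re-encoding one video with H.264.

    The codec and CRF value are illustrative assumptions, not the
    settings used by video_compression.py.
    """
    return [
        str(ffmpeg_path),
        "-y",                 # overwrite the output file if it exists
        "-i", str(src),       # input video
        "-c:v", "libx264",    # H.264 video encoder
        "-crf", str(crf),     # higher CRF = smaller file, lower quality
        "-an",                # drop the audio stream
        str(dst),
    ]


def compress_folder(src_main_folder, dst_main_folder, ffmpeg_path, suffix=".mp4"):
    """Compress every video under src_main_folder, mirroring its layout."""
    src_root, dst_root = Path(src_main_folder), Path(dst_main_folder)
    for src in sorted(src_root.rglob(f"*{suffix}")):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_ffmpeg_command(ffmpeg_path, src, dst), check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.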

Step 2: Generate Video Clips

Downsample the compressed surgical videos into clips:

python data/pretrain/generate_video_clips.py

Configuration:

input_folders: List of paths to the compressed videos
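
As a rough picture of what "downsampling into clips" can mean, the sketch below keeps every stride-th frame and groups the kept frames into fixed-length clips. The function name clip_frame_indices and the default values of clip_len and stride are hypothetical; the actual sampling rule is defined in generate_video_clips.py and may differ.

```python
def clip_frame_indices(num_frames, clip_len=16, stride=4):
    """Split a video into non-overlapping clips of temporally downsampled frames.

    Every `stride`-th frame is kept, and the kept frames are grouped into
    clips of `clip_len` frames. A trailing group shorter than `clip_len`
    is dropped.
    """
    kept = list(range(0, num_frames, stride))
    clips = []
    for start in range(0, len(kept) - clip_len + 1, clip_len):
        clips.append(kept[start:start + clip_len])
    return clips
```

For example, clip_frame_indices(100, clip_len=4, stride=5) yields five clips, the first of which covers frames 0, 5, 10, and 15.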

Step 3: Generate Training Index

Create the pretraining index file:

python data/pretrain/generate_pretraining_csv.py

This generates train.csv, an index file listing the paths of the video clips.
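
An index of this kind can be produced by walking the clip folder and writing one path per row. The sketch below assumes a single-column CSV without a header and clips stored as .mp4 files; both are assumptions, as is the helper name write_clip_index. The real column layout is whatever generate_pretraining_csv.py emits.

```python
import csv
from pathlib import Path


def write_clip_index(clip_root, csv_path, suffix=".mp4"):
    """Write one clip path per row and return the number of rows written."""
    clips = sorted(Path(clip_root).rglob(f"*{suffix}"))
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for clip in clips:
            writer.writerow([clip.as_posix()])
    return len(clips)
```

Sorting the paths before writing keeps the index stable across runs, which makes training runs easier to compare.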

Downstream Task Data

Detailed data preparation instructions for each downstream task are provided in the Downstream Tasks section.

🎓 Pretraining

Download Pretrained Weights

Download the initial pretrained weights:

# Download from: https://github.com/kahnchana/svt/releases/download/v1.0/kinetics400_vitb_ssl.pth
# Place in: checkpoints/kinetics400_vitb_ssl.pth
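
The download can also be scripted. The sketch below uses only the Python standard library; the URL and the target folder come from the comments above, while the helper names checkpoint_path and download_weights are hypothetical.

```python
import urllib.request
from pathlib import Path

SVT_WEIGHTS_URL = (
    "https://github.com/kahnchana/svt/releases/download/v1.0/"
    "kinetics400_vitb_ssl.pth"
)


def checkpoint_path(url, checkpoint_dir="checkpoints"):
    """Return the local path for a URL: <checkpoint_dir>/<file name in URL>."""
    return Path(checkpoint_dir) / url.rsplit("/", 1)[-1]


def download_weights(url=SVT_WEIGHTS_URL, checkpoint_dir="checkpoints"):
    """Download the weights unless the file is already present."""
    dst = checkpoint_path(url, checkpoint_dir)
    if not dst.exists():
        dst.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dst))
    return dst
```

Calling download_weights() from the repository root places the file at checkpoints/kinetics400_vitb_ssl.pth and skips the download on later runs.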

Start Pretraining

Choose the appropriate model size:

OVFM Base Model:

bash scripts/pretrain_train_base.sh

OVFM Small Model:

bash scripts/pretrain_train_small.sh

OVFM Tiny Model:

bash scripts/pretrain_train_tiny.sh

📦 Pretrained Models: Download our pretrained OVFM weights from Google Drive

🔬 Downstream Tasks

4.1 Surgical Step Recognition

Dataset Setup:

Create three folders under data/downstream/:

Aier_Cata
Cataract101
Xinhua_Cata

For each dataset:

Place resized video frames in data_resize_phase_recognition/[video_id]/
Place label data in phase_annotations_phase_recognition/video[video_id].csv
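
Before running the preparation scripts, it can help to check that every frame folder has a matching annotation file. The sketch below follows the folder and file naming described above; the helper name pair_frames_with_labels is hypothetical, and the check is not part of the repository's scripts.

```python
from pathlib import Path


def pair_frames_with_labels(dataset_root):
    """Map each video id to its frame folder and annotation CSV.

    Returns (pairs, missing): `pairs` maps video id -> (frame_dir, csv_path),
    `missing` lists video ids whose annotation file was not found.
    """
    root = Path(dataset_root)
    frames_root = root / "data_resize_phase_recognition"
    labels_root = root / "phase_annotations_phase_recognition"
    pairs, missing = {}, []
    for frame_dir in sorted(p for p in frames_root.iterdir() if p.is_dir()):
        video_id = frame_dir.name
        label_csv = labels_root / f"video{video_id}.csv"
        if label_csv.is_file():
            pairs[video_id] = (frame_dir, label_csv)
        else:
            missing.append(video_id)
    return pairs, missing
```

A non-empty missing list points to videos whose labels were misnamed or not copied.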

Data Preparation:

# Aier_Cata dataset
python data/downstream/step_recognition/Aier_Cata/generate_video_paths_and_label.py

# Cataract101 dataset
python data/downstream/step_recognition/Cataract101/generate_video_paths_and_label.py

# Xinhua_Cata dataset
python data/downstream/step_recognition/Xinhua_Cata/generate_video_paths_and_label.py

Fine-tuning:

bash scripts/Finetune_step_recognition.sh

4.2 Tool Existence Recognition

Dataset Setup:

Place downsampled video frames in:

data/downstream/surgical_tool_existence_recognition/data_resize/

Place label files in:

data/downstream/surgical_tool_existence_recognition/label/

Data Preparation:

python data/downstream/surgical_tool_existence_recognition/generate_paths.py

Fine-tuning:

bash scripts/Finetune_tool_existence.sh

4.3 Complication Detection

Dataset Setup:

Place videos in:

data/downstream/complication_detection/videos/

Data Preparation:

python data/downstream/complication_detection/generate_train_and_test.py
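
A train/test split script of this kind typically shuffles the videos with a fixed seed and holds out a fraction for testing. The sketch below shows that general pattern only; the 80/20 ratio, the seed, and the function name split_train_test are assumptions, and generate_train_and_test.py may split differently (for example, by patient or by label).

```python
import random


def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle items with a fixed seed and split them into train and test."""
    items = sorted(items)           # sort first so the result is reproducible
    rng = random.Random(seed)
    rng.shuffle(items)
    # Hold out at least one item whenever the input is non-empty.
    n_test = max(1, round(len(items) * test_fraction)) if items else 0
    return items[n_test:], items[:n_test]
```

Splitting at the video level, rather than the frame level, keeps frames from the same surgery out of both sets at once.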

Fine-tuning:

bash scripts/Finetune_complication_detection.sh

4.4 Surgical Skill Assessment

Dataset Setup:

Place videos in:

data/downstream/surgical_skill_assessment/videos/

Data Preparation:

python data/downstream/surgical_skill_assessment/generate_train_and_test.py

Fine-tuning:

bash scripts/Finetune_skill_assesement.sh

4.5 Surgical Scene Segmentation

Dataset Setup:

Place Cataract dataset scene segmentation data in:

data/downstream/surgical_scene_segmentation/Images-and-Supervisely-Annotations/

Data Preparation:

python data/downstream/surgical_scene_segmentation/Translate_into_label_images.py
python data/downstream/surgical_scene_segmentation/generate_train_test_path.py

Fine-tuning:

python Downstream_tasks/Downstream_scene_segmentation/train.py

4.6 Limbus Boundary Segmentation

Dataset Setup:

Create dataset folders:

data/downstream/segmentation/Cataract101/
data/downstream/segmentation/Xinhua_Cata/

For each dataset:

Place video frames in: data_resize_feature_extractor/
Place labels in: segmentation_labels_test_path/

Data Preparation:

# For Cataract101
python data/downstream/segmentation/Cataract101/generate_train_test_path.py

# For Xinhua_Cata
python data/downstream/segmentation/Xinhua_Cata/generate_train_test_path.py

Fine-tuning:

python Downstream_tasks/Downstream_limbus_segmentation/train.py

4.7 Nucleus Block Localization

Dataset Setup:

Create folder:

data/downstream/nucleus_tracking/frames_with_json_labels/

Place all video frames and corresponding .json annotation files in this folder.
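
Because frames and annotations share one folder, a quick consistency check can catch frames without labels. The sketch below assumes each annotation has the same file stem as its frame (for example, 0001.jpg and 0001.json) and that frames are .jpg or .png files; both conventions and the helper name pair_frames_with_json are assumptions, not documented behavior of the repository.

```python
from pathlib import Path


def pair_frames_with_json(folder, image_suffixes=(".jpg", ".png")):
    """Pair each frame with the .json file that shares its file stem.

    Returns (pairs, unlabeled): a list of (frame, annotation) paths and a
    list of frames for which no annotation file exists.
    """
    pairs, unlabeled = [], []
    for frame in sorted(Path(folder).iterdir()):
        if frame.suffix.lower() not in image_suffixes:
            continue
        annotation = frame.with_suffix(".json")
        if annotation.is_file():
            pairs.append((frame, annotation))
        else:
            unlabeled.append(frame)
    return pairs, unlabeled
```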

Data Preparation:

python data/downstream/necleus_tracking/generate_labels_and_visualization.py
python data/downstream/necleus_tracking/split_dataset.py

Fine-tuning:

bash scripts/Finetune_nucleus_tracking.sh

🎯 Knowledge Distillation

To perform knowledge distillation for OVFM:

bash scripts/distillation.sh
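
For readers new to the idea, the sketch below shows one common form of knowledge distillation: a student is trained to match the teacher's temperature-softened output distribution. This is a generic textbook formulation in plain Python, not OVFM's objective; the repository's Distillation code defines the actual loss, which may differ (for example, by matching features rather than logits). The function names and the temperature value are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)   # teacher targets
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge.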

🙏 Acknowledgements

This project builds upon the excellent work of:

SVT - Self-supervised Video Transformer
Endo-FM - Foundation Model for Endoscopy Video Analysis

We thank the authors for their valuable contributions to the community.
