LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds (CVPR 2025)
Zihui Zhang, Weisheng Dai, Hongtao Wen, Bo Yang
We propose an unsupervised learning approach for semantic segmentation of 3D point clouds.
The discovered global patterns are semantic-aware, and our method outperforms existing baselines.
### CUDA 11.8
conda env create -f env.yml
source activate LogoSP
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install pytorch-scatter -c pyg
git clone https://github.com/NVIDIA/MinkowskiEngine.git

Modify MinkowskiEngine to adapt to PyTorch 2.x (a patch-helper sketch follows the list below):
- MinkowskiEngine/src/3rdparty/concurrent_unordered_map.cuh: Add '#include <thrust/execution_policy.h>'
- MinkowskiEngine/src/convolution_kernel.cuh: Add '#include <thrust/execution_policy.h>'
- MinkowskiEngine/src/coordinate_map_gpu.cu: Add '#include <thrust/unique.h>' and '#include <thrust/remove.h>'
- MinkowskiEngine/src/spmm.cu: Add '#include <thrust/execution_policy.h>', '#include <thrust/reduce.h>', and '#include <thrust/sort.h>'
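If you prefer not to edit the sources by hand, here is a minimal Python sketch that prepends the missing thrust headers; the file list and headers are taken directly from the notes above, and it assumes you run it from the directory that contains the cloned MinkowskiEngine folder:

```python
# Sketch: prepend the thrust headers listed above to the MinkowskiEngine sources.
from pathlib import Path

PATCHES = {
    "src/3rdparty/concurrent_unordered_map.cuh": ["#include <thrust/execution_policy.h>"],
    "src/convolution_kernel.cuh": ["#include <thrust/execution_policy.h>"],
    "src/coordinate_map_gpu.cu": ["#include <thrust/unique.h>", "#include <thrust/remove.h>"],
    "src/spmm.cu": ["#include <thrust/execution_policy.h>",
                    "#include <thrust/reduce.h>",
                    "#include <thrust/sort.h>"],
}

for rel_path, headers in PATCHES.items():
    path = Path("MinkowskiEngine") / rel_path
    text = path.read_text()
    missing = [h for h in headers if h not in text]
    if missing:
        # Prepend only the headers that are not already present.
        path.write_text("\n".join(missing) + "\n" + text)
        print(f"{path}: added {len(missing)} include(s)")
```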
cd MinkowskiEngine
python setup.py install --blas=openblas
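As an optional sanity check that the CUDA build works, a minimal sketch (assuming a visible GPU; not part of the repo) is:

```python
# Optional sanity check: run a sparse 3D convolution on the GPU.
import torch
import MinkowskiEngine as ME

coords = torch.IntTensor([[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]])  # (batch, x, y, z)
feats = torch.ones(3, 3)
x = ME.SparseTensor(features=feats, coordinates=coords, device="cuda")
conv = ME.MinkowskiConvolution(in_channels=3, out_channels=8, kernel_size=3, dimension=3).cuda()
print(conv(x).F.shape)  # expected: torch.Size([3, 8])
```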
The data preparation process includes segmentation data, superpoints, and DINOv2 feature extraction and projection.
We mainly follow GrowSP to preprocess the ScanNet dataset and build superpoints.
For ScanNet data, please download from here.
Uncompress the folder and move it to ./data/ScanNet/raw/.
For superpoints, we can either (1) follow GrowSP and use VCCS + Region Growing, or (2) use the officially provided ScanNet Felzenszwalb superpoints (optional). Choosing one of them for data preprocessing is enough.

(1) VCCS + Region Growing:
python data_prepare/data_prepare_ScanNet.py --data_path './data/ScanNet/raw' --processed_data_path './data/ScanNet/processed' --Felzenszwalb False
python data_prepare/initialSP_prepare_ScanNet.py --input_path './data/ScanNet/processed/' --sp_path './data/ScanNet/initial_superpoints/'

(2) Felzenszwalb superpoints: please download the ScanNet toolkit and go into ScanNet/Segmentator to build it by running make (or create makefiles for your system using cmake).
This will create a segmentator binary file.
Then, go back outside ./ScanNet and run the segmentator:
./run_segmentator.sh your_scannet_trainval_path ## e.g., ./data/ScanNet/raw/scans
./run_segmentator.sh your_scannet_test_path ## e.g., ./data/ScanNet/raw/scans_test
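To check the segmentator output, you can inspect one of the generated *.segs.json files. The sketch below assumes the standard ScanNet segs.json format with a "segIndices" field; the exact file name depends on how run_segmentator.sh names its outputs, and the path shown is hypothetical:

```python
# Sketch: count vertices and superpoints in one segmentator output file.
import json
import numpy as np

# Hypothetical example path; point it at an actual *.segs.json produced above.
segs = json.load(open("./data/ScanNet/raw/scans/scene0000_00/scene0000_00_vh_clean_2.0.010000.segs.json"))
seg_ids = np.asarray(segs["segIndices"])  # per-vertex superpoint id
print(f"{len(seg_ids)} vertices, {len(np.unique(seg_ids))} superpoints")
```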
# Run the preprocessing once the superpoint files are available.
python data_prepare/data_prepare_ScanNet.py --data_path './data/ScanNet/raw' --processed_data_path './data/ScanNet/processed' --processed_sp_path './data/ScanNet/Felzenszwalb'

After superpoint construction and data preprocessing by (1) or (2), we can extract and project DINOv2 features.
We reuse the data provided by OpenScene; uncompress it and put it into ./data/ScanNet:
wget https://cvg-data.inf.ethz.ch/openscene/data/scannet_processed/scannet_3d.zip
wget https://cvg-data.inf.ethz.ch/openscene/data/scannet_processed/scannet_2d.zip

Finally, extract DINOv2 features and project them to the 3D point clouds by:
python project_ScanNet.py

This will create 3D point clouds with features in ./data/ScanNet/DINOv2_feats_s14up4_voxel_0.05.
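For reference, the core of this step is extracting per-image DINOv2 patch features; a simplified sketch is below. project_ScanNet.py additionally upsamples the features and back-projects them onto the voxelized 3D points, presumably using the depth and pose data in scannet_2d; the model variant (ViT-S/14), input resolution, and image path here are assumptions inferred from the folder name:

```python
# Simplified sketch of DINOv2 patch-feature extraction for one RGB frame.
import torch
from torchvision import transforms
from PIL import Image

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").cuda().eval()
tf = transforms.Compose([
    transforms.Resize((448, 448)),  # multiple of the 14-pixel patch size
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
img = tf(Image.open("color.jpg").convert("RGB")).unsqueeze(0).cuda()  # hypothetical frame path

with torch.no_grad():
    out = model.forward_features(img)
patch_feats = out["x_norm_patchtokens"].reshape(1, 32, 32, -1)  # (1, H/14, W/14, 384)
print(patch_feats.shape)
```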
The data structure should be:
ScanNet
├── processed
├── scannet_3d
│   ├── train
│   ├── val
│   ├── scannetv2_train.txt
│   ├── scannetv2_val.txt
│   └── scannetv2_test.txt
├── initial_superpoints_0.25
├── Felzenszwalb (optional)
└── DINOv2_feats_s14up4_voxel_0.15

The S3DIS dataset can be found here.
Note that you should download the file named "Stanford3dDataset_v1.2.zip". Uncompress the folder and move it to data/S3DIS/raw. Then run the command below to begin preprocessing:
python data_prepare/data_prepare_S3DIS.py --data_path './data/S3DIS/raw'

The 2D images and camera parameters are stored in the 2D-3D-S dataset; please download it and extract DINOv2 features for S3DIS by:
python project_S3DIS.py

The data structure should be:
S3DIS
├── input_0.010
├── initial_superpoints
├── DINOv2_feats_s14up4_voxel_0.05
└── 2D-3D-S
    ├── Area1
    ├── Area2
    ...
    ├── Area5a
    ├── Area5b
    └── Area6

The training and validation sets of nuScenes (including RGB images for distillation) can be downloaded following OpenScene:
# all 3d data
wget https://cvg-data.inf.ethz.ch/openscene/data/nuscenes_processed/nuscenes_3d.zip
wget https://cvg-data.inf.ethz.ch/openscene/data/nuscenes_processed/nuscenes_3d_train.zip
# all image data
wget https://cvg-data.inf.ethz.ch/openscene/data/nuscenes_processed/nuscenes_2d.zip

Construct superpoints by:
python data_prepare/initialSP_prepare_nuScenes.py --input_path '../data/nuscenes/nuScenes_3d/train/' --sp_path '../data/nuScenes/initial_superpoints/'

Extract and project DINOv2 features by:
python project_nuScenes.py --output_dir './data/nuScenes/DINOv2_feats_s14up4_voxel_0.15'

The data structure should be:
nuScenes
├── nuScenes_3d
│   ├── train
│   └── val
├── nuScenes_2d
│   ├── train
│   └── val
├── initial_superpoints
│   └── train
└── DINOv2_feats_s14up4_voxel_0.15

For ScanNet, the distillation model is first trained by:
CUDA_VISIBLE_DEVICES=0 python train_Distill_ScanNet.py --save_path 'ckpt/ScanNet/distill/' --feats_path './data/ScanNet/DINOv2_feats_s14up4_voxel_0.05/'
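Conceptually, the distillation stage trains the 3D backbone so that its per-point features match the projected DINOv2 features. A rough sketch of such an objective (a cosine-distance loss, not the exact loss used in train_Distill_ScanNet.py) is:

```python
# Rough sketch of a 2D->3D feature-distillation objective (cosine distance),
# computed only on points that received a projected DINOv2 feature.
import torch
import torch.nn.functional as F

def distill_loss(point_feats, dino_feats, valid_mask):
    p = F.normalize(point_feats[valid_mask], dim=-1)
    d = F.normalize(dino_feats[valid_mask], dim=-1)
    return (1.0 - (p * d).sum(dim=-1)).mean()

# Toy usage with random tensors.
loss = distill_loss(torch.randn(1000, 384), torch.randn(1000, 384), torch.rand(1000) > 0.5)
print(loss.item())
```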
After distillation, we have the model checkpoints and can train the segmentation model:

# e.g., use the epoch 300 checkpoint
CUDA_VISIBLE_DEVICES=0 python train_Seg_ScanNet.py --save_path 'ckpt/ScanNet/seg/' --distill_ckpt './ckpt/ScanNet/distill/checkpoint_300.tar' --sp_path './data/ScanNet/initial_superpoints/'
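The segmentation stage operates on superpoints. As a rough illustration only (not the repo's exact code), per-point features can be average-pooled into per-superpoint features with torch_scatter:

```python
# Illustration: average-pool per-point features into per-superpoint features.
import torch
from torch_scatter import scatter_mean

point_feats = torch.randn(10000, 384)       # e.g., distilled per-point features
sp_ids = torch.randint(0, 300, (10000,))    # superpoint id of each point
sp_feats = scatter_mean(point_feats, sp_ids, dim=0)  # (num_superpoints, 384)
print(sp_feats.shape)
```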
For S3DIS, distillation & segmentation:

CUDA_VISIBLE_DEVICES=0 python train_Distill_S3DIS.py --save_path 'ckpt/S3DIS/distill/' --feats_path './data/S3DIS/DINOv2_feats_s14up4_voxel_0.05/'
# e.g., use the epoch 700 checkpoint
CUDA_VISIBLE_DEVICES=0 python train_Seg_S3DIS.py --save_path 'ckpt/S3DIS/seg/' --distill_ckpt './ckpt/S3DIS/distill/checkpoint_700.tar' --sp_path './data/S3DIS/initial_superpoints/'

For nuScenes, distillation & segmentation:
CUDA_VISIBLE_DEVICES=0 python train_Distill_nuScenes.py --save_path 'ckpt/nuScenes/distill/' --feats_path './data/nuScenes/DINOv2_feats_s14up4_voxel_0.15/'
# e.g., use the epoch 300 checkpoint
CUDA_VISIBLE_DEVICES=0 python train_Seg_nuScenes.py --save_path 'ckpt/nuScenes/seg/' --distill_ckpt './ckpt/nuScenes/distill/checkpoint_300.tar' --sp_path './data/nuScenes/initial_superpoints/train/'

To prepare online testing predictions, please download the testing data from here, uncompress it, and arrange the data structure as:
v1.0-test_meta
├── v1.0-test
├── samples
├── maps
└── LICENSE

Preprocess the testing data, then run testing for online submission:
# pip install nuscenes-devkit
python nuScenes_test_extraction.py --input_dir './v1.0-test_meta' --output_dir './data/nuScenes/nuscenes_3d/test'
# --mode_ckpt and --classifier_ckpt should be specified, e.g., './ckpt_seg/nuScenes/model_50_checkpoint.pth'
CUDA_VISIBLE_DEVICES=0 python nuScenes_test_preds.py --test_input_pat './nuScenes_test_data' --val_input_path './data/nuScenes/nuScenes_3d/val' --out_path './nuScenes_online_testing' --mode_ckpt './ckpt_seg/nuScenes/model_50_checkpoint.pth' --classifier_ckpt './ckpt_seg/nuScenes/cls_50_checkpoint.pth'

The well-trained checkpoints for all three datasets are available on Google Drive.
We also provide scripts for visualization: each point of a 3D scene is assigned an RGB color from its predicted label via a colormap, and the colored point clouds are stored as .ply files. Please refer to the ./vis_predictions folder. These .ply files can then be converted to .obj files (see ./to_obj) for rendering with KeyShot.
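As an illustration of the idea only (the scripts in ./vis_predictions use their own colormap and file layout, and the file names below are hypothetical), predictions can be colored and written to .ply with open3d:

```python
# Hypothetical example: color per-point predictions and write a .ply file.
import numpy as np
import open3d as o3d
import matplotlib.pyplot as plt

points = np.load("scene_xyz.npy")    # (N, 3) point coordinates (assumed file)
labels = np.load("scene_preds.npy")  # (N,) predicted integer class ids (assumed file)

colors = plt.get_cmap("tab20")(labels % 20)[:, :3]  # class id -> RGB in [0, 1]

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
pcd.colors = o3d.utility.Vector3dVector(colors)
o3d.io.write_point_cloud("scene_pred.ply", pcd)
```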


