Note that this implementation relies purely on the standard LibOPF. Therefore, if one uses our package, please also cite the original LibOPF authors.
OPForch is a PyTorch-based implementation of the Optimum-Path Forest (OPF) classifier, migrated from the original OPFython package. By replacing per-node Python objects with dense tensors and scalar Numba loops with batched tensor operations, OPForch delivers massive speedups while maintaining zero prediction mismatches against the reference implementation.
| Metric | Result |
|---|---|
| Accuracy Parity | 0 prediction mismatches across all 4 classifiers |
| Predict Speedup | Up to 484Γ faster at N=10,000 |
| Fit Speedup | Up to 19Γ faster at N=10,000 |
| Distance Matrix | Up to 413Γ faster (batched tensor vs NΒ² scalar loop) |
| GPU Acceleration | 12.7Γ additional speedup on RTX 4070 for distance computation |
| Device Support | CPU, CUDA, and Multi-GPU via DeviceManager |
- Graph-based classification without hyperparameter tuning
- Deterministic training with competitive accuracy
- GPU-accelerated distance computation and prediction
- A drop-in replacement for OPFython with orders-of-magnitude speedups
OPForch is compatible with: Python 3.8+ and PyTorch 2.0+.
opforch/
βββ core/
β βββ heap.py # Tensor-backed binary heap
β βββ subgraph.py # Dense tensor columns (13 state tensors)
β βββ opf.py # Abstract base (torch.save/load, device)
βββ math/
β βββ distance.py # 47 batched (N,D)Γ(M,D)β(N,M) distance metrics
β βββ general.py # Accuracy, confusion matrix, normalize, purity
β βββ random.py # Tensor-based random generators
βββ models/
β βββ supervised.py # MST + competition + batched predict
β βββ knn_supervised.py # KNN density clustering + k-selection
β βββ semi_supervised.py # Labeled + unlabeled propagation
β βββ unsupervised.py # Density clustering + normalized cut
βββ stream/
β βββ loader.py # CSV/TXT/JSON β torch.Tensor
β βββ parser.py # Extract features + labels
β βββ splitter.py # Train/test split
βββ subgraphs/
β βββ knn.py # KNNSubgraph (torch.topk, vectorized PDF)
βββ utils/
β βββ constants.py # EPSILON, FLOAT_MAX, status codes
β βββ converter.py # Binary OPF format converters
β βββ device.py # DeviceManager (CPU/GPU/multi-GPU)
β βββ exception.py # Custom exception hierarchy
β βββ logging.py # Timed rotating file logger
βββ report/ # Migration report, benchmarks, and plots
βββ examples/ # Usage scripts for all 4 classifiers
Install from source:
git clone https://github.com/gugarosa/opforch.git
cd opforch
pip install -e .For GPU support, install PyTorch with CUDA:
pip install torch --index-url https://download.pytorch.org/whl/cu124import torch
from opforch.models import SupervisedOPF
from opforch.stream import loader, parser, splitter
# Load data
data = loader.load_txt("data/boat.txt")
X, Y = parser.parse_loader(data)
X_train, X_test, Y_train, Y_test = splitter.split(X, Y, percentage=0.5)
# Train and predict (CPU)
opf = SupervisedOPF(distance="log_squared_euclidean")
opf.fit(X_train, Y_train)
predictions = opf.predict(X_test)
# GPU β just change the device
opf_gpu = SupervisedOPF(distance="euclidean", device="cuda:0")
opf_gpu.fit(X_train.cuda(), Y_train.cuda())
predictions = opf_gpu.predict(X_test.cuda())| Classifier | Description |
|---|---|
SupervisedOPF |
MST-based prototype detection + cost competition |
KNNSupervisedOPF |
k-NN density clustering with validation-driven k |
SemiSupervisedOPF |
Extends supervised with unlabeled data propagation |
UnsupervisedOPF |
Density-based clustering with normalized cut |
All classifiers support fit(), predict(), save(), and load(), and accept a device parameter for CPU/GPU execution.
Run the benchmark suite to compare performance on your hardware:
# Baseline benchmarks (47 metrics, 4 models, scaling)
python report/benchmark.py
# Extended benchmarks (up to N=10K, GPU, dimensionality)
python report/benchmark_extended.py
# Generate plots
python report/plot_benchmarks.py
python report/plot_extended.pyFor the full migration report with detailed analysis, see report/REPORT.md.
The key architectural change from OPFython is the elimination of per-node Python objects in favor of dense tensor columns:
OPFython: subgraph.nodes[i].cost = 5.0 # Python object attribute
OPForch: subgraph.costs[i] = 5.0 # Tensor element (GPU-ready)
Prediction is fully batched β a single tensor operation replaces the O(NΓM) Python loop:
dist_matrix = distance_fn(train_features, test_features) # (N, M)
path_costs = torch.maximum(train_costs[:, None], dist_matrix) # (N, M)
predictions = train_labels[path_costs.argmin(dim=0)] # (M,)For the complete architecture documentation, see ARCHITECTURE.md.
If you use OPForch to fulfill any of your needs, please cite us:
J. P. Papa, A. X. FalcΓ£o and C. T. N. Suzuki.
Supervised Pattern Classification based on Optimum-Path Forest.
International Journal of Imaging Systems and Technology (2009).
Looking for datasets? We have some pre-loaded into OPF file format in the data/ directory. More are available at recogna.tech.
If you ever need to report a bug, talk to us, or suggest improvements, please open an issue. We will do our best to help.