This project implements a high-performance phylogenetic tree for managing species based on genetic similarity, using a combination of Octree spatial partitioning and BK-tree indexing under Hamming distance.
📘 Built as a final project for Data Structures II (Spring 2025).
- ✅ DNA Sequence Compression using 2-bit encoding.
- ✅ Fast genetic similarity comparison with Simhash.
- ✅ Dynamic Octree for coarse 3D clustering.
- ✅ BK-tree indexing within Octree leaves for precise matching.
- ✅ Insertion, deletion, mutation, and search fully supported.
- ✅ Clean, modular, header-based C++ implementation.
- ✅ Terminal-based output for testing and structure validation.
Real-world datasets such as:
- Biodiversity gene banks
- Pathogen mutation trackers
- Forensic DNA matching
...require scalable and search-optimized representations of genetic sequences. Our project shows how hierarchical spatial + metric indexing can accelerate this task.
- Read species data from a CSV (
name,DNA sequence). - Compress DNA into bit-encoded format (2 bits per base).
- Generate Simhash to summarize sequence into 64-bit signature.
- Map Simhash to 3D coordinates → Insert into Octree.
- Insert into BK-Tree inside the appropriate Octree leaf.
- Support search, mutation, insert, delete operations.
Each BK-tree is limited to ~20 species per leaf for efficiency. Overflow triggers Octree subdivision.
| Operation | Time Complexity |
|---|---|
| Insert | O(log N + L) |
| Delete | O(log N + L) |
| Mutate | O(log N + L) |
| Search | O(log N + L) |
| Space (Total) | O(N × L) |
Where:
N= total number of speciesL= sequence length (fixed)M= max species in a BK-tree (≤ 20)
We evaluated other structures:
❌ Single massive BK-Tree: Doesn’t scale well with large datasets.
❌ Graph database: Too slow for similarity search; overkill for our needs.
Hybrid trees can simplify search logic at large scale.
Frontend integration needs early planning.
Rayyan Shah
Amn Naqvi
Tanzeel Shahid
Fatima Aijaz
Simhash GitHub Library used for similarity hashing
Academic discussions with DS2 Faculty