Run benchmarks with:

```bash
cargo bench --bench stemmer_bench
```

| Scenario | Typical Latency | Notes |
|---|---|---|
| Cache hit (warm) | ~50 ns | After first stem of a word |
| Single word (cold) | ~1-5 µs | First time seeing a word |
| 10,000-word batch (hot) | ~5 ms | With cache hits |
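These numbers are easiest to reproduce with a Criterion bench target (the `--profile-time` flag shown later in this section is Criterion-specific). A minimal sketch, assuming a `benches/stemmer_bench.rs` layout and the `IndonesianStemmer`/`stem_batch` API used elsewhere in this document:

```rust
// benches/stemmer_bench.rs -- minimal Criterion sketch (assumed layout).
use criterion::{criterion_group, criterion_main, Criterion};
use harmorp::IndonesianStemmer;

fn stemmer_benches(c: &mut Criterion) {
    let stemmer = IndonesianStemmer::new();
    let words = vec!["membaca".to_string()];

    // Warm the per-instance cache once so the measured path is the cache hit.
    stemmer.stem_batch(&words);
    c.bench_function("stem_batch_warm", |b| b.iter(|| stemmer.stem_batch(&words)));
}

criterion_group!(benches, stemmer_benches);
criterion_main!(benches);
```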
The hot path uses:

- `SmallVec` for small candidate lists (stack-allocated)
- `&str` slices instead of `String` copies
- No heap allocations during stemming
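As an illustration of the `SmallVec` pattern (a sketch of the technique, not harmorp's internal code), candidate strippings can be collected on the stack as borrowed slices, spilling to the heap only past the inline capacity:

```rust
use smallvec::SmallVec;

// Illustrative only: collect candidate stems without heap allocation.
// Inline capacity 4 covers typical prefix combinations; SmallVec only
// allocates if more candidates than that are pushed.
fn candidate_stems(word: &str) -> SmallVec<[&str; 4]> {
    let mut candidates: SmallVec<[&str; 4]> = SmallVec::new();
    for prefix in ["me", "mem", "memb"] {
        if let Some(rest) = word.strip_prefix(prefix) {
            candidates.push(rest); // &str slice into the input, no String copy
        }
    }
    candidates
}
```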
The stemmer uses `DashMap` for O(1) concurrent cache lookups:
- Lock-free sharded hashmap
- Safe to share across threads
- Automatic cache warming
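A minimal sketch of the pattern (using the `dashmap` crate directly, not harmorp internals): the map can be shared across threads with no external locking, and the first thread to miss computes and inserts the entry.

```rust
use dashmap::DashMap;
use std::sync::Arc;
use std::thread;

fn main() {
    // Sketch of a concurrent stem cache: word -> stem.
    let cache: Arc<DashMap<String, String>> = Arc::new(DashMap::new());

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                // First thread to miss computes and inserts; the rest hit the cache.
                cache
                    .entry("membaca".to_string())
                    .or_insert_with(|| "baca".to_string());
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(cache.get("membaca").unwrap().value(), "baca");
}
```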
When using an FST dictionary:
- O(1) amortised lookup via mmap
- Memory-mapped file access (no loading overhead)
- Zero-copy lookups
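Under the hood this is the standard FST-plus-memory-map pattern. A minimal sketch with the `fst` and `memmap2` crates (assumed dependencies, with a placeholder file path, not harmorp's exact code):

```rust
use std::fs::File;

use fst::Set;
use memmap2::Mmap;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Map the dictionary file into memory: no parsing, no copy into the heap.
    // "dictionary.fst" is a placeholder path for this sketch.
    let file = File::open("dictionary.fst")?;
    let mmap = unsafe { Mmap::map(&file)? };

    // The FST reads directly out of the mapped pages (zero-copy).
    let dict = Set::new(mmap)?;

    // Lookup cost depends on key length, not dictionary size.
    println!("contains 'baca': {}", dict.contains("baca"));
    Ok(())
}
```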
Approximate memory footprint:

- Base stemmer: ~1 MB (code + static data)
- Cache: Grows with unique words (typically < 10 MB for 100K unique words)
- FST dictionary: ~2-5 MB (depending on dictionary size)
The stemmer scales linearly with input size:

```rust
use harmorp::IndonesianStemmer;

let stemmer = IndonesianStemmer::new();
let words: Vec<String> = (0..100_000).map(|_| "membaca".to_string()).collect();
let stems = stemmer.stem_batch(&words);
// ~50 ms for 100,000 words (with cache)
```

To profile performance:
```bash
cargo bench --bench stemmer_bench -- --profile-time=10
```

Or with flamegraph:
```bash
cargo install flamegraph
cargo flamegraph --bench stemmer_bench
```

For best performance:

- Reuse stemmer instances: The cache is per-instance
- Use batch processing: More efficient than individual calls
- Enable FST dictionary: Better accuracy with minimal overhead
- Warm cache: Process common words first to populate the cache (see the sketch below)
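For that last point, warming can be as simple as running a frequency list through `stem_batch` at startup. A sketch using only the API shown above (the word list is a placeholder):

```rust
use harmorp::IndonesianStemmer;

fn main() {
    let stemmer = IndonesianStemmer::new();

    // Placeholder list: in practice, load your corpus's most frequent words.
    let common_words: Vec<String> = ["membaca", "menulis", "berjalan", "makanan"]
        .iter()
        .map(|w| w.to_string())
        .collect();

    // One batch call populates the per-instance cache before real traffic arrives.
    stemmer.stem_batch(&common_words);
}
```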
Python bindings have similar performance characteristics:
- ~100-200 ns cache hit (Python overhead)
- ~2-10 µs cold stem
- Batch processing recommended for large datasets
```python
import harmorp

stemmer = harmorp.Stemmer()

# Use stem_batch for multiple words
stems = stemmer.stem_batch(["membaca", "menulis"] * 1000)
```