Speed up sparsity-pattern sort with packed integer keys#301
Merged
Conversation
Building the COO->CSC permutation sorted a Vector{Tuple{Int64,Int64}} of
all (row, col) index pairs (~92M for a 530k-DOF mesh) with sortperm,
which takes an O(N log N) tuple comparison sort. Profiling a Carina
torsion run showed this was ~half the runtime of both _update_dofs! and
the SparseMatrixPattern constructor (the existing TODO comment already
flagged it).
Pack each (i, j) into a single Int64 key, (Int64(i) << 32) | Int64(j),
so sortperm takes the integer radix-sort path instead. DOF indices are
well below 2^32, so the packed key preserves (i, j) lexicographic order;
sortperm being stable, the resulting permutation is identical.
Applied at both sites in SparsityPatterns.jl (SparseMatrixPattern
construction and _update_dofs!). On a 530k-DOF torsion problem this cut
assembler build 5.0s -> 3.4s and parameter build 7.4s -> 3.9s. Carina
test suite: 106/106 pass, all results bit-identical.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #301 +/- ##
=======================================
Coverage 55.00% 55.00%
=======================================
Files 55 55
Lines 4801 4801
=======================================
Hits 2641 2641
Misses 2160 2160 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Contributor
|
@lxmota if things passed on the GPU for your carina tests and these tests passed locally feel free to merge. I'm out of town and my gpu runners are offline right now. |
cmhamel
approved these changes
May 21, 2026
Contributor
Author
|
Merging as Carina's GPU tests pass and at the direction of @cmhamel |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Building the COO->CSC permutation sorted a Vector{Tuple{Int64,Int64}} of all (row, col) index pairs (~92M for a 530k-DOF mesh) with sortperm, which takes an O(N log N) tuple comparison sort. Profiling a Carina torsion run showed this was ~half the runtime of both _update_dofs! and the SparseMatrixPattern constructor (the existing TODO comment already flagged it).
Pack each (i, j) into a single Int64 key, (Int64(i) << 32) | Int64(j), so sortperm takes the integer radix-sort path instead. DOF indices are well below 2^32, so the packed key preserves (i, j) lexicographic order; sortperm being stable, the resulting permutation is identical.
Applied at both sites in SparsityPatterns.jl (SparseMatrixPattern construction and _update_dofs!). On a 530k-DOF torsion problem this cut assembler build 5.0s -> 3.4s and parameter build 7.4s -> 3.9s. Carina test suite: 106/106 pass, all results bit-identical.