@tenpercent commented Jan 16, 2026

Summary

  • Rewrite sequence_map_inverse using O(1) depth pack expansion
  • Replace O(N) recursive calculate_element_space_size with fold expression
  • Replace sequence_merge O(log N) recursion with O(1) fold expression

Results

| Pattern | Before | After | Reduction |
|---|---|---|---|
| sequence_map_inverse | 45 inst, 187ms | 10 inst, 10ms | 95% |
| calculate_element_space_size | 24 inst, 35ms | 10 inst, 9ms | 73% |
| sequence_merge | O(log N) depth | O(1) depth | - |

Test Plan

  • Waiting for full CI

PR Stack

| # | PR | Description |
|---|---|---|
| 1 | #3585 | sequence_gen with __make_integer_seq |
| 2 | #3588 | generate_identity_sequences helper |
| 3 | #3589 | Named functors in transform_tensor_descriptor |
| 4 | #3590 | container_concat optimization |
| 5 | #3596 | O(1) pack expansion rewrites |
| 6 | #3600 | TensorDescriptor/TensorAdaptor lambda elimination |

Replace O(N) recursive template sequence_map_inverse_impl with
constexpr function and pack expansion for O(1) template depth.

Results:
- sequence_map_inverse: 45 instances, 187ms → 7 instances, 10ms (95% reduction)
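
The idea in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `Sequence` here is a hypothetical stand-in for `ck::Sequence`, the names `invert_permutation` and `sequence_map_inverse_sketch` are invented for the example, and it assumes the map is a permutation whose inverse satisfies `inverse[map[i]] = i`. The loop runs inside an ordinary constexpr function, so the only template work is a single pack expansion over an index sequence.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for ck::Sequence (assumption, not the library type).
template <int... Is>
struct Sequence
{
};

// Ordinary constexpr loop: no template recursion is needed to invert the map.
// Assumes `map` is a permutation of 0..N-1.
template <std::size_t N>
constexpr std::array<int, N> invert_permutation(const std::array<int, N>& map)
{
    std::array<int, N> inv{};
    for(std::size_t i = 0; i < N; ++i)
    {
        inv[static_cast<std::size_t>(map[i])] = static_cast<int>(i);
    }
    return inv;
}

template <typename Seq, typename Indices>
struct sequence_map_inverse_sketch;

// One partial specialization, one pack expansion: template depth does not
// grow with the length of the sequence.
template <int... Is, std::size_t... Ns>
struct sequence_map_inverse_sketch<Sequence<Is...>, std::index_sequence<Ns...>>
{
    static constexpr std::array<int, sizeof...(Is)> inv =
        invert_permutation(std::array<int, sizeof...(Is)>{{Is...}});

    using type = Sequence<inv[Ns]...>;
};

template <int... Is>
using sequence_map_inverse_t =
    typename sequence_map_inverse_sketch<Sequence<Is...>,
                                         std::make_index_sequence<sizeof...(Is)>>::type;
```

For example, under these assumptions `sequence_map_inverse_t<2, 0, 1>` would be `Sequence<1, 2, 0>`.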

Use pack expansion with fold expression to compute element space size
instead of recursive template or recursive lambda.

Results:
- calculate_element_space_size: 24 instances, 35ms → 10 instances, 9ms
- Max template depth: 24 → 23
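
A fold-expression version of this computation might look like the sketch below. It is not the PR's code: it uses `std::array<long, N>` instead of the library's tuple and index types, the function names are hypothetical, and it assumes the element space size is defined as `1 + sum((length_i - 1) * stride_i)`, which the PR description does not spell out.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Binary left fold over the index pack: the whole sum is a single
// expression, so no per-dimension recursive instantiation is required.
// Assumed formula: 1 + sum_i (lengths[i] - 1) * strides[i].
template <std::size_t N, std::size_t... Is>
constexpr long element_space_size_impl(const std::array<long, N>& lengths,
                                       const std::array<long, N>& strides,
                                       std::index_sequence<Is...>)
{
    return (1L + ... + ((lengths[Is] - 1) * strides[Is]));
}

// Hypothetical entry point that generates the index pack 0..N-1.
template <std::size_t N>
constexpr long element_space_size(const std::array<long, N>& lengths,
                                  const std::array<long, N>& strides)
{
    return element_space_size_impl(lengths, strides, std::make_index_sequence<N>{});
}
```

With lengths `{2, 3, 4}` and packed strides `{12, 4, 1}`, this yields `1 + 12 + 8 + 3 = 24`, which matches the 24 elements of a dense 2x3x4 tensor.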

Use operator| with fold expression (Seqs{} | ...) to merge sequences
in O(1) template depth instead of O(log N) binary tree recursion.

- Reduces sequence_merge instantiations from 449 to 167 (63% reduction)
- Total template instantiations: 47,186 → 46,974 (-212)
- ADL finds operator| since Sequence is in ck namespace
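
The `operator|` approach can be illustrated with the following sketch. The namespaces `ck_sketch` and `user_code` and the trait `sequence_merge_sketch` are hypothetical names chosen for the example, and `Sequence` is a simplified stand-in for the library type. The merge trait deliberately lives in a different namespace from `Sequence` to show that argument-dependent lookup finds `operator|`. A unary fold over `|` is ill-formed for an empty pack, so this sketch handles that case with an explicit specialization; how the PR handles it is not stated.

```cpp
#include <type_traits>

namespace ck_sketch {

// Simplified stand-in for ck::Sequence (assumption, not the library type).
template <int... Is>
struct Sequence
{
};

// Concatenate two sequences. Only the return type matters, since the
// operator is used inside decltype.
template <int... Xs, int... Ys>
constexpr Sequence<Xs..., Ys...> operator|(Sequence<Xs...>, Sequence<Ys...>)
{
    return {};
}

} // namespace ck_sketch

namespace user_code {

// The fold expands to S1{} | (S2{} | (... | Sn{})) in a single expression.
// operator| is found by ADL because the operands live in ck_sketch.
template <typename... Seqs>
struct sequence_merge_sketch
{
    using type = decltype((Seqs{} | ...));
};

// A unary fold over | cannot take an empty pack, so give it a result here.
template <>
struct sequence_merge_sketch<>
{
    using type = ck_sketch::Sequence<>;
};

template <typename... Seqs>
using sequence_merge_t = typename sequence_merge_sketch<Seqs...>::type;

} // namespace user_code
```

Under these assumptions, merging `Sequence<1, 2>`, `Sequence<3>`, and `Sequence<4, 5>` gives `Sequence<1, 2, 3, 4, 5>`.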

@tenpercent force-pushed the mpodkory/generate-tuple-optimizations branch from 887bdf2 to 02e42dc on January 17, 2026 03:51
@tenpercent force-pushed the mpodkory/recursive-to-pack-expansion branch from f5ada17 to 9942fd6 on January 17, 2026 03:51