Add u8 fast path for some blend modes#1653

Open
LaurenzV wants to merge 1 commit into main from laurenz/fast_blend

@LaurenzV
Collaborator

This was mostly generated with Codex, but I reviewed it myself and made some adjustments, so I hope it's good now. Since we have quite a few tests for blending (both manual ones and ones via COLR), I'm not too concerned about correctness issues here.
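For readers unfamiliar with the idea, here is a minimal scalar sketch of what an integer-only blend can look like. It is not the code from this PR: the names `div255`, `mul_u8` and `multiply_premul_u8` are hypothetical, it assumes premultiplied RGBA8 pixels, and it uses the multiply formula from the W3C Compositing and Blending specification, which in premultiplied form reduces to `co = cs * (1 - ab) + cb * (1 - as) + cs * cb`. A real fast path would apply the same arithmetic across SIMD lanes rather than one pixel at a time.

```rust
/// Rounded division by 255, exact for products of two u8 values (0..=65025).
/// Hypothetical helper for illustration only.
fn div255(x: u32) -> u32 {
    let t = x + 128;
    (t + (t >> 8)) >> 8
}

/// Multiplies two values in 0..=255 as if they were fractions in 0.0..=1.0.
fn mul_u8(a: u8, b: u8) -> u32 {
    div255(a as u32 * b as u32)
}

/// Multiply blend of a premultiplied RGBA8 source onto a premultiplied
/// RGBA8 backdrop, followed by source-over compositing.
/// Assumption: both pixels are premultiplied (each color channel <= alpha).
fn multiply_premul_u8(src: [u8; 4], bg: [u8; 4]) -> [u8; 4] {
    let (sa, ba) = (src[3], bg[3]);
    let mut out = [0u8; 4];
    for i in 0..3 {
        // co = cs * (1 - ab) + cb * (1 - as) + cs * cb
        let v = mul_u8(src[i], 255 - ba)
            + mul_u8(bg[i], 255 - sa)
            + mul_u8(src[i], bg[i]);
        // Each term is rounded separately, so clamp the sum to be safe.
        out[i] = v.min(255) as u8;
    }
    // ao = as + ab - as * ab
    out[3] = (sa as u32 + ba as u32 - mul_u8(sa, ba)) as u8;
    out
}
```

Staying in integers avoids the u8 to f32 round trip per pixel, which is where the separable modes tend to gain the most.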

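Note that this does not address #1579 yet, so it's possible this won't have much effect on AVX2. However, on NEON I'm now seeing roughly 4x-5x speedups for the blend modes that improved; the remaining modes stay within a few percent of their previous timings: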

fine/blend/normal_u8_neon
                        time:   [40.096 ns 40.297 ns 40.521 ns]
                        change: [-1.1748% +0.8265% +3.1392%] (p = 0.45 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

fine/blend/multiply_u8_neon
                        time:   [976.39 ns 978.78 ns 981.34 ns]
                        change: [-80.790% -80.728% -80.664%] (p = 0.00 < 0.05)
                        Performance has improved.

fine/blend/screen_u8_neon
                        time:   [1.0140 µs 1.0167 µs 1.0199 µs]
                        change: [-80.496% -80.407% -80.320%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

fine/blend/overlay_u8_neon
                        time:   [1.3631 µs 1.3667 µs 1.3701 µs]
                        change: [-74.682% -74.592% -74.499%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

fine/blend/darken_u8_neon
                        time:   [1.1359 µs 1.1385 µs 1.1412 µs]
                        change: [-77.273% -77.197% -77.125%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

fine/blend/lighten_u8_neon
                        time:   [1.1535 µs 1.1557 µs 1.1582 µs]
                        change: [-77.013% -76.936% -76.857%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

fine/blend/color_dodge_u8_neon
                        time:   [5.6951 µs 5.7070 µs 5.7195 µs]
                        change: [+1.6232% +1.9789% +2.3529%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

fine/blend/color_burn_u8_neon
                        time:   [5.6208 µs 5.6334 µs 5.6466 µs]
                        change: [+1.5668% +1.9000% +2.2646%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

fine/blend/hard_light_u8_neon
                        time:   [1.3581 µs 1.3602 µs 1.3626 µs]
                        change: [-75.426% -75.345% -75.265%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

fine/blend/soft_light_u8_neon
                        time:   [6.0497 µs 6.0630 µs 6.0768 µs]
                        change: [+1.2844% +1.7849% +2.2700%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

fine/blend/difference_u8_neon
                        time:   [1.2694 µs 1.2720 µs 1.2747 µs]
                        change: [-75.605% -75.514% -75.423%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

fine/blend/exclusion_u8_neon
                        time:   [1.0596 µs 1.0614 µs 1.0634 µs]
                        change: [-80.316% -80.250% -80.184%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

fine/blend/hue_u8_neon  time:   [8.5128 µs 8.5387 µs 8.5659 µs]
                        change: [+1.5041% +1.9534% +2.4143%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

fine/blend/saturation_u8_neon
                        time:   [8.5693 µs 8.6052 µs 8.6431 µs]
                        change: [+1.7844% +2.2460% +2.6872%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

fine/blend/color_u8_neon
                        time:   [7.5338 µs 7.5591 µs 7.5869 µs]
                        change: [+0.7948% +1.2293% +1.6653%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

fine/blend/luminosity_u8_neon
                        time:   [7.5325 µs 7.5531 µs 7.5772 µs]
                        change: [+0.8659% +1.3864% +1.8684%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
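
To relate the `change` rows above to a speedup factor: a relative change of `c` percent in run time corresponds to a speedup of `1 / (1 + c / 100)`. The helper below is a small illustrative sketch (the name `speedup_from_change` is hypothetical and not part of Criterion or this PR). Applied to the midpoint estimates above, -80.728% for multiply works out to roughly 5.2x and -74.592% for overlay to roughly 3.9x.

```rust
/// Converts a relative change in run time, given in percent
/// (for example -80.0 for "80% faster"), into a speedup factor.
/// Hypothetical helper for illustration only.
fn speedup_from_change(change_percent: f64) -> f64 {
    // New time as a fraction of the old time.
    let new_over_old = 1.0 + change_percent / 100.0;
    1.0 / new_over_old
}
```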

@LaurenzV LaurenzV requested a review from grebmeg May 16, 2026 13:15
@LaurenzV LaurenzV force-pushed the laurenz/fast_blend branch from 3220f1a to 37cf4ab Compare May 16, 2026 13:19
@LaurenzV LaurenzV force-pushed the laurenz/fast_blend branch from 37cf4ab to d46b2c0 Compare May 16, 2026 13:19