
[RVV] add rvv f32 kernels for velu, vgelu, vapproxgelu, ibilinear, ppmm, qc8w-gemm #9954

Open

velonica0 wants to merge 2 commits into google:master from velonica0:rvv-fp32-kernel

Conversation

velonica0 (Contributor) commented Apr 13, 2026

Add RVV f32 kernels for velu, vgelu, vapproxgelu, ibilinear, ppmm, qc8w-gemm.

Tested on SpacemiT K1 and K3 CPUs (both VLEN=256).

| Operator | Workload | K1 Scalar (ns) | K1 RVV (ns) | K1 Speedup | K3 Scalar (ns) | K3 RVV (ns) | K3 Speedup |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
| f32-velu | N:3840 | 107,989 | 12,885 | 8.4x | 49,380 | 6,904 | 7.2x |
| f32-velu | N:32640 | 931,925 | 167,903 | 5.6x | 410,432 | 58,601 | 7.0x |
| f32-vgelu | N:3840 | 211,944 | 22,137 | 9.6x | 86,832 | 14,923 | 5.8x |
| f32-vgelu | N:32640 | 1,807,096 | 225,792 | 8.0x | 801,698 | 126,842 | 6.3x |
| f32-vapproxgelu | N:3840 | 212,227 | 22,182 | 9.6x | 86,038 | 14,924 | 5.8x |
| f32-vapproxgelu | N:32640 | 1,805,752 | 222,769 | 8.1x | 793,539 | 126,809 | 6.3x |
| f32-ibilinear | C:256 | 1,198,610 | 238,184 | 5.0x | 492,210 | 82,999 | 5.9x |
| f32-ibilinear | C:48 | 1,193,605 | 271,030 | 4.4x | 461,410 | 76,612 | 6.0x |
| f32-ibilinear | C:24 | 3,241,163 | 805,143 | 4.0x | 1,268,864 | 279,156 | 4.5x |
| f32-ppmm | ALBERT | 153,322,965 | 29,111,495 | 5.3x | 77,835,415 | 9,417,268 | 8.3x |
| f32-ppmm | MobileBERT | 26,477,852 | 5,963,739 | 4.4x | 12,942,549 | 1,552,870 | 8.3x |
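
For context on why one build covers both the K1 and K3: below is a minimal sketch, not the PR's actual code, of the strip-mined, vector-length-agnostic loop that RVV f32 microkernels typically use, written against the standard RVV v1.0 C intrinsics. The function name and the trivial `x * alpha` body are illustrative assumptions; the real velu/vgelu kernels evaluate an exp/erf approximation inside the same loop shape.

```c
#include <riscv_vector.h>
#include <stddef.h>

// Illustrative elementwise kernel: y[i] = x[i] * alpha.
// The strip-mined loop asks the hardware each iteration how many
// f32 lanes it can process, so the same code runs unchanged on
// VLEN=128/256/512 parts.
void f32_vscale_rvv(size_t n, const float* x, float* y, float alpha) {
  while (n > 0) {
    size_t vl = __riscv_vsetvl_e32m8(n);             // lanes this pass
    vfloat32m8_t vx = __riscv_vle32_v_f32m8(x, vl);  // load vl floats
    vfloat32m8_t vy = __riscv_vfmul_vf_f32m8(vx, alpha, vl);
    __riscv_vse32_v_f32m8(y, vy, vl);                // store vl floats
    x += vl;
    y += vl;
    n -= vl;
  }
}
```

Because `vl` is re-requested per iteration via `__riscv_vsetvl_e32m8`, no separate scalar tail loop is needed for remainders.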

Next, I will continue with RVV optimization of the FP16 operators.

google-cla (bot) commented Apr 13, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

velonica0 (Contributor, Author) commented

Hi, @dsharlet
Could you please take a look at this when you have a moment? Thank you!

dsharlet (Collaborator) left a comment

Thanks for the PR!

This is a pretty big PR with a wide variety of ops in it. I think this should be split into a few smaller PRs:

  1. ppmm kernel
  2. f32-qc8w kernel
  3. elementwise ops

Regarding the f32-qc8w kernel, what use case motivated implementing that kernel? It is not something we use much currently, and the operator code for that type of gemm has some issues we need to fix.

velonica0 (Contributor, Author) commented

Thank you very much for your review.

> This is a pretty big PR with a wide variety of ops in it. I think this should be split into a few smaller PRs:

The split PRs are #9962, #9963, and #9964.

> Regarding the f32-qc8w kernel, what use case motivated implementing that kernel? It is not something we use much currently, and the operator code for that type of gemm has some issues we need to fix.

Sorry, I see that qd8-f32-qc8w-gemm already exists, so I have deleted f32-qc8w-gemm.
