
[RVV] add rvv f32 kernels for velu, vgelu, vapproxgelu, ibilinear, ppmm, qc8w-gemm #9954

Open

velonica0 wants to merge 2 commits into google:master from velonica0:rvv-fp32-kernel

Conversation

velonica0 (Contributor) commented Apr 13, 2026

Add RVV f32 kernels for velu, vgelu, vapproxgelu, ibilinear, ppmm, qc8w-gemm.

Tested on SpacemiT K1 and K3 CPUs (both VLEN=256).

| Operator | Workload | K1 Scalar (ns) | K1 RVV (ns) | K1 Speedup | K3 Scalar (ns) | K3 RVV (ns) | K3 Speedup |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
| f32-velu | N:3840 | 107,989 | 12,885 | 8.4x | 49,380 | 6,904 | 7.2x |
| f32-velu | N:32640 | 931,925 | 167,903 | 5.6x | 410,432 | 58,601 | 7.0x |
| f32-vgelu | N:3840 | 211,944 | 22,137 | 9.6x | 86,832 | 14,923 | 5.8x |
| f32-vgelu | N:32640 | 1,807,096 | 225,792 | 8.0x | 801,698 | 126,842 | 6.3x |
| f32-vapproxgelu | N:3840 | 212,227 | 22,182 | 9.6x | 86,038 | 14,924 | 5.8x |
| f32-vapproxgelu | N:32640 | 1,805,752 | 222,769 | 8.1x | 793,539 | 126,809 | 6.3x |
| f32-ibilinear | C:256 | 1,198,610 | 238,184 | 5.0x | 492,210 | 82,999 | 5.9x |
| f32-ibilinear | C:48 | 1,193,605 | 271,030 | 4.4x | 461,410 | 76,612 | 6.0x |
| f32-ibilinear | C:24 | 3,241,163 | 805,143 | 4.0x | 1,268,864 | 279,156 | 4.5x |
| f32-ppmm | ALBERT | 153,322,965 | 29,111,495 | 5.3x | 77,835,415 | 9,417,268 | 8.3x |
| f32-ppmm | MobileBERT | 26,477,852 | 5,963,739 | 4.4x | 12,942,549 | 1,552,870 | 8.3x |
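
For context on why one build covers both the K1 and K3: below is a minimal sketch, not the PR's actual code, of the strip-mined, vector-length-agnostic loop that RVV f32 microkernels typically use, written against the standard RVV v1.0 C intrinsics. The function name and the trivial `x * alpha` body are illustrative assumptions; the real velu/vgelu kernels evaluate an exp/erf approximation inside the same loop shape.

```c
#include <riscv_vector.h>
#include <stddef.h>

// Illustrative elementwise kernel: y[i] = x[i] * alpha.
// The strip-mined loop asks the hardware each iteration how many
// f32 lanes it can process, so the same code runs unchanged on
// VLEN=128/256/512 parts.
void f32_vscale_rvv(size_t n, const float* x, float* y, float alpha) {
  while (n > 0) {
    size_t vl = __riscv_vsetvl_e32m8(n);             // lanes this pass
    vfloat32m8_t vx = __riscv_vle32_v_f32m8(x, vl);  // load vl floats
    vfloat32m8_t vy = __riscv_vfmul_vf_f32m8(vx, alpha, vl);
    __riscv_vse32_v_f32m8(y, vy, vl);                // store vl floats
    x += vl;
    y += vl;
    n -= vl;
  }
}
```

Because `vl` is re-requested per iteration via `__riscv_vsetvl_e32m8`, no separate scalar tail loop is needed for remainders.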

Next, I will continue with RVV optimization of the FP16 operators.

google-cla (bot) commented Apr 13, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

velonica0 (Contributor, Author) commented

Hi, @dsharlet
Could you please take a look at this when you have a moment? Thank you!

dsharlet (Collaborator) left a comment

Thanks for the PR!

This is a pretty big PR with a wide variety of ops in it. I think this should be split into a few smaller PRs:

  1. ppmm kernel
  2. f32-qc8w kernel
  3. elementwise ops

Regarding the f32-qc8w kernel, what use case motivated implementing that kernel? It is not something we use much currently, and the operator code for that type of gemm has some issues we need to fix.

velonica0 (Contributor, Author) commented

Thank you very much for your review.

> This is a pretty big PR with a wide variety of ops in it. I think this should be split into a few smaller PRs:

The split PRs are #9962, #9963, and #9964.

> Regarding the f32-qc8w kernel, what use case motivated implementing that kernel? It is not something we use much currently, and the operator code for that type of gemm has some issues we need to fix.

Sorry, I see that qd8-f32-qc8w-gemm already exists, so I have deleted f32-qc8w-gemm.
