@Simonsays095
Contributor

Fixes MFDNN-14247. Adds int4/int8 dynamic quantization kernels with 4-byte alignment so that low-alignment cases are covered with good performance.

Example benchdnn output (first result line is before this change, second is after):

Output template: perf,%engine%,%impl%,%name%,%prb%,%Gops%,%+ctime%,%-time%,%-Gflops%,%0time%,%0Gflops%
perf,gpu,jit:gemm:any,,--mode=P --matmul --engine=gpu --dt=f16:s8:f16 --strides=3420x1:1x3420:1280x1 --bia-dt=f16 --attr-scales=wei:per_oc:f16 --attr-post-ops=add:f16:3:ab --attr-scratchpad=user --attr-fpmath=f16:true 9048x3420:3420x1280,79.217,1165.27,11.8633,6677.47,11.919,6646.28
perf,gpu,jit:gemm:any,,--mode=P --matmul --engine=gpu --dt=f16:s8:f16 --strides=3420x1:1x3420:1280x1 --bia-dt=f16 --attr-scales=wei:per_oc:f16 --attr-post-ops=add:f16:3:ab --attr-scratchpad=user --attr-fpmath=f16:true 9048x3420:3420x1280,79.217,568.56,0.831145,95310.7,1.06991,74040.9
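For anyone who wants to compare the two result lines without reading the raw CSV, here is a minimal Python sketch that splits each line according to the output template and prints the before/after ratio. The helper names (`parse_perf_line`, `speedup`) are hypothetical and not part of benchdnn. Reading `%-time%` as the minimum time and `%0time%` as the average time is my understanding of the benchdnn template modifiers, so treat that as an assumption.

```python
# Numeric columns in the order given by the "Output template" line above.
NUMERIC_FIELDS = ["Gops", "+ctime", "-time", "-Gflops", "0time", "0Gflops"]

# Problem descriptor copied verbatim from the two result lines.
PRB = ("--mode=P --matmul --engine=gpu --dt=f16:s8:f16 "
       "--strides=3420x1:1x3420:1280x1 --bia-dt=f16 "
       "--attr-scales=wei:per_oc:f16 --attr-post-ops=add:f16:3:ab "
       "--attr-scratchpad=user --attr-fpmath=f16:true 9048x3420:3420x1280")

BEFORE = f"perf,gpu,jit:gemm:any,,{PRB},79.217,1165.27,11.8633,6677.47,11.919,6646.28"
AFTER = f"perf,gpu,jit:gemm:any,,{PRB},79.217,568.56,0.831145,95310.7,1.06991,74040.9"


def parse_perf_line(line):
    """Split one 'perf,...' line into a dict keyed by template field name."""
    # Peel the six numeric columns off the right first, so a comma inside
    # the problem descriptor (none here) would not shift the columns.
    head, *numbers = line.strip().rsplit(",", len(NUMERIC_FIELDS))
    tag, engine, impl, name, prb = head.split(",", 4)
    if tag != "perf":
        raise ValueError(f"not a perf line: {line[:40]!r}")
    row = {"engine": engine, "impl": impl, "name": name, "prb": prb}
    row.update(zip(NUMERIC_FIELDS, map(float, numbers)))
    return row


def speedup(before, after, field):
    """Ratio of a time field before vs. after; values above 1 mean faster."""
    return before[field] / after[field]


before = parse_perf_line(BEFORE)
after = parse_perf_line(AFTER)

for field in ("-time", "0time"):
    print(f"{field}: {speedup(before, after, field):.2f}x faster")
```

The `%Gops%` column is identical in both lines (79.217), which is expected since the problem is the same; only the time and Gflops columns change.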

@Simonsays095 Simonsays095 requested a review from a team as a code owner November 3, 2025 23:39
@github-actions github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Nov 3, 2025
@Simonsays095
Contributor Author

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb
enable arch_gpu_xe2-lpg
enable arch_gpu_xe3-lpg
disable benchdnn_all
enable benchdnn_ip
enable benchdnn_matmul
enable benchdnn_rnn

@Simonsays095
Contributor Author

make test perf-gpu
set primitive=gpu:gemm
