benchdnn: prohibit --mode=CP #4950

Draft
kwieloch-intel wants to merge 7 commits into uxlfoundation:main from kwieloch-intel:skip-fill-mode-p

kwieloch-intel commented Apr 3, 2026

This PR prohibits --mode=CP in benchdnn and auto-enables no_ref_memory for --mode=P on Intel GPUs to align performance data filling with --mode=F.

JIRA: MFDNN-14789


Problem description

Two issues with benchdnn performance mode data filling on Intel GPUs:

  1. --mode=CP is broken: the combined correctness + performance mode is unused and currently fails on both CPU and GPU. More importantly, correctness validation requires low-range data filling (less randomness in the mantissa, to minimize round-off errors), which makes the data compressible by the GPU driver and the resulting performance numbers unreliable.

  2. --mode=P vs --mode=F data mismatch: in --mode=P, fill_random_real_dense() overwrites GPU buffers with deterministic CPU-generated data via reorder(), replacing the incompressible Philox PRNG data written by gpu_fill_random() during dnn_mem_t::initialize(). This leads to different (and potentially compressible) data patterns compared to --mode=F.
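The difference between the two fill styles can be illustrated with a small standalone sketch. It is not benchdnn code: `std::mt19937` stands in for the Philox PRNG, the `[-8, 8]` integer range is an assumed example of a low-range correctness fill, and counting distinct 32-bit patterns is only a crude proxy for compressibility, not what the GPU driver actually does. All function names here are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <unordered_set>
#include <vector>

// Counts distinct 32-bit patterns in a float buffer. Few distinct patterns
// suggest highly repetitive (and therefore compressible) data.
inline std::size_t count_distinct_patterns(const std::vector<float> &buf) {
    std::unordered_set<std::uint32_t> seen;
    for (float v : buf) {
        std::uint32_t bits;
        std::memcpy(&bits, &v, sizeof(bits));
        seen.insert(bits);
    }
    return seen.size();
}

// Correctness-style fill (assumed range): small integers only, so the
// mantissa carries very little randomness.
inline std::vector<float> fill_low_range(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(-8, 8);
    std::vector<float> buf(n);
    for (auto &v : buf)
        v = static_cast<float>(dist(gen));
    return buf;
}

// Performance-style fill: raw random bit patterns. std::mt19937 is used
// here only as a stand-in for a Philox generator.
inline std::vector<float> fill_random_bits(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::vector<float> buf(n);
    for (auto &v : buf) {
        const std::uint32_t bits = static_cast<std::uint32_t>(gen());
        std::memcpy(&v, &bits, sizeof(bits));
    }
    return buf;
}
```

With these assumptions, a low-range buffer of any size holds at most 17 distinct values, while a random-bits buffer of a few thousand elements holds almost as many distinct patterns as elements.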


Proposed Solution

Two changes in engine_t constructor (dnnl_common.cpp):

  1. Prohibit --mode=CP: if both corr and perf mode bits are set, emit an error message and fail. Users should run --mode=C and --mode=P (or --mode=F) separately.

  2. Auto-enable no_ref_memory for --mode=P on Intel GPUs: when running perf-only mode (perf && !corr) with an Intel GPU engine, automatically set the no_ref_memory modifier. This skips CPU reference memory allocation and fill_random_real_dense() calls, so GPU buffers retain incompressible Philox PRNG data, aligning --mode=P data filling with --mode=F.

    For CPU, --mode=F uses a 0x3F memset (unlike --mode=P, which uses fill_random_real_dense()), so the existing CPU behavior is intentionally left unchanged.
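The two rules above can be sketched as a small standalone check. This is a simplified illustration, not the PR's actual code: `bench_config_t`, `has_mode_bit`, `validate_and_adjust`, the bit values, and the `is_intel_gpu` flag are hypothetical stand-ins; only the names `mode_bit_t::corr`, `mode_bit_t::perf`, and `mode_modifier_t::no_ref_memory` are taken from the PR text.

```cpp
#include <string>

// Hypothetical, simplified stand-ins for benchdnn's mode types. The real
// definitions live in the benchdnn sources and may differ.
enum class mode_bit_t : unsigned { corr = 0x1, perf = 0x2 };
enum class mode_modifier_t : unsigned { none = 0x0, no_ref_memory = 0x1 };

struct bench_config_t {
    unsigned mode_bits = 0; // OR-ed mode_bit_t values
    unsigned modifiers = 0; // OR-ed mode_modifier_t values
    bool is_intel_gpu = false; // assumed to be known by the caller
};

inline bool has_mode_bit(const bench_config_t &cfg, mode_bit_t bit) {
    return (cfg.mode_bits & static_cast<unsigned>(bit)) != 0;
}

// Returns an empty string on success, or an error message when the mode
// combination is rejected.
inline std::string validate_and_adjust(bench_config_t &cfg) {
    const bool corr = has_mode_bit(cfg, mode_bit_t::corr);
    const bool perf = has_mode_bit(cfg, mode_bit_t::perf);

    // Rule 1: the combined correctness + performance mode is prohibited.
    if (corr && perf)
        return "Error: --mode=CP is not supported. "
               "Use --mode=C and --mode=P (or --mode=F) separately.";

    // Rule 2: perf-only runs on an Intel GPU skip reference memory, so the
    // device buffers keep their initial random fill.
    if (perf && !corr && cfg.is_intel_gpu)
        cfg.modifiers |= static_cast<unsigned>(mode_modifier_t::no_ref_memory);

    return {};
}
```

In this sketch a perf-only configuration on a CPU engine passes validation with its modifiers untouched, which mirrors the stated intent to leave CPU behavior unchanged.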


Example

Before: --mode=CP silently runs but produces NaN in correctness output (broken) and unreliable perf numbers:

> benchdnn --mode=CP --matmul --engine=gpu --dt=f16 128x128:128x128

[   0][DST][0:0] exp_f32:     2438.02 exp:        2438 got:        -nan diff:     nan rdiff:     nan
[   1][DST][0:1] exp_f32:     115.184 exp:     115.188 got:        -nan diff:     nan rdiff:     nan
[   2][DST][0:2] exp_f32:     258.576 exp:       258.5 got:        -nan diff:     nan rdiff:     nan
...
perf,gpu,jit:gemm:any,,--mode=CP --matmul --engine=gpu --dt=f16:f16:f16 128x128:128x128,0.0041943,...
tests:1 passed:1 skipped:0 ... failed:0

After: --mode=CP is explicitly prohibited:

> benchdnn --mode=CP --matmul --engine=gpu --dt=f16 128x128:128x128

Error: --mode=CP is not supported. Use --mode=C and --mode=P (or --mode=F) separately.

github-actions bot added the "component:tests Codeowner: @oneapi-src/onednn-arch" label Apr 3, 2026
kwieloch-intel changed the title from "benchdnn: prohibits --mode=CP" to "benchdnn: prohibit --mode=CP" Apr 3, 2026
&& !has_bench_mode_bit(mode_bit_t::corr)) {
bench_mode_modifier |= mode_modifier_t::no_ref_memory;
}
#endif
Contributor

All those parts should be handled in utils/parser.cpp, not here.

Contributor Author

I've moved it to utils/parser.cpp in fbfda9c.

&& !has_bench_mode_bit(mode_bit_t::corr)) {
bench_mode_modifier |= mode_modifier_t::no_ref_memory;
}
#endif
Contributor

I'm sorry I was not quite clear in my intentions. There's already logic handling this: https://github.com/uxlfoundation/oneDNN/blob/main/tests/benchdnn/utils/parser.cpp#L1568

In this function you'll find places for both updates you want to do.
