Commit c2a9b82
add support for batched queue submissions (#19769)
Adding a new feature: batched queue submissions.
Batched queues enable submission of operations to the driver in batches,
therefore reducing the overhead of submitting every single operation
individually. Similarly to command buffers in L0v2, they use regular
command lists (later referenced as 'batches'). Operations enqueued on
regular command lists are not executed immediately, but only after
enqueueing the regular command list on an immediate command list.
However, in contrast to command buffers, batched queues also handle
submission of batches (regular command lists) instead of only collecting
enqueued operations, by using an internal immediate command list.
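
The deferred-execution behaviour described above can be illustrated with a
small toy model. This is only a sketch: ToyRegularList and ToyBatchedQueue
are hypothetical names, not types from the adapter, and running the recorded
callables stands in for enqueueing the regular command list on the internal
immediate command list.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a regular command list: operations are recorded, not run.
struct ToyRegularList {
    std::vector<std::function<void()>> ops;
};

// Toy stand-in for a batched queue. All names here are hypothetical.
class ToyBatchedQueue {
public:
    // Record an operation on the current batch; nothing executes yet.
    void enqueue(std::function<void()> op) {
        current_.ops.push_back(std::move(op));
    }

    // Submit the current batch. Running the recorded operations stands in
    // for enqueueing the regular list on the internal immediate list.
    // Afterwards the current batch is replaced by a fresh, empty one.
    void submitBatch() {
        for (auto &op : current_.ops) {
            op();
        }
        current_ = ToyRegularList{};
        ++submitted_;
    }

    std::size_t pending() const { return current_.ops.size(); }
    std::size_t submittedBatches() const { return submitted_; }

private:
    ToyRegularList current_;
    std::size_t submitted_ = 0;
};
```

In this model, two calls to enqueue() followed by one submitBatch() result in
a single submission, which is the overhead reduction batched queues aim for.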
Batched queues introduce:
- batch_manager stores the current batch, the command list manager with
an immediate command list for batch submissions, the vector of submitted
batches, and the generation number of the current batch.
- The current batch is a command list manager with a regular command
list; operations requested by users are enqueued on the current batch.
The current batch may be submitted for execution on the immediate
command list, replaced by a new regular command list and stored for
execution completion in the vector of submitted batches.
- The number of regular command lists stored for execution is limited.
- The generation number of the current batch is assigned to events
associated with operations enqueued on the given batch. It is
incremented during every replacement of the current batch. When an event
created by a batched queue appears in an eventWaitList, the batch
assigned to the given event might not have been executed yet and the
event might never be signalled. Comparing generation numbers enables
determining whether the current batch should be submitted for execution.
If the generation number of the current batch is higher than the number
assigned to the given event, the batch associated with the event has
already been submitted for execution and additional submission of the
current batch is not needed.
- Regular command lists use the regular pool cache type, whereas
immediate command lists use the immediate pool cache type. Since
user-requested operations are enqueued on regular command lists and
immediate command lists are only used internally by the batched queue
implementation, events are not created for immediate command lists (in
most cases; see below).
- When a user requests the command list manager to enqueue a command
buffer, the regular command list from the command buffer is appended to
the command list of the given command list manager. Since regular
command lists cannot be enqueued on other regular command lists, but
only on immediate command lists, enqueueing command buffers must be
performed on an immediate command list. Therefore, an additional event
pool with the immediate cache type is introduced in order to provide
events for operations requested by users and enqueued directly on an
immediate command list.
- wait_list_view is modified. Previously, it only stored the waitlist
(as a ze_event_handle buffer created from events) and the corresponding
event count in a single container, which could be passed as an argument
to the driver API. Now, the constructor also ensures that all
associated operations will eventually be executed. Since regular command
lists are not executed immediately, but only after enqueueing on
immediate lists, it is necessary to enqueue the regular command list
associated with the given event. Otherwise, the event would never be
signalled.
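
The generation-number check used when an event appears in a wait list can be
sketched as follows. This is a simplified illustration assuming a single
batched queue; ToyEvent, ToyBatchManager and ensureSubmitted are hypothetical
names, and the real adapter code may differ.

```cpp
#include <cstdint>

// Hypothetical names; the adapter's actual types are not shown here.
using generation_t = std::uint64_t;

struct ToyEvent {
    // Generation of the batch on which the associated operation was enqueued.
    generation_t batchGeneration;
};

class ToyBatchManager {
public:
    // Events created now are tagged with the current batch's generation.
    ToyEvent createEvent() const { return ToyEvent{currentGeneration_}; }

    // Replacing the current batch increments the generation number.
    void submitCurrentBatch() { ++currentGeneration_; }

    // Called before waiting on an event. Returns true if the current batch
    // had to be submitted so that the event can eventually be signalled.
    bool ensureSubmitted(const ToyEvent &ev) {
        if (currentGeneration_ > ev.batchGeneration) {
            // The event's batch was already submitted; nothing to do.
            return false;
        }
        submitCurrentBatch();
        return true;
    }

    generation_t generation() const { return currentGeneration_; }

private:
    generation_t currentGeneration_ = 0;
};
```

Waiting twice on the same event triggers at most one submission in this
sketch: the first call bumps the generation, so the second call sees a higher
current generation and returns without submitting.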
Additionally, support for UR_QUEUE_INFO_FLAGS in urQueueGetInfo has been
added for native CPU, which is required by the enqueueTimestampRecording
tests. Currently, enqueueTimestampRecording is not supported by batched
queues.
Batched queues can be enabled by setting
UR_QUEUE_FLAG_SUBMISSION_BATCHED in ur_queue_flags_t or globally,
through the environment variable UR_L0_V2_FORCE_BATCHED=1.
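
The two ways of enabling batched queues can be pictured as a simple
decision, sketched below. The flag constant is a placeholder and not the
real value of UR_QUEUE_FLAG_SUBMISSION_BATCHED, the helper names are
hypothetical, and treating only the exact string "1" as enabled is an
assumption about how the environment variable is parsed.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Placeholder bit, NOT the real value of UR_QUEUE_FLAG_SUBMISSION_BATCHED.
constexpr std::uint32_t kToyFlagSubmissionBatched = 1u << 0;

// Assumption: only the exact value "1" forces batched submission.
bool forceBatchedFromValue(const char *value) {
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Reads the environment variable named in the commit message.
bool forceBatchedFromEnv() {
    return forceBatchedFromValue(std::getenv("UR_L0_V2_FORCE_BATCHED"));
}

// A queue is batched if the per-queue flag is set or the global override is on.
bool useBatchedQueue(std::uint32_t queueFlags, bool forcedGlobally) {
    return (queueFlags & kToyFlagSubmissionBatched) != 0 || forcedGlobally;
}
```

A caller would combine the two as
useBatchedQueue(flags, forceBatchedFromEnv()), so the environment variable
acts as a global override regardless of the per-queue flags.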
Batched queues are intended to improve performance on platforms where
eager submission is not efficient due to driver limitations. Such
platforms include Xe (and older GPUs) on Windows. There are also
workloads which benefit from batched submissions (e.g., dl-cifar). SYCL
graphs should be preferred for new software, since they allow for better
control over how commands are grouped for submission.
Benchmark results for default in-order queues (sycl branch, commit hash:
b76f12e554760c3fcfc55f1f815a76b0d8b208ad) and batched queues:
api_overhead_benchmark_ur SubmitKernel in order: 20.839 μs
api_overhead_benchmark_ur SubmitKernel batched: 12.183 μs
1 parent e64444b commit c2a9b82
File tree
23 files changed, +2376 -471 lines changed
- scripts/templates
- source/adapters
  - level_zero
    - v2
  - native_cpu
- test
  - adapters/level_zero/v2
  - conformance
    - enqueue
    - queue
    - testing/include/uur