[QST] HSTU support on Blackwell?


Hi Team, we are trying to run the HSTU benchmarks on an NVIDIA RTX PRO 6000 Blackwell GPU by following the [benchmark guide](https://github.com/NVIDIA/recsys-examples/blob/main/examples/hstu/inference/benchmark/README.md) and the [inference README](https://github.com/NVIDIA/recsys-examples/blob/main/examples/hstu/inference/README.md).

However, the benchmark fails at startup with errors like the following:

```
fea:/workspace/recsys-examples/examples/hstu# python3 ./inference/benchmark/inference_benchmark.py
/usr/local/lib/python3.12/dist-packages/dynamicemb-0.0.1-py3.12-linux-x86_64.egg/dynamicemb/dynamicemb_config.py:335: UserWarning: max_capacity is changed to 10112 from 10000
  warnings.warn(
Traceback (most recent call last):
  File "/workspace/recsys-examples/examples/hstu/./inference/benchmark/inference_benchmark.py", line 158, in <module>
    run_ranking_gr_inference(args.disable_kvcache)
  File "/workspace/recsys-examples/examples/hstu/./inference/benchmark/inference_benchmark.py", line 110, in run_ranking_gr_inference
    model_predict = get_inference_ranking_gr(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/recsys-examples/examples/hstu/model/inference_ranking_gr.py", line 151, in get_inference_ranking_gr
    inference_sparse = InferenceEmbedding(
                       ^^^^^^^^^^^^^^^^^^^
  File "/workspace/recsys-examples/examples/hstu/modules/inference_embedding.py", line 255, in __init__
    self._dynamic_embedding_collection = create_embedding_collection(
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/recsys-examples/examples/hstu/modules/inference_embedding.py", line 189, in create_embedding_collection
    return InferenceDynamicEmbeddingCollection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/recsys-examples/examples/hstu/modules/inference_embedding.py", line 104, in __init__
    self._embedding_tables = create_dynamic_embedding_tables(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/recsys-examples/examples/hstu/modules/inference_embedding.py", line 86, in create_dynamic_embedding_tables
    return BatchedDynamicEmbeddingTablesV2(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/dynamicemb-0.0.1-py3.12-linux-x86_64.egg/dynamicemb/batched_dynamicemb_tables.py", line 416, in __init__
    self._create_score()
  File "/usr/local/lib/python3.12/dist-packages/dynamicemb-0.0.1-py3.12-linux-x86_64.egg/dynamicemb/batched_dynamicemb_tables.py", line 983, in _create_score
    self._scores[table_name] = device_timestamp()
                               ^^^^^^^^^^^^^^^^^^
RuntimeError: cudaCheckError() failed at /workspace/recsys-examples/corelib/dynamicemb/src/torch_utils.cu:38 : no kernel image is available for execution on the device
```

The RTX PRO 6000 is an SM120 device (compute capability 12.0). We tried updating the Dockerfile to build for SM 12.0, but we still get the error above.
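
For context, "no kernel image is available" generally means the compiled binary contains no code the GPU can run. A minimal sketch of that compatibility rule is below, assuming the usual CUDA conventions: an `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, a `compute_XY` PTX target can be JIT-compiled for any equal or newer device, and suffixed targets such as `sm_90a` are treated here as exact-match only (a simplification). The helper names `parse_arch` and `can_run` are hypothetical and not part of any library.

```python
import re


def parse_arch(tag):
    """Split a tag such as 'sm_120' or 'compute_90' into (kind, major, minor, suffix)."""
    m = re.fullmatch(r"(sm|compute)_(\d+)(\d)([af]?)", tag)
    if m is None:
        raise ValueError(f"unrecognized arch tag: {tag!r}")
    kind, major, minor, suffix = m.groups()
    return kind, int(major), int(minor), suffix


def can_run(capability, arch_list):
    """Return True if any target in arch_list is usable on a device with
    the given (major, minor) compute capability."""
    dev_major, dev_minor = capability
    for tag in arch_list:
        kind, major, minor, suffix = parse_arch(tag)
        if suffix:
            # Simplification: suffixed targets are only accepted on an exact match.
            if (major, minor) == (dev_major, dev_minor):
                return True
            continue
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            # Binary code: same major version, minor not newer than the device.
            return True
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            # PTX: can be JIT-compiled for equal or newer devices.
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print(f"device capability: {cap}, PyTorch arch list: {archs}")
        print(f"PyTorch build usable on this device: {can_run(cap, archs)}")
    else:
        print("PyTorch with CUDA is not available; skipping the live check.")
```

Note that `torch.cuda.get_arch_list()` only describes the PyTorch build itself. The failing call is in the separately compiled `dynamicemb` extension (`torch_utils.cu`), so that extension would also need to be rebuilt with an SM120-compatible target; whether its build honors the Dockerfile setting we changed is something we have not confirmed.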

We wanted to check whether HSTU inference benchmarking is currently supported on Blackwell SM120. If yes, is there a workaround to get this working? If not, is support in the works? @shijieliu, @JacoCheung, or anyone else, any thoughts?


-----
By submitting this issue, you agree to follow our [code of conduct](https://docs.rapids.ai/resources/conduct/) and our [contributing guidelines](https://github.com/jarmak-nv/rapids-repo-template/blob/main/CONTRIBUTING.md).

