
[QST] HSTU support on Blackwell? #335

@depksingh


Hi team, we are trying to run the HSTU benchmarks on NVIDIA RTX PRO 6000 Blackwell GPUs by following this guide and this link, but when we run them we hit errors such as the following:

```
fea:/workspace/recsys-examples/examples/hstu# python3 ./inference/benchmark/inference_benchmark.py
/usr/local/lib/python3.12/dist-packages/dynamicemb-0.0.1-py3.12-linux-x86_64.egg/dynamicemb/dynamicemb_config.py:335: UserWarning: max_capacity is changed to 10112 from 10000
  warnings.warn(
Traceback (most recent call last):
  File "/workspace/recsys-examples/examples/hstu/./inference/benchmark/inference_benchmark.py", line 158, in <module>
    run_ranking_gr_inference(args.disable_kvcache)
  File "/workspace/recsys-examples/examples/hstu/./inference/benchmark/inference_benchmark.py", line 110, in run_ranking_gr_inference
    model_predict = get_inference_ranking_gr(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/recsys-examples/examples/hstu/model/inference_ranking_gr.py", line 151, in get_inference_ranking_gr
    inference_sparse = InferenceEmbedding(
                       ^^^^^^^^^^^^^^^^^^^
  File "/workspace/recsys-examples/examples/hstu/modules/inference_embedding.py", line 255, in __init__
    self._dynamic_embedding_collection = create_embedding_collection(
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/recsys-examples/examples/hstu/modules/inference_embedding.py", line 189, in create_embedding_collection
    return InferenceDynamicEmbeddingCollection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/recsys-examples/examples/hstu/modules/inference_embedding.py", line 104, in __init__
    self._embedding_tables = create_dynamic_embedding_tables(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/recsys-examples/examples/hstu/modules/inference_embedding.py", line 86, in create_dynamic_embedding_tables
    return BatchedDynamicEmbeddingTablesV2(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/dynamicemb-0.0.1-py3.12-linux-x86_64.egg/dynamicemb/batched_dynamicemb_tables.py", line 416, in __init__
    self._create_score()
  File "/usr/local/lib/python3.12/dist-packages/dynamicemb-0.0.1-py3.12-linux-x86_64.egg/dynamicemb/batched_dynamicemb_tables.py", line 983, in _create_score
    self._scores[table_name] = device_timestamp()
                               ^^^^^^^^^^^^^^^^^^
RuntimeError: cudaCheckError() failed at /workspace/recsys-examples/corelib/dynamicemb/src/torch_utils.cu:38 : no kernel image is available for execution on the device
```

The RTX PRO 6000 has compute capability 12.0 (SM120). We tried updating the Dockerfile to build for SM 12.0, but we still get the error above.
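For context, "no kernel image is available for execution on the device" means the compiled dynamicemb extension (here, the kernel behind `device_timestamp()` in `torch_utils.cu`) contains no GPU code the driver can run on this card. The sketch below is a simplified model of that check, assuming the standard CUDA compatibility rules: a cubin (`sm_XY`) runs only on GPUs with the same major version and an equal or newer minor version, while embedded PTX (`compute_XY`) can be JIT-compiled for any equal or newer GPU. The function names `parse_arch` and `has_compatible_image` are hypothetical helpers, not part of dynamicemb or CUDA, and lettered targets such as `sm_90a` are deliberately simplified to "exact match only". On a real system, the device side comes from `torch.cuda.get_device_capability()`, and the targets actually embedded in the extension's `.so` can be listed with `cuobjdump --list-elf` and `cuobjdump --list-ptx`.

```python
import re
from typing import Iterable, Tuple

# Hypothetical helpers: a simplified model of CUDA kernel-image selection,
# not code taken from dynamicemb or the CUDA driver.
_ARCH_RE = re.compile(r"^(sm|compute)_(\d{2,})([a-z]?)$")


def parse_arch(arch: str) -> Tuple[str, int, int, str]:
    """Split 'sm_120' -> ('sm', 12, 0, '') and 'compute_90a' -> ('compute', 9, 0, 'a')."""
    m = _ARCH_RE.match(arch)
    if m is None:
        raise ValueError(f"unrecognized arch string: {arch!r}")
    kind, digits, suffix = m.groups()
    # The last digit is the minor version; the remaining digits are the major version.
    return kind, int(digits[:-1]), int(digits[-1]), suffix


def has_compatible_image(device_cc: Tuple[int, int], archs: Iterable[str]) -> bool:
    """Return True if any embedded target could run on a GPU with compute capability device_cc."""
    dev_major, dev_minor = device_cc
    for arch in archs:
        kind, major, minor, suffix = parse_arch(arch)
        if suffix:
            # Simplification: treat lettered targets (e.g. sm_90a) as exact-match only.
            if (major, minor) == (dev_major, dev_minor):
                return True
            continue
        if kind == "sm":
            # A cubin runs on the same major version with an equal or newer minor version.
            if major == dev_major and minor <= dev_minor:
                return True
        elif (major, minor) <= (dev_major, dev_minor):
            # PTX can be JIT-compiled by the driver for any equal or newer GPU.
            return True
    return False


if __name__ == "__main__":
    # RTX PRO 6000 (Blackwell) reports compute capability 12.0.
    print(has_compatible_image((12, 0), ["sm_80", "sm_90"]))        # False
    print(has_compatible_image((12, 0), ["sm_90", "compute_90"]))   # True (PTX JIT)
    print(has_compatible_image((12, 0), ["sm_120"]))                # True
```

Under this model, a build that only targets `sm_80`/`sm_90` cubins fails on SM120 exactly as in the traceback, so the fix would be to make sure the dynamicemb build itself (not only the base image) emits `sm_120` code or forward-compatible PTX, which, as far as we know, requires CUDA 12.8 or newer.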

Is HSTU inference benchmarking currently supported on Blackwell (SM120)? If so, is there a workaround to get this working, or is support still in progress? @shijieliu, @JacoCheung, or anyone else: any thoughts?



Labels: question (Further information is requested)
