Hi team, we are trying to run the HSTU benchmarks on NVIDIA RTX PRO 6000 Blackwell GPUs by following this guide link and this link,
but we keep running into errors like the following:
fea:/workspace/recsys-examples/examples/hstu# python3 ./inference/benchmark/inference_benchmark.py
/usr/local/lib/python3.12/dist-packages/dynamicemb-0.0.1-py3.12-linux-x86_64.egg/dynamicemb/dynamicemb_config.py:335: UserWarning: max_capacity is changed to 10112 from 10000
warnings.warn(
Traceback (most recent call last):
File "/workspace/recsys-examples/examples/hstu/./inference/benchmark/inference_benchmark.py", line 158, in <module>
run_ranking_gr_inference(args.disable_kvcache)
File "/workspace/recsys-examples/examples/hstu/./inference/benchmark/inference_benchmark.py", line 110, in run_ranking_gr_inference
model_predict = get_inference_ranking_gr(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/recsys-examples/examples/hstu/model/inference_ranking_gr.py", line 151, in get_inference_ranking_gr
inference_sparse = InferenceEmbedding(
^^^^^^^^^^^^^^^^^^^
File "/workspace/recsys-examples/examples/hstu/modules/inference_embedding.py", line 255, in __init__
self._dynamic_embedding_collection = create_embedding_collection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/recsys-examples/examples/hstu/modules/inference_embedding.py", line 189, in create_embedding_collection
return InferenceDynamicEmbeddingCollection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/recsys-examples/examples/hstu/modules/inference_embedding.py", line 104, in __init__
self._embedding_tables = create_dynamic_embedding_tables(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/recsys-examples/examples/hstu/modules/inference_embedding.py", line 86, in create_dynamic_embedding_tables
return BatchedDynamicEmbeddingTablesV2(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/dynamicemb-0.0.1-py3.12-linux-x86_64.egg/dynamicemb/batched_dynamicemb_tables.py", line 416, in __init__
self._create_score()
File "/usr/local/lib/python3.12/dist-packages/dynamicemb-0.0.1-py3.12-linux-x86_64.egg/dynamicemb/batched_dynamicemb_tables.py", line 983, in _create_score
self._scores[table_name] = device_timestamp()
^^^^^^^^^^^^^^^^^^
RuntimeError: cudaCheckError() failed at /workspace/recsys-examples/corelib/dynamicemb/src/torch_utils.cu:38 : no kernel image is available for execution on the device
The RTX PRO 6000 is an SM120 device (compute capability 12.0). We tried updating the Dockerfile to target SM 12.0, but we still get the error above.
Is HSTU inference benchmarking currently supported on Blackwell SM120? If yes, is there a workaround to get this working, or is support currently in the works? @shijieliu, @JacoCheung, or anyone else, any thoughts?
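For reference, here is a minimal sketch of a check for whether a build targets this GPU. "No kernel image is available" generally means the binary contains neither native code (SASS) for the device nor PTX that the driver can JIT-compile for it. The helper names (`sm_tag`, `has_native_kernels`, `has_ptx_fallback`) are our own, not part of any library. Note that `torch.cuda.get_arch_list()` only describes the PyTorch build itself. The dynamicemb extension is compiled separately, so its own arch flags would still need checking. We assume, without having confirmed it, that its build honors `TORCH_CUDA_ARCH_LIST` the way `torch.utils.cpp_extension` builds normally do.

```python
import re


def sm_tag(capability):
    """Turn a (major, minor) compute capability into an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def has_native_kernels(capability, arch_list):
    """True if arch_list contains native code built for exactly this capability."""
    return sm_tag(capability) in arch_list


def has_ptx_fallback(capability, arch_list):
    """True if arch_list contains PTX for an equal or older capability.

    Only plain 'compute_XY' entries count; arch-specific variants such as
    'compute_90a' are not forward compatible and are ignored.
    """
    major, minor = capability
    target = major * 10 + minor
    for arch in arch_list:
        match = re.fullmatch(r"compute_(\d+)", arch)
        if match and int(match.group(1)) <= target:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()  # arch list of the PyTorch build only
        print("device capability:", cap)
        print("PyTorch arch list:", archs)
        print("native kernels:", has_native_kernels(cap, archs))
        print("PTX fallback:", has_ptx_fallback(cap, archs))
    else:
        print("PyTorch with CUDA is not available in this environment.")
```

If the native check is false for `(12, 0)` and there is no PTX fallback, the build would need to be redone with SM120 in its target architecture list.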