Add ROCm support for H100 tests #2202
Warning: Unknown label
Please add the new label to .github/pytorch-probot.yml
Existing tests are failing; can we fix them first?
It seems the ROCm runner ran out of disk space. I've shared this issue with our runner team.
The H100 tests are working on ROCm, and CI for ROCm is enabled. However, we also want to rename the H100 test to something generic that covers ROCm as well. The tests were named H100 because some features are supported only on H100, including async TP with symmetric memory and Float8 quantization. As of now, ROCm supports FP8 quantization but does not support async TP with symmetric memory; we are working on adding that support. I have therefore moved the PR to draft. Once the support is added, we'll reopen the PR and revisit the test name.
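To make the naming question concrete, here is a minimal sketch of gating tests by backend capability instead of by GPU name. Everything in it is hypothetical: the backend keys, feature names, test names, and the `runnable_tests` helper are illustrative only and are not taken from this repository. The capability table simply mirrors what the comment above states.

```python
# Hypothetical capability table. The entries only restate the comment above:
# H100 supports both features; ROCm currently supports FP8 quantization only.
SUPPORTED_FEATURES = {
    "cuda_h100": {"float8", "async_tp"},
    "rocm": {"float8"},  # async TP with symmetric memory is not supported yet
}

# Hypothetical test names mapped to the features each one requires.
TESTS = {
    "float8_quantization": {"float8"},
    "async_tp_symmetric_memory": {"async_tp"},
    "float8_with_async_tp": {"float8", "async_tp"},
}


def runnable_tests(backend, tests):
    """Return the names of tests whose required features the backend supports."""
    supported = SUPPORTED_FEATURES[backend]
    # A test is runnable when its requirements are a subset of the supported set.
    return [name for name, required in tests.items() if required <= supported]


if __name__ == "__main__":
    for backend in SUPPORTED_FEATURES:
        print(backend, runnable_tests(backend, TESTS))
```

With a table like this, a generic test name could apply to both backends, and the async TP cases would start running on ROCm once its entry gains that feature.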
…workflow such that it now takes runner labels as input and overrides the default runner labels.
f5e56c0 to a58cb19
This PR adds ROCm support for H100 tests.