Commit 058fb0b

Merge branch 'master' into gather_output
2 parents: 761d1fb + 3054b93

44 files changed (+1314, -195 lines)

.github/workflows/cpu-torch-latest.yml

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ jobs:
 git clone https://github.com/huggingface/transformers
 cd transformers
 # if needed switch to the last known good SHA until transformers@master is fixed
-git checkout 981c276
+# git checkout 981c276
 git rev-parse --short HEAD
 pip install .

.github/workflows/hpu-gaudi2-nightly.yml

Lines changed: 2 additions & 0 deletions
@@ -45,6 +45,8 @@ jobs:
 test_zero_leaf_module.py
 test_zero_offloadpp.py
 test_zero_tiled.py
+test_autotp_training.py
+test_ulysses.py
 
 # Steps represent a sequence of tasks that will be executed as part of the job
 steps:

.github/workflows/hpu-gaudi2.yml

Lines changed: 3 additions & 1 deletion
@@ -94,6 +94,8 @@ jobs:
 test_zero_nesting_init.py
 test_zeropp.py
 (test_zero.py and (TestZero3ParamPartitioningLargeParam or TestZero3ParamPartitioningLargeParam))
+(test_linear.py and (TestLoRALinear or TestBasicLinear))
+(test_ctx.py and TestEngine)
 
 # Steps represent a sequence of tasks that will be executed as part of the job
 steps:
@@ -112,7 +114,7 @@ jobs:
 git clone https://github.com/huggingface/transformers
 cd transformers
 # if needed switch to the last known good SHA until transformers@master is fixed
-git checkout 981c276
+# git checkout 981c276
 git rev-parse --short HEAD
 pip install .

.github/workflows/nv-a6000.yml

Lines changed: 4 additions & 4 deletions
@@ -23,7 +23,7 @@ jobs:
 unit-tests:
 runs-on: [self-hosted, nvidia, a6000]
 container:
-image: nvcr.io/nvidia/pytorch:24.09-py3
+image: nvcr.io/nvidia/pytorch:24.12-py3
 ports:
 - 80
 options: --gpus all --shm-size "8G"
@@ -43,7 +43,7 @@ jobs:
 git clone https://github.com/huggingface/transformers
 cd transformers
 # if you need to use an older transformers version temporarily in case of breakage
-git checkout 981c276
+# git checkout 981c276
 git rev-parse --short HEAD
 python -m pip install .
 - name: Install deepspeed
@@ -58,8 +58,8 @@ jobs:
 run: |
 unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
 cd tests
-python -m pytest --color=yes --durations=0 --verbose -rF -m 'inference_v2' unit/ --torch_ver="2.5" --cuda_ver="12"
-python -m pytest --color=yes --durations=0 --verbose -rF -m 'inference_v2_ops' unit/ --torch_ver="2.5" --cuda_ver="12"
+python -m pytest --color=yes --durations=0 --verbose -rF -m 'inference_v2' unit/ --torch_ver="2.6" --cuda_ver="12"
+python -m pytest --color=yes --durations=0 --verbose -rF -m 'inference_v2_ops' unit/ --torch_ver="2.6" --cuda_ver="12"
 - name: MII unit tests
 run: |
 BRANCH="main"
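The workflow commands above select tests with pytest's `-m` marker expressions (e.g. `-m 'inference_v2'`). A minimal sketch of how that filtering behaves, using a made-up test file and the marker name from the workflow (a warning about the unregistered marker is expected):

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical demo of pytest's -m marker filtering, as used in the workflow
# commands above. The test file and its contents are made up for illustration.
demo = textwrap.dedent("""
    import pytest

    @pytest.mark.inference_v2
    def test_marked():
        assert True

    def test_unmarked():
        assert True
""")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_demo.py")
    with open(path, "w") as f:
        f.write(demo)
    # -m selects only tests carrying the inference_v2 marker; others are deselected.
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-q", "-m", "inference_v2", path],
        capture_output=True, text=True,
    )

print(result.stdout)
```

Since `-m` takes a full expression, something like `-m 'inference_v2 or inference_v2_ops'` could run both groups in a single invocation.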

.github/workflows/nv-flash-attn.yml

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ jobs:
 unit-tests:
 runs-on: [self-hosted, nvidia, a6000]
 container:
-image: nvcr.io/nvidia/pytorch:24.09-py3
+image: nvcr.io/nvidia/pytorch:24.12-py3
 ports:
 - 80
 options: --gpus all --shm-size "8G"
@@ -53,7 +53,7 @@ jobs:
 run: |
 unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
 cd tests
-python -m pytest --color=yes --durations=0 --verbose -rF unit/sequence_parallelism/test_ulysses.py --torch_ver="2.5" --cuda_ver="12"
+python -m pytest --color=yes --durations=0 --verbose -rF unit/sequence_parallelism/test_ulysses.py --torch_ver="2.6" --cuda_ver="12"
 - name: Open GitHub issue if nightly CI fails
 if: ${{ failure() && (github.event_name == 'schedule') }}
 uses: JasonEtco/create-an-issue@v2

.github/workflows/nv-human-eval.yml

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ jobs:
 unit-tests:
 runs-on: [self-hosted, nvidia, a6000]
 container:
-image: nvcr.io/nvidia/pytorch:24.09-py3
+image: nvcr.io/nvidia/pytorch:24.12-py3
 ports:
 - 80
 options: --gpus all --shm-size "8G"
@@ -50,4 +50,4 @@ jobs:
 run: |
 unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
 cd tests
-python -m pytest --color=yes --durations=0 --verbose -rF -m 'evaluation' -k "test_human_eval" unit/ --torch_ver="2.5" --cuda_ver="12"
+python -m pytest --color=yes --durations=0 --verbose -rF -m 'evaluation' -k "test_human_eval" unit/ --torch_ver="2.6" --cuda_ver="12"

.github/workflows/nv-torch-nightly-v100.yml

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ jobs:
 git clone https://github.com/huggingface/transformers
 cd transformers
 # if needed switch to the last known good SHA until transformers@master is fixed
-git checkout 981c276
+# git checkout 981c276
 git rev-parse --short HEAD
 pip install .

CONTRIBUTING.md

Lines changed: 9 additions & 10 deletions
@@ -48,16 +48,15 @@ pytest run_sanity_check.py
 ```
 Note that the `--forked` flag is not necessary for the model tests.
 
-## Contributor License Agreement
-This project welcomes contributions and suggestions. Most contributions require you to
-agree to a Contributor License Agreement (CLA) declaring that you have the right to, and
-actually do, grant us the rights to use your contribution. For details, visit
-https://cla.opensource.microsoft.com.
-
-When you submit a pull request, a CLA bot will automatically determine whether you need
-to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply
-follow the instructions provided by the bot. You will only need to do this once across
-all repos using our CLA.
+## Developer Certificate of Origin
+This project welcomes contributions and suggestions. All contributions to deepspeedai projects
+require commits to be signed off with a [Developer Certificate of Origin](https://en.wikipedia.org/wiki/Developer_Certificate_of_Origin)
+(DCO) declaring that you have the right to, and actually do, grant us the rights to use your contribution.
+
+When you submit a pull request, the DCO app will check for the presence of signed commits.
+Information about how this check works is here: https://github.com/dcoapp/app?tab=readme-ov-file#how-it-works
+
+To sign commits, you will need to include `-s` when running `git commit`. For example, `git commit -s -m "Commit message"`. One note, creating PRs via the GitHub interface do not appear to include this option. If you forget this, clicking on the failing check in your PR will point you to commands you can run to rebase and sign previous commits.
 
 ## Code of Conduct
 This project has adopted the [Microsoft Open Source Code of
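The sign-off flow in the CONTRIBUTING.md change above can be exercised locally; a minimal sketch using a throwaway repository (the name, email, and file are illustrative only):

```shell
set -e

# Create a throwaway repo; the identity values here are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Developer"
git config user.email "jane@example.com"

echo "hello" > file.txt
git add file.txt
# -s (--signoff) appends the Signed-off-by trailer the DCO check looks for.
git commit -q -s -m "Add file"

# Show the commit message with its trailer.
git log -1 --format=%B
```

If a commit was already made without `-s`, `git commit --amend -s --no-edit` adds the trailer to the most recent commit.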

README.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 
 ## Latest News
 <b> <span style="color:orange" > DeepSpeed empowers ChatGPT-like model training with a single click, offering 15x speedup over SOTA RLHF systems with unprecedented cost reduction at all scales; [learn how](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-chat)</span>.</b>
-
+* [2025/03] [DeepSpeed-AutoTP: Automatic Tensor Parallel Training of Hugging Face models](https://github.com/deepspeedai/DeepSpeed/blob/master/blogs/huggingface-tp/README.md)
 * [2024/12] [Ulysses-Offload: Democratizing Long Context LLM Training ](https://github.com/deepspeedai/DeepSpeed/blob/master/blogs/ulysses-offload/README.md)
 * [2024/12] [DeepSpeed-Domino: Communication-Free LLM Training Engine](https://github.com/deepspeedai/DeepSpeed/blob/master/blogs/deepspeed-domino/README.md)
 * [2024/08] [DeepSpeed on Windows](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/windows/08-2024/README.md) [[日本語](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/windows/08-2024/japanese/README.md)] [[中文](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/windows/08-2024/chinese/README.md)]

blogs/deepspeed-gds/README.md

Lines changed: 2 additions & 2 deletions
@@ -47,7 +47,7 @@ We used three benchmarking tools for our evaluations. The first is fio, the popu
 
 ## High-Performance I/O with CPU Buffers via NVMe Scaling
 
-Our first set of microbenchmark evaluations used fio and ds\_io to measure the performance of transferring 1GB data between NVMe and CPU memory. We configure fio to use the libaio backend for these experiments1. The results are summarized in Figure 1, from which we make two observations. First, DeepNVMe demonstrates high performance as it roughly matches fio, despite being more representative of DL applications. Second, DeepNVMe scales I/O performance almost linearly with available NVMe bandwidth, achieving rates of 10GB/sec reads and 5GB/sec writes.
+Our first set of microbenchmark evaluations used fio and ds\_io to measure the performance of transferring 1GB data between NVMe and CPU memory. We configure fio to use the libaio backend for these experiments. The results are summarized in Figure 1, from which we make two observations. First, DeepNVMe demonstrates high performance as it roughly matches fio, despite being more representative of DL applications. Second, DeepNVMe scales I/O performance almost linearly with available NVMe bandwidth, achieving rates of 10GB/sec reads and 5GB/sec writes.
 
 <img src="./media/figure1.png" style="width:6.5in;height:3.42153in" />
 
@@ -85,4 +85,4 @@ In this blog post, we introduced DeepNVMe, an I/O optimization technology create
 
 
 # Acknowlegements
-This work is the result of a deep collaboration between Microsoft and NVIDIA. The contributors include Joe Mayer, Martin Cai, and Olatunji Ruwase from Microsoft; Kiran Modukuri, Vahid Noormofidi, Sourab Gupta, and Sandeep Joshi from Nivida.
+This work is the result of a deep collaboration between Microsoft and NVIDIA. The contributors include Joe Mayer, Martin Cai, and Olatunji Ruwase from Microsoft; Kiran Modukuri, Vahid Noormofidi, Sourab Gupta, and Sandeep Joshi from Nvidia.
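The libaio-backed fio experiment described in the blog diff above could be approximated with a job file along these lines; the block size, queue depth, and target path are assumptions for illustration, not the blog's actual settings:

```ini
; Hypothetical fio job sketching a 1GB NVMe-to-CPU read test with libaio.
; bs, iodepth, and filename are illustrative assumptions.
[nvme-read-1g]
ioengine=libaio
rw=read
bs=1M
size=1G
iodepth=32
direct=1
filename=/mnt/nvme0/fio.dat
```

Run with `fio <jobfile>`; `direct=1` bypasses the page cache so the measurement reflects device bandwidth rather than cached reads.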
