Merged
4 changes: 2 additions & 2 deletions docs/blog/posts/amd-mi300x-inference-benchmark.md
Original file line number Diff line number Diff line change
@@ -217,8 +217,8 @@ is the primary sponsor of this benchmark, and we are sincerely grateful for thei
If you'd like to use top-tier bare metal compute with AMD GPUs, we recommend going
with Hot Aisle. Once you gain access to a cluster, it can be easily accessed via `dstack`'s [SSH fleet](../../docs/concepts/fleets.md#ssh-fleets).
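
As a sketch, an SSH fleet pointing `dstack` at such a cluster might look like the following (the fleet name, user, key path, and host IPs are placeholders, not Hot Aisle specifics):

```yaml
type: fleet
name: hot-aisle-fleet

# Placeholders: replace with the user, key, and addresses of your cluster
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 192.168.100.1
    - 192.168.100.2
```

Once applied with `dstack apply`, the hosts become available as fleet instances for runs.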

### RunPod
### Runpod
If you’d like to use on-demand compute with AMD GPUs at affordable prices, you can configure `dstack` to
use [RunPod](https://runpod.io/). In
use [Runpod](https://runpod.io/). In
this case, `dstack` will be able to provision fleets automatically when you run dev environments, tasks, and
services.
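
For instance, with the `runpod` backend configured, a minimal run configuration such as the following (the name and GPU are illustrative) is enough for `dstack` to provision compute on demand:

```yaml
type: dev-environment
name: amd-dev

ide: vscode

resources:
  gpu: MI300X
```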
14 changes: 7 additions & 7 deletions docs/blog/posts/amd-on-runpod.md
@@ -1,25 +1,25 @@
---
title: Supporting AMD accelerators on RunPod
title: Supporting AMD accelerators on Runpod
date: 2024-08-21
description: "dstack, the open-source AI container orchestration platform, adds support for AMD accelerators, with RunPod as the first supported cloud provider."
description: "dstack, the open-source AI container orchestration platform, adds support for AMD accelerators, with Runpod as the first supported cloud provider."
slug: amd-on-runpod
categories:
- Changelog
---

# Supporting AMD accelerators on RunPod
# Supporting AMD accelerators on Runpod

While `dstack` helps streamline the orchestration of containers for AI, its primary goal is to offer vendor independence
and portability, ensuring compatibility across different hardware and cloud providers.

Inspired by the recent `MI300X` benchmarks, we are pleased to announce that RunPod is the first cloud provider to offer
Inspired by the recent `MI300X` benchmarks, we are pleased to announce that Runpod is the first cloud provider to offer
AMD GPUs through `dstack`, with support for other cloud providers and on-prem servers to follow.

<!-- more -->

## Specification

For the reference, below is a comparison of the `MI300X` and `H100 SXM` specs, incl. the prices offered by RunPod.
For reference, below is a comparison of the `MI300X` and `H100 SXM` specs, including the prices offered by Runpod.

| | MI300X | H100 SXM |
|---------------------------------|-------------------------------------------|--------------|
@@ -113,8 +113,8 @@ cloud resources and run the configuration.
1. The examples above demonstrate the use of
[TGI](https://huggingface.co/docs/text-generation-inference/en/installation_amd).
AMD accelerators can also be used with other frameworks like vLLM, Ollama, etc., and we'll be adding more examples soon.
2. RunPod is the first cloud provider where dstack supports AMD. More cloud providers will be supported soon as well.
3. Want to give RunPod and `dstack` a try? Make sure you've signed up for [RunPod](https://www.runpod.io/),
2. Runpod is the first cloud provider where dstack supports AMD. More cloud providers will be supported soon as well.
3. Want to give Runpod and `dstack` a try? Make sure you've signed up for [Runpod](https://www.runpod.io/),
then [set up](../../docs/reference/server/config.yml.md#runpod) the `dstack server`.

> Have questions or feedback? Join our [Discord](https://discord.gg/u8SmfwPpMd)
@@ -22,7 +22,7 @@ While `dstack` integrates with leading cloud GPU providers, we aim to expand par
sharing our vision of simplifying AI infrastructure orchestration with a lightweight, efficient alternative to Kubernetes.

This year, we’re excited to welcome our first partners: [Lambda](https://lambdalabs.com/),
[RunPod](https://www.runpod.io/),
[Runpod](https://www.runpod.io/),
[CUDO Compute](https://www.cudocompute.com/),
and [Hot Aisle](https://hotaisle.xyz/).

@@ -114,7 +114,7 @@ This year, we’re particularly proud of our newly added integration with AMD.
`dstack` works seamlessly with any on-prem AMD clusters. For example, you can rent such servers through our partner
[Hot Aisle](https://hotaisle.xyz/).

> Among cloud providers, [AMD](https://www.amd.com/en/products/accelerators/instinct.html) is supported only through RunPod. In Q1 2025, we plan to extend it to
> Among cloud providers, [AMD](https://www.amd.com/en/products/accelerators/instinct.html) is supported only through Runpod. In Q1 2025, we plan to extend it to
> [Nscale](https://www.nscale.com/),
> [Hot Aisle](https://hotaisle.xyz/), and potentially other providers open to collaboration.

2 changes: 1 addition & 1 deletion docs/blog/posts/dstack-sky-own-cloud-accounts.md
@@ -25,7 +25,7 @@ To use your own cloud account, open the project settings and edit the correspond
![dstack-sky-banner.png](https://raw.githubusercontent.com/dstackai/static-assets/main/static-assets/images/dstack-sky-edit-backend-config.png){ width=650 }

You can configure your cloud accounts for any of the supported providers, including AWS, GCP, Azure, TensorDock, Lambda,
CUDO, RunPod, and Vast.ai.
CUDO, Runpod, and Vast.ai.

Additionally, you can disable certain backends if you do not plan to use them.

4 changes: 2 additions & 2 deletions docs/blog/posts/state-of-cloud-gpu-2025.md
@@ -28,7 +28,7 @@ These axes split providers into distinct archetypes—each with different econom
| :---- | :---- | :---- |
| **Classical hyperscalers** | General-purpose clouds with GPU SKUs bolted on | AWS, Google Cloud, Azure, OCI |
| **Massive neoclouds** | GPU-first operators built around dense HGX or MI-series clusters | CoreWeave, Lambda, Nebius, Crusoe |
| **Rapidly-catching neoclouds** | Smaller GPU-first players building out aggressively | RunPod, DataCrunch, Voltage Park, TensorWave, Hot Aisle |
| **Rapidly-catching neoclouds** | Smaller GPU-first players building out aggressively | Runpod, DataCrunch, Voltage Park, TensorWave, Hot Aisle |
| **Cloud marketplaces** | Don’t own capacity; sell orchestration + unified API over multiple backends | NVIDIA DGX Cloud (Lepton), Modal, Lightning AI, dstack Sky |
| **DC aggregators** | Aggregate idle capacity from third-party datacenters, pricing via market dynamics | Vast.ai |

@@ -89,7 +89,7 @@ For comparison, below is the price range for H100×GPU clusters across providers

<img src="https://dstack.ai/static-assets/static-assets/images/cloud-providers-cluster-h100.png" width="750"/>

> Most hyperscalers and neoclouds need short- or long-term contracts, though providers like RunPod, DataCrunch, and Nebius offer on-demand clusters. Larger capacity and longer commitments bring bigger discounts — Nebius offers up to 35% off for longer terms.
> Most hyperscalers and neoclouds need short- or long-term contracts, though providers like Runpod, DataCrunch, and Nebius offer on-demand clusters. Larger capacity and longer commitments bring bigger discounts — Nebius offers up to 35% off for longer terms.

## New GPU generations – why they matter

4 changes: 2 additions & 2 deletions docs/blog/posts/toffee.md
@@ -20,7 +20,7 @@ In a recent engineering [blog post](https://research.toffee.ai/blog/how-we-use-d

[Toffee](https://toffee.ai) builds AI-powered experiences backed by LLMs and image-generation models. To serve these workloads efficiently, they combine:

- **GPU neoclouds** such as [RunPod](https://www.runpod.io/) and [Vast.ai](https://vast.ai/) for flexible, cost-efficient GPU capacity
- **GPU neoclouds** such as [Runpod](https://www.runpod.io/) and [Vast.ai](https://vast.ai/) for flexible, cost-efficient GPU capacity
- **AWS** for core, non-AI services and backend infrastructure
- **dstack** as the orchestration layer that provisions GPU resources and exposes AI models via `dstack` [services](../../docs/concepts/services.md) and [gateways](../../docs/concepts/gateways.md)

@@ -68,7 +68,7 @@ Beyond orchestration, Toffee relies on `dstack`’s UI as a central observabilit

<img src="https://dstack.ai/static-assets/static-assets/images/toffee-metrics-dark.png" width="750" />

> *Thanks to dstack’s seamless integration with GPU neoclouds like RunPod and Vast.ai, we’ve been able to shift most workloads off hyperscalers — reducing our effective GPU spend by roughly 2–3× without changing a single line of model code.*
> *Thanks to dstack’s seamless integration with GPU neoclouds like Runpod and Vast.ai, we’ve been able to shift most workloads off hyperscalers — reducing our effective GPU spend by roughly 2–3× without changing a single line of model code.*
>
> *— [Nikita Shupeyko](https://www.linkedin.com/in/nikita-shupeyko/), AI/ML & Cloud Infrastructure Architect at Toffee*

16 changes: 8 additions & 8 deletions docs/blog/posts/volumes-on-runpod.md
@@ -1,24 +1,24 @@
---
title: Using volumes to optimize cold starts on RunPod
title: Using volumes to optimize cold starts on Runpod
date: 2024-08-13
description: "Learn how to use volumes with dstack to optimize model inference cold start times on RunPod."
description: "Learn how to use volumes with dstack to optimize model inference cold start times on Runpod."
slug: volumes-on-runpod
categories:
- Changelog
---

# Using volumes to optimize cold starts on RunPod
# Using volumes to optimize cold starts on Runpod

Deploying custom models in the cloud often involves the challenge of cold start times, including the time to provision a
new instance and download the model. This is especially relevant for services with autoscaling, where new model replicas
need to be provisioned quickly.

Let's explore how `dstack` optimizes this process using volumes, with an example of
deploying a model on RunPod.
deploying a model on Runpod.

<!-- more -->

Suppose you want to deploy Llama 3.1 on RunPod as a [service](../../docs/concepts/services.md):
Suppose you want to deploy Llama 3.1 on Runpod as a [service](../../docs/concepts/services.md):

<div editor-title="examples/llms/llama31/tgi/service.dstack.yml">

@@ -59,9 +59,9 @@ When starting each replica, `text-generation-launcher` downloads the model to th
usually takes under a minute, but larger models may take longer. Repeated downloads can significantly affect
auto-scaling efficiency.

Great news: RunPod supports network volumes, which we can use for caching models across multiple replicas.
Great news: Runpod supports network volumes, which we can use for caching models across multiple replicas.

With `dstack`, you can create a RunPod volume using the following configuration:
With `dstack`, you can create a Runpod volume using the following configuration:

<div editor-title="examples/mist/volumes/runpod.dstack.yml">

@@ -130,7 +130,7 @@ resources:
In this case, `dstack` attaches the specified volume to each new replica. This ensures the model is downloaded only
once, reducing cold start time in proportion to the model size.

A notable feature of RunPod is that volumes can be attached to multiple containers simultaneously. This capability is
A notable feature of Runpod is that volumes can be attached to multiple containers simultaneously. This capability is
particularly useful for auto-scalable services or distributed tasks.
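
As a rough sketch, sharing one network volume across service replicas comes down to a fragment like this (the volume name and mount path are illustrative):

```yaml
# Fragment of a service configuration; each replica mounts the same
# multi-attach volume, so the model is downloaded only once
volumes:
  - name: llama31-volume
    path: /data
```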

Using [volumes](../../docs/concepts/volumes.md) not only optimizes inference cold start times but also enhances the
4 changes: 2 additions & 2 deletions docs/docs/concepts/backends.md
@@ -1132,9 +1132,9 @@ projects:

> To learn more, see the [Lambda](../../examples/clusters/lambda/#kubernetes) and [Crusoe](../../examples/clusters/crusoe/#kubernetes) examples.

### RunPod
### Runpod

Log into your [RunPod](https://www.runpod.io/console/) console, click Settings in the sidebar, expand the `API Keys` section, and click
Log into your [Runpod](https://www.runpod.io/console/) console, click Settings in the sidebar, expand the `API Keys` section, and click
the button to create a Read & Write key.

Then proceed to configuring the backend.
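
A minimal `~/.dstack/server/config.yml` entry for the backend might look like this (the project name and API key are placeholders):

```yaml
projects:
  - name: main
    backends:
      - type: runpod
        creds:
          type: api_key
          api_key: <your Runpod API key>
```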
2 changes: 1 addition & 1 deletion docs/docs/concepts/snippets/manage-fleets.ext
@@ -7,4 +7,4 @@ If the run reuses an existing fleet instance, only the fleet's

If an instance remains `idle`, it is automatically terminated after `idle_duration`.

> Not applied for container-based backends (Kubernetes, Vast.ai, RunPod).
> Not applied for container-based backends (Kubernetes, Vast.ai, Runpod).
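
For example, a fleet can set this duration explicitly (the name, size, and values below are illustrative):

```yaml
type: fleet
name: my-fleet

nodes: 2
resources:
  gpu: 24GB

# Terminate instances after one hour of idleness
idle_duration: 1h
```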
4 changes: 2 additions & 2 deletions docs/docs/guides/migration/slurm.md
@@ -908,7 +908,7 @@ resources:

#### Network volumes

Network volumes are persistent cloud storage (AWS EBS, GCP persistent disks, RunPod volumes).
Network volumes are persistent cloud storage (AWS EBS, GCP persistent disks, Runpod volumes).
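
As a sketch, creating such a volume with `dstack` could look like this (the name, backend, region, and size are illustrative):

```yaml
type: volume
name: data-volume

backend: runpod
region: CA-MTL-1
size: 100GB
```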

Single-node task:

@@ -936,7 +936,7 @@ resources:

</div>

Network volumes cannot be used with distributed tasks (no multi-attach support), except where multi-attach is supported (RunPod) or via volume interpolation.
Network volumes generally cannot be used with distributed tasks, since most backends lack multi-attach support; the exceptions are backends that do support multi-attach (Runpod) and attaching a separate volume to each node via interpolation.

For distributed tasks, use interpolation to attach different volumes to each node.
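
A sketch of that pattern, assuming the volume name supports interpolation by node rank (the names below are illustrative):

```yaml
# Fragment of a distributed task; each node attaches its own volume,
# e.g. data-volume-0, data-volume-1, ...
nodes: 2
volumes:
  - name: data-volume-${{ dstack.node_rank }}
    path: /data
```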

2 changes: 1 addition & 1 deletion docs/docs/guides/protips.md
@@ -218,7 +218,7 @@ If the run reuses an existing fleet instance, only the fleet's

If an instance remains `idle`, it is automatically terminated after `idle_duration`.

> Not applied for container-based backends (Kubernetes, Vast.ai, RunPod).
> Not applied for container-based backends (Kubernetes, Vast.ai, Runpod).

## Volumes

6 changes: 3 additions & 3 deletions examples/accelerators/amd/README.md
@@ -55,7 +55,7 @@ Llama 3.1 70B in FP16 using [TGI](https://huggingface.co/docs/text-generation-in
type: service
name: llama31-service-vllm-amd

# Using RunPod's ROCm Docker image
# Using Runpod's ROCm Docker image
image: runpod/pytorch:2.4.0-py3.10-rocm6.1.0-ubuntu22.04
# Required environment variables
env:
@@ -125,7 +125,7 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
type: task
name: trl-amd-llama31-train

# Using RunPod's ROCm Docker image
# Using Runpod's ROCm Docker image
image: runpod/pytorch:2.1.2-py3.10-rocm6.1-ubuntu22.04

# Required environment variables
@@ -172,7 +172,7 @@ To request multiple GPUs, specify the quantity after the GPU name, separated by
# The name is optional, if not specified, generated randomly
name: axolotl-amd-llama31-train

# Using RunPod's ROCm Docker image
# Using Runpod's ROCm Docker image
image: runpod/pytorch:2.1.2-py3.10-rocm6.0.2-ubuntu22.04
# Required environment variables
env:
4 changes: 2 additions & 2 deletions src/dstack/_internal/core/backends/runpod/api_client.py
@@ -108,7 +108,7 @@ def edit_pod(
container_disk_in_gb: int,
container_registry_auth_id: str,
# Default pod volume is 20GB.
# RunPod errors if it's not specified for podEditJob.
# Runpod errors if it's not specified for podEditJob.
volume_in_gb: int = 20,
) -> str:
resp = self._make_request(
@@ -320,7 +320,7 @@ def _make_request(self, data: Optional[Dict[str, Any]] = None) -> Response:
)
response.raise_for_status()
response_json = response.json()
# RunPod returns 200 on client errors
# Runpod returns 200 on client errors
if "errors" in response_json:
raise RunpodApiClientError(errors=response_json["errors"])
return response
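
The pattern above — surfacing errors that arrive in an HTTP-200 body — can be sketched standalone (the class and function names here are illustrative, not the actual `dstack` internals):

```python
from typing import Any, Dict


class RunpodApiClientError(Exception):
    """Raised when the API reports errors inside an HTTP-200 response body."""

    def __init__(self, errors):
        super().__init__(str(errors))
        self.errors = errors


def check_graphql_response(response_json: Dict[str, Any]) -> Dict[str, Any]:
    # Runpod's API returns HTTP 200 even for client errors, so success
    # cannot be inferred from the status code alone; inspect the body.
    if "errors" in response_json:
        raise RunpodApiClientError(response_json["errors"])
    return response_json
```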
2 changes: 1 addition & 1 deletion src/dstack/_internal/core/backends/runpod/compute.py
@@ -50,7 +50,7 @@

CONTAINER_REGISTRY_AUTH_CLEANUP_INTERVAL = 60 * 60 * 24 # 24 hours

# RunPod does not seem to have any limits on the disk size.
# Runpod does not seem to have any limits on the disk size.
CONFIGURABLE_DISK_SIZE = Range[Memory](min=Memory.parse("1GB"), max=None)


2 changes: 1 addition & 1 deletion src/dstack/_internal/core/backends/runpod/models.py
@@ -20,7 +20,7 @@ class RunpodBackendConfig(CoreModel):
type: Literal["runpod"] = "runpod"
regions: Annotated[
Optional[List[str]],
Field(description="The list of RunPod regions. Omit to use all regions"),
Field(description="The list of Runpod regions. Omit to use all regions"),
] = None
community_cloud: Annotated[
Optional[bool],