Update slurm comparison

eagleonhill · eagleonhill · commit ecb3b9e29cb2 · 2025-09-28T14:58:48.000-07:00
diff --git a/.vitepress/theme/ComparisonLayout.vue b/.vitepress/theme/ComparisonLayout.vue
@@ -11,7 +11,7 @@
             Begin your Velda journey today
         </h2>
         <a
-            href="/get-started"
+            href="/comparison"
             class="get-started-btn"
         >
             Get Started
diff --git a/slurm-alternative.md b/slurm-alternative.md
@@ -1,17 +1,17 @@
 ---
 layout: page
 sidebar: false
-title: "Velda vs Slurm: Modern Cloud Development Platform"
-description: "Compare Velda's cloud-native development platform with traditional Slurm HPC systems. Discover Kubernetes-based architecture, better resource utilization, and superior developer experience."
-keywords: ["Velda vs Slurm", "HPC alternative", "cloud development", "Kubernetes pods", "workload isolation", "container deployment", "modern HPC", "Slurm replacement"]
+title: "Velda: The Modern Slurm Alternative for High Performance computing"
+description: "Compare Velda's cloud-native development platform with traditional Slurm HPC systems. Discover cloud-native architecture, better resource utilization, and superior developer experience."
+keywords: ["Slurm Alternative", "HPC Cluster", "cloud HPC development", "Slurm VSCode", "Slurm GPU training", "Slurm replacement"]
 image: "https://velda.io/og-preview.png"
 ---
 <ComparisonDoc>
 <template #comparison-banner>
 
 <ComparisonBanner 
   verdict-title="Finally, a modern HPC cluster management"
-  verdict-description="Velda simplifies development and job management with its user-friendly CLIs. It also supports flexible scaling with minimal infrastructure cost and effort. In contrast, Slurm requires a rigid environment and static cluster, which means more effort to manage the platform and more restrictions on the workload. Designed for teams who want more productivity, Velda helps developers easily develop and manage their workloads from the start."
+  verdict-description="Velda simplifies development and job management with its user-friendly command-line interfaces (CLIs). It supports flexible scaling with minimal infrastructure cost and effort. In contrast, Slurm requires a rigid environment and a typically static cluster, which means more effort to manage the platform and more restrictions on the workload. Designed for teams that demand higher productivity, Velda empowers developers to manage their workloads easily and cost-effectively from the start."
   :cta-buttons="[
     { text: 'Get Started', link: 'https://velda.io/comparison', variant: 'primary' },
     { text: 'Book Demo', link: 'https://calendly.com/velda-io/30min', variant: 'secondary' }
@@ -29,51 +29,52 @@ image: "https://velda.io/og-preview.png"
 |--------------------|--------------------|--------------------|
 | **Containerized Execution**<br>Each task runs in a container that is isolated with other workloads on the same node, and workload cannot access tasks of other users| ❌ | ✅ |
 | **Environment customization**<br>Every workload can run in fully customizable environment, including some system packages(apt, pip, etc.) | ❌ | ✅ |
-| **Email**<br>Get notified when your job is completed | ❌ | ✅ |
-| **Cluster autoscale**<br>Allocate compute resources from Cloud / Kubernetes based on demand | ❌ | ✅ |
-| **Interactive Development**<br>Developers have access to dedicated dev-environments, with full capability like IDEs and docker access | ❌ | ✅ |
-| **Service deployment**<br>Run service directly in the cluster in one command, and access them with multiple options like DNS names, port-forwarding or load-balancer | ❌ | ✅ |
+| **Cluster Management**<br>Allocate compute resources from Cloud / Kubernetes based on demand, share resource with other Kubernetes on the same Node | ❌ | ✅ |
+| **Interactive Development**<br>Developers have access to dev-environments that can be used as the [primary dev solution](/blog/why-stuck-inefficient-gpu-setup), with full capability like IDEs and docker access, or VS-Code access from browser.| ❌ | ✅ |
+| **Service development**<br>Run service directly in the cluster in one command, and access them with multiple options like DNS names, port-forwarding or load-balancer. [vrun is all you need](/blog/vrun-is-all-you-need) | ❌ | ✅ |
 | **SSO integration**<br>Use your SSO provider to manage user access / login | ❌ | ✅ |
 | **RBAC**<br>Role based access control: Limit who can see other's job/data | ❌ | ✅ |
-| **API access**<br>Run jobs, manage workloads, analyze task results, through powerful and secure rest API | ❌ | ✅ |
 | **Web access**<br>From browser, directly start development in VS Code Web, or view task logs | Third-party | ✅ |
 
 </template>
 
-# Velda vs. Slurm
 # What's right for me?
+Both Velda and Slurm are options for managing HPC clusters, and they provide a similar command-line interface to run jobs. Users who are already familiar with Slurm will find it easy to get started with Velda, while Velda offers a more modern, cloud-native approach.
 
-Both Velda and Slurm serve as options for managing HPC clusters. They provide similar command-line interface to run jobs,
-assuming all nodes have a consistent environment setup.
+## Resource Management and Deployment Model
+**Slurm’s Traditional Node-Based System**: Slurm operates on a legacy HPC model where entire compute nodes are allocated to jobs, regardless of their actual resource requirements. Slurm is designed for traditional MPI applications. To ensure optimal performance, Slurm clusters must be carefully configured with partition systems, quality-of-service (QoS) configurations, and resource limit hierarchies. This often creates significant inefficiencies with modern, heterogeneous workloads. Users must understand these complex configurations to maximize cluster capacity and efficiency.
 
-## Architecture and Deployment Model
+**Velda’s Cloud-Native Architecture**: Velda leverages multiple backend options to allocate compute resources: statically managed nodes in a traditional data center, virtual machines from a cloud platform, pods from Kubernetes, or resources provisioned through custom scripts. Velda supports a variety of instance configurations and can dynamically provision compute based on workload demands. This enables precise resource allocation and efficient multi-tenancy. The platform abstracts away infrastructure complexity while providing enterprise-grade security, automatic scaling, and comprehensive observability.
 
-**Slurm's Traditional Node-Based System:**
-Slurm operates on a legacy HPC model where entire compute nodes are allocated to jobs, regardless of actual resource requirements. This approach, designed for traditional MPI applications, creates significant inefficiencies in modern heterogeneous workloads. Users must understand complex partition systems, quality-of-service (QoS) configurations, and resource limit hierarchies.
+## Development Experience
+Slurm was built for running jobs, not for daily development. Slurm does not provide a native way to allocate development instances. Developers often rely on error-prone workarounds, such as submitting "sleep" jobs and using direct SSH access to the allocated node. Alternatively, they may use a shared login node, risking system instability. Debugging running jobs is challenging, and running services is poorly supported. These systems are not easy for new users to adopt and often require manual steps to sync code and builds from other systems to the Slurm cluster.
 
-**Velda's Cloud-Native Architecture:**
-Velda leverages multiple backend options to allocate compute resources: VMs from your cloud provider, Kubernetes to deploy workloads as pods, or through custom scripts like SSH. 
-This enables precise resource allocation and efficient multi-tenancy.
-The platform abstracts away infrastructure complexity while providing
-enterprise-grade security, automatic scaling, and comprehensive observability.
+Velda puts the developer experience first. It offers the features of a modern Cloud Development Environment (CDE), allowing for the instant onboarding of new users by cloning instances from templates. It also manages resources for development instances with features like auto-shutdown, all accessible through a rich user interface. Developers access the cluster through dynamically provisioned personal environments. They can connect via SSH or any modern IDE (VSCode, Cursor, PyCharm, etc.), or access VSCode with one click from a browser. When running workloads, it can automatically forward ports for debugging or make services accessible via a developer-chosen DNS name. This makes accessing the cluster feel just like working on a local machine.
 
-## Security and Isolation
+Velda's superior developer experience can [eliminate the need for separate development GPUs or compute-heavy instances](/blog/why-stuck-inefficient-gpu-setup), thereby improving overall resource utilization.
 
-Slurm provides basic process isolation through Linux user accounts and relies on shared filesystems for data access. This approach creates potential security vulnerabilities and makes it difficult to ensure complete workload separation, especially in multi-tenant research environments or when workloads need different system dependencies.
+## Job Scheduler & Workload Management
+Slurm has powerful workload management features, thanks to its long-standing industry adoption and recognition. Velda, on the other hand, is an emerging product. It has support for common features for AI/ML training, serving, and batch jobs, but may currently lack support for some long-tail use cases that Slurm provides, such as gang scheduling with suspend/resume capabilities. These features will be added as usage grows.
 
-Velda implements per-user-per-instance containers. Each user's workload runs in a completely isolated environment with
-guaranteed resource limits and secure communication channels, ensuring consistency across all jobs initiated by that instance.
-This architecture allows fully customizable packages to be installed, while enabling safe multi-tenancy even for sensitive computational workloads.
+### Built for Model Training & AI Workloads
+Velda is built for the full lifecycle of machine learning and natively supports many scheduling features, such as:
 
-## Developer Experience and Workflow Integration
+* Advanced GPU scheduling options, E.g
+  * Gang scheduling, start all workload when all resource are ready.
+  * Network-topology-aware cloud instance allocation, extending your resource allocation backend with specific topology options (e.g., EC2 Placement Groups in AWS, Compact Placement in Google Cloud, or custom config in KQueue).
+* [Define workflows that involve heterogeneous steps](/blog/build-machine-learning-workflow) (e.g., data processing, training, evaluation, and model deployment).
+* Platform-agnostic, support PyTorch, Jax, Ray, and other frameworks.
+* Start model serving and serving development in one command, accessible via custom DNS names. See [vrun is all you need](/blog/vrun-is-all-you-need) .
+* Preemption & multi-cloud support is coming soon.
 
-Traditional HPC systems require significant domain expertise, with users needing to understand batch schedulers, module systems, and job script creation. Debugging running jobs is challenging, and interactive development workflows are poorly supported.
+## Workload Isolation
+Slurm provides basic process isolation through Linux user accounts and relies on shared filesystems for data access. This approach creates potential security vulnerabilities and makes it difficult to ensure complete workload separation, especially in multi-tenant environments or when workloads require different system dependencies. Users often get direct SSH access to compute nodes, even those not allocated to their workloads.
 
-Velda prioritizes developer experience with intuitive command-line interfaces that feel like local development. Full support for modern IDEs, real-time debugging, and seamless port forwarding enables productive development workflows. Teams can use their existing tools and practices without HPC-specific training.
+Velda implements per-user, per-instance containers. Each user’s workload runs in a completely isolated environment, which addresses the challenge of secure multi-tenancy. This architecture allows fully customizable packages to be installed while enabling safe multi-tenancy for sensitive computational workloads. When developers need to access a compute node (e.g., for debugging), they can only access the container for their specific workload and are unable to interfere with other workloads on the same node.
 
 ## Enterprise Features and Compliance
-Modern organizations require robust access controls, audit logging, and compliance capabilities. Slurm's basic accounting and limited RBAC features may not meet enterprise security requirements.
+Modern organizations require robust access controls, audit logging, and compliance capabilities. Slurm’s basic accounting and limited Role-Based Access Control (RBAC) features may not meet enterprise security requirements and often rely heavily on third-party tools or customizations, such as syncing SSO with an LDAP directory.
 
-Velda provides comprehensive enterprise features including SSO integration, detailed audit trails, usage reporting, RBAC policies. Integration with existing security infrastructure ensures seamless adoption in regulated environments.
+Velda provides comprehensive enterprise features, including SSO / SAML integration, usage reporting, and RBAC policies. It also includes features for optimizing resource usage and cost, supporting the efficient management of workloads. Integration with existing security infrastructure ensures seamless adoption in regulated environments.
 
 </ComparisonDoc>

Original file line number	Diff line number	Diff line change
`@@ -11,7 +11,7 @@`
`11`	`11`	`Begin your Velda journey today`
`12`	`12`	`</h2>`
`13`	`13`	`<a`
`14`		`- href="/get-started"`
	`14`	`+ href="/comparison"`
`15`	`15`	`class="get-started-btn"`
`16`	`16`	`>`
`17`	`17`	`Get Started`