buun-stack

A remotely accessible Kubernetes home lab with OIDC authentication. Use it to build a development environment with integrated data analytics and AI capabilities. It includes an open data stack for data ingestion, transformation, serving, and orchestration, built on open-source components that you can run locally and port to any cloud.

📺 Remote-Accessible Kubernetes Home Lab (YouTube playlist)
📝 Building a Remote-Accessible Kubernetes Home Lab with k3s (Dev.to article)

Architecture

Foundation

k3s: Lightweight Kubernetes distribution
Just: Task runner with templated configurations
Cloudflare Tunnel: Secure internet connectivity

Core Components (Required)

PostgreSQL: Database cluster with pgvector extension
Keycloak: Identity and access management with OIDC authentication

Recommended Components

HashiCorp Vault: Centralized secrets management
- Used by most stack modules for secure credential storage
- The stack can be deployed without Vault, but using it is highly recommended
External Secrets Operator: Kubernetes secret synchronization from Vault
- Automatically syncs secrets from Vault to Kubernetes Secrets
- Provides secure secret rotation and lifecycle management

Observability (Optional)

Prometheus: Metrics collection and alerting
Grafana: Metrics visualization and dashboards
Goldilocks: Resource recommendation dashboard powered by VPA

Storage (Optional)

Longhorn: Distributed block storage
MinIO: S3-compatible object storage

GPU Support (Optional)

NVIDIA Device Plugin: NVIDIA GPU support for Kubernetes

Data & Analytics (Optional)

JupyterHub: Interactive computing with collaborative notebooks
Trino: Distributed SQL query engine for querying multiple data sources
Querybook: Big data querying UI with notebook interface
ClickHouse: High-performance columnar analytics database
Qdrant: Vector database for AI/ML applications
FalkorDB: Graph database with vector similarity search for knowledge graphs
Lakekeeper: Apache Iceberg REST Catalog for data lake management
Apache Superset: BI platform with rich chart types and high customizability
Metabase: Lightweight BI with simple configuration and clean, modern interface
DataHub: Data catalog and metadata management

Machine Learning (Optional)

MLflow: Machine learning lifecycle management with experiment tracking and model registry
KServe: Model serving platform for deploying ML models on Kubernetes

LLM & AI Applications (Optional)

Langfuse: LLM observability and analytics platform for tracking and debugging AI applications

Orchestration (Optional)

Dagster: Modern data orchestration platform
Apache Airflow: Workflow orchestration and task scheduling

Security & Compliance (Optional)

OAuth2 Proxy: Authentication proxy that adds Keycloak authentication to applications without native OIDC support
Fairwinds Polaris: Kubernetes configuration validation and security auditing

Quick Start

For detailed step-by-step instructions, see the Installation Guide.

Clone and configure

git clone https://github.com/buun-ch/buun-stack
cd buun-stack
mise install
just env::setup

Deploy cluster and services

just k8s::install
just longhorn::install
just vault::install
just postgres::install
just keycloak::install

Configure authentication

just keycloak::create-realm
just vault::setup-oidc-auth
just keycloak::create-user
just k8s::setup-oidc-auth
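
The deploy and authentication steps above can be chained so that a failed step stops the sequence. The following is a minimal sketch, not part of buun-stack: the `run` and `run_quickstart` helpers and the `DRY_RUN` variable are hypothetical names, and only the `just` recipes and their order come from the Quick Start. Some recipes may prompt for input, so treat this as a convenience wrapper rather than an unattended installer.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run the Quick Start recipes in the documented order,
# stopping at the first recipe that fails.
run_quickstart() {
  for recipe in \
    k8s::install longhorn::install vault::install \
    postgres::install keycloak::install \
    keycloak::create-realm vault::setup-oidc-auth \
    keycloak::create-user k8s::setup-oidc-auth
  do
    run just "$recipe" || return 1
  done
}
```

Setting `DRY_RUN=1` before calling `run_quickstart` prints the nine commands without executing them, which is a cheap way to review the order before running it from the repository root.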

Component Details

k3s

Lightweight Kubernetes distribution optimized for edge computing:

Resource Efficient: Runs on resource-constrained environments
Production Ready: Full Kubernetes functionality with minimal overhead
Easy Deployment: Single binary installation with built-in ingress

Longhorn

Enterprise-grade distributed storage system:

Highly Available: Block storage with no single point of failure
Backup & Recovery: Built-in disaster recovery capabilities
NFS Support: Persistent volumes with NFS compatibility

HashiCorp Vault

Centralized secrets management:

Secure Storage: Encrypted secret storage with access control
Dynamic Secrets: Automatic credential generation and rotation
External Secrets Integration: Syncs with Kubernetes via External Secrets Operator

Keycloak

Open-source identity and access management:

Single Sign-On: OIDC/OAuth2 authentication across all services
User Federation: Identity brokering and external provider integration
Group-Based Access: Role and permission management

PostgreSQL

Production-ready relational database:

High Availability: Clustered deployment with CloudNativePG
pgvector Extension: Vector similarity search for AI/ML workloads
Multi-Tenant: Shared database for Keycloak and applications

Prometheus and Grafana

Comprehensive monitoring and observability stack:

Metrics Collection: Prometheus server with Prometheus Operator
Visualization: Grafana with customizable dashboards
Alerting: Alertmanager for alert routing and management
Namespace-Based Control: Explicit monitoring via labels
OIDC Integration: Optional Keycloak authentication for Grafana

📖 See Prometheus Documentation

External Secrets Operator

Kubernetes operator for secret synchronization:

Vault Integration: Automatically syncs secrets from Vault to Kubernetes
Multiple Backends: Supports various secret management systems
Secure Rotation: Automatic secret lifecycle management

MinIO

S3-compatible object storage:

S3 API: Drop-in replacement for AWS S3
High Performance: Distributed object storage with erasure coding
Multi-Tenancy: Isolated storage buckets per application

JupyterHub

Multi-user platform for interactive computing:

Keycloak Authentication: OAuth2 integration with SSO
Persistent Storage: User notebooks stored in Longhorn volumes
Collaborative: Shared computing environment for teams
GPU Support: CUDA-enabled notebooks with nvidia-device-plugin integration

📖 See JupyterHub Documentation

MLflow

Machine learning lifecycle management platform:

Experiment Tracking: Log parameters, metrics, and artifacts for ML experiments
Model Registry: Version and manage ML models with deployment lifecycle
Keycloak Authentication: OAuth2 integration with group-based access control

📖 See MLflow Documentation

KServe

Model serving platform for deploying ML models on Kubernetes:

Multi-Framework Support: TensorFlow, PyTorch, scikit-learn, XGBoost, MLflow, and more
MLflow Integration: Deploy models directly from MLflow Model Registry
Inference Protocols: REST and gRPC with v2 Open Inference Protocol
RawDeployment Mode: Uses native Kubernetes Deployments without Knative dependency

📖 See KServe Documentation

Langfuse

LLM observability and analytics platform:

Trace Tracking: Monitor LLM calls, chains, and agent executions with detailed traces
Prompt Management: Version and test prompts with playground interface
Analytics: Track costs, latency, and token usage across all LLM applications
Keycloak Authentication: OAuth2 integration with automatic user provisioning

📖 See Langfuse Documentation

Apache Superset

Modern business intelligence platform:

Rich Visualizations: 40+ chart types including mixed charts, treemaps, and heatmaps
SQL Lab: Powerful editor for complex queries and dataset creation
Keycloak & Trino: OAuth2 authentication and Iceberg data lake integration

📖 See Superset Documentation

Metabase

Lightweight business intelligence:

Simple Setup: Quick configuration with clean, modern UI
Multiple Databases: Connect to PostgreSQL, Trino, and more
Keycloak Authentication: OAuth2 integration for user management

📖 See Metabase Documentation

Querybook

Big data querying UI with notebook interface:

Trino Integration: SQL queries against multiple data sources with user impersonation
Notebook Interface: Shareable datadocs with queries and visualizations
Real-time Execution: WebSocket-based query progress updates

📖 See Querybook Documentation

Trino

Fast distributed SQL query engine:

Multi-Source Queries: Query PostgreSQL, Iceberg, and other sources in a single query
Keycloak Authentication: OAuth2 for Web UI, password auth for JDBC clients
Sample Data: TPCH catalog with benchmark data for testing

📖 See Trino Documentation

DataHub

Modern data catalog and metadata management:

OIDC Integration: Keycloak authentication for unified access
Metadata Discovery: Search and browse data assets across platforms
Lineage Tracking: Visualize data flow and dependencies

📖 See DataHub Documentation

ClickHouse

High-performance columnar OLAP database:

Fast Analytics: Optimized for analytical queries on large datasets
Compression: Efficient storage with columnar format
Real-time Ingestion: Stream data from Kafka and other sources

📖 See ClickHouse Documentation

Qdrant

High-performance vector database:

Similarity Search: Fast vector search for AI/ML applications
Rich Filtering: Combine vector search with structured filters
Scalable: Distributed deployment for large-scale embeddings

📖 See Qdrant Documentation

FalkorDB

High-performance graph database with vector capabilities:

Knowledge Graphs: Build and query complex relationship networks with OpenCypher
Vector Search: Native vector similarity for GraphRAG applications
Redis Compatible: Uses Redis protocol for easy integration

📖 See FalkorDB Documentation

Lakekeeper

Apache Iceberg REST Catalog:

OIDC Authentication: Keycloak integration for secure access
Table Management: Manages Iceberg tables with ACID transactions
Multi-Engine: Compatible with Trino, Spark, and other query engines

📖 See Lakekeeper Documentation

Apache Airflow

Workflow orchestration platform:

DAG-Based: Define data pipelines as code with Python
JupyterHub Integration: Develop and test workflows in notebooks
Keycloak Authentication: OAuth2 for user management

📖 See Airflow Documentation

Dagster

Modern data orchestration platform:

Asset-Centric: Define data assets and their dependencies
Integrated Development: Built-in UI for development and monitoring
Testing & Validation: Data quality checks and pipeline testing

📖 See Dagster Documentation

Fairwinds Polaris

Kubernetes configuration validation and best practices auditing:

Security Checks: Validates security configurations against best practices
Efficiency Analysis: Identifies missing resource requests and limits
Real-time Auditing: Continuous cluster configuration scanning
Dashboard Interface: Visual reporting of issues by severity

📖 See Fairwinds Polaris Documentation

Goldilocks

Resource recommendation dashboard for right-sizing workloads:

VPA Integration: Powered by Vertical Pod Autoscaler for metrics-based recommendations
Visual Dashboard: User-friendly interface for viewing resource recommendations
QoS Guidance: Recommendations for Guaranteed, Burstable, and BestEffort classes
Monitoring-Only Mode: Observes workloads without automatic scaling
Namespace-Based: Enable monitoring per namespace with labels

📖 See Goldilocks Documentation

📖 See VPA Documentation
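
As an illustration of the namespace-based opt-in, the sketch below labels a namespace with the upstream Goldilocks label `goldilocks.fairwinds.com/enabled=true`. It assumes this deployment follows the upstream convention; buun-stack may provide its own recipe for this, so check the Goldilocks documentation first. The `run` and `enable_goldilocks` helpers and the `DRY_RUN` variable are hypothetical names.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Opt a namespace in to Goldilocks recommendations using the upstream label.
# --overwrite makes the call safe to repeat if the label already exists.
enable_goldilocks() {
  if [ $# -ne 1 ]; then
    echo "usage: enable_goldilocks <namespace>" >&2
    return 2
  fi
  run kubectl label namespace "$1" goldilocks.fairwinds.com/enabled=true --overwrite
}
```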

Common Operations

User Management

Create additional users:

just keycloak::create-user

Add user to group:

just keycloak::add-user-to-group <username> <group>

Database Management

Create database:

just postgres::create-db <dbname>

Create database user:

just postgres::create-user <username>

Grant privileges:

just postgres::grant <dbname> <username>
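
The three database recipes above are typically used together when onboarding a new application. The sketch below chains them in the order shown; `run`, `provision_db`, and `DRY_RUN` are hypothetical names, and any database or user name you pass in is your own. The recipes may prompt for input such as a password, which this wrapper does not handle.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create a database, create a user, then grant that user privileges on it.
# Each step runs only if the previous one succeeded.
provision_db() {
  if [ $# -ne 2 ]; then
    echo "usage: provision_db <dbname> <username>" >&2
    return 2
  fi
  db=$1
  user=$2
  run just postgres::create-db "$db" &&
    run just postgres::create-user "$user" &&
    run just postgres::grant "$db" "$user"
}
```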

Secret Management

Store secrets in Vault:

just vault::put <path> <key>=<value>

Retrieve secrets:

just vault::get <path> <field>
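
A store-then-read round trip might look like the sketch below. The wrappers `run`, `vault_put`, and `vault_get` and the `DRY_RUN` variable are hypothetical; they only add an argument check in front of the two documented recipes. The path and key used in the usage note are placeholders, not paths that buun-stack defines.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Store one key/value pair; reject arguments that are not of the form key=value.
vault_put() {
  path=$1
  pair=$2
  case "$pair" in
    ?*=*) ;;
    *) echo "expected <key>=<value>, got: $pair" >&2; return 2 ;;
  esac
  run just vault::put "$path" "$pair"
}

# Read a single field back from a path.
vault_get() {
  run just vault::get "$1" "$2"
}
```

For example, `vault_put myapp/db password=s3cret` followed by `vault_get myapp/db password` writes a field and reads it back from the same path.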

Security & Authentication

OAuth2 Proxy Integration

For applications that don't natively support Keycloak/OIDC authentication, buun-stack provides an OAuth2 Proxy integration that puts Keycloak authentication in front of them:

Universal Authentication: Add Keycloak SSO to any web application
Automatic Setup: Configures Keycloak client, secrets, and proxy deployment
Security: Prevents unauthorized access by routing all traffic through authentication
Easy Management: Simple recipes for setup and removal

Set up OAuth2 authentication for any application:

# For CH-UI (included in installation prompt)
just ch-ui::setup-oauth2-proxy

# For any custom application
just oauth2-proxy::setup-for-app <app-name> <app-host> [namespace] [upstream-service]

Remove OAuth2 authentication:

just ch-ui::remove-oauth2-proxy
just oauth2-proxy::remove-for-app <app-name> [namespace]

The OAuth2 Proxy setup automatically:

Creates a Keycloak client with proper audience mapping
Generates secure secrets and stores them in Vault
Deploys proxy with Traefik ingress routing
Disables direct application access to ensure security
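
To make the positional arguments of `oauth2-proxy::setup-for-app` concrete, the sketch below wraps the recipe and passes the optional namespace and upstream service only when they are supplied. `run`, `protect_app`, and `DRY_RUN` are hypothetical names, and the app name and host in the usage note are placeholders.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Put OAuth2 Proxy in front of an application.
# <app-name> and <app-host> are required; [namespace] and [upstream-service]
# are optional, so between two and four arguments are accepted.
protect_app() {
  if [ $# -lt 2 ] || [ $# -gt 4 ]; then
    echo "usage: protect_app <app-name> <app-host> [namespace] [upstream-service]" >&2
    return 2
  fi
  run just oauth2-proxy::setup-for-app "$@"
}
```

For example, `protect_app wiki wiki.yourdomain.com` relies on the recipe's defaults for the namespace and upstream service, while `protect_app wiki wiki.yourdomain.com tools wiki-svc` sets both explicitly.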

Remote Access

Once configured, you can access your cluster from anywhere:

# SSH access
ssh ssh.yourdomain.com

# Kubernetes API
kubectl --context yourpc-oidc get nodes

# Web interfaces
# Vault: https://vault.yourdomain.com
# Keycloak: https://auth.yourdomain.com
# Grafana: https://grafana.yourdomain.com
# Trino: https://trino.yourdomain.com
# Querybook: https://querybook.yourdomain.com
# Superset: https://superset.yourdomain.com
# Metabase: https://metabase.yourdomain.com
# Airflow: https://airflow.yourdomain.com
# JupyterHub: https://jupyter.yourdomain.com
# MLflow: https://mlflow.yourdomain.com
# Langfuse: https://langfuse.yourdomain.com
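
A quick way to see which web interfaces are reachable is to generate the URLs from your domain and probe each one. The sketch below uses the subdomains listed above; `stack_urls` is a hypothetical helper, and optional services respond only if you have installed them.

```shell
# Print the web UI URL for each service listed above, one per line.
# The subdomains come from the list above; the domain is the one you configured.
stack_urls() {
  if [ $# -ne 1 ]; then
    echo "usage: stack_urls <domain>" >&2
    return 2
  fi
  for sub in vault auth grafana trino querybook superset metabase \
             airflow jupyter mlflow langfuse
  do
    echo "https://${sub}.$1"
  done
}

# Example (not run here): print the HTTP status code for each interface.
#   stack_urls yourdomain.com | while read -r url; do
#     curl -s -o /dev/null -w "%{http_code} $url\n" "$url"
#   done
```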

Customization

Adding Custom Recipes

You can extend buun-stack with your own Just recipes and services:

Copy the example files:

cp custom-example.just custom.just
cp -r custom-example custom

Use the custom recipes:

# Install reddit-rss
just custom::reddit-rss::install

# Install Miniflux feed reader
just custom::miniflux::install

Create your own recipes:

Add new modules to the custom/ directory following the same pattern as the examples. Each module should have its own justfile with install, uninstall, and other relevant recipes.

The custom.just file is automatically imported by the main justfile if it exists, so you can maintain your custom workflows separately from the core stack.

Demo Projects

The following demo projects showcase end-to-end data workflows using buun-stack:

ML Model Serving with MLflow and KServe

examples/kserve-mlflow-iris

End-to-end machine learning workflow demonstrating JupyterHub, MLflow, and KServe integration:

JupyterHub for model training and testing
MLflow for experiment tracking and model registry
KServe for model deployment and inference

Key technologies: MLflow, KServe, MinIO, JupyterHub

Salesforce to Iceberg REST Catalog

dlt-salesforce-iceberg-rest-demo

Demonstrates Salesforce data ingestion into an Iceberg data lake:

dlt extracts data from Salesforce API (Account, Contact, Opportunity, etc.)
- Custom Iceberg destination loads data into Lakekeeper REST Catalog
- Automatic schema conversion from dlt to Iceberg with PyArrow
Orchestration with Dagster or Apache Airflow

Key technologies: dlt, Iceberg, Lakekeeper, MinIO

E-commerce Lakehouse Analytics

payload-ecommerce-lakehouse-demo

Full-stack e-commerce application with integrated lakehouse analytics:

Next.js + Payload CMS for e-commerce application
dlt ingests data incrementally from Payload API to Iceberg
dbt transforms raw data into analytics-ready star schema
Trino queries across all data layers (raw, staging, marts)
Superset/Metabase for dashboards and business intelligence

Key technologies: Next.js, Payload CMS, dlt, dbt, Iceberg, Lakekeeper, Trino, Superset, Metabase

The Salesforce and e-commerce projects both demonstrate the medallion architecture (raw → staging → marts) and show how buun-stack components work together for production data workflows.

Documentation

Troubleshooting

Having issues? Check the Troubleshooting Guide for solutions to common problems.

Resource Management

See the Resource Management Guide for configuring CPU and memory:

QoS classes (Guaranteed vs Burstable)
Using Goldilocks for recommendations
Best practices and examples

License

MIT License - See LICENSE file for details
