A remotely accessible Kubernetes home lab with OIDC authentication. Build a modern development environment with integrated data analytics and AI capabilities. Includes an open data stack for data ingestion, transformation, serving, and orchestration, built on open-source components you can run locally and port to any cloud.
- Remote-Accessible Kubernetes Home Lab (YouTube playlist)
- Building a Remote-Accessible Kubernetes Home Lab with k3s (Dev.to article)
- k3s: Lightweight Kubernetes distribution
- Just: Task runner with templated configurations
- Cloudflare Tunnel: Secure internet connectivity
- PostgreSQL: Database cluster with pgvector extension
- Keycloak: Identity and access management with OIDC authentication
- HashiCorp Vault: Centralized secrets management
  - Used by most stack modules for secure credential storage
  - Optional: the stack can be deployed without Vault, but using it is highly recommended
- External Secrets Operator: Kubernetes secret synchronization from Vault
  - Automatically syncs secrets from Vault to Kubernetes Secrets
  - Provides secure secret rotation and lifecycle management
- Prometheus: Metrics collection and alerting
- Grafana: Metrics visualization and dashboards
- Goldilocks: Resource recommendation dashboard powered by VPA
- NVIDIA Device Plugin: NVIDIA GPU support for Kubernetes
- JupyterHub: Interactive computing with collaborative notebooks
- Trino: Distributed SQL query engine for querying multiple data sources
- Querybook: Big data querying UI with notebook interface
- ClickHouse: High-performance columnar analytics database
- Qdrant: Vector database for AI/ML applications
- FalkorDB: Graph database with vector similarity search for knowledge graphs
- Lakekeeper: Apache Iceberg REST Catalog for data lake management
- Apache Superset: BI platform with rich chart types and high customizability
- Metabase: Lightweight BI with simple configuration and clean, modern interface
- DataHub: Data catalog and metadata management
- MLflow: Machine learning lifecycle management with experiment tracking and model registry
- KServe: Model serving platform for deploying ML models on Kubernetes
- Langfuse: LLM observability and analytics platform for tracking and debugging AI applications
- Dagster: Modern data orchestration platform
- Apache Airflow: Workflow orchestration and task scheduling
- OAuth2 Proxy: Authentication proxy for adding Keycloak authentication
- Fairwinds Polaris: Kubernetes configuration validation and security auditing
For detailed step-by-step instructions, see the Installation Guide.
- Clone and configure:
    git clone https://github.com/buun-ch/buun-stack
    cd buun-stack
    mise install
    just env::setup
- Deploy cluster and services:
    just k8s::install
    just longhorn::install
    just vault::install
    just postgres::install
    just keycloak::install
- Configure authentication:
    just keycloak::create-realm
    just vault::setup-oidc-auth
    just keycloak::create-user
    just k8s::setup-oidc-auth
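The quick-start steps above can be chained in a small wrapper. This is a minimal sketch, not part of buun-stack: the `bootstrap` function and the `DRY_RUN` variable are hypothetical names, and only the recipe names and their order come from the steps above. Some recipes may prompt for input, so an unattended run is not guaranteed to work.

```shell
# Sketch: run the quick-start recipes in their documented order.
# DRY_RUN defaults to 1 here, so nothing is executed unless you opt in.
bootstrap() {
  for step in \
    env::setup \
    k8s::install longhorn::install vault::install postgres::install keycloak::install \
    keycloak::create-realm vault::setup-oidc-auth keycloak::create-user k8s::setup-oidc-auth
  do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Print the command instead of running it.
      echo "just ${step}"
    else
      # Stop at the first failing recipe so later steps never run on a broken base.
      just "${step}" || return 1
    fi
  done
}
```

Calling `bootstrap` prints the ten commands for review; `DRY_RUN=0 bootstrap` would run them from the repository root.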
Lightweight Kubernetes distribution optimized for edge computing:
- Resource Efficient: Runs on resource-constrained environments
- Production Ready: Full Kubernetes functionality with minimal overhead
- Easy Deployment: Single binary installation with built-in ingress
Enterprise-grade distributed storage system:
- Highly Available: Block storage with no single point of failure
- Backup & Recovery: Built-in disaster recovery capabilities
- NFS Support: Persistent volumes with NFS compatibility
Centralized secrets management:
- Secure Storage: Encrypted secret storage with access control
- Dynamic Secrets: Automatic credential generation and rotation
- External Secrets Integration: Syncs with Kubernetes via External Secrets Operator
Open-source identity and access management:
- Single Sign-On: OIDC/OAuth2 authentication across all services
- User Federation: Identity brokering and external provider integration
- Group-Based Access: Role and permission management
Production-ready relational database:
- High Availability: Clustered deployment with CloudNativePG
- pgvector Extension: Vector similarity search for AI/ML workloads
- Multi-Tenant: Shared database for Keycloak and applications
Comprehensive monitoring and observability stack:
- Metrics Collection: Prometheus server with Prometheus Operator
- Visualization: Grafana with customizable dashboards
- Alerting: Alertmanager for alert routing and management
- Namespace-Based Control: Explicit monitoring via labels
- OIDC Integration: Optional Keycloak authentication for Grafana
See Prometheus Documentation
Kubernetes operator for secret synchronization:
- Vault Integration: Automatically syncs secrets from Vault to Kubernetes
- Multiple Backends: Supports various secret management systems
- Secure Rotation: Automatic secret lifecycle management
S3-compatible object storage:
- S3 API: Drop-in replacement for AWS S3
- High Performance: Distributed object storage with erasure coding
- Multi-Tenancy: Isolated storage buckets per application
Multi-user platform for interactive computing:
- Keycloak Authentication: OAuth2 integration with SSO
- Persistent Storage: User notebooks stored in Longhorn volumes
- Collaborative: Shared computing environment for teams
- GPU Support: CUDA-enabled notebooks with nvidia-device-plugin integration
See JupyterHub Documentation
Machine learning lifecycle management platform:
- Experiment Tracking: Log parameters, metrics, and artifacts for ML experiments
- Model Registry: Version and manage ML models with deployment lifecycle
- Keycloak Authentication: OAuth2 integration with group-based access control
Model serving platform for deploying ML models on Kubernetes:
- Multi-Framework Support: TensorFlow, PyTorch, scikit-learn, XGBoost, MLflow, and more
- MLflow Integration: Deploy models directly from MLflow Model Registry
- Inference Protocols: REST and gRPC with v2 Open Inference Protocol
- RawDeployment Mode: Uses native Kubernetes Deployments without Knative dependency
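To illustrate the REST side of the v2 Open Inference Protocol mentioned above, here is a hedged sketch of building and sending a request. The helper names `v2_infer_body` and `v2_infer` are hypothetical, and the input name, shape, and datatype depend entirely on the deployed model; only the `/v2/models/<model>/infer` path and the `inputs` body layout come from the protocol itself.

```shell
# v2_infer_body <input-name> <shape> <datatype> <data>
# Prints a JSON body for POST /v2/models/<model>/infer.
# <shape> and <data> are comma-separated lists, e.g. "1,4" and "5.1,3.5,1.4,0.2".
v2_infer_body() {
  printf '{"inputs":[{"name":"%s","shape":[%s],"datatype":"%s","data":[%s]}]}\n' \
    "$1" "$2" "$3" "$4"
}

# v2_infer <base-url> <model> <json-body>
# Sends the body to the model's infer endpoint with curl.
v2_infer() {
  curl -sS -X POST "$1/v2/models/$2/infer" \
    -H "Content-Type: application/json" \
    -d "$3"
}
```

For a hypothetical model `iris` served at a placeholder host, a call could look like `v2_infer https://iris.yourdomain.com iris "$(v2_infer_body input-0 1,4 FP32 5.1,3.5,1.4,0.2)"`.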
LLM observability and analytics platform:
- Trace Tracking: Monitor LLM calls, chains, and agent executions with detailed traces
- Prompt Management: Version and test prompts with playground interface
- Analytics: Track costs, latency, and token usage across all LLM applications
- Keycloak Authentication: OAuth2 integration with automatic user provisioning
See Langfuse Documentation
Modern business intelligence platform:
- Rich Visualizations: 40+ chart types including mixed charts, treemaps, and heatmaps
- SQL Lab: Powerful editor for complex queries and dataset creation
- Keycloak & Trino: OAuth2 authentication and Iceberg data lake integration
See Superset Documentation
Lightweight business intelligence:
- Simple Setup: Quick configuration with clean, modern UI
- Multiple Databases: Connect to PostgreSQL, Trino, and more
- Keycloak Authentication: OAuth2 integration for user management
See Metabase Documentation
Big data querying UI with notebook interface:
- Trino Integration: SQL queries against multiple data sources with user impersonation
- Notebook Interface: Shareable datadocs with queries and visualizations
- Real-time Execution: WebSocket-based query progress updates
See Querybook Documentation
Fast distributed SQL query engine:
- Multi-Source Queries: Query PostgreSQL, Iceberg, and other sources in a single query
- Keycloak Authentication: OAuth2 for Web UI, password auth for JDBC clients
- Sample Data: TPCH catalog with benchmark data for testing
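A quick way to check the sample data is a row count from the Trino CLI. The sketch below makes several assumptions: the catalog is named `tpch` and exposes the connector's usual `tiny` schema, the CLI is allowed to use the same password authentication described for JDBC clients, and the server URL and user name are placeholders. `trino_sample_query`, `TRINO_SERVER`, `TRINO_USER`, and `DRY_RUN` are hypothetical names introduced here.

```shell
# trino_sample_query: print (default) or run a row count on the TPCH sample data.
trino_sample_query() {
  server="${TRINO_SERVER:-https://trino.yourdomain.com}"   # placeholder host
  user="${TRINO_USER:-your-username}"                      # placeholder user
  query="SELECT count(*) FROM tpch.tiny.nation"            # assumed catalog and schema
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Show the command without contacting the server.
    echo "trino --server ${server} --user ${user} --password --execute \"${query}\""
  else
    # --password makes the CLI prompt for the password interactively.
    trino --server "$server" --user "$user" --password --execute "$query"
  fi
}
```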
Modern data catalog and metadata management:
- OIDC Integration: Keycloak authentication for unified access
- Metadata Discovery: Search and browse data assets across platforms
- Lineage Tracking: Visualize data flow and dependencies
See DataHub Documentation
High-performance columnar OLAP database:
- Fast Analytics: Optimized for analytical queries on large datasets
- Compression: Efficient storage with columnar format
- Real-time Ingestion: Stream data from Kafka and other sources
See ClickHouse Documentation
High-performance vector database:
- Similarity Search: Fast vector search for AI/ML applications
- Rich Filtering: Combine vector search with structured filters
- Scalable: Distributed deployment for large-scale embeddings
High-performance graph database with vector capabilities:
- Knowledge Graphs: Build and query complex relationship networks with OpenCypher
- Vector Search: Native vector similarity for GraphRAG applications
- Redis Compatible: Uses Redis protocol for easy integration
See FalkorDB Documentation
Apache Iceberg REST Catalog:
- OIDC Authentication: Keycloak integration for secure access
- Table Management: Manages Iceberg tables with ACID transactions
- Multi-Engine: Compatible with Trino, Spark, and other query engines
See Lakekeeper Documentation
Workflow orchestration platform:
- DAG-Based: Define data pipelines as code with Python
- JupyterHub Integration: Develop and test workflows in notebooks
- Keycloak Authentication: OAuth2 for user management
See Airflow Documentation
Modern data orchestration platform:
- Asset-Centric: Define data assets and their dependencies
- Integrated Development: Built-in UI for development and monitoring
- Testing & Validation: Data quality checks and pipeline testing
See Dagster Documentation
Kubernetes configuration validation and best practices auditing:
- Security Checks: Validates security configurations against best practices
- Efficiency Analysis: Identifies missing resource requests and limits
- Real-time Auditing: Continuous cluster configuration scanning
- Dashboard Interface: Visual reporting of issues by severity
See Fairwinds Polaris Documentation
Resource recommendation dashboard for right-sizing workloads:
- VPA Integration: Powered by Vertical Pod Autoscaler for metrics-based recommendations
- Visual Dashboard: User-friendly interface for viewing resource recommendations
- QoS Guidance: Recommendations for Guaranteed, Burstable, and BestEffort classes
- Monitoring-Only Mode: Observes workloads without automatic scaling
- Namespace-Based: Enable monitoring per namespace with labels
See Goldilocks Documentation
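Per-namespace enablement in Goldilocks is normally done with the `goldilocks.fairwinds.com/enabled=true` label. The sketch below applies it with plain `kubectl`; buun-stack may provide its own recipe for this, so treat the `enable_goldilocks` helper and the `DRY_RUN` variable as hypothetical.

```shell
# enable_goldilocks <namespace>
# Prints (default) or runs the kubectl command that opts a namespace in.
enable_goldilocks() {
  if [ -z "${1:-}" ]; then
    echo "usage: enable_goldilocks <namespace>" >&2
    return 2
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "kubectl label namespace $1 goldilocks.fairwinds.com/enabled=true --overwrite"
  else
    # --overwrite makes the call safe to repeat if the label already exists.
    kubectl label namespace "$1" goldilocks.fairwinds.com/enabled=true --overwrite
  fi
}
```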
Create additional users:
just keycloak::create-user

Add user to group:
just keycloak::add-user-to-group <username> <group>

Create database:
just postgres::create-db <dbname>

Create database user:
just postgres::create-user <username>

Grant privileges:
just postgres::grant <dbname> <username>

Store secrets in Vault:
just vault::put <path> <key>=<value>

Retrieve secrets:
just vault::get <path> <field>

For applications that don't natively support Keycloak/OIDC authentication, buun-stack provides OAuth2 Proxy integration to add Keycloak authentication to any application:
- Universal Authentication: Add Keycloak SSO to any web application
- Automatic Setup: Configures Keycloak client, secrets, and proxy deployment
- Security: Prevents unauthorized access by routing all traffic through authentication
- Easy Management: Simple recipes for setup and removal
Set up OAuth2 authentication for any application:
# For CH-UI (included in installation prompt)
just ch-ui::setup-oauth2-proxy
# For any custom application
just oauth2-proxy::setup-for-app <app-name> <app-host> [namespace] [upstream-service]

Remove OAuth2 authentication:
just ch-ui::remove-oauth2-proxy
just oauth2-proxy::remove-for-app <app-name> [namespace]

The OAuth2 Proxy setup automatically:
- Creates a Keycloak client with proper audience mapping
- Generates secure secrets and stores them in Vault
- Deploys proxy with Traefik ingress routing
- Disables direct application access to ensure security
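Because the setup recipe takes two required and two optional arguments, a thin wrapper that checks the argument count before calling it can catch mistakes early. This is a sketch only: `protect_app` and `DRY_RUN` are hypothetical names, and the recipe signature is the one shown above.

```shell
# protect_app <app-name> <app-host> [namespace] [upstream-service]
# Validates the argument count, then prints (default) or runs the setup recipe.
protect_app() {
  if [ "$#" -lt 2 ] || [ "$#" -gt 4 ]; then
    echo "usage: protect_app <app-name> <app-host> [namespace] [upstream-service]" >&2
    return 2
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "just oauth2-proxy::setup-for-app $*"
  else
    just oauth2-proxy::setup-for-app "$@"
  fi
}
```

For example, `protect_app my-app my-app.yourdomain.com` (both values are placeholders) prints the command that would put Keycloak authentication in front of `my-app`.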
Once configured, you can access your cluster from anywhere:
# SSH access
ssh ssh.yourdomain.com
# Kubernetes API
kubectl --context yourpc-oidc get nodes
# Web interfaces
# Vault: https://vault.yourdomain.com
# Keycloak: https://auth.yourdomain.com
# Grafana: https://grafana.yourdomain.com
# Trino: https://trino.yourdomain.com
# Querybook: https://querybook.yourdomain.com
# Superset: https://superset.yourdomain.com
# Metabase: https://metabase.yourdomain.com
# Airflow: https://airflow.yourdomain.com
# JupyterHub: https://jupyter.yourdomain.com
# MLflow: https://mlflow.yourdomain.com
# Langfuse: https://langfuse.yourdomain.com

You can extend buun-stack with your own Just recipes and services:
- Copy the example files:
    cp custom-example.just custom.just
    cp -r custom-example custom
- Use the custom recipes:
    # Install reddit-rss
    just custom::reddit-rss::install
    # Install Miniflux feed reader
    just custom::miniflux::install
- Create your own recipes:
  Add new modules to the custom/ directory following the same pattern as the examples. Each module should have its own justfile with install, uninstall, and other relevant recipes.
The custom.just file is automatically imported by the main Justfile if it exists, allowing you to maintain your custom workflows separately from the core stack.
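As a starting point for a new module, the following sketch generates a skeleton justfile with placeholder `install` and `uninstall` recipes. The `scaffold_module` helper is hypothetical, and the `custom/<name>/justfile` layout is an assumption based on the description above; the new module most likely also has to be referenced from `custom.just` in the same way the bundled examples are, so check `custom-example.just` for the exact form.

```shell
# scaffold_module <name> [base-dir]
# Creates <base-dir>/<name>/justfile (base-dir defaults to "custom") and prints its path.
scaffold_module() {
  name="$1"
  dir="${2:-custom}/${name}"
  if [ -e "$dir/justfile" ]; then
    echo "refusing to overwrite $dir/justfile" >&2
    return 1
  fi
  mkdir -p "$dir"
  cat > "$dir/justfile" <<EOF
# Hypothetical skeleton for the ${name} module; replace the echo lines with real steps.

install:
    @echo "install ${name}: not implemented"

uninstall:
    @echo "uninstall ${name}: not implemented"
EOF
  echo "$dir/justfile"
}
```

Running `scaffold_module my-service` from the repository root would create `custom/my-service/justfile`; a second call with the same name fails rather than overwriting your edits.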
The following demo projects showcase end-to-end data workflows using buun-stack:
End-to-end machine learning workflow demonstrating JupyterHub, MLflow, and KServe integration:
- JupyterHub for model training and testing
- MLflow for experiment tracking and model registry
- KServe for model deployment and inference
Key technologies: MLflow, KServe, MinIO, JupyterHub
dlt-salesforce-iceberg-rest-demo
Demonstrates Salesforce data ingestion into an Iceberg data lake:
- dlt extracts data from Salesforce API (Account, Contact, Opportunity, etc.)
- Custom Iceberg destination loads data into Lakekeeper REST Catalog
- Automatic schema conversion from dlt to Iceberg with PyArrow
- Orchestration with Dagster or Apache Airflow
Key technologies: dlt, Iceberg, Lakekeeper, MinIO
payload-ecommerce-lakehouse-demo
Full-stack e-commerce application with integrated lakehouse analytics:
- Next.js + Payload CMS for e-commerce application
- dlt ingests data incrementally from Payload API to Iceberg
- dbt transforms raw data into analytics-ready star schema
- Trino queries across all data layers (raw, staging, marts)
- Superset/Metabase for dashboards and business intelligence
Key technologies: Next.js, Payload CMS, dlt, dbt, Iceberg, Lakekeeper, Trino, Superset, Metabase
The latter two projects demonstrate the medallion architecture (raw → staging → marts) and showcase how buun-stack components work together for production data workflows.
Having issues? Check the Troubleshooting Guide for solutions to common problems.
See Resource Management Guide for configuring CPU and memory:
- QoS classes (Guaranteed vs Burstable)
- Using Goldilocks for recommendations
- Best practices and examples
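As a rough illustration of how Kubernetes assigns a QoS class, the sketch below classifies a single-container pod from its CPU and memory requests and limits. The `qos_class` helper is hypothetical and simplified: it compares quantities as plain strings (so `500m` and `0.5` count as different), uses `-` to mean "not set", and ignores multi-container pods.

```shell
# qos_class <cpu-request> <cpu-limit> <mem-request> <mem-limit>
# Use "-" for a value that is not set. Prints Guaranteed, Burstable, or BestEffort.
qos_class() {
  cpu_req="$1"; cpu_lim="$2"; mem_req="$3"; mem_lim="$4"

  # No requests and no limits at all: BestEffort.
  if [ "$cpu_req$cpu_lim$mem_req$mem_lim" = "----" ]; then
    echo "BestEffort"
    return 0
  fi

  # Kubernetes copies the limit into a request that was left unset.
  if [ "$cpu_req" = "-" ]; then cpu_req="$cpu_lim"; fi
  if [ "$mem_req" = "-" ]; then mem_req="$mem_lim"; fi

  # Guaranteed needs both limits set and each request equal to its limit.
  if [ "$cpu_lim" != "-" ] && [ "$mem_lim" != "-" ] \
     && [ "$cpu_req" = "$cpu_lim" ] && [ "$mem_req" = "$mem_lim" ]; then
    echo "Guaranteed"
  else
    echo "Burstable"
  fi
}
```

For instance, `qos_class 500m 500m 256Mi 256Mi` prints `Guaranteed`, while `qos_class 100m 500m 128Mi 256Mi` prints `Burstable`.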
MIT License - See LICENSE file for details