buun-stack

A remotely accessible Kubernetes home lab with OIDC authentication. Build a modern development environment with integrated data analytics and AI capabilities. Includes an open data stack for data ingestion, transformation, serving, and orchestration, built on open-source components you can run locally and port to any cloud.

Architecture

Foundation

  • k3s: Lightweight Kubernetes distribution
  • Just: Task runner with templated configurations
  • Cloudflare Tunnel: Secure internet connectivity

Core Components (Required)

  • PostgreSQL: Database cluster with pgvector extension
  • Keycloak: Identity and access management with OIDC authentication

Recommended Components

  • HashiCorp Vault: Centralized secrets management
    • Used by most stack modules for secure credential storage
    • The stack can be deployed without Vault, but using it is highly recommended
  • External Secrets Operator: Kubernetes secret synchronization from Vault
    • Automatically syncs secrets from Vault to Kubernetes Secrets
    • Provides secure secret rotation and lifecycle management

Observability (Optional)

  • Prometheus: Metrics collection and alerting
  • Grafana: Metrics visualization and dashboards
  • Goldilocks: Resource recommendation dashboard powered by VPA

Storage (Optional)

  • Longhorn: Distributed block storage
  • MinIO: S3-compatible object storage

GPU Support (Optional)

Data & Analytics (Optional)

  • JupyterHub: Interactive computing with collaborative notebooks
  • Trino: Distributed SQL query engine for querying multiple data sources
  • Querybook: Big data querying UI with notebook interface
  • ClickHouse: High-performance columnar analytics database
  • Qdrant: Vector database for AI/ML applications
  • FalkorDB: Graph database with vector similarity search for knowledge graphs
  • Lakekeeper: Apache Iceberg REST Catalog for data lake management
  • Apache Superset: BI platform with rich chart types and high customizability
  • Metabase: Lightweight BI with simple configuration and clean, modern interface
  • DataHub: Data catalog and metadata management

Machine Learning (Optional)

  • MLflow: Machine learning lifecycle management with experiment tracking and model registry
  • KServe: Model serving platform for deploying ML models on Kubernetes

LLM & AI Applications (Optional)

  • Langfuse: LLM observability and analytics platform for tracking and debugging AI applications

Orchestration (Optional)

  • Apache Airflow: Workflow orchestration platform
  • Dagster: Modern data orchestration platform

Security & Compliance (Optional)

  • OAuth2 Proxy: Authentication proxy for adding Keycloak authentication
  • Fairwinds Polaris: Kubernetes configuration validation and security auditing

Quick Start

For detailed step-by-step instructions, see the Installation Guide.

  1. Clone and configure

    git clone https://github.com/buun-ch/buun-stack
    cd buun-stack
    mise install
    just env::setup
  2. Deploy cluster and services

    just k8s::install
    just longhorn::install
    just vault::install
    just postgres::install
    just keycloak::install
  3. Configure authentication

    just keycloak::create-realm
    just vault::setup-oidc-auth
    just keycloak::create-user
    just k8s::setup-oidc-auth
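
Steps 1 to 3 above can be chained in a single script. The following is a minimal sketch, not something shipped with the repository: the `run` helper, the `quick_start` function, and the `DRY_RUN` variable are hypothetical names, and the recipe list only repeats the commands shown above. It assumes you have already cloned the repository, changed into it, and run `mise install`. By default it prints each command without executing it; set `DRY_RUN=0` to run them in order and stop at the first failure.

```bash
#!/usr/bin/env bash
# Sketch: run the Quick Start recipes in order, stopping at the first failure.
# run(), quick_start(), and DRY_RUN are illustrative helpers, not part of buun-stack.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

quick_start() {
  # Step 1: configure the environment
  run just env::setup

  # Step 2: deploy cluster and services
  run just k8s::install
  run just longhorn::install
  run just vault::install
  run just postgres::install
  run just keycloak::install

  # Step 3: configure authentication
  run just keycloak::create-realm
  run just vault::setup-oidc-auth
  run just keycloak::create-user
  run just k8s::setup-oidc-auth
}

quick_start
```

Some recipes may ask for input, so with `DRY_RUN=0` it is safest to run the script from an interactive terminal.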

Component Details

k3s

Lightweight Kubernetes distribution optimized for edge computing:

  • Resource Efficient: Runs in resource-constrained environments
  • Production Ready: Full Kubernetes functionality with minimal overhead
  • Easy Deployment: Single binary installation with built-in ingress

Longhorn

Enterprise-grade distributed storage system:

  • Highly Available: Block storage with no single point of failure
  • Backup & Recovery: Built-in disaster recovery capabilities
  • NFS Support: Persistent volumes with NFS compatibility

HashiCorp Vault

Centralized secrets management:

  • Secure Storage: Encrypted secret storage with access control
  • Dynamic Secrets: Automatic credential generation and rotation
  • External Secrets Integration: Syncs with Kubernetes via External Secrets Operator

Keycloak

Open-source identity and access management:

  • Single Sign-On: OIDC/OAuth2 authentication across all services
  • User Federation: Identity brokering and external provider integration
  • Group-Based Access: Role and permission management

PostgreSQL

Production-ready relational database:

  • High Availability: Clustered deployment with CloudNativePG
  • pgvector Extension: Vector similarity search for AI/ML workloads
  • Multi-Tenant: Shared database for Keycloak and applications

Prometheus and Grafana

Comprehensive monitoring and observability stack:

  • Metrics Collection: Prometheus server with Prometheus Operator
  • Visualization: Grafana with customizable dashboards
  • Alerting: Alertmanager for alert routing and management
  • Namespace-Based Control: Explicit monitoring via labels
  • OIDC Integration: Optional Keycloak authentication for Grafana

📖 See Prometheus Documentation

External Secrets Operator

Kubernetes operator for secret synchronization:

  • Vault Integration: Automatically syncs secrets from Vault to Kubernetes
  • Multiple Backends: Supports various secret management systems
  • Secure Rotation: Automatic secret lifecycle management

MinIO

S3-compatible object storage:

  • S3 API: Drop-in replacement for AWS S3
  • High Performance: Distributed object storage with erasure coding
  • Multi-Tenancy: Isolated storage buckets per application

JupyterHub

Multi-user platform for interactive computing:

  • Keycloak Authentication: OAuth2 integration with SSO
  • Persistent Storage: User notebooks stored in Longhorn volumes
  • Collaborative: Shared computing environment for teams
  • GPU Support: CUDA-enabled notebooks with nvidia-device-plugin integration

📖 See JupyterHub Documentation

MLflow

Machine learning lifecycle management platform:

  • Experiment Tracking: Log parameters, metrics, and artifacts for ML experiments
  • Model Registry: Version and manage ML models with deployment lifecycle
  • Keycloak Authentication: OAuth2 integration with group-based access control

📖 See MLflow Documentation

KServe

Model serving platform for deploying ML models on Kubernetes:

  • Multi-Framework Support: TensorFlow, PyTorch, scikit-learn, XGBoost, MLflow, and more
  • MLflow Integration: Deploy models directly from MLflow Model Registry
  • Inference Protocols: REST and gRPC with v2 Open Inference Protocol
  • RawDeployment Mode: Uses native Kubernetes Deployments without Knative dependency

📖 See KServe Documentation

Langfuse

LLM observability and analytics platform:

  • Trace Tracking: Monitor LLM calls, chains, and agent executions with detailed traces
  • Prompt Management: Version and test prompts with playground interface
  • Analytics: Track costs, latency, and token usage across all LLM applications
  • Keycloak Authentication: OAuth2 integration with automatic user provisioning

📖 See Langfuse Documentation

Apache Superset

Modern business intelligence platform:

  • Rich Visualizations: 40+ chart types including mixed charts, treemaps, and heatmaps
  • SQL Lab: Powerful editor for complex queries and dataset creation
  • Keycloak & Trino: OAuth2 authentication and Iceberg data lake integration

📖 See Superset Documentation

Metabase

Lightweight business intelligence:

  • Simple Setup: Quick configuration with clean, modern UI
  • Multiple Databases: Connect to PostgreSQL, Trino, and more
  • Keycloak Authentication: OAuth2 integration for user management

📖 See Metabase Documentation

Querybook

Big data querying UI with notebook interface:

  • Trino Integration: SQL queries against multiple data sources with user impersonation
  • Notebook Interface: Shareable datadocs with queries and visualizations
  • Real-time Execution: WebSocket-based query progress updates

📖 See Querybook Documentation

Trino

Fast distributed SQL query engine:

  • Multi-Source Queries: Query PostgreSQL, Iceberg, and other sources in a single query
  • Keycloak Authentication: OAuth2 for Web UI, password auth for JDBC clients
  • Sample Data: TPCH catalog with benchmark data for testing

📖 See Trino Documentation

DataHub

Modern data catalog and metadata management:

  • OIDC Integration: Keycloak authentication for unified access
  • Metadata Discovery: Search and browse data assets across platforms
  • Lineage Tracking: Visualize data flow and dependencies

📖 See DataHub Documentation

ClickHouse

High-performance columnar OLAP database:

  • Fast Analytics: Optimized for analytical queries on large datasets
  • Compression: Efficient storage with columnar format
  • Real-time Ingestion: Stream data from Kafka and other sources

📖 See ClickHouse Documentation

Qdrant

High-performance vector database:

  • Similarity Search: Fast vector search for AI/ML applications
  • Rich Filtering: Combine vector search with structured filters
  • Scalable: Distributed deployment for large-scale embeddings

📖 See Qdrant Documentation

FalkorDB

High-performance graph database with vector capabilities:

  • Knowledge Graphs: Build and query complex relationship networks with OpenCypher
  • Vector Search: Native vector similarity for GraphRAG applications
  • Redis Compatible: Uses Redis protocol for easy integration

📖 See FalkorDB Documentation

Lakekeeper

Apache Iceberg REST Catalog:

  • OIDC Authentication: Keycloak integration for secure access
  • Table Management: Manages Iceberg tables with ACID transactions
  • Multi-Engine: Compatible with Trino, Spark, and other query engines

📖 See Lakekeeper Documentation

Apache Airflow

Workflow orchestration platform:

  • DAG-Based: Define data pipelines as code with Python
  • JupyterHub Integration: Develop and test workflows in notebooks
  • Keycloak Authentication: OAuth2 for user management

📖 See Airflow Documentation

Dagster

Modern data orchestration platform:

  • Asset-Centric: Define data assets and their dependencies
  • Integrated Development: Built-in UI for development and monitoring
  • Testing & Validation: Data quality checks and pipeline testing

📖 See Dagster Documentation

Fairwinds Polaris

Kubernetes configuration validation and best practices auditing:

  • Security Checks: Validates security configurations against best practices
  • Efficiency Analysis: Identifies missing resource requests and limits
  • Real-time Auditing: Continuous cluster configuration scanning
  • Dashboard Interface: Visual reporting of issues by severity

📖 See Fairwinds Polaris Documentation

Goldilocks

Resource recommendation dashboard for right-sizing workloads:

  • VPA Integration: Powered by Vertical Pod Autoscaler for metrics-based recommendations
  • Visual Dashboard: User-friendly interface for viewing resource recommendations
  • QoS Guidance: Recommendations for Guaranteed, Burstable, and BestEffort classes
  • Monitoring-Only Mode: Observes workloads without automatic scaling
  • Namespace-Based: Enable monitoring per namespace with labels
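
As an illustration of per-namespace opt-in, upstream Goldilocks watches namespaces that carry the `goldilocks.fairwinds.com/enabled=true` label. The sketch below assumes buun-stack keeps that upstream default, which this README does not state; check the Goldilocks documentation linked below before relying on it. The `run` helper, `enable_goldilocks`, and `DRY_RUN` are hypothetical names. By default the script only prints the command; set `DRY_RUN=0` to apply the label.

```bash
#!/usr/bin/env bash
# Sketch: opt a namespace in to Goldilocks recommendations via the upstream label.
# Assumes the upstream default label is in use; names here are illustrative.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

enable_goldilocks() {
  local ns="$1"
  # --overwrite makes the command safe to repeat on an already-labeled namespace
  run kubectl label namespace "$ns" goldilocks.fairwinds.com/enabled=true --overwrite
}

# "default" is only a placeholder namespace.
enable_goldilocks "${1:-default}"
```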

📖 See Goldilocks Documentation

📖 See VPA Documentation

Common Operations

User Management

Create additional users:

just keycloak::create-user

Add user to group:

just keycloak::add-user-to-group <username> <group>

Database Management

Create database:

just postgres::create-db <dbname>

Create database user:

just postgres::create-user <username>

Grant privileges:

just postgres::grant <dbname> <username>
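
The three database recipes are typically used together when onboarding a new application. A minimal sketch of that sequence follows. `provision_db`, `run`, and `DRY_RUN` are hypothetical helper names, and `myapp` / `myapp_user` are placeholders. The script prints the commands by default; set `DRY_RUN=0` to execute them.

```bash
#!/usr/bin/env bash
# Sketch: create a database, create a user, and grant access in one call.
# run(), DRY_RUN, and provision_db() are illustrative names, not buun-stack recipes.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

provision_db() {
  local db="$1" user="$2"
  run just postgres::create-db "$db"
  run just postgres::create-user "$user"
  run just postgres::grant "$db" "$user"
}

# "myapp" and "myapp_user" are placeholder names.
provision_db "${1:-myapp}" "${2:-myapp_user}"
```

Because of `set -e`, a failure in an earlier recipe (for example, the database already exists) stops the script before the grant is attempted.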

Secret Management

Store secrets in Vault:

just vault::put <path> <key>=<value>

Retrieve secrets:

just vault::get <path> <field>
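
A short sketch of a store-then-read round trip is shown below. The path `myapp/config`, the key `api_token`, and the `API_TOKEN` environment variable are placeholders, and the exact path layout your Vault expects is an assumption to verify. `put_secret`, `get_secret`, and `DRY_RUN` are hypothetical names. Preview mode (the default) redacts the value so it never lands in terminal output.

```bash
#!/usr/bin/env bash
# Sketch: store a secret with vault::put and read it back with vault::get.
# Path, key, and helper names are placeholders for illustration.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

put_secret() {
  local path="$1" key="$2" value="$3"
  if [ "$DRY_RUN" = "1" ]; then
    # Never echo the real value, even in preview mode.
    echo "+ just vault::put $path $key=<redacted>"
  else
    just vault::put "$path" "$key=$value"
  fi
}

get_secret() {
  local path="$1" field="$2"
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ just vault::get $path $field"
  else
    just vault::get "$path" "$field"
  fi
}

# The value is taken from the environment rather than typed on the command line.
put_secret "myapp/config" "api_token" "${API_TOKEN:-example-value}"
get_secret "myapp/config" "api_token"
```

Reading the value from an environment variable keeps it out of shell history, although it is still passed to `just` as an argument when executed.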

Security & Authentication

OAuth2 Proxy Integration

For applications that don't natively support Keycloak/OIDC authentication, buun-stack provides OAuth2 Proxy integration to add Keycloak authentication to any application:

  • Universal Authentication: Add Keycloak SSO to any web application
  • Automatic Setup: Configures Keycloak client, secrets, and proxy deployment
  • Security: Prevents unauthorized access by routing all traffic through authentication
  • Easy Management: Simple recipes for setup and removal

Set up OAuth2 authentication for any application:

# For CH-UI (included in installation prompt)
just ch-ui::setup-oauth2-proxy

# For any custom application
just oauth2-proxy::setup-for-app <app-name> <app-host> [namespace] [upstream-service]

Remove OAuth2 authentication:

just ch-ui::remove-oauth2-proxy
just oauth2-proxy::remove-for-app <app-name> [namespace]

The OAuth2 Proxy automatically:

  • Creates a Keycloak client with proper audience mapping
  • Generates secure secrets and stores them in Vault
  • Deploys proxy with Traefik ingress routing
  • Disables direct application access to ensure security
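
To make the `setup-for-app` argument order concrete, here is a small sketch that builds the command from two required and two optional positional values. The application name `myapp`, its host, and the helper names (`setup_proxy`, `run`, `DRY_RUN`) are hypothetical, and the expected format of `upstream-service` is not documented here, so it is passed through unchanged. The script prints the command by default; set `DRY_RUN=0` to execute it.

```bash
#!/usr/bin/env bash
# Sketch: call oauth2-proxy::setup-for-app with optional positional arguments.
# Helper names and example values are placeholders for illustration.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

setup_proxy() {
  local app="$1" host="$2" namespace="${3:-}" upstream="${4:-}"
  # Arguments are positional, so upstream-service cannot be given without namespace.
  if [ -n "$upstream" ] && [ -z "$namespace" ]; then
    echo "namespace is required when upstream-service is given" >&2
    return 1
  fi
  # Reuse the positional parameters as the command line being built.
  set -- just oauth2-proxy::setup-for-app "$app" "$host"
  if [ -n "$namespace" ]; then set -- "$@" "$namespace"; fi
  if [ -n "$upstream" ]; then set -- "$@" "$upstream"; fi
  run "$@"
}

# Required arguments only; namespace and upstream-service fall back to recipe defaults.
setup_proxy "myapp" "myapp.yourdomain.com"
```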

Remote Access

Once configured, you can access your cluster from anywhere:

# SSH access
ssh ssh.yourdomain.com

# Kubernetes API
kubectl --context yourpc-oidc get nodes

# Web interfaces
# Vault: https://vault.yourdomain.com
# Keycloak: https://auth.yourdomain.com
# Grafana: https://grafana.yourdomain.com
# Trino: https://trino.yourdomain.com
# Querybook: https://querybook.yourdomain.com
# Superset: https://superset.yourdomain.com
# Metabase: https://metabase.yourdomain.com
# Airflow: https://airflow.yourdomain.com
# JupyterHub: https://jupyter.yourdomain.com
# MLflow: https://mlflow.yourdomain.com
# Langfuse: https://langfuse.yourdomain.com
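
To confirm which of these web interfaces are reachable, a quick check along the following lines may help. This is a sketch under assumptions: the subdomain list is copied from the comments above, `DOMAIN` defaults to the placeholder `yourdomain.com`, and `list_urls`, `check_urls`, and `CHECK` are hypothetical names. By default it only lists the URLs; set `CHECK=1` to probe each one with `curl`.

```bash
#!/usr/bin/env bash
# Sketch: list the stack's web endpoints and optionally probe them with curl.
# DOMAIN, CHECK, and the function names are placeholders for illustration.
set -euo pipefail

DOMAIN="${DOMAIN:-yourdomain.com}"

list_urls() {
  local sub
  for sub in vault auth grafana trino querybook superset metabase \
             airflow jupyter mlflow langfuse; do
    echo "https://${sub}.${DOMAIN}"
  done
}

check_urls() {
  local url code
  list_urls | while read -r url; do
    # Print the HTTP status only; curl reports 000 when the host is unreachable.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url" || true)"
    echo "$code $url"
  done
}

if [ "${CHECK:-0}" = "1" ]; then
  check_urls
else
  list_urls
fi
```

Optional components that were never installed will simply not respond, and a redirect to the Keycloak login page is a normal result for protected services.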

Customization

Adding Custom Recipes

You can extend buun-stack with your own Just recipes and services:

  1. Copy the example files:

    cp custom-example.just custom.just
    cp -r custom-example custom
  2. Use the custom recipes:

    # Install reddit-rss
    just custom::reddit-rss::install
    
    # Install Miniflux feed reader
    just custom::miniflux::install
  3. Create your own recipes:

Add new modules to the custom/ directory following the same pattern as the examples. Each module should have its own justfile with install, uninstall, and other relevant recipes.

The custom.just file is automatically imported by the main Justfile if it exists, allowing you to maintain your custom workflows separately from the core stack.
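
As a starting point for a new module, the sketch below scaffolds `custom/<name>/justfile` with stub `install` and `uninstall` recipes. It is not part of buun-stack: `scaffold_module` is a hypothetical helper, the stub recipes only print a TODO message, and registering the module in `custom.just` is left out on purpose, so follow whatever pattern `custom-example.just` uses for that step.

```bash
#!/usr/bin/env bash
# Sketch: scaffold a new module under custom/ with stub install/uninstall recipes.
# scaffold_module() is an illustrative helper, not a buun-stack command.
set -euo pipefail

scaffold_module() {
  local name="$1"
  local dir="custom/${name}"
  if [ -e "${dir}/justfile" ]; then
    echo "refusing to overwrite ${dir}/justfile" >&2
    return 1
  fi
  mkdir -p "$dir"
  # Stub recipes; replace the echo lines with real install steps.
  cat > "${dir}/justfile" <<EOF
# Install ${name}
install:
    @echo "TODO: install ${name}"

# Uninstall ${name}
uninstall:
    @echo "TODO: uninstall ${name}"
EOF
  echo "created ${dir}/justfile"
}

# Only scaffold when a module name is passed on the command line.
if [ "$#" -gt 0 ]; then
  scaffold_module "$1"
fi
```

Run it from the repository root, for example `bash scaffold.sh my-service` (the script file name is also a placeholder).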

Demo Projects

The following demo projects showcase end-to-end data workflows using buun-stack:

ML Model Serving with MLflow and KServe

examples/kserve-mlflow-iris

End-to-end machine learning workflow demonstrating JupyterHub, MLflow, and KServe integration:

  • JupyterHub for model training and testing
  • MLflow for experiment tracking and model registry
  • KServe for model deployment and inference

Key technologies: MLflow, KServe, MinIO, JupyterHub

Salesforce to Iceberg REST Catalog

dlt-salesforce-iceberg-rest-demo

Demonstrates Salesforce data ingestion into an Iceberg data lake:

  • dlt extracts data from Salesforce API (Account, Contact, Opportunity, etc.)
    • Custom Iceberg destination loads data into Lakekeeper REST Catalog
    • Automatic schema conversion from dlt to Iceberg with PyArrow
  • Orchestration with Dagster or Apache Airflow

Key technologies: dlt, Iceberg, Lakekeeper, MinIO

E-commerce Lakehouse Analytics

payload-ecommerce-lakehouse-demo

Full-stack e-commerce application with integrated lakehouse analytics:

  • Next.js + Payload CMS for e-commerce application
  • dlt ingests data incrementally from Payload API to Iceberg
  • dbt transforms raw data into analytics-ready star schema
  • Trino queries across all data layers (raw, staging, marts)
  • Superset/Metabase for dashboards and business intelligence

Key technologies: Next.js, Payload CMS, dlt, dbt, Iceberg, Lakekeeper, Trino, Superset, Metabase

The two lakehouse projects demonstrate the medallion architecture (raw → staging → marts) and show how buun-stack components work together for production data workflows.

Documentation

Troubleshooting

Having issues? Check the Troubleshooting Guide for solutions to common problems.

Resource Management

See the Resource Management Guide for configuring CPU and memory:

  • QoS classes (Guaranteed vs Burstable)
  • Using Goldilocks for recommendations
  • Best practices and examples

License

MIT License - See LICENSE file for details
