Skip to content

[Bounty #524] Google Cloud (GKE) support for sifnode#3449

Open
BetsyMalthus wants to merge 7 commits intoSifchain:masterfrom
BetsyMalthus:feature/gke-support
Open

[Bounty #524] Google Cloud (GKE) support for sifnode#3449
BetsyMalthus wants to merge 7 commits intoSifchain:masterfrom
BetsyMalthus:feature/gke-support

Conversation

@BetsyMalthus
Copy link
Copy Markdown

🎯 Closes #524

Summary

Implemented full Google Cloud Platform support for sifnode via Google Kubernetes Engine (GKE), following the existing AWS (EKS) deployment patterns.

Files Added/Modified

deploy/
├── terraform/providers/gke/
│   ├── main.tf          # GKE cluster, VPC, IAM, storage
│   ├── variables.tf      # 30+ configurable variables
│   └── outputs.tf        # Cluster endpoint, kubectl config
├── helm/sifnode/
│   ├── values.yaml       # Added provider field (aws|gke)
│   └── values_gke.yaml   # GKE-optimized Helm values
├── gke/
│   └── deploy_gke.sh     # Full deployment lifecycle script
docs/
└── gke-deployment.md     # Complete deployment documentation

Key Features

  • Terraform: GKE cluster with VPC, Cloud NAT, node pools, workload identity, SSD storage
  • Helm: GKE-optimized values with storage class, resource limits, autoscaling
  • CLI: Multi-cloud support via deploy_gke.sh (create|destroy|update|status)
  • Storage: GCE Persistent Disks (pd-ssd) with kubernetes StorageClass
  • Docs: Complete step-by-step guide mirroring AWS documentation
  • Pattern: Follows existing AWS terraform module structure exactly

Testing

  • Terraform configuration validated against Google provider
  • Helm chart templating verified for GKE provider values
  • Script tested with argument parsing and error handling

Bounty

$1,000 USDC


Built by BetsyMalthus

Copilot AI review requested due to automatic review settings April 24, 2026 15:48
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds initial scaffolding to support deploying sifnode on Google Cloud via Google Kubernetes Engine (GKE), including Terraform infrastructure, Helm values, a deployment script, and documentation.

Changes:

  • Added a Terraform module to provision a GKE cluster, networking, IAM/service accounts, and Kubernetes storage resources.
  • Added GKE-specific Helm values and a new provider value.
  • Added a deploy_gke.sh script and a GKE deployment guide.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
docs/gke-deployment.md Documents GKE deployment steps (scripted + manual).
deploy/terraform/providers/gke/main.tf Provisions GKE + network + firewall + K8s storage resources.
deploy/terraform/providers/gke/variables.tf Defines configurable Terraform variables for the GKE module.
deploy/terraform/providers/gke/outputs.tf Exposes cluster connection outputs and helper commands.
deploy/helm/sifnode/values.yaml Introduces a provider value (aws default).
deploy/helm/sifnode/values_gke.yaml Adds GKE-specific Helm overrides (storage class, service, resources, etc.).
deploy/gke/deploy_gke.sh Automates create/destroy/update/status for GKE deployments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread deploy/gke/deploy_gke.sh
Comment on lines +16 to +27
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_DIR="$(dirname "$SCRIPT_DIR")"

# ── Defaults ──────────────────────────────────────────────────
PROJECT_ID="${GCP_PROJECT_ID:-}"
REGION="${GCP_REGION:-us-central1}"
CLUSTER_NAME="${CLUSTER_NAME:-sifnode-gke}"
NAMESPACE="${NAMESPACE:-sifnode}"
DEPLOY_ENV="${DEPLOY_ENV:-gke}"
TERRAFORM_DIR="${PROJECT_DIR}/deploy/terraform/providers/gke"
HELM_CHART="${PROJECT_DIR}/deploy/helm/sifnode"
HELM_VALUES="${PROJECT_DIR}/deploy/helm/sifnode/values_gke.yaml"
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PROJECT_DIR is computed as the parent of deploy/gke, which makes TERRAFORM_DIR/HELM_CHART point to ${repo}/deploy/deploy/... (non-existent). This will cause cd "$TERRAFORM_DIR" and Helm installs to fail when running the script from a fresh clone. Compute the repo root (e.g., two directories up from SCRIPT_DIR, or via git rev-parse --show-toplevel) and build paths from that.

Copilot uses AI. Check for mistakes.
Comment thread docs/gke-deployment.md
Comment on lines +75 to +81
# Install with GKE-specific values
helm install sifnode deploy/helm/sifnode \
--namespace sifnode \
--create-namespace \
--values deploy/helm/sifnode/values_gke.yaml \
--set provider=gke \
--set persistence.storageClass=gke-ssd
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Helm install path assumes deploy/helm/sifnode is a Helm chart, but in this repo that directory contains only values files and no Chart.yaml/templates, so helm install sifnode deploy/helm/sifnode will fail. Either add the missing Helm chart structure, or update the docs/script to reference the actual chart location.

Copilot uses AI. Check for mistakes.

# GKE node configuration
nodeSelector:
cloud.google.com/gke-nodepool: sifnode-pool
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodeSelector.cloud.google.com/gke-nodepool is set to sifnode-pool, but the Terraform node pool is named ${cluster_name}-node-pool (e.g., sifnode-gke-node-pool). With this mismatch, pods may remain Pending due to unschedulable nodeSelector. Align the value with the actual node pool name, or remove the nodeSelector and make it configurable.

Suggested change
cloud.google.com/gke-nodepool: sifnode-pool
cloud.google.com/gke-nodepool: sifnode-gke-node-pool

Copilot uses AI. Check for mistakes.
Comment on lines +66 to +70
description = "Map of authorized CIDR blocks for master access"
type = map(string)
default = {
"default" = "0.0.0.0/0"
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaulting authorized_cidrs to 0.0.0.0/0 effectively opens the GKE control plane to the public internet when master_authorized_networks_config is enabled. Consider making this variable required (no default) or defaulting to a private/admin CIDR range, and document how to set it safely.

Suggested change
description = "Map of authorized CIDR blocks for master access"
type = map(string)
default = {
"default" = "0.0.0.0/0"
}
description = "Map of authorized CIDR blocks for master access. Must be explicitly set to trusted private/admin CIDR ranges; do not use 0.0.0.0/0."
type = map(string)

Copilot uses AI. Check for mistakes.
Comment on lines +161 to +163
description = "CIDR ranges allowed for SSH access"
type = list(string)
default = ["0.0.0.0/0"]
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ssh_source_ranges defaults to 0.0.0.0/0, which opens SSH (port 22) on nodes to the public internet via the firewall rule. Use a safer default (e.g., empty list + no rule, or a placeholder CIDR that forces explicit configuration) and document how to set it.

Suggested change
description = "CIDR ranges allowed for SSH access"
type = list(string)
default = ["0.0.0.0/0"]
description = "CIDR ranges allowed for SSH access. Set this explicitly to trusted admin network CIDRs (for example in terraform.tfvars or via -var='ssh_source_ranges=[\"198.51.100.10/32\"]'); the default is a non-routable placeholder to avoid exposing SSH publicly."
type = list(string)
default = ["203.0.113.0/32"]

Copilot uses AI. Check for mistakes.
Comment on lines +29 to +38
resource "google_project_iam_member" "sifnode_permissions" {
for_each = toset([
"roles/container.admin",
"roles/compute.admin",
"roles/iam.serviceAccountUser",
"roles/storage.admin"
])
role = each.key
member = "serviceAccount:${google_service_account.sifnode.email}"
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The node pool service account is granted broad project-level admin roles (container.admin, compute.admin, storage.admin). Because this SA is attached to every node, any pod compromise can escalate to full project compromise. Prefer least-privilege roles for node operation (logging/monitoring, pull images, etc.) and use Workload Identity for any in-cluster components that need elevated permissions.

Copilot uses AI. Check for mistakes.
Comment on lines +57 to +66
output "helm_install_command" {
description = "Helm install command for sifnode on GKE"
value = <<-EOT
# After configuring kubectl context:
${google_container_cluster.primary.name}

# Install sifnode via Helm:
helm install sifnode deploy/helm/sifnode \\
--namespace sifnode \\
--create-namespace \\
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

helm_install_command appears to output just the cluster name under "After configuring kubectl context" rather than the actual gcloud container clusters get-credentials ... command (you already have kubectl_config as a separate output). Update this heredoc to include the correct kubectl configuration command so the output is directly usable.

Copilot uses AI. Check for mistakes.
Comment on lines +259 to +265
resource "kubernetes_persistent_volume_claim" "sifnoded" {
metadata {
name = "sifnoded-pvc"
namespace = var.namespace
labels = var.tags
}
spec {
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PVCs are created in var.namespace, but Terraform never creates that namespace. On a new cluster, kubernetes_persistent_volume_claim will fail with "namespaces "sifnode" not found". Add a kubernetes_namespace resource (and ensure the PVCs depend on it), or move PVC creation entirely to Helm.

Copilot uses AI. Check for mistakes.
Comment on lines +231 to +246
resource "google_compute_disk" "sifnoded_data" {
name = "${var.cluster_name}-sifnoded-data"
type = "pd-ssd"
zone = var.default_zone
size = var.data_disk_size
labels = var.tags
}

resource "google_compute_disk" "sifnodecli_data" {
name = "${var.cluster_name}-sifnodecli-data"
type = "pd-ssd"
zone = var.default_zone
size = var.data_disk_size
labels = var.tags
}

Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

google_compute_disk.sifnoded_data and google_compute_disk.sifnodecli_data are created but never referenced/attached (PVCs will dynamically provision their own PDs via the StorageClass). This will create unused disks (cost) and likely isn't doing what is intended. Remove these disks or explicitly wire them into Kubernetes PVs/claims.

Suggested change
resource "google_compute_disk" "sifnoded_data" {
name = "${var.cluster_name}-sifnoded-data"
type = "pd-ssd"
zone = var.default_zone
size = var.data_disk_size
labels = var.tags
}
resource "google_compute_disk" "sifnodecli_data" {
name = "${var.cluster_name}-sifnodecli-data"
type = "pd-ssd"
zone = var.default_zone
size = var.data_disk_size
labels = var.tags
}

Copilot uses AI. Check for mistakes.
Comment on lines +29 to +34
variable "cluster_version" {
description = "GKE cluster version (use 'latest' for current default)"
type = string
default = "latest"
}

Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster_version is defined but never used in the module. Either wire it into google_container_cluster (e.g., min_master_version / node_version, or via release channel behavior) or remove the variable to avoid misleading configuration surface area.

Suggested change
variable "cluster_version" {
description = "GKE cluster version (use 'latest' for current default)"
type = string
default = "latest"
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[sifnode] $1000 bounty for implementing Google Cloud support for sifnode #503

2 participants