feat: GCP Terraform Phase 1 infrastructure#3448

Open
wilsonhoe wants to merge 1 commit into Sifchain:master from wilsonhoe:feature/gcp-terraform-phase1

Conversation

@wilsonhoe

Summary

Implements Phase 1 GCP infrastructure for Sifnode using Terraform.

Modules Implemented

| Module | Description |
| --- | --- |
| VPC | Custom network, subnets, Cloud NAT, firewall rules |
| GKE | Regional cluster, n2-standard-4, autoscaling 3-5 nodes, Workload Identity |
| Cloud SQL | PostgreSQL 14, HA regional, private IP, backups |
| Load Balancer | External HTTPS with managed SSL, health checks on /status (port 1317) |
| Monitoring | 5 alert policies, validator dashboard template |

Environments

  • environments/dev/ - Development configuration
  • environments/prod/ - Production configuration
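Each environment directory would typically wire the five modules together. A minimal sketch of what an `environments/dev/main.tf` might look like (module source paths, variable names, and outputs here are assumptions for illustration, not taken from the PR):

```hcl
# Hypothetical environments/dev/main.tf - names are illustrative only.
module "vpc" {
  source     = "../../modules/vpc"
  project_id = var.project_id
  region     = var.region
}

module "gke" {
  source     = "../../modules/gke"
  project_id = var.project_id
  network    = module.vpc.network_name
  subnetwork = module.vpc.subnet_name

  machine_type = "n2-standard-4" # per the module table above
  min_nodes    = 3               # autoscaling range 3-5 from the table
  max_nodes    = 5
}
```

The dev and prod directories would then differ only in variable values (project, region, node counts), keeping the module code shared.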

Validation

  • terraform fmt passes
  • terraform validate passes for all environments

Bounty: Sifnode Google Cloud Platform

5 modules: VPC, GKE, Cloud SQL, Load Balancer, Monitoring
Includes dev/prod environments and documentation

Note: .terraform directories excluded (binaries >100MB)

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c551d5ba5d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +54 to +55:

```hcl
  # Disable default node pool - we define our own below
  initial_node_count = 1
```


P1: Remove default node pool from the GKE cluster

This cluster never sets remove_default_node_pool = true, so GKE will keep the default node pool in addition to validator_pool. That leaves extra untainted nodes running (unexpected cost and scheduling behavior) even though the module comment says the default pool is disabled. Add remove_default_node_pool = true on the cluster resource to make the dedicated validator pool behavior match the intended design.
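A minimal sketch of the suggested fix (the resource label `this` is assumed; `remove_default_node_pool` and `initial_node_count` are standard `google_container_cluster` arguments):

```hcl
resource "google_container_cluster" "this" {
  # ... existing cluster configuration ...

  # Delete the default node pool after cluster creation, so only the
  # separately defined validator_pool remains.
  remove_default_node_pool = true
  initial_node_count       = 1
}
```

Note that `initial_node_count` must still be set at creation time even when the default pool is removed immediately afterwards, which is why the existing `initial_node_count = 1` line stays.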


Comment on lines +128 to +132:

```hcl
  filter     = "metric.type=\"kubernetes.io/container/memory/used_bytes\" resource.type=\"k8s_container\"${local.cluster_filter != "" ? " AND ${local.cluster_filter}" : ""}"
  duration   = "300s"
  comparison = "COMPARISON_GT"

  threshold_value = 0.85
```


P1: Use a byte-based threshold or utilization metric for memory alert

The policy compares kubernetes.io/container/memory/used_bytes against 0.85, but this metric is measured in bytes, so nearly every running container exceeds 0.85 bytes and the alert will fire continuously. This creates persistent false positives and makes the alert unusable. Use a bytes threshold appropriate for your workloads or switch to a utilization/ratio metric before applying a percent-style threshold.
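Two ways the alert could be made meaningful, sketched as alternative fragments (the 6 GiB figure is a placeholder, and the switch to `kubernetes.io/container/memory/limit_utilization` is one possible ratio metric, not what the PR uses):

```hcl
# Option A: keep used_bytes but compare against an absolute byte count.
# 6 GiB is illustrative - pick a value suited to the validator workload.
threshold_value = 6 * 1024 * 1024 * 1024

# Option B: switch to a ratio metric so a percent-style threshold applies.
# limit_utilization reports used/limit, so 0.85 means 85% of the limit.
filter          = "metric.type=\"kubernetes.io/container/memory/limit_utilization\" resource.type=\"k8s_container\""
threshold_value = 0.85
```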


Comment on lines +23 to +27:

```hcl
resource "google_compute_backend_service" "this" {
  name    = "${var.name}-backend"
  project = var.project_id

  protocol = "HTTP"
```


P1: Set backend service to EXTERNAL_MANAGED for this load balancer

The forwarding rules are configured with load_balancing_scheme = "EXTERNAL_MANAGED", but the backend service leaves load_balancing_scheme unset (defaulting to classic EXTERNAL). This mixes classic and external managed ALB components and causes creation/update failures when wiring the URL map/proxy/backend together. Set the backend service scheme explicitly to EXTERNAL_MANAGED (or consistently use classic EXTERNAL everywhere).
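A sketch of the suggested change, extending the excerpt from the PR (only the `load_balancing_scheme` line is new; the rest mirrors the quoted resource):

```hcl
resource "google_compute_backend_service" "this" {
  name    = "${var.name}-backend"
  project = var.project_id

  protocol = "HTTP"

  # Match the forwarding rules' scheme so every component belongs to the
  # same external managed (ALB) load balancer family.
  load_balancing_scheme = "EXTERNAL_MANAGED"
}
```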

