Add language detection to auto-monitor to inject only the relevant SDK #380

Open
Miqueasher wants to merge 1 commit into aws:main from Miqueasher:feature/auto-monitor-language-detection

@Miqueasher Miqueasher commented May 5, 2026

Summary

When monitorAllServices: true is set, auto-monitor currently injects all four language SDK init containers (Java, Python, Node.js, .NET) into every pod regardless of its runtime. This causes liveness/readiness probe failures, restart loops, and deployment instability.

This PR adds a registry-based language detector that inspects the container image config (ENV, CMD, ENTRYPOINT) via google/go-containerregistry without pulling layers (~100-500 ms per lookup, 5 s timeout).
If registry detection fails, the detector falls back through image name patterns → pod-spec env vars → pod-spec commands → all languages (current behavior), so the worst case is identical to today's behavior.
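
As a rough illustration of how ENV, CMD, and ENTRYPOINT values can be turned into a language guess, here is a minimal sketch. It is not the PR's implementation: the `imageConfig` struct, the `detectFromConfig` function, the hint tables, and the language labels are all hypothetical, and the registry fetch itself is left out so the example stays self-contained.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// imageConfig is a hypothetical holder for the three image-config fields
// the PR description says are inspected.
type imageConfig struct {
	Env        []string // "KEY=value" entries
	Cmd        []string
	Entrypoint []string
}

// envHints uses env var names mentioned in the PR description.
// The language labels on the right are illustrative.
var envHints = map[string]string{
	"JAVA_HOME":   "java",
	"PYTHONPATH":  "python",
	"NODE_ENV":    "nodejs",
	"DOTNET_ROOT": "dotnet",
}

// binaryHints uses the runtime binaries mentioned in the PR description.
var binaryHints = map[string]string{
	"java":   "java",
	"python": "python",
	"node":   "nodejs",
	"dotnet": "dotnet",
}

// detectFromConfig returns the first language hinted at by ENV, then by the
// executable named in ENTRYPOINT or CMD. ok is false when nothing matches.
func detectFromConfig(cfg imageConfig) (lang string, ok bool) {
	for _, kv := range cfg.Env {
		key, _, _ := strings.Cut(kv, "=")
		if l, found := envHints[key]; found {
			return l, true
		}
	}
	// Only the executable position (first element) of each list is checked.
	for _, argv := range [][]string{cfg.Entrypoint, cfg.Cmd} {
		if len(argv) == 0 {
			continue
		}
		if l, found := binaryHints[path.Base(argv[0])]; found {
			return l, true
		}
	}
	return "", false
}

func main() {
	cfg := imageConfig{
		Env:        []string{"PATH=/usr/local/bin:/usr/bin", "JAVA_HOME=/opt/java/openjdk"},
		Entrypoint: []string{"/usr/bin/java", "-jar", "/app.jar"},
	}
	lang, ok := detectFromConfig(cfg)
	fmt.Println(lang, ok) // prints "java true"
}
```

Exact-name matching keeps false positives low, but it misses shell-wrapped commands such as `sh -c "java ..."` and versioned binaries such as `python3`; a real detector would need to decide how to handle those cases.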

Detection Layers

  1. Registry image config — fetch ENV/CMD/ENTRYPOINT from the image config (public registries + ECR with IRSA)
  2. Image name patterns — match language keywords in image reference
  3. Pod-spec env vars — check for JAVA_HOME, PYTHONPATH, NODE_ENV, DOTNET_ROOT, etc.
  4. Pod-spec commands — match runtime binaries (java, python, node, dotnet)
  5. Fallback — all configured languages
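
The ordered layers above can be sketched as a chain of detectors where the first hit wins and a miss everywhere yields all languages. This is a simplified illustration under stated assumptions, not the PR's code: `workload`, `layer`, `hints`, `defaultLayers`, and `detectLanguages` are hypothetical names, and the registry layer is a stub that always misses so the fallback path is visible.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// workload is a hypothetical, simplified view of the pod data a detector sees.
type workload struct {
	Image    string
	EnvNames []string
	Command  []string
}

// layer is one detection step; ok=false means "no opinion, try the next layer".
type layer struct {
	name   string
	detect func(w workload) (lang string, ok bool)
}

var allLanguages = []string{"java", "python", "nodejs", "dotnet"}

// hints is an illustrative table, checked in order so results are deterministic.
var hints = []struct{ keyword, env, binary, lang string }{
	{"java", "JAVA_HOME", "java", "java"},
	{"python", "PYTHONPATH", "python", "python"},
	{"node", "NODE_ENV", "node", "nodejs"},
	{"dotnet", "DOTNET_ROOT", "dotnet", "dotnet"},
}

func byImageName(w workload) (string, bool) {
	for _, h := range hints {
		if strings.Contains(strings.ToLower(w.Image), h.keyword) {
			return h.lang, true
		}
	}
	return "", false
}

func byEnv(w workload) (string, bool) {
	for _, name := range w.EnvNames {
		for _, h := range hints {
			if name == h.env {
				return h.lang, true
			}
		}
	}
	return "", false
}

func byCommand(w workload) (string, bool) {
	if len(w.Command) == 0 {
		return "", false
	}
	for _, h := range hints {
		if path.Base(w.Command[0]) == h.binary {
			return h.lang, true
		}
	}
	return "", false
}

func defaultLayers() []layer {
	return []layer{
		// Stub: a real first layer would query the registry; here it always misses.
		{"registry-config", func(workload) (string, bool) { return "", false }},
		{"image-name", byImageName},
		{"pod-env", byEnv},
		{"pod-command", byCommand},
	}
}

// detectLanguages walks the layers in order; if none matches, it returns
// every configured language, which mirrors the pre-PR behavior.
func detectLanguages(w workload, layers []layer) ([]string, string) {
	for _, l := range layers {
		if lang, ok := l.detect(w); ok {
			return []string{lang}, l.name
		}
	}
	return allLanguages, "fallback"
}

func main() {
	samples := []workload{
		{Image: "example.com/team/python-api:1.0"},
		{Image: "alpine:3.19", EnvNames: []string{"JAVA_HOME"}},
		{Image: "nginx:1.25"},
	}
	for _, w := range samples {
		langs, source := detectLanguages(w, defaultLayers())
		fmt.Printf("%-35s %v via %s\n", w.Image, langs, source)
	}
}
```

Returning the name of the layer that matched alongside the result makes it easier to log why a given SDK was injected, which helps when a substring match on the image name turns out to be a false positive.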

Dependencies added

| Package | Version | Purpose |
| --- | --- | --- |
| github.com/google/go-containerregistry | v0.20.0 | Fetch image config from registry without pulling layers |
| github.com/aws/aws-sdk-go (existing) | v1.45.25 | ECR auth via custom keychain |

Test Plan

  • Unit tests: 92/92 passing (65 new + 27 existing, zero regressions)
  • Live EKS validation (us-east-1): Custom operator build deployed with monitorAllServices=true, no manual annotations
      • Java, Python, Node.js, .NET public images → 1 init container each (detected via registry config)
      • Alpine, Nginx, Busybox → 4 init containers (correct fallback, no false positives)
      • Private ECR image with language keyword → detected via image name pattern
      • Pod-spec JAVA_HOME on Alpine → detected via env var fallback
      • Multi-container (Python + Nginx sidecar) → correctly detected Python only
  • Failure mode: When all detection fails, behavior is identical to current production
  • Operator image digest confirmed matching between running pod and ECR (sha256:5f74a89f76b3...), zero pod restarts observed

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Comment thread pkg/instrumentation/auto/language_detector.go
Comment thread pkg/instrumentation/auto/monitor.go
Comment thread pkg/instrumentation/auto/language_detector.go
Comment thread pkg/instrumentation/auto/language_detector.go
Comment on lines +196 to +199
desc, err := remote.Get(ref,
	remote.WithAuthFromKeychain(d.keychain),
	remote.WithContext(ctx),
)
Contributor

I think we need to cache this for pods of the same kind. Otherwise, in a large cluster with thousands of pods, these registry calls will either be throttled or hurt performance.

Author

Agreed — we'll add an image-level cache so that the same image reference is only looked up once from the registry. If 200 workloads use the same image, the registry call fires once and subsequent lookups return the cached result.
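
One possible shape for such an image-level cache is sketched below. It is an assumption about the follow-up work, not code from this PR: `imageCache`, `newImageCache`, `Language`, and `lookupFunc` are hypothetical names, and the registry call is replaced by an injected function so the example runs without network access.

```go
package main

import (
	"fmt"
	"sync"
)

// lookupFunc stands in for the registry call (hypothetical signature).
type lookupFunc func(imageRef string) (lang string, err error)

// imageCache memoizes detection results per image reference.
type imageCache struct {
	mu      sync.Mutex
	entries map[string]string
	lookup  lookupFunc
}

func newImageCache(lookup lookupFunc) *imageCache {
	return &imageCache{entries: make(map[string]string), lookup: lookup}
}

// Language returns the cached result for imageRef, calling lookup only on a
// miss. Holding the lock across the lookup keeps the sketch simple and
// guarantees one call per image, at the cost of serializing all lookups.
func (c *imageCache) Language(imageRef string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if lang, ok := c.entries[imageRef]; ok {
		return lang, nil
	}
	lang, err := c.lookup(imageRef)
	if err != nil {
		return "", err // errors are not cached, so a later call retries
	}
	c.entries[imageRef] = lang
	return lang, nil
}

func main() {
	calls := 0
	cache := newImageCache(func(ref string) (string, error) {
		calls++ // counts simulated registry round trips
		return "java", nil
	})
	for i := 0; i < 200; i++ {
		if _, err := cache.Language("example.com/team/orders:1.4.2"); err != nil {
			fmt.Println("lookup failed:", err)
			return
		}
	}
	fmt.Println("registry calls:", calls) // prints "registry calls: 1"
}
```

Keying on the image reference means a mutable tag such as `latest` could serve a stale result after the image is re-pushed, so a production cache would likely need a TTL or digest-based keys. Per-key locking would also avoid blocking unrelated images behind one slow registry call.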

Comment thread pkg/instrumentation/auto/language_detector.go

2 participants