Skip to content

Conversation

@Rizwana777
Copy link
Collaborator

@Rizwana777 Rizwana777 commented Nov 27, 2025

fixes #454

JIRA - https://issues.redhat.com/browse/GITOPS-8091

gen-redis-tls-certs.sh:
Generates CA and Redis server TLS certificates with appropriate SANs for all vclusters

configure-redis-tls.sh:
Patches Redis deployments to enable TLS-only mode and creates the argocd-redis-tls secret

configure-argocd-redis-tls.sh:
Configures Argo CD components (server, repo-server, application-controller) to connect to Redis using TLS

E2E tests use InsecureSkipVerify: true to skip certificate validation while maintaining TLS encryption, simplifying automated testing with dynamic LoadBalancer addresses that don't match certificate SANs. Please let me know if this is incorrect and need to be changed

Summary by CodeRabbit

  • New Features
    • Added end-to-end Redis TLS support (agent, principal, proxy, cluster cache) with CA, server certs, and insecure option; CLI flags and deployment config exposed.
  • Documentation
    • New Redis TLS guide and Kubernetes getting-started TLS sections; README and helm docs updated.
  • Bug Fixes
    • Increased timeouts, startup ordering, and retry behavior to reduce test flakiness.
  • Tests
    • E2E tests, fixtures and helpers updated for TLS flows and more robust polling/retries.
  • Chores
    • Dev/e2e scripts and helper scripts added for generating and configuring Redis TLS.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Nov 27, 2025

Walkthrough

Adds end-to-end Redis TLS support across agent and principal: CLI flags and option setters, TLS wiring for Redis clients and the Redis proxy (server and upstream), manifest/Helm changes, dev/e2e scripts and tests updated to require/detect Redis TLS, and new documentation and helper scripts for certificate generation and deployment.

Changes

Cohort / File(s) Summary
Agent runtime & TLS wiring
agent/agent.go, agent/inbound_redis.go, agent/options.go
Add Redis TLS flags/fields, load CA/pool or allow InsecureSkipVerify, build tls.Config for Redis clients, pass TLS config into cluster cache, and expose AgentOption setters for TLS settings.
Principal CLI, proxy & server
cmd/argocd-agent/principal.go, principal/options.go, principal/server.go, principal/redisproxy/redisproxy.go, principal/listen.go, principal/resource.go
Add flags/options to enable TLS, load server cert/key (file or secret), configure upstream CA or insecure mode, implement proxy server TLS and upstream TLS wrapping, extend cluster manager Redis TLS wiring, and increase some timeouts and logging.
Cluster cache / manager API
internal/argocd/cluster/cluster.go, internal/argocd/cluster/manager.go, internal/argocd/cluster/*.go test files
Extend NewClusterCacheInstance and NewManager signatures to accept *tls.Config; wire TLS into redis.Options; update call sites and tests to match new arity.
Agent CLI wiring
cmd/argocd-agent/agent.go
Add CLI flags --redis-tls-enabled, --redis-tls-ca-path, --redis-tls-insecure, validate exclusivity, and apply TLS options during agent startup.
Helm, Kubernetes manifests & values
install/helm-repo/argocd-agent-agent/..., install/kubernetes/agent/agent-deployment.yaml, install/kubernetes/principal/principal-deployment.yaml, install/kubernetes/*-params-cm.yaml, install/helm-repo/.../values.yaml, values.schema.json, README.md
Add redisTLS and networkPolicy values, new env vars, optional secret volume mounts for CA/server certs, update defaults (tlsRootCAPath), and add schema entries and README docs.
Dev / E2E scripts & Makefile
hack/dev-env/gen-redis-tls-certs.sh, hack/dev-env/configure-redis-tls.sh, hack/dev-env/configure-argocd-redis-tls.sh, hack/dev-env/start-*, hack/dev-env/Procfile.e2e, hack/dev-env/start-e2e.sh, hack/dev-env/start-principal.sh, test/run-e2e.sh, Makefile
Add cert generation and configuration scripts, patch Argo CD components to use Redis TLS, detect TLS certs in start scripts, change local port-forward defaults, and add Makefile steps for Redis TLS during e2e setup.
E2E tests & fixtures
test/e2e/*, test/e2e/fixture/*.go, test/e2e/redis_proxy_test.go, test/e2e/rp_test.go, test/e2e/clusterinfo_test.go, test/e2e/fixture/cluster.go, test/e2e/fixture/argoclient.go, test/run-e2e.sh
Require/detect Redis TLS in test flows, add cached Redis client accessor, enable TLS flags by default in fixtures, increase timeouts, buffer channels, tune HTTP transport for SSE, and update helpers to use env overrides.
Documentation
docs/configuration/redis-tls.md, docs/getting-started/kubernetes/index.md, test/e2e/README.md
Add a comprehensive Redis TLS guide, add TLS setup steps to Kubernetes getting-started, and update E2E README with TLS prerequisites and verification.
Misc tests & small changes
agent/outbound_test.go, internal/argocd/cluster/*_test.go, test/e2e/*
Update tests to new signatures/arguments (compression/TLS/nil), adjust timeouts, and minor refactors (buffering, cleanup behavior).

Sequence Diagram(s)

sequenceDiagram
    participant AC as ArgoCD Component (client)
    participant RP as Redis Proxy (principal)
    participant AR as argocd-redis (upstream)
    participant CM as Cluster Manager / Agent Cache

    AC->>RP: TLS connect to proxy (server TLS)
    RP->>RP: createServerTLSConfig() using cert/key (file or secret)
    RP->>AR: Dial upstream (TCP)
    alt upstream TLS enabled
        RP->>AR: TLS handshake (use CA pool/file or InsecureSkipVerify)
    end
    AR-->>RP: Redis response
    RP-->>AC: Relay response (over TLS)
    CM->>AR: TLS connect for cluster cache (uses tls.Config passed into cache)
    AR-->>CM: Redis response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Areas to focus on:

  • TLS certificate/CA loading, error paths, and mutual exclusivity checks (principal, agent, proxy).
  • All updated public signatures: NewClusterCacheInstance and NewManager — verify every call site and tests updated.
  • Distinguish server-side proxy TLS vs upstream TLS configuration sources (file vs secret vs CA pool) and confirm correct mounts/env names in Helm and k8s manifests.
  • Dev/E2E scripts and Makefile steps: idempotence, context switching, and rollout/wait behavior.
  • Tests: increased timeouts, buffered channels, and cached client behavior — ensure these changes don’t hide race conditions.

Possibly related PRs

Suggested reviewers

  • jgwest
  • mikeshng
  • chetan-rns
  • jannfis

Poem

🐰 I hopped through code to stitch a TLS shawl,

certs in secrets, no plaintext to sprawl.
Proxy listens safe, upstream whispers true,
Tests now tug the thread — secure through and through.
Hop, hop — Redis sleeps wrapped tight with a paw.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 63.64% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the main feature: enabling Redis TLS encryption by default across all connections, which is the primary objective of this comprehensive changeset.
Linked Issues check ✅ Passed The PR comprehensively addresses all coding requirements from issue #454: adds TLS parameters for agent/principal, implements TLS configuration in code, updates default manifests and scripts to enable Redis TLS, configures E2E tests for TLS-only mode, and includes documentation updates.
Out of Scope Changes check ✅ Passed All changes align with the Redis TLS feature scope. Minor additions like increased timeouts, buffered channels, and logging enhancements support TLS infrastructure without introducing unrelated functionality.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch 4 times, most recently from 5743de3 to 40254ae Compare November 27, 2025 11:43
@codecov-commenter
Copy link

codecov-commenter commented Nov 27, 2025

Codecov Report

❌ Patch coverage is 7.38255% with 276 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.55%. Comparing base (22377e0) to head (6b246bf).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
principal/redisproxy/redisproxy.go 0.00% 87 Missing ⚠️
cmd/argocd-agent/principal.go 0.00% 50 Missing ⚠️
principal/options.go 0.00% 32 Missing ⚠️
principal/server.go 6.25% 28 Missing and 2 partials ⚠️
agent/agent.go 7.69% 23 Missing and 1 partial ⚠️
cmd/argocd-agent/agent.go 0.00% 22 Missing ⚠️
agent/inbound_redis.go 0.00% 16 Missing and 1 partial ⚠️
agent/options.go 0.00% 12 Missing ⚠️
principal/listen.go 80.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #664      +/-   ##
==========================================
- Coverage   45.56%   44.55%   -1.02%     
==========================================
  Files          90       90              
  Lines       10011    10280     +269     
==========================================
+ Hits         4562     4580      +18     
- Misses       4978     5225     +247     
- Partials      471      475       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from 40254ae to 3df4a33 Compare November 27, 2025 12:16
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (11)
hack/dev-env/start-principal.sh (1)

23-43: Port-forward setup looks good; consider addressing the shellcheck hint.

The port-forward logic is sound. The shellcheck warning (SC2064) about using double quotes in the trap is a false positive here since $PORT_FORWARD_PID is set once and won't change. However, you could use single quotes for consistency with shellcheck best practices:

-       trap "kill $PORT_FORWARD_PID 2>/dev/null || true" EXIT
+       trap 'kill $PORT_FORWARD_PID 2>/dev/null || true' EXIT

Note: With single quotes, the variable will expand when the trap is triggered rather than when it's set, but in this case both work correctly since the PID doesn't change.

internal/argocd/cluster/cluster_test.go (1)

31-37: Cluster manager tests correctly updated for new constructor signature

All NewManager invocations now provide the Redis compression type and a trailing nil TLS config, matching the new constructor; existing test behavior is preserved. If you later want more coverage for the Redis TLS path introduced in this PR, adding dedicated tests around a non-nil TLS config in another test file would be a good follow-up.

Also applies to: 223-226, 303-305

agent/inbound_redis.go (1)

345-372: Consider using TLS 1.3 as minimum version.

The TLS configuration sets MinVersion: tls.VersionTLS12, but for new implementations in 2025, TLS 1.3 should be preferred as the minimum version for better security. TLS 1.2 has known vulnerabilities in certain configurations.

Apply this diff:

 	if a.redisProxyMsgHandler.redisTLSEnabled {
 		tlsConfig = &tls.Config{
-			MinVersion: tls.VersionTLS12,
+			MinVersion: tls.VersionTLS13,
 		}

That said, the CA loading logic and error handling are well-implemented with appropriate warnings for insecure mode and system CA fallback.

test/run-e2e.sh (1)

61-66: Use jq for structured JSON parsing instead of grep.

Line 62 uses grep "tls-port" on JSON output, which is fragile and could produce false positives (e.g., matching in comments, annotations, or labels).

Replace with structured JSON querying using jq:

-        # Check if Redis is configured with TLS (it's a Deployment, not StatefulSet)
-        if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | grep -q "tls-port"; then
+        # Check if Redis is configured with TLS
+        if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | \
+             jq -e '.spec.template.spec.containers[].args[] | select(contains("--tls-port"))' >/dev/null; then
             echo "ERROR: Redis Deployment in ${CONTEXT} is not configured with TLS!"
             echo "Please run: ./hack/dev-env/configure-redis-tls.sh ${CONTEXT}"
             exit 1
         fi

This approach reliably checks for the --tls-port argument in the container args array.

hack/dev-env/gen-redis-tls-certs.sh (1)

17-17: Consider ECDSA keys for better performance.

The script generates 4096-bit RSA keys, which are secure but relatively slow. For development and testing, consider using ECDSA P-256 keys instead, which provide equivalent security with better performance and smaller certificate sizes.

Example:

-    openssl genrsa -out "${CREDS_DIR}/ca.key" 4096
+    openssl ecparam -genkey -name prime256v1 -out "${CREDS_DIR}/ca.key"

This is optional for a dev/test certificate generation script, but ECDSA is increasingly preferred in modern TLS implementations.

test/e2e/fixture/cluster.go (1)

40-54: E2E Redis TLS wiring is correct; consider small helper for TLSConfig

The new *RedisTLSEnabled fields and getCacheInstance TLSConfig setup give tests a clear, deterministic TLS path (TLS 1.2, InsecureSkipVerify only in e2e). Defaulting both TLSEnabled flags to true in the config helpers matches the “TLS-only e2e” objective. You might later factor the repeated TLSConfig construction for principal/managed into a tiny helper, but it’s not required.

Also applies to: 165-204, 251-268, 273-315

internal/argocd/cluster/cluster.go (1)

17-32: TLS parameterization of cluster cache is clean and backwards-compatible

Extending NewClusterCacheInstance with a *tls.Config and wiring it directly into redis.Options.TLSConfig cleanly enables TLS while keeping nil as the “no TLS” path. Callers now own policy, which is appropriate. Consider updating any GoDoc on this function to mention the new TLS behavior, but the implementation itself looks solid.

Also applies to: 168-178

hack/dev-env/configure-argocd-redis-tls.sh (1)

1-201: Dev script works; consider restoring context and clarifying the banner

The script does what it needs for dev/e2e, but two improvements would help:

  1. Context restorationkubectl config use-context ${CONTEXT} permanently switches the user’s context. Mirroring hack/dev-env/configure-redis-tls.sh by capturing the original context and restoring it in a trap would make this safer to run manually.
  2. Clarify the “proper TLS certificate validation” note – Redis connections are indeed validated via --redis-use-tls and --redis-ca-certificate, but argocd-server is started with --insecure, which weakens client→server TLS. Rewording the banner to “Using proper Redis TLS certificate validation (server is insecure for dev only)” would avoid confusion.

These are UX/docs-level tweaks; the functional Redis TLS wiring looks fine.

principal/server.go (1)

349-372: Redis proxy and cluster-manager TLS wiring is coherent and option-driven

The server now cleanly drives Redis TLS from ServerOptions: redisProxy is toggled via redisTLSEnabled, with clear precedence for server cert sources (path vs in‑memory) and upstream verification (insecure vs CA path vs CA pool). The cluster manager reuses the same upstream TLS knobs to build clusterMgrRedisTLSConfig and passes it down to cluster.NewManager, so its Redis cache observes the same trust policy. Error paths on CA file read/parse are explicit and early, which is good.

If you want extra transparency, you could log a brief message when TLS is enabled but neither redisUpstreamTLSInsecure, redisUpstreamTLSCA, nor redisUpstreamTLSCAPath are set (i.e., relying on system CAs), but that’s optional.

Also applies to: 400-427

agent/agent.go (1)

17-24: Cluster cache Redis TLS follows agent Redis TLS options; consider minor reuse/logging tweaks

The new clusterCacheTLSConfig correctly mirrors the agent’s Redis TLS options (enabled flag, insecure mode, CA path) and feeds them into NewClusterCacheInstance, so the cluster cache honors the same security posture as the main Redis client. Error handling on CA read/parse is clear and fails fast.

Two optional refinements to consider later:

  • Factor the TLSConfig construction shared between this file and agent/inbound_redis.go into a small helper to keep behavior perfectly in sync.
  • When TLS is enabled but no CA path is set and redisTLSInsecure is false, add a log line indicating that system CAs are being used for the cluster cache as well (to match the visibility you already give the main Redis client).

Also applies to: 323-345

principal/redisproxy/redisproxy.go (1)

131-165: Remove unused key parsing.

Line 154 parses the private key but never uses the result. This validation is redundant since the key has already been marshaled from a valid crypto.PrivateKey interface at line 145.

Apply this diff to remove the dead code:

 		cert.Certificate = [][]byte{certDER}
 		cert.PrivateKey = rp.tlsServerKey
 		cert.Leaf = rp.tlsServerCert
-
-		// Try to parse the key
-		if _, err := x509.ParsePKCS8PrivateKey(keyDER); err != nil {
-			return nil, fmt.Errorf("failed to parse private key: %w", err)
-		}
 	} else {
 		return nil, fmt.Errorf("no TLS certificate configured")
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 101d4c8 and 3df4a33.

📒 Files selected for processing (37)
  • Makefile (1 hunks)
  • agent/agent.go (2 hunks)
  • agent/inbound_redis.go (3 hunks)
  • agent/options.go (1 hunks)
  • agent/outbound_test.go (1 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (3 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (2 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/README.md (3 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (3 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.yaml (1 hunks)
  • install/kubernetes/agent/agent-deployment.yaml (3 hunks)
  • install/kubernetes/agent/agent-params-cm.yaml (1 hunks)
  • install/kubernetes/principal/principal-deployment.yaml (3 hunks)
  • install/kubernetes/principal/principal-params-cm.yaml (1 hunks)
  • internal/argocd/cluster/cluster.go (2 hunks)
  • internal/argocd/cluster/cluster_test.go (3 hunks)
  • internal/argocd/cluster/informer_test.go (6 hunks)
  • internal/argocd/cluster/manager.go (3 hunks)
  • internal/argocd/cluster/manager_test.go (3 hunks)
  • principal/options.go (2 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/server.go (3 hunks)
  • test/e2e/README.md (2 hunks)
  • test/e2e/fixture/cluster.go (5 hunks)
  • test/run-e2e.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/run-e2e.sh
  • Makefile
  • hack/dev-env/start-e2e.sh
  • install/helm-repo/argocd-agent-agent/values.yaml
  • test/e2e/README.md
  • hack/dev-env/Procfile.e2e
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml
  • install/kubernetes/agent/agent-params-cm.yaml
  • install/kubernetes/agent/agent-deployment.yaml
🧬 Code graph analysis (12)
cmd/argocd-agent/agent.go (2)
agent/options.go (3)
  • WithRedisTLSEnabled (112-117)
  • WithRedisTLSInsecure (128-133)
  • WithRedisTLSCAPath (120-125)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
agent/inbound_redis.go (2)
internal/logging/logfields/logfields.go (1)
  • Config (127-127)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/manager_test.go (1)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
internal/argocd/cluster/informer_test.go (2)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
test/fake/kube/kubernetes.go (1)
  • NewFakeKubeClient (31-44)
principal/server.go (1)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
cmd/argocd-agent/principal.go (4)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/options.go (6)
  • WithRedisTLSEnabled (493-498)
  • WithRedisServerTLSFromPath (501-507)
  • WithRedisServerTLSFromSecret (510-520)
  • WithRedisUpstreamTLSInsecure (543-548)
  • WithRedisUpstreamTLSCAFromFile (523-528)
  • WithRedisUpstreamTLSCAFromSecret (531-540)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (169-185)
internal/argocd/cluster/cluster_test.go (1)
test/fake/kube/kubernetes.go (1)
  • NewFakeKubeClient (31-44)
internal/argocd/cluster/manager.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (169-185)
principal/options.go (3)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/server.go (1)
  • Server (72-164)
internal/tlsutil/kubernetes.go (2)
  • TLSCertFromSecret (44-66)
  • X509CertPoolFromSecret (106-128)
agent/outbound_test.go (1)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
🪛 Shellcheck (0.11.0)
hack/dev-env/start-e2e.sh

[warning] 58-58: Declare and assign separately to avoid masking return values.

(SC2155)

hack/dev-env/start-principal.sh

[warning] 42-42: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build and push image
  • GitHub Check: Run unit tests
  • GitHub Check: Lint Go code
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Build & cache Go code
  • GitHub Check: Analyze (go)
🔇 Additional comments (43)
hack/dev-env/start-agent-managed.sh (3)

37-46: LGTM!

The TLS certificate detection and conditional enablement logic is clear and well-structured. The user guidance for missing certificates is helpful.


48-61: LGTM!

The Redis address configuration is well-documented and provides clear guidance for local development with TLS. The default localhost address appropriately aligns with certificate SANs.


66-67: LGTM!

The TLS and address arguments are correctly injected into the agent startup command.

hack/dev-env/start-principal.sh (2)

58-74: LGTM!

The TLS certificate detection correctly handles both server-side certificates (for the Redis proxy) and upstream CA validation. The logic is sound and well-documented.


82-82: LGTM!

The TLS arguments are correctly injected into the principal startup command.

Makefile (1)

59-70: LGTM!

The Redis TLS setup sequence is well-structured and correctly configures TLS for all three vclusters. The messaging clearly indicates that TLS is required for E2E tests, aligning with the PR objectives.

install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1)

93-101: LGTM!

The Redis TLS configuration keys are well-documented and follow the existing naming conventions. The "INSECURE" warning on the insecure flag is appropriate.

hack/dev-env/start-agent-autonomous.sh (3)

37-46: LGTM!

The TLS certificate detection logic is consistent with the managed agent script and works correctly.


48-61: LGTM!

The Redis address configuration correctly uses localhost:6382, allowing the autonomous agent to run alongside the managed agent without port conflicts.


66-67: LGTM!

The TLS and address arguments are correctly injected into the agent startup command.

agent/outbound_test.go (1)

464-464: LGTM!

The test correctly adapts to the extended NewManager signature by passing nil for the new tlsConfig parameter. This is appropriate for a test that doesn't require TLS configuration.

install/helm-repo/argocd-agent-agent/values.yaml (3)

136-136: LGTM!

The default TLS root CA path provides a sensible default for users and aligns with conventional mount paths.


138-151: LGTM!

The Redis TLS configuration is comprehensive and well-documented. TLS is appropriately enabled by default with secure settings, aligning with the PR objectives. The string values ("true"/"false") are appropriate for ConfigMap usage.


153-163: LGTM!

The network policy configuration is a good security enhancement that allows users to restrict Redis traffic. The default selectors and structure are appropriate.

test/e2e/README.md (1)

41-65: Unable to verify the InsecureSkipVerify claim due to repository access limitations.

The repository is currently inaccessible for automated verification. However, the review comment raises a valid concern: the documentation states that "principal and agents use InsecureSkipVerify: true" when connecting to Redis via LoadBalancer addresses, but this conflicts with the described behavior of startup scripts that use localhost port-forwards (where localhost should be in the certificate SANs and proper certificate validation should work).

This discrepancy needs manual verification by examining:

  1. The actual Redis client configuration in the principal and agent code
  2. Whether E2E tests genuinely use InsecureSkipVerify or perform proper certificate validation
  3. The difference between LoadBalancer connections (mentioned in docs) vs. localhost port-forward connections (described in startup scripts)
internal/argocd/cluster/manager_test.go (1)

11-11: NewManager signature update is wired correctly in tests

Importing cacheutil and passing cacheutil.RedisCompressionGZip plus a trailing nil for the TLS config matches the updated NewManager signature; test behavior remains the same and looks correct.

Also applies to: 57-58, 78-79

docs/configuration/redis-tls.md (1)

1-411: Redis TLS doc is consistent with implementation and manifests

The new documentation cleanly matches the flags, env vars, ConfigMap keys, volume mount paths, and defaults introduced in the code/manifests; it provides enough guidance and the security caveats around *_INSECURE are clear.

cmd/argocd-agent/agent.go (1)

74-77: Agent Redis TLS flags and wiring look correct

The new Redis TLS flags/envs are correctly bound, and the options passed into the agent (WithRedisTLSEnabled, WithRedisTLSInsecure, WithRedisTLSCAPath) implement the documented behavior (CLI default off, K8s/Helm on, optional CA or insecure skip) without obvious edge-case issues.

Also applies to: 184-194, 236-245

install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (1)

136-153: Helm agent deployment TLS wiring is consistent and non-breaking

The new Redis TLS env vars and the conditional redis-tls-ca volume/volumeMount are consistent with the agent ConfigMap keys and agent CLI expectations; the optional: true and .Values.redisTLS.secretName guard make this backwards-compatible.

Also applies to: 232-236, 258-266

install/kubernetes/agent/agent-params-cm.yaml (1)

88-99: Agent Redis TLS ConfigMap defaults align with deployment and docs

The new agent.redis.tls.* entries are consistent with the agent deployment envs, default to TLS-on with a CA path that matches the mounted secret, and keep insecure mode explicitly off by default.

install/kubernetes/principal/principal-deployment.yaml (1)

233-274: Principal deployment Redis TLS configuration is consistent and safe by default

The new Redis TLS env vars, redis-proxy port, and redis-server-tls / redis-upstream-tls-ca volumes are wired consistently with the documented paths and argocd-redis-tls secret; marking the secret volumes as optional: true keeps the manifest robust while still enabling TLS by default when the secret is present.

Also applies to: 280-287, 302-307, 324-339

hack/dev-env/Procfile.e2e (1)

1-6: I was unable to access the repository directly due to a clone failure. However, based on the review comment structure and the concern it raises, I can provide a rewritten version that maintains the verification request while being clear about what needs to be confirmed:


Confirm that ARGOCD_AGENT_REDIS_ADDRESS is properly consumed by agent startup scripts

The Procfile sets ARGOCD_AGENT_REDIS_ADDRESS when invoking start-agent-managed.sh and start-agent-autonomous.sh, but the agent code may expect a different environment variable or command-line flag. Verify that:

  1. start-agent-managed.sh and start-agent-autonomous.sh translate ARGOCD_AGENT_REDIS_ADDRESS into a --redis-addr CLI flag or pass it through correctly to the agent process
  2. The agent executable does not default to a hardcoded Redis address if the expected variable is absent
  3. The e2e setup actually uses the forwarded Redis ports (6380/6381/6382) rather than falling back to defaults

If the scripts or agent code expect REDIS_ADDR instead of ARGOCD_AGENT_REDIS_ADDRESS, either rename the variable here or update the scripts accordingly.

agent/inbound_redis.go (1)

51-54: LGTM - Clean TLS configuration fields.

The addition of these three fields provides a clear and straightforward mechanism to control Redis TLS behavior.

install/helm-repo/argocd-agent-agent/README.md (1)

68-72: LGTM - Clear Redis TLS configuration documentation.

The Redis TLS configuration is well-documented with sensible defaults (enabled: "true", CA path, and secret name). The inline documentation clearly indicates that insecure mode is for development only.

agent/options.go (1)

111-133: LGTM - Redis TLS option setters follow established patterns.

The three new option setters (WithRedisTLSEnabled, WithRedisTLSCAPath, WithRedisTLSInsecure) are implemented consistently with existing option setters in the file. The pattern of setting the field and returning nil is appropriate.

test/run-e2e.sh (1)

24-76: Good enforcement of Redis TLS as a hard requirement for E2E tests.

The comprehensive verification checks (certificates, secrets, and deployment configuration) across all vclusters ensure that E2E tests run only in a properly secured environment. The clear error messages with remediation steps are helpful for developers.

hack/dev-env/gen-redis-tls-certs.sh (1)

28-123: Comprehensive certificate generation with proper SANs.

The script generates certificates for all necessary components (control-plane, proxy, and agent vclusters) with appropriate Subject Alternative Names covering localhost, IP addresses, and cluster DNS. The idempotency checks ensure the script can be re-run safely.

docs/getting-started/kubernetes/index.md (2)

159-282: Redis TLS setup steps are consistent with manifests and defaults

Generation of CA/server certs, shared argocd-redis-tls secret, Redis TLS args, and verification flow all line up with the principal/agent manifests and default paths. No issues from a functional perspective.


389-478: Workload-cluster Redis TLS mirrors control-plane flow correctly

The workload-cluster TLS instructions reuse the same CA, secret structure, args, and verification pattern, which keeps the principal/workload Redis configuration aligned and predictable. Looks good.

internal/argocd/cluster/informer_test.go (1)

3-15: Tests correctly adapted to extended NewManager signature

Using cacheutil.RedisCompressionGZip and a trailing nil TLS argument keeps the tests aligned with the new constructor while preserving previous behavior. No further changes needed here.

Also applies to: 17-51, 67-88, 96-116

install/kubernetes/principal/principal-params-cm.yaml (1)

140-166: Principal Redis TLS ConfigMap defaults align with deployment wiring

The new principal.redis.tls.* keys (enable flag, server cert/key paths, server/CA secret names, upstream CA path, and insecure switch) match the principal Deployment’s volume mounts and the ServerOptions fields. Enabling TLS by default here is consistent with the PR’s objective, and the “INSECURE” comment on the upstream flag is clear.

install/kubernetes/agent/agent-deployment.yaml (3)

149-166: LGTM! Redis TLS environment variables properly configured.

The three Redis TLS environment variables follow the established pattern and are appropriately marked as optional, ensuring backward compatibility with existing deployments.


193-195: LGTM! Volume mount configured securely.

The redis-tls-ca volume is correctly mounted as read-only, following security best practices.


205-211: LGTM! Secret-backed volume configured correctly.

The redis-tls-ca volume is properly configured with optional: true, preventing deployment failures when TLS is not enabled while maintaining compatibility with TLS-enabled configurations.

internal/argocd/cluster/manager.go (1)

26-26: LGTM! TLS configuration properly integrated.

The TLS config parameter is correctly threaded through NewManager to NewClusterCacheInstance, enabling TLS-protected Redis connections for cluster caching. The nil-able *tls.Config type allows optional TLS configuration while maintaining backward compatibility at the implementation level.

Also applies to: 71-71, 81-81

hack/dev-env/configure-redis-tls.sh (2)

23-42: LGTM! Context validation is clear and robust.

The case statement properly validates the context parameter and maps it to the appropriate certificate prefix with helpful error messages for invalid inputs.


47-54: LGTM! Cleanup trap follows best practices.

The trap ensures the original kubectl context is restored on exit, preventing side effects from the script execution.

principal/redisproxy/redisproxy.go (4)

65-76: LGTM! TLS configuration fields well-organized.

The TLS configuration fields clearly separate server-side and upstream concerns, and support both in-memory and file-based certificate loading for flexibility.


98-128: LGTM! TLS configuration API is clean and straightforward.

The setter methods provide a clear API for configuring both server-side and upstream TLS, supporting multiple configuration sources.


167-211: LGTM! TLS listener creation properly implemented.

The Start() method cleanly handles both TLS and non-TLS modes with appropriate logging and error handling.


847-908: LGTM! Upstream TLS connection properly implemented.

The TLS upgrade logic for upstream Redis connections correctly handles CA certificate validation from both in-memory and file sources, SNI configuration, and test-mode insecure skip verify. The conditional TLS enablement (line 864) allows for flexible deployment modes including TLS-terminating proxy scenarios.

principal/options.go (2)

80-88: LGTM! Redis TLS configuration fields well-structured.

The new Redis TLS fields follow the established pattern for TLS configuration in ServerOptions, maintaining consistency with existing TLS fields and supporting flexible configuration sources.


492-548: LGTM! Redis TLS option functions follow established patterns.

The six new Redis TLS configuration functions are well-implemented, following the existing ServerOption pattern consistently. They provide flexible configuration through files, secrets, and direct values, with appropriate error handling and integration with the tlsutil package.

Comment on lines 258 to 288
// Configure Redis TLS
opts = append(opts, principal.WithRedisTLSEnabled(redisTLSEnabled))
if redisTLSEnabled {
// Redis proxy server TLS (for incoming connections from Argo CD)
if redisServerTLSCertPath != "" && redisServerTLSKeyPath != "" {
logrus.Infof("Loading Redis proxy server TLS configuration from files cert=%s and key=%s", redisServerTLSCertPath, redisServerTLSKeyPath)
opts = append(opts, principal.WithRedisServerTLSFromPath(redisServerTLSCertPath, redisServerTLSKeyPath))
} else if (redisServerTLSCertPath != "" && redisServerTLSKeyPath == "") || (redisServerTLSCertPath == "" && redisServerTLSKeyPath != "") {
cmdutil.Fatal("Both --redis-server-tls-cert and --redis-server-tls-key have to be given")
} else {
logrus.Infof("Loading Redis proxy server TLS certificate from secret %s/%s", namespace, redisServerTLSSecretName)
opts = append(opts, principal.WithRedisServerTLSFromSecret(kubeConfig.Clientset, namespace, redisServerTLSSecretName))
}

// Redis upstream TLS (for connections to principal's argocd-redis)
if redisUpstreamTLSInsecure {
logrus.Warn("INSECURE: Not verifying upstream Redis TLS certificate")
opts = append(opts, principal.WithRedisUpstreamTLSInsecure(true))
} else if redisUpstreamTLSCAPath != "" {
logrus.Infof("Loading Redis upstream CA certificate from file %s", redisUpstreamTLSCAPath)
opts = append(opts, principal.WithRedisUpstreamTLSCAFromFile(redisUpstreamTLSCAPath))
} else {
logrus.Infof("Loading Redis upstream CA certificate from secret %s/%s", namespace, redisUpstreamTLSCASecretName)
opts = append(opts, principal.WithRedisUpstreamTLSCAFromSecret(kubeConfig.Clientset, namespace, redisUpstreamTLSCASecretName, "tls.crt"))
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's view the file around the mentioned lines
sed -n '250,290p' cmd/argocd-agent/principal.go

Repository: argoproj-labs/argocd-agent

Length of output: 196


🏁 Script executed:

#!/bin/bash
# Try using GitHub CLI to get the file content
gh api repos/argoproj-labs/argocd-agent/contents/cmd/argocd-agent/principal.go --jq '.content' 2>/dev/null | base64 -d | sed -n '250,290p'

Repository: argoproj-labs/argocd-agent

Length of output: 196


Based on my analysis of the code snippet provided in the review comment, the issue is legitimate and valid:

The Redis upstream TLS configuration lacks mutual exclusivity validation, allowing conflicting configuration modes to be silently ignored.

The review correctly identifies that:

  1. The if-else-if chain (lines 274-282) ensures only one code path executes, but doesn't validate that the user hasn't specified multiple conflicting modes
  2. If a user specifies both --redis-upstream-tls-insecure=true and --redis-upstream-ca-path=/some/path, the insecure flag silently wins without warning
  3. This contrasts with the server cert/key validation (lines 265-266) which explicitly rejects partial/conflicting configurations
  4. The suggested fix properly mirrors the server cert/key validation pattern

The suggestion to add explicit mutual exclusivity validation is sound and improves configuration robustness by preventing silent configuration drops.


Verify mutual exclusivity of upstream TLS configuration modes.

The Redis TLS configuration properly validates that server cert/key are provided together (lines 265-266). However, the upstream TLS configuration (lines 273-282) has three mutually exclusive modes but no validation:

  • redisUpstreamTLSInsecure (skip verification)
  • redisUpstreamTLSCAPath (CA from file)
  • redisUpstreamTLSCASecretName (CA from secret)

If multiple modes are specified simultaneously, the if-else-if chain means the first match wins (insecure > file > secret), silently dropping user-specified configurations. This creates poor user experience and potential misconfiguration risks.

Consider adding validation to ensure only one upstream TLS mode is specified, similar to the server cert/key validation:

if redisTLSEnabled {
    // Validate upstream TLS configuration - only one mode allowed
    modesSet := 0
    if redisUpstreamTLSInsecure {
        modesSet++
    }
    if redisUpstreamTLSCAPath != "" {
        modesSet++
    }
    if redisUpstreamTLSCASecretName != "" {
        modesSet++
    }
    if modesSet > 1 {
        cmdutil.Fatal("Only one Redis upstream TLS mode can be specified: --redis-upstream-tls-insecure, --redis-upstream-ca-path, or --redis-upstream-ca-secret-name")
    }
    
    // ... existing configuration logic
}
🤖 Prompt for AI Agents
In cmd/argocd-agent/principal.go around lines 258 to 283, the Redis upstream TLS
handling uses an if/else chain that silently prefers one mode over others when
multiple upstream TLS flags are provided; add explicit mutual-exclusivity
validation before that chain by counting how many of redisUpstreamTLSInsecure,
redisUpstreamTLSCAPath (non-empty) and redisUpstreamTLSCASecretName (non-empty)
are set and calling cmdutil.Fatal with a clear message if more than one is
provided, then proceed with the existing if/else logic unchanged.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from 3df4a33 to 211af17 Compare November 27, 2025 15:43
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
hack/dev-env/gen-redis-tls-certs.sh (1)

14-26: Don’t hide OpenSSL errors; let failures surface and stop the script.

All the openssl calls currently send stderr to /dev/null, which makes certificate generation failures very hard to diagnose, even with set -e in place. It’d be better to let stderr through (and optionally add explicit exit checks) so a broken OpenSSL invocation clearly reports why it failed and the script aborts.

For example:

-    openssl genrsa -out "${CREDS_DIR}/ca.key" 4096 2>/dev/null
+    openssl genrsa -out "${CREDS_DIR}/ca.key" 4096
+    if [[ $? -ne 0 ]]; then
+        echo "Failed to generate CA private key" >&2
+        exit 1
+    fi
-    openssl req -new -x509 -days 3650 -key "${CREDS_DIR}/ca.key" \
-        -out "${CREDS_DIR}/ca.crt" \
-        -subj "/C=US/ST=State/L=City/O=Organization/OU=Unit/CN=Redis CA" 2>/dev/null
+    openssl req -new -x509 -days 3650 -key "${CREDS_DIR}/ca.key" \
+        -out "${CREDS_DIR}/ca.crt" \
+        -subj "/C=US/ST=State/L=City/O=Organization/OU=Unit/CN=Redis CA"
+    if [[ $? -ne 0 ]]; then
+        echo "Failed to generate CA certificate" >&2
+        exit 1
+    fi

and apply the same pattern (remove 2>/dev/null, add clear error messages if desired) to the other openssl genrsa/req/x509 calls in this script.

Also applies to: 31-32, 47-58, 61-64, 79-90, 93-96, 111-121

hack/dev-env/start-e2e.sh (1)

58-58: Avoid masking kubectl errors when exporting REDIS_PASSWORD

Combining export with command substitution can hide failures from kubectl and is what ShellCheck SC2155 is warning about (also raised in an earlier review).

Splitting assignment and export (and checking the exit code) makes failures explicit:

-export REDIS_PASSWORD=$(kubectl get secret argocd-redis --context=vcluster-agent-managed -n argocd -o jsonpath='{.data.auth}' | base64 --decode)
+REDIS_PASSWORD=$(kubectl get secret argocd-redis --context=vcluster-agent-managed -n argocd -o jsonpath='{.data.auth}' | base64 --decode) || {
+  echo "Failed to read Redis password from argocd-redis secret in vcluster-agent-managed" >&2
+  exit 1
+}
+export REDIS_PASSWORD
hack/dev-env/configure-redis-tls.sh (1)

61-66: Validate all required certificate/key files before creating the secret

Right now the script only checks for creds/redis-tls/ca.crt. If the per-context *.crt or *.key is missing, kubectl create secret will fail with a less obvious error. This was already raised in an earlier review and is still applicable.

You can make the error clearer by validating all three files up front:

 # Check certificates exist
 if [ ! -f "creds/redis-tls/ca.crt" ]; then
     echo "Error: Redis TLS certificates not found"
     echo "Please run: ./gen-redis-tls-certs.sh"
     exit 1
 fi
+
+if [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.crt" ] || [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.key" ]; then
+    echo "Error: Redis TLS certificate or key not found for ${REDIS_CERT_PREFIX}"
+    echo "Please run: ./gen-redis-tls-certs.sh"
+    exit 1
+fi

Also applies to: 81-88

🧹 Nitpick comments (10)
cmd/argocd-agent/principal.go (2)

258-288: Redis TLS wiring is sound; consider clarifying precedence and adding tests.

The overall flow (redisTLSEnabled gate, server TLS from path vs secret, upstream TLS with insecure/CA path/CA secret) mirrors existing TLS patterns and looks correct. Two refinements to consider:

  1. Silent precedence between upstream CA path and secret
    When both a CA path and a (possibly customized) CA secret are configured, the CA path branch wins and the secret is ignored, with no warning. That’s safe but can surprise operators troubleshooting TLS. A lightweight improvement would be to log (or optionally fatal on) the case where a non-default CA secret name is set alongside a CA path, e.g. log that the secret is being ignored in favor of the file-based CA. This keeps behavior but makes it explicit.

  2. Tests for flag/env combinations and TLS behavior
    Given the number of new flags and the Codecov report pointing out missing coverage in this file, it would be valuable to add unit tests around:

    • redisTLSEnabled true/false.
    • Server TLS: (cert+key), partial, and secret-based.
    • Upstream TLS: insecure vs CA path vs default secret, including precedence behavior.
      Even a small table-driven test on NewPrincipalRunCommand option wiring or a constructor helper would help lock in these semantics.

Overall, the wiring itself looks correct; this is mainly about making edge-case behavior explicit and test-backed.


419-441: Redis TLS flags and env bindings look good; minor help-text tweak optional.

The new flags and env variable bindings are consistent with existing patterns (ARGOCD_PRINCIPAL_*), and the separation between server TLS and upstream TLS is clear. As a minor polish, you might clarify in the --redis-tls-enabled description that it controls both the proxy’s listening TLS and the upstream TLS to argocd-redis (since the code configures both) to avoid ambiguity in CLI help output.

agent/options.go (1)

127-133: Consider adding runtime warning for insecure mode.

The comment indicates this option is "for testing only," but there's no runtime warning when this insecure mode is enabled. Consider adding a warning log message when TLS verification is disabled to alert operators of the security implications in production environments.

Example:

func WithRedisTLSInsecure(insecure bool) AgentOption {
	return func(o *Agent) error {
		o.redisProxyMsgHandler.redisTLSInsecure = insecure
		if insecure {
			log().Warn("INSECURE: Redis TLS certificate verification disabled. This should only be used for testing.")
		}
		return nil
	}
}
principal/server.go (1)

400-427: Consider extracting CA loading logic into a helper function.

The CA certificate loading logic (lines 413-424) is duplicated in multiple files (agent/agent.go, principal/redisproxy/redisproxy.go). Consider extracting this into a shared helper function to improve maintainability.

Example helper:

// In internal/tlsutil or similar package
func LoadCACertPool(caPath string) (*x509.CertPool, error) {
	caCert, err := os.ReadFile(caPath)
	if err != nil {
		return nil, fmt.Errorf("failed to read CA certificate from %s: %w", caPath, err)
	}
	caCertPool := x509.NewCertPool()
	if !caCertPool.AppendCertsFromPEM(caCert) {
		return nil, fmt.Errorf("failed to parse CA certificate from %s", caPath)
	}
	return caCertPool, nil
}
principal/redisproxy/redisproxy.go (1)

141-156: Simplify in-memory certificate handling.

The PKCS8 marshaling (line 145) and parsing (line 154) appears to be validation only, as the parsed result is discarded. This roundtrip is unnecessary. The private key can be assigned directly to cert.PrivateKey.

Apply this diff:

 	} else if rp.tlsServerCert != nil && rp.tlsServerKey != nil {
 		// Convert cert and key to tls.Certificate
 		certDER := rp.tlsServerCert.Raw
-		// For private key, we need to marshal it
-		keyDER, err := x509.MarshalPKCS8PrivateKey(rp.tlsServerKey)
-		if err != nil {
-			return nil, fmt.Errorf("failed to marshal private key: %w", err)
-		}
 		cert.Certificate = [][]byte{certDER}
 		cert.PrivateKey = rp.tlsServerKey
 		cert.Leaf = rp.tlsServerCert
-
-		// Try to parse the key
-		if _, err := x509.ParsePKCS8PrivateKey(keyDER); err != nil {
-			return nil, fmt.Errorf("failed to parse private key: %w", err)
-		}
 	} else {
install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (1)

136-153: Redis TLS env and volume wiring look correct (consider non‑optional CA secret).

The new ARGOCD_AGENT_REDIS_TLS_* env vars and the redis-tls-ca mount/volume are consistent with the ConfigMap keys and documented CA path (/app/config/redis-tls/ca.crt), so the wiring itself looks good.

One behavioral nuance: the redis-tls-ca secret is marked optional: true, so the pod will still start if the TLS secret is missing and the agent will only fail later at runtime. If you’d prefer a fail‑fast configuration error when TLS is enabled but the CA secret is absent, you could drop optional: true on that secret.

Also applies to: 232-236, 257-266

hack/dev-env/start-agent-managed.sh (1)

37-62: Managed-agent Redis TLS handling is correct; consider de-duping with autonomous script.

The managed-agent script’s Redis TLS and address handling matches the autonomous script and the Procfile port‑forwards (localhost:6381), so behavior looks correct.

If these scripts evolve further, you might consider factoring the shared Redis TLS/address logic into a small helper (or sourcing a common start-agent-common.sh) to avoid future drift between managed and autonomous modes.

Also applies to: 66-67

test/run-e2e.sh (1)

24-70: Redis TLS precheck is solid; consider stricter context detection

The TLS gating logic (cert presence + per-context secret and tls-port check) looks good and aligns with the “TLS-only E2E” objective.

Minor robustness improvement: in the loop, kubectl config get-contexts | grep -q "${CONTEXT}" will succeed on substring matches and silently skip missing contexts. If a vcluster context is missing, it might be clearer to fail early.

You could tighten this and fail when a context is absent:

-for CONTEXT in vcluster-control-plane vcluster-agent-autonomous vcluster-agent-managed; do
-    if kubectl config get-contexts | grep -q "${CONTEXT}"; then
+for CONTEXT in vcluster-control-plane vcluster-agent-autonomous vcluster-agent-managed; do
+    if kubectl config get-contexts | awk 'NR>1 { print $2 }' | grep -qx "${CONTEXT}"; then
         echo "Checking Redis TLS in ${CONTEXT}..."
         # ...
         echo "✓ Redis TLS configured in ${CONTEXT}"
-    fi
+    else
+        echo "ERROR: kube context ${CONTEXT} is not configured; missing setup?" >&2
+        exit 1
+    fi
 done
hack/dev-env/start-principal.sh (1)

23-43: Fix trap quoting to avoid ShellCheck SC2064 warning

The port-forward logic looks good, but ShellCheck is right that the trap should avoid expanding $PORT_FORWARD_PID at definition time.

You can keep behavior and silence SC2064 by using single quotes and quoting the variable inside:

-       # Cleanup function to kill port-forward on exit
-       trap "kill $PORT_FORWARD_PID 2>/dev/null || true" EXIT
+       # Cleanup function to kill port-forward on exit
+       trap 'kill "$PORT_FORWARD_PID" 2>/dev/null || true' EXIT

This expands PORT_FORWARD_PID when the trap runs, not when it’s set, and follows common shell best practices.

test/e2e/fixture/cluster.go (1)

40-50: Redis TLS wiring for E2E cache clients is consistent with the TLS-only test requirement

The additions to ClusterDetails and getCacheInstance correctly gate TLS usage on the new *RedisTLSEnabled flags and build a tls.Config with MinVersion: tls.VersionTLS12. Given this file is strictly under test/e2e, using InsecureSkipVerify: true here is an acceptable trade-off to keep tests working against dynamically addressed Redis endpoints while still enforcing encrypted transport.

The updated getManagedAgentRedisConfig / getPrincipalRedisConfig logic to:

  • Prefer LoadBalancer ingress IP/hostname,
  • Fall back to spec.loadBalancerIP, then ClusterIP,
  • And unconditionally set the *RedisTLSEnabled flags to true,

matches the PR goal that Redis-with-TLS is now the “happy path” for tests and will loudly fail if TLS isn’t actually configured.

If you later stabilize the Redis hostnames to always match certificate SANs, you might consider tightening this further by dropping InsecureSkipVerify and wiring in a RootCAs pool from the test CA, but that’s an optional hardening step and not required for this PR.

Also applies to: 165-195, 225-333

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3df4a33 and 211af17.

📒 Files selected for processing (37)
  • Makefile (1 hunks)
  • agent/agent.go (2 hunks)
  • agent/inbound_redis.go (3 hunks)
  • agent/options.go (1 hunks)
  • agent/outbound_test.go (1 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (3 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (2 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/README.md (3 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (3 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.yaml (1 hunks)
  • install/kubernetes/agent/agent-deployment.yaml (3 hunks)
  • install/kubernetes/agent/agent-params-cm.yaml (1 hunks)
  • install/kubernetes/principal/principal-deployment.yaml (3 hunks)
  • install/kubernetes/principal/principal-params-cm.yaml (1 hunks)
  • internal/argocd/cluster/cluster.go (2 hunks)
  • internal/argocd/cluster/cluster_test.go (3 hunks)
  • internal/argocd/cluster/informer_test.go (6 hunks)
  • internal/argocd/cluster/manager.go (3 hunks)
  • internal/argocd/cluster/manager_test.go (3 hunks)
  • principal/options.go (2 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/server.go (3 hunks)
  • test/e2e/README.md (2 hunks)
  • test/e2e/fixture/cluster.go (7 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (16)
  • install/kubernetes/agent/agent-params-cm.yaml
  • internal/argocd/cluster/cluster_test.go
  • hack/dev-env/configure-argocd-redis-tls.sh
  • install/helm-repo/argocd-agent-agent/README.md
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • agent/inbound_redis.go
  • internal/argocd/cluster/cluster.go
  • cmd/argocd-agent/agent.go
  • principal/options.go
  • install/kubernetes/principal/principal-params-cm.yaml
  • install/helm-repo/argocd-agent-agent/values.yaml
  • install/kubernetes/agent/agent-deployment.yaml
  • agent/outbound_test.go
  • install/kubernetes/principal/principal-deployment.yaml
  • docs/getting-started/kubernetes/index.md
  • Makefile
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/run-e2e.sh
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml
  • hack/dev-env/Procfile.e2e
  • test/e2e/README.md
  • hack/dev-env/start-e2e.sh
🧬 Code graph analysis (7)
internal/argocd/cluster/manager_test.go (1)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
principal/server.go (2)
internal/logging/logging.go (3)
  • Warn (300-302)
  • Debug (290-292)
  • Info (295-297)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
internal/argocd/cluster/informer_test.go (2)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
test/fake/kube/kubernetes.go (1)
  • NewFakeKubeClient (31-44)
agent/options.go (1)
agent/agent.go (2)
  • AgentOption (136-136)
  • Agent (65-117)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (169-185)
internal/argocd/cluster/manager.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (169-185)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

31-31: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Shellcheck (0.11.0)
hack/dev-env/start-principal.sh

[warning] 42-42: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)

hack/dev-env/start-e2e.sh

[warning] 58-58: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Lint Go code
  • GitHub Check: Run unit tests
  • GitHub Check: Build & cache Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (18)
cmd/argocd-agent/principal.go (1)

90-97: Redis TLS option variables are well-scoped and consistent.

Names and grouping align with existing Redis options and distinguish clearly between server and upstream TLS concerns; no issues from a readability or correctness standpoint.

agent/options.go (2)

111-117: LGTM!

The WithRedisTLSEnabled option follows the established pattern for agent options and correctly enables TLS for Redis connections.


119-125: LGTM!

The CA path setter is correctly implemented. File validation occurs later during runtime initialization, which is an appropriate design choice.

principal/server.go (1)

352-371: LGTM!

The Redis proxy TLS configuration correctly handles both server-side TLS (for incoming Argo CD connections) and upstream TLS (for connections to principal's Redis), with appropriate support for path-based and in-memory certificates.

internal/argocd/cluster/manager.go (1)

71-81: LGTM!

The TLS configuration parameter is correctly added to NewManager and propagated to NewClusterCacheInstance. The API design allows nil to be passed for non-TLS scenarios, maintaining backward compatibility.

principal/redisproxy/redisproxy.go (4)

65-76: LGTM!

The TLS configuration fields are well-structured, with clear separation between server-side TLS (for incoming Argo CD connections) and upstream TLS (for connections to principal's Redis). Supporting both path-based and in-memory certificates provides good flexibility.


98-128: LGTM!

The TLS setter methods follow a clean, straightforward pattern for configuration. Validation occurs later during TLS config creation or connection establishment, which is an appropriate design choice.


173-194: LGTM!

The TLS listener setup is correctly implemented with appropriate error handling and clear logging to distinguish between TLS and non-TLS modes.


863-905: Verify the TLS wrapping condition.

Line 864 checks if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure). This means TLS wrapping only occurs if both tlsEnabled is true and at least one upstream TLS option is set.

Is this the intended behavior? Should upstream TLS be enabled whenever tlsEnabled is true, even without CA configuration? The current logic might skip TLS wrapping if tlsEnabled is true but none of the CA/insecure options are set.

Please verify the intended behavior:

  • Should TLS be used for upstream Redis whenever tlsEnabled is true?
  • Or should it only use TLS when CA/insecure options are explicitly configured?

If upstream TLS should always be enabled when tlsEnabled is true, consider:

-	if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
+	if rp.tlsEnabled {
 		tlsConfig := &tls.Config{
 			MinVersion: tls.VersionTLS12,
 		}
internal/argocd/cluster/manager_test.go (1)

57-57: LGTM!

The test updates correctly pass nil for the TLS configuration parameter, maintaining test coverage for non-TLS scenarios. The API change is properly reflected in all test cases.

Also applies to: 78-78

internal/argocd/cluster/informer_test.go (1)

19-19: LGTM!

All test cases are consistently updated to pass the new TLS configuration parameter. The nil values are appropriate for testing non-TLS scenarios, and the tests continue to validate the core cluster informer functionality.

Also applies to: 33-33, 50-50, 87-87, 115-115

install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1)

93-101: Redis TLS agent params wiring looks consistent.

The new agent.redis.tls.* keys and defaults line up with the Helm values and deployment env wiring; I don’t see issues here.

hack/dev-env/start-agent-autonomous.sh (1)

37-62: Dev Redis TLS/address handling is coherent with the new TLS setup.

Detecting the local CA, constructing --redis-tls-* flags, and defaulting to localhost:6382 (with matching port‑forward guidance) all look correct and align with the cert generation script and Procfile wiring.

Also applies to: 66-67

test/e2e/README.md (1)

27-66: E2E Redis TLS documentation aligns well with the new tooling.

The new note and “Redis TLS” section clearly describe how TLS is auto‑configured for make setup-e2e, how to regenerate/reconfigure certs, and why InsecureSkipVerify is used in tests. Wording and commands match the added dev‑env scripts.

hack/dev-env/Procfile.e2e (1)

1-6: Procfile wiring for Redis port‑forwards and agents looks consistent.

Mapping Redis to 6380/6381/6382 and then starting principal/agents (with the agents honoring ARGOCD_AGENT_REDIS_ADDRESS) lines up with the start‑agent scripts and the TLS/localhost assumptions; this looks good.

hack/dev-env/start-principal.sh (1)

56-74: TLS enablement via REDIS_TLS_ARGS is consistent with the new Redis TLS flow

The detection of Redis TLS certs under ${SCRIPTPATH}/creds/redis-tls and conditional population of REDIS_TLS_ARGS is clear and matches the new --redis-* flags exposed by the principal. Passing $REDIS_TLS_ARGS into the go run ... principal invocation is a straightforward way to keep TLS optional but default-on when certs are present.

No functional issues spotted here.

hack/dev-env/start-e2e.sh (1)

50-55: Localhost Redis addresses fit the TLS/port-forward design

Hard-coding the Redis addresses to localhost:6380/6381/6382 and delegating port-forwards to goreman/Procfile.e2e matches the certificate SAN strategy and simplifies TLS validation for E2E.

This looks consistent with the new principal dev script and the Redis TLS configuration flow.

hack/dev-env/configure-redis-tls.sh (1)

47-55: Redis TLS configuration script is thorough and idempotent

The script’s overall flow—mapping context to cert prefix, preserving/restoring the initial kube context via a trap, creating/updating the argocd-redis-tls secret with --dry-run=client | kubectl apply, and patching volumes, volumeMounts, and args only when needed—looks well thought out and idempotent for the dev/e2e environment.

The JSON patches for both initial and append cases on volumes/volumeMounts and the rollout wait logic are appropriate for ensuring Redis ends up in TLS-only mode before proceeding.

Also applies to: 93-181, 188-204

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/clusterinfo_test.go (1)

150-156: Inconsistent timeout: this assertion still uses 30s/1s.

The final re-connected check for the autonomous agent uses 30*time.Second, 1*time.Second, while all other similar assertions in this file were updated to 60*time.Second, 2*time.Second. This inconsistency could lead to flaky tests under the same port-forward latency conditions mentioned in the comments.

Apply this diff for consistency:

 	requires.Eventually(func() bool {
 		return fixture.HasConnectionStatus(fixture.AgentAutonomousName, appv1.ConnectionState{
 			Status:     appv1.ConnectionStatusSuccessful,
 			Message:    fmt.Sprintf(message, fixture.AgentAutonomousName, "connected"),
 			ModifiedAt: &metav1.Time{Time: time.Now()},
 		}, clusterDetail)
-	}, 30*time.Second, 1*time.Second)
+	}, 60*time.Second, 2*time.Second)
🧹 Nitpick comments (8)
principal/resource.go (1)

39-39: Provide justification for the 3x timeout increase.

The timeout has been increased from 10 to 30 seconds without explanation. While this may be necessary to accommodate TLS handshake and encryption overhead introduced by this PR, the lack of documentation makes it unclear whether this change masks underlying performance issues or is genuinely required.

Please clarify why this increase is needed and consider documenting it in a comment. Additionally, as noted in the TODO above, making this timeout configurable would allow better tuning for different deployment scenarios, especially given the variance in TLS overhead across environments.

principal/listen.go (1)

174-196: Helpful logging additions for debugging the startup flow.

The logging statements provide useful visibility into the WebSocket enablement path and server startup sequence, which will help with troubleshooting.

Minor formatting suggestion: consider removing the emoji (line 174) and leading spaces in log messages (lines 176, 196) for consistency with standard structured logging conventions.

Apply this diff for consistent log message formatting:

-	log().WithField("enableWebSocket", s.enableWebSocket).Info("🔧 Checking if WebSocket is enabled")
+	log().WithField("enableWebSocket", s.enableWebSocket).Info("Checking if WebSocket is enabled")
 	if s.enableWebSocket {
-		log().Info(" WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
+		log().Info("WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
 		opts := []grpchttp1server.Option{grpchttp1server.PreferGRPCWeb(true)}
 
 		downgradingHandler := grpchttp1server.CreateDowngradingHandler(s.grpcServer, http.NotFoundHandler(), opts...)
 		go func() {
 			log().Info("Starting gRPC server.Serve() - server is now accepting connections")
 			err = s.grpcServer.Serve(s.listener.l)
-			log().WithError(err).Warn(" gRPC server.Serve() exited")
+			log().WithError(err).Warn("gRPC server.Serve() exited")
 			errch <- err
 		}()
hack/dev-env/start-agent-managed.sh (1)

63-74: Consider restricting permissions on extracted TLS credentials.

The extracted TLS private key is written to /tmp/agent-managed-tls.key with default permissions, potentially making it readable by other users on shared systems.

Apply restrictive permissions before writing sensitive files:

 # Extract mTLS client certificates and CA from Kubernetes secret for agent authentication
 echo "Extracting mTLS client certificates and CA from Kubernetes..."
 TLS_CERT_PATH="/tmp/agent-managed-tls.crt"
 TLS_KEY_PATH="/tmp/agent-managed-tls.key"
 ROOT_CA_PATH="/tmp/agent-managed-ca.crt"
+
+# Set restrictive permissions for private key
+umask 077
 kubectl --context vcluster-agent-managed -n argocd get secret argocd-agent-client-tls \
   -o jsonpath='{.data.tls\.crt}' | base64 -d > "${TLS_CERT_PATH}"
 kubectl --context vcluster-agent-managed -n argocd get secret argocd-agent-client-tls \
   -o jsonpath='{.data.tls\.key}' | base64 -d > "${TLS_KEY_PATH}"
 kubectl --context vcluster-agent-managed -n argocd get secret argocd-agent-ca \
   -o jsonpath='{.data.tls\.crt}' | base64 -d > "${ROOT_CA_PATH}"
+# Restore default umask
+umask 022
 echo " mTLS client certificates and CA extracted"
hack/dev-env/start-principal.sh (1)

41-42: Use single quotes in trap to defer variable expansion (shellcheck SC2064).

While the current code works because $PORT_FORWARD_PID is set before the trap, using single quotes is the conventional and safer pattern.

-       trap "kill $PORT_FORWARD_PID 2>/dev/null || true" EXIT
+       trap 'kill $PORT_FORWARD_PID 2>/dev/null || true' EXIT
docs/configuration/redis-tls.md (2)

487-494: Add language specifier to fenced code block.

Per markdownlint, fenced code blocks should have a language specified. Since this is script output, use text or console.

-```
+```text
 Generating Redis TLS certificates in hack/dev-env/creds/redis-tls...

498-513: Add language specifiers to remaining script output blocks.

Same issue as above - these console output examples should have a language specifier for markdownlint compliance.

-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ║  Configure Redis Deployment for TLS                     ║
-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ║  Configure Argo CD Components for Redis TLS             ║

Also applies to: 516-532

test/e2e/redis_proxy_test.go (1)

120-123: Hard-coded sleep for SSE stream stabilization.

While the 5-second sleep addresses the race condition mentioned in the comment, it's a fixed delay that may be insufficient under heavy load or excessive in fast environments. Consider using a more deterministic approach if flakiness persists.

An alternative would be to wait for an initial SSE message (e.g., the current resource tree state) before proceeding, though the current approach is pragmatic for E2E tests.

test/e2e/fixture/cluster.go (1)

259-267: Cleanup doesn't explicitly close Redis connections.

CleanupRedisCachedClients clears the cache map but doesn't explicitly close the underlying Redis connections. While Go's garbage collector will eventually clean them up, explicit closure ensures immediate resource release and avoids connection pool exhaustion in long test runs.

Consider closing the Redis clients explicitly. The appstatecache.Cache wraps a cacheutil.Cache which has a redisClient. You may need to expose or track the underlying redis.Client to call Close():

// If the underlying redis.Client is accessible, close it explicitly
// For now, this may require refactoring getCacheInstance to return both
// the cache and the client, or using a wrapper struct

If the current approach works reliably in tests without connection issues, this can be deferred.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 211af17 and 6247404.

📒 Files selected for processing (26)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/auth.go (1 hunks)
  • principal/listen.go (3 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • Makefile
  • hack/dev-env/start-e2e.sh
  • hack/dev-env/configure-argocd-redis-tls.sh
  • hack/dev-env/start-agent-autonomous.sh
  • install/helm-repo/argocd-agent-agent/values.schema.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • hack/dev-env/start-agent-managed.sh
  • test/e2e/rp_test.go
  • test/run-e2e.sh
  • hack/dev-env/Procfile.e2e
  • test/e2e/README.md
🧬 Code graph analysis (6)
principal/auth.go (1)
internal/logging/logging.go (2)
  • Trace (285-287)
  • Warn (300-302)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
  • GetArgoCDServerEndpoint (315-337)
  • GetInitialAdminSecret (302-313)
  • NewArgoClient (52-66)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
principal/tracker/tracking.go (2)
internal/event/event.go (1)
  • Event (112-115)
internal/logging/logfields/logfields.go (1)
  • Event (34-34)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (42-56)
  • AgentManagedName (37-37)
  • AgentClusterServerURL (39-39)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

487-487: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


498-498: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


516-516: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Shellcheck (0.11.0)
hack/dev-env/start-principal.sh

[warning] 42-42: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)

🔇 Additional comments (35)
test/e2e/README.md (1)

21-107: Unable to verify referenced helper scripts and documentation due to repository access limitations.

The documentation structure and Redis TLS guidance are well-presented, but the verification of referenced files could not be completed in this environment. The following items require manual verification by the developer or CI system:

  • hack/dev-env/gen-redis-tls-certs.sh (executable)
  • hack/dev-env/configure-redis-tls.sh (executable)
  • hack/dev-env/configure-argocd-redis-tls.sh (executable)
  • hack/dev-env/reverse-tunnel/setup.sh (executable)
  • hack/dev-env/reverse-tunnel/README.md (exists)

Confirm these files exist and are properly executable before merging.

principal/listen.go (1)

224-230: LGTM! Clear service registration logging.

The logging statements provide good visibility into the gRPC service registration flow, making it easier to debug startup issues and verify that all services are registered successfully.

principal/tracker/tracking.go (1)

75-78: Unable to verify review comment due to repository access limitation.

The review comment cannot be verified at this time because the repository is inaccessible. To properly assess the concerns about the buffered channel change at lines 75-78, the following would need to be verified:

  1. Whether a real deadlock scenario exists between processRedisEventResponse and sendSynchronousRedisMessageToAgent
  2. Whether this change is directly related to the Redis TLS PR objective or should be split into a separate PR
  3. Whether a buffer capacity of 1 is sufficient for the actual send/receive patterns in the code
  4. Whether the buffering addresses the root cause or masks a deeper synchronization issue

Without access to the codebase to examine the sender/receiver implementations and their usage patterns, the original review comment's concerns remain unresolved.

principal/auth.go (1)

154-164: LGTM - Debug logging additions for auth interceptor.

The trace-level logging provides useful debugging information for authentication flow. The emoji prefixes add visual distinction in logs, which can be helpful during debugging sessions.

Consider documenting the emoji convention if used elsewhere, or removing them to maintain consistent log formatting across the codebase.

test/e2e/fixture/argoclient.go (1)

316-334: LGTM - Environment variable override for ArgoCD server endpoint.

The early return when ARGOCD_SERVER_ADDRESS is set provides useful flexibility for E2E tests, particularly when dynamic LoadBalancer addresses don't match certificate SANs. The fallback to K8s service lookup is preserved correctly.

test/e2e/fixture/fixture.go (3)

109-154: Increased polling timeouts for deletion operations.

The timeout increase from 60 to 120 seconds accommodates potential TLS handshake overhead and certificate validation delays during E2E tests.


229-241: Improved cleanup resilience with non-fatal warnings.

Using warnings instead of returning errors prevents cleanup failures from cascading and failing entire test suites. The DeepCopy() usage correctly avoids mutating loop variables when adjusting namespace/name.


457-470: Graceful degradation when Redis is unavailable.

The cleanup now logs a warning and continues if Redis is unavailable (e.g., port-forward died), rather than failing the cleanup. This improves test reliability.

hack/dev-env/start-agent-managed.sh (1)

37-61: LGTM - Redis TLS detection and address configuration.

The conditional TLS enablement based on certificate presence is a clean pattern. The port-forward guidance message helps developers understand the setup requirements.

hack/dev-env/start-principal.sh (3)

23-43: LGTM - Port-forward setup with cleanup trap.

The port-forward establishment with PID tracking, validation, and cleanup trap is a robust pattern for local development. The 2-second wait allows time for the connection to stabilize.


58-76: LGTM - Redis TLS certificate detection.

Properly checks for all three required files (cert, key, CA) before enabling TLS. The descriptive comments about certificate SANs help future maintainers understand the setup.


84-86: Undefined variable MTLS_ARGS referenced.

$MTLS_ARGS is used but not defined in this script. If it's intentionally optional (set externally), this is fine; otherwise, it may cause unintended behavior.

internal/argocd/cluster/cluster.go (2)

135-142: LGTM - Defensive initialization of ConnectionState.

Good defensive programming: initializing ConnectionState when the agent first connects prevents potential nil pointer issues and provides meaningful status information.


176-191: TLS configuration support added to Redis cache initialization.

The signature change to accept *tls.Config enables TLS for Redis connections. Passing nil for tlsConfig maintains backward compatibility (no TLS).

Note: Verification of caller updates could not be completed due to repository access limitations. Manual verification is required to ensure all calls to NewClusterCacheInstance have been updated to pass the tlsConfig parameter.

hack/dev-env/Procfile.e2e (1)

1-7: Port-forward configuration for TLS-enabled Redis connections.

The port-forward setup allows TLS certificate validation to work correctly since localhost is included in certificate SANs. The staggered sleep delays (3s for principal, 5s for agents) ensure port-forwards are established before components start.

Verify that MANAGED_AGENT_REDIS_ADDR and AUTONOMOUS_AGENT_REDIS_ADDR environment variables are defined in your development environment setup or sourced before running this Procfile, as they are required by lines 6-7 but not defined locally in this file.

agent/agent.go (2)

323-345: TLS configuration implementation looks correct.

The TLS configuration for the cluster cache Redis client is well-structured:

  • Uses minimum TLS 1.2 (appropriate security baseline)
  • Properly handles insecure mode with warning log (line 330) - this addresses the previous review feedback
  • Correctly loads and parses CA certificate with appropriate error handling

One minor observation: The error message at line 339 could be more consistent with the message at line 335 by using %w for error wrapping.

-			return nil, fmt.Errorf("failed to parse CA certificate for cluster cache from %s", a.redisProxyMsgHandler.redisTLSCAPath)
+			return nil, fmt.Errorf("failed to parse CA certificate for cluster cache from %s: no valid certificates found", a.redisProxyMsgHandler.redisTLSCAPath)

445-460: Good improvement: immediate cluster cache info update on startup.

The refactored goroutine correctly:

  • Sends an initial update immediately on startup (line 448) rather than waiting for the first ticker interval
  • Properly defers ticker.Stop() for cleanup
  • Handles context cancellation appropriately

This ensures the principal receives cluster cache info promptly after agent startup.

cmd/argocd-agent/principal.go (3)

258-288: Redis TLS configuration logic is well-structured.

The implementation correctly:

  • Validates mutual exclusivity between --redis-upstream-tls-insecure and --redis-upstream-ca-path (lines 273-275)
  • Uses the secret as the default fallback when neither insecure nor CA path is specified (lines 285-286)
  • Logs appropriate messages for each configuration path

The previous review comment suggested validating all three modes (insecure, CA path, CA secret) as mutually exclusive. However, the current behavior is actually reasonable: the secret serves as a default when no explicit configuration is provided, which is a common pattern. If you prefer explicit mutual exclusivity for all three, let me know.


419-441: CLI flags are well-defined with consistent naming.

The Redis TLS flags follow the established patterns:

  • Consistent ARGOCD_PRINCIPAL_REDIS_* environment variable naming
  • Reasonable defaults (false for CLI, argocd-redis-tls for secret names)
  • Clear descriptions for each flag

471-471: Timeout increase is reasonable for production reliability.

The 30-second timeout for secret retrieval allows for network latency and Kubernetes API server load, which is appropriate for production environments.

test/e2e/clusterinfo_test.go (1)

108-115: Timeout increases are appropriate and well-documented.

The increased timeouts (60s/2s) with explanatory comments appropriately account for port-forward latency in long test runs. This should improve test stability in CI environments.

Also applies to: 123-129

test/e2e/rp_test.go (3)

162-169: Good refactoring: centralized endpoint and secret retrieval.

Using fixture.GetArgoCDServerEndpoint and fixture.GetInitialAdminSecret helpers:

  • Reduces code duplication across tests
  • Centralizes the logic for environment variable checks and Kubernetes fallback
  • Makes tests more maintainable when endpoint retrieval logic changes

The helper at fixture/argoclient.go:314-336 properly checks ARGOCD_SERVER_ADDRESS environment variable first, then falls back to Kubernetes service lookup.


295-306: Consistent refactoring across test functions.

The same fixture helper pattern is applied correctly here, maintaining consistency with Test_ResourceProxy_Argo.


509-510: Minor formatting change, no functional impact.

docs/configuration/redis-tls.md (2)

1-17: Comprehensive and well-structured documentation.

This documentation thoroughly covers:

  • Architecture overview with clear diagrams
  • Quick start guides for different environments (dev, E2E, production)
  • Detailed configuration options for both principal and agent
  • Troubleshooting section with practical solutions
  • Security best practices

The table of contents and section organization make it easy to navigate.


736-755: Security best practices are appropriately scoped.

The security recommendations cover essential practices:

  • Strong key sizes (4096-bit RSA)
  • Appropriate certificate validity (1 year)
  • Private key protection with RBAC
  • Certificate rotation planning
  • Clear warning against insecure options in production
hack/dev-env/gen-redis-tls-certs.sh (1)

1-10: Well-structured certificate generation script.

The script properly handles idempotency with file existence checks, uses set -e for error handling, and generates appropriate SANs for each component. The cleanup of temporary files (CSR, EXT, SRL) is good practice.

test/run-e2e.sh (2)

24-45: Good enforcement of TLS prerequisites.

The script properly validates TLS certificates exist before running tests and provides clear, actionable error messages guiding users to run the setup scripts. This aligns with the PR objective of making Redis TLS mandatory for E2E tests.


82-115: Environment variable exports are macOS-only; verify Linux CI Redis connectivity strategy.

The Redis address environment variables (ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS, MANAGED_AGENT_REDIS_ADDR, AUTONOMOUS_AGENT_REDIS_ADDR, ARGOCD_SERVER_ADDRESS) are only set when running on macOS. Confirm that Linux CI environments have a strategy for accessing Redis services—either through MetalLB LoadBalancer service IPs or by setting these environment variables explicitly for Linux as well.

test/e2e/redis_proxy_test.go (2)

210-237: Good retry handling for transient Redis connection issues.

Wrapping the ResourceTree call in an Eventually block with explicit error logging handles the EOF errors mentioned in the comment. This is a robust pattern for E2E tests dealing with TLS-enabled Redis connections that may experience transient failures.


642-653: Appropriate SSE transport configuration.

The Timeout: 0 settings are correct for SSE streams which are long-lived connections. The IdleConnTimeout: 300s helps maintain connections during test execution. The InsecureSkipVerify: true is documented in the PR as intentional for E2E tests with dynamic LoadBalancer addresses.

hack/dev-env/configure-redis-tls.sh (2)

81-121: Good practice: scaling down components before TLS transition.

Scaling down ArgoCD components before enabling Redis TLS prevents SSL handshake errors from pods attempting non-TLS connections. Storing replica counts in a ConfigMap enables proper restoration by the companion script.


199-215: Correct Redis TLS configuration.

The Redis args properly configure TLS-only mode (--port 0 disables plain TCP, --tls-port 6379 enables TLS). The --tls-auth-clients no setting means clients authenticate via password only, not mutual TLS, which is appropriate for this use case.

test/e2e/fixture/cluster.go (2)

182-201: TLS configuration with InsecureSkipVerify is appropriate for E2E tests.

Using InsecureSkipVerify: true for E2E tests is explicitly documented in the PR description as a workaround for dynamic LoadBalancer addresses that may not match certificate SANs. The MinVersion: tls.VersionTLS12 ensures a reasonable security baseline.


319-327: Good: TLS enabled by default with environment override support.

Enabling TLS by default (ManagedAgentRedisTLSEnabled = true) aligns with the PR objective. The environment variable override (MANAGED_AGENT_REDIS_ADDR) supports local development with port-forwards, which is consistent with the macOS handling in run-e2e.sh.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch 2 times, most recently from cdfef36 to 402f482 Compare December 4, 2025 12:13
Signed-off-by: Rizwana777 <[email protected]>
@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch 2 times, most recently from 40d7b3c to 8b47b98 Compare December 4, 2025 13:18
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/e2e/rp_test.go (1)

162-169: Good refactor using fixture helpers.

Replacing direct Kubernetes lookups with fixture.GetArgoCDServerEndpoint and fixture.GetInitialAdminSecret centralizes configuration retrieval and supports environment variable overrides (e.g., ARGOCD_SERVER_ADDRESS), improving test flexibility.

♻️ Duplicate comments (6)
docs/configuration/redis-tls.md (3)

487-495: Add language identifier to fenced code block.

This output example block should have a language identifier. Based on markdownlint feedback and past review comments, apply this fix:

-```
+```text
 Generating Redis TLS certificates in hack/dev-env/creds/redis-tls...
 Generating CA key and certificate...
 ...
-```
+```

This resolves the MD040 linting error.

Based on learnings, past review flagged this same issue which was marked as addressed in commit 6247404, but the static analysis tool still reports it.


498-514: Add language identifier to fenced code block.

This output example block should have a language identifier:

-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ║  Configure Redis Deployment for TLS                     ║
 ...
-```
+```

Based on learnings, this was flagged in a past review and marked as addressed, but the linter still reports it.


516-533: Add language identifier to fenced code block.

This output example block should have a language identifier:

-```
+```text
 ╔══════════════════════════════════════════════════════════╗
 ║  Configure Argo CD Components for Redis TLS             ║
 ...
-```
+```

Based on learnings, this was flagged in a past review and marked as addressed, but the linter still reports it.

hack/dev-env/gen-redis-tls-certs.sh (1)

68-86: Guard against empty LOCAL_IP when emitting SAN IP entry.

If LOCAL_IP ends up empty, the extension file will contain IP.3 = , which can cause OpenSSL failures or produce an invalid certificate; only add IP.3 when a non-empty IP was detected.

-    cat > "${CREDS_DIR}/redis-proxy.ext" <<EOF
-subjectAltName = @alt_names
-[alt_names]
-DNS.1 = argocd-redis-proxy
-DNS.2 = argocd-redis-proxy.argocd
-DNS.3 = argocd-redis-proxy.argocd.svc
-DNS.4 = argocd-redis-proxy.argocd.svc.cluster.local
-DNS.5 = localhost
-DNS.6 = rathole-container-internal
-IP.1 = 127.0.0.1
-IP.2 = 127.0.0.2
-IP.3 = ${LOCAL_IP}
-EOF
+    cat > "${CREDS_DIR}/redis-proxy.ext" <<EOF
+subjectAltName = @alt_names
+[alt_names]
+DNS.1 = argocd-redis-proxy
+DNS.2 = argocd-redis-proxy.argocd
+DNS.3 = argocd-redis-proxy.argocd.svc
+DNS.4 = argocd-redis-proxy.argocd.svc.cluster.local
+DNS.5 = localhost
+DNS.6 = rathole-container-internal
+IP.1 = 127.0.0.1
+IP.2 = 127.0.0.2
+EOF
+
+    if [ -n "${LOCAL_IP}" ]; then
+        echo "IP.3 = ${LOCAL_IP}" >> "${CREDS_DIR}/redis-proxy.ext"
+    fi
hack/dev-env/configure-redis-tls.sh (1)

61-66: CA certificate validation still missing.

The validation checks for server certificate and key but not for ca.crt, which is used at Line 128 when creating the secret. This is the same issue flagged in previous reviews.

Apply this diff to add CA validation:

 # Check certificates exist
-if [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.crt" ] || [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.key" ]; then
-    echo "Error: Redis TLS certificate or key not found for ${REDIS_CERT_PREFIX}"
+if [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.crt" ] || [ ! -f "creds/redis-tls/${REDIS_CERT_PREFIX}.key" ] || [ ! -f "creds/redis-tls/ca.crt" ]; then
+    echo "Error: Redis TLS certificates not found (${REDIS_CERT_PREFIX}.crt, ${REDIS_CERT_PREFIX}.key, or ca.crt)"
     echo "Please run: ./gen-redis-tls-certs.sh"
     exit 1
 fi
cmd/argocd-agent/principal.go (1)

272-288: Incomplete mutual exclusivity validation for upstream TLS modes.

Lines 273-275 validate that --redis-upstream-tls-insecure and --redis-upstream-ca-path are mutually exclusive, but don't check whether --redis-upstream-ca-path and --redis-upstream-ca-secret-name are both specified. If both are provided, Line 281 silently takes precedence, ignoring the secret configuration without warning.

Consider validating all three modes for mutual exclusivity:

+				// Validate upstream TLS configuration - only one mode allowed
+				modesSet := 0
+				if redisUpstreamTLSInsecure {
+					modesSet++
+				}
+				if redisUpstreamTLSCAPath != "" {
+					modesSet++
+				}
+				if redisUpstreamTLSCASecretName != "" {
+					modesSet++
+				}
+				if modesSet > 1 {
+					cmdutil.Fatal("Only one Redis upstream TLS mode can be specified: --redis-upstream-tls-insecure, --redis-upstream-ca-path, or --redis-upstream-ca-secret-name")
+				}
+
-				// Validate upstream TLS configuration - insecure and CA path are mutually exclusive
-				if redisUpstreamTLSInsecure && redisUpstreamTLSCAPath != "" {
-					cmdutil.Fatal("Cannot specify both --redis-upstream-tls-insecure and --redis-upstream-ca-path")
-				}
-
 				// Redis upstream TLS (for connections to principal's argocd-redis)
 				if redisUpstreamTLSInsecure {
🧹 Nitpick comments (4)
principal/resource.go (1)

39-39: Timeout extension appropriate for TLS operations.

The increase from 10 to 30 seconds accommodates longer TLS handshake and certificate loading operations introduced by Redis TLS support across the system.

Consider the existing TODO at line 38—making this timeout configurable would provide better flexibility for different deployment scenarios.

principal/redisproxy/redisproxy.go (1)

131-165: Unused parse result at line 154.

Line 154 parses the private key but discards the result. If this is validation-only, the error check suffices. Otherwise, consider removing the parse call.

Apply this diff to remove the unused parse:

 		cert.PrivateKey = rp.tlsServerKey
 		cert.Leaf = rp.tlsServerCert
 
-		// Try to parse the key
-		if _, err := x509.ParsePKCS8PrivateKey(keyDER); err != nil {
-			return nil, fmt.Errorf("failed to parse private key: %w", err)
-		}
 	} else {
 		return nil, fmt.Errorf("no TLS certificate configured")
 	}
hack/dev-env/start-principal.sh (1)

23-43: Minor: adjust trap quoting to satisfy ShellCheck SC2064.

The trap currently interpolates PORT_FORWARD_PID at definition time; switching to a single-quoted trap and quoting the variable inside avoids SC2064 and is the idiomatic form while preserving behavior.

-       # Cleanup function to kill port-forward on exit
-       trap "kill $PORT_FORWARD_PID 2>/dev/null || true" EXIT
+       # Cleanup function to kill port-forward on exit
+       trap 'kill "$PORT_FORWARD_PID" 2>/dev/null || true' EXIT

Also applies to: 58-76, 84-85

test/e2e/fixture/fixture.go (1)

109-155: Best-effort cleanup behavior change looks intentional; consider whether other paths should match.

Doubling the deletion wait loops, using DeepCopy for cross-namespace waits, and downgrading many application/AppProject cleanup failures to warnings will reduce e2e flakiness, but may leave residual resources when deletions keep failing; if you want fully consistent best-effort semantics, you might also convert the remaining repo/namespace cleanup errors to warnings, otherwise this mixed strategy seems reasonable for tests.

Also applies to: 159-171, 218-292, 294-357, 457-470

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6247404 and 8b47b98.

📒 Files selected for processing (47)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • agent/inbound_redis.go (3 hunks)
  • agent/options.go (1 hunks)
  • agent/outbound_test.go (1 hunks)
  • cmd/argocd-agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (2 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/README.md (3 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml (2 hunks)
  • install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • install/helm-repo/argocd-agent-agent/values.yaml (1 hunks)
  • install/kubernetes/agent/agent-deployment.yaml (3 hunks)
  • install/kubernetes/agent/agent-params-cm.yaml (1 hunks)
  • install/kubernetes/principal/principal-deployment.yaml (3 hunks)
  • install/kubernetes/principal/principal-params-cm.yaml (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • internal/argocd/cluster/cluster_test.go (3 hunks)
  • internal/argocd/cluster/informer_test.go (6 hunks)
  • internal/argocd/cluster/manager.go (3 hunks)
  • internal/argocd/cluster/manager_test.go (3 hunks)
  • principal/auth.go (1 hunks)
  • principal/listen.go (3 hunks)
  • principal/options.go (2 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/server.go (3 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/e2e/sync_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (15)
  • Makefile
  • principal/listen.go
  • install/helm-repo/argocd-agent-agent/templates/agent-deployment.yaml
  • internal/argocd/cluster/manager_test.go
  • install/kubernetes/principal/principal-deployment.yaml
  • principal/tracker/tracking.go
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • test/run-e2e.sh
  • install/kubernetes/principal/principal-params-cm.yaml
  • internal/argocd/cluster/informer_test.go
  • docs/getting-started/kubernetes/index.md
  • agent/outbound_test.go
  • test/e2e/fixture/argoclient.go
  • hack/dev-env/start-agent-autonomous.sh
  • cmd/argocd-agent/agent.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/e2e/rp_test.go
  • hack/dev-env/start-agent-managed.sh
  • install/kubernetes/agent/agent-deployment.yaml
  • install/kubernetes/agent/agent-params-cm.yaml
  • hack/dev-env/configure-argocd-redis-tls.sh
  • test/e2e/README.md
  • hack/dev-env/Procfile.e2e
  • install/helm-repo/argocd-agent-agent/values.yaml
  • hack/dev-env/start-e2e.sh
🧬 Code graph analysis (12)
internal/argocd/cluster/manager.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
  • GetArgoCDServerEndpoint (315-337)
  • GetInitialAdminSecret (302-313)
  • NewArgoClient (52-66)
principal/server.go (1)
internal/argocd/cluster/manager.go (1)
  • NewManager (71-119)
agent/inbound_redis.go (2)
internal/logging/logfields/logfields.go (1)
  • Config (127-127)
internal/logging/logging.go (1)
  • Warn (300-302)
agent/options.go (2)
principal/options.go (1)
  • WithRedisTLSEnabled (493-498)
agent/agent.go (2)
  • AgentOption (136-136)
  • Agent (65-117)
test/e2e/clusterinfo_test.go (1)
test/e2e/fixture/cluster.go (4)
  • HasConnectionStatus (60-74)
  • AgentManagedName (37-37)
  • ClusterDetails (42-56)
  • AgentAutonomousName (38-38)
internal/argocd/cluster/cluster_test.go (1)
test/fake/kube/kubernetes.go (1)
  • NewFakeKubeClient (31-44)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (42-56)
  • AgentManagedName (37-37)
  • AgentClusterServerURL (39-39)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
agent/agent.go (3)
internal/logging/logfields/logfields.go (1)
  • Config (127-127)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
cmd/argocd-agent/principal.go (3)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/options.go (6)
  • WithRedisTLSEnabled (493-498)
  • WithRedisServerTLSFromPath (501-507)
  • WithRedisServerTLSFromSecret (510-520)
  • WithRedisUpstreamTLSInsecure (543-548)
  • WithRedisUpstreamTLSCAFromFile (523-528)
  • WithRedisUpstreamTLSCAFromSecret (531-540)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

487-487: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


498-498: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


516-516: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Shellcheck (0.11.0)
hack/dev-env/start-principal.sh

[warning] 42-42: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Run unit tests
  • GitHub Check: Build & cache Go code
  • GitHub Check: Lint Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (29)
principal/auth.go (1)

154-165: Trace logging enhances auth flow observability.

The added trace logs with visual markers help debug authentication flow during development. Trace level is appropriate for detailed debugging without impacting production logs.

test/e2e/rp_test.go (1)

295-305: Consistent fixture helper usage.

The same fixture helper pattern applied here maintains consistency across test functions and eliminates code duplication.

principal/server.go (2)

352-371: Redis proxy TLS configuration properly structured.

The TLS configuration logic correctly prioritizes configuration sources (path-based vs. in-memory) for both server TLS (incoming connections) and upstream TLS (connections to Redis). The conditional structure ensures TLS is only configured when explicitly enabled.


400-426: Cluster manager TLS configuration with robust error handling.

The TLS configuration for the cluster manager includes:

  • Proper error handling for CA certificate file operations (lines 414-421)
  • Certificate parsing validation with clear error messages
  • Security warning for insecure mode (line 408)
  • Appropriate minimum TLS version (1.2)

The CA loading from file has good error propagation that will fail server startup if certificates are misconfigured.

internal/argocd/cluster/cluster.go (2)

176-185: TLS configuration properly integrated into Redis cache.

The updated signature accepts an optional TLS config and correctly wires it into Redis client options. The nil-safe design allows TLS to be disabled when not needed.


135-142: Good defensive initialization of connection state.

When ConnectionState doesn't exist (agent just connected), this code properly initializes it with a Successful status and timestamp. This prevents gaps in connection tracking during cluster cache stats updates.

hack/dev-env/start-e2e.sh (1)

50-59: Static localhost addresses simplify TLS-enabled E2E setup.

Replacing dynamic LoadBalancer IP lookups with static localhost addresses (backed by port-forwards managed by Goreman) ensures TLS certificate validation works correctly, as localhost is included in the certificate SANs.

The separation of REDIS_PASSWORD assignment and export (lines 58-59) correctly addresses the previous shellcheck SC2155 warning, allowing proper error handling if the kubectl command fails.

test/e2e/README.md (1)

21-108: Comprehensive E2E workflow documentation with TLS guidance.

The expanded documentation clearly describes:

  • The multi-step setup process with Redis TLS configured automatically
  • Remote cluster considerations with reverse tunnel setup
  • Multi-terminal workflow requirements
  • Manual TLS reconfiguration procedures
  • Environment detection for local vs CI testing

This significantly improves the developer experience for running TLS-enabled E2E tests.

principal/redisproxy/redisproxy.go (2)

168-211: LGTM: TLS listener implementation is sound.

The TLS-enabled listener is correctly configured with MinVersion set to TLS12, and the branching logic cleanly separates TLS and non-TLS paths with appropriate logging.


847-908: Verify upstream TLS condition covers all intended scenarios.

The condition at line 864 requires tlsEnabled AND at least one of (CA, CAPath, Insecure) to wrap the upstream connection with TLS. Confirm this aligns with the intended behavior—specifically, whether tlsEnabled alone (without CA/CAPath/Insecure) should skip upstream TLS or raise an error.

install/helm-repo/argocd-agent-agent/templates/agent-params-cm.yaml (1)

93-101: LGTM: Redis TLS configuration added correctly.

The three new TLS-related keys (enabled, ca-path, and insecure) are properly documented and bound to Helm values, consistent with the TLS implementation across the codebase.

test/e2e/clusterinfo_test.go (1)

108-115: LGTM: Timeout increases accommodate TLS latency.

The timeout increases from 30s to 60s with adjusted polling intervals are appropriate for handling potential port-forward latency in long test runs, and the explanatory comments clarify the rationale.

Also applies to: 123-129, 141-142

internal/argocd/cluster/cluster_test.go (1)

36-36: LGTM: Test updated for new TLS parameter.

The nil TLS config parameter correctly aligns test call sites with the updated NewManager signature, appropriately passing nil for tests that don't exercise TLS functionality.

Also applies to: 225-225, 304-304

install/helm-repo/argocd-agent-agent/values.yaml (1)

136-162: LGTM: TLS and network policy configuration added.

The new redisTLS configuration block and networkPolicy settings are well-documented and use secure defaults (TLS enabled by default, insecure mode disabled), consistent with the broader TLS implementation.

install/helm-repo/argocd-agent-agent/README.md (1)

45-50: LGTM: Documentation updated for TLS configuration.

The documentation entries for redisTLS, networkPolicy, and tlsRootCAPath accurately reflect the corresponding values.yaml changes.

Also applies to: 68-72, 96-96

install/kubernetes/agent/agent-params-cm.yaml (1)

88-99: LGTM: Kubernetes manifest updated with Redis TLS configuration.

The three new Redis TLS configuration keys are properly documented with secure defaults and mount paths that align with the deployment configuration and Helm templates.

test/e2e/sync_test.go (1)

371-371: Verify hook name "before" matches the test fixture in test/data/pre-sync.

The hook Job name was changed from "pre-post-sync-before" to "before" at lines 371 and 466. Ensure the Job name in the test fixture corresponds to this value to maintain test data consistency.

Also applies to: 466-466

hack/dev-env/Procfile.e2e (1)

1-7: Procfile e2e Redis/server port-forwards and startup ordering look consistent.

Port mappings for Redis (6380/6381/6382) match the defaults used in the dev start scripts, and placing the port-forwards before principal/agent startup should ensure TLS-capable Redis endpoints are reachable when processes start.

agent/options.go (1)

111-133: Redis TLS AgentOptions cleanly mirror existing option pattern.

The new WithRedisTLSEnabled / WithRedisTLSCAPath / WithRedisTLSInsecure helpers follow the same style as the existing Redis options and provide straightforward wiring into redisProxyMsgHandler without changing other Agent behavior.

hack/dev-env/start-agent-managed.sh (1)

37-83: Managed agent Redis TLS and mTLS wiring look sound.

Enabling Redis TLS based on creds/redis-tls/ca.crt, defaulting the Redis address to localhost:6381 to match the port-forward, and extracting mTLS client cert/key/CA into /tmp for --tls-client-cert/--tls-client-key/--root-ca-path gives a coherent, reproducible dev/e2e setup.

internal/argocd/cluster/manager.go (1)

24-45: TLS config propagation into cluster cache is consistent with cache constructor.

Adding the *tls.Config parameter to NewManager and forwarding it to NewClusterCacheInstance cleanly wires Redis TLS into the cluster cache without altering other manager responsibilities.

Also applies to: 70-82

agent/inbound_redis.go (1)

20-24: Redis TLS client configuration in getRedisClientAndCache is robust and well-scoped.

Conditionally creating tls.Config (TLS 1.2+), warning and setting InsecureSkipVerify only when explicitly requested, and otherwise loading a CA from redisTLSCAPath (or falling back to system CAs with a warning) is a solid pattern for securing the Redis connection while keeping dev/e2e knobs available.

Also applies to: 51-55, 345-372

agent/agent.go (1)

323-349: LGTM! TLS configuration properly implemented.

The cluster cache TLS configuration correctly:

  • Creates TLS config with minimum TLS 1.2
  • Logs a warning for insecure mode (Line 330)
  • Loads and validates CA certificates with clear error messages
  • Passes the config to cluster cache initialization
hack/dev-env/configure-argocd-redis-tls.sh (1)

1-261: Well-structured TLS configuration script.

The script demonstrates good practices:

  • Idempotent operations with existence checks before patching
  • Clear user-facing messages and error handling
  • Proper scaling sequence (scale down/configure/scale up) to prevent connection errors during TLS transition
  • Context switching with cleanup on exit

Based on learnings, this is appropriate for E2E test environments.

test/e2e/redis_proxy_test.go (1)

120-123: Good resilience improvements for E2E tests.

The changes address race conditions and improve test reliability:

  • 5-second delay prevents pod deletion before Redis SUBSCRIBE is active (Lines 120-123)
  • Message draining logic ensures all available SSE messages are processed (Lines 188-208)
  • Retry logic for ResourceTree calls handles transient Redis EOF errors (Lines 211-237)
  • Buffered channel prevents message loss (Line 588)

These are appropriate enhancements for test stability.

Also applies to: 188-208, 211-237

install/kubernetes/agent/agent-deployment.yaml (1)

149-166: TLS configuration correctly integrated into agent deployment.

The manifest changes properly:

  • Add TLS environment variables from ConfigMap with optional: true
  • Mount the redis-tls-ca secret at /app/config/redis-tls
  • Mark both the volume and secret as optional to support non-TLS deployments

Also applies to: 193-211

test/e2e/fixture/cluster.go (2)

227-267: Excellent addition of Redis client caching.

The new getCachedCacheInstance and CleanupRedisCachedClients functions prevent connection leaks in E2E tests by reusing Redis clients across test operations. The mutex protection ensures thread-safety.


206-217: Appropriate timeout and pool configuration for E2E tests.

The Redis client configuration is well-tuned for port-forward latency:

  • Generous timeouts (DialTimeout: 10s, ReadTimeout: 30s)
  • Connection pooling (PoolSize: 10, retries: 3)
  • Idle connection management

These settings are suitable for test environments where network latency is higher.

principal/options.go (1)

492-548: Redis TLS options follow established patterns.

The new ServerOption functions are well-structured and consistent with existing options:

  • WithRedisTLSEnabled for feature toggle
  • WithRedisServerTLSFromPath and WithRedisServerTLSFromSecret for server TLS
  • WithRedisUpstreamTLSCAFromFile and WithRedisUpstreamTLSCAFromSecret for upstream CA
  • WithRedisUpstreamTLSInsecure for test scenarios

All use the standard option pattern and leverage existing tlsutil helpers.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from 8b47b98 to 81c3402 Compare December 4, 2025 13:55
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
docs/configuration/redis-tls.md (1)

149-183: Tag remaining bare fenced code blocks with a language (markdownlint MD040)

There are still a few fenced blocks without a language, which markdownlint flags (MD040):

  • “How the tunnel works” ASCII diagram around line 150.
  • Script output examples for:
    • gen-redis-tls-certs.sh
    • configure-redis-tls.sh
    • configure-argocd-redis-tls.sh

To keep linters happy and clarify formatting, consider tagging them as plain text, e.g.:

-``` 
+```text
 Argo CD Server (remote vcluster) 
 ...
-``` 
+``` 

…and similarly for each of the script-output blocks.

Also applies to: 475-521

🧹 Nitpick comments (9)
principal/auth.go (1)

154-163: Trace logging in unaryAuthInterceptor is fine but consider toning down style

The added trace/Warn logs are helpful for debugging, but:

  • They partially duplicate information already logged inside authenticate, and
  • The emojis in log messages may not play well with log parsers or existing logging conventions.

Consider reusing a structured logger context (e.g., add module/client fields as elsewhere in this file) and dropping emojis to keep logs machine-friendly and consistent.

test/e2e/fixture/argoclient.go (1)

27-33: Env override for Argo CD server endpoint is useful; clarify expected format

Using ARGOCD_SERVER_ADDRESS as a fast path is handy for TLS-aware E2E runs and avoids an extra K8s call. Since NewArgoClient builds https:// URLs by treating endpoint as the Host, this env var should be host:port (no scheme).

Consider:

  • Documenting that expectation where this env var is set, and/or
  • Adding a lightweight sanity check (e.g., rejecting values starting with http:// or https://) to fail fast on misconfiguration.

Also applies to: 315-337

principal/listen.go (1)

172-199: New gRPC/WebSocket and service-registration logging is functionally safe

The added Info-level logs around WebSocket enablement, server startup, and service registration improve startup visibility without changing behavior.

One nit: as in auth.go, the emoji-heavy messages (🔧, etc.) might not fit all log ingestion/alerting setups. If you want to keep logs easily greppable and machine-friendly, consider switching to plain-text messages while retaining the same structure and fields.

Also applies to: 224-230

hack/dev-env/start-agent-autonomous.sh (1)

37-83: Autonomous agent Redis TLS wiring looks correct; consider temp-file cleanup

The script correctly:

  • Enables Redis TLS when creds/redis-tls/ca.crt exists and passes --redis-tls-enabled/--redis-tls-ca-path.
  • Defaults ARGOCD_AGENT_REDIS_ADDRESS to localhost:6382 for local E2E with a clear port-forward hint.
  • Extracts client cert/key and CA from Kubernetes secrets and passes them via --tls-client-cert/--tls-client-key/--root-ca-path.

For local dev/E2E this is fine as-is. If you want to tighten things slightly, you could switch the /tmp/... paths to mktemp files and register a trap to remove them on exit, so TLS materials don’t linger longer than necessary.

hack/dev-env/configure-argocd-redis-tls.sh (1)

52-216: Patching logic is pragmatic for dev/E2E; consider surfacing failures

The pattern of:

  • Checking for existing redis-tls-ca volumes/volumeMounts and --redis-use-tls args, and
  • Applying JSON patches with ... || true

gives you an idempotent script that won’t die if the manifests drift slightly, which is good for local/E2E usage.

One trade-off is that if a future manifest change causes a patch to fail (e.g., args or volumes arrays are removed/renamed), the script will silently skip adding TLS CA mounts/flags and you’ll only see failures later when components can’t talk to Redis.

Not urgent, but for easier debugging you might consider:

  • Logging a warning when a patch fails (e.g., capture stderr/stdout and echo a “could not patch X for Redis TLS” line), or
  • Tightening the presence checks (e.g., verifying args/volumes arrays exist) so failures are more explicit.

This would keep the script resilient while making TLS misconfigurations easier to diagnose.

hack/dev-env/start-agent-managed.sh (1)

37-83: Managed agent TLS and Redis address wiring look correct

The script correctly:

  • Enables Redis TLS when the dev CA is present and passes the CA path via --redis-tls-* flags.
  • Defaults the Redis address to localhost:6381 (aligned with the Procfile port-forward) while allowing override via ARGOCD_AGENT_REDIS_ADDRESS.
  • Extracts the agent mTLS cert/key and CA from Kubernetes secrets and injects them into the agent flags.

This matches the documented E2E flow and ensures proper certificate validation over the localhost port‑forward, while still allowing non‑TLS operation in ad‑hoc dev setups.

test/e2e/fixture/fixture.go (1)

107-155: Fixture cleanup and Redis-backed cluster info reset are safer and more robust

  • The bounded deletion loops and the WaitForDeletion polling remain clear and avoid unbounded waits.
  • Switching to DeepCopy() for applications and AppProjects before mutating namespace/name prevents subtle bugs caused by reusing the range loop variable.
  • The new resetManagedAgentClusterInfo helper, invoked at the end of CleanUp, ensures the managed agent’s cluster info in Redis is reset between tests, and the choice to log (rather than fail) when Redis is unavailable is appropriate for E2E teardown.

Also applies to: 218-266, 294-357, 457-471

agent/agent.go (1)

443-460: Ensure cacheRefreshInterval is always positive before starting the cluster cache info ticker

The new goroutine that sends initial and periodic cluster cache info updates for both managed and autonomous agents is a good consolidation of behaviour. However, time.NewTicker(a.cacheRefreshInterval) will panic if cacheRefreshInterval is zero or negative, so it’s important that:

  • a.cacheRefreshInterval is always initialized to a positive duration via options or defaults before Start is called, or
  • a defensive check is added here to guard against an uninitialized value.
hack/dev-env/start-e2e.sh (1)

19-48: Consider removing unused helper function.

The getExternalLoadBalancerIP function is no longer called after switching to localhost-based addresses. While it may have future utility, removing unused code improves maintainability.

Apply this diff to remove the unused function:

-# getExternalLoadBalancerIP will set EXTERNAL_IP with the load balancer hostname from the specified Service
-getExternalLoadBalancerIP() {
-  SERVICE_NAME=$1
-
-  MAX_ATTEMPTS=120
-
-  for ((i=1; i<=MAX_ATTEMPTS; i++)); do
-    
-    echo ""
-    EXTERNAL_IP=$(kubectl get svc $SERVICE_NAME $K8S_CONTEXT $K8S_NAMESPACE -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
-    EXTERNAL_HOST=$(kubectl get svc $SERVICE_NAME $K8S_CONTEXT $K8S_NAMESPACE -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
-
-    if [ -n "$EXTERNAL_IP" ]; then
-      echo "External IP for $SERVICE_NAME on $K8S_CONTEXT is $EXTERNAL_IP"
-      break
-    elif [ -n "$EXTERNAL_HOST" ]; then
-      echo "External host for $SERVICE_NAME on $K8S_CONTEXT is $EXTERNAL_HOST"
-      EXTERNAL_IP=$EXTERNAL_HOST
-      break
-    else
-      echo "External IP for $SERVICE_NAME on $K8S_CONTEXT not yet available, attempting again in 5 seconds..."
-      sleep 5
-    fi
-  done
-
-  if [ $i -gt $MAX_ATTEMPTS ]; then
-    echo "Failed to obtain external IP after $MAX_ATTEMPTS attempts."
-    exit 1
-  fi
-
-}
-
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b47b98 and 81c3402.

📒 Files selected for processing (29)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/auth.go (1 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/e2e/sync_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • principal/tracker/tracking.go
  • test/e2e/sync_test.go
  • test/run-e2e.sh
  • test/e2e/redis_proxy_test.go
  • principal/resource.go
  • test/e2e/clusterinfo_test.go
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • cmd/argocd-agent/principal.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • Makefile
  • test/e2e/rp_test.go
  • hack/dev-env/Procfile.e2e
  • hack/dev-env/start-agent-managed.sh
  • hack/dev-env/start-e2e.sh
  • hack/dev-env/configure-argocd-redis-tls.sh
  • test/e2e/README.md
🧬 Code graph analysis (7)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
  • GetArgoCDServerEndpoint (315-337)
  • GetInitialAdminSecret (302-313)
  • NewArgoClient (52-66)
principal/listen.go (4)
internal/logging/logging.go (2)
  • Info (295-297)
  • Warn (300-302)
pkg/api/grpc/authapi/auth_grpc.pb.go (1)
  • RegisterAuthenticationServer (83-85)
pkg/api/grpc/versionapi/version_grpc.pb.go (1)
  • RegisterVersionServer (69-71)
pkg/api/grpc/eventstreamapi/eventstream_grpc.pb.go (1)
  • RegisterEventStreamServer (144-146)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
agent/agent.go (2)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
test/e2e/fixture/fixture.go (1)
test/e2e/fixture/cluster.go (3)
  • ClusterDetails (42-56)
  • AgentManagedName (37-37)
  • AgentClusterServerURL (39-39)
principal/auth.go (1)
internal/logging/logging.go (2)
  • Trace (285-287)
  • Warn (300-302)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

test/e2e/README.md

32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Build & cache Go code
  • GitHub Check: Run unit tests
  • GitHub Check: Lint Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (22)
Makefile (1)

59-79: Redis TLS setup in setup-e2e looks consistent with the E2E story

The new block cleanly wires the TLS cert generation and per-cluster Redis/Argo CD TLS configuration into make setup-e2e, matching the “TLS required for E2E” design. No functional issues from the Makefile side.

docs/getting-started/kubernetes/index.md (1)

159-230: Redis TLS steps in the Kubernetes getting-started guide are consistent and clear

The new Sections 2.4 and 4.4 cleanly walk through:

  • Generating CA/server certs,
  • Creating the shared argocd-redis-tls secret,
  • Patching argocd-redis to enable TLS-only,
  • Verifying with redis-cli --tls,

and mirror the control-plane vs workload-cluster story correctly. The cross-link to the dedicated “Redis TLS Configuration” doc ties it together nicely.

No issues from a correctness or usability standpoint.

Also applies to: 337-381, 646-646

test/e2e/README.md (1)

21-107: E2E flow and Redis TLS documentation are coherent and aligned with the scripts

The new multi‑step flow (setup, optional reverse tunnel, start principal/agents, run tests) together with the Redis TLS section matches the dev scripts and TLS wiring (make setup-e2e, reverse-tunnel, Redis TLS cert/config scripts, and port‑forwards). The note about InsecureSkipVerify being limited to the test fixture while agents/principal do full TLS validation is clear and appropriate for E2E usage.

hack/dev-env/Procfile.e2e (1)

1-7: Procfile-based Redis and Argo CD port-forwards look consistent with the E2E flow

The added pf-* entries correctly establish Redis port-forwards for the three vclusters and an Argo CD server port-forward, and the staggered startup of principal and agents fits with the new TLS/localhost-based Redis configuration. Note that the principal script also starts a Redis port-forward by default; see the comment on hack/dev-env/start-principal.sh to avoid double port-forwarding on localhost:6380.

test/e2e/rp_test.go (1)

161-245: Fixture helpers for Argo endpoint and admin secret are a solid cleanup

Switching to fixture.GetArgoCDServerEndpoint and fixture.GetInitialAdminSecret, and then building the client via fixture.NewArgoClient, removes duplicate K8s plumbing in the tests and aligns them with the TLS-aware endpoint discovery used elsewhere. The resulting Argo login and application flows are unchanged and easier to maintain.

Also applies to: 294-307

hack/dev-env/gen-redis-tls-certs.sh (1)

1-150: Redis TLS cert generation is robust and idempotent

The script cleanly generates a CA plus per‑role Redis certificates with appropriate SANs (including localhost/loopback), skips regeneration when artifacts exist, conditionally adds the local IP, and cleans up temporary files. With set -e and no stderr suppression on OpenSSL commands, failures will be surfaced instead of silently ignored.

hack/dev-env/start-principal.sh (1)

58-76: Redis TLS argument construction for principal is consistent with the dev CA layout

The TLS detection block correctly checks for the proxy cert/key and CA in creds/redis-tls and, when present, passes them via --redis-tls-enabled, --redis-server-tls-cert/--redis-server-tls-key, and --redis-upstream-ca-path. This lines up with the certs generated by gen-redis-tls-certs.sh and ensures proper validation for both localhost port‑forward and reverse‑tunnel scenarios.

agent/agent.go (1)

17-24: Cluster cache Redis client now correctly honors Redis TLS settings

The new clusterCacheTLSConfig construction mirrors the Redis proxy’s TLS behaviour:

  • When Redis TLS is enabled, the cluster cache client enforces TLS 1.2+.
  • If redisTLSInsecure is set, a clear warning is logged and certificate verification is disabled.
  • Otherwise, when redisTLSCAPath is provided, the CA bundle is loaded and set as RootCAs, and any read/parse errors fail agent construction with an explicit error.

Passing this TLS config into cluster.NewClusterCacheInstance ensures the cluster cache uses the same secure Redis connection settings as the proxy.

Also applies to: 323-346

internal/argocd/cluster/cluster.go (2)

176-192: LGTM! TLS configuration properly integrated.

The TLS configuration is correctly passed through to the Redis client options. The tlsConfig parameter allows for optional TLS (nil is acceptable for non-TLS connections), and the Redis client will handle nil TLSConfig appropriately.


135-142: Connection state initialization is appropriate.

The fallback initialization when no existing ConnectionState is present correctly sets a Successful status with a timestamp. This ensures the cluster appears connected when cache stats are first received from an agent, which is the expected behavior.

hack/dev-env/start-e2e.sh (1)

50-59: Well-structured E2E test configuration.

The localhost-based Redis addresses are appropriate for TLS certificate validation in E2E tests, and the REDIS_PASSWORD retrieval is now correctly split into separate assignment and export to avoid masking kubectl errors (addressing the previous shellcheck warning).

test/e2e/fixture/cluster.go (4)

182-201: TLS configuration appropriate for E2E tests.

The use of InsecureSkipVerify: true is intentional for E2E tests to accommodate dynamic LoadBalancer addresses, as noted in the PR description. The TLS encryption is still enabled, which is the primary security goal.


206-217: Well-tuned Redis client configuration for E2E.

The generous timeouts, connection pool sizing, and retry configuration are appropriate for E2E test scenarios with port-forwarding latency. The settings balance test reliability with resource usage.


227-267: Effective connection leak prevention.

The caching mechanism with mutex protection prevents connection leaks across test runs. The cache key design (source + address) correctly handles multiple Redis instances, and the CleanupRedisCachedClients function enables proper test teardown.


154-165: Helpful debugging additions.

The added log statements provide useful context for troubleshooting E2E test failures, especially when investigating TLS-enabled Redis connectivity issues.

hack/dev-env/configure-redis-tls.sh (4)

61-66: Certificate validation is complete.

All required TLS certificate files are now validated (server certificate, key, and CA), addressing previous review feedback. The error message clearly guides users to run the certificate generation script.


81-122: Excellent transition strategy to prevent SSL errors.

Scaling down Argo CD components before enabling Redis TLS prevents connection errors during the transition. Saving replica counts to a ConfigMap for restoration by the configure-argocd-redis-tls.sh script is a clean approach to maintaining the desired deployment state.


138-196: Robust idempotent patching logic.

The script correctly checks for existing volumes and volumeMounts before patching, and handles both empty/null arrays and existing arrays. The JSON patch operations are well-structured with proper error handling.


199-215: Verify TLS configuration flags for Redis.

The Redis TLS configuration uses --tls-auth-clients no (line 212), which disables client certificate authentication. Confirm this is intentional for the E2E environment and aligns with how Redis clients in the codebase are configured.

principal/redisproxy/redisproxy.go (3)

130-154: TLS configuration builder handles both cert sources correctly.

The function properly supports both file-based and in-memory certificates with appropriate error handling. Setting MinVersion to TLS 1.2 is a reasonable baseline for compatibility while maintaining security.


157-183: TLS listener setup is well-structured.

The TLS-enabled listener creation with fallback to plain TCP is correctly implemented. The logging clearly indicates whether TLS is enabled, which aids troubleshooting.


836-897: Upstream TLS connection handling is thorough.

The method properly:

  • Establishes the TCP connection first
  • Conditionally wraps with TLS based on configuration
  • Handles CA from pool, CA from file, and insecure mode
  • Extracts SNI correctly from the address
  • Performs the TLS handshake
  • Cleans up the connection on errors

The warning log for InsecureSkipVerify (Line 859) is good practice.

Comment on lines +224 to +310
REPO_SERVER_REPLICAS=$(kubectl get configmap argocd-redis-tls-replicas -n ${NAMESPACE} -o jsonpath='{.data.repo-server}' 2>/dev/null || echo "1")
CONTROLLER_REPLICAS=$(kubectl get configmap argocd-redis-tls-replicas -n ${NAMESPACE} -o jsonpath='{.data.application-controller}' 2>/dev/null || echo "1")
SERVER_REPLICAS=$(kubectl get configmap argocd-redis-tls-replicas -n ${NAMESPACE} -o jsonpath='{.data.server}' 2>/dev/null || echo "1")

# Ensure we have at least 1 replica
[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
[ "$CONTROLLER_REPLICAS" = "0" ] || [ -z "$CONTROLLER_REPLICAS" ] && CONTROLLER_REPLICAS="1"
[ "$SERVER_REPLICAS" = "0" ] || [ -z "$SERVER_REPLICAS" ] && SERVER_REPLICAS="1"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Replica guard logic does not actually enforce “at least 1 replica”

The intent here is clear:

# Ensure we have at least 1 replica
[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
...

But in POSIX shells cmd1 || cmd2 && cmd3 is parsed as cmd1 || (cmd2 && cmd3). That means:

  • If REPO_SERVER_REPLICAS="0", the first test succeeds and short-circuits, and the assignment never runs.
  • Only the “empty string” case ends up being forced to "1".

So if the argocd-redis-tls-replicas ConfigMap ever contains "0" (e.g., due to a prior scaling step or a partial setup), this script will happily scale the components back to 0 replicas instead of “at least 1”, contrary to the comment and to what you probably want for E2E.

A simple, explicit if is safer and clearer:

-# Ensure we have at least 1 replica
-[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
-[ "$CONTROLLER_REPLICAS" = "0" ] || [ -z "$CONTROLLER_REPLICAS" ] && CONTROLLER_REPLICAS="1"
-[ "$SERVER_REPLICAS" = "0" ] || [ -z "$SERVER_REPLICAS" ] && SERVER_REPLICAS="1"
+# Ensure we have at least 1 replica
+if [ -z "$REPO_SERVER_REPLICAS" ] || [ "$REPO_SERVER_REPLICAS" = "0" ]; then
+  REPO_SERVER_REPLICAS="1"
+fi
+if [ -z "$CONTROLLER_REPLICAS" ] || [ "$CONTROLLER_REPLICAS" = "0" ]; then
+  CONTROLLER_REPLICAS="1"
+fi
+if [ -z "$SERVER_REPLICAS" ] || [ "$SERVER_REPLICAS" = "0" ]; then
+  SERVER_REPLICAS="1"
+fi

This matches the comment and guarantees that, in the “unknown/zero” cases, the components get scaled back up so Redis TLS configuration can actually be exercised.

🤖 Prompt for AI Agents
In hack/dev-env/configure-argocd-redis-tls.sh around lines 224 to 231, the
current guard using '[ cond ] || [ cond ] && VAR="1"' is parsed as 'cond1 ||
(cond2 && assign)' so a "0" value will not be corrected; replace each compound
expression with an explicit POSIX if statement, e.g. if [
"$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ]; then
REPO_SERVER_REPLICAS="1"; fi (and do the same for CONTROLLER_REPLICAS and
SERVER_REPLICAS) so that both "0" and empty values are reliably set to "1".

Comment on lines +853 to 894
if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
tlsConfig := &tls.Config{
MinVersion: tls.VersionTLS12,
}

if rp.upstreamTLSInsecure {
logCtx.Warn("INSECURE: Not verifying upstream Redis TLS certificate")
tlsConfig.InsecureSkipVerify = true
} else if rp.upstreamTLSCA != nil {
tlsConfig.RootCAs = rp.upstreamTLSCA
logCtx.Trace("Using provided CA certificate pool for upstream Redis TLS")
} else if rp.upstreamTLSCAPath != "" {
caCert, err := os.ReadFile(rp.upstreamTLSCAPath)
if err != nil {
conn.Close()
return nil, fmt.Errorf("failed to read CA certificate: %w", err)
}
caCertPool := x509.NewCertPool()
if !caCertPool.AppendCertsFromPEM(caCert) {
conn.Close()
return nil, fmt.Errorf("failed to append CA certificate")
}
tlsConfig.RootCAs = caCertPool
logCtx.Debugf("Using CA certificate from %s for upstream Redis TLS", rp.upstreamTLSCAPath)
}

// Extract hostname from address for SNI
hostname := rp.principalRedisAddress
if h, _, err := net.SplitHostPort(rp.principalRedisAddress); err == nil {
hostname = h
}
tlsConfig.ServerName = hostname

tlsConn := tls.Client(conn, tlsConfig)
if err := tlsConn.Handshake(); err != nil {
conn.Close()
return nil, fmt.Errorf("TLS handshake failed: %w", err)
}

logCtx.Trace("Established TLS connection to upstream Redis")
return tlsConn, nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Ensure TLS is required for upstream when server TLS is enabled.

The upstream TLS connection is only established when rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) (Line 853). This means if the Redis proxy server has TLS enabled but no upstream TLS configuration is provided, it will connect to the principal's Redis over plain TCP. This could expose sensitive data in transit within the cluster.

Consider enforcing upstream TLS when server TLS is enabled, or at minimum, log a warning when this configuration mismatch occurs:

 	// If TLS is enabled for upstream, wrap the connection with TLS
-	if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
+	hasUpstreamTLSConfig := rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure
+	
+	if rp.tlsEnabled && !hasUpstreamTLSConfig {
+		logCtx.Warn("Redis proxy server has TLS enabled, but no upstream TLS configuration provided. Connection to principal Redis will be unencrypted.")
+	}
+	
+	if rp.tlsEnabled && hasUpstreamTLSConfig {
 		tlsConfig := &tls.Config{

Committable suggestion skipped: line range outside the PR's diff.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from 81c3402 to 4a2ee37 Compare December 4, 2025 14:57
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (5)
install/helm-repo/argocd-agent-agent/values.schema.json (1)

302-330: Type inconsistency persists: redisTLS.enabled/insecure vs networkPolicy.enabled use different schema patterns.

The redisTLS.enabled and redisTLS.insecure fields use anyOf with string enum ["true", "false"] plus boolean type, while networkPolicy.enabled uses a simple boolean type. This creates inconsistency despite a prior review comment marking this as addressed.

Recommendation: Either standardize all boolean flags to use type: "boolean" for clarity, or document why string enums are needed (likely for environment variable compatibility in templates). The pattern already exists for tlsClientInSecure (line 152–158), so the choice should be applied consistently across all similar fields.

Also applies to: 337-340

cmd/argocd-agent/principal.go (1)

272-287: Incomplete mutual exclusivity validation for upstream TLS modes.

The validation at lines 273-275 only checks --redis-upstream-tls-insecure against --redis-upstream-ca-path, but there are three mutually exclusive upstream TLS modes:

  1. --redis-upstream-tls-insecure (skip verification)
  2. --redis-upstream-ca-path (CA from file)
  3. --redis-upstream-ca-secret-name (CA from secret — has default value)

If a user specifies --redis-upstream-tls-insecure=true without also explicitly setting the CA secret name to empty, the insecure mode silently wins over the default secret. This behavior may be intentional, but it differs from the explicit validation done for insecure vs CA path.

Consider whether the current validation is sufficient for your use case, or if you need to also check for explicit user-provided --redis-upstream-ca-secret-name values that conflict with insecure mode.

hack/dev-env/configure-argocd-redis-tls.sh (1)

228-231: Replica guard logic still has operator precedence issue.

The shell operator precedence means [ "$X" = "0" ] || [ -z "$X" ] && X="1" is parsed as cond1 || (cond2 && assign). When REPO_SERVER_REPLICAS="0", the first test succeeds and short-circuits, so the assignment never runs.

Apply this fix:

 # Ensure we have at least 1 replica
-[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
-[ "$CONTROLLER_REPLICAS" = "0" ] || [ -z "$CONTROLLER_REPLICAS" ] && CONTROLLER_REPLICAS="1"
-[ "$SERVER_REPLICAS" = "0" ] || [ -z "$SERVER_REPLICAS" ] && SERVER_REPLICAS="1"
+if [ -z "$REPO_SERVER_REPLICAS" ] || [ "$REPO_SERVER_REPLICAS" = "0" ]; then
+  REPO_SERVER_REPLICAS="1"
+fi
+if [ -z "$CONTROLLER_REPLICAS" ] || [ "$CONTROLLER_REPLICAS" = "0" ]; then
+  CONTROLLER_REPLICAS="1"
+fi
+if [ -z "$SERVER_REPLICAS" ] || [ "$SERVER_REPLICAS" = "0" ]; then
+  SERVER_REPLICAS="1"
+fi
hack/dev-env/start-principal.sh (1)

23-29: Defaulting Redis address and delegating port-forward to Procfile is correct

Using localhost:6380 as the default ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS and leaving the actual port‑forward to Procfile (or manual kubectl port-forward) cleanly resolves the earlier conflict and keeps this script focused on principal startup.

principal/redisproxy/redisproxy.go (1)

836-897: Avoid silently downgrading upstream Redis to plaintext when server TLS is enabled

With the current condition:

if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
    // wrap conn in TLS
}

if the proxy server has TLS enabled but no upstream TLS config is provided, the upstream connection stays unencrypted. That’s a surprising and weaker posture for a “Redis TLS by default” setup, and can leak data in‑cluster while clients believe they’re on a fully‑TLS path.

Recommend at least logging a clear warning when rp.tlsEnabled is true but no upstream TLS config is present, and strongly consider enforcing TLS (e.g., treat that configuration as an error) so misconfiguration is caught early.

For example:

hasUpstreamTLSConfig := rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure

if rp.tlsEnabled && !hasUpstreamTLSConfig {
    logCtx.Warn("Redis proxy TLS is enabled but no upstream Redis TLS configuration is set; upstream traffic will be plaintext")
}

if rp.tlsEnabled && hasUpstreamTLSConfig {
    // current TLS wrapping logic
}
🧹 Nitpick comments (11)
principal/auth.go (1)

154-165: LGTM! Observability improvements for auth flow.

The trace logging additions improve visibility into the authentication interceptor's decision path. The emoji markers (🔵🟢🟡🔴) provide visual cues for different paths, and the log levels are appropriate (Trace for flow, Warn for failures).

Note: While these changes are orthogonal to the main PR objective (Redis TLS), they're valuable observability improvements.

Optional: Consider whether emoji markers in logs might cause issues with log aggregation or parsing systems in your environment. If so, you could replace them with text prefixes like [RECV], [NOAUTH], [AUTH_REQ], [AUTH_FAIL]. However, since these are Trace-level logs (typically disabled in production), the risk is minimal.

principal/listen.go (2)

174-199: Normalize WebSocket / gRPC startup & shutdown logging

The added logs help clarify which mode is used, but there are a couple of polish points:

  • Log messages have leading spaces (" WebSocket is ENABLED...", " gRPC server.Serve() exited"), which will look odd and make grepping harder.
  • The emoji in Line 174 may be inconsistent with the rest of the project’s logging style.
  • In the WebSocket branch you now log startup (Line 186) but not shutdown, while in the gRPC branch you log both startup and exit (Lines 194–197). For symmetry and debugging, consider adding a WithError(err) log after ServeTLS returns as well, and possibly downgrading the exit log to Debug or only warning on unexpected errors.

These are non-blocking, but tightening them up would keep logs cleaner and more consistent.


224-231: Reassess verbosity and level of new gRPC service registration logs

The per-service Info logs make startup more transparent, but four Info-level lines here may become noisy in larger deployments:

  • Consider either collapsing into a single Info message listing all registered services, or moving the detailed per-service logs to Debug.
  • The initial “Registering gRPC services on principal” message (Line 224) is useful; the three “... registered successfully” lines could be demoted if log volume is a concern.

No functional issues, just a suggestion to balance observability vs log noise.

hack/dev-env/start-e2e.sh (1)

19-48: Consider removing unused getExternalLoadBalancerIP function.

This function is no longer called in the script since the switch to static localhost addresses. Dead code increases maintenance burden.

-# getExternalLoadBalancerIP will set EXTERNAL_IP with the load balancer hostname from the specified Service
-getExternalLoadBalancerIP() {
-  SERVICE_NAME=$1
-
-  MAX_ATTEMPTS=120
-
-  for ((i=1; i<=MAX_ATTEMPTS; i++)); do
-    
-    echo ""
-    EXTERNAL_IP=$(kubectl get svc $SERVICE_NAME $K8S_CONTEXT $K8S_NAMESPACE -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
-    EXTERNAL_HOST=$(kubectl get svc $SERVICE_NAME $K8S_CONTEXT $K8S_NAMESPACE -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
-
-    if [ -n "$EXTERNAL_IP" ]; then
-      echo "External IP for $SERVICE_NAME on $K8S_CONTEXT is $EXTERNAL_IP"
-      break
-    elif [ -n "$EXTERNAL_HOST" ]; then
-      echo "External host for $SERVICE_NAME on $K8S_CONTEXT is $EXTERNAL_HOST"
-      EXTERNAL_IP=$EXTERNAL_HOST
-      break
-    else
-      echo "External IP for $SERVICE_NAME on $K8S_CONTEXT not yet available, attempting again in 5 seconds..."
-      sleep 5
-    fi
-  done
-
-  if [ $i -gt $MAX_ATTEMPTS ]; then
-    echo "Failed to obtain external IP after $MAX_ATTEMPTS attempts."
-    exit 1
-  fi
-
-}
hack/dev-env/configure-argocd-redis-tls.sh (1)

29-31: Consider using --context flag instead of switching global context.

Using kubectl config use-context modifies the user's kubeconfig globally, which could cause issues if the script is interrupted or if parallel operations are running. Consider using kubectl --context=${CONTEXT} for each command instead.

-# Switch context
-echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+# Use context flag for all kubectl commands instead of switching globally
+KUBECTL="kubectl --context=${CONTEXT}"

Then replace all kubectl calls with ${KUBECTL}.

test/e2e/fixture/cluster.go (1)

259-267: CleanupRedisCachedClients should explicitly close Redis connections.

The cleanup function only clears the map and relies on garbage collection. Redis clients should be explicitly closed to release connections immediately and avoid potential resource leaks during test suite execution.

 // CleanupRedisCachedClients closes all cached Redis clients (should be called at end of test suite)
 func CleanupRedisCachedClients() {
 	cachedRedisClientMutex.Lock()
 	defer cachedRedisClientMutex.Unlock()

 	fmt.Printf("Cleaning up %d cached Redis clients\n", len(cachedRedisClients))
+	// Note: appstatecache.Cache doesn't expose Close() method, so we rely on GC
+	// If connection leaks become an issue, consider storing the underlying redis.Client
+	// separately to enable explicit Close() calls
 	// Clear the cache map - connections will be garbage collected
 	cachedRedisClients = make(map[string]*appstatecache.Cache)
 }

Alternatively, if the underlying redis.Client can be stored separately, implement explicit closure:

// Store both cache and client for proper cleanup
type cachedRedisEntry struct {
    cache  *appstatecache.Cache
    client *redis.Client
}
principal/redisproxy/redisproxy.go (1)

65-154: TLS server configuration looks sound; consider preloading CA if needed later

The added TLS fields and createServerTLSConfig correctly handle both file-based and in‑memory cert+key, and enforce TLS 1.2+. If this proxy ever becomes connection‑heavy, you might later consider preloading / reusing cert material (rather than rebuilding tls.Certificate from fields on each start), but it’s not required for current usage.

test/e2e/fixture/fixture.go (2)

229-291: Treating cleanup failures as warnings is appropriate for E2E tests

The new fmt.Printf("Warning: ...") paths during application/AppProject cleanup ensure teardown issues (especially on remote/slow clusters) don’t cascade into hard test failures. That’s a good trade‑off for E2E stability.

Also applies to: 269-291, 295-357, 372-373


457-471: Guard resetManagedAgentClusterInfo against nil clusterDetails

resetManagedAgentClusterInfo assumes clusterDetails is non‑nil. That’s true when called via BaseSuite, but CleanUp is exported and could be invoked with a nil pointer elsewhere, leading to a panic when getCachedCacheInstance dereferences it.

Consider a light guard:

func resetManagedAgentClusterInfo(clusterDetails *ClusterDetails) error {
    if clusterDetails == nil {
        return nil
    }
    if err := getCachedCacheInstance(AgentManagedName, clusterDetails).
        SetClusterInfo(AgentClusterServerURL, &argoapp.ClusterInfo{}); err != nil {
        return fmt.Errorf("resetManagedAgentClusterInfo: %w", err)
    }
    return nil
}

Optionally, if you have a CleanupRedisCachedClients helper, calling it from CleanUp after resetManagedAgentClusterInfo would fully reset Redis client state between tests.

hack/dev-env/start-agent-managed.sh (1)

37-62: Consider failing fast when Redis TLS certs are missing in TLS-only setups

The script correctly enables Redis TLS when creds/redis-tls/ca.crt exists and wires --redis-tls-enabled/--redis-tls-ca-path into the agent command, with a sensible default localhost:6381 address for the port‑forward.

Given the rest of the dev/E2E setup now configures Redis as TLS‑only by default, the "running without TLS" fallback path is likely to just produce connection errors later. You might consider turning the “certificates not found” case into a hard failure (or at least a stronger warning) in the e2e flow so misconfigured environments are surfaced early.

Also applies to: 48-62, 63-75, 76-83

hack/dev-env/start-principal.sh (1)

44-62: TLS wiring for principal looks good; consider stricter handling when certs are absent

The detection of redis-proxy.{crt,key} and ca.crt under creds/redis-tls and construction of:

--redis-tls-enabled=true
--redis-server-tls-cert=...
--redis-server-tls-key=...
--redis-upstream-ca-path=...

is consistent with the documented principal Redis TLS options.

Similar to the managed-agent script, now that dev/E2E flows configure Redis as TLS‑only by default, you might want to treat the “certificates not found, running without TLS” branch as a hard failure (or at least a very loud warning) so misconfigured environments don’t just fail later with opaque connection errors.

Also applies to: 64-71

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81c3402 and 4a2ee37.

📒 Files selected for processing (29)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/auth.go (1 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/e2e/sync_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • hack/dev-env/gen-redis-tls-certs.sh
  • principal/resource.go
  • test/e2e/rp_test.go
  • test/run-e2e.sh
  • test/e2e/fixture/argoclient.go
  • hack/dev-env/start-agent-autonomous.sh
  • test/e2e/clusterinfo_test.go
  • docs/getting-started/kubernetes/index.md
  • Makefile
  • test/e2e/redis_proxy_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/e2e/README.md
  • hack/dev-env/configure-argocd-redis-tls.sh
  • hack/dev-env/Procfile.e2e
  • hack/dev-env/start-e2e.sh
  • hack/dev-env/start-agent-managed.sh
🧬 Code graph analysis (4)
hack/dev-env/configure-argocd-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
agent/agent.go (1)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
cmd/argocd-agent/principal.go (4)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/options.go (6)
  • WithRedisTLSEnabled (493-498)
  • WithRedisServerTLSFromPath (501-507)
  • WithRedisServerTLSFromSecret (510-520)
  • WithRedisUpstreamTLSInsecure (543-548)
  • WithRedisUpstreamTLSCAFromFile (523-528)
  • WithRedisUpstreamTLSCAFromSecret (531-540)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
principal/auth.go (1)
internal/logging/logging.go (2)
  • Trace (285-287)
  • Warn (300-302)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (21)
install/helm-repo/argocd-agent-agent/values.schema.json (1)

332-382: Schema structure for networkPolicy.redis is well-formed and permissive for selectors.

The nested object structure correctly allows additionalProperties: true on agentSelector and redisSelector, which is appropriate for Kubernetes label selectors that may include custom labels beyond the documented app.kubernetes.io/name key.

cmd/argocd-agent/principal.go (3)

89-98: LGTM!

The Redis TLS configuration variables are well-organized with a clear comment header, follow consistent naming conventions, and use appropriate types.


420-422: Verify: Redis TLS default value vs PR objective.

The PR title states "redis TLS encryption enabled by default for all connections", but --redis-tls-enabled defaults to false here.

If the intent is for Redis TLS to be enabled by default in production, verify that this is handled in the Helm charts or Kubernetes manifests rather than the CLI defaults. If the CLI should also default to TLS enabled, this would need to change to true.


471-471: Significant timeout increase noted.

The secret retrieval timeout increased from 2 seconds to 30 seconds. This improves reliability for slow Kubernetes API responses, but extends the startup failure time if secrets are misconfigured or unavailable. The tradeoff seems reasonable for production environments.

test/e2e/sync_test.go (1)

371-373: Pre-sync hook Job name alignment looks good

Updating the hook Job name to "before" in both tests keeps the termination assertions intact, assuming the manifest in test/data/pre-sync uses the same name. No issues from the test logic perspective.

Also applies to: 465-468

principal/tracker/tracking.go (1)

75-78: Appropriate concurrency fix for request-response pattern.

The buffered channel (capacity 1) correctly prevents deadlock when the sender and receiver operate asynchronously in goroutines. This is the standard pattern for 1:1 request-response scenarios where exactly one response is expected per tracked request.

Verify these assumptions in code review:

  1. Each tracked event receives at most one response (no multiple sends to the same channel)
  2. StopTracking is always called to close the channel and prevent resource leaks
  3. The sender handles scenarios where the channel might be closed before sending
agent/agent.go (2)

323-343: TLS configuration for cluster cache looks well-structured.

The TLS configuration properly:

  • Sets minimum TLS version to 1.2
  • Logs a warning for insecure mode (addressing the previous review comment)
  • Loads and validates CA certificates when a path is provided
  • Returns clear error messages on failure

445-460: Improved startup logic for cluster cache info updates.

Sending an initial update immediately on startup (before waiting for the first ticker interval) improves the time-to-first-sync. The unified code path for both managed and autonomous modes simplifies maintenance.

hack/dev-env/start-e2e.sh (1)

50-59: Static localhost addresses and fixed REDIS_PASSWORD handling look good.

The switch to localhost-based addresses for TLS certificate validation is appropriate for E2E tests. The REDIS_PASSWORD retrieval is now correctly separated into declaration and export (addressing the previous shellcheck warning).

internal/argocd/cluster/cluster.go (2)

135-142: Good defensive initialization of ConnectionState.

Initializing ConnectionState when it doesn't exist yet prevents nil-related issues and provides meaningful status for newly connected agents. The timestamp uses time.Now() which is appropriate since this represents the moment the cache stats update was received.


176-184: TLS configuration properly wired to Redis client.

The tlsConfig parameter is correctly passed through to the Redis client options. This follows the pattern established in the relevant code snippet and integrates cleanly with the existing cache creation logic.

hack/dev-env/configure-argocd-redis-tls.sh (1)

56-70: Idempotency checks and patching pattern looks reasonable for E2E/dev use.

The script properly checks for existing configuration before applying patches, preventing duplicate volumes/mounts/args. The 2>/dev/null || true pattern handles edge cases gracefully for a development script.

test/e2e/fixture/cluster.go (3)

206-217: Generous timeouts and connection pool settings are appropriate for E2E tests.

The extended timeouts (10s dial, 30s read) and retry configuration help handle port-forward latency and test environment variability. The pool size of 10 with min/max idle settings is reasonable for concurrent test load.


180-201: InsecureSkipVerify is acceptable for E2E tests with appropriate comment.

The comment clearly documents that this is for E2E test simplicity. For production code, CA certificate validation would be required (which is implemented elsewhere in this PR).


320-326: Good use of environment variable overrides for local development.

The MANAGED_AGENT_REDIS_ADDR and ARGOCD_PRINCIPAL_REDIS_SERVER_ADDRESS environment variables allow developers to use port-forwarding with localhost addresses while the production code uses service discovery. This aligns with the static localhost addresses exported in start-e2e.sh.

Also applies to: 380-386

principal/redisproxy/redisproxy.go (2)

159-183: TLS listener startup branching is correct

The Start method cleanly switches between tls.Listen and plain net.Listen based on rp.tlsEnabled, with appropriate error logging and success messages; this matches the new TLS configuration surface.


221-270: Connection handling change to use method receiver is fine

Switching handleConnection to call rp.establishConnectionToPrincipalRedis (method receiver) instead of a standalone function keeps Redis proxy state encapsulated without altering behavior.

test/e2e/fixture/fixture.go (1)

109-155: Bounded deletion waits improve test robustness

Capping EnsureDeletion and WaitForDeletion at 120×1s iterations gives deterministic test-time behavior and avoids potential infinite waits on stuck resources; the structure of the retry loops looks correct.

Also applies to: 160-171

test/e2e/README.md (1)

21-108: E2E flow and Redis TLS documentation are clear and consistent

The restructured steps (environment setup, optional reverse tunnel, start processes, run tests) and the explicit Redis TLS section align well with the new scripts and Procfile; the notes about InsecureSkipVerify being test‑fixture only are also clear.

hack/dev-env/Procfile.e2e (1)

1-7: Centralizing port-forwards in Procfile avoids conflicts

Having pf-* entries own the Redis and argocd-server port‑forwards and starting principal/agents afterward resolves the previous “double port‑forward on 6380” issue and gives a clear, reproducible startup model for make start-e2e.

hack/dev-env/configure-redis-tls.sh (1)

1-246: Redis TLS configuration script is robust and idempotent for dev/E2E use

The script cleanly validates cert material, preserves/restores kube context, scales down dependent Argo CD components, creates the argocd-redis-tls secret, patches the deployment with volumes/mounts, and replaces args to enable TLS‑only on 6379. The volume/mount existence checks make it safe to re‑run, which is valuable during iterative E2E setup.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from 4a2ee37 to dd9cf85 Compare December 4, 2025 15:39
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
docs/configuration/redis-tls.md (1)

149-156: Tag remaining fenced blocks with a language to satisfy markdownlint.

As flagged in a previous review, these code blocks still need language tags. Apply text to the diagram and script output blocks.

- **How the tunnel works:**
-  ```
+**How the tunnel works:**
+  ```text
   Argo CD Server (remote vcluster) 
       → rathole Deployment (remote) 
       → rathole Container (local Mac) 
       → Principal process (local Mac)
-  ```
+  ```

The same fix applies to lines 475, 486, and 504.

🧹 Nitpick comments (3)
test/e2e/redis_proxy_test.go (1)

120-124: The hardcoded sleep is a pragmatic workaround, but consider documenting the root cause.

The 5-second delay to wait for Redis SUBSCRIBE propagation is a reasonable workaround for the race condition. The comment explains the issue well.

If this race condition is specific to the test setup, it might be worth adding a TODO to investigate whether the subscription can be verified more deterministically in the future:

 	// Wait for SSE stream to fully establish and Redis SUBSCRIBE to propagate
 	// This prevents a race condition where the pod is deleted before the subscription is active
+	// TODO: Consider implementing a more deterministic check for subscription readiness
 	t.Log("Waiting for SSE stream to fully establish...")
 	time.Sleep(5 * time.Second)
principal/redisproxy/redisproxy.go (1)

853-894: Consider logging a warning when server TLS is enabled but upstream TLS is not configured.

The upstream TLS connection is only established when rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) (line 853). This means if the Redis proxy server has TLS enabled for incoming connections but no upstream TLS configuration is provided, it will connect to the principal's Redis over plain TCP within the cluster.

While this may be intentional for some deployment scenarios (e.g., trusting internal cluster network), it creates an inconsistent security posture that operators should be aware of.

Consider adding a warning log when this configuration mismatch occurs:

+	// Warn if server TLS is enabled but no upstream TLS is configured
+	hasUpstreamTLSConfig := rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure
+	if rp.tlsEnabled && !hasUpstreamTLSConfig {
+		logCtx.Warn("Redis proxy server has TLS enabled for incoming connections, but upstream connection to principal Redis will be unencrypted")
+	}
+
 	// If TLS is enabled for upstream, wrap the connection with TLS
-	if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
+	if rp.tlsEnabled && hasUpstreamTLSConfig {

This improves operator awareness without enforcing a specific security policy.

hack/dev-env/configure-redis-tls.sh (1)

199-215: Add error handling to args patch.

The args patch lacks error checking (||exit 1) present in other patches (lines 153, 162, 183, 192). If this critical patch fails, the deployment won't have TLS configured, but the script reports success.

Add error handling to the args patch:

 kubectl patch deployment argocd-redis -n ${NAMESPACE} --type='json' -p='[
   {
     "op": "replace",
     "path": "/spec/template/spec/containers/0/args",
     "value": [
       "--save", "",
       "--appendonly", "no",
       "--requirepass", "$(REDIS_PASSWORD)",
       "--tls-port", "6379",
       "--port", "0",
       "--tls-cert-file", "/app/tls/tls.crt",
       "--tls-key-file", "/app/tls/tls.key",
       "--tls-ca-cert-file", "/app/tls/ca.crt",
       "--tls-auth-clients", "no"
     ]
   }
-]'
+]' || { echo "Failed to patch Redis args"; exit 1; }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a2ee37 and dd9cf85.

📒 Files selected for processing (28)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/e2e/sync_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
  • principal/resource.go
  • principal/listen.go
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • hack/dev-env/start-principal.sh
  • hack/dev-env/start-agent-autonomous.sh
  • hack/dev-env/configure-argocd-redis-tls.sh
  • test/e2e/rp_test.go
  • test/e2e/fixture/argoclient.go
  • test/e2e/clusterinfo_test.go
  • Makefile
  • hack/dev-env/start-e2e.sh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/e2e/README.md
  • test/run-e2e.sh
  • hack/dev-env/start-agent-managed.sh
  • hack/dev-env/Procfile.e2e
🧬 Code graph analysis (5)
principal/tracker/tracking.go (2)
internal/event/event.go (1)
  • Event (112-115)
internal/logging/logfields/logfields.go (1)
  • Event (34-34)
cmd/argocd-agent/principal.go (3)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/options.go (6)
  • WithRedisTLSEnabled (493-498)
  • WithRedisServerTLSFromPath (501-507)
  • WithRedisServerTLSFromSecret (510-520)
  • WithRedisUpstreamTLSInsecure (543-548)
  • WithRedisUpstreamTLSCAFromFile (523-528)
  • WithRedisUpstreamTLSCAFromSecret (531-540)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
hack/dev-env/configure-redis-tls.sh (1)
hack/dev-env/setup-vcluster-env.sh (1)
  • apply (94-247)
agent/agent.go (2)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
principal/redisproxy/redisproxy.go (3)
internal/logging/logfields/logfields.go (3)
  • Config (127-127)
  • Error (101-101)
  • Client (37-37)
principal/listen.go (1)
  • Listener (54-60)
internal/logging/logging.go (2)
  • Error (305-307)
  • Warn (300-302)
🪛 markdownlint-cli2 (0.18.1)
docs/getting-started/kubernetes/index.md

229-229: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Lint Go code
  • GitHub Check: Run unit tests
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Build & cache Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (42)
test/run-e2e.sh (2)

104-115: Verify environment variable names match test code expectations.

Environment variables are exported for local development with inconsistent naming conventions: REDIS_SERVER_ADDRESS vs. REDIS_ADDR suffixes, and inconsistent AGENT prefixes. Confirm these names match what the test code actually consumes.


81-102: The lsof -i :6380 -i :6381 -i :6382 syntax is standard and correct. Multiple -i flags are ORed by default in lsof (matching any of the specified ports), which is the intended behavior—the script warns if none of the three ports are detected. The proposed fix using && operators would incorrectly require all three ports to be listening, changing the detection logic. No action needed.

principal/tracker/tracking.go (1)

75-78: Verify that buffered capacity 1 is sufficient and handles edge cases correctly.

The change from an unbuffered to a buffered channel addresses potential deadlocks when the sender runs before the receiver is ready. However, verify:

  1. One event per request: Confirm that each tracked request receives exactly one response event, ensuring capacity 1 is sufficient.
  2. Closed channel handling: Ensure processRedisEventResponse doesn't send on a closed channel if StopTracking is called prematurely (would cause a panic).
  3. Abandoned channel cleanup: Verify that sendSynchronousRedisMessageToAgent always consumes from the channel or that proper timeout/cleanup mechanisms exist to prevent goroutine leaks.
test/e2e/sync_test.go (1)

371-371: Verify the Job name "before" matches the test data definition.

The pre-sync hook Job name has been updated to "before" in both Test_TerminateOperationManaged (line 371) and Test_TerminateOperationAutonomous (line 466). Confirm this name matches the actual Job resource defined in test/data/pre-sync, since the tests use this name to verify Job cleanup within a 120-second timeout—a mismatch will cause test failures.

Also note: This change appears unrelated to the PR's Redis TLS enablement objectives and may warrant a separate commit for clarity.

Also applies to: 466-466

hack/dev-env/Procfile.e2e (1)

1-7: LGTM! Well-structured E2E process configuration.

The port-forward setup correctly maps Redis ports for each vcluster (6380-6382 → 6379) and the argocd-server port (8444 → 443). The sleep delays appropriately sequence the startup to allow port-forwards to establish before services start.

docs/getting-started/kubernetes/index.md (2)

159-230: Clear and comprehensive Redis TLS setup documentation.

The certificate generation steps are secure (4096-bit RSA keys, appropriate SANs). The warning about Redis TLS being required and the step-by-step kubectl patches are well-documented.


337-381: Good approach to workload cluster TLS setup.

Using distinct file names (redis-workload.key/crt) while reusing the same CA is a clean pattern that prevents confusion. The instructions maintain consistency with the control-plane setup in Section 2.4.

docs/configuration/redis-tls.md (2)

677-697: Solid security best practices section.

The recommendations for strong keys (4096-bit RSA), certificate rotation, and the explicit warning against insecure options in production are valuable. Good callout about using readOnly: true for volume mounts.


1-50: Comprehensive and well-organized Redis TLS documentation.

The documentation provides clear guidance from quick start through production deployment, with thorough troubleshooting. The architecture diagram effectively illustrates the TLS configuration points.

test/e2e/README.md (2)

21-108: Clear and practical E2E test documentation updates.

The multi-terminal workflow is well-explained, and the distinction between local/remote cluster setups helps users understand when reverse tunnel is needed. The Redis TLS section appropriately documents that TLS is mandatory and provides manual reconfiguration steps.


107-108: Good clarification on InsecureSkipVerify usage.

The documentation correctly explains that InsecureSkipVerify: true is used only in test fixtures for cross-environment compatibility, while TLS encryption remains fully enabled. This aligns with the PR description's request for feedback on this approach.

test/e2e/redis_proxy_test.go (3)

187-208: Improved message draining logic with proper retry semantics.

The messagesDrained flag correctly tracks whether any messages were processed, and the drain-all-then-retry pattern is more robust than checking one message at a time. This should reduce flaky test failures.


210-238: Good resilience added to ResourceTree verification.

Wrapping the ResourceTree call in Eventually with proper error handling addresses transient Redis EOF errors that can occur during TLS connection resets. The 30-second timeout with 2-second intervals provides adequate retries.


586-653: Well-configured HTTP transport for SSE streams.

The transport settings are appropriate:

  • Buffered channel (100) prevents message loss during processing
  • Timeout: 0 is correct for long-lived SSE connections
  • IdleConnTimeout: 300s keeps connections alive for extended test runs
  • InsecureSkipVerify: true is documented in the E2E README as test-only behavior
internal/argocd/cluster/cluster.go (3)

18-18: LGTM!

The crypto/tls import is necessary for the new TLS configuration parameter added to NewClusterCacheInstance.


135-142: LGTM!

The initialization of ConnectionState when it doesn't exist provides a sensible default when cluster cache stats are received before an explicit connection status update. This improves agent connection tracking.


176-184: LGTM!

The TLS configuration is properly integrated into the Redis client initialization. The signature change is consistent with the broader TLS enablement across the codebase.

agent/agent.go (3)

19-23: LGTM!

The new imports are necessary for TLS configuration and CA certificate loading from files.


323-343: LGTM!

The TLS configuration logic is well-structured:

  • Properly handles insecure mode with appropriate warning
  • Loads and validates CA certificates from file
  • Provides clear error messages on failure

345-349: LGTM!

The TLS configuration is correctly passed to the cluster cache instance creation, matching the updated signature.

cmd/argocd-agent/principal.go (3)

90-97: LGTM!

The Redis TLS flag variables are well-named and cover all necessary configuration options for both server and upstream TLS.


419-440: LGTM!

The CLI flag definitions are comprehensive and follow consistent naming conventions. Environment variable support is properly integrated with sensible defaults.


471-471: LGTM!

The increased timeout (30s) for TLS configuration retrieval from Kubernetes is reasonable and aligns with the broader TLS enablement changes.

principal/redisproxy/redisproxy.go (3)

21-27: LGTM!

The TLS-related fields are well-structured with clear separation between server (incoming connections) and upstream (outgoing connections) TLS configurations. The comments provide good context.

Also applies to: 65-76


98-154: LGTM!

The TLS configuration methods are well-designed:

  • Clean public API for both server and upstream TLS
  • Proper handling of both file-based and memory-based certificates
  • Appropriate error handling and minimum TLS version

157-183: LGTM!

The Start() method cleanly handles both TLS and non-TLS modes with appropriate logging and error handling.

hack/dev-env/start-agent-managed.sh (4)

37-46: LGTM!

The Redis TLS detection logic is appropriate for a development script, with helpful guidance when certificates are not found.


48-62: LGTM!

The Redis address configuration is well-documented with helpful comments explaining the localhost default and port-forward requirements for TLS validation.


63-74: LGTM!

The mTLS certificate extraction properly retrieves client certificates and CA from Kubernetes secrets. The use of temporary files is appropriate for a development script.


76-90: LGTM!

The agent startup command properly includes all TLS-related arguments (client certificates, Redis TLS, etc.) in a logical order.

hack/dev-env/gen-redis-tls-certs.sh (4)

14-26: LGTM!

The CA generation logic is idempotent and uses strong cryptographic parameters (4096-bit RSA). The conditional generation prevents overwriting existing certificates.


28-58: LGTM!

The certificate generation pattern is well-structured and idempotent. The use of extension files for Subject Alternative Names follows modern OpenSSL practices.


67-90: LGTM!

The local IP detection and conditional SAN addition is well-handled. The script correctly avoids adding an empty IP entry when local IP detection fails, which was a previously identified issue.


138-150: LGTM!

The cleanup of temporary files is appropriate, and the success message provides a helpful summary of all generated certificates.

test/e2e/fixture/fixture.go (4)

109-112: LGTM!

The timeout increases (60s → 120s) are appropriate for TLS-enabled E2E tests, which may experience additional latency from TLS handshakes and port-forwarding in the test environment.

Also applies to: 143-143, 160-160


231-240: LGTM!

The error handling changes improve test cleanup resilience by continuing cleanup even when individual deletions fail. This is appropriate for test teardown where partial cleanup is preferable to complete failure, and warnings ensure issues are still visible.

Also applies to: 256-265, 277-278, 290-291, 312-324, 344-356, 371-373


235-240: LGTM!

The DeepCopy calls properly prevent mutation of loop variables when modifying objects for deletion waiting. This is correct and defensive programming.

Also applies to: 260-265, 316-324, 349-356


457-470: LGTM!

The non-fatal cluster info reset is appropriate for test cleanup scenarios where Redis may be unavailable (e.g., port-forward terminated). Using the cached cache instance is consistent with the broader caching pattern introduced in cluster.go.

test/e2e/fixture/cluster.go (4)

19-22: LGTM!

The new imports and TLS-enabled flags appropriately extend the test fixture to support TLS-enabled Redis configurations.

Also applies to: 44-51


181-224: LGTM!

The TLS configuration with InsecureSkipVerify is appropriately documented for E2E tests. The connection tuning parameters (timeouts, pool sizing, retries) are generous and suitable for handling port-forwarding latency in test environments.


226-267: LGTM!

The Redis client caching mechanism properly prevents connection leaks in E2E tests:

  • Thread-safe with mutex protection
  • Cache key includes address for proper isolation
  • Exported cleanup function for test suite teardown
  • Reasonable reliance on GC for connection cleanup in test code

299-327: LGTM!

The Redis configuration retrieval logic is robust:

  • Comprehensive fallback chain (LoadBalancer → LoadBalancerIP → ClusterIP)
  • TLS always enabled for E2E tests
  • Environment variable overrides support local development workflows
  • Clear error messages document the fallback chain

Also applies to: 360-387

Comment on lines +273 to +287
if redisUpstreamTLSInsecure && redisUpstreamTLSCAPath != "" {
cmdutil.Fatal("Cannot specify both --redis-upstream-tls-insecure and --redis-upstream-ca-path")
}

// Redis upstream TLS (for connections to principal's argocd-redis)
if redisUpstreamTLSInsecure {
logrus.Warn("INSECURE: Not verifying upstream Redis TLS certificate")
opts = append(opts, principal.WithRedisUpstreamTLSInsecure(true))
} else if redisUpstreamTLSCAPath != "" {
logrus.Infof("Loading Redis upstream CA certificate from file %s", redisUpstreamTLSCAPath)
opts = append(opts, principal.WithRedisUpstreamTLSCAFromFile(redisUpstreamTLSCAPath))
} else {
logrus.Infof("Loading Redis upstream CA certificate from secret %s/%s", namespace, redisUpstreamTLSCASecretName)
opts = append(opts, principal.WithRedisUpstreamTLSCAFromSecret(kubeConfig.Clientset, namespace, redisUpstreamTLSCASecretName, "tls.crt"))
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add validation for all upstream TLS mode combinations.

The current validation only checks mutual exclusivity between --redis-upstream-tls-insecure and --redis-upstream-ca-path (lines 273-275). However, users could also specify both --redis-upstream-ca-path and --redis-upstream-ca-secret-name simultaneously, which would silently prefer the file-based approach without warning.

Consider expanding the validation to cover all three modes:

// Validate upstream TLS configuration - only one mode allowed
modesSet := 0
if redisUpstreamTLSInsecure {
    modesSet++
}
if redisUpstreamTLSCAPath != "" {
    modesSet++
}
if redisUpstreamTLSCASecretName != "" {
    modesSet++
}
if modesSet > 1 {
    cmdutil.Fatal("Only one Redis upstream TLS mode can be specified: --redis-upstream-tls-insecure, --redis-upstream-ca-path, or --redis-upstream-ca-secret-name")
}

This aligns with the mutual exclusivity pattern used for server cert/key validation (lines 265-266).

🤖 Prompt for AI Agents
In cmd/argocd-agent/principal.go around lines 273 to 287, the current check only
prevents specifying --redis-upstream-tls-insecure together with
--redis-upstream-ca-path, but does not reject combinations that include
--redis-upstream-ca-secret-name; update validation to count/validate all three
upstream TLS mode flags (redisUpstreamTLSInsecure, redisUpstreamTLSCAPath,
redisUpstreamTLSCASecretName) and call cmdutil.Fatal with a clear message if
more than one is set so only one mode can be specified (insecure, CA file, or CA
secret).

Comment on lines +70 to +76
kubectl config use-context ${CONTEXT}

# Check Redis Deployment exists
if ! kubectl get deployment argocd-redis -n ${NAMESPACE} &>/dev/null; then
echo "Error: argocd-redis Deployment not found in namespace ${NAMESPACE}"
exit 1
fi
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Verify context switch succeeded before proceeding.

If kubectl config use-context fails (context doesn't exist or kubectl error), the script continues and may operate on the wrong cluster, causing configuration on an unintended environment.

Add error checking after the context switch:

 # Switch context
 echo "Switching to context: ${CONTEXT}"
-kubectl config use-context ${CONTEXT}
+kubectl config use-context ${CONTEXT} || { echo "Error: Failed to switch to context ${CONTEXT}"; exit 1; }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In hack/dev-env/configure-redis-tls.sh around lines 70 to 76, the script runs
"kubectl config use-context ${CONTEXT}" but does not check whether that command
succeeded; add an immediate check of the command's exit status (or verify
current-context equals ${CONTEXT}) and if it failed print a clear error and exit
non‑zero before continuing, so the subsequent kubectl operations cannot run
against the wrong cluster.

Comment on lines +199 to +215
kubectl patch deployment argocd-redis -n ${NAMESPACE} --type='json' -p='[
{
"op": "replace",
"path": "/spec/template/spec/containers/0/args",
"value": [
"--save", "",
"--appendonly", "no",
"--requirepass", "$(REDIS_PASSWORD)",
"--tls-port", "6379",
"--port", "0",
"--tls-cert-file", "/app/tls/tls.crt",
"--tls-key-file", "/app/tls/tls.key",
"--tls-ca-cert-file", "/app/tls/ca.crt",
"--tls-auth-clients", "no"
]
}
]'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix $(REDIS_PASSWORD) interpolation in Redis args.

Line 206 contains "$(REDIS_PASSWORD)" which won't be evaluated in a JSON patch—it's passed literally to Redis as a password string, causing authentication to fail.

Interpolate the environment variable in the script before creating the patch:

-# Update Redis args for TLS
-kubectl patch deployment argocd-redis -n ${NAMESPACE} --type='json' -p='[
+# Update Redis args for TLS
+REDIS_PASSWORD="${REDIS_PASSWORD:-}"  # Use env var or empty
+if [ -z "$REDIS_PASSWORD" ]; then
+    echo "Warning: REDIS_PASSWORD not set; password auth will not be configured"
+fi
+kubectl patch deployment argocd-redis -n ${NAMESPACE} --type='json' -p='[
   {
     "op": "replace",
     "path": "/spec/template/spec/containers/0/args",
     "value": [
       "--save", "",
       "--appendonly", "no",
-      "--requirepass", "$(REDIS_PASSWORD)",
+      "--requirepass", "'"${REDIS_PASSWORD}"'",
       "--tls-port", "6379",
       "--port", "0",
       "--tls-cert-file", "/app/tls/tls.crt",
       "--tls-key-file", "/app/tls/tls.key",
       "--tls-ca-cert-file", "/app/tls/ca.crt",
       "--tls-auth-clients", "no"
     ]
   }
-]'
+]' || { echo "Failed to update Redis args"; exit 1; }

Alternatively, retrieve the password from the existing Redis secret or deployment if one is already configured.

Comment on lines +62 to +66
if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | grep -q "tls-port"; then
echo "ERROR: Redis Deployment in ${CONTEXT} is not configured with TLS!"
echo "Please run: ./hack/dev-env/configure-redis-tls.sh ${CONTEXT}"
exit 1
fi
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Unreliable Redis TLS detection using text grep on JSON output.

Using grep -q "tls-port" to validate TLS configuration is fragile and error-prone:

  • Text matching on JSON can produce false positives if "tls-port" appears in comments or descriptions
  • If field naming differs (e.g., camelCase: tlsPort), the check silently fails without reporting the actual state
  • No confirmation that the field is in the expected location within the deployment spec

Replace text matching with proper JSON parsing:

-        if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | grep -q "tls-port"; then
+        if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | jq -e '.spec.template.spec.containers[].ports[] | select(.name == "tls-port")' >/dev/null 2>&1; then
             echo "ERROR: Redis Deployment in ${CONTEXT} is not configured with TLS!"
🤖 Prompt for AI Agents
In test/run-e2e.sh around lines 62–66, the current TLS check uses a fragile text
grep on kubectl JSON output; replace it with a robust JSON parse using jq to
validate the field exists in the expected location. Specifically, run kubectl -n
argocd get deployment argocd-redis -o json and use jq to check for the TLS port
definition under .spec.template.spec.containers[].ports[].name == "tls-port"
(and also accept variants like "tlsPort" or "tls_port" if your deployment uses
those), or alternatively check for a container arg/env/port field that indicates
TLS is enabled; if the jq test fails, print the relevant JSON snippet for
debugging and exit 1. Ensure errors are explicit (include the snippet or value
found) and avoid simple text grep on the whole JSON output.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from dd9cf85 to a349781 Compare December 4, 2025 16:39
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
cmd/argocd-agent/principal.go (1)

258-288: Harden upstream Redis TLS mode validation to cover all flag combinations

The overall Redis TLS wiring looks good: server-side TLS validates cert/key pairing and falls back to a secret, and upstream TLS correctly selects between insecure mode, CA file, and CA secret. However, the validation currently only rejects --redis-upstream-tls-insecure together with --redis-upstream-ca-path, while combinations involving --redis-upstream-ca-secret-name (e.g., CA path + CA secret, or insecure + CA secret) are still allowed and silently prefer one mode.

To make configuration failures explicit and align with the stricter server TLS validation, consider validating that at most one upstream TLS mode is set (insecure, CA file, or CA secret):

-				// Validate upstream TLS configuration - insecure and CA path are mutually exclusive
-				if redisUpstreamTLSInsecure && redisUpstreamTLSCAPath != "" {
-					cmdutil.Fatal("Cannot specify both --redis-upstream-tls-insecure and --redis-upstream-ca-path")
-				}
+				// Validate upstream TLS configuration - only one mode may be specified
+				modesSet := 0
+				if redisUpstreamTLSInsecure {
+					modesSet++
+				}
+				if redisUpstreamTLSCAPath != "" {
+					modesSet++
+				}
+				if redisUpstreamTLSCASecretName != "" {
+					modesSet++
+				}
+				if modesSet > 1 {
+					cmdutil.Fatal("Only one Redis upstream TLS mode can be specified: --redis-upstream-tls-insecure, --redis-upstream-ca-path, or --redis-upstream-ca-secret-name")
+				}

This keeps the existing selection logic intact while preventing ambiguous configurations from being silently accepted.

hack/dev-env/configure-argocd-redis-tls.sh (1)

309-312: Fix replica guard logic to enforce minimum of 1 replica.

The current shell operator precedence causes the replica guards to fail. The expression [ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1" is parsed as cond1 || (cond2 && assignment), which means:

  • If REPO_SERVER_REPLICAS="0", the first test succeeds and short-circuits, so the assignment never executes
  • Only empty values get set to "1"

This contradicts the comment "Ensure we have at least 1 replica" and could lead to components being scaled to 0 replicas during E2E tests.

Apply this diff to fix the logic:

-# Ensure we have at least 1 replica
-[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
-[ "$CONTROLLER_REPLICAS" = "0" ] || [ -z "$CONTROLLER_REPLICAS" ] && CONTROLLER_REPLICAS="1"
-[ "$SERVER_REPLICAS" = "0" ] || [ -z "$SERVER_REPLICAS" ] && SERVER_REPLICAS="1"
+# Ensure we have at least 1 replica
+if [ -z "$REPO_SERVER_REPLICAS" ] || [ "$REPO_SERVER_REPLICAS" = "0" ]; then
+  REPO_SERVER_REPLICAS="1"
+fi
+if [ -z "$CONTROLLER_REPLICAS" ] || [ "$CONTROLLER_REPLICAS" = "0" ]; then
+  CONTROLLER_REPLICAS="1"
+fi
+if [ -z "$SERVER_REPLICAS" ] || [ "$SERVER_REPLICAS" = "0" ]; then
+  SERVER_REPLICAS="1"
+fi
test/run-e2e.sh (2)

33-45: Validate all required certificate files, not just ca.crt.

The script only checks for ca.crt but does not validate that server.crt and server.key exist. If these files are missing, tests will fail downstream with cryptic TLS errors.

Apply this diff:

 # Check if Redis TLS certificates exist
-if [ ! -f "${REDIS_TLS_DIR}/ca.crt" ]; then
+if [ ! -f "${REDIS_TLS_DIR}/ca.crt" ] || [ ! -f "${REDIS_TLS_DIR}/server.crt" ] || [ ! -f "${REDIS_TLS_DIR}/server.key" ]; then
     echo "ERROR: Redis TLS certificates not found!"
     echo ""
     echo "Redis TLS is REQUIRED for E2E tests (security requirement)."
     echo ""
     echo "Please run the following commands:"
     echo "  ./hack/dev-env/gen-redis-tls-certs.sh"

62-66: Replace text grep with proper JSON parsing for TLS validation.

Using grep -q "tls-port" on JSON output is fragile:

  • Text matching can produce false positives if "tls-port" appears in unexpected locations
  • Does not confirm the field is in the correct location within the deployment spec
  • Provides no debugging information when validation fails

Replace with robust JSON parsing:

-        if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | grep -q "tls-port"; then
+        if ! kubectl --context="${CONTEXT}" -n argocd get deployment argocd-redis -o json 2>/dev/null | jq -e '.spec.template.spec.containers[].ports[] | select(.name == "tls-port")' >/dev/null 2>&1; then
             echo "ERROR: Redis Deployment in ${CONTEXT} is not configured with TLS!"
             echo "Please run: ./hack/dev-env/configure-redis-tls.sh ${CONTEXT}"
             exit 1
         fi
🧹 Nitpick comments (3)
principal/listen.go (2)

174-196: Inconsistent log formatting and unrelated changes.

Several issues with the new logging statements:

  1. Emoji in production logs (line 174): The "🔧" emoji may not render correctly in all log aggregation systems and is non-standard for production logging.
  2. Leading whitespace (lines 176, 196): Messages like " WebSocket is ENABLED" and " gRPC server.Serve() exited" have leading spaces, creating inconsistent formatting compared to other log statements.
  3. Disconnect from PR objectives: This PR is focused on enabling Redis TLS encryption by default, but these changes add WebSocket and gRPC server startup logging, which appears unrelated to the stated objectives.

Apply this diff to fix the formatting issues:

-	log().WithField("enableWebSocket", s.enableWebSocket).Info("🔧 Checking if WebSocket is enabled")
+	log().WithField("enableWebSocket", s.enableWebSocket).Info("Checking if WebSocket is enabled")
 	if s.enableWebSocket {
-		log().Info(" WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
+		log().Info("WebSocket is ENABLED - using downgrading HTTP handler instead of native gRPC")
 		opts := []grpchttp1server.Option{grpchttp1server.PreferGRPCWeb(true)}
 
 		downgradingHandler := grpchttp1server.CreateDowngradingHandler(s.grpcServer, http.NotFoundHandler(), opts...)
@@ -193,7 +193,7 @@
 		go func() {
 			log().Info("Starting gRPC server.Serve() - server is now accepting connections")
 			err = s.grpcServer.Serve(s.listener.l)
-			log().WithError(err).Warn(" gRPC server.Serve() exited")
+			log().WithError(err).Warn("gRPC server.Serve() exited")
 			errch <- err
 		}()
 	}

224-230: Service registration logging appears unrelated to PR objectives.

While these logging statements improve observability during service initialization, they appear disconnected from the PR's core objective of enabling Redis TLS encryption by default. Consider whether these observability improvements belong in a separate PR focused on logging enhancements.

test/e2e/fixture/cluster.go (1)

259-267: Consider explicitly closing Redis connections in cleanup.

The current cleanup simply recreates the map and relies on garbage collection to close connections. While this works, explicitly closing the underlying Redis clients would provide more deterministic cleanup.

Consider this enhancement:

 func CleanupRedisCachedClients() {
 	cachedRedisClientMutex.Lock()
 	defer cachedRedisClientMutex.Unlock()
 
 	fmt.Printf("Cleaning up %d cached Redis clients\n", len(cachedRedisClients))
-	// Clear the cache map - connections will be garbage collected
+	// Note: Explicit close is not available on appstatecache.Cache
+	// Connections will be garbage collected when map is cleared
 	cachedRedisClients = make(map[string]*appstatecache.Cache)
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd9cf85 and a349781.

📒 Files selected for processing (28)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/e2e/sync_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • hack/dev-env/start-agent-autonomous.sh
🚧 Files skipped from review as they are similar to previous changes (9)
  • principal/resource.go
  • test/e2e/clusterinfo_test.go
  • principal/tracker/tracking.go
  • hack/dev-env/Procfile.e2e
  • hack/dev-env/start-principal.sh
  • hack/dev-env/gen-redis-tls-certs.sh
  • test/e2e/README.md
  • hack/dev-env/configure-redis-tls.sh
  • test/e2e/sync_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • hack/dev-env/start-e2e.sh
  • hack/dev-env/start-agent-managed.sh
  • Makefile
  • test/run-e2e.sh
  • test/e2e/rp_test.go
🧬 Code graph analysis (5)
test/e2e/fixture/fixture.go (2)
internal/backend/interface.go (1)
  • Namespace (124-127)
test/e2e/fixture/cluster.go (2)
  • AgentManagedName (37-37)
  • AgentClusterServerURL (39-39)
principal/listen.go (1)
internal/logging/logging.go (2)
  • Info (295-297)
  • Warn (300-302)
agent/agent.go (3)
internal/logging/logfields/logfields.go (1)
  • Config (127-127)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
principal/redisproxy/redisproxy.go (2)
internal/logging/logfields/logfields.go (3)
  • Config (127-127)
  • Error (101-101)
  • Client (37-37)
internal/logging/logging.go (3)
  • Error (305-307)
  • Warn (300-302)
  • Trace (285-287)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
  • GetArgoCDServerEndpoint (315-337)
  • GetInitialAdminSecret (302-313)
  • NewArgoClient (52-66)
🪛 markdownlint-cli2 (0.18.1)
docs/configuration/redis-tls.md

150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


475-475: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


486-486: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


504-504: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/getting-started/kubernetes/index.md

229-229: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Run unit tests
  • GitHub Check: Lint Go code
  • GitHub Check: Build & cache Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (30)
install/helm-repo/argocd-agent-agent/values.schema.json (3)

302-330: Well-structured Redis TLS schema with clear configuration options.

The redisTLS property cleanly encapsulates TLS-related configuration (enable/disable, CA path/secret, insecure mode). Descriptions are informative, and the anyOf pattern for enabled and insecure allows both native boolean values and string representations for broader compatibility.


332-383: NetworkPolicy schema correctly models Kubernetes policy structure.

The networkPolicy property provides appropriate control over network policies for the agent and Redis. The nested structure for selectors (agentSelector, redisSelector) properly supports Kubernetes label matching while allowing flexibility via additionalProperties: true. The scoping and descriptions are clear.


302-330:

The type inconsistency between redisTLS fields (using anyOf for string/boolean) and networkPolicy.enabled (using boolean only) was already flagged in a prior review and marked as addressed. Skipping duplicate comment; developer can investigate commit 6247404 if clarification is needed on the design rationale.

cmd/argocd-agent/principal.go (3)

90-98: Redis TLS option variables are coherent with intended configuration

The newly added Redis TLS variables are clearly named and match the later flag wiring and option usage; no issues here.


419-441: Redis TLS flags and env wiring look consistent

Flag names, env variable keys, defaults (notably the shared argocd-redis-tls secret), and help strings are all consistent with the new Redis TLS behavior; no issues from a CLI/config surface perspective.


471-471: Increasing resource proxy TLS secret fetch timeout to 30s is reasonable

Extending the timeout to 30 seconds for loading proxy TLS material from Kubernetes is a safe change and should better tolerate slow API servers without introducing new correctness risks.

test/e2e/fixture/argoclient.go (1)

316-336: LGTM! Environment variable override improves test flexibility.

The addition of the ARGOCD_SERVER_ADDRESS environment variable check before falling back to Kubernetes API queries is a good optimization for test environments. The fallback logic is preserved correctly, ensuring backward compatibility.

hack/dev-env/start-agent-managed.sh (1)

37-90: LGTM! Redis TLS and mTLS configuration properly integrated.

The script correctly:

  • Detects Redis TLS certificates and provides helpful guidance when missing
  • Sets appropriate defaults for local development with clear documentation
  • Extracts mTLS certificates from Kubernetes secrets
  • Passes all necessary TLS arguments to the agent startup

The explicit comments about port-forward requirements are particularly helpful for developers.

docs/configuration/redis-tls.md (1)

1-700: Excellent comprehensive Redis TLS documentation.

This documentation provides thorough coverage of Redis TLS configuration including:

  • Clear architecture diagrams and TLS configuration points
  • Step-by-step quick start for development/testing
  • Detailed certificate management guidance
  • Complete Kubernetes installation instructions
  • Comprehensive troubleshooting section with common issues and solutions
  • Security best practices

The documentation is well-structured with a table of contents and clear separation between development/testing and production scenarios.

test/e2e/fixture/fixture.go (2)

109-171: LGTM! Extended timeouts improve resilience for TLS-enabled Redis.

The timeout increase from 60 to 120 seconds in EnsureDeletion and WaitForDeletion is appropriate for TLS-enabled Redis connections, which may have slightly higher latency during connection establishment and teardown.


200-462: LGTM! Non-fatal cleanup warnings prevent cascading test failures.

The changes to log warnings instead of returning errors during cleanup are appropriate for handling transient issues like port-forward failures. Key improvements:

  • Uses DeepCopy() to avoid mutating loop variables (lines 235, 260, 317, 350)
  • Logs warnings for cleanup failures instead of failing the entire test
  • Gracefully handles Redis unavailability during cluster info reset (lines 457-461)

This makes the test suite more robust in environments with flaky port-forwards or temporary connectivity issues.

test/e2e/rp_test.go (1)

162-169: LGTM! Refactoring to fixture helpers improves consistency.

The refactoring to use fixture.GetArgoCDServerEndpoint and fixture.GetInitialAdminSecret eliminates code duplication and centralizes the logic for retrieving test credentials. This aligns with the environment variable override capability added to the fixture helpers.

Also applies to: 295-305

docs/getting-started/kubernetes/index.md (2)

159-229: LGTM! Clear Redis TLS setup instructions with proper warnings.

The new section 2.4 provides comprehensive Redis TLS setup guidance:

  • Clear warning that Redis TLS is required
  • Step-by-step certificate generation with appropriate SANs
  • Deployment patching commands
  • Verification steps
  • Note about automatic TLS configuration in manifests

The instructions are well-structured and include all necessary details for setting up Redis TLS on the control plane.


337-381: LGTM! Workload cluster Redis TLS setup mirrors control plane.

Section 4.4 appropriately repeats the Redis TLS setup for workload clusters with a clear note to reuse the same CA from Step 2.4. The instructions maintain consistency with the control plane setup while properly scoping the certificate generation to the workload cluster context.

hack/dev-env/start-e2e.sh (1)

50-61: LGTM! Static localhost addresses enable TLS certificate validation.

The use of static localhost addresses with fixed ports is appropriate for E2E tests because:

  • localhost is included in the Redis TLS certificate SANs
  • Port-forwards (managed by goreman) provide stable local endpoints
  • Enables proper TLS certificate validation during tests

The Redis password retrieval correctly separates assignment from export, addressing the previous shellcheck warning.

hack/dev-env/configure-argocd-redis-tls.sh (1)

1-342: Overall script design is solid for E2E Redis TLS configuration.

The script provides comprehensive Redis TLS configuration for Argo CD components:

  • Idempotent volume and volumeMount additions with existence checks
  • Clear error messages and exit codes
  • Appropriate handling of different cluster contexts (control-plane vs agent)
  • Graceful scaling with rollout status waits

The replica guard logic issue aside, the script structure and approach are well-designed for the E2E test environment.

agent/agent.go (2)

323-343: LGTM! TLS configuration properly implemented.

The TLS config construction for the cluster cache correctly handles:

  • Warning log when InsecureSkipVerify is enabled (matching principal code)
  • CA certificate loading with clear error messages
  • Proper certificate pool validation

445-460: LGTM! Immediate startup update improves observability.

Sending cluster cache info immediately on startup (before the first ticker interval) ensures the principal receives agent state promptly, improving observability and reducing the delay in initial metrics.

internal/argocd/cluster/cluster.go (2)

175-191: LGTM! TLS integration properly implemented.

The signature change to NewClusterCacheInstance and TLS configuration wiring are correct. The TLSConfig is properly passed through to the Redis client options.


135-142: LGTM! Defensive initialization of ConnectionState.

Initializing ConnectionState when absent ensures cluster info is properly set even when the agent sends cache stats before connection status is explicitly set, preventing nil-reference issues.

test/e2e/fixture/cluster.go (2)

181-201: LGTM! InsecureSkipVerify acceptable for E2E tests.

Using InsecureSkipVerify: true in E2E tests is appropriate given the dynamic LoadBalancer addresses in test environments. The PR objectives explicitly mention this trade-off to retain TLS encryption while accommodating test infrastructure limitations.

Based on learnings, test fixtures under test/ directories do not require production-grade security hardening.


298-327: LGTM! Comprehensive address resolution with TLS enforcement.

The multi-level fallback approach (LoadBalancer → spec.loadBalancerIP → ClusterIP) handles various deployment scenarios well. TLS enforcement and environment variable overrides for local development are appropriate for E2E tests.

Also applies to: 359-387

principal/redisproxy/redisproxy.go (3)

98-128: LGTM! Clean and composable TLS configuration API.

The public setter methods provide a clear and flexible API for configuring TLS. Separating in-memory and file-based certificate configuration is appropriate, and validation is deferred to createServerTLSConfig where it's needed.


130-154: LGTM! Robust TLS configuration with proper error handling.

The createServerTLSConfig method correctly handles both file-based and in-memory certificates, with clear error messages and appropriate TLS version constraints (min TLS 1.2).


852-894: LGTM! TLS handshake and upstream connection properly implemented.

The upstream TLS implementation correctly:

  • Handles InsecureSkipVerify with warning log
  • Supports CA certificate pool from memory or file
  • Configures SNI based on hostname
  • Performs explicit handshake with error handling

Note: A past review suggested warning when server TLS is enabled but upstream TLS is not configured (to avoid unencrypted connections within the cluster). This remains a potential enhancement but is not blocking.

test/run-e2e.sh (1)

81-115: LGTM! macOS development support with helpful warnings.

The port-forward detection and environment variable setup provide a good developer experience for local testing. The non-blocking warning allows CI environments (with MetalLB) to proceed normally.

test/e2e/redis_proxy_test.go (4)

120-123: LGTM! Wait period reduces race condition with subscription activation.

The 5-second wait after SSE stream establishment allows Redis SUBSCRIBE commands to fully propagate before the pod is deleted, reducing race conditions in the test.

Also applies to: 326-329


188-208: LGTM! Message draining improves test reliability.

The enhanced message-handling logic drains all available messages before retrying, preventing false negatives when messages arrive in bursts. The extended 120-second timeout accommodates realistic network latency.

Also applies to: 407-427


211-237: LGTM! Retry logic handles transient Redis connection issues.

Wrapping ResourceTree calls in Eventually blocks with logging handles transient EOF errors and Redis connection issues gracefully, improving test stability in distributed environments.

Also applies to: 430-456


642-653: LGTM! HTTP client configuration optimized for SSE streams.

The HTTP transport settings (no overall timeout, extended idle timeout, connection pooling) are appropriate for long-lived SSE streams. InsecureSkipVerify: true is acceptable for E2E tests.

@Rizwana777 Rizwana777 force-pushed the issue-8091-redis-tls-config branch from a349781 to 6b246bf Compare December 5, 2025 12:31
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (5)
hack/dev-env/configure-argocd-redis-tls.sh (1)

301-311: Replica guard still fails to enforce minimum of 1 replica

The guard:

# Ensure we have at least 1 replica
[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
[ "$CONTROLLER_REPLICAS" = "0" ] || [ -z "$CONTROLLER_REPLICAS" ] && CONTROLLER_REPLICAS="1"
[ "$SERVER_REPLICAS" = "0" ] || [ -z "$SERVER_REPLICAS" ] && SERVER_REPLICAS="1"

is parsed as cond1 || (cond2 && assign), so if a value is "0" the first test succeeds and the assignment never runs. That means components can be scaled back to 0 replicas, contradicting the comment and breaking the “scale back up after TLS config” intent.

Use explicit if blocks so both "0" and empty values are normalized to "1":

-# Ensure we have at least 1 replica
-[ "$REPO_SERVER_REPLICAS" = "0" ] || [ -z "$REPO_SERVER_REPLICAS" ] && REPO_SERVER_REPLICAS="1"
-[ "$CONTROLLER_REPLICAS" = "0" ] || [ -z "$CONTROLLER_REPLICAS" ] && CONTROLLER_REPLICAS="1"
-[ "$SERVER_REPLICAS" = "0" ] || [ -z "$SERVER_REPLICAS" ] && SERVER_REPLICAS="1"
+# Ensure we have at least 1 replica
+if [ -z "$REPO_SERVER_REPLICAS" ] || [ "$REPO_SERVER_REPLICAS" = "0" ]; then
+  REPO_SERVER_REPLICAS="1"
+fi
+if [ -z "$CONTROLLER_REPLICAS" ] || [ "$CONTROLLER_REPLICAS" = "0" ]; then
+  CONTROLLER_REPLICAS="1"
+fi
+if [ -z "$SERVER_REPLICAS" ] || [ "$SERVER_REPLICAS" = "0" ]; then
+  SERVER_REPLICAS="1"
+fi

This matches the comment and guarantees components are brought back to at least one replica for TLS-enabled testing.

cmd/argocd-agent/principal.go (1)

90-98: Upstream Redis TLS modes still aren’t fully mutually exclusive

Within the redisTLSEnabled block you only guard against --redis-upstream-tls-insecure being combined with --redis-upstream-ca-path:

if redisUpstreamTLSInsecure && redisUpstreamTLSCAPath != "" {
    cmdutil.Fatal("Cannot specify both --redis-upstream-tls-insecure and --redis-upstream-ca-path")
}

But there are three upstream TLS modes:

  • --redis-upstream-tls-insecure
  • --redis-upstream-ca-path
  • --redis-upstream-ca-secret-name

Users can still specify conflicting combinations such as insecure+secret or ca-path+secret; the current if/else if/else chain will silently prefer one, dropping the others.

To make configuration predictable (and consistent with the strict pairing you already enforce for server cert/key), validate that at most one mode is set before applying the options, e.g.:

if redisTLSEnabled {
    // Validate upstream TLS configuration - only one mode allowed
    modesSet := 0
    if redisUpstreamTLSInsecure {
        modesSet++
    }
    if redisUpstreamTLSCAPath != "" {
        modesSet++
    }
    if redisUpstreamTLSCASecretName != "" {
        modesSet++
    }
    if modesSet > 1 {
        cmdutil.Fatal("Only one Redis upstream TLS mode can be specified: --redis-upstream-tls-insecure, --redis-upstream-ca-path, or --redis-upstream-ca-secret-name")
    }

    // existing server TLS + upstream TLS selection logic...
}

This prevents ambiguous configurations and aligns the upstream TLS behavior with the rest of the principal’s TLS validation.

Also applies to: 258-288, 419-441

hack/dev-env/start-e2e.sh (1)

50-59: Fail fast if Redis password lookup fails.

If the kubectl get secret or base64 --decode step fails, REDIS_PASSWORD will be empty but the script will still start the E2E stack, leading to confusing Redis auth failures later.

Recommend checking the command result and the value before exporting:

-REDIS_PASSWORD=$(kubectl get secret argocd-redis --context=vcluster-agent-managed -n argocd -o jsonpath='{.data.auth}' | base64 --decode)
-export REDIS_PASSWORD
+REDIS_PASSWORD=$(kubectl get secret argocd-redis \
+  --context=vcluster-agent-managed \
+  -n argocd \
+  -o jsonpath='{.data.auth}' | base64 --decode)
+if [ -z "${REDIS_PASSWORD}" ]; then
+  echo "Error: Failed to retrieve Redis password from argocd-redis secret in vcluster-agent-managed/argocd" >&2
+  exit 1
+fi
+export REDIS_PASSWORD

This makes Redis auth problems surface immediately when starting E2E.

hack/dev-env/configure-redis-tls.sh (1)

198-215: Fix literal "$(REDIS_PASSWORD)" in Redis args patch (TLS + auth will break).

Inside the JSON patch, "$(REDIS_PASSWORD)" is single-quoted, so the shell never expands the environment variable. Redis will literally be configured with the password $(REDIS_PASSWORD), which won’t match the secret and will break all authenticated connections.

You should expand REDIS_PASSWORD before or during patch construction and (optionally) warn if it’s unset. For example:

-# Update Redis args for TLS
-kubectl patch deployment argocd-redis -n ${NAMESPACE} --type='json' -p='[
+# Update Redis args for TLS
+REDIS_PASSWORD="${REDIS_PASSWORD:-}"
+if [ -z "${REDIS_PASSWORD}" ]; then
+    echo "Warning: REDIS_PASSWORD not set; Redis will be configured without a usable password value"
+fi
+
+kubectl patch deployment argocd-redis -n ${NAMESPACE} --type='json' -p='[
   {
     "op": "replace",
     "path": "/spec/template/spec/containers/0/args",
     "value": [
       "--save", "",
       "--appendonly", "no",
-      "--requirepass", "$(REDIS_PASSWORD)",
+      "--requirepass", "'"${REDIS_PASSWORD}"'",
       "--tls-port", "6379",
       "--port", "0",
       "--tls-cert-file", "/app/tls/tls.crt",
       "--tls-key-file", "/app/tls/tls.key",
       "--tls-ca-cert-file", "/app/tls/ca.crt",
       "--tls-auth-clients", "no"
     ]
   }
 ]'

You may also want to hard‑fail if REDIS_PASSWORD is empty to avoid silently misconfiguring Redis in dev/e2e.

principal/redisproxy/redisproxy.go (1)

836-897: Warn (or fail) when proxy TLS is enabled but upstream TLS is not, to avoid silent plaintext hops.

In establishConnectionToPrincipalRedis, upstream TLS is only used when:

if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
    // wrap with TLS
}

If tlsEnabled is true but no upstream TLS config is provided, the proxy will:

  • Terminate TLS from Argo CD on the proxy, but
  • Connect to principal Redis over plain TCP,

creating a surprising “TLS‑terminated at proxy only” hop that contradicts the PR goal of “Redis TLS encryption enabled by default for all connections”.

Consider making this mismatch explicit, e.g.:

-    // If TLS is enabled for upstream, wrap the connection with TLS
-    if rp.tlsEnabled && (rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure) {
+    hasUpstreamTLSConfig := rp.upstreamTLSCA != nil || rp.upstreamTLSCAPath != "" || rp.upstreamTLSInsecure
+
+    if rp.tlsEnabled && !hasUpstreamTLSConfig {
+        logCtx.Warn("Redis proxy TLS is enabled, but no upstream TLS configuration provided; connection to principal Redis will be unencrypted")
+    }
+
+    // If TLS is enabled for upstream, wrap the connection with TLS
+    if rp.tlsEnabled && hasUpstreamTLSConfig {
         tlsConfig := &tls.Config{
             MinVersion: tls.VersionTLS12,
         }
         // ... existing CA / InsecureSkipVerify logic ...

Optionally, you might also allow hasUpstreamTLSConfig to trigger TLS even when tlsEnabled is false, if you anticipate scenarios where only the proxy→Redis hop should be encrypted.

🧹 Nitpick comments (4)
hack/dev-env/start-agent-managed.sh (1)

63-74: Consider cleanup for temporary credential files.

The mTLS certificates are extracted to /tmp files but never cleaned up. While acceptable for development, consider adding a trap to remove these files on script exit:

 TLS_CERT_PATH="/tmp/agent-managed-tls.crt"
 TLS_KEY_PATH="/tmp/agent-managed-tls.key"
 ROOT_CA_PATH="/tmp/agent-managed-ca.crt"
+
+# Cleanup temp files on exit
+trap 'rm -f "${TLS_CERT_PATH}" "${TLS_KEY_PATH}" "${ROOT_CA_PATH}"' EXIT
+
 kubectl --context vcluster-agent-managed -n argocd get secret argocd-agent-client-tls \

This prevents credential accumulation in /tmp and follows security best practices.

test/e2e/fixture/fixture.go (1)

109-171: Cleanup robustness improvements are reasonable; consider minor hardening

The extended 120s deletion waits and the shift to warning-only errors in CleanUp plus use of DeepCopy() for principal/managed Application and AppProject waits all improve e2e stability without changing production behavior. One small follow-up you might consider (optional) is:

  • Guarding resetManagedAgentClusterInfo against a nil clusterDetails to make it safer if CleanUp is ever reused outside BaseSuite.SetupSuite.
  • If deletion timing keeps growing, factoring the “spin for up to N seconds with 1s sleep” pattern into a helper that can use context deadlines instead of manual counters.

These are non-blocking and the current changes are fine for e2e usage.

Also applies to: 218-291, 295-375, 457-471

hack/dev-env/start-agent-autonomous.sh (1)

37-47: Redis TLS and mTLS wiring in dev script looks correct; consider ephemeral key files

The script correctly:

  • Detects the Redis TLS CA and enables --redis-tls-enabled/--redis-tls-ca-path.
  • Defaults --redis-addr to a localhost port-forward.
  • Extracts agent client cert/key/CA and passes them via --tls-client-cert/--tls-client-key/--root-ca-path.

For local dev this is fine. As an optional hardening, you could write the TLS material to mktemp paths and trap a cleanup (rm) on exit to avoid leaving private keys in /tmp across runs.

Also applies to: 48-62, 63-75, 79-83

test/e2e/redis_proxy_test.go (1)

120-137: SSE stream and Redis proxy e2e reliability improvements look solid

The added 5s post-connect delay, buffered msgChan with “drain all messages” semantics, and the Eventually-wrapped ResourceTree calls with logging should significantly reduce flakes from subscription races and transient EOFs. The SSE client transport is correctly tuned for long-lived streams (no global timeout, longer idle) and using InsecureSkipVerify is acceptable here given these are TLS-only e2e tests, not production code.

Also applies to: 184-237, 326-337, 402-457, 584-670

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a349781 and 6b246bf.

📒 Files selected for processing (28)
  • Makefile (1 hunks)
  • agent/agent.go (3 hunks)
  • cmd/argocd-agent/principal.go (4 hunks)
  • docs/configuration/redis-tls.md (1 hunks)
  • docs/getting-started/kubernetes/index.md (3 hunks)
  • hack/dev-env/Procfile.e2e (1 hunks)
  • hack/dev-env/configure-argocd-redis-tls.sh (1 hunks)
  • hack/dev-env/configure-redis-tls.sh (1 hunks)
  • hack/dev-env/gen-redis-tls-certs.sh (1 hunks)
  • hack/dev-env/start-agent-autonomous.sh (1 hunks)
  • hack/dev-env/start-agent-managed.sh (1 hunks)
  • hack/dev-env/start-e2e.sh (1 hunks)
  • hack/dev-env/start-principal.sh (2 hunks)
  • install/helm-repo/argocd-agent-agent/values.schema.json (1 hunks)
  • internal/argocd/cluster/cluster.go (3 hunks)
  • principal/listen.go (3 hunks)
  • principal/redisproxy/redisproxy.go (5 hunks)
  • principal/resource.go (1 hunks)
  • principal/tracker/tracking.go (1 hunks)
  • test/e2e/README.md (1 hunks)
  • test/e2e/clusterinfo_test.go (2 hunks)
  • test/e2e/fixture/argoclient.go (2 hunks)
  • test/e2e/fixture/cluster.go (9 hunks)
  • test/e2e/fixture/fixture.go (11 hunks)
  • test/e2e/redis_proxy_test.go (6 hunks)
  • test/e2e/rp_test.go (2 hunks)
  • test/e2e/sync_test.go (2 hunks)
  • test/run-e2e.sh (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • principal/resource.go
🚧 Files skipped from review as they are similar to previous changes (8)
  • principal/listen.go
  • install/helm-repo/argocd-agent-agent/values.schema.json
  • test/run-e2e.sh
  • internal/argocd/cluster/cluster.go
  • test/e2e/clusterinfo_test.go
  • principal/tracker/tracking.go
  • Makefile
  • test/e2e/README.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T10:37:28.008Z
Learnt from: chetan-rns
Repo: argoproj-labs/argocd-agent PR: 632
File: test/data/pre-sync/pre-sync-job.yaml:2-17
Timestamp: 2025-10-31T10:37:28.008Z
Learning: In the argocd-agent repository, Kubernetes manifests under test/data/ directories are test fixtures for e2e testing and do not require production-grade security hardening such as securityContext settings for runAsNonRoot or allowPrivilegeEscalation.

Applied to files:

  • test/e2e/rp_test.go
  • hack/dev-env/start-agent-managed.sh
  • hack/dev-env/Procfile.e2e
  • hack/dev-env/start-e2e.sh
🧬 Code graph analysis (4)
test/e2e/rp_test.go (1)
test/e2e/fixture/argoclient.go (3)
  • GetArgoCDServerEndpoint (315-337)
  • GetInitialAdminSecret (302-313)
  • NewArgoClient (52-66)
cmd/argocd-agent/principal.go (4)
agent/options.go (1)
  • WithRedisTLSEnabled (112-117)
principal/options.go (6)
  • WithRedisTLSEnabled (493-498)
  • WithRedisServerTLSFromPath (501-507)
  • WithRedisServerTLSFromSecret (510-520)
  • WithRedisUpstreamTLSInsecure (543-548)
  • WithRedisUpstreamTLSCAFromFile (523-528)
  • WithRedisUpstreamTLSCAFromSecret (531-540)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/env/env.go (2)
  • BoolWithDefault (30-39)
  • StringWithDefault (46-55)
agent/agent.go (3)
internal/logging/logfields/logfields.go (1)
  • Config (127-127)
internal/logging/logging.go (1)
  • Warn (300-302)
internal/argocd/cluster/cluster.go (1)
  • NewClusterCacheInstance (176-192)
principal/redisproxy/redisproxy.go (2)
principal/listen.go (1)
  • Listener (54-60)
internal/logging/logging.go (3)
  • Error (305-307)
  • Warn (300-302)
  • Trace (285-287)
🪛 markdownlint-cli2 (0.18.1)
docs/getting-started/kubernetes/index.md

229-229: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Run end-to-end tests
  • GitHub Check: Run unit tests
  • GitHub Check: Lint Go code
  • GitHub Check: Build & cache Go code
  • GitHub Check: Build and push image
  • GitHub Check: Analyze (go)
🔇 Additional comments (22)
test/e2e/fixture/argoclient.go (1)

316-336: LGTM!

The environment variable check is a clean optimization that avoids unnecessary Kubernetes API calls when the server address is explicitly provided. The fallback logic is preserved, maintaining backward compatibility.

hack/dev-env/start-agent-managed.sh (3)

37-46: LGTM!

The Redis TLS detection logic is clean and provides helpful user guidance. Checking for the CA certificate presence is the right approach to determine whether TLS should be enabled.


48-61: LGTM!

The Redis address handling with sensible defaults and clear port-forward instructions is well-designed for local development workflows.


76-90: LGTM!

The startup command properly integrates the Redis TLS arguments with existing mTLS configuration. The variable expansion and flag ordering are correct.

hack/dev-env/start-principal.sh (2)

23-29: LGTM!

The Redis address defaulting is correctly implemented. As noted in the past review, this script no longer starts its own port-forward, avoiding conflicts with Procfile.e2e while providing a sensible default for TLS-friendly connections.


42-62: LGTM!

The Redis TLS configuration properly checks for all required certificate files and constructs the appropriate arguments. The inline comments about SANs (localhost, rathole-container-internal, local IP) are helpful for understanding the certificate requirements.

agent/agent.go (2)

323-343: LGTM!

The TLS configuration for cluster cache Redis is well-implemented:

  • Proper TLS 1.2 minimum version
  • Warning log for insecure mode (addresses past review feedback)
  • Clean CA certificate loading with descriptive error messages
  • Appropriate error handling

443-460: LGTM!

The updated cluster cache info logic is an improvement:

  • Immediate update on startup provides faster feedback
  • Consistent behavior for both agent modes
  • Proper cleanup with ticker.Stop()
test/e2e/fixture/cluster.go (5)

180-201: LGTM!

Using InsecureSkipVerify: true for E2E tests is acceptable to accommodate dynamic LoadBalancer addresses (as noted in the PR objectives). The TLS encryption is retained, which still provides value for testing the TLS code paths.


206-218: LGTM!

The generous timeout and connection pool settings are appropriate for E2E test environments, especially considering the port-forward latency mentioned in the comments. The retry configuration with exponential backoff is sensible.


298-327: LGTM!

The Redis address resolution with multiple fallbacks (LoadBalancer ingress → LoadBalancerIP → ClusterIP) is robust and handles various cluster configurations. The environment variable override for local development is a good addition. Setting ManagedAgentRedisTLSEnabled = true aligns with the PR objective of Redis TLS being required for E2E tests.


359-387: LGTM!

The principal Redis configuration mirrors the managed agent approach with the same robust fallback chain. Consistent behavior across both configurations is good for maintainability.


226-267: Verify cleanup function is invoked at test suite end.

The Redis client caching infrastructure prevents connection leaks. Confirm that CleanupRedisCachedClients() is called in your test suite's teardown or cleanup phase to ensure cached connections are properly released.

test/e2e/sync_test.go (1)

369-378: LGTM!

The pre-sync hook Job name correction from "pre-post-sync-before" to "before" aligns the test expectations with the actual test fixture. This is a straightforward test maintenance update.

Also applies to: 464-473

docs/configuration/redis-tls.md (4)

1-68: LGTM!

The overview and architecture sections provide clear explanations of the three TLS configuration points (Redis Proxy Server TLS, Upstream Redis TLS, Agent Redis TLS) and how they fit together. The text-based architecture diagram effectively illustrates the flow.


70-247: LGTM!

The quick start and local development sections are comprehensive and practical:

  • Clear statement that Redis TLS is required for E2E tests
  • Well-structured explanations of local vcluster vs. remote vcluster setups
  • Reverse tunnel documentation addresses a real need
  • Manual testing steps align with the provided scripts

306-368: LGTM!

The configuration tables provide excellent reference documentation with all flags, environment variables, and defaults clearly documented. The distinction between Kubernetes/Helm defaults (true) and CLI defaults (false) is important and well-noted.


468-700: LGTM!

The troubleshooting section addresses real-world issues comprehensively:

  • Script output examples help users understand what to expect
  • Common problems (connection refused, certificate verification, port-forward instability) are well-covered
  • Security best practices provide appropriate guidance
  • Solutions are actionable and clear
hack/dev-env/Procfile.e2e (1)

1-7: Verify environment variables are set.

The Procfile uses environment variables MANAGED_AGENT_REDIS_ADDR and AUTONOMOUS_AGENT_REDIS_ADDR on lines 6-7 for agent startup. Ensure these are exported by the parent script (e.g., start-e2e or Makefile) before invoking this Procfile.

test/e2e/rp_test.go (1)

295-305: Fixture-based Argo endpoint and admin secret lookup looks good

Switching to fixture.GetArgoCDServerEndpoint and fixture.GetInitialAdminSecret keeps this suite consistent with other e2e tests and centralizes TLS/endpoint logic; the updated NewArgoClient usage is sound and maintains the existing behavior.

hack/dev-env/gen-redis-tls-certs.sh (1)

14-27: Redis TLS cert generation script is sound and addresses earlier concerns

The script now:

  • Lets openssl errors surface (no 2>/dev/null) under set -e.
  • Generates appropriate SANs for control-plane, proxy, and agent Redis instances.
  • Only appends IP.3 = ${LOCAL_IP} when LOCAL_IP is non-empty, avoiding invalid SANs.

This looks correct and is appropriately idempotent for dev/e2e usage.

Also applies to: 60-91, 105-136

docs/getting-started/kubernetes/index.md (1)

159-230: Redis TLS documentation is consistent with the tooling and default behavior

The new sections:

  • Clearly mark Redis TLS as required and walk through CA/server cert generation, argocd-redis-tls secret creation, and TLS enablement on both control-plane and workload clusters.
  • Reuse the same CA for workload Redis and mirror the deployment patches used by the dev/e2e scripts.
  • Cross-link to the dedicated Redis TLS configuration doc.

This aligns with the CLI flags and helper scripts added in this PR and provides a practical, copy-pastable path for users.

Also applies to: 337-381, 646-646

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Redis proxy should support TLS (inbound and outbound)

2 participants