Skip to content

AWS + Azure: add read-only sanity checks + AWS RI exchange#3

Open
crisjermaglasang wants to merge 5 commits intomainfrom
dev
Open

AWS + Azure: add read-only sanity checks + AWS RI exchange#3
crisjermaglasang wants to merge 5 commits intomainfrom
dev

Conversation

@crisjermaglasang
Copy link
Collaborator

@crisjermaglasang crisjermaglasang commented Feb 11, 2026

  • Adds v1 AWS & Azure deployment sanity checks (dry-run / read-only) with JSON reports + CI workflows.
  • Adds AWS Convertible RI exchange: quote-only by default; execute requires explicit ack + spend cap.

@crisjermaglasang crisjermaglasang changed the title Azure AWS + Azure: add read-only sanity checks + AWS RI exchange Feb 11, 2026
@cristim
Copy link
Member

cristim commented Feb 12, 2026

@crisjermaglasang thank you for your contribution, really appreciate it.

I asked Claude Code to help me review it and it came up with the below output, which I largely agree with.

Please address the below issues before we can merge this.

PR #3 Review: AWS + Azure read-only sanity checks + AWS RI exchange

+1025 / -1 across 17 files (3 commits)


Summary

Adds three features:

  1. AWS sanity checks — read-only STS/EC2/RDS probes with JSON report output
  2. Azure sanity checks — read-only az CLI probes with JSON report output
  3. AWS Convertible RI exchange — quote-only by default; execute requires --ack YES + spend cap

The RI exchange safety model is well-designed (quote-first, explicit ack, spend cap guardrail). The sanity check framework is clean and useful. However, there are several issues that should be addressed before merging.


Issues Found

Critical: Compiled binaries committed to the repo

Three compiled binaries are included in the diff:

  • sanity (root)
  • azure-sanity (root)
  • ri-exchange (root)

These should not be checked into git. Add them to .gitignore.

Critical: Sensitive data committed in report files

Three JSON report files contain real credentials/identifiers:

  • sanity_report.json — AWS account ID 816582314462, IAM ARN with username, IAM user ID AIDA34IAZJHPPYVORASPA
  • azure_sanity_report.json — Azure tenant ID, subscription ID, subscription name, user email
  • ri_exchange_quote.json — AWS account ID

These files are test artifacts that should be .gitignore-d, not committed. Even though the info is relatively low-risk, it's bad practice to commit cloud account identifiers to a public repo.

Critical: CI workflows won't trigger from current location

The GitHub Actions workflows are placed at:

ci_cd_sanity_tests/.github/workflows/aws_sanity.yml
ci_cd_sanity_tests/.github/workflows/azure_sanity.yml

GitHub only picks up workflows from .github/workflows/ at the repository root. These workflows will never run from their current location.

Major: Hardcoded AWS account ID in CI workflow

ci_cd_sanity_tests/.github/workflows/aws_sanity.yml line 16:

EXPECTED_ACCOUNT: "816582314462" # change per test account

This should reference a GitHub secret, not be hardcoded.

Major: Azure sanity uses shell-out + fragile string matching

ci_cd_sanity_tests/pkg/sanity/azure/azure.go shells out to az CLI and validates subscription/tenant IDs with strings.Contains:

if !strings.Contains(string(out), fmt.Sprintf(`"id": "%s"`, opts.ExpectedSubID)) {

This is fragile — it depends on exact JSON formatting from the CLI (key spacing, quote style). Since the rest of the codebase uses Azure SDK v2, the sanity checks should use it too, or at minimum use encoding/json to parse the output properly.

Minor: ci_cd_sanity_tests/providers/aws/client.go is unused

The Clients struct and NewClients function are never referenced. The AWS sanity checks (aws.go) create clients inline, and the RI exchange code (ri_exchange.go) has its own loadCfg helper. This file should either be used or removed.

Minor: No unit tests

Despite the PR being called "sanity tests", none of the Go packages include unit tests. The RI exchange logic in particular (spend cap comparison, decimal parsing, quote validation) would benefit from table-driven tests.

Minor: az account show called twice

In azure.go, az account show -o json is called once unconditionally (line 87), then again if ExpectedSubID or ExpectedTenantID is set (line 94). The second call could just reuse the output from the first.

Nit: go.mod change

The sts dependency moves from indirect to direct — this is correct since the new code directly imports sts.


What the PR Gets Right

  • RI exchange safety model is solid — quote-first flow, explicit --ack YES gating, and --max-payment-due-usd spend cap with big.Rat precision for financial comparison
  • Account assertion guard — both sanity and RI exchange verify the active AWS account matches expectations before proceeding
  • Clean report frameworkreport.Report is reusable across cloud providers with consistent JSON output
  • Read-only design — sanity checks are genuinely non-mutating (Describe/List calls only)

Verdict

Good concept, needs cleanup before merge. The committed binaries, sensitive report files, and misplaced CI workflows are the blockers. The Azure string-matching fragility and missing tests are secondary concerns.

Recommended: Request changes — remove binaries and report files, add them to .gitignore, move CI workflows to .github/workflows/, remove the hardcoded account ID, and delete the unused client.go.


🤖 Generated with Claude Code

@cristim
Copy link
Member

cristim commented Feb 18, 2026

Note: This review was generated by Claude (AI). Some observations may be inaccurate — please verify against the actual code.

@crisjermaglasang Here's a thorough review of this PR:

Summary

This PR adds three new capabilities under ci_cd_sanity_tests/:

  1. AWS sanity checks — read-only dry-run (STS identity, EC2/RDS list) with JSON report
  2. Azure sanity checks — read-only dry-run (account, resource groups, VMs) with JSON report
  3. AWS Convertible RI exchange — quote-only by default, execute requires explicit --ack YES + spend cap

Plus two CI workflows (.github/workflows/aws_sanity.yml, azure_sanity.yml) and .gitignore updates.


Critical Issues

1. Binary blobs permanently in git history

The first three commits added compiled Go binaries to the repo:

  • sanity (13 MB) in commit 7a3061c1
  • ri-exchange (12.4 MB) in commit 441e807a
  • azure-sanity (3.2 MB) in commit 44421f4a

These were deleted in commit f76602d7, but they're still permanently in git history, bloating the repo by ~28 MB. The .gitignore was updated to prevent future accidents, but the damage is done. Consider squashing or using git filter-repo before merging, or at minimum squash-merging this PR.

2. go.mod: sts moved from indirect to direct dependency

The only change to go.mod is moving github.com/aws/aws-sdk-go-v2/service/sts from // indirect to a direct require. This is correct since ci_cd_sanity_tests/pkg/sanity/aws/aws.go and ri_exchange.go import STS directly. However, go.sum doesn't appear to be updated in the PR diff — verify this compiles cleanly.


Security Concerns

3. Hardcoded AWS account ID in early commit

The initial commit (7a3061c1) had the workflow file with a hardcoded account ID:

EXPECTED_ACCOUNT: "816582314462" # change per test account

This was later changed to use ${{ secrets.AWS_EXPECTED_ACCOUNT_ID }} in the cleanup commit. However, the hardcoded value is still in git history. While AWS account IDs aren't secret per se, it's worth being aware of.

4. Azure sanity uses exec.CommandContext to shell out to az CLI

In ci_cd_sanity_tests/pkg/sanity/azure/azure.go, the Azure checks shell out to the az CLI via exec.CommandContext. Concerns:

  • No input sanitization on opts.SubscriptionID — it's passed directly to exec.Command args. While exec.Command doesn't invoke a shell (so no injection), the subscription ID could theoretically contain unusual characters.
  • Combined output is captured via cmd.CombinedOutput() and stored in the JSON report details. If stderr contains sensitive information (e.g., token errors), it ends up in the report artifact uploaded to GitHub.
  • Inconsistency: AWS uses the Go SDK; Azure shells out. Consider using the Azure SDK for Go instead for consistency, testability, and avoiding a runtime dependency on the az CLI.

Design & Code Quality

5. Dead code removed in cleanup

ci_cd_sanity_tests/providers/aws/client.go was added in the first commit and deleted in the cleanup commit. It was a thin wrapper creating EC2/RDS/STS clients that was never imported (aws.go creates clients inline). Clean history via squashing would eliminate this churn.

6. The report.go package is well-designed

Clean, minimal JSON report structure with PASS/FAIL/SKIP statuses, timestamps, and details. Good reuse between AWS and Azure. The HasFailures() method provides a simple exit-code mechanism. Nice work.

7. RI exchange has good safety guardrails

  • Quote-only by default (--execute flag required)
  • Explicit --ack YES required for execution
  • --max-payment-due-usd spend cap with big.Rat precision
  • Account ID verification before execution
  • Pre-exchange quote + validation check

This is solid design for a potentially costly operation.

8. Error handling in ri-exchange/main.go write() function

func write(v any, path string) {
    b, _ := json.MarshalIndent(v, "", "  ")
    _ = os.WriteFile(path, b, 0644)
}

Both errors are silently discarded. If the marshal or write fails, the user gets no indication. At minimum, log to stderr.

9. CI workflows run go test ./... — too broad

Both workflows run go test ./... -count=1 in the build step. This runs all tests in the repo, not just the sanity test code. This will be slow and could fail for reasons unrelated to the sanity checks. Consider scoping to go test ./ci_cd_sanity_tests/... -count=1.

10. No go.sum update

The PR moves STS from indirect to direct in go.mod but there's no corresponding go.sum change. This may cause go mod tidy to complain or the build to fail.


Minor Issues

11. File permissions

All new files under ci_cd_sanity_tests/ were added with 100755 (executable). Go source files should typically be 100644.

12. Commit message style inconsistency

  • Sanity_Tests for AWS — uses underscore, not conventional commit format
  • Other commits are descriptive but lack scope prefixes
  • Consider standardizing to feat(ci): ... or test(sanity): ...

13. Azure timeout has unexplained +30s padding

ctx, cancel := context.WithTimeout(context.Background(), time.Duration(*timeoutSec+30)*time.Second)

The extra 30s buffer beyond the user-specified timeout is undocumented. Should be documented or removed.

14. Sample output files in git history

sanity_report.json and ri_exchange_quote.json were committed and then deleted. They contained real AWS account data (account IDs, ARNs). Squashing would clean this up.


Recommendations

  1. Squash-merge this PR to eliminate binary blobs and sensitive data from history
  2. Add Azure SDK for Go instead of shelling out to az CLI
  3. Scope CI test runs to ./ci_cd_sanity_tests/... instead of ./...
  4. Fix error handling in the write() helper
  5. Fix file permissions to 644 for .go and .yml files
  6. Run go mod tidy to ensure go.sum is consistent

Overall the architecture is sound — the sanity check framework is clean and extensible, and the RI exchange has appropriate safety mechanisms. The main concerns are repo hygiene (binaries in history, file permissions) and the Azure CLI shelling approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments