-
Notifications
You must be signed in to change notification settings - Fork 1.5k
CORS-3657: Default Azure Installs to Marketplace Images #10076
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@patrickdillon: This pull request references CORS-3657 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/cc @sadasu |
tthvo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool 👍 I just have a few code comments above.
pkg/asset/rhcos/image.go
Outdated
| azi = ext.Marketplace.Azure.NoPurchasePlan.Gen2.URN() | ||
| if gen == "V1" { | ||
| if mkt.Azure.NoPurchasePlan.Gen1 == nil { | ||
| return "", fmt.Errorf("a HyperVGeneration 1 instance was selected but no Gen1 marketplace imagge is available") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return "", fmt.Errorf("a HyperVGeneration 1 instance was selected but no Gen1 marketplace imagge is available") | |
| return "", fmt.Errorf("a HyperVGeneration 1 instance was selected but no Gen1 marketplace image is available") |
nit: typo 😁
e93be5a to
257c891
Compare
|
Feedback has been addressed. |
tthvo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Code looks good to m 🚀
|
@gpei For testing, I think the main areas not covered by these existing e2es are:
|
|
Thanks for the suggestions @patrickdillon, I ran a set of tests on features that I think are most likely affected by the boot image change. Besides the GovCloud and Gen 1 VM type (which you mentioned), these features also include the following:
Overall ResultsMost of the functions work correctly, including the two test points you specifically pointed out. During installation, the boot image specified in the
Worker Nodes with HyperV Gen1 SupportFor worker nodes using a type that only supports HyperV Gen1: Test Artifacts: machines.json The workers are provisioned from the Gen1 image correctly "image": {
"offer": "aro4",
"publisher": "azureopenshift",
"resourceID": "",
"sku": "aro_419",
"type": "MarketplaceNoPlan",
"version": "419.6.20250523"
}MAG (Azure Gov cloud) JobTest Artifacts: machines.json The master and worker machines are provisioned from the image specified in "image": {
"offer": "aro4",
"publisher": "azureopenshift",
"resourceID": "",
"sku": "419-v2",
"type": "MarketplaceNoPlan",
"version": "419.6.20250523"
}Failure scenariosCurrently, two test cases have been identified as problematic: one related to ConfidentialVM and the other to ASH. Issue #1: ConfidentialVMProblem DescriptionFor the ConfidentialVM testing on compute nodes, the worker machines failed to be provisioned because the image doesn't have ConfidentialVM setting enabled. Error Log Location: machine-controller.logError Message: I consulted @jinyunma about this error, and she told me that a previous bug (OCPBUGS-41300) that also had a similar error. We updated the image's Issue #2: Azure Stack Hub (ASH)Problem DescriptionAnother issue is on ASH. I noticed that the Error Log Location: machine-controller.logError Message: Failed Image Configuration in the worker machineset"image": {
"offer": "",
"publisher": "",
"resourceID": "/subscriptions/d751283a-64fa-401b-92a1-58f1750ac0a7/resourceGroups/ci-op-8sfli1mq-9cb48/providers/Microsoft.Compute/images/ci-op-8sfli1mq-9cb48-t2glj",
"sku": "",
"version": ""
}Working Image Configuration (Normal Nightly payload on ASH) in the worker machinesetWhile in a normal nightly payload ASH installation where the workers are successfully provisioned, the image configuration is: "image": {
"offer": "",
"publisher": "",
"resourceID": "/resourceGroups/ci-op-8bfzbv3z-0da13/providers/Microsoft.Compute/images/ci-op-8bfzbv3z-0da13-97mgz",
"sku": "",
"version": ""
}Notice: The ARM Architecture TestingCurrent StatusBecause the cluster-bot can't build ARM arch payload, so the ARM installation test can't be completed for now. We can only wait until this PR is merged into the nightly payload. While in the test results I can see so far, the bootstrap and master created by the installer correctly use the ARM image specified in the Output: Summary
|
|
/hold Looks like there's a bug in how the azure stack image is specified. Even more so, I think we are blocked until we can get confidential image support. Thanks for the awesome help @gpei |
257c891 to
6ec00c7
Compare
|
@gpei I checked into the confidential VM situation, and it would require separate marketplace images for both ConfidentialVM and Trusted Launch. So we are going to stick with image galleries for that scenario (and move everything else to marketplace). The latest commit SHOULD (I admit I have not tested) handle confidential VM, as well as a fix for azurestack. It looks like the e2e should cover azurestack. Can you test confidential VM? |
|
@patrickdillon Ack, thanks for the quick fix! |
|
/payload-job periodic-ci-openshift-openshift-tests-private-release-4.21-amd64-nightly-azure-ipi-confidentialvm-vmgueststateonly-mini-perm-f7 |
|
@gpei: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7528b8c0-c4f5-11f0-9ff3-80dfb6c3839d-0 |
|
/payload-job periodic-ci-openshift-openshift-tests-private-release-4.21-amd64-nightly-azure-ipi-confidential-trustedlaunch-mini-perm-f7 |
|
@gpei: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a65cda20-c4f5-11f0-9f5b-3e54ea60f0e6-0 |
|
/test e2e-azurestack |
|
Morning Patrick. It appears that the ASH issue has been fixed. I can see that the worker was successfully created in another test job. The ultimate failure is the API version incompatibility issue we already know about. Regarding Confidentialvm, in addition to the problem that Thuan pointed out that the custom image was not created during the confidentialvm installation, there might also be a potential permissions issue. I see the following error in azure-ipi-confidentialvm-vmgueststateonly-mini-perm job, which looks like missing But, this permission should have been granted in our CI when testing the Azure minimal permission installation, so this may need further investigation. |
6ec00c7 to
dad8a4e
Compare
|
/payload-job periodic-ci-openshift-verification-tests-main-installation-nightly-4.21-azure-ipi-confidentialvm-vmgueststateonly-no-mini-perm-f7 |
|
/payload-job periodic-ci-openshift-openshift-tests-private-release-4.21-amd64-nightly-azure-ipi-confidential-trustedlaunch-mini-perm-f7 |
|
@patrickdillon: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e96b2fa0-c712-11f0-912f-90c85796e079-0 |
|
/payload-job periodic-ci-openshift-openshift-tests-private-release-4.21-amd64-nightly-azure-ipi-confidentialvm-vmgueststateonly-f28-destructive |
|
@patrickdillon: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f97e0610-c712-11f0-90f3-77d70f3c8aca-0 |
tthvo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Waiting on testing 😁
Unfortunately we have a lot of big functions in the installer, and that is not likely to change. Therefore bumping the cyclomatic complexity threshold so the linter starts complaining at a threshold of 40 rather than 30. Also remove the tenv linter as it is deprecated.
Marketplace images do not support confidential VMs or trusted launch, so when machinesets use confidential VMs the installer will still create an image gallery compatible with the security settings.
This commit updates default value handling when loading the install config to set values in machine pools based on the defaultMachinePlatform. By populating the values directly in the install config, we can avoid repetitive checks throughout the codebase to ensure the default machine platform is applied to the relevant machine pool.
89e5f67 to
8fa9860
Compare
|
/payload-job periodic-ci-openshift-openshift-tests-private-release-4.21-amd64-nightly-azure-ipi-confidentialvm-vmgueststateonly-f28-destructive |
|
/payload-job periodic-ci-openshift-openshift-tests-private-release-4.21-amd64-nightly-azure-ipi-confidential-trustedlaunch-mini-perm-f7 |
|
@patrickdillon: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1af02bf0-c71a-11f0-9600-422088630755-0 |
|
@patrickdillon: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/253e91a0-c71a-11f0-9004-bb53ea4d7b62-0 |
tthvo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tthvo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test e2e-azurestack Job was unable to connect to the instance. If we can't confirm testing on this, let's move forward and we can fix in a followup bug. |
|
looks like azurestack reached install phase, will keep watching that but I'm going to go ahead and verify this to get it into the merge queue /verified by a group effort |
|
@patrickdillon: This PR has been marked as verified by In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/override ci/prow/e2e-aws-ovn rhel repo outage |
|
@patrickdillon: Overrode contexts on behalf of patrickdillon: ci/prow/e2e-aws-ovn In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Confidential and ASH jobs are using the correct images, all the master and worker machines were created as expected. |
6661dac
into
openshift:main
Switch to using Azure marketplace images by default. This will bypass the steps where the installer uploads a VHD and creates a managed image for the cluster, which will still happen on OKD (we are investigating whether a public Shared Image Gallery can be used to achieve the same results for OKD).
The code in the last commit is ripe for refactoring, but the refactoring would be substantial and this PR is already complex enough, so that will need to be a follow up.