-
Notifications
You must be signed in to change notification settings - Fork 462
MCO-1961: Rework MC's OSImageURL merge logic #5488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MCO-1961: Rework MC's OSImageURL merge logic #5488
Conversation
|
@pablintino: This pull request references MCO-1961 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
yuqi-zhang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally makes sense to me, one thought inline
pkg/controller/template/render.go
Outdated
| // Note: Previously, the OSImageURL was set here (as well as in the rendered MC) to facilitate | ||
| // overrides. Now, all the OSImageURL's from generated MCs are ignored and only user provided | ||
| // MCs with the OSImageURL set are considered during the merge process. | ||
| // mcfg.Spec.OSImageURL = ctrlcommon.GetDefaultBaseImageContainer(config.ControllerConfigSpec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking through the possibilities, I think this PR should be safe and gets us to a more clear state. The two issues I can think of with this change is:
- as the original comment states, for clusters that upgrade to a version with this change, their OSImageURL in any MCs with
GeneratedByControllerVersionAnnotationKeywill no longer get updated, essentially freezing them in time. Now, they don't actually get used anymore with your above change, but it may be confusing for a user to see a really old osimage referenced in MCs, and not be used. Perhaps we should add logic to explicitly disable that somewhere - with this change in place, new clusters will no longer have any osImageURL by default in any component MachineConfig. So if I e.g. had an OSImage override and now want to remove it, I can't immediately see from MCs what image I'll be going back to until the MCO renders it. This is probably fine since I doubt anyone actually needed that info, and we can point users to look at available osimagestreams instead with the new changes. Just wanted to leave that as a note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuqi-zhang I agree on both points. For the first one I don't have a better option than documenting how the image URLs work (maybe we can try to leverage the documentation of the upcoming dual-stream).
For the former, I've explicitly cleared the URL so upgrading clusters will reset their template MC's URLs.
0cf18d5 to
5844300
Compare
Template-generated MachineConfigs (e.g., 01-master-kubelet, 01-master-container-runtime) were setting OSImageURL on the final rendered MachineConfig, even though only user-provided MachineConfigs should be able to override this field. The root cause was that template generation explicitly sets OSImageURL on all generated MCs, and the merge logic (MergeMachineConfigs) was treating all MCs equally when determining the final OSImageURL value. This meant template-generated MCs would always propagate the base OS image URL to the rendered MC, making it impossible for the system to distinguish between a default value and an intentional override. This commit fixes the issue by modifying MergeMachineConfigs to skip any MachineConfig with the machineconfiguration.openshift.io/generated-by-controller-version annotation when evaluating OSImageURL and BaseOSExtensionsContainerImage overrides. This ensures that only user-provided MachineConfigs can override these fields, while still allowing template-generated MCs to have the field populated (which is necessary due to resourcemerge not blanking out previously-set values during upgrades). The same logic is applied to BaseOSExtensionsContainerImage for consistency.
5844300 to
9d52d68
Compare
| return ctrl.syncFailingStatus(cfg, err) | ||
| } | ||
|
|
||
| // TODO: To be removed when 4.22 is EOL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't we just be able to remove this in 4.23? Or is that what you meant here (4.22 EoL seems to imply the full 3-year(?) cycle)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuqi-zhang Well, I think this code should be in place till all the legacy supported clusters are migrated or EoL. I mean, if someone with a 4.12 cluster upgrades, I'll want to clear his MCs.
Cause, now thinking, is it mandatory to upgrade versions one by one? If so, yes, if they need to pass through 4.22, we can remove the code in 4.23.
| // clearImageURLs clears OSImageURL and BaseOSExtensionsContainerImage from template-generated | ||
| // MachineConfigs. This is a one-time migration operation to remove these fields from generated | ||
| // MachineConfigs, as they should only be set on rendered or user-provided MachineConfigs. | ||
| // The function uses sync.Once to ensure it runs only once per controller lifecycle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Controller lifecycle = each time the pod restarts in this case, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct!
|
@pablintino: This pull request references MCO-1961 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
|
/lgtm Thanks for the discussion and testing. Let's get this in for dual stream |
|
@yuqi-zhang: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pablintino, yuqi-zhang The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@pablintino: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
2c3a1eb
into
openshift:main
- What I did
Template-generated MachineConfigs (e.g., 01-master-kubelet, 01-master-container-runtime) were setting OSImageURL on the final rendered MachineConfig, even though only user-provided MachineConfigs should be able to override this field.
The root cause was that template generation explicitly sets OSImageURL on all generated MCs, and the merge logic (MergeMachineConfigs) was treating all MCs equally when determining the final OSImageURL value. This meant template-generated MCs would always propagate the base OS image URL to the rendered MC, making it impossible for the system to distinguish between a default value and an intentional override.
This commit fixes the issue by modifying MergeMachineConfigs to skip any MachineConfig with the
machineconfiguration.openshift.io/generated-by-controller-version annotation when evaluating OSImageURL and BaseOSExtensionsContainerImage overrides. This ensures that only user-provided MachineConfigs can override these fields, while still allowing template-generated MCs to have the field populated (which is necessary due to resourcemerge not blanking out previously-set values during upgrades).
The same logic is applied to BaseOSExtensionsContainerImage for consistency.
- How to verify it
Ideally, we would like to test that running clusters:
- Description for the changelog
Prevent template MachineConfigs from overriding OSImageURL