add a script to verify PRR has all the sections #5685

kannon92 · 2025-11-09T02:29:44Z

One-line PR description: Add a script to verify PRR template.

Issue link:

Other comments:

k8s-ci-robot · 2025-11-09T02:29:51Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kannon92
Once this PR has been reviewed and has the lgtm label, please assign dims for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pohly · 2025-11-12T20:59:55Z

Didn't we already have such a proposal before? I'm not sure anymore and a quick search didn't find it.

kannon92 · 2025-11-13T14:43:04Z

Didn't we already have such a proposal before? I'm not sure anymore and a quick search didn't find it.

I haven't heard anything concrete.

The one I heard about was the move the KEP issue metadata to the KEP but we never acted on that.

pohly · 2025-11-17T10:35:05Z

hack/verify-prr-template.py

+
+# Define the expected PRR structure
+# Each subsection contains a list of required question patterns (first few words to match)
+PRR_STRUCTURE = {


Can you perhaps load this from the template file? Then there's a single source of truth for which sections are required.

Agreed. You could simplify this a lot by just using mdtoc (the tool we use to generate the TOC):

KEP_TEMPLATE="keps/NNNN-kep-template/README.md" PRR_FILTER='/^- \[Production Readiness Review Questionnaire\]/ {flag=1; print; next} /^- \[/ {flag=0} flag' TEMPLATE_PRR="$(mdtoc $KEP_TEMPLATE | awk "$PRR_FILTER")" KEP_PRR="$(mdtoc $KEP | awk "$PRR_FILTER")" diff <(echo "$TEMPLATE_PRR") <(echo "$KEP_PRR")

e.g. using KEP="keps/sig-node/1287-in-place-update-pod-resources/README.md" I get:

3,7d2 < - [How can this feature be enabled / disabled in a live cluster?](#how-can-this-feature-be-enabled--disabled-in-a-live-cluster) < - [Does enabling the feature change any default behavior?](#does-enabling-the-feature-change-any-default-behavior) < - [Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?](#can-the-feature-be-disabled-once-it-has-been-enabled-ie-can-we-roll-back-the-enablement) < - [What happens if we reenable the feature if it was previously rolled back?](#what-happens-if-we-reenable-the-feature-if-it-was-previously-rolled-back) < - [Are there any tests for feature enablement/disablement?](#are-there-any-tests-for-feature-enablementdisablement) 9,12d3 < - [How can a rollout or rollback fail? Can it impact already running workloads?](#how-can-a-rollout-or-rollback-fail-can-it-impact-already-running-workloads) < - [What specific metrics should inform a rollback?](#what-specific-metrics-should-inform-a-rollback) < - [Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?](#were-upgrade-and-rollback-tested-was-the-upgrade-downgrade-upgrade-path-tested) < - [Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?](#is-the-rollout-accompanied-by-any-deprecations-andor-removals-of-features-apis-fields-of-api-types-flags-etc) 14,18d4 < - [How can an operator determine if the feature is in use by workloads?](#how-can-an-operator-determine-if-the-feature-is-in-use-by-workloads) < - [How can someone using this feature know that it is working for their instance?](#how-can-someone-using-this-feature-know-that-it-is-working-for-their-instance) < - [What are the reasonable SLOs (Service Level Objectives) for the enhancement?](#what-are-the-reasonable-slos-service-level-objectives-for-the-enhancement) < - [What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?](#what-are-the-slis-service-level-indicators-an-operator-can-use-to-determine-the-health-of-the-service) < - [Are there any missing metrics that would be useful to have to improve observability of this feature?](#are-there-any-missing-metrics-that-would-be-useful-to-have-to-improve-observability-of-this-feature) 20d5 < - [Does this feature depend on any specific services running in the cluster?](#does-this-feature-depend-on-any-specific-services-running-in-the-cluster) 22,28d6 < - [Will enabling / using this feature result in any new API calls?](#will-enabling--using-this-feature-result-in-any-new-api-calls) < - [Will enabling / using this feature result in introducing new API types?](#will-enabling--using-this-feature-result-in-introducing-new-api-types) < - [Will enabling / using this feature result in any new calls to the cloud provider?](#will-enabling--using-this-feature-result-in-any-new-calls-to-the-cloud-provider) < - [Will enabling / using this feature result in increasing size or count of the existing API objects?](#will-enabling--using-this-feature-result-in-increasing-size-or-count-of-the-existing-api-objects) < - [Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?](#will-enabling--using-this-feature-result-in-increasing-time-taken-by-any-operations-covered-by-existing-slisslos) < - [Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?](#will-enabling--using-this-feature-result-in-non-negligible-increase-of-resource-usage-cpu-ram-disk-io--in-any-components) < - [Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?](#can-enabling--using-this-feature-result-in-resource-exhaustion-of-some-node-resources-pids-sockets-inodes-etc) 30,32d7 < - [How does this feature react if the API server and/or etcd is unavailable?](#how-does-this-feature-react-if-the-api-server-andor-etcd-is-unavailable) < - [What are other known failure modes?](#what-are-other-known-failure-modes) < - [What steps should be taken if SLOs are not being met to determine the problem?](#what-steps-should-be-taken-if-slos-are-not-being-met-to-determine-the-problem)

(the PRR sections in that KEP need to be converted to headings)

Agreed. You could simplify this a lot by just using mdtoc (the tool we use to generate the TOC):

Yeah.

https://github.com/kubernetes/enhancements/pull/5267/files#diff-74909a499e906551a30d329e8eb665f23f5c23f9d9602da9e53066b3b7507c25R89-R95

Also, we should use git instead of walking the files ourselves, because then aside from the diff we're also aware of gitignore etc.

pohly · 2025-11-17T10:39:25Z

hack/prr-baseline.txt

@@ -0,0 +1,372 @@
+# This file contains KEPs that are allowed to fail PRR validation
+# These are existing KEPs that were created before the requirement was enforced
+# New KEPs must pass validation - do not add new entries to this file manually


So new KEPs are covered. But that's not really where we have problems, right? New KEPs copy the current template and thus typically have all section.

More interesting are old KEPs which need to be updated when extending the template. How will that be handled? It looks like extending the template and the list of required sections will cause verification failures for all KEPs not in this list.

Is the plan then to update the list? Then we don't solve the problem of ensuring that KEPs e.g. in alpha go to beta with an up-to-date PRR.

See https://github.com/kubernetes/kubernetes/blob/8d450ef773127374148abad4daaf28dac6cb2625/hack/verify-golangci-lint.sh#L63 and https://github.com/kubernetes/kubernetes/blob/8d450ef773127374148abad4daaf28dac6cb2625/hack/verify-golangci-lint-pr-hints.sh#L31 for an example of how to run the check against files touched in a PR.

BenTheElder · 2025-11-18T20:17:38Z

Didn't we already have such a proposal before? I'm not sure anymore and a quick search didn't find it.

Yes, I had difficulty getting feedback on it so I never finished it but there's been an open PR in #5267

BenTheElder · 2025-11-18T20:18:04Z

Sadly it got auto-closed without review :/

EDIT: reopened ...

kannon92 · 2025-11-18T20:57:35Z

/close

Moving this to @BenTheElder approach.

k8s-ci-robot · 2025-11-18T20:57:42Z

@kannon92: Closed this PR.

In response to this:

/close

Moving this to @BenTheElder approach.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

add a script to verify PRR has all the sections

65de8de

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 9, 2025

k8s-ci-robot requested review from johnbelamaric and palnabarun November 9, 2025 02:29

k8s-ci-robot added area/enhancements Issues or PRs related to the Enhancements subproject size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 9, 2025

pohly reviewed Nov 17, 2025

View reviewed changes

k8s-ci-robot closed this Nov 18, 2025

add a script to verify PRR has all the sections #5685

add a script to verify PRR has all the sections #5685

Conversation

kannon92 commented Nov 9, 2025

Uh oh!

k8s-ci-robot commented Nov 9, 2025

Uh oh!

pohly commented Nov 12, 2025

Uh oh!

kannon92 commented Nov 13, 2025

Uh oh!

pohly Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

tallclair Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

BenTheElder Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

BenTheElder Nov 18, 2025

Choose a reason for hiding this comment

Uh oh!

pohly Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

tallclair Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

BenTheElder commented Nov 18, 2025

Uh oh!

BenTheElder commented Nov 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kannon92 commented Nov 18, 2025

Uh oh!

k8s-ci-robot commented Nov 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

BenTheElder commented Nov 18, 2025 •

edited

Loading