-
Notifications
You must be signed in to change notification settings - Fork 41
Release preparation script #231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Release preparation script #231
Conversation
Set up the armature of a release script: command line interface, information about the build tree, abstract class for release preparation steps. Signed-off-by: Gilles Peskine <[email protected]>
That made the build non-reproducible. Addresses Mbed-TLS/mbedtls#9521 Signed-off-by: Gilles Peskine <[email protected]>
Modern versions of Python emit a DeprecationWarning about `datetime.datetime.utcfromtimestamp()`. It didn't really matter here, but do stop using it. Signed-off-by: Gilles Peskine <[email protected]>
Create a release archive containing the git content and the generated files (including files from submodules), with `CMakeLists.txt` edited to turn off `GEN_FILES` by default. Arrange for the timestamps of generated files to be more recent than the timestamps of files checked into git, in a reproducible way. Uses GNU tar to assemble the pieces. Signed-off-by: Gilles Peskine <[email protected]>
Add a step to the release preparation script to assemble the changelog and commit that change into git. Make sure to create a changelog section for the desired version number even if there are no changes. Finalize that section by editing in a release date. Do nothing if the changelog already contains a finalized section for the desired version and release date. Signed-off-by: Gilles Peskine <[email protected]>
Fix the case when there are no pending changes in the top version section. This is not generally expected to happen, but it can happen when following the release process and there have not yet been any changes since the last release. Signed-off-by: Gilles Peskine <[email protected]>
Run `bump_version.sh` and commit that change into git. Do nothing if there is no version number changes. This commit only supports bumping the product version. Any ABI bump (SOVERSION changes) must have been done beforehand. Signed-off-by: Gilles Peskine <[email protected]>
Signed-off-by: Gilles Peskine <[email protected]>
minosgalanakis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass review. I have confirmed that given the 4.0.0 input it generated a viable artifact.
Most of my connets are intended to be considered design review suggestions. Mostly I would polish the interface with the user. What I would propose that we need
- Single step argument
- Some way to output the commands used to generate the release. It is common to be commited in the body of the commits when we are invoking a script with parameters.
- I am proposing to incorporate the bump_version into this script. We also need a way to bump the so versions.
- More printed output so the user can see progress, can know what command is executed and which step is done when things suceed or fail. This is even more essential for CI deployment.
- Clean up archives from files that do not need to be there, like .gitignore/DS files.
- Import and call assemble changelog as a python module.
- If we are modifying the --version parameter, the script could take --mbedtls-version and --tf-psa-version and package both releases in tf-psa-crypto and then mbedtls order.
Thank you for writting this script, it will greatly help in moving closer towards automating the release.
| from mbedtls_framework.build_tree import guess_project_root | ||
|
|
||
| TESTS_DIR = os.path.join(guess_project_root(), 'tests') | ||
| FRAMEWORK_DIR = os.path.join(guess_project_root(), 'framework') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simply strip it using relpath?
FRAMEWORK_DIR = os.path.join(guess_project_root(), 'framework')
FRAMEWORK_DIR = os.path.relpath(FRAMEWORK_DIR, start=os.getcwd())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do all the work to construct and deconstruct a path when we already know that the end result must be 'framework'? The path is where the files are located relative to the root of the project, not related to the current directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a compromise of 'os.path.join('./', 'framework')' ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's still wrong, isn't it? It would be .\framework on Windows, AFAIK. But we want a reproducible path. What's wrong with writing 'framework' when we mean 'framework'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Mind you, there may be other occurrences of os.path.join that need fixing if we want to have the same output on Windows. I haven't gone and fixed those and I don't want to extend the scope here to Windows support.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No please don't :) I doubt we will be spinning windows executors to build a release
|
|
||
| DEFAULT_OPTIONS = Options( | ||
| artifact_directory=pathlib.Path(os.pardir), | ||
| release_date=datetime.date.today().isoformat(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tricky. This is a sensible default from a code perpective,but it will impossible to produce a release and test/sign it off on the same day.
I would either remove the default and make arg.parser explode, or try to pass it from the 'os.env' so the release maker can set it when agreed and not bother with long command lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking was that the defaults would be suitable for running the script on the CI to make a mock release, and that we'd be more explicit when we run the script manually to make an actual release.
Or maybe we should change how we handle ChangeLog version sections: currently assemble_changelog.py checks the release date of the first section, and it creates a new section if it's a proper date but adds to the existing section if the date contains xx. I didn't want to change the semantics of assemble_changelog.py at the same time as introducing the release script, but there are other sensible policy. For example, we could decide that assemble_changelog.py always adds to the top section, and we create a new section as soon as a release is done. Then setting the release date would be a manual edit of ChangeLog.
| def find_gnu_tar() -> str: | ||
| """Try to guess the command for GNU tar. | ||
| This function never errors out. It defaults to "tar". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? If we don't have to handle about exceptions, you could exclitely return None if not found. Then handle it as an optional intput in Options and gracefully handle it when calling tar with an error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to make the script more complex (having a nullable tar_command) just to handle the unlikely case where GNU tar can't be found.
|
|
||
| def run(self) -> None: | ||
| """Assemble the changelog if needed.""" | ||
| subprocess.check_call(['framework/scripts/assemble_changelog.py'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels wrong to invoke another interpreter to use functionality from another python script. Our can be made to interact with other with minimal changes:
For example in assemble_changelog.py can be achieved by
- Move the argparser outside of the main and into the module.
- call 'assemble_changelog.merge_entries(options)'
Alternatively
- Use a sane default namedtuple for each python script that uses argparser ( we arleady partially do that)
Options = namedtuple("Options", ["dir", "input", "keep_entries", "output"])
DEFAULTS = Options(
dir="Changelog.d",
input="Changelog",
keep_entries=False,
output="Changelog",
)
- Then call 'assemble_changelog.merge_entries(DEFAULTS)'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? assemble_changelog has a job and it does it. The fact that it's written in Python is an implementation detail.
|
|
||
| # Files and directories that may contain version information. | ||
| FILES_WITH_VERSION = [ | ||
| 'CMakeLists.txt', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to include build_info.h in this file list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's included via include.
| mtime = self.commit_timestamp() | ||
| self.call_git(['archive', '--format=tar', | ||
| '--mtime', str(mtime), | ||
| '--prefix', prefix, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposal. .git is already ingored by git archive. How about we also ignore files like .gitingore?
GNU tar has commands like --exclude-vcs-ignores and --exclude-vcs
https://www.gnu.org/software/tar/manual/html_section/exclude.html
We could also include other temporary files
'--exclude=.DS_Store',
'--exclude=**/*.tmp',
'--exclude=**/*.swp'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we not include .gitignore in the archive? And why would we look for temporary files at time of the release? If they're already in git, they're part of the release, for good or for worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats fair. If git-archive is ignoring those files then we can avoid that. And git modules is not a bit issue nowadays that we do not update the sumobules to the restricted ones.
| calling `run()`. | ||
| """ | ||
|
|
||
| @classmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add an abstract method of done.
We need to discretely indicate when a step is over, which could toggle a internal boolean flag to indicate sucessfull completion. It can also include print statements, giving a report to the user of commands executed, files modified, and where the archives are created etc.
Also we could implement __enter__ and __exit__ so the Step classes are called with with(Step) to automatically call done()
e.g.
def __enter__(self):
return self
def __exit__(self, exc_type, exc, tb) -> bool:
print("{} has completed".format(self.name))
self.done()
return self
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add an abstract method of done.
What would done do? I added assert_preconditions because I wanted each step to default to insisting on a clean git status, otherwise they could accidentally end up committing or archiving something unintended. But I don't see a job for done.
We need to discretely indicate when a step is over, which could toggle a internal boolean flag to indicate sucessfull completion.
Why? The steps are deliberately as independent as possible, so that you can take a release candidate and tweak it into the next release candidate.
Also we could implement enter and exit so the Step classes are called with with(Step) to automatically call done()
Why? Context managers make it easier to use objects of a class in a controlled way. But here there is a single place where steps are instantiated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When something currently fails. which it did, I had no idea what was done and what was left in an incomplete state.I had to manually look into the git history and staged files to figure out why. Each step needs to communicate to the caller what it does, and when its done.
So as a role for done we could it:
- communicate to the user that the step is complete
- give artefact locations on the command line
- give warnings if something is not in the right state
- perform cleanup if required by some steps.
Why? The steps are deliberately as independent as possible, so that you can take a release candidate and tweak it into the next release candidate.
That is no longer required, as discussed above. Release candidates are recreated and no longer tweaked. The way changelog is assembled, it is cleaner and safer and easier to reproduce, when the changes are brought in and then the release is assembled.
anyway this is not a strong ask, I just found that it logically made sense since we already have a assert_preconditions as a entry point, and it would be balanced to have a matched cleanup stage.
| env=env) | ||
|
|
||
| def read_git(self, cmd: List[str], | ||
| where: Optional[PathOrString] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where can be redudant. Git supports -C to run it on a different directory
e.g
git -C mbedtls log -n 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. where is implemented as git -C with a bit of automation for relative paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could have misunderstood the point of the where variable. If you want to querry a git folder that lies in a different directory than your current you can use the -C directive to point it there. So what I was saying is that the location could be part of the cmd argument.
scripts/prepare_release.py
Outdated
| version: Optional[str] | ||
|
|
||
| DEFAULT_OPTIONS = Options( | ||
| artifact_directory=pathlib.Path(os.pardir), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| artifact_directory=pathlib.Path(os.pardir), | |
| artifact_directory=os.path.join(pathlib.Path(os.pardir), self.artifact_base_name() + "_artifacts"), |
Parent director can contain a lot of files, that we may not want to override. We should create an explicit directory to store all the release related artifcats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this script should create a new directory. Especially by default: users are likely to miss it, or it could clash with something unrelated. The parent directory seems like a sensible default to me: you'll create mbedtls-1.2.3.tar.bz2 next to your mbedtls directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you are in the mbedtls directory, and you cannot invoke the script from the parent directory, the chances are most users to look for the files and not find them, not knowing its on the parent directory.
And given that mbedtls rarely sits on its own, it will be an already cluttered directory. And the script is generating more than archives, it generates several files, which terminal users have to clean up by hand. Putting it on a directory makes it easier to clean up and archive from a ci perspective.
how about we use /tmp to create the artefacts, and then:
- Move them on the root of mbedtls/tf-psa-crypto where the script will be invoked. They will be visible because of git, and we can clean them up using
git clean -dfx - Move them in a release_artifacts subdir inside mbedtls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point taken that different people have different habits.
Also .. is not a good default when there is an explicit --directory argument, it should be the parent of the source directory to be somewhat sensible.
At first I didn't want to touch any files inside the source tree, in case we wanted to do something that can't cope with uncommitted files. But I think we can reasonably assume that either the script will robustly ignore uncommitted file, or there will be a simple way to exclude files.
So I propose to change this to a directory in the source tree. Of course that's the default and the command line argument will stay.
| # in markdown as line breaks, not as spaces, so an ordinary paragraph | ||
| # needs to be in a single Python logical line. | ||
| return f'''\ | ||
| ## Description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put that on a string on the top of the file or a separate template file. We shouldn't be looking into the code to edit this and.
Also since we are generating this we should add the artifacts name, and mdsum using md5 hyperlinks, since the same script is creating and capturing that information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree about the separate file. It's easier to have a single file to edit. This is a string with a lot of substitutions, so if we store it externally, we'll have to manage some templating instead of just relying on the f-string syntax.
I don't understand about md5, did you mean sha256? The artifact names appear under “Checksum”. I mostly followed the existing structure of release notes.
That way, if the initialization of a step fails, it'll happen before any external changes are done. Signed-off-by: Gilles Peskine <[email protected]>
Don't change the disk file if the content hasn't changed. Signed-off-by: Gilles Peskine <[email protected]>
Signed-off-by: Gilles Peskine <[email protected]>
Signed-off-by: Gilles Peskine <[email protected]>
Signed-off-by: Gilles Peskine <[email protected]>
Don't use a positional argument, for consistency with the release date. Signed-off-by: Gilles Peskine <[email protected]>
Signed-off-by: Gilles Peskine <[email protected]>
Rename `--from` and `--to` to `--from-step` and `--to-step` for consistency. They have short options so they don't need to be easy to type. Signed-off-by: Gilles Peskine <[email protected]>
Using the parent of the source directory as an artifact directory makes sense if the parent of the source directory contains multiple projects or branches of Mbed TLS or TF-PSA-Crypto. It doesn't work for people who have clutter there. Also, the default artifact directory was the parent of the _current_ directory, whatever it is, and not the parent of the source directory. Passing `--directory` without `--artifact-directory` resulted in a nonsensical artifact directory location. Default to placing artifacts in `release-artifacts` in the source tree. This way, you can have multiple release candidates for the same branch in sibling directories, and their artifacts won't overwrite each other. Signed-off-by: Gilles Peskine <[email protected]>
The list of submodules won't change during the process. So get it once and for all. In support of this, move some basic git-related methods to the Info class. Signed-off-by: Gilles Peskine <[email protected]>
| index = self.git_index_as_tree_ish() | ||
| mtime = self.commit_timestamp() | ||
| self.call_git(['archive', '--format=tar', | ||
| '--mtime', str(mtime), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
git archive --mtime requires a recent git. Ubuntu 22.04 isn't good enough. Ubuntu 24.04 is ok.
That's a reasonable assumption for a developer machine. Unfortunately, our CI doesn't have an Ubuntu 24.04 image yet, only 22.04.
We could avoid this by making a commit with the GEN_FILES change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of git archive --mtime=…, which can't be used in our CI yet, we could use faketime, which is already present in our docker images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to update the Dockerfile anyway, to set the user name and email so that Git can make a commit. And I have a working Ubuntu 24.04 Docker image with little fuss. So I think the way to go is to add the Ubuntu 24.04 image, which is long overdue anyway.
Create a release preparation script. Resolves #232.
This script performs the following steps:
bump_version.shwhich doesn't exist in TF-PSA-Crypto, it could be adapted to be standalone at little effort.GEN_FILESoff. Uses GNU tar. Also create a shasum file.Features:
Limitations:
Contributes to:
all.shand upload the generated artifacts).psa_crypto_driver_wrappers.handpsa_crypto_driver_wrappers_no_static.care missing. mbedtls#10332.PR checklist