257 changes: 256 additions & 1 deletion scripts/prepare_release.py
@@ -5,6 +5,10 @@
It must run from a clean git worktree with initialized, clean git submodules.
When making an Mbed TLS >=4 release, the tf-psa-crypto submodule should
already contain a TF-PSA-Crypto release candidate.

This script requires the following external tools:
- GNU tar (can be called ``gnutar`` or ``gtar``);
- ``sha256sum``.
"""

# Copyright The Mbed TLS Contributors
@@ -39,17 +43,21 @@
# silently outputting garbage. Nonetheless, human review of the
# result is still expected.

# If you add an external dependency to a step, please mention
# it in the docstring at the top.

################################################################


import argparse
import datetime
import os
import pathlib
import re
import subprocess
import sys
import typing
from typing import Dict, List, Optional, Sequence
from typing import Callable, Dict, Iterator, List, Optional, Sequence


PathOrString = typing.Union[os.PathLike, str]
@@ -60,6 +68,24 @@ class InformationGatheringError(Exception):
pass


def find_gnu_tar() -> str:
"""Try to guess the command for GNU tar.

This function never errors out. It defaults to "tar".
"""

Contributor:
Why not? If we don't have to handle exceptions, you could explicitly return None if not found, then handle it as an optional input in Options and handle the failure gracefully, with an error message, when calling tar.

Contributor Author:
I don't want to make the script more complex (having a nullable tar_command) just to handle the unlikely case where GNU tar can't be found.
for name in ['gnutar', 'gtar']:
try:
subprocess.check_call([name, '--version'],
stdin=subprocess.DEVNULL,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL)
return name
except FileNotFoundError:
pass
except subprocess.CalledProcessError:
pass
return 'tar'
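For comparison, the Optional-returning variant the reviewer suggests could be sketched as follows (find_gnu_tar_or_none is a hypothetical name; this is not what the PR implements):

```python
import subprocess
from typing import Optional

def find_gnu_tar_or_none() -> Optional[str]:
    """Return a command that runs GNU tar, or None if none is found.

    Hypothetical alternative to find_gnu_tar() that reports failure
    explicitly instead of falling back to plain "tar".
    """
    for name in ['gnutar', 'gtar', 'tar']:
        try:
            output = subprocess.check_output([name, '--version'],
                                             stdin=subprocess.DEVNULL,
                                             stderr=subprocess.DEVNULL)
        except (FileNotFoundError, subprocess.CalledProcessError):
            continue
        # GNU tar identifies itself on the first line of --version.
        first_lines = output.splitlines()
        if first_lines and b'GNU tar' in first_lines[0]:
            return name
    return None
```

Callers would then have to treat tar_command as Optional[str] in Options and fail with a clear message when it is None, which is the extra complexity the author declines.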

class Options(typing.NamedTuple):
"""Options controlling the behavior of the release process.

@@ -68,11 +94,14 @@
"""
# Directory where the release artifacts will be placed.
artifact_directory: pathlib.Path
# GNU tar command.
tar_command: str
# Version to release (None to read it from ChangeLog).
version: Optional[str]

DEFAULT_OPTIONS = Options(
artifact_directory=pathlib.Path(os.pardir),
Contributor:
Suggested change:
-    artifact_directory=pathlib.Path(os.pardir),
+    artifact_directory=os.path.join(pathlib.Path(os.pardir), self.artifact_base_name() + "_artifacts"),

The parent directory can contain a lot of files that we may not want to overwrite. We should create an explicit directory to store all the release-related artifacts.

Contributor Author:
I don't think this script should create a new directory, especially by default: users are likely to miss it, or it could clash with something unrelated. The parent directory seems like a sensible default to me: you'll create mbedtls-1.2.3.tar.bz2 next to your mbedtls directory.

Contributor:
Since you are in the mbedtls directory, and you cannot invoke the script from the parent directory, chances are most users will look for the files and not find them, not knowing they are in the parent directory.

And given that mbedtls rarely sits on its own, it will already be a cluttered directory. The script also generates more than archives: it generates several files, which terminal users have to clean up by hand. Putting them in a directory makes them easier to clean up and archive from a CI perspective.

How about we use /tmp to create the artifacts, and then either:
  • move them to the root of mbedtls/tf-psa-crypto where the script is invoked — they will be visible because of git, and we can clean them up using git clean -dfx; or
  • move them into a release_artifacts subdirectory inside mbedtls?

Contributor Author:
Point taken that different people have different habits.

Also, .. is not a good default when there is an explicit --directory argument; it should be the parent of the source directory to be somewhat sensible.

At first I didn't want to touch any files inside the source tree, in case we wanted to do something that can't cope with uncommitted files. But I think we can reasonably assume that either the script will robustly ignore uncommitted files, or there will be a simple way to exclude them.

So I propose to change this to a directory in the source tree. Of course that's only the default, and the command line argument will stay.

tar_command=find_gnu_tar(),
version=None)


@@ -212,6 +241,38 @@ def submodules(self) -> Sequence[str]:
self._submodules = raw.decode('ascii').rstrip('\0').split('\0')
return self._submodules

def commit_timestamp(self,
where: Optional[PathOrString] = None,
what: str = 'HEAD') -> int:
"""Return the timestamp (seconds since epoch) of the given commit.

Pass `where` to specify a submodule.
Pass `what` to specify a commit (default: ``HEAD``).
"""
timestamp = self.read_git(['show', '--no-patch', '--pretty=%ct', what],
where=where)
return int(timestamp)

def git_index_as_tree_ish(self,
where: Optional[PathOrString] = None) -> str:
"""Return a git tree-ish corresponding to the index.

Pass `where` to specify a submodule.
"""
raw = self.read_git(['write-tree'], where=where)
return raw.decode('ascii').strip()

def commit_datetime(self,
where: Optional[PathOrString] = None,
what: str = 'HEAD') -> datetime.datetime:
"""Return the time of the given commit.

Pass `where` to specify a submodule.
Pass `what` to specify a commit (default: ``HEAD``).
"""
timestamp = self.commit_timestamp(where=where, what=what)
return datetime.datetime.fromtimestamp(timestamp, datetime.timezone.utc)
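As a quick illustration of the conversion commit_datetime performs (the timestamp is an arbitrary example value, not taken from the repository):

```python
import datetime

# An arbitrary commit timestamp, in seconds since the Unix epoch.
timestamp = 1700000000
when = datetime.datetime.fromtimestamp(timestamp, datetime.timezone.utc)
print(when.isoformat())  # 2023-11-14T22:13:20+00:00
```

Passing an explicit timezone keeps the result independent of the machine's local time, which matters for reproducible archive mtimes.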

def assert_git_status(self) -> None:
"""Abort if the working directory is not clean (no git changes).

@@ -220,6 +281,36 @@ def assert_git_status(self) -> None:
self.call_git(['diff', '--quiet'])
Contributor:
We should use porcelain for this check: git status --porcelain=v1. git diff will only care about tracked files, but you can have old autogenerated files or other user files in the directory that would make it into the archive.

Also, I would enclose it in a try/except block and print a helpful message to the user. Otherwise, subprocess.CalledProcessError is the first message any new user will encounter.

Contributor Author:
Untracked files don't matter. They won't be archived.

Contributor Author:
Any suggestions on making the message prettier here? The obvious solution is to print to stderr and exit(), but I'd rather not exit() in case this code is invoked from other Python code; for example, we may want to add unit tests. A custom exception with add_note looks nice, but that requires a very recent Python.

        if not self.files_are_clean():
            raise Exception('There are uncommitted changes (maybe in submodules) in ' + str(self.info.top_dir))

is better than what's there now. Do you think that's good enough?

Contributor:
The script will exit anyway in the current form, with the full traceback, but it won't be very informative.

We should at least print that it failed because the tree is not clean, so at least they know that. We can also exit with an error code, so any script invoking it will also stop its execution flow.

On a tangential note, it is quite easy in Python to detect when the script is called by a human operator as opposed to a script, using the built-in sys library: sys.stdin.isatty() and sys.stdout.isatty().

So in theory you could have a custom print/log command that displays more human-friendly messages on an active tty and less verbose ones when invoked by a CI script. (Not necessarily needed here, but worth mentioning.)

Contributor Author:
The distinction isn't human vs CI, but imported vs standalone process.

Anyway, I think a custom exception with an explanatory message is a reasonable compromise.


def artifact_base_name(self) -> str:
"""The base name for a release artifact (file created for publishing).

This contains the product name and version, with no directory part
or extension. For example "mbedtls-1.2.3".
"""
return '-'.join([self.info.product_machine_name,
self.info.version])

def artifact_path(self, extension: str) -> pathlib.Path:
"""The path for a release artifact (file created for publishing).

`extension` should start with a ".".
"""
file_name = self.artifact_base_name() + extension
return self.options.artifact_directory / file_name

def edit_file(self,
path: PathOrString,
transform: Callable[[str], str]) -> None:
"""Edit a text file.

The path can be relative to the toplevel root or absolute.
"""
with open(self.info.top_dir / path, 'r+', encoding='utf-8') as file_:
new_content = transform(file_.read())
file_.seek(0)
file_.truncate()
file_.write(new_content)

def assert_preconditions(self) -> None:
"""Check whether the preconditions for this step have been achieved.

@@ -235,7 +326,168 @@ def run(self) -> None:
raise NotImplementedError


class ArchiveStep(Step):
"""Prepare release archives and the associated checksum file."""

@classmethod
def name(cls) -> str:
return 'archive'

def tar_git_files(self, plain_tar_path: str, prefix: str) -> None:
"""Create an uncompressed tar file with the git contents."""
# We archive the index, not the commit 'HEAD', because
# we may have modified some files to turn off GEN_FILES.
# We do use the commit mtime rather than the current time,
# however, for reproducibility.
# A downside of not archiving a commit is that the archive won't
# contain an extended header with the commit ID that
# `git get-tar-commit-id` could retrieve. If we change to releasing
# an exact commit, we should make sure that the commit gets published.
index = self.git_index_as_tree_ish()
mtime = self.commit_timestamp()
self.call_git(['archive', '--format=tar',
'--mtime', str(mtime),
Contributor Author:
git archive --mtime requires a recent git. Ubuntu 22.04 isn't good enough; Ubuntu 24.04 is OK.

That's a reasonable assumption for a developer machine. Unfortunately, our CI doesn't have an Ubuntu 24.04 image yet, only 22.04.

We could avoid this by making a commit with the GEN_FILES change.

Contributor Author:
Instead of git archive --mtime=…, which can't be used in our CI yet, we could use faketime, which is already present in our docker images.

Contributor Author:
We need to update the Dockerfile anyway, to set the user name and email so that Git can make a commit. And I have a working Ubuntu 24.04 Docker image with little fuss. So I think the way to go is to add the Ubuntu 24.04 image, which is long overdue anyway.
'--prefix', prefix,
Contributor:
Proposal: .git is already ignored by git archive. How about we also ignore files like .gitignore?

GNU tar has options like --exclude-vcs-ignores and --exclude-vcs:

https://www.gnu.org/software/tar/manual/html_section/exclude.html

We could also exclude other temporary files:

'--exclude=.DS_Store',
'--exclude=**/*.tmp',
'--exclude=**/*.swp'

Contributor Author:
Why would we not include .gitignore in the archive? And why would we look for temporary files at the time of the release? If they're already in git, they're part of the release, for good or for worse.

Contributor:
That's fair. If git archive is ignoring those files then we can avoid that. And .gitmodules is not a big issue nowadays, since we no longer update the submodules to the restricted ones.

'--output', str(plain_tar_path),
index])
for submodule in self.submodules:
index = self.git_index_as_tree_ish(where=submodule)
mtime = self.commit_timestamp(where=submodule)
data = self.read_git(['archive', '--format=tar',
'--mtime', str(mtime),
'--prefix', prefix + submodule + '/',
index],
where=submodule)
subprocess.run([self.options.tar_command, '--catenate',
'-f', plain_tar_path,
'/dev/stdin'],
input=data,
check=True)
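The faketime workaround mentioned in the review thread above could be wired in roughly like this. faketime_git_archive_command is a hypothetical helper, and this is not the approach the PR ends up taking; the idea relies on git archive stamping tree archives with the current time, which faketime can fake:

```python
import datetime
from typing import List

def faketime_git_archive_command(mtime: int, prefix: str,
                                 tree_ish: str, output: str) -> List[str]:
    """Build a faketime-wrapped git archive command line.

    Hypothetical workaround for git versions whose `git archive`
    lacks --mtime: run the archiving under a faked current time so
    that file mtimes in the tar are still reproducible.
    """
    stamp = datetime.datetime.fromtimestamp(
        mtime, datetime.timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
    return ['faketime', stamp,
            'git', 'archive', '--format=tar',
            '--prefix', prefix,
            '--output', output,
            tree_ish]
```

The resulting list would be passed to subprocess.check_call; this sketch only builds the command so the time formatting can be checked in isolation.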

GEN_FILES_FILES = [
'CMakeLists.txt',
'tf-psa-crypto/CMakeLists.txt',
]

@staticmethod
def set_gen_files_default_off(old_content: str) -> str:
"""Make GEN_FILES default off in a build script."""
# CMakeLists.txt
new_content = re.sub(r'(option\(GEN_FILES\b.*)\bON\)',
r'\g<1>OFF)',
old_content)
return new_content
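The substitution above can be exercised on its own; a minimal check (the CMake line is an invented example, not copied from the real CMakeLists.txt):

```python
import re

def set_gen_files_default_off(old_content: str) -> str:
    # Same substitution as in the script: flip the GEN_FILES default.
    return re.sub(r'(option\(GEN_FILES\b.*)\bON\)',
                  r'\g<1>OFF)',
                  old_content)

line = 'option(GEN_FILES "Generate the auxiliary files" ON)'
print(set_gen_files_default_off(line))
# option(GEN_FILES "Generate the auxiliary files" OFF)
```

Lines for other options, or a GEN_FILES option that is already OFF, do not match the pattern and pass through unchanged.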

def turn_off_gen_files(self) -> None:
"""Make GEN_FILES default off in build scripts."""
for filename in self.GEN_FILES_FILES:
path = self.info.top_dir / filename
if path.exists():
self.edit_file(path, self.set_gen_files_default_off)
self.call_git(['add', path.name], where=path.parent)

def restore_gen_files(self) -> None:
"""Restore build scripts affected by turn_off_gen_files()."""
for filename in self.GEN_FILES_FILES:
path = self.info.top_dir / filename
if path.exists():
self.call_git(['reset', '--', path.name],
where=path.parent)
self.call_git(['checkout', '--', path.name],
where=path.parent)

@staticmethod
def list_project_generated_files(project_dir: pathlib.Path) -> List[str]:
"""Return the list of generated files in the given (sub)project.

The returned file names are relative to the project root.
"""
raw_list = subprocess.check_output(
['framework/scripts/make_generated_files.py', '--list'],
cwd=project_dir)
return raw_list.decode('ascii').rstrip('\n').split('\n')

@staticmethod
def update_project_generated_files(project_dir: pathlib.Path) -> None:
"""Update the list of generated files in the given (sub)project."""
subprocess.check_call(
['framework/scripts/make_generated_files.py'],
cwd=project_dir)

def tar_add_project_generated_files(self,
plain_tar_path: str,
project_dir: pathlib.Path,
project_prefix: str,
file_list: List[str]) -> None:
"""Append the given generated files to an existing uncompressed tar file."""
transform = 's/^/' + project_prefix.replace('/', '\\/') + '/g'
commit_datetime = self.commit_datetime(project_dir)
file_datetime = commit_datetime + datetime.timedelta(seconds=1)
subprocess.check_call([self.options.tar_command, '-r',
'-f', plain_tar_path,
'--transform', transform,
'--owner=root:0', '--group=root:0',
'--mode=a+rX,u+w,go-w',
'--mtime', file_datetime.isoformat(),
'--'] + file_list,
cwd=project_dir)

def tar_add_generated_files(self, plain_tar_path: str, prefix: str) -> None:
"""Add generated files to an existing uncompressed tar file."""
for project in [os.curdir] + list(self.submodules):
if project.endswith('/framework') or project == 'framework':
continue
project_dir = self.info.top_dir / project
project_prefix = (prefix if project == os.curdir else
prefix + project + '/')
file_list = self.list_project_generated_files(project_dir)
self.update_project_generated_files(project_dir)
self.tar_add_project_generated_files(plain_tar_path,
project_dir,
project_prefix,
file_list)

def create_plain_tar(self, plain_tar_path: str, prefix: str) -> None:
"""Create an uncompressed tar file for the release."""
self.tar_git_files(plain_tar_path, prefix)
self.tar_add_generated_files(plain_tar_path, prefix)

@staticmethod
def compress_tar(plain_tar_path: str) -> Iterator[str]:
"""Compress the tar file.

Remove the original, uncompressed file.
Yield the path of each compressed file.
"""
compressed_path = plain_tar_path + '.bz2'
if os.path.exists(compressed_path):
os.remove(compressed_path)
subprocess.check_call(['bzip2', '-9', plain_tar_path])
yield compressed_path

def create_checksum_file(self, archive_paths: List[str]) -> None:
"""Create a checksum file for the given files."""
checksum_path = self.artifact_path('.txt')
relative_paths = [os.path.relpath(path, self.options.artifact_directory)
for path in archive_paths]
content = subprocess.check_output(['sha256sum', '--'] + relative_paths,
cwd=self.options.artifact_directory,
encoding='ascii')
with open(checksum_path, 'w') as out:
out.write(content)
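The checksum file written above follows the usual sha256sum format ("<hex digest>, two spaces, <file name>"). A hypothetical pure-Python verifier (verify_checksum_file is an invented name; the script itself relies on the sha256sum tool) might look like:

```python
import hashlib
import os

def verify_checksum_file(checksum_path: str) -> bool:
    """Check each "<hex>  <name>" line against the named file's SHA-256.

    Hypothetical counterpart to create_checksum_file(): file names are
    resolved relative to the checksum file's directory, mirroring how
    `sha256sum -c` is typically run.
    """
    base_dir = os.path.dirname(os.path.abspath(checksum_path))
    with open(checksum_path, encoding='ascii') as listing:
        for line in listing:
            expected, name = line.rstrip('\n').split('  ', 1)
            with open(os.path.join(base_dir, name), 'rb') as data:
                if hashlib.sha256(data.read()).hexdigest() != expected:
                    return False
    return True
```

This reads each artifact fully into memory, which is fine for release tarballs of this size; a chunked read would be needed for very large files.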

def run(self) -> None:
"""Create the release artifacts."""
self.turn_off_gen_files()
base_name = self.artifact_base_name()
plain_tar_path = str(self.artifact_path('.tar'))
self.create_plain_tar(plain_tar_path, base_name + '/')
compressed_paths = list(self.compress_tar(plain_tar_path))
self.create_checksum_file(compressed_paths)
self.restore_gen_files()


ALL_STEPS = [
ArchiveStep,
] #type: Sequence[typing.Type[Step]]


@@ -274,6 +526,8 @@ def main() -> None:
parser.add_argument('--list-steps',
action='store_true',
help='List release steps and exit')
parser.add_argument('--tar-command',
help='GNU tar command')
parser.add_argument('--to', metavar='STEP',
Contributor:
Not sure that is needed, to be honest. I can see where the --from step can be useful to re-do things like assembling the changelog.

I could also see a use for a --single-step option that would be considerably simpler to implement and very useful.

Contributor Author:
I used --{from,to}=step extensively during testing; I don't think it's much harder to type than --single-step=step. I've also used --to on its own occasionally. It might even be more useful than --from, since the steps are supposed to be idempotent. (Of course, if we need to do weird things, e.g. for an emergency point release, then "supposed to" might not actually be the case.)

help='Last step to run (default: run all steps)')
parser.add_argument('version', nargs='?',
Contributor:
Again this is inconsistent. It should be --version and -v. Also, I would make it a required argument, not an automatically resolved one, because it is quite critical to the release and we want to be able to see it in the CI log too.

It would also greatly simplify the logic of trying to resolve it automatically.

Currently the code does the following:

        if options.version is None:
            self._release_version = self.old_version
        else:
            self._release_version = options.version

I can see no point in generating artifacts for self._release_version = self.old_version. And creating bump-version logic would need to consider patch and major versions, which would add complexity.

Ask the caller to provide it, and if they don't, don't start the process. Let's keep it simple :)

Contributor Author:
I think there will be a common case for not passing a version: when there is already a release candidate (with bump_version already done) and we update it with a last-minute patch. But this could be an explicit request, e.g. --keep-version.

Contributor:
Patching is a bit out of scope for this script, and we have decided to err on the side of caution with the new process. Patches are to be applied before any of the actions implemented by this script are performed. The new release process clearly requires the release candidate to be remade when the codebase changes.

If in the future we need to expand the functionality, we could implement an input argument with the list of steps the user wants to apply to a tree.

Contributor Author:
I changed this into --release-version = -r.

It's still not mandatory. Using the version that's already filled in in the changelog is a very meaningful action and makes sense as a default.
@@ -286,6 +540,7 @@ def main() -> None:
return
options = Options(
artifact_directory=pathlib.Path(args.artifact_directory).absolute(),
tar_command=args.tar_command,
version=args.version)
run(options, args.directory,
from_=args.from_, to=args.to)