-
Notifications
You must be signed in to change notification settings - Fork 33
Add script to move files to the framework #145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 8 commits
f4d3fbc
58bfb92
4c095e2
ddd8c7a
ffee036
6afe50f
02e7a14
b775887
60d0808
7847e77
cff001e
16a1e49
7e1748b
5a79094
a17eb64
a801f31
f824c88
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,335 @@ | ||||||||||||||
| #!/usr/bin/env python3 | ||||||||||||||
| """Move files from the Mbed TLS repository to the framework repository. | ||||||||||||||
|
|
||||||||||||||
| Invoke this script from a Git working directory with a branch of Mbed TLS. | ||||||||||||||
| """ | ||||||||||||||
|
|
||||||||||||||
| # Copyright The Mbed TLS Contributors | ||||||||||||||
| # SPDX-License-Identifier: Apache-2.0 | ||||||||||||||
| # | ||||||||||||||
| # Licensed under the Apache License, Version 2.0 (the "License"); you may | ||||||||||||||
| # not use this file except in compliance with the License. | ||||||||||||||
| # You may obtain a copy of the License at | ||||||||||||||
| # | ||||||||||||||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||||||||||||||
| # | ||||||||||||||
| # Unless required by applicable law or agreed to in writing, software | ||||||||||||||
| # distributed under the License is distributed on an "AS IS" BASIS, WITHOUT | ||||||||||||||
| # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||||||||||||
| # See the License for the specific language governing permissions and | ||||||||||||||
| # limitations under the License. | ||||||||||||||
|
|
||||||||||||||
| import argparse | ||||||||||||||
| import os | ||||||||||||||
| import pathlib | ||||||||||||||
| import re | ||||||||||||||
| import subprocess | ||||||||||||||
| import sys | ||||||||||||||
| from typing import List, Optional, Tuple, Dict | ||||||||||||||
|
|
||||||||||||||
|
|
||||||||||||||
| class RepoFileMover: | ||||||||||||||
| """Move files between repos while retaining history.""" | ||||||||||||||
|
|
||||||||||||||
| GIT_EXE = 'git' | ||||||||||||||
| FRAMEWORK_DEFAULT_BASE_BRANCH = 'main' | ||||||||||||||
|
|
||||||||||||||
| def __init__(self, source_repo: str, dest_repo: str, | ||||||||||||||
| source_branch_name: str, dest_branch_name: str, | ||||||||||||||
| file_map: Dict[str, str], | ||||||||||||||
| dest_base_branch_name: Optional[str] = None): | ||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would be nice to document the parameters. It's not obvious what you can put in |
||||||||||||||
| self._source_repo = os.path.abspath(source_repo) | ||||||||||||||
| self._dest_repo = os.path.abspath(dest_repo) | ||||||||||||||
|
|
||||||||||||||
| self._src_branch_name = source_branch_name | ||||||||||||||
| self._tmp_framework_branch_name = 'tmp-branch-move-files-to-framework' | ||||||||||||||
| self._framework_branch_name = dest_branch_name | ||||||||||||||
|
|
||||||||||||||
| if dest_base_branch_name is None: | ||||||||||||||
| self._framework_base_branch_name = self.FRAMEWORK_DEFAULT_BASE_BRANCH | ||||||||||||||
| else: | ||||||||||||||
| self._framework_base_branch_name = dest_base_branch_name | ||||||||||||||
|
|
||||||||||||||
| self._file_map = {os.path.normpath(k): os.path.normpath(v) for k, v in file_map.items()} | ||||||||||||||
|
|
||||||||||||||
| def run_git(self, git_cmd: List[str], **kwargs) -> str: | ||||||||||||||
| """Run a git command and return its output.""" | ||||||||||||||
| if 'universal_newlines' not in kwargs: | ||||||||||||||
| kwargs['universal_newlines'] = True | ||||||||||||||
| cmd = [self.GIT_EXE] + git_cmd | ||||||||||||||
| return subprocess.check_output(cmd, **kwargs) | ||||||||||||||
|
|
||||||||||||||
| def add_file_move(self, src_path: str, dst_path: str) -> None: | ||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This method isn't used anywhere. |
||||||||||||||
| """Move file at relative path src_path in the source repo to dst_path | ||||||||||||||
| in the destination repo""" | ||||||||||||||
| self._file_map[src_path] = dst_path | ||||||||||||||
|
||||||||||||||
|
|
||||||||||||||
| def _path_parts(self, path: str) -> List[str]: | ||||||||||||||
| return path.split(os.path.sep) | ||||||||||||||
|
|
||||||||||||||
| def _direct_children(self, dir_path: str) -> List[str]: | ||||||||||||||
| """Get the direct children of a subdirectory (at least the ones | ||||||||||||||
| that matter to git)""" | ||||||||||||||
|
Comment on lines
+98
to
+99
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Common docstring conventions say to make the first paragraph of a docstring a single line. Mind you, we don't always follow that even in the mbedtls repository. But I hope that one day we'll set up documentation rendering for our Python scripts and then we'll want to be more uniform about docstring formatting. |
||||||||||||||
| leaf_files = self.run_git(['ls-files', dir_path]).split() | ||||||||||||||
|
||||||||||||||
| leaf_files = self.run_git(['ls-files', dir_path]).split() | |
| leaf_files = self.run_git(['ls-files', '-z', '--', dir_path]).rstrip('\0').split('\0') |
This applies to the other uses of ls-files.
It would be good to make an auxiliary method that calls ls-files and returns a list.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be equivalent but simpler: run git -C dir_path ls-files and strip anything after a slash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though looking at how this method is used, it seems that you're stripping the paths down to base names here, only to reconstruct the full path later in _expand_dir_move?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part removes the suffix that git ls-files outputs. We might get:
foo/bar/a.c
foo/bar/b.c
foo/baz
Which we reduce to just the direct descendants of foo, so:
foo/bar
foo/baz
We never reconstruct the suffix in _expand_dir_move(), but we substitute the prefix, e.g. if foo is being moved to new_foo:
new_foo/bar
new_foo/baz
This is a part of the transformation required for the --except option.
I was not aware that passing -C with a directory had this effect though, that will certainly simplify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caller only cares about iterating once, not about having a stored object in a particular format, so you could return the set directly and declare the return type as Iterable[str].
However, a quirk of the current code is that the order of traversal of a set is unspecified, and in practice it's randomized with all modern versions of CPython. For debugging, it sometimes help to have deterministic code, which you can get by sorting, which returns a list.
| return list(direct_children) | |
| return sorted(direct_children) |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you know dir_path is present and you aren't using its current value, the simpler way is
| self._file_map.pop(dir_path, None) | |
| del self._file_map.pop[dir_path] |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simpler:
| while path[-1] == os.sep: | |
| path = path[:-1] | |
| path = path.rstrip(os.sep) |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simpler:
| dirs, file = os.path.split(path) | |
| dirs = os.path.dirname(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we ever have a file whose name starts with a dash (granted, that's unlikely), git will interpret that file's name as an option. Applies everywhere we pass a file name to git.
| self.run_git(['rm', f]) | |
| self.run_git(['rm', '--', f]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BLOCKER: As far as I understand it (more info in Slack thread), this will in particular remove framework. So from that point, when checking out the original head again, the framework submodule is no longer initialized. Then later the framework directory itself disappears, which I don't understand.
I don't fully understand what's going on, but as far as I can tell, you can't have the destination be the framework directory of the source, it has to be a separate worktree. This is a very unintuitive limitation that needs to be at least documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now documented and I have added a check so that an error message is printed if you try to move files to a framework submodule within the Mbed TLS checkout that you are moving from.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like the ability to specify a commit message. I can do it afterwards by rebasing, and that's not really a problem in the source branch, but in the destination branch, the commit message is behind a merge and git is reluctant to edit history beyond a merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, if I want to edit the commit message, I need to redo the conflict resolution which is implemented here as resolve_deletion_conflicts. That's not easy!
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguably simpler:
| if status in files_by_status: | |
| files_by_status[status].append(filename) | |
| else: | |
| files_by_status[status] = [filename] | |
| files_by_status.setdefault(status, []) | |
| files_by_status[status].append(filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that there are no files that git calls “other” (neither checked in nor ignored). That's an acceptable limitation, but please document it. Ideally the script should refuse to run if such files are present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no checkout here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add checks for common errors before starting to do anything. At least check that the branches don't already exist, which is likely to be the case the second time the same person uses the script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fails on MacOS as /bin/true does not exist in that directory. Instead it is /usr/bin/true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just use true, Git will do a path lookup. (Actually you can even write arbitrary shell code there.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this also happen when we're renaming the file, and the target name is one that used to exist but was removed before today? Granted that's not very likely to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fails on MacOS as /bin/true does not exist in that directory. Instead it is /usr/bin/true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that just
| self.run_git(['-c', 'core.editor=/bin/true', | |
| 'commit', '--amend']) | |
| self.run_git(['commit', '--amend', '--no-edit']) |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nargs='?' doesn't make sense. This option requires an argument (currently, if you don't pass one, the parser lets it through, but then you end up with None in the list which is consumed by code that requires a string). Applies to --except as well.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than require --path to be repeated for each path, I think this should be the positional arguments. In particular it would allow calls like mbedtls-move-to-framework … subdir/*.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't actually need to be a branch name, it can be any revision. In practice I'd want whatever is the current head of the framework submodule, and I think that should be the default, rather than main.
In fact main is likely not to work in practice, because it'll typically be some old reference made when the working directory was originally cloned I delete the default branch name in my local clones to avoid keeping that old reference around, so when I didn't pass --dst-branch-name, I got the error
fatal: 'main' matched multiple (4) remote tracking branches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is typically called --dry-run and often abbreviated -n. The name --print-… suggests that the option enables tracing output but not that it doesn't perform the action.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be consistent when abbreviating “destination”: either
dstordest, not both.