Feature/harvesting metadata from a provided repository url via cloning #473
Added clone utility functions for repository cloning with error handling and cleanup.
Added new command-line arguments for URL and token in harvest command.
Add temporary directory handling for cloning repositories and update token management.
Added an explicit logging shutdown step before clearing the HERMES caches. Without shutting down logging first, the `clean` command fails on Windows with: `An error occurred during execution of clean (Find details in './hermes.log')` "Original exception was: [WinError 32] The process cannot access the file because it is being used by another process: '.hermes\\audit.log'"
import argparse
import shutil
import logging
@Aidajafarbigloo Is it necessary for your changes to include logging? If not, please remove it everywhere.
@sferenz Thank you for the comments.
When using the `hermes clean` command on Windows, I get this error:
Run subcommand clean
Removing HERMES caches...
An error occurred during execution of clean (Find details in './hermes.log')
The error in `hermes.log`:
Original exception was: [WinError 32] The process cannot access the file because it is being used by another process: '.hermes\\audit.log'.
This happens because Windows does not allow deleting a file that the current process still has open. The `audit.log` file inside `.hermes` is held open by a logging file handler, so `shutil.rmtree()` fails when it tries to remove the directory. I'm calling `logging.shutdown()` to ensure that all logging handlers are closed before the directory is deleted; it does not introduce new logging behavior.
This was fixed before but somehow the fix got lost... Weird 🤔
Well, not only once. But it seems more important not to have a log file in the working directory than to have a properly cleaned `.hermes` cache.
You should be able to configure the path to the log file, though.
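For readers following this thread, here is a minimal sketch of the ordering discussed above: close all logging handlers first, then delete the cache directory. The function name `clean_cache` and the logger name `audit-demo` are assumptions for illustration and are not taken from the HERMES code base.

```python
import logging
import shutil
import tempfile
from pathlib import Path


def clean_cache(cache_dir: Path) -> None:
    """Remove the cache directory after releasing all log file handles.

    `clean_cache` is a hypothetical name, not the actual HERMES function.
    """
    # Flush and close every registered handler so that no file inside
    # cache_dir is still held open by this process (needed on Windows).
    logging.shutdown()
    shutil.rmtree(cache_dir)


# Usage: simulate an audit log that is held open by a FileHandler.
cache_dir = Path(tempfile.mkdtemp()) / ".hermes"
cache_dir.mkdir()
handler = logging.FileHandler(cache_dir / "audit.log")
audit_logger = logging.getLogger("audit-demo")
audit_logger.addHandler(handler)
audit_logger.warning("entry written before cleanup")

clean_cache(cache_dir)
```

Note that the Python documentation says the logging system should not be used after `logging.shutdown()`, so this ordering only fits a command that exits right after cleaning.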
# ---------------- utilities ----------------
def _normalize_clone_url(url: str) -> str:
Please provide a general comment for each function.
@@ -0,0 +1,249 @@
# SPDX-FileCopyrightText: 2026 OFFIS e.V.
import toml
def _load_config(config_path: str) -> dict:
Please ensure that every function has a comment.
This script bulk-tests HERMES metadata harvesting across multiple repositories, checking for expected metadata files and generating a CSV report.
Add test repositories for bulk testing.
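The check-and-report part of such a bulk test could look roughly like the sketch below. It only covers checking for expected files and writing the CSV; running the harvest itself is left out. The file names in `EXPECTED_FILES` and the function names `check_repository` and `write_report` are assumptions, since the actual script is not shown in this conversation.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical file names; the real script's expected files are not shown.
EXPECTED_FILES = ("codemeta.json", "CITATION.cff")


def check_repository(repo_dir: Path) -> dict:
    """Report which expected metadata files exist in one repository checkout."""
    row = {"repository": repo_dir.name}
    for name in EXPECTED_FILES:
        row[name] = (repo_dir / name).is_file()
    return row


def write_report(repo_dirs: list, report_path: Path) -> None:
    """Write one CSV row per repository, with one column per expected file."""
    with report_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["repository", *EXPECTED_FILES])
        writer.writeheader()
        for repo_dir in repo_dirs:
            writer.writerow(check_repository(repo_dir))


# Usage: one fake checkout that contains only one of the expected files.
workspace = Path(tempfile.mkdtemp())
repo_a = workspace / "repo-a"
repo_a.mkdir()
(repo_a / "codemeta.json").write_text("{}", encoding="utf-8")
report_path = workspace / "report.csv"
write_report([repo_a], report_path)
```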
This feature branch introduces functionality for harvesting metadata from a provided repository URL via cloning.
Changes:
`hermes harvest` command
`hermes harvest` command (for the plugin githublab)
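To make the cloning approach concrete, here is a minimal sketch of cloning into a temporary directory with error handling and automatic cleanup. It is not the PR's implementation: `cloned_repository`, `CloneError`, and the injectable `run` parameter are assumptions, and token handling is deliberately left out.

```python
import subprocess
import tempfile
from contextlib import contextmanager
from pathlib import Path


class CloneError(RuntimeError):
    """Raised when the repository could not be cloned (hypothetical type)."""


@contextmanager
def cloned_repository(url: str, run=subprocess.run):
    """Clone `url` into a temporary directory and yield the checkout path.

    The directory is deleted when the `with` block exits, even after errors.
    `run` is injectable so the sketch can be exercised without network access.
    """
    with tempfile.TemporaryDirectory(prefix="hermes-clone-") as tmp:
        target = Path(tmp) / "repo"
        try:
            # A shallow clone is enough when only the current files are read.
            run(
                ["git", "clone", "--depth", "1", url, str(target)],
                check=True,
                capture_output=True,
                text=True,
            )
        except (OSError, subprocess.CalledProcessError) as error:
            raise CloneError(f"Could not clone {url}") from error
        yield target
```

A caller would wrap the harvesting step in `with cloned_repository(url) as path:` and read the files under `path` inside the block. Using a context manager keeps the cleanup in one place, so the temporary checkout is also removed when harvesting raises an exception.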