Skip to content

Feature/harvesting metadata from a provided repository url via cloning#473

Open
Aidajafarbigloo wants to merge 11 commits intosoftwarepub:developfrom
Aidajafarbigloo:feature/harvesting-metadata-from-a-provided-repository-URL-via-cloning
Open

Feature/harvesting metadata from a provided repository url via cloning#473
Aidajafarbigloo wants to merge 11 commits intosoftwarepub:developfrom
Aidajafarbigloo:feature/harvesting-metadata-from-a-provided-repository-URL-via-cloning

Conversation

@Aidajafarbigloo
Copy link

This feature branch introduces functionality for harvesting metadata from a provided repository URL via cloning.

Changes:

  1. Accept a repository URL as a parameter in the hermes harvest command
  2. Accept a token (for GitHub/GitLab) as a parameter in the hermes harvest command (for the plugin githublab)
  3. Clone the repository locally
  4. Harvest metadata from the cloned repository (CFF and CodeMeta)

Added clone utility functions for repository cloning with error handling and cleanup.
Added new command-line arguments for URL and token in harvest command.
Add temporary directory handling for cloning repositories and update token management.
Added an explicit logging shutdown step before clearing the HERMES caches.
Without shutting down logging first, the `clean` command fails on Windows with:
`An error occurred during execution of clean (Find details in './hermes.log')`
"Original exception was: [WinError 32] The process cannot access the file because it is being used by another process: '.hermes\\audit.log'"
@Aidajafarbigloo
Copy link
Author

@sferenz
Could you please take a look at this pull request and share your feedback?


import argparse
import shutil
import logging
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aidajafarbigloo Is it necessary for your changes to include logging? If not, please remove it everywhere.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sferenz Thank you for the comments.
When using hermes clean command on Windows, I face this error:
Run subcommand clean
Removing HERMES caches...
An error occurred during execution of clean (Find details in './hermes.log')

Error in the "hermes.log":
Original exception was: [WinError 32] The process cannot access the file because it is being used by another process: '.hermes\\audit.log'.

This happens because Windows does not allow deletion of a file that is still open by the current process. The audit.log file inside .hermes is held open by a logging file handler. When shutil.rmtree() attempts to remove the directory, it fails due to the open file handle. I'm using logging.shutdown() to ensure that all logging handlers are closed before the directory is deleted, it does not introduce new logging behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was fixed before but somehow the fix got lost... Weird 🤔

#226

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, not only once. But it seems more important to not have a log file in the working directory than having a properly cleaned .hermes cache.

you should be able to configure the path to the logfile, though.


# ---------------- utilities ----------------

def _normalize_clone_url(url: str) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide a general comment for each function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstrings were added.

@@ -0,0 +1,249 @@
# SPDX-FileCopyrightText: 2026 OFFIS e.V.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put UOL here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

import toml


def _load_config(config_path: str) -> dict:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure that every function has a comment

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstrings were added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants