-
Notifications
You must be signed in to change notification settings - Fork 91
feat(processing): add unblob dedicated temporary directory handling #1222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| """Set environment variables so all subprocesses and handlers use our temp dir""" | ||
| for var in ("TMP", "TMPDIR", "TEMP", "TEMPDIR"): | ||
| os.environ[var] = self.tmp_dir.as_posix() | ||
| atexit.register(self._cleanup_tmp_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is problematic for programs using unblob as library, as these variables remain set in the hosting process's environment. We should set and reset them in a tighter scope.
For example, Processor.process_task looks like a fine candidate, as it by default will spawn child processes, not poisoning the caller's environment at all (given they use process_num > 1).
Another possibility is doing this at process_file level, but it still affects the hosting process unfortunately, but at least we can restore the variables upon returning from the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Also implemented tests in test_necessary_resources_can_be_created_in_sandbox
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see the original comment being resolved in the code - below I can find the env variables being set, but not restored, and here atexit.register still being used.
Have you pushed your changes?
I am also not sure how it would landlock to a /tmp/call-specific-tempdir if used as a library on the second call with another ExtractionConfig.
One solution I can think of is process_file forking in case of landlock, setting the environment variables in the fork, and cleaning up the temporary directory when the fork exits. Since the environment variables would be set in the child, they need no restoring.
(actually multiprocessing should be used somehow, as https://docs.python.org/3/library/os.html#os.fork has many problems and known not to work on macos)
Draft implementation for #1216
It fails on
test_archive_successbecause we cannot instantiateExtractionConfigwith an arbitrary tmp directory right now.Some unit testing is also required for proper coverage.