Cached artifact store #197
Draft
azimov wants to merge 10 commits into
…ids within checksums
…ase/time at risk settings and not outcome based
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #197 +/- ##
===========================================
- Coverage 94.25% 94.06% -0.20%
===========================================
Files 22 23 +1
Lines 6531 6685 +154
===========================================
+ Hits 6156 6288 +132
- Misses 375 397 +22
This PR introduces a content-addressable caching system for runCmAnalyses() and restructures study population creation into two phases to maximize artifact reusability when adding new outcomes.
Key Changes
Content-Addressable Caching: Artifact filenames are now derived from SHA-256 hashes of all parameters that determine their content (including databaseId). This means:
databaseId parameter, which is used in the checksums to prevent cross-database issues. When using Strategus, this will be based on its own hashing mechanism, further reducing collision risk.
createStudyPopulation changes
This means adding a new outcome to an existing analysis only requires the lightweight per-outcome step: all expensive shared computation (data loading, base population, PS fitting, matching/stratification) is reused from cache.
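For reviewers who want a concrete picture of the naming scheme, here is a minimal sketch of content-addressed filenames. It is written in Python purely for illustration and is not the code in this PR: the function name artifact_filename, the prefixes, the .rds suffix, the JSON canonicalization, and the example settings are all assumptions. The only details taken from the description are the use of SHA-256 and the inclusion of databaseId in the hashed input.

```python
import hashlib
import json


def artifact_filename(prefix, params, database_id):
    """Build a deterministic filename from everything that determines content.

    Hypothetical helper: the PR's real hashing inputs and file layout may differ.
    """
    # Canonical serialization: sorted keys so argument order never changes the hash.
    payload = json.dumps(
        {"databaseId": database_id, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{prefix}_{digest}.rds"


# Settings shared by every outcome (illustrative values only).
shared = {"washoutPeriod": 365, "matchRatio": 1}

# Same settings in a different order map to the same cached artifact.
base_a = artifact_filename("basePop", shared, "db1")
base_b = artifact_filename("basePop", dict(reversed(list(shared.items()))), "db1")

# A different database never collides with the first one.
base_other_db = artifact_filename("basePop", shared, "db2")

# Per-outcome artifacts add the outcome ID to the hashed parameters,
# so a new outcome gets a new file while the shared artifact is untouched.
outcome_1 = artifact_filename("outcomePop", {**shared, "outcomeId": 1}, "db1")
outcome_2 = artifact_filename("outcomePop", {**shared, "outcomeId": 2}, "db1")
```

In this sketch, adding outcome 2 produces a new per-outcome filename but leaves base_a unchanged, which is the reuse property the description relies on.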
Artifact stores
Pluggable ArtifactStore interface: New R6 abstract class with LocalArtifactStore as the default implementation. Enables future custom storage backends (S3, shared filesystem, RDBMS blob storage). Such a backend would allow multi-node execution of tasks and reuse of intermediate artifacts across nodes.
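To illustrate the shape of such an interface, the sketch below shows an abstract store, a local filesystem implementation, and a compute-on-miss helper. It is written in Python for illustration only and does not reflect the actual R6 class in this PR: the method names exists, save, and load, the bytes payload, and the get_or_compute helper are all assumptions. Only the class names ArtifactStore and LocalArtifactStore come from the description.

```python
import os
from abc import ABC, abstractmethod


class ArtifactStore(ABC):
    """Abstract storage backend keyed by content-derived names (hypothetical API)."""

    @abstractmethod
    def exists(self, key):
        """Return True if an artifact is stored under this key."""

    @abstractmethod
    def save(self, key, data):
        """Store the bytes in data under this key."""

    @abstractmethod
    def load(self, key):
        """Return the bytes stored under this key."""


class LocalArtifactStore(ArtifactStore):
    """Stores each artifact as one file under a root folder."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, key)

    def exists(self, key):
        return os.path.isfile(self._path(key))

    def save(self, key, data):
        # Write to a temporary file, then rename, so readers never see a partial file.
        tmp_path = self._path(key) + ".tmp"
        with open(tmp_path, "wb") as handle:
            handle.write(data)
        os.replace(tmp_path, self._path(key))

    def load(self, key):
        with open(self._path(key), "rb") as handle:
            return handle.read()


def get_or_compute(store, key, compute):
    """Return the cached artifact, running compute() only on a cache miss."""
    if store.exists(key):
        return store.load(key)
    data = compute()
    store.save(key, data)
    return data
```

Because callers in this sketch only touch the three abstract methods, a remote backend could be swapped in without changing the analysis code, which is what would make sharing intermediate artifacts between nodes possible.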