Improve versioning, add SQL query/result caching, optimize main index query #142
Merged
…c_index Joins auxiliary_metadata to expose when each series was first added to IDC and when it was last revised, enabling version-aware filtering of the index. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the direct 57M-row instance-level join on SeriesInstanceUID with a CTE that groups auxiliary_metadata to one row per series first, avoiding the many-to-many join explosion before GROUP BY. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… CTE MIN on series_init_idc_version and MAX on series_revised_idc_version are semantically correct and robust against unexpected intra-series variation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
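The pre-aggregation described in the two commits above can be sketched with an in-memory SQLite database standing in for BigQuery. This is only an illustration: the `dicom_all` table, the sample rows, and the integer version values are assumptions, while `auxiliary_metadata`, `SeriesInstanceUID`, `series_init_idc_version` and `series_revised_idc_version` come from the commit messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical instance-level tables: one row per DICOM instance.
    CREATE TABLE dicom_all (SeriesInstanceUID TEXT, SOPInstanceUID TEXT);
    CREATE TABLE auxiliary_metadata (
        SeriesInstanceUID TEXT,
        SOPInstanceUID TEXT,
        series_init_idc_version INTEGER,
        series_revised_idc_version INTEGER
    );
    INSERT INTO dicom_all VALUES ('s1', 'i1'), ('s1', 'i2'), ('s2', 'i3');
    INSERT INTO auxiliary_metadata VALUES
        ('s1', 'i1', 3, 10), ('s1', 'i2', 3, 12), ('s2', 'i3', 7, 7);
    """
)

# Naive form: joining instance rows to instance rows on the series key
# multiplies rows (2 x 2 for series s1) before any GROUP BY runs.
naive_row_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM dicom_all AS d
    JOIN auxiliary_metadata AS a
      ON d.SeriesInstanceUID = a.SeriesInstanceUID
    """
).fetchone()[0]

# CTE form: collapse auxiliary_metadata to one row per series first.
# MIN/MAX keep the result well defined even if values vary within a series.
rows = conn.execute(
    """
    WITH series_versions AS (
        SELECT SeriesInstanceUID,
               MIN(series_init_idc_version) AS series_init_idc_version,
               MAX(series_revised_idc_version) AS series_revised_idc_version
        FROM auxiliary_metadata
        GROUP BY SeriesInstanceUID
    )
    SELECT d.SeriesInstanceUID,
           COUNT(*) AS instance_count,
           v.series_init_idc_version,
           v.series_revised_idc_version
    FROM dicom_all AS d
    JOIN series_versions AS v
      ON d.SeriesInstanceUID = v.SeriesInstanceUID
    GROUP BY d.SeriesInstanceUID,
             v.series_init_idc_version,
             v.series_revised_idc_version
    ORDER BY d.SeriesInstanceUID
    """
).fetchall()
```

With the sample data, the naive join produces five intermediate rows for three instances, while the CTE form joins each instance to exactly one series row, so `instance_count` stays correct.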
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cache key is the SHA256 of each SQL file, which encodes both query logic and the BQ dataset version. On cache hit, all three artifacts (.parquet, _schema.json, .sql) are restored from gs://idc-index-data-cache without hitting BigQuery. Cache failures fall back to BQ transparently. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
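A minimal sketch of the caching flow described above, assuming a local directory in place of the GCS bucket so that it runs without credentials. The function names, the directory layout, and the `idc_v25_pub` dataset name are hypothetical; only the SHA256-of-SQL key, the three artifact suffixes, and the fall-back-on-failure behaviour are taken from the commit message.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

# The three artifacts the commit message says are restored on a cache hit.
ARTIFACT_SUFFIXES = (".parquet", "_schema.json", ".sql")


def cache_key(sql_path: Path) -> str:
    """SHA256 of the SQL file; changes with the query text or dataset name."""
    return hashlib.sha256(sql_path.read_bytes()).hexdigest()


def restore_from_cache(sql_path: Path, cache_root: Path, out_dir: Path) -> bool:
    """Copy cached artifacts into out_dir; return False on a miss or any I/O error."""
    entry = cache_root / cache_key(sql_path)
    sources = [entry / f"{sql_path.stem}{suffix}" for suffix in ARTIFACT_SUFFIXES]
    try:
        if not all(p.is_file() for p in sources):
            return False
        out_dir.mkdir(parents=True, exist_ok=True)
        for p in sources:
            shutil.copy2(p, out_dir / p.name)
        return True
    except OSError:
        # A cache failure is treated as a miss so the caller can run the query.
        return False


def store_in_cache(sql_path: Path, cache_root: Path, out_dir: Path) -> None:
    """Save freshly generated artifacts under the key for this SQL file."""
    entry = cache_root / cache_key(sql_path)
    entry.mkdir(parents=True, exist_ok=True)
    for suffix in ARTIFACT_SUFFIXES:
        name = f"{sql_path.stem}{suffix}"
        shutil.copy2(out_dir / name, entry / name)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cache_root, out_dir, restored = root / "cache", root / "out", root / "restored"
    sql_file = root / "version_metadata.sql"
    sql_file.write_text("SELECT idc_version FROM `idc-dev-etl.idc_v24_pub.version_metadata`")
    key_v24 = cache_key(sql_file)

    miss_before = restore_from_cache(sql_file, cache_root, restored)

    # Stand-in for running the query: write placeholder artifacts, then cache them.
    out_dir.mkdir()
    for suffix in ARTIFACT_SUFFIXES:
        (out_dir / f"{sql_file.stem}{suffix}").write_text("placeholder")
    store_in_cache(sql_file, cache_root, out_dir)

    hit_after = restore_from_cache(sql_file, cache_root, restored)
    restored_names = {p.name for p in restored.iterdir()}

    # Pointing the query at a different (hypothetical) dataset changes the key.
    sql_file.write_text("SELECT idc_version FROM `idc-dev-etl.idc_v25_pub.version_metadata`")
    key_v25 = cache_key(sql_file)
```

Because the dataset name is part of the SQL text, a new IDC release produces a new key without any separate version bookkeeping.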
- Add type: ignore[attr-defined] for google.cloud.storage import (no stubs available for mypy)
- Add gcs_cache_bucket to both guard conditions so mypy narrows the type from str | None to str at the call sites

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
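The narrowing mentioned above can be illustrated as follows. This is a sketch rather than the code from the PR: `cache_uri`, `plan_source`, and the `use_cache` flag are hypothetical names, and only `gcs_cache_bucket` and its `str | None` type come from the commit message.

```python
from __future__ import annotations


def cache_uri(bucket: str, key: str) -> str:
    """Requires a real str; passing str | None here would be a mypy error."""
    return f"gs://{bucket}/{key}"


def plan_source(key: str, use_cache: bool, gcs_cache_bucket: str | None) -> str:
    # Testing gcs_cache_bucket in the guard itself lets mypy narrow
    # str | None to str inside the branch, so the call below type-checks.
    if use_cache and gcs_cache_bucket:
        return cache_uri(gcs_cache_bucket, key)
    # No bucket configured, or caching disabled: query BigQuery directly.
    return "bigquery"
```

A guard on a separate flag alone (for example `if use_cache:`) would not narrow the type, which is presumably why the bucket was added to the conditions themselves.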
google.cloud.storage causes persistent mypy attr-defined and import-untyped errors that resist per-line suppression due to ruff auto-fixing the import form. Use # mypy: ignore-errors at the file level as a pragmatic workaround until a proper fix is implemented. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Indexes idc-dev-etl.idc_v24_pub.version_metadata, exposing idc_version and version_timestamp while excluding the version_hash column. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
738bfc8 to dd206f4
No description provided.