Open
Feature
2 of 4 issues completed
Labels
documentation, enhancement
Description
Overview
This epic serves as the parent issue for all work related to implementing and documenting the data ingest process for each provider (EMSL, JGI, NMDC, ESS-DIVE). All subtasks and deliverables under this epic must comply with the standardized folder structure, file naming conventions, and data validation requirements outlined in issue #9.
Scope
- Coordinate the creation and population of the ingest directory with subfolders for each data provider.
- Ensure all ingest files conform to the latest release schema.
- Enforce splitting of files to limit each to ~25 MB, with no record spanning multiple files.
- Require all data files to be formatted as JSON lists (or the agreed format as documented).
- Apply the standardized naming convention: <provider>_<5-digit zero-padded number>.json (e.g., emsl_00001.json).
- Document the folder structure, file formats, splitting strategy, and any tools/scripts used for ETL (to be placed in contrib/).
- All implementation and documentation tasks related to these requirements should be tracked as subtasks under this epic.
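The splitting and naming rules above could be sketched roughly as follows. This is a minimal illustration, not the project's ETL tooling: the function names `chunk_records` and `write_ingest_files`, the reading of "~25 MB" as 25 MiB, and the choice to measure size on the serialized JSON are all assumptions. Records are never split, so a single record larger than the limit would still land in a file of its own.

```python
import json
from pathlib import Path

# Assumption: "~25 MB" is treated as 25 MiB of serialized JSON.
MAX_BYTES = 25 * 1024 * 1024


def chunk_records(records, max_bytes=MAX_BYTES):
    """Yield lists of whole records whose JSON-list encoding fits in max_bytes."""
    chunk, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for record in records:
        encoded = len(json.dumps(record).encode("utf-8"))
        extra = encoded + (2 if chunk else 0)  # ", " separator between items
        if chunk and size + extra > max_bytes:
            yield chunk
            chunk, size = [], 2
            extra = encoded
        # A record is always kept whole, even if it alone exceeds max_bytes.
        chunk.append(record)
        size += extra
    if chunk:
        yield chunk


def write_ingest_files(records, provider, out_dir, max_bytes=MAX_BYTES):
    """Write chunks as <provider>_00001.json, <provider>_00002.json, ..."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunk_records(records, max_bytes), start=1):
        path = out_dir / f"{provider}_{index:05d}.json"
        path.write_text(json.dumps(chunk), encoding="utf-8")
        paths.append(path)
    return paths
```

A call such as `write_ingest_files(records, "emsl", "ingest/emsl")` would then produce files named like `emsl_00001.json`; the `ingest/emsl` path is only an example of a per-provider subfolder.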
Acceptance Criteria
- All subtasks necessary to implement the ingest process and documentation are completed.
- The ingest folder structure and naming conventions fully comply with the requirements in issue #9 (Implement Data Ingest Folder Structure and Conventions).
- All provider data is validated against the current release schema.
- Splitting strategy and file formats are documented.
- All ETL scripts are placed in the appropriate contrib/ directories and documented.
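Some of these criteria lend themselves to an automated check. The sketch below is one possible approach, assuming provider prefixes consist of lowercase letters, digits, and hyphens and that the size limit is 25 MiB; `check_ingest_file` and `NAME_PATTERN` are hypothetical names. It covers only the file name, size, and top-level JSON list requirements and does not validate records against the release schema.

```python
import json
import re
from pathlib import Path

# Assumption: provider prefixes are lowercase letters, digits, or hyphens.
NAME_PATTERN = re.compile(r"^[a-z0-9-]+_\d{5}\.json$")
# Assumption: "~25 MB" is treated as 25 MiB.
MAX_BYTES = 25 * 1024 * 1024


def check_ingest_file(path, max_bytes=MAX_BYTES):
    """Return a list of problems found in one ingest file (empty if none)."""
    path = Path(path)
    problems = []
    if not NAME_PATTERN.match(path.name):
        problems.append("name does not match <provider>_<5 digits>.json")
    if path.stat().st_size > max_bytes:
        problems.append("file exceeds size limit")
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        problems.append("not valid JSON")
    else:
        if not isinstance(data, list):
            problems.append("top-level value is not a JSON list")
    return problems
```

Running this over every file in a provider subfolder and failing on any non-empty result would give a simple pre-merge gate; schema validation would need to be layered on separately.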