
EPIC: Populate data provider ingest files to support bertron api #10

@vchendrix


Overview

This epic serves as the parent issue for all work related to implementing and documenting the data ingest process for each provider (EMSL, JGI, NMDC, ESS-DIVE). All subtasks and deliverables under this epic must comply with the standardized folder structure, file naming conventions, and data validation requirements outlined in issue #9.

Scope

  • Coordinate the creation and population of the ingest directory with subfolders for each data provider.
  • Ensure all ingest files conform to the latest release schema.
  • Enforce splitting of files to limit each to ~25 MB, with no record spanning multiple files.
  • Require all data files to be formatted as JSON lists (or the agreed format as documented).
  • Apply the standardized naming convention: <provider>_<5-digit zero-padded number>.json (e.g., emsl_00001.json).
  • Document the folder structure, file formats, splitting strategy, and any tools/scripts used for ETL (to be placed in contrib/).
  • All implementation and documentation tasks related to these requirements should be tracked as subtasks under this epic.
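The splitting and naming requirements above could be sketched as follows. This is a minimal illustration, not the project's actual ETL tooling: the function names `chunk_records` and `write_ingest_files` are hypothetical, "~25 MB" is assumed to mean 25 MiB, and the output is assumed to be a plain JSON list per file. Records are never split, so a single record larger than the limit still gets a file of its own.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Iterator, List

# Assumption: "~25 MB" is read as 25 MiB; adjust if issue #9 says otherwise.
MAX_BYTES = 25 * 1024 * 1024


def chunk_records(records: Iterable[Any], max_bytes: int = MAX_BYTES) -> Iterator[List[Any]]:
    """Group records so each group, dumped as a JSON list, fits in max_bytes.

    Whole records are kept together; an oversized record forms its own group.
    """
    chunk: List[Any] = []
    size = 2  # the surrounding "[" and "]"
    for record in records:
        # json.dumps escapes non-ASCII by default, so len() equals the byte count.
        record_len = len(json.dumps(record))
        extra = record_len + (2 if chunk else 0)  # ", " separator between items
        if chunk and size + extra > max_bytes:
            yield chunk
            chunk, size, extra = [], 2, record_len
        chunk.append(record)
        size += extra
    if chunk:
        yield chunk


def write_ingest_files(provider: str, records: Iterable[Any], out_dir: Path,
                       max_bytes: int = MAX_BYTES) -> List[Path]:
    """Write chunks as <provider>_00001.json, <provider>_00002.json, ..."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunk_records(records, max_bytes), start=1):
        path = out_dir / f"{provider}_{index:05d}.json"
        path.write_text(json.dumps(chunk), encoding="utf-8")
        paths.append(path)
    return paths
```

A call such as `write_ingest_files("emsl", records, Path("ingest/emsl"))` would then populate one provider subfolder; the `ingest/emsl` path is likewise an assumption about the layout described in issue #9.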

Acceptance Criteria

  • All subtasks necessary to implement the ingest process and documentation are completed.
  • The ingest folder structure and naming conventions fully comply with the requirements in issue #9 (Implement Data Ingest Folder Structure and Conventions).
  • All provider data is validated against the current release schema.
  • Splitting strategy and file formats are documented.
  • All ETL scripts are placed in the appropriate contrib/ directories and documented.
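Some of these criteria can be checked mechanically. The sketch below is illustrative only: `check_provider_dir` is a hypothetical helper, the provider name is assumed to match the subfolder name, and "~25 MB" is assumed to mean 25 MiB. It covers file naming, file size, and the JSON-list format; it does not validate records against the release schema, which would need the schema itself.

```python
import json
import re
from pathlib import Path
from typing import List

# Assumption: "~25 MB" is read as 25 MiB.
MAX_BYTES = 25 * 1024 * 1024


def check_provider_dir(provider_dir: Path, max_bytes: int = MAX_BYTES) -> List[str]:
    """Return a list of human-readable problems found in one provider subfolder."""
    provider_dir = Path(provider_dir)
    # Assumption: files are named after the folder, e.g. emsl/emsl_00001.json.
    pattern = re.compile(rf"{re.escape(provider_dir.name)}_\d{{5}}\.json")
    problems = []
    for path in sorted(provider_dir.iterdir()):
        if not path.is_file():
            continue
        if not pattern.fullmatch(path.name):
            problems.append(f"{path.name}: name does not match <provider>_<5 digits>.json")
            continue
        if path.stat().st_size > max_bytes:
            problems.append(f"{path.name}: larger than {max_bytes} bytes")
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(data, list):
            problems.append(f"{path.name}: top-level value is not a JSON list")
    return problems
```

An empty return value means the folder passed these three checks only.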

Labels: documentation, enhancement
