vncntprvst/dataset-manager

Dataset manager

A user interface that helps bundle data and metadata based on experimental types, while ensuring compatibility with NWB (Neurodata Without Borders) and DANDI (Distributed Archives for Neurophysiology Data Integration) submission requirements.

This project was created as part of the Team BRAIN Circuit Program (U19) NS137920:
🔗 High- and low-level computations for coordination of orofacial motor actions

Overview

This application generates data collection templates and conversion scripts tailored to specific experimental types. Users select their experimental modalities, and the tool automatically creates spreadsheets with the appropriate metadata fields required for NWB file creation and DANDI archive submission, and then generates a script to convert the collected data into NWB format.

Workflow

The simplest way to launch the app is with uv.
If uv is installed, clone or download this repository, then double-click run_app.bat (Windows) or run ./run_app.sh (Mac/Linux) from the terminal.
You can also drag and drop a dataset folder onto the script icon. For alternative installation options, see Installation.

Once the app is running in your browser:

  1. Select experimental types relevant to your research
  2. Create a new dataset on a data repository (e.g., DANDI)
  3. Generate and download your customized template as .xlsx or .csv. The template helps standardize data collection so that it complies with NWB and DANDI standards.
  4. Generate a Python conversion script.
  5. Use that script to convert your data into NWB format, ensuring it meets the necessary standards for sharing and archiving.

Supported Experimental Types

  • Electrophysiology – Extracellular / Intracellular
  • Behavior and physiological measurements
  • Optical Physiology
  • Stimulations
  • Experimental metadata and notes
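
As a rough illustration of how selected experimental types could translate into template columns, here is a minimal sketch using only the Python standard library. The per-type column names in EXTRA_COLUMNS and the helper names are hypothetical and chosen for illustration; the app derives its actual fields from dandischema and PyNWB. Only the three NWB minimum fields are taken from this README.

```python
import csv
import io

# Fields every NWB file must define, as listed under Schema Validation.
NWB_REQUIRED = ["session_description", "identifier", "session_start_time"]

# Hypothetical per-type columns, for illustration only. The real field
# lists are not defined by this table.
EXTRA_COLUMNS = {
    "Electrophysiology": ["electrode_group", "sampling_rate_hz"],
    "Behavior and physiological measurements": ["behavior_task"],
    "Optical Physiology": ["indicator", "imaging_rate_hz"],
    "Stimulations": ["stimulus_type"],
    "Experimental metadata and notes": ["notes"],
}


def template_columns(selected_types):
    """Return the ordered, de-duplicated column list for the selected types."""
    columns = list(NWB_REQUIRED)
    for exp_type in selected_types:
        for column in EXTRA_COLUMNS[exp_type]:
            if column not in columns:
                columns.append(column)
    return columns


def template_csv(selected_types):
    """Return a CSV template (header row only) as a string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(template_columns(selected_types))
    return buffer.getvalue()


header = template_csv(["Electrophysiology", "Stimulations"])
print(header)
```

Keeping the required NWB fields first means every template starts with the same three columns regardless of which types are selected.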

Schema Validation

The application enforces data standards via two complementary schema layers:

  1. DANDI metadata layer. We load the official dandischema (Pydantic models) or the JSON Schema it produces. This automatically gives us the full set of required and optional fields plus their data types—no manual duplication. (Reference: https://docs.dandiarchive.org)
  2. NWB core layer. We rely on PyNWB to construct NWBFile objects. At minimum an NWB file must define: session_description, identifier, and session_start_time. The app can validate the resulting structure with PyNWB’s built‑in validator and apply additional best‑practice checks using NWB Inspector.

With those definitions in place, the app generates a session-oriented spreadsheet template (one row per recording session) for you to complete.
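
Before conversion, each completed row can be checked against the NWB minimum fields named above. The following is a minimal sketch of such a check, not the app's actual validation code (which relies on PyNWB and NWB Inspector). The function name check_session_row and the message strings are assumptions, and treating a missing timezone as a problem is a conservative choice made for this sketch.

```python
from datetime import datetime

# NWB minimum fields, as listed in the NWB core layer description.
REQUIRED = ("session_description", "identifier", "session_start_time")


def _text(value):
    """Normalize a spreadsheet cell to a stripped string ('' if empty)."""
    return "" if value is None else str(value).strip()


def check_session_row(row):
    """Return a list of problems found in one session row (a dict)."""
    problems = []
    for field in REQUIRED:
        if not _text(row.get(field)):
            problems.append(f"missing {field}")

    start = _text(row.get("session_start_time"))
    if start:
        try:
            parsed = datetime.fromisoformat(start)
        except ValueError:
            problems.append("session_start_time is not ISO 8601")
        else:
            # Assumption of this sketch: require an explicit UTC offset.
            if parsed.tzinfo is None:
                problems.append("session_start_time has no timezone")
    return problems


good_row = {
    "session_description": "Whisker tracking",
    "identifier": "sess-001",
    "session_start_time": "2024-05-01T09:30:00+00:00",
}
print(check_session_row(good_row))
```

Returning a list of messages rather than raising on the first error lets all problems in a row be reported at once.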

Integration with BrainSTEM

This application is designed to integrate with the BrainSTEM platform. Users can import their notes and metadata from BrainSTEM during the data conversion process, so that this information is captured and organized alongside the converted data.

See the short video demo.

Installation

  • Clone this repository.
    git clone https://github.com/vncntprvst/dataset-manager.git
  • Run the app:
    • Double-click run_app.bat (Windows) or run ./run_app.sh (Mac/Linux) from the terminal.
      You can also drag and drop a data folder onto the script icon.

    • Alternatively, if using uv (recommended):
      uv run streamlit run app.py

      or, if not using uv:
      - Create a virtual environment (Python 3.9+) and activate it
      - Install dependencies:
        pip install -r requirements.txt
      - Launch the app:
        streamlit run app.py
