@franzpoeschel franzpoeschel commented May 3, 2023

The openPMD standard works by defining "what must be there", but does not impose restrictions as to "what must not be there". By this principle, openPMD is an extensible standard.
So far, standard extensions have relied mostly on defining additional metadata in terms of attributes, e.g. for storing the name of the employed field solver for the ED-PIC extension. Custom hierarchies and custom n-dimensional datasets ("heavy" data in comparison to lightweight metadata) have not been employed so far, although the openPMD standard permits them in principle. The major hindrance to such data organization has been the lack of support in the openPMD-api, i.e. the implementation of the standard.

As the first part of this PR, the openPMD-api now supports writing custom-defined hierarchies and datasets within the basepath, i.e. within Iterations. This change is entirely independent from the standard as it makes use of the already existing liberty within the standard's conception as explained in the introduction.

This alone finds useful applications already:

  • Data that has been marked up according to another standard can be embedded side-by-side with openPMD-formatted particle-mesh data. A short example is given as part of this PR that writes an openPMD-formatted temperature mesh side by side with a simple NeXus example. The resulting dataset is shown below:
      string       /basePath                                                attr   = "/data/%T/"
      string       /date                                                    attr   = "2024-08-12 16:58:01 +0200"
      string       /iterationEncoding                                       attr   = "groupBased"
      string       /iterationFormat                                         attr   = "/data/%T/"
      string       /meshesPath                                              attr   = "meshes/"
      string       /openPMD                                                 attr   = "1.1.0"
      uint32_t     /openPMDextension                                        attr   = 0
      string       /software                                                attr   = "openPMD-api"
      string       /softwareVersion                                         attr   = "0.16.0-dev"
      double       /data/100/dt                                             attr   = 1
      double       /data/100/time                                           attr   = 0
      double       /data/100/timeUnitSI                                     attr   = 1
      string       /data/100/Scan/NX_class                                  attr   = "NXentry"
      string       /data/100/Scan/data/NX_class                             attr   = "NXdata"
      string       /data/100/Scan/data/axes                                 attr   = {"two_theta"}
      int64_t      /data/100/Scan/data/counts                               {15} = 0 / 0
      string       /data/100/Scan/data/counts/long_name                     attr   = "photodiode counts"
      string       /data/100/Scan/data/counts/units                         attr   = "counts"
      string       /data/100/Scan/data/signal                               attr   = "counts"
      double       /data/100/Scan/data/two_theta                            {15} = 0 / 0
      string       /data/100/Scan/data/two_theta/long_name                  attr   = "two_theta (degrees)"
      string       /data/100/Scan/data/two_theta/units                      attr   = "degrees"
      uint8_t      /data/100/Scan/data/two_theta_indices                    attr   = {0}
      string       /data/100/Scan/default                                   attr   = "data"
      double       /data/100/meshes/temperature                             {5, 5} = 0 / 0
      string       /data/100/meshes/temperature/axisLabels                  attr   = {"x", "y"}
      string       /data/100/meshes/temperature/dataOrder                   attr   = "C"
      string       /data/100/meshes/temperature/geometry                    attr   = "cartesian"
      double       /data/100/meshes/temperature/gridGlobalOffset            attr   = {0, 0}
      double       /data/100/meshes/temperature/gridSpacing                 attr   = {1, 1}
      double       /data/100/meshes/temperature/gridUnitSI                  attr   = 1
      long double  /data/100/meshes/temperature/position                    attr   = {0.5, 0.5}
      float        /data/100/meshes/temperature/timeOffset                  attr   = 0
      double       /data/100/meshes/temperature/unitDimension               attr   = {0, 0, 1, 0, 0, 0, 0}
      double       /data/100/meshes/temperature/unitSI                      attr   = 1
    
  • Embedding non-physical information into output files. An example is the particle-in-cell simulation PIConGPU, which uses openPMD for regular output as well as for checkpoint-restart output. For checkpoint-restart, internal program state must be serialized along with the physical state of the simulation. So far, this has only been possible by pretending that the internal state is a mesh, which confuses many post-processing tools such as visualizers. PIConGPU has been adapted to make use of this change on this Git tree; check here for a diff. A shortened example output is pasted below, demonstrating that internal state information is now cleanly separated from physical data:
      float     /data/100/fields/E/x                                      {192, 1024, 192}
      float     /data/100/fields/E/y                                      {192, 1024, 192}
      float     /data/100/fields/E/z                                      {192, 1024, 192}
      float     /data/100/particles/e/momentum/x                          {71958528}
      float     /data/100/particles/e/momentum/y                          {71958528}
      float     /data/100/particles/e/momentum/z                          {71958528}
      float     /data/100/particles/e/position/x                          {71958528}
      float     /data/100/particles/e/position/y                          {71958528}
      float     /data/100/particles/e/position/z                          {71958528}
      int32_t   /data/100/particles/e/positionOffset/x                    {71958528}
      int32_t   /data/100/particles/e/positionOffset/y                    {71958528}
      int32_t   /data/100/particles/e/positionOffset/z                    {71958528}
      float     /data/100/particles/e/weighting                           {71958528}
      char      /data/100/picongpu_internal/RNG/RNGProvider3XorMin        {48, 128, 147456}
      uint64_t  /data/100/picongpu_internal/idProvider/nextId             {1, 1, 1}
      uint64_t  /data/100/picongpu_internal/idProvider/startId            {1, 1, 1}
    

Building on top of this, the other logical component of this PR is support for this standard extension. While the PR as described so far brings custom hierarchies and datasets to the openPMD-api in a way that is transparent to the standard itself, the purpose of this standard extension is to make the standard aware of these hierarchies by embedding openPMD markup within them.

The schematic idea behind this is pictured below:
[Figure: schematic of the proposed standard extension]

With this, the data organization can step back into openPMD markup from anywhere within a custom-defined hierarchy. This further extends the use of this PR to:

  • Using openPMD markup within another standard, rather than merely beside it. This is currently being applied exploratively in this script for a sample dataset collected in the POLARIS laboratory.
  • For more complex setups, this permits a better organization of output data. As an example, meshes can be of different kinds such as 3-dimensional physical fields or 2-dimensional images; also there might be similar kinds of dependencies between particle data. It is desirable to group such data in a way that reflects the logical adjacencies and interdependencies between them.
  • A particular instance of the above is mesh refinement, currently proposed in a standard extension as a suffix-based naming scheme. Switching to an approach based on custom hierarchies, this comment details a more natural and more easily parsed approach to mesh refinement. A mesh-refined dataset of this type might be structured as follows:
    /data/0/refined_mesh_levels/0/meshes/E
    /data/0/refined_mesh_levels/0/meshes/B
    /data/0/refined_mesh_levels/1/meshes/E
    /data/0/refined_mesh_levels/1/meshes/B
    /data/0/refined_mesh_levels/2/meshes/E
    /data/0/refined_mesh_levels/2/meshes/B
    +++++++ ––––––––––––––––––––– ++++++++
    standard        custom        standard
    
    /data/0/simulation_internal/some_checkpointing_info
    +++++++ –––––––––––––––––––––––––––––––––––––––––––
    standard                  custom
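
The refinement layout above can be produced and parsed with plain string handling. The following is a minimal sketch that does not use the openPMD-api; the helper names `refinedMeshPath` and `parseRefinementLevel` are hypothetical, the group name `refined_mesh_levels` is taken from the example above, and C++17 is assumed for `std::optional`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: build the path of a mesh on a given refinement level,
// following /data/<iteration>/refined_mesh_levels/<level>/meshes/<mesh>.
std::string refinedMeshPath(
    unsigned long iteration, unsigned level, std::string const &meshName)
{
    assert(!meshName.empty());
    return "/data/" + std::to_string(iteration) + "/refined_mesh_levels/" +
        std::to_string(level) + "/meshes/" + meshName;
}

// Hypothetical helper: extract the refinement level from such a path.
// Returns std::nullopt if the path does not pass through the custom group
// or if the level component is not a plain decimal number.
std::optional<unsigned> parseRefinementLevel(std::string const &path)
{
    std::string const marker = "/refined_mesh_levels/";
    auto const markerPos = path.find(marker);
    if (markerPos == std::string::npos)
        return std::nullopt;

    auto const levelStart = markerPos + marker.size();
    auto const levelEnd = path.find('/', levelStart);
    // A count of npos means "until the end of the string".
    std::string const level = path.substr(
        levelStart,
        levelEnd == std::string::npos ? std::string::npos
                                      : levelEnd - levelStart);

    if (level.empty() ||
        level.find_first_not_of("0123456789") != std::string::npos)
        return std::nullopt;
    return static_cast<unsigned>(std::stoul(level));
}
```

Because the level is its own path component rather than a suffix of the mesh name, a reader only needs to locate one group name and parse the component that follows it.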
    

TODO

  • Merge first: Remove necessity for RecordComponent::SCALAR #1154
  • Await Pybind11 release that has merged this fix: Introduce recursive_container_traits pybind/pybind11#4623
  • Implement custom groups at the Iteration level that can hold custom attributes
  • Implement custom datasets inside custom hierarchy
  • Implement openPMD-defined meshes/particles-data from anywhere in the hierarchy
  • Implement extended meshesPath/particlesPath
  • Update the openPMD standard, see Allow user to store non-openPMD information openPMD-standard#115 (comment)
  • Lenient parsing in CustomHierarchy class
  • Maybe lazy parsing of the custom hierarchy?
  • Use the new SharedAttributableData pattern to better implement variable-based encoding (where series.iterations and series.iterations[0] are the same backend objects)
  • Replace Iteration::meshes with Iteration::mesh("subdir/E") and Iteration::allMeshes() -> std::map<std::string, Mesh>, similar Iteration::species("subdir/e") and Iteration::allSpecies() -> std::map<std::string, ParticleSpecies>. But should it be species("subdir/particles/e") or species("subdir/e")?
  • Generalize to Attributable::openAsCustomHierarchy()?

Diff: https://github.com/franzpoeschel/openPMD-api/compare/topic-remove-scalar-component..topic-custom-hierarchies

@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch from 6c7f23a to c692dc7 Compare May 8, 2023 09:22
}
}

TEST_CASE("custom_hierarchies", "[core]")

Check notice

Code scanning / CodeQL

Unused static function Note test

Static function C_A_T_C_H_T_E_S_T_4 is unreachable (autoRegistrar5 must be removed at the same time)
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch 2 times, most recently from c8a68a5 to 6c87958 Compare May 11, 2023 09:19
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch 2 times, most recently from 86d8a73 to 399e6cd Compare May 30, 2023 12:43
}
}

TEST_CASE("custom_hierarchies", "[core]")

Check warning

Code scanning / CodeQL

Poorly documented large function Warning test

Poorly documented function: fewer than 2% comments for a function of 194 lines.

franzpoeschel commented Jun 19, 2023

comment removed, updated version in comments below

@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch 2 times, most recently from 8c28fab to 605bd55 Compare June 29, 2023 11:11
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch from bef9c6b to b4779a3 Compare July 13, 2023 12:24

franzpoeschel commented Jul 13, 2023

For the meshesPath (equivalently for particlesPath), I have now implemented a prototype that does the following:

A path /data/0/custom/group/meshes/E is a mesh if the meshesPath contains any of the following:

  1. Full path to the group containing the mesh: /custom/group/meshes/
  2. Full path to the mesh itself: /custom/group/meshes/E (no longer supported)
  3. Shorthand notation: meshes/

The underlying rule: Full paths are denoted by a leading slash and are based on the data path (/data/%T)

Remark: The shorthand notation achieves backwards compatibility with old openPMD files
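
The rule above can be sketched as a small predicate. This is an illustration only, not the code of the prototype: the function name `isMesh` is hypothetical, paths are given relative to the data path (so /data/0/custom/group/meshes/E becomes /custom/group/meshes/E), every meshesPath entry is assumed to end in a slash, and the shorthand is read as "any group with that name, at any depth".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical predicate: is the object at pathInIteration a mesh, given the
// list of meshesPath entries?
bool isMesh(
    std::string const &pathInIteration,
    std::vector<std::string> const &meshesPaths)
{
    assert(!pathInIteration.empty() && pathInIteration.front() == '/');

    // Group containing the candidate, including the trailing slash,
    // e.g. "/custom/group/meshes/" for "/custom/group/meshes/E".
    auto const lastSlash = pathInIteration.rfind('/');
    std::string const parent = pathInIteration.substr(0, lastSlash + 1);

    for (auto const &entry : meshesPaths)
    {
        if (entry.empty())
            continue;
        if (entry.front() == '/')
        {
            // Full path (leading slash): must name the containing group.
            if (parent == entry)
                return true;
        }
        else
        {
            // Shorthand (no leading slash): the containing group must have
            // that name, so the parent path must end in "/<entry>".
            std::string const suffix = "/" + entry;
            if (parent.size() >= suffix.size() &&
                parent.compare(
                    parent.size() - suffix.size(), suffix.size(), suffix) == 0)
                return true;
        }
    }
    return false;
}
```

With this reading, the traditional layout /data/%T/meshes/E is matched by the shorthand `meshes/` as well, which is what keeps old files readable. An entry that names the mesh itself, such as /custom/group/meshes/E, matches nothing.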


franzpoeschel commented Jul 13, 2023

One nontrivial design question is how to deal with the traditional openPMD hierarchy, especially with the paths /data/%T/meshes and /data/%T/particles. The openPMD standard defines no physical data of any form for those groups themselves: a normal openPMD file contains no attributes /data/%T/meshes/<attr_name>.

This suggests to me that in the extended openPMD standard with custom hierarchies these paths should be treated as "nothing special". Rather, they become the canonical, but not mandatory layout/organization of a simple openPMD dataset.

Two somewhat tricky consequences from this point of view:

1. There might be more than one meshes path in the same group
E.g. the paths /data/%T/meshes and /data/%T/images might exist side by side. In the openPMD standard, this is no problem; in the openPMD-api, it becomes challenging.
The problem is with the member Iteration::meshes (made even worse by the fact that it's not a getter method, but a data member). Should it point to /data/%T/meshes? To a union of both? What about writing?

Imo, the best solution is to consider Iteration::meshes a shorthand API that should not be used in more complex setups. Rather, since /data/%T/meshes is now just another normal path in the custom Iteration hierarchy, one should access iteration["meshes"].asContainerOf<Mesh>() for clarity.

Iteration::meshes will point to the first user-specified meshes path that takes the form of a shorthand notation. E.g., after series.setMeshesPath({"fields/"}), the call iteration.meshes will be the same as iteration["fields"].asContainerOf<Mesh>(). This ensures backwards compatibility.

(Note: Since Iteration::meshes is unfortunately a member and not a method, this means that the meshes path must be set before creating or opening any Iteration. And it was enough fighting with pointers to get things to that state.)
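
The selection described above (the first entry in shorthand notation wins) can be sketched as follows. This is not the openPMD-api implementation: the function name `shorthandTarget` is hypothetical, and returning an empty optional when no shorthand entry exists is an assumption, since that case is not specified here. C++17 is assumed for `std::optional`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: name of the group that Iteration::meshes would refer
// to, i.e. the first meshesPath entry without a leading slash.
std::optional<std::string>
shorthandTarget(std::vector<std::string> const &meshesPaths)
{
    for (auto const &entry : meshesPaths)
    {
        // Entries with a leading slash are full paths, not shorthands.
        if (entry.empty() || entry.front() == '/')
            continue;

        // Strip the trailing slash: "fields/" -> "fields".
        std::string name = entry;
        if (name.back() == '/')
            name.pop_back();
        assert(!name.empty());
        return name;
    }
    // Assumption: no shorthand entry means there is no shorthand target.
    return std::nullopt;
}
```

For the example above, the list `{"fields/"}` yields `fields`, the group that `iteration.meshes` and `iteration["fields"].asContainerOf<Mesh>()` are both said to refer to.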

2. There might be custom data inside /data/%T/meshes
This is not really a problem, but could be unexpected. When setting series.setMeshesPath({"/meshes/E"}), you state that only the E field is a mesh. Since /data/%T/meshes is otherwise "just a regular group" with no special meaning, there might be other data in there, too, e.g. /data/%T/meshes/custom/hierarchy. It's the job of the user to create a meaningful data layout here.

With the more restricted definition of meshesPath and particlesPath, this is no longer supported.

@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch from b0d370e to 4873e21 Compare July 24, 2023 14:34
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch 2 times, most recently from 53f968c to ba10099 Compare August 1, 2023 13:37
}
}

TEST_CASE("custom_hierarchies_no_rw", "[core]")

Check notice

Code scanning / CodeQL

Unused static function Note test

Static function C_A_T_C_H_T_E_S_T_6 is unreachable (autoRegistrar7 must be removed at the same time)
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch from ba10099 to d86fa69 Compare August 1, 2023 14:43
@franzpoeschel franzpoeschel mentioned this pull request Aug 1, 2023
1 task
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch 4 times, most recently from 31c7a25 to 1d47d17 Compare August 3, 2023 09:25
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch 2 times, most recently from 443147c to fd7a443 Compare May 24, 2024 14:37
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch from 033ca9f to 6da1081 Compare June 7, 2024 12:57
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch 2 times, most recently from dd48459 to 106a28a Compare June 26, 2024 11:47
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch 2 times, most recently from 6d5e1cb to cfe8299 Compare July 17, 2024 09:20
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch from cfe8299 to 5812de6 Compare July 23, 2024 14:13
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch from 5812de6 to 931619b Compare August 5, 2024 09:53
@franzpoeschel franzpoeschel changed the title [WIP] Custom Hierarchies Custom Hierarchies Aug 14, 2024
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch from ed8f82c to f28de4f Compare August 16, 2024 08:26

pgrete commented Sep 2, 2024

What's the status/schedule here?
I'm asking as we probably need this for a similar use case (storing additional data for checkpoint files), or else I need a different hint with regard to our use case.
More specifically, I want to store some parameters (scalars and vectors of int, float, double, ...) that may belong to different "packages".
Our current paradigm (in another output format) has been to store them in attributes called Params/PACKAGE_NAME/PARAM_NAME.
As far as I understand, Params and PACKAGE_NAME could be a "group" within the standard (which is logically consistent with our data model).
Writing these with the openPMD-api also works (I see the data in the output files).
However, reading does not work:

Reading 'Params/tracers/t_lookback' with type: St6vectorIdSaIdEE
terminate called after throwing an instance of 'openPMD::error::NoSuchAttribute'
  what():  Params/tracers/t_lookback

I assume that this is because the attributes in "groups" are not parsed by default.
Here's what openPMD sees (printing iteration->attributes()):

Contains attribute: BlocksPerPE
Contains attribute: BoundaryConditions
Contains attribute: Coordinates
Contains attribute: IncludesGhost
Contains attribute: InputFile
Contains attribute: MaxLevel
Contains attribute: MeshBlockSize
Contains attribute: Multilevel
Contains attribute: NBDel
Contains attribute: NBNew
Contains attribute: NCycle
Contains attribute: NGhost
Contains attribute: NumDims
Contains attribute: NumMeshBlocks
Contains attribute: Refine
Contains attribute: RootGridDomain
Contains attribute: RootGridSize
Contains attribute: RootLevel
Contains attribute: WallTime
Contains attribute: derefinement_count
Contains attribute: dt
Contains attribute: loc.level-gid-lid-cnghost-gflag
Contains attribute: loc.lx123
Contains attribute: time
Contains attribute: timeUnitSI

and here's what's in the file

$ bpls ../parthenon.opmd.00002.bp -A 
  string    /author                                                                attr
  string    /basePath                                                              attr
  uint8_t   /bla                                                                   attr
  string    /comment                                                               attr
  int32_t   /data/2/BlocksPerPE                                                    attr
  string    /data/2/BoundaryConditions                                             attr
  string    /data/2/Coordinates                                                    attr
  int32_t   /data/2/IncludesGhost                                                  attr
  string    /data/2/InputFile                                                      attr
  int32_t   /data/2/MaxLevel                                                       attr
  int32_t   /data/2/MeshBlockSize                                                  attr
  int32_t   /data/2/Multilevel                                                     attr
  int32_t   /data/2/NBDel                                                          attr
  int32_t   /data/2/NBNew                                                          attr
  int32_t   /data/2/NCycle                                                         attr
  int32_t   /data/2/NGhost                                                         attr
  int32_t   /data/2/NumDims                                                        attr
  int32_t   /data/2/NumMeshBlocks                                                  attr
  double    /data/2/Params/Hydro/AdiabaticIndex                                    attr
  uint8_t   /data/2/Params/Hydro/calc_c_h                                          attr
  uint8_t   /data/2/Params/Hydro/calc_dt_hyp                                       attr
  double    /data/2/Params/Hydro/cfl                                               attr
  double    /data/2/Params/Hydro/cfl_diff                                          attr
  double    /data/2/Params/Hydro/dt_diff                                           attr
  uint8_t   /data/2/Params/Hydro/first_order_flux_correct                          attr
  double    /data/2/Params/Hydro/max_dt                                            attr
  int32_t   /data/2/Params/Hydro/nhydro                                            attr
  int32_t   /data/2/Params/Hydro/nscalars                                          attr
  uint8_t   /data/2/Params/Hydro/pack_in_one                                       attr
  int32_t   /data/2/Params/Hydro/scratch_level                                     attr
  double    /data/2/Params/Hydro/turbulence/accel_rms                              attr
  int32_t   /data/2/Params/Hydro/turbulence/inject_n_blobs                         attr
  int32_t   /data/2/Params/Hydro/turbulence/inject_once_at_cycle                   attr
  double    /data/2/Params/Hydro/turbulence/inject_once_at_time                    attr
  uint8_t   /data/2/Params/Hydro/turbulence/inject_once_on_restart                 attr
  double    /data/2/Params/Hydro/turbulence/kpeak                                  attr
  int32_t   /data/2/Params/Hydro/turbulence/rescale_once_at_cycle                  attr
  double    /data/2/Params/Hydro/turbulence/rescale_once_at_time                   attr
  uint8_t   /data/2/Params/Hydro/turbulence/rescale_once_on_restart                attr
  double    /data/2/Params/Hydro/turbulence/rescale_to_rms_Ms                      attr
  uint32_t  /data/2/Params/Hydro/turbulence/rseed                                  attr
  double    /data/2/Params/Hydro/turbulence/sol_weight                             attr
  double    /data/2/Params/Hydro/turbulence/t_corr                                 attr
  uint8_t   /data/2/Params/tracers/enabled                                         attr
  int32_t   /data/2/Params/tracers/n_lookback                                      attr
  double    /data/2/Params/tracers/num_tracers_per_cell                            attr
  int32_t   /data/2/Params/tracers/rng_seed                                        attr
  double    /data/2/Params/tracers/t_lookback                                      attr
  int32_t   /data/2/Refine                                                         attr
  double    /data/2/RootGridDomain                                                 attr
  int32_t   /data/2/RootGridSize                                                   attr
  int32_t   /data/2/RootLevel                                                      attr
  double    /data/2/WallTime                                                       attr
  int32_t   /data/2/derefinement_count                                             attr
  double    /data/2/dt                                                             attr
  int32_t   /data/2/loc.level-gid-lid-cnghost-gflag                                attr
  int64_t   /data/2/loc.lx123                                                      attr
  string    /data/2/meshes/acc_acc_0_lvl0/axisLabels                               attr
  string    /data/2/meshes/acc_acc_0_lvl0/dataOrder                                attr
  string    /data/2/meshes/acc_acc_0_lvl0/geometry                                 attr
...

Any short/long term recommendations?
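
One way to read the two listings above: the names reported by iteration->attributes() contain no slash, while the Params attributes appear in the file as nested paths below /data/2. A slash-separated name can therefore be treated as a group path plus an attribute name. The sketch below only does this string split; the helper name `splitAttributeName` is hypothetical, and this is not openPMD-api functionality.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: split "Params/tracers/t_lookback" into the group
// path "Params/tracers" and the attribute name "t_lookback".
// A name without a slash belongs to the current group (empty group path).
std::pair<std::string, std::string>
splitAttributeName(std::string const &name)
{
    assert(!name.empty());
    auto const lastSlash = name.rfind('/');
    if (lastSlash == std::string::npos)
        return {"", name};
    return {name.substr(0, lastSlash), name.substr(lastSlash + 1)};
}
```

Under this reading, reading `Params/tracers/t_lookback` back would require opening the group Params/tracers inside the Iteration first and then looking up `t_lookback` there.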

IOHandler()->enqueue(IOTask(this, pList));
std::string version = s.openPMD();
bool hasMeshes = false;
bool hasParticles = false;
Contributor Author

TODO: unused

@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch from 20f2cd5 to ce5704d Compare November 15, 2024 14:23
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch 2 times, most recently from 03ba4bc to 1032573 Compare December 17, 2024 11:00
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch from 1032573 to cbe5863 Compare February 21, 2025 11:05
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch from 0c9f7ee to 2f620f9 Compare March 26, 2025 14:28
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch 2 times, most recently from 1f16ac3 to 7134070 Compare April 4, 2025 08:32
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch from 7134070 to ea58ad1 Compare April 22, 2025 08:44
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch 2 times, most recently from bf9f44c to 08c421d Compare August 11, 2025 15:06
JSON backend: Fail when trying to open non-existing groups

Insert CustomHierarchy class to Iteration

Help older compilers deal with this

Add vector variants of meshes/particlesPath

Move meshes and particles over to CustomHierarchies class

Move dirtyRecursive to CustomHierarchy

Move Iteration reading logic to CustomHierarchy

Move Iteration flushing logic to CustomHierarchy class

Support for custom datasets

Treat "meshes"/"particles" as normal subgroups

Introduction of iteration["meshes"].asContainerOf<Mesh>() as a more
explicit variant for iteration.meshes.

Regex-based list of meshes/particlesPaths

More extended testing

Fix Python bindings without adding new functionality yet

Overload resolution

Add simple Python bindings and an example

Replace Regexes with Globbing

TODO: Since meshes/particles can no longer be directly addressed with
this, maybe adapt the class hierarchy to disallow mixed groups that
contain meshes, particles, groups and datasets at the same time.

Only maybe though..

Move .meshes and .particles back to Iteration class

They have their own meaning now and are no longer just carefully maintained
for backwards compatibility.
Instead, they are supposed to serve as a shortcut to all openPMD data
found further down the hierarchy.

Some fixes in read error handling

More symmetric design for container types

Don't write unitSI in custom datasets

Discouraged support for custom datasets inside the particlesPath

Fix after rebase: dirtyRecursive

Fixes to the dirty/dirtyRecursive logic

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Some cleanup in CustomHierarchies class

Use polymorphism for meshes/particlesPath in Python

Remove hasMeshes / hasParticles logic

Sort dirty files

This is a workaround only, only one file should be dirty in this test.

Formatting

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Fixes after rebase
@franzpoeschel franzpoeschel force-pushed the topic-custom-hierarchies branch from 08c421d to 2b708a2 Compare November 21, 2025 11:13