
Try to find if a file is corrupt and throw if writing more than 4 GiB #29

Open
jmcarcell wants to merge 1 commit into master from read-write-more4gb


jmcarcell (Contributor) commented Mar 4, 2026

The issue in iLCSoft/LCIO#218 (or at least one of the issues there) is that the SIO format stores record and block data lengths as 32-bit unsigned integers (4 bytes, max ~4 GiB). When writing a record bigger than that, the std::size_t length is narrowed to unsigned int before being written to disk, producing a wrapped (modulo 2^32) value. The data itself is written correctly, but the length fields are wrong. I'm not brave enough to change the format, so I have added some guards to detect the problem and prevent it from happening. If the file format can be changed easily, then either the lengths could be widened to 64 bits or an additional 32-bit unsigned integer could be added to each length.

BEGINRELEASENOTES

  • Try to detect corrupt blocks when reading (detection may not always work)
  • Throw if writing a block or record bigger than 4 GiB, since block and record lengths are stored as 32-bit unsigned integers
  • Avoid limiting some related arithmetic operations to 32 bits

ENDRELEASENOTES

Possibly fixes iLCSoft/LCIO#218 (by preventing the problem from happening)

Now, when dumping the file, a more informative error message is printed:

```
sio-dump -d neutrinoGun_digi_4200MB.slcio
---------------------------------------------------------------------------------------------------------------------
Record name                    | Start           | End             | Options      | Header len | Data len
LCEventHeader                  | 0               | 1168            | 0            | 40         | 1128 (1128)
---------------------------------------------------------------------------------------------------------------------
Block name                     | Start           | End             | Version      | Header len | Data len
---------------------------------------------------------------------------------------------------------------------
EventHeader                    | 0               | 1128            | 2.22         | 28         | 1100

---------------------------------------------------------------------------------------------------------------------
Record name                    | Start           | End             | Options      | Header len | Data len
LCEvent                        | 1168            | 153041592       | 0            | 32         | 153040392 (153040392)
---------------------------------------------------------------------------------------------------------------------
Block name                     | Start           | End             | Version      | Header len | Data len
---------------------------------------------------------------------------------------------------------------------
ERROR: /SIO/source/src/api.cc (l.292) in extract_block: Block 'IBTrackerHits': block_len (484433168) exceeds remaining record buffer (152938584 bytes). The record is likely corrupt (possible 32 bit overflow of data length at write time). [out_of_range]
```

See iLCSoft/LCIO#218: the SIO format stores record and block data lengths as 32-bit unsigned integers (4 bytes, max ~4 GiB). These changes try to detect the issue when reading a file and prevent it from happening when writing.
tmadlener (Contributor) left a comment


I think this is a good first step. Given that we already have some files written with excessive lengths, I suspect we may have to come back and "properly" fix this once people start getting the new error when writing files.
