Add Rucio/Chaining examples for opening full datasets by DID#22
Draft
Co-authored-by: wdconinc <4656391+wdconinc@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Create Python scripts for processing Rucio datasets using uproot and ROOT" to "Add Rucio/Chaining examples for opening full datasets by DID" on Mar 10, 2026
Contributor
@sjdkay FYI re: file access tutorial. May be useful to integrate some rucio file access patterns to avoid people downloading lots of files locally.
No existing examples showed how to open a full Rucio dataset (all files) by DID; only single-file access was demonstrated. This PR adds three self-contained scripts under Rucio/Chaining/ that use the Rucio Python client API directly.

Scripts
- uproot_example.py: lists the dataset files, resolves PFNs, reads EventHeader.eventNumber across all files with uproot, and plots a histogram via matplotlib
- tchain_example.py: same PFN resolution, chains all files with ROOT.TChain
- rdataframe_example.py: same PFN resolution, processes all files with ROOT.RDataFrame

Key pattern: getting all PFNs
Passing nrandom=1 to list_replicas selects 1 file from the entire dataset (not 1 replica per file). The correct approach iterates over replica['pfns'], whose keys are the PFN URLs. This yields one PFN per file across the full dataset, skipping any files with no open-access replicas.
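The key pattern described above could be sketched roughly as follows. This is not the code from the PR's scripts: the function names, the `scope`/`name` parameters, and the `root` scheme are assumptions for illustration, and taking the first key of `replica['pfns']` is just one simple way to end up with a single PFN per file.

```python
from typing import Iterable, Iterator, List, Mapping, Sequence


def first_pfn_per_file(replicas: Iterable[Mapping]) -> Iterator[str]:
    """Yield one PFN per file; skip files that have no accessible PFN.

    Each item is expected to look like an entry returned by Rucio's
    list_replicas: a dict whose 'pfns' value is a dict keyed by PFN URL.
    """
    for replica in replicas:
        pfns = replica.get("pfns") or {}
        if not pfns:
            continue  # no replica reachable with the requested scheme
        yield next(iter(pfns))  # first PFN URL, in the order returned


def dataset_pfns(scope: str, name: str,
                 schemes: Sequence[str] = ("root",)) -> List[str]:
    """Resolve all files of a dataset DID to PFNs (hypothetical helper)."""
    # Imported lazily so the helper above is usable without Rucio installed.
    from rucio.client import Client

    client = Client()
    # No nrandom argument here: we want every file in the dataset.
    replicas = client.list_replicas(
        dids=[{"scope": scope, "name": name}],
        schemes=list(schemes),
    )
    return list(first_pfn_per_file(replicas))
```

Keeping the selection logic separate from the client call makes it possible to check the "one PFN per file" behaviour against hand-written dictionaries, without a Rucio server.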
Original prompt
Create Python scripts demonstrating how to open and process full Rucio datasets by their DID (Data Identifier) using both uproot and ROOT.
Directory Structure
Create a new directory Rucio/Chaining/ containing three Python scripts.
Files to Create
1. Rucio/Chaining/uproot_example.py
2. Rucio/Chaining/tchain_example.py
3. Rucio/Chaining/rdataframe_example.py
Requirements
... EventHeader.eventNumber and save it as PNG
The following is the prior conversation context from the user's chat exploration (may be truncated):
User: This page includes examples of how to open a single file, but no example of how a full dataset can be loaded. Develop a python example for both uproot and ROOT that opens a full dataset by its DID.
Assistant: I'll help you create Python examples for opening a full dataset by its DID using both uproot and ROOT. These examples will use Rucio to get the list of files in the...
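As a hedged illustration of the request quoted above, once a list of PFNs is available the files could be opened together along these lines. This is only a sketch, not the content of the PR's scripts: the tree name `events` is an assumption about the file layout, all function names are hypothetical, and only the branch name `EventHeader.eventNumber` comes from the PR description.

```python
from typing import Dict, Sequence

TREE_NAME = "events"                 # assumed tree name, not stated in the PR
BRANCH = "EventHeader.eventNumber"   # branch named in the PR description


def uproot_file_map(pfns: Sequence[str],
                    tree_name: str = TREE_NAME) -> Dict[str, str]:
    """Map each PFN to the tree to read from it.

    The dict form avoids ambiguity with the 'path:tree' string form,
    since PFN URLs themselves contain colons.
    """
    return {pfn: tree_name for pfn in pfns}


def read_event_numbers(pfns: Sequence[str], tree_name: str = TREE_NAME):
    """Read BRANCH from every file with uproot and concatenate the result."""
    import uproot  # lazy import: only needed when actually reading

    arrays = uproot.concatenate(
        uproot_file_map(pfns, tree_name), [BRANCH], library="np"
    )
    # With library="np" the result is a dict of NumPy arrays; if the branch
    # is jagged, the entries are per-event arrays.
    return arrays[BRANCH]


def build_chain(pfns: Sequence[str], tree_name: str = TREE_NAME):
    """Chain all files into a single ROOT.TChain."""
    import ROOT  # lazy import: requires a PyROOT installation

    chain = ROOT.TChain(tree_name)
    for pfn in pfns:
        chain.Add(pfn)
    return chain


def build_dataframe(pfns: Sequence[str], tree_name: str = TREE_NAME):
    """Create a ROOT.RDataFrame that spans all files."""
    import ROOT

    return ROOT.RDataFrame(tree_name, list(pfns))
```

All three readers take the same PFN list, so the Rucio lookup only needs to happen once per dataset and the files are streamed rather than downloaded locally.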
This pull request was created from Copilot chat.