Skip to content

Conversation

@kmontemayor2-sc
Copy link
Collaborator

@kmontemayor2-sc kmontemayor2-sc commented Dec 1, 2025

Scope of work done

As we discussed offline - creating these class(es) so that client code can operate on some object instead of calling helper functions.

The purpose of RemoteDataset is to allow client code to access information stored in the storage cluster. See below for example client (e.g. user-controlled) code.

def inference_process(local_rank, cluster_info):  
  gigl.distribtued.graph_store.init_client_process(local_rank, cluster_info)
  remote_dataset = RemoteDistDataset(local_rank, cluster_info)
  inference_input: list[Tensor] = remote_dataset.get_node_ids()

  loader = DistNeighborLoader(
     dataset=remote_dataset,
     input_nodes=inference_input,
  )
  for data in loader: ...

# DistNeighbtheorLoader will also use `remote_dataset` to get required info about the dataset
class DistNeighbtheorLoader:
    def __init__(self, dataset: RemoteDataset | DistDataset, ...):
       ...
       self._node_feature_info = dataset.get_node_feature_info() # Needs to be added
       self._edge_feature_info = dataset.get_edge_feature_info() # Needs to be added

NOTE: I am intentionally making this a different class from DistDataset - I don't think we should directly conflate the two, they are different objects and I don't think we should treat them as the same thing (or as a subclass). For the most part, I don't think users are going to need to access the partition books or edge indices, and if they do we can always expose them via this class :)

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

@svij-sc
Copy link
Collaborator

svij-sc commented Dec 4, 2025

cc @mkolodner-sc for review as well.

Copy link
Collaborator

@svij-sc svij-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, most existing use cases want to define their custom "input_nodes" will be; how are we planning on enabling this lever as well? What theoretical performance considerations do we need to make here / make it easier for users?

Copy link
Collaborator

@svij-sc svij-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Id pref to get @mkolodner-sc 's review on the PR too.

@kmontemayor2-sc
Copy link
Collaborator Author

/e2e_test

1 similar comment
@kmontemayor2-sc
Copy link
Collaborator Author

/e2e_test

@github-actions
Copy link
Contributor

github-actions bot commented Dec 9, 2025

GiGL Automation

@ 21:43:47UTC : 🔄 E2E Test started.

Copy link
Collaborator

@mkolodner-sc mkolodner-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kyle! A few comments from me

@kmontemayor2-sc
Copy link
Collaborator Author

/unit_test_py

@kmontemayor2-sc
Copy link
Collaborator Author

/integration_test

@github-actions
Copy link
Contributor

github-actions bot commented Dec 13, 2025

GiGL Automation

@ 01:38:45UTC : 🔄 Python Unit Test started.

@ 02:50:00UTC : ✅ Workflow completed successfully.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 13, 2025

GiGL Automation

@ 01:38:50UTC : 🔄 Integration Test started.

@ 02:59:36UTC : ✅ Workflow completed successfully.

Copy link
Collaborator

@mkolodner-sc mkolodner-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kyle!

@kmontemayor2-sc kmontemayor2-sc marked this pull request as ready for review December 15, 2025 22:53
@kmontemayor2-sc kmontemayor2-sc added this pull request to the merge queue Dec 15, 2025
Merged via the queue into main with commit 9b57bb9 Dec 16, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants