-
Notifications
You must be signed in to change notification settings - Fork 12
Add RemoteDistDataset to operate on the dataset from compute nodes in graph store mode
#404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @mkolodner-sc for review as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, most existing use cases want to define their custom "input_nodes" will be; how are we planning on enabling this lever as well? What theoretical performance considerations do we need to make here / make it easier for users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Id pref to get @mkolodner-sc 's review on the PR too.
|
/e2e_test |
1 similar comment
|
/e2e_test |
GiGL Automation@ 21:43:47UTC : 🔄 |
mkolodner-sc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kyle! A few comments from me
python/tests/integration/distributed/graph_store/graph_store_integration_test.py
Outdated
Show resolved
Hide resolved
|
/unit_test_py |
|
/integration_test |
GiGL Automation@ 01:38:45UTC : 🔄 @ 02:50:00UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 01:38:50UTC : 🔄 @ 02:59:36UTC : ✅ Workflow completed successfully. |
mkolodner-sc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kyle!
Scope of work done
As we discussed offline - creating these class(es) so that client code can operate on some object instead of calling helper functions.
The purpose of
RemoteDatasetis to allow client code to access information stored in the storage cluster. See below for example client (e.g. user-controlled) code.NOTE: I am intentionally making this a different class from
DistDataset- I don't think we should directly conflate the two, they are different objects and I don't think we should treat them as the same thing (or as a subclass). For the most part, I don't think users are going to need to access the partition books or edge indices, and if they do we can always expose them via this class :)Where is the documentation for this feature?: N/A
Did you add automated tests or write a test plan?
Updated Changelog.md? NO
Ready for code review?: NO