[GraphBolt] Add ItemSet/Dict4 #7382

Open: wants to merge 1 commit into base: master

Skeleton003
Collaborator

Description

Add ItemSet/Dict4, which will later replace the current ItemSet/Dict.
Only a few test cases are added because most of the functionality is already covered by the tests above. The code can be reused when replacing.

TODO:

  • add formal docstring.

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • I've leveraged the tools to beautify the Python and C++ code.
  • The PR is complete and small. Read the Google eng practice (CL equals PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code changes to be small (examples, tests, and documentation can be exempted).
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • Related issue is referred in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

@dgl-bot
Collaborator

dgl-bot commented May 8, 2024

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

@dgl-bot
Collaborator

dgl-bot commented May 8, 2024

Commit ID: 26a95d9

Build ID: 1

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link


- __all__ = ["ItemSet", "ItemSetDict"]
+ __all__ = ["ItemSet", "ItemSetDict", "ItemSet4", "ItemSetDict4"]
Collaborator

Could we use a more meaningful name than ItemSet4?

Collaborator Author

Just a temporary name since it will be replaced right away.


class ItemSet4(Dataset):
    r"""Class for iterating over tensor-like data.
    Experimental. Implemented only __getitem__() accepting slice and list.
Collaborator

docstring and examples to be added.

Collaborator

add the high-level design doc here.


    def __init__(
        self,
        items: Union[torch.Tensor, Mapping, Tuple[Mapping]],
Collaborator

Do we need Mapping? We do need int, right?

Collaborator Author

  1. I'm considering disabling int, because an int doesn't carry any dtype info. To generate an all_nodes_set we can always use a tensor scalar (which is how we do it now). What's your opinion?
  2. Mapping is indeed too broad in scope, but I'm trying to differentiate ItemSet4 from the existing ItemSet, which takes in an Iterable. I think Sequence might be a good choice?
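
To make the Mapping / Sequence / Iterable distinction in point 2 concrete, here is a minimal sketch using only the standard-library abstract base classes. The `describe` helper is hypothetical and is not part of DGL; it only shows which plain Python containers satisfy which protocol. How `torch.Tensor` relates to these ABCs is deliberately not assumed here.

```python
from collections.abc import Iterable, Mapping, Sequence


def describe(items) -> str:
    """Hypothetical helper: name the most specific protocol `items` satisfies."""
    if isinstance(items, Mapping):
        # Keyed access, e.g. dict; indexing by position is not guaranteed.
        return "mapping"
    if isinstance(items, Sequence):
        # Supports __len__ and positional __getitem__, e.g. list and tuple.
        return "sequence"
    if isinstance(items, Iterable):
        # Can only be walked once from the front, e.g. a generator.
        return "iterable-only"
    return "not iterable"


kinds = {
    "list": describe([1, 2, 3]),
    "tuple": describe((1, 2)),
    "dict": describe({"a": 1}),
    "generator": describe(i for i in range(3)),
    "int": describe(5),
}
print(kinds)
```

Under this reading, annotating `items` as a Sequence would rule out generators and dicts while still admitting lists and tuples, which is the narrowing the comment above seems to be after.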

@@ -442,3 +443,188 @@ def __repr__(self) -> str:
itemsets=itemsets_str,
names=self._names,
)


class ItemSet4(Dataset):
Collaborator

Is this just removing __iter__ from the existing ItemSet/Dict and inheriting from Dataset only? Is the rest of the code copied directly? If so, could we just modify the existing classes directly? Would that break the existing ItemSampler?

Collaborator Author

It's not simply a copy of the existing ItemSet, but I can try modifying the existing code directly.
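
For readers following this thread, here is a rough sketch of what a map-style container without `__iter__` looks like in plain Python. `MapStyleItems` is a hypothetical stand-in backed by a list, not the PR's ItemSet4, and it says nothing about how ItemSampler consumes item sets. It only illustrates one language-level point: an object that defines `__getitem__` and raises `IndexError` past the end can still be iterated through Python's legacy sequence protocol.

```python
from typing import List, Sequence, Union

Index = Union[int, slice, List[int]]


class MapStyleItems:
    """Hypothetical map-style container; defines no __iter__ on purpose."""

    def __init__(self, items: Sequence) -> None:
        self._items = list(items)

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: Index):
        if isinstance(index, int):
            # List indexing raises IndexError when out of range, which is
            # what lets the legacy iteration protocol terminate.
            return self._items[index]
        if isinstance(index, slice):
            return self._items[index]
        if isinstance(index, list):
            # Gather arbitrary positions, similar in spirit to batched indexing.
            return [self._items[i] for i in index]
        raise TypeError(f"unsupported index type: {type(index).__name__}")


items = MapStyleItems([10, 11, 12, 13, 14])
single = items[2]
window = items[1:4]
picked = items[[0, 3]]
# No __iter__ is defined; iteration falls back to __getitem__(0), (1), ...
iterated = list(items)
print(single, window, picked, iterated)
```

Whether existing consumers rely on an explicit `__iter__` rather than this fallback is exactly the compatibility question raised above, and would need to be checked against the real sampler code.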

3 participants