Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make metapath walk faster by grouping adjacency lists by node type #1399

Open
wants to merge 40 commits into
base: develop
Choose a base branch
from

Conversation

kjun9
Copy link
Contributor

@kjun9 kjun9 commented Apr 29, 2020

Pulled out while working on #934 - currently this demo is a bit unusable and doesn't provide a good enough experience to be converted into a demo notebook, since it takes hours for it to run on a laptop, but with the changes from this PR, it should run within ~ 10 minutes or so depending on the parameters used.

This mainly updates EdgeData to store an inner set of dicts within each adjacency list, so that we group adjacency elements by their node type. This could probably be useful for other heterogeneous algorithms that want to traverse the graph either for random walks or SAGE-type sampling - currently it's just been used for Metapath2Vec specifically. At its current state in this PR, the speedup for Metapath2vec looks pretty great, but graph.neighbours becomes quite a bit slower for homogeneous graphs

Benchmark against develop for metapath walk:

----------------------------------------------------------------------------------------------------------------- benchmark: 2 tests -----------------------------------------------------------------------------------------------------------------
Name (time in us)                                                   Min                    Max                   Mean                StdDev                 Median                   IQR            Outliers         OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_uniformrandommetapathwalk (NOW)                 409.0180 (1.0)       1,415.0450 (1.0)         508.5108 (1.0)         89.4222 (1.0)         480.2570 (1.0)         38.4945 (1.0)       126;171  1,966.5264 (1.0)        1349           1
test_benchmark_uniformrandommetapathwalk (0007_1cca6b6)     11,785.8720 (28.82)    20,027.5200 (14.15)    15,493.4845 (30.47)    1,442.1304 (16.13)    15,570.4430 (32.42)    1,697.6358 (44.10)        15;3     64.5433 (0.03)         61           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Things do get a bit slower for just getting neighbours without caring about types, and constructing the stellargraph object

---------------------------------------------------------------------------------------- benchmark 'StellarGraph neighbours': 2 tests ----------------------------------------------------------------------------------------
Name (time in us)                                     Min                   Max                  Mean              StdDev              Median                IQR            Outliers         OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_get_neighbours (0009_1cca6b6)     673.0350 (1.0)      1,753.2520 (1.0)        772.2845 (1.0)      112.2834 (1.0)      738.3375 (1.0)      46.6965 (1.0)       120;151  1,294.8595 (1.0)        1244           1
test_benchmark_get_neighbours (NOW)              874.3200 (1.30)     1,957.4830 (1.12)     1,030.1057 (1.33)     156.3577 (1.39)     985.7585 (1.34)     78.7610 (1.69)       96;114    970.7742 (0.75)        920           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------- benchmark 'StellarGraph creation (time)': 12 tests ------------------------------------------------------------------------------------
Name (time in ms)                                              Min                 Max               Mean             StdDev             Median               IQR            Outliers       OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_creation[None-0-0] (0011_1cca6b6)            7.0827 (1.0)       10.4290 (1.0)       7.7464 (1.0)       0.5459 (1.01)      7.5999 (1.0)      0.3738 (1.0)         15;10  129.0918 (1.0)         123           1
test_benchmark_creation[None-0-0] (NOW)                     7.2598 (1.03)      11.9819 (1.15)      7.8674 (1.02)      0.5400 (1.0)       7.7635 (1.02)     0.4122 (1.10)         11;5  127.1069 (0.98)        114           1
test_benchmark_creation[None-100-200] (0011_1cca6b6)        7.7215 (1.09)      11.0146 (1.06)      8.6099 (1.11)      0.5706 (1.06)      8.4670 (1.11)     0.4709 (1.26)         18;5  116.1448 (0.90)         95           1
test_benchmark_creation[None-100-200] (NOW)                 8.2406 (1.16)      80.2710 (7.70)      9.9185 (1.28)      7.0318 (13.02)     8.9742 (1.18)     0.4729 (1.26)         1;12  100.8218 (0.78)        104           1
test_benchmark_creation[100-0-0] (0011_1cca6b6)             9.9753 (1.41)      16.8219 (1.61)     10.9606 (1.41)      0.8742 (1.62)     10.7575 (1.42)     0.8113 (2.17)         12;4   91.2356 (0.71)         91           1
test_benchmark_creation[100-0-0] (NOW)                     10.0131 (1.41)      14.2590 (1.37)     10.8740 (1.40)      0.8288 (1.53)     10.5627 (1.39)     0.7808 (2.09)          9;7   91.9623 (0.71)         91           1
test_benchmark_creation[100-100-200] (0011_1cca6b6)        10.3021 (1.45)      14.3739 (1.38)     11.7725 (1.52)      0.8119 (1.50)     11.6448 (1.53)     1.1646 (3.12)         18;2   84.9437 (0.66)         79           1
test_benchmark_creation[100-100-200] (NOW)                 11.6161 (1.64)      27.1939 (2.61)     14.2690 (1.84)      3.7345 (6.92)     12.6178 (1.66)     2.2522 (6.02)        10;10   70.0822 (0.54)         78           1
test_benchmark_creation[None-1000-5000] (0011_1cca6b6)     18.3149 (2.59)     100.4675 (9.63)     21.3902 (2.76)     11.4734 (21.25)    19.6913 (2.59)     1.2455 (3.33)          1;4   46.7505 (0.36)         50           1
test_benchmark_creation[100-1000-5000] (0011_1cca6b6)      20.9809 (2.96)      25.4421 (2.44)     22.9116 (2.96)      1.2696 (2.35)     22.8039 (3.00)     2.0631 (5.52)         16;0   43.6460 (0.34)         37           1
test_benchmark_creation[None-1000-5000] (NOW)              30.8860 (4.36)     114.9587 (11.02)    45.6015 (5.89)     29.9989 (55.56)    32.7245 (4.31)     5.6656 (15.16)         4;4   21.9291 (0.17)         26           1
test_benchmark_creation[100-1000-5000] (NOW)               34.9050 (4.93)     148.0317 (14.19)    56.4356 (7.29)     38.1269 (70.61)    37.4651 (4.93)     8.3243 (22.27)         4;4   17.7193 (0.14)         21           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@codeclimate
Copy link

codeclimate bot commented Apr 29, 2020

Code Climate has analyzed commit 1e4d782 and detected 3 issues on this pull request.

Here's the issue category breakdown:

Category Count
Complexity 1
Security 2

View more on Code Climate.

@kjun9 kjun9 marked this pull request as ready for review April 29, 2020 05:14
@huonw
Copy link
Member

huonw commented Apr 29, 2020

Unfortunately this seems to duplicate a bunch of the node-data/edge-data interaction work from #1220, unlucky! 🙁

A 29× speed-up is pretty nice!

Things do get a bit slower for just getting neighbours without caring about types, and constructing the stellargraph object

As a potential tweak to the implementation of this, maybe we could take inspiration from #1291 and #1316, expanding EdgeData to have multiple modes of indexing/caching each of which are lazily computed as required:

This has the downside of requiring multiple batches of memory, but it may be the right trade-off.

In particular, I'm assuming that the workflow would typically be an interactive session with small-ish graphs for experimenting with different algorithms (thus requiring multiple indexing schemes), where the user has more control and monitoring of memory usage. A real ML pipeline on larger data would then be only using one or two algorithms, and thus only creating and storing the indices that are really needed.

What do you think?

@kjun9
Copy link
Contributor Author

kjun9 commented Apr 29, 2020

expanding EdgeData to have multiple modes of indexing/caching each of which are lazily computed as required

That's an interesting idea. So we would build the index automatically when graph.neighbors is called with a non-None other_node_type value?

This has the downside of requiring multiple batches of memory, but it may be the right trade-off

As an alternative, maybe it could be specific generators/walkers that force a graph to build the index, and default to the original behaviour of filtering the adjacency list after obtaining them if the index doesn't exist

class StellarGraph:
    def build_adjacency_index(index_by)
       # do nothing if index exists otherwise build

class UniformRandomMetapathWalk:
    def __init__(self, graph, ...):
        self._graph = graph
    def run(self, ...):
        self._graph.build_adjacency_index("other_node_type")

or something like that?

@kieranricardo
Copy link
Contributor

kieranricardo commented Apr 29, 2020

nice speedup!

Unfortunately I think part of this speedup is due to not having to pay the cost of converting the node ids to ilocs in node_type - #1267 gets a 7x speedup on metapath. Also, changing node_type to handle multiple nodes and doing:

neighbour_types = self.graph.node_type(neighbours, use_ilocs=True)
neighbours = [
    n_node
    for n_node, n_type in zip(neighbours, neighbour_types)
    if n_type == metapath[d]
]

gets another 2x speedup, so 15x in total but still shy of your 29x! But maybe this could be a quick way of speeding up metapath in time for the release, what do you think?

That being said, I think this is still a valuable way of storing the edges, and we should definitely implement this as an edge caching option!

@kieranricardo
Copy link
Contributor

kieranricardo commented Apr 29, 2020

On the topic of edge caching, for unweighted graphs I think we can get a ~20x speedup (on top of this 29x speedup) if we store the node_ids/node_ilocs instead of edge_ilocs in the adj lists. It looks like 95% of the time in neighbors is spent doing pandas lookups with the edge_ilocs.

In the case of node_ilocs, this wouldn't cost any extra memory, infact if the node_ilocs type is smaller than edge_ilocs this will save memory.

@kieranricardo
Copy link
Contributor

@kjun9 can you run the allocation benchmarks for this PR?

@huonw
Copy link
Member

huonw commented Apr 29, 2020

That's an interesting idea. So we would build the index automatically when graph.neighbors is called with a non-None other_node_type value?

Yeah.

As an alternative, maybe it could be specific generators/walkers that force a graph to build the index, and default to the original behaviour of filtering the adjacency list after obtaining them if the index doesn't exist

I guess that's equivalent in my mind to EdgeData computing them when other_node_type is specified, since that'll happen if (and only if) it's being used by an algorithm that would benefit from the indexing; or maybe I'm missing something?

gets another 2x speedup, so 15x in total but still shy of your 29x! But maybe this could be a quick way of speeding up metapath in time for the release, what do you think?

If you've done that work and measured it, can we have a PR for it?

On the topic of edge caching, for unweighted graphs I think we can get a ~20x speedup (on top of this 29x speedup) if we store the node_ids/node_ilocs instead of edge_ilocs in the adj lists. It looks like 95% of the time in neighbors is spent doing pandas lookups with the edge_ilocs.

I think that edge_ilocs vs. node_ilocs is a separate problem and worth having a separate issue for, to discuss without getting totally distracted from this one. I specifically chose the current approach because it means we can easily get to whatever data we want about the edge, such as the edge weight and potentially edge features (#1326). Although you're correct that it's probably unnecessarily expensive for an unweighted graph without edge features.

pandas lookups with the edge_ilocs

Typo: "numpy lookups"?

@kjun9 can you run the allocation benchmarks for this PR?

I think there's enough above that we can potentially pause effort on this PR for a bit (at least until the other PRs land) and not spent too much time doing comparisons with it until we've done that baseline work, since those comparisons will be invalidated anyway?

@kjun9
Copy link
Contributor Author

kjun9 commented Apr 29, 2020

I guess that's equivalent in my mind to EdgeData computing them when other_node_type is specified, since that'll happen if (and only if) it's being used by an algorithm that would benefit from the indexing; or maybe I'm missing something?

I was thinking that building/not-building an index "explicitly" would potentially be useful e.g. an algorithm just requires a single call to graph.neighbors with other_node_type.. but yeah not sure if that's actually a real use-case 🤷

I've adjusted the code to do the building lazily, taking inspiration from #1291 and also thinking a little bit about the extensions

I'll include the new benchmarks here anyway but we can keep in mind the comparisons will be invalidated by the other PRs (still a bit slower due to additional if statements, but not quite as bad, and the allocation was probably much worse before):

------------------------------------------------------------------------------------------------------------- benchmark 'StellarGraph creation': 12 tests --------------------------------------------------------------------------------------------------------------
Name (time in s)                                                                 Min                       Max                      Mean                StdDev                    Median                   IQR            Outliers     OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_allocation_benchmark_creation[None-0-0] (NOW)                       13,575.0000 (1.0)         16,431.0000 (1.00)        14,816.6000 (1.03)       986.0009 (2.05)        14,671.0000 (1.03)     2,192.0000 (inf)           8;0  0.0001 (0.97)         20           1
test_allocation_benchmark_creation[None-0-0] (0014_1cca6b6)              14,223.0000 (1.05)        16,399.0000 (1.0)         14,353.4000 (1.0)        481.5141 (1.00)        14,247.0000 (1.0)          0.0000 (1.0)           1;2  0.0001 (1.0)          20           1
test_allocation_benchmark_creation[100-0-0] (0014_1cca6b6)               15,231.0000 (1.12)        18,503.0000 (1.13)        15,544.2000 (1.08)       699.1328 (1.46)        15,415.0000 (1.08)         0.0000 (1.0)           1;4  0.0001 (0.92)         20           1
test_allocation_benchmark_creation[100-0-0] (NOW)                        15,655.0000 (1.15)        18,279.0000 (1.11)        15,998.6000 (1.11)       578.2964 (1.20)        15,839.0000 (1.11)         0.0000 (1.0)           2;3  0.0001 (0.90)         20           1
test_allocation_benchmark_creation[None-100-200] (0014_1cca6b6)          56,499.0000 (4.16)        58,812.0000 (3.59)        56,780.2500 (3.96)       479.9861 (1.0)         56,683.0000 (3.98)         0.0000 (1.0)           1;2  0.0000 (0.25)         20           1
test_allocation_benchmark_creation[None-100-200] (NOW)                   65,295.0000 (4.81)        68,991.0000 (4.21)        66,511.8000 (4.63)     1,049.2252 (2.19)        66,391.0000 (4.66)     2,192.0000 (inf)           7;0  0.0000 (0.22)         20           1
test_allocation_benchmark_creation[100-100-200] (0014_1cca6b6)           94,571.0000 (6.97)        99,138.0000 (6.05)        95,886.1500 (6.68)     1,013.8430 (2.11)        95,667.0000 (6.71)       548.0000 (inf)           4;4  0.0000 (0.15)         20           1
test_allocation_benchmark_creation[100-100-200] (NOW)                   104,169.0000 (7.67)       110,461.0000 (6.74)       105,515.6000 (7.35)     1,364.7053 (2.84)       105,265.0000 (7.39)       640.0000 (inf)           1;5  0.0000 (0.14)         20           1
test_allocation_benchmark_creation[None-1000-5000] (0014_1cca6b6)       613,899.0000 (45.22)      618,945.0000 (37.74)      614,303.3000 (42.80)    1,093.1993 (2.28)       614,067.0000 (43.10)        0.0000 (1.0)           1;3  0.0000 (0.02)         20           1
test_allocation_benchmark_creation[None-1000-5000] (NOW)                693,529.0000 (51.09)      699,983.0000 (42.68)      694,938.5000 (48.42)    1,450.7666 (3.02)       694,625.0000 (48.76)    1,736.0000 (inf)           1;1  0.0000 (0.02)         20           1
test_allocation_benchmark_creation[100-1000-5000] (0014_1cca6b6)      1,008,299.0000 (74.28)    1,013,701.0000 (61.81)    1,009,657.5000 (70.34)    1,161.5682 (2.42)     1,009,395.0000 (70.85)      548.0000 (inf)           4;4  0.0000 (0.01)         20           1
test_allocation_benchmark_creation[100-1000-5000] (NOW)               1,089,323.0000 (80.24)    1,095,969.0000 (66.83)    1,090,687.3000 (75.99)    1,432.8181 (2.99)     1,090,419.0000 (76.54)      640.0000 (inf)           1;5  0.0000 (0.01)         20           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------- benchmark 'StellarGraph neighbours': 2 tests ----------------------------------------------------------------------------------------
Name (time in us)                                     Min                   Max                Mean              StdDev              Median                 IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_get_neighbours (0014_1cca6b6)     700.0500 (1.0)      2,304.0250 (1.05)     770.7208 (1.0)      101.4645 (1.0)      741.4980 (1.0)       51.4912 (1.0)         61;79        1.2975 (1.0)         841           1
test_benchmark_get_neighbours (NOW)              732.0860 (1.05)     2,200.7320 (1.0)      923.6836 (1.20)     246.5478 (2.43)     819.3160 (1.10)     139.9090 (2.72)      119;127        1.0826 (0.83)        886           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------- benchmark: 2 tests -----------------------------------------------------------------------------------------------------------------
Name (time in us)                                                   Min                    Max                   Mean                StdDev                 Median                   IQR            Outliers         OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_uniformrandommetapathwalk (NOW)                 456.3170 (1.0)         849.3380 (1.0)         520.7856 (1.0)         60.3954 (1.0)         498.3265 (1.0)         45.2690 (1.0)         38;33  1,920.1758 (1.0)         342           1
test_benchmark_uniformrandommetapathwalk (0016_1cca6b6)     13,007.4970 (28.51)    19,690.7460 (23.18)    16,295.5201 (31.29)    1,387.3179 (22.97)    16,282.2605 (32.67)    2,086.3395 (46.09)        22;0     61.3666 (0.03)         64           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Copy link
Member

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice. I think we can even extend this idea to handle edge types too, and so get optimised and memory use queries for HinSAGE (since that currently constructs its own type_adjacency_list thing?), and for something like #433.

def _numpyise_inner(d, dtype):
return {k: _numpyise(v, dtype) for k, v in d.items()}

self._edges_index_by_other_node_type = _AdjIndex(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ends up being a dictionary like {node_id: {node_type: array}}, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

in_dict.setdefault(tgt, {}).setdefault(src_type, []).append(i)
out_dict.setdefault(src, {}).setdefault(tgt_type, []).append(i)

undirected.setdefault(tgt, {}).setdefault(src_type, []).append(i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I wonder if it actually makes sense to compute each of these separately (following #1316)? E.g. an algorithm that's using undirected probably won't touch in_dict/out_dict, and vice versa. I thing this might be especially important for this, because there'll be a lot more memory overhead from having many smaller np.ndarray instances (i.e. there'll be more so more of the fixed overheads of each one).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah makes sense. I might try to extend the _AdjIndex type to make it lazy for individual dicts

tests/core/test_graph.py Show resolved Hide resolved
@kjun9
Copy link
Contributor Author

kjun9 commented May 4, 2020

I think this now conflicts a lot with #1435 and the new speedups I'm getting are not as huuuuge compared to what's already landed in develop, so I might try to run the notebook for #934 independently of this PR, and this might not have to land

------------------------------------------------------------------------------------------------------------- benchmark: 2 tests -------------------------------------------------------------------------------------------------------------
Name (time in us)                                                  Min                   Max                  Mean              StdDev                Median                 IQR            Outliers         OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_uniformrandommetapathwalk (NOW)                576.2610 (1.0)      1,260.3640 (1.0)        667.4091 (1.0)       97.5010 (1.0)        637.5760 (1.0)       48.6500 (1.0)       104;122  1,498.3313 (1.0)        1018           1
test_benchmark_uniformrandommetapathwalk (0018_9506f93)     1,679.1980 (2.91)     4,186.3280 (3.32)     2,183.5397 (3.27)     266.7055 (2.74)     2,150.2190 (3.37)     245.3337 (5.04)        87;19    457.9720 (0.31)        417           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@huonw
Copy link
Member

huonw commented May 4, 2020

A 3× speedup still seems worth it, but yeah, maybe we can leave it till after the 1.0 crunch.

@kjun9
Copy link
Contributor Author

kjun9 commented May 4, 2020

Hmm maybe the benchmark test doesn't quite provide the full picture.. notebook seems to be much slower than 3x (been running random walks for ~2 hours)

@huonw
Copy link
Member

huonw commented May 4, 2020

Hm, 2 hours with this PR or with develop? #1446 is related too.

@kjun9
Copy link
Contributor Author

kjun9 commented May 4, 2020

Hm, 2 hours with this PR or with develop? #1446 is related too.

with develop. With this PR, the whole notebook runs within ~10 mins or so

Copy link
Member

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable, although I'm not sure we've quite hit on the perfect design of the laziness (both before in the current develop and in this PR's current state).


walks.append(
list(self.graph.node_ilocs_to_ids(walk))
) # store the walk

return walks

def _augment_and_group_metapaths(self, metapaths, walk_length):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "augment" mean here? Maybe:

Suggested change
def _augment_and_group_metapaths(self, metapaths, walk_length):
def _extend_and_group_metapaths(self, metapaths, walk_length):

or even

Suggested change
def _augment_and_group_metapaths(self, metapaths, walk_length):
def _repeat_and_group_metapaths(self, metapaths, walk_length):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extend sounds good to me. augment came from me taking the phrasing from an existing comment that I removed # augment metapath to be length long.

@@ -762,7 +762,7 @@ def f():
@pytest.mark.parametrize("num_nodes,num_edges", [(0, 0), (100, 200), (1000, 5000)])
# features or not, to capture their cost
@pytest.mark.parametrize("feature_size", [None, 100])
@pytest.mark.parametrize("force_adj_lists", [None, "directed", "undirected", "both"])
@pytest.mark.parametrize("force_adj_lists", [None, "homogeneous", "by_other_node_type"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dramatically changing what this benchmark is measuring; previously it was capturing the cost of using a graph in undirected mode or directed mode (or neither (adjacency matrix only), or both), now it's only capturing the cost of using a graph in either:

  • adjacency matrix only
  • both undirected and directed mode for homogeneous
  • both undirected and directed mode for other_node_type

This benchmark is kinda completely exploding in terms of options, but maybe it instead makes sense to do this as a new parameter like:

@pytest.mark.parametrize("adj_list_directness", ["homogeneous", "by_other_node_type"])

Potentially with a:

if force_adj_lists is None and adj_list_directness != "homogeneous":
    pytest.skip("only measure force_adj_lists=None once")

However, maybe it's not entirely useful to know the specific cost of the undirected vs. directed adjacencies.

@@ -317,11 +317,11 @@ def test_init_parameters(self):
assert all(np.array_equal(w1, w2) for w1, w2 in zip(run_1, run_2))

def test_benchmark_uniformrandommetapathwalk(self, benchmark):
g = example_graph_random(n_nodes=50, n_edges=500, node_types=2)
g = example_graph_random(n_nodes=50, n_edges=2000, node_types=4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why did this need to become larger? The metapaths got truncated often in practice, despite the theory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried increasing it since I wasn't so convinced that the speedups reported from the benchmark were as good as what we end up seeing in practice (#1399 (comment) vs #1399 (comment) the larger benchmark reports much higher speedup)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that the number of edges increasing or the number of head nodes increasing?

mrw = UniformRandomMetaPathWalk(g)

# this should be made larger to be more realistic, when it is fast enough
nodes = np.arange(0, 5)
nodes = np.arange(0, 10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably now fast enough to switch to being the same as the other ones, e.g.

nodes = np.arange(0, 50)

Suggested change
nodes = np.arange(0, 10)
nodes = np.arange(0, 50)

Comment on lines 312 to 314
self.undirected = self.init_index(ins=True, outs=True)
self.ins = self.init_index(ins=True, outs=False)
self.outs = self.init_index(ins=False, outs=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than duplicate the init_index calling and assignment logic, maybe this could be:

Suggested change
self.undirected = self.init_index(ins=True, outs=True)
self.ins = self.init_index(ins=True, outs=False)
self.outs = self.init_index(ins=False, outs=True)
self.lookup(ins=True, outs=True)
self.lookup(ins=True, outs=False)
self.lookup(ins=False, outs=True)

@@ -303,74 +303,117 @@ def _numpyise(d, dtype):
return {k: np.array(v, dtype=dtype) for k, v in d.items()}


class AdjList:
def __init__(self, init_index):
self.init_index = init_index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you talk a bit about the design choice of passing in a function here? I'm not saying it's wrong and I'm not sure there's a best answer at all, but I wonder if there's other versions that are a little clearer?

In particular, it feels a bit funny to have EdgeData store an AdjList object that holds a method from that parent EdgeData. An alternative might be something like

class AdjList:
    def __init__(self, by_other_node_type, ...):
        if by_other_node_type:
            self._init_func = self._init_by_other_node_type
        else:
            self._init_func = self._init_plain

        self.undirected = self.ins = self.outs = None

    def _init_plain(self, ...):
        ...

    def _init_by_other_node_type(self, ...):
        ...

I guess we'll need to pass more arguments around in this case.

Copy link
Contributor Author

@kjun9 kjun9 May 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah I wanted AdjList to handle the lookup + lazy initialising in one place, but thought it would be more convenient to define more adjacency-initialiser functions in EdgeData rather than in AdjList since I'm guessing it'll need access to all the type information in EdgeData and NodeData to build the adjacency lists with, so that's how I ended up with this approach.

Not sure if you had something else in mind with your suggestion, but I guess we could pass in the whole edge data and node data into the adjacency list to do whatever it needs to do... but maybe that also looks a bit funny?

class AdjList:
    def __init__(self, by_other_node_type, edge_data, node_data)
    def _init_plain(self, ...):
    def _init_by_other_node_type(self, ...):

class EdgeData:
    def __init__(self, ...):
        ...
        self._adj_list = AdjList(by_other_node_type=False, self, node_data)
        self._adj_list_by_other_node_type = AdjList(by_other_node_type=True, self, node_data)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a bit funny too.

I thought about this overnight and I think my "concern" here is that the recursiveness of the data structure initialisation introduces risks of attempting to use a value while it's being initialised (e.g. if edge list initialisation happened to look at other edge lists, for some reason).

I guess the fully-separated version would be passing in sources, targets and node_data?

However, this may not be worth it, especially as edge list initialisation is:

  • limited enough that there doesn't seem to be a reason why it would look at itself
  • done as a final commit, rather than incremental construction (specifically, the self.undirected properties go from None to "fully ready", rather than being incrementally built up), and so we'll see infinite recursion here rather than invalid results

The main risk might be around delaying garbage collection (since this causes a cycle between objects, and so cPython will have to wait for the actual garbage collector to run, rather than objects being deallocated as soon as they go out of scope), and future refactoring that change either of the points above.

super().__init__(shared, type_starts)

# cache these columns to avoid having to do more method and dict look-ups
self.sources = self._column(SOURCE)
self.targets = self._column(TARGET)
self.weights = self._column(WEIGHT)

self.source_types = nodes.type_of_iloc(self.sources)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going to noticeably increase memory usage, both because:

  1. it's using type names not type ilocs
  2. it's two extra O(num edges) arrays that aren't shared with anything (i.e. they're new) but are stored permanently

The first can be fixed with:

Suggested change
self.source_types = nodes.type_of_iloc(self.sources)
self.source_types = nodes.type_ilocs[self.sources]

but the second is more problematic. Maybe these could be computed lazily too? (And/or, not even stored, just used transiently for initialising the adjacency lists?)

I think the benchmarks in the PR description aren't up to date with the current state? Maybe some new benchmarks could be reported, including the test_allocation_benchmark ones (to try to capture this addition)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, yeah I don't think we need to store these at all actually. I'll run some new benchmarks afterwards

@@ -303,74 +303,117 @@ def _numpyise(d, dtype):
return {k: np.array(v, dtype=dtype) for k, v in d.items()}


class AdjList:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's minor, but I think this isn't quite an adjacency list, but actually something like:

Suggested change
class AdjList:
class LazyAdjLists:

@kjun9
Copy link
Contributor Author

kjun9 commented May 19, 2020

Update: I decided to kind of start-over by getting rid of the LazyAdjLists class for now and just add to the existing structure for adjacency lists. So the benchmarks for existing functionality other than methpath2vec walk should remain roughly unaffected, and we're just adding some new benchmarks for the new by_other_node_type parts.

Adjacency list construction times (should be roughly the same):

------------------------------------------------------------------------------------------------------ benchmark 'StellarGraph adjacency lists (time)': 18 tests -------------------------------------------------------------------------------------------------------
Name (time in us)                                                                Min                     Max                    Mean                 StdDev                  Median                    IQR            Outliers         OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_adj_list[directed-100-200] (NOW)                             231.4470 (1.0)          637.8610 (1.0)          258.5698 (1.0)          42.2262 (1.00)         242.1290 (1.0)          13.7610 (1.0)       350;521  3,867.4271 (1.0)        3339           1
test_benchmark_adj_list[directed-100-200] (0042_cc15f17)                    232.1980 (1.00)         687.9200 (1.08)         258.8520 (1.00)         42.0826 (1.0)          242.9040 (1.00)         13.8745 (1.01)      351;519  3,863.2119 (1.00)       3301           1
test_benchmark_adj_list[undirected-100-200] (NOW)                           315.9070 (1.36)         900.4990 (1.41)         353.7579 (1.37)         61.2691 (1.46)         333.7215 (1.38)         26.9920 (1.96)      236;336  2,826.7916 (0.73)       2452           1
test_benchmark_adj_list[undirected-100-200] (0042_cc15f17)                  320.4990 (1.38)         825.2960 (1.29)         358.5158 (1.39)         51.7141 (1.23)         339.3480 (1.40)         23.2142 (1.69)      250;324  2,789.2772 (0.72)       2427           1
test_benchmark_adj_list[directed_by_other_node_type-100-200]                473.1610 (2.04)         977.0970 (1.53)         533.5859 (2.06)         66.9349 (1.59)         505.8120 (2.09)         46.6920 (3.39)      191;161  1,874.1127 (0.48)       1551           1
test_benchmark_adj_list[undirected_by_other_node_type-100-200]              583.8950 (2.52)       1,840.8670 (2.89)         692.0267 (2.68)        136.3381 (3.24)         640.6880 (2.65)        110.0740 (8.00)       118;88  1,445.0309 (0.37)       1122           1
test_benchmark_adj_list[directed-1000-5000] (NOW)                         4,700.7700 (20.31)      7,351.8230 (11.53)      5,140.5051 (19.88)       402.8736 (9.57)       5,016.1920 (20.72)       390.2730 (28.36)        21;9    194.5334 (0.05)        174           1
test_benchmark_adj_list[directed-1000-5000] (0042_cc15f17)                4,733.5760 (20.45)     14,525.9150 (22.77)      5,423.7449 (20.98)     1,129.3189 (26.84)      5,112.1680 (21.11)       474.9890 (34.52)       10;17    184.3745 (0.05)        172           1
test_benchmark_adj_list[undirected-1000-5000] (NOW)                       5,681.0990 (24.55)      7,905.0130 (12.39)      6,238.5514 (24.13)       431.1811 (10.25)      6,130.7970 (25.32)       491.5305 (35.72)        35;7    160.2936 (0.04)        149           1
test_benchmark_adj_list[undirected-1000-5000] (0042_cc15f17)              5,733.0330 (24.77)      8,448.8230 (13.25)      6,145.9833 (23.77)       366.6337 (8.71)       6,023.9110 (24.88)       456.8220 (33.20)        37;2    162.7079 (0.04)        161           1
test_benchmark_adj_list[directed_by_other_node_type-1000-5000]           10,009.7130 (43.25)     89,959.8790 (141.03)    11,933.1857 (46.15)     8,443.4651 (200.64)    10,914.3815 (45.08)       544.4775 (39.57)         1;6     83.7999 (0.02)         88           1
test_benchmark_adj_list[undirected_by_other_node_type-1000-5000]         13,573.0680 (58.64)     97,755.2520 (153.25)    22,233.7995 (85.99)    13,983.3121 (332.28)    17,866.9160 (73.79)     6,815.7660 (495.30)        3;5     44.9766 (0.01)         45           1
test_benchmark_adj_list[directed-20000-100000] (0042_cc15f17)           127,358.4270 (550.27)   226,152.3290 (354.55)   158,161.4499 (611.68)   46,364.9196 (>1000.0)  132,032.3320 (545.30)   73,978.2312 (>1000.0)       2;0      6.3227 (0.00)          7           1
test_benchmark_adj_list[directed-20000-100000] (NOW)                    127,907.2040 (552.64)   226,413.9760 (354.96)   157,614.2974 (609.56)   45,147.8899 (>1000.0)  134,704.9390 (556.34)   71,021.4352 (>1000.0)       2;0      6.3446 (0.00)          7           1
test_benchmark_adj_list[undirected-20000-100000] (NOW)                  176,357.6690 (761.98)   279,860.9100 (438.75)   228,874.5770 (885.16)   47,724.7421 (>1000.0)  224,443.8110 (926.96)   90,528.8710 (>1000.0)       2;0      4.3692 (0.00)          5           1
test_benchmark_adj_list[undirected-20000-100000] (0042_cc15f17)         178,570.7970 (771.54)   272,435.4760 (427.11)   216,208.0132 (836.17)   51,113.7061 (>1000.0)  179,460.4530 (741.18)   93,478.8123 (>1000.0)       2;0      4.6252 (0.00)          5           1
test_benchmark_adj_list[directed_by_other_node_type-20000-100000]       294,411.1100 (>1000.0)  400,097.0160 (627.25)   371,276.0022 (>1000.0)  43,407.9478 (>1000.0)  387,538.8160 (>1000.0)  32,097.7518 (>1000.0)       1;1      2.6934 (0.00)          5           1
test_benchmark_adj_list[undirected_by_other_node_type-20000-100000]     451,506.6990 (>1000.0)  564,240.5210 (884.58)   485,105.0646 (>1000.0)  46,503.0216 (>1000.0)  469,584.9610 (>1000.0)  53,450.1095 (>1000.0)       1;0      2.0614 (0.00)          5           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Adjacency list allocations:

----------------------------------------------------------------------------------------------------------- benchmark 'StellarGraph adjacency lists (peak)': 12 tests -----------------------------------------------------------------------------------------------------------
Name (time in s)                                                                                Min                       Max                      Mean              StdDev                    Median               IQR            Outliers     OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_allocation_benchmark_adj_list_peak[directed-100-200] (0036_cc15f17)                28,424.0000 (1.0)         28,724.0000 (1.0)         28,439.0000 (1.0)       67.0820 (1.38)        28,424.0000 (1.0)      0.0000 (1.0)           1;1  0.0000 (1.0)          20           1
test_allocation_benchmark_adj_list_peak[directed-100-200] (NOW)                         28,694.0000 (1.01)        29,012.0000 (1.01)        28,709.9000 (1.01)      71.1070 (1.47)        28,694.0000 (1.01)     0.0000 (1.0)           1;1  0.0000 (0.99)         20           1
test_allocation_benchmark_adj_list_peak[undirected-100-200] (0036_cc15f17)              55,754.0000 (1.96)        55,971.0000 (1.95)        55,764.8500 (1.96)      48.5227 (1.0)         55,754.0000 (1.96)     0.0000 (1.0)           1;1  0.0000 (0.51)         20           1
test_allocation_benchmark_adj_list_peak[undirected-100-200] (NOW)                       56,578.0000 (1.99)        56,795.0000 (1.98)        56,588.8500 (1.99)      48.5227 (1.0)         56,578.0000 (1.99)     0.0000 (1.0)           1;1  0.0000 (0.50)         20           1
test_allocation_benchmark_adj_list_peak[directed_by_other_node_type-100-200]            67,553.0000 (2.38)        68,103.0000 (2.37)        67,580.5000 (2.38)     122.9837 (2.53)        67,553.0000 (2.38)     0.0000 (1.0)           1;1  0.0000 (0.42)         20           1
test_allocation_benchmark_adj_list_peak[undirected_by_other_node_type-100-200]          94,760.0000 (3.33)        95,682.0000 (3.33)        94,806.1000 (3.33)     206.1655 (4.25)        94,760.0000 (3.33)     0.0000 (1.0)           1;1  0.0000 (0.30)         20           1
test_allocation_benchmark_adj_list_peak[directed-1000-5000] (NOW)                      518,237.0000 (18.23)      520,973.0000 (18.14)      518,373.8000 (18.23)    611.7882 (12.61)      518,237.0000 (18.23)    0.0000 (1.0)           1;1  0.0000 (0.05)         20           1
test_allocation_benchmark_adj_list_peak[directed-1000-5000] (0036_cc15f17)             518,273.0000 (18.23)      520,889.0000 (18.13)      518,403.8000 (18.23)    584.9554 (12.06)      518,273.0000 (18.23)    0.0000 (1.0)           1;1  0.0000 (0.05)         20           1
test_allocation_benchmark_adj_list_peak[undirected-1000-5000] (NOW)                    783,131.0000 (27.55)      784,335.0000 (27.31)      783,191.2000 (27.54)    269.2226 (5.55)       783,131.0000 (27.55)    0.0000 (1.0)           1;1  0.0000 (0.04)         20           1
test_allocation_benchmark_adj_list_peak[undirected-1000-5000] (0036_cc15f17)           784,145.0000 (27.59)      785,427.0000 (27.34)      784,209.1000 (27.58)    286.6639 (5.91)       784,145.0000 (27.59)    0.0000 (1.0)           1;1  0.0000 (0.04)         20           1
test_allocation_benchmark_adj_list_peak[directed_by_other_node_type-1000-5000]       1,291,283.0000 (45.43)    1,292,357.0000 (44.99)    1,291,336.7000 (45.41)    240.1537 (4.95)     1,291,283.0000 (45.43)    0.0000 (1.0)           1;1  0.0000 (0.02)         20           1
test_allocation_benchmark_adj_list_peak[undirected_by_other_node_type-1000-5000]     2,045,458.0000 (71.96)    2,046,632.0000 (71.25)    2,045,516.7000 (71.93)    262.5144 (5.41)     2,045,458.0000 (71.96)    0.0000 (1.0)           1;1  0.0000 (0.01)         20           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------- benchmark 'StellarGraph adjacency lists (size)': 12 tests ---------------------------------------------------------------------------------------------------------
Name (time in s)                                                                           Min                       Max                      Mean              StdDev                    Median               IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_allocation_benchmark_adj_list[directed-100-200] (0036_cc15f17)                16,930.0000 (1.0)         17,296.0000 (1.0)         16,948.3000 (1.0)       81.8401 (1.69)        16,930.0000 (1.0)      0.0000 (1.0)           1;1  0.0001 (1.0)          20           1
test_allocation_benchmark_adj_list[directed-100-200] (NOW)                         17,082.0000 (1.01)        17,415.0000 (1.01)        17,098.6500 (1.01)      74.4611 (1.53)        17,082.0000 (1.01)     0.0000 (1.0)           1;1  0.0001 (0.99)         20           1
test_allocation_benchmark_adj_list[undirected-100-200] (0036_cc15f17)              27,561.0000 (1.63)        27,778.0000 (1.61)        27,571.8500 (1.63)      48.5227 (1.0)         27,561.0000 (1.63)     0.0000 (1.0)           1;1  0.0000 (0.61)         20           1
test_allocation_benchmark_adj_list[undirected-100-200] (NOW)                       31,179.0000 (1.84)        31,396.0000 (1.82)        31,193.4500 (1.84)      50.3132 (1.04)        31,179.0000 (1.84)     0.0000 (1.0)           2;2  0.0000 (0.54)         20           1
test_allocation_benchmark_adj_list[directed_by_other_node_type-100-200]            40,281.0000 (2.38)        40,807.0000 (2.36)        40,307.3000 (2.38)     117.6172 (2.42)        40,281.0000 (2.38)     0.0000 (1.0)           1;1  0.0000 (0.42)         20           1
test_allocation_benchmark_adj_list[undirected_by_other_node_type-100-200]          49,134.0000 (2.90)        50,056.0000 (2.89)        49,180.1000 (2.90)     206.1655 (4.25)        49,134.0000 (2.90)     0.0000 (1.0)           1;1  0.0000 (0.34)         20           1
test_allocation_benchmark_adj_list[directed-1000-5000] (NOW)                      174,498.0000 (10.31)      177,242.0000 (10.25)      174,635.2000 (10.30)    613.5771 (12.65)      174,498.0000 (10.31)    0.0000 (1.0)           1;1  0.0000 (0.10)         20           1
test_allocation_benchmark_adj_list[directed-1000-5000] (0036_cc15f17)             174,646.0000 (10.32)      177,246.0000 (10.25)      174,776.0000 (10.31)    581.3777 (11.98)      174,646.0000 (10.32)    0.0000 (1.0)           1;1  0.0000 (0.10)         20           1
test_allocation_benchmark_adj_list[undirected-1000-5000] (0036_cc15f17)           331,026.0000 (19.55)      332,302.0000 (19.21)      331,089.8000 (19.54)    285.3223 (5.88)       331,026.0000 (19.55)    0.0000 (1.0)           1;1  0.0000 (0.05)         20           1
test_allocation_benchmark_adj_list[undirected-1000-5000] (NOW)                    331,118.0000 (19.56)      332,274.0000 (19.21)      331,175.8000 (19.54)    258.4895 (5.33)       331,118.0000 (19.56)    0.0000 (1.0)           1;1  0.0000 (0.05)         20           1
test_allocation_benchmark_adj_list[directed_by_other_node_type-1000-5000]         609,068.0000 (35.98)      610,052.0000 (35.27)      609,117.2000 (35.94)    220.0291 (4.53)       609,068.0000 (35.98)    0.0000 (1.0)           1;1  0.0000 (0.03)         20           1
test_allocation_benchmark_adj_list[undirected_by_other_node_type-1000-5000]     1,000,528.0000 (59.10)    1,001,702.0000 (57.92)    1,000,586.7000 (59.04)    262.5144 (5.41)     1,000,528.0000 (59.10)    0.0000 (1.0)           1;1  0.0000 (0.02)         20           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Metapath walk (I ran the test in this branch against the code in develop, since I tweaked the test slightly to look at how times change as the graph gets larger/more heterogeneous). Looks like the speedup roughly ranges between 2x - 3x?

----------------------------------------------------------------------------------------------------------- benchmark: 36 tests -----------------------------------------------------------------------------------------------------------
Name (time in ms)                                                           Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_uniformrandommetapathwalk[10-4-500] (NOW)                 5.8106 (1.0)        9.4860 (1.11)       6.4435 (1.0)      0.5753 (1.38)       6.2735 (1.0)      0.6167 (1.19)         18;5  155.1962 (1.0)         128           1
test_benchmark_uniformrandommetapathwalk[10-3-500] (NOW)                 6.1243 (1.05)       8.5171 (1.0)        6.7221 (1.04)     0.4182 (1.0)        6.6243 (1.06)     0.5196 (1.0)          30;4  148.7619 (0.96)        126           1
test_benchmark_uniformrandommetapathwalk[10-4-1000] (NOW)                6.5450 (1.13)       8.7405 (1.03)       7.2068 (1.12)     0.4718 (1.13)       7.1279 (1.14)     0.6448 (1.24)         34;2  138.7576 (0.89)        111           1
test_benchmark_uniformrandommetapathwalk[10-3-1000] (NOW)                6.8049 (1.17)      11.5971 (1.36)       7.4974 (1.16)     0.6468 (1.55)       7.3635 (1.17)     0.6780 (1.30)         18;5  133.3789 (0.86)        108           1
test_benchmark_uniformrandommetapathwalk[10-3-2000] (NOW)                6.8252 (1.17)      13.1113 (1.54)       7.8623 (1.22)     1.0146 (2.43)       7.5698 (1.21)     0.7570 (1.46)         10;7  127.1900 (0.82)         92           1
test_benchmark_uniformrandommetapathwalk[10-2-1000] (NOW)                6.8725 (1.18)       9.8609 (1.16)       7.6781 (1.19)     0.5818 (1.39)       7.5195 (1.20)     0.6795 (1.31)         23;6  130.2401 (0.84)        111           1
test_benchmark_uniformrandommetapathwalk[10-2-500] (NOW)                 6.8865 (1.19)      11.3546 (1.33)       7.6187 (1.18)     0.6550 (1.57)       7.4407 (1.19)     0.6199 (1.19)         19;6  131.2567 (0.85)        120           1
test_benchmark_uniformrandommetapathwalk[10-4-2000] (NOW)                6.9626 (1.20)       9.6699 (1.14)       7.6233 (1.18)     0.5078 (1.21)       7.5368 (1.20)     0.6490 (1.25)         16;1  131.1767 (0.85)         69           1
test_benchmark_uniformrandommetapathwalk[10-2-2000] (NOW)                6.9762 (1.20)      11.9028 (1.40)       7.7711 (1.21)     0.7376 (1.76)       7.6045 (1.21)     0.7013 (1.35)         10;3  128.6827 (0.83)         95           1
test_benchmark_uniformrandommetapathwalk[10-4-500] (0040_cc15f17)       11.2608 (1.94)      14.6914 (1.72)      12.2768 (1.91)     0.6632 (1.59)      12.1402 (1.94)     0.8321 (1.60)         23;2   81.4544 (0.52)         79           1
test_benchmark_uniformrandommetapathwalk[10-3-500] (0040_cc15f17)       12.6508 (2.18)      27.1551 (3.19)      14.4338 (2.24)     2.5522 (6.10)      13.7170 (2.19)     1.0274 (1.98)          6;7   69.2820 (0.45)         68           1
test_benchmark_uniformrandommetapathwalk[10-4-1000] (0040_cc15f17)      13.8929 (2.39)      19.6450 (2.31)      15.2326 (2.36)     0.9624 (2.30)      15.0950 (2.41)     0.8085 (1.56)         12;2   65.6488 (0.42)         59           1
test_benchmark_uniformrandommetapathwalk[10-2-500] (0040_cc15f17)       15.5431 (2.67)      20.9874 (2.46)      17.2136 (2.67)     1.0308 (2.46)      16.9838 (2.71)     0.9730 (1.87)         13;3   58.0937 (0.37)         57           1
test_benchmark_uniformrandommetapathwalk[10-3-1000] (0040_cc15f17)      16.4838 (2.84)      22.0240 (2.59)      18.1795 (2.82)     1.1191 (2.68)      17.9888 (2.87)     1.4695 (2.83)         13;1   55.0072 (0.35)         49           1
test_benchmark_uniformrandommetapathwalk[10-4-2000] (0040_cc15f17)      17.4925 (3.01)      20.8271 (2.45)      18.9591 (2.94)     0.7609 (1.82)      18.8170 (3.00)     0.8590 (1.65)         14;1   52.7452 (0.34)         45           1
test_benchmark_uniformrandommetapathwalk[10-2-1000] (0040_cc15f17)      18.1114 (3.12)      22.8011 (2.68)      19.3948 (3.01)     1.0059 (2.41)      19.2194 (3.06)     1.2589 (2.42)         13;1   51.5601 (0.33)         47           1
test_benchmark_uniformrandommetapathwalk[10-3-2000] (0040_cc15f17)      18.7091 (3.22)      27.2711 (3.20)      21.0154 (3.26)     1.8865 (4.51)      20.7958 (3.31)     1.7701 (3.41)          9;3   47.5841 (0.31)         44           1
test_benchmark_uniformrandommetapathwalk[10-2-2000] (0040_cc15f17)      20.8818 (3.59)      27.9140 (3.28)      22.6180 (3.51)     1.2499 (2.99)      22.3012 (3.55)     1.0038 (1.93)          7;2   44.2126 (0.28)         39           1
test_benchmark_uniformrandommetapathwalk[50-4-500] (NOW)                24.3835 (4.20)      29.1655 (3.42)      26.3123 (4.08)     1.1441 (2.74)      26.1270 (4.16)     1.4407 (2.77)         11;1   38.0051 (0.24)         36           1
test_benchmark_uniformrandommetapathwalk[50-4-1000] (NOW)               31.0475 (5.34)      36.3336 (4.27)      32.7223 (5.08)     1.1051 (2.64)      32.5729 (5.19)     1.5332 (2.95)          6;1   30.5602 (0.20)         28           1
test_benchmark_uniformrandommetapathwalk[50-3-500] (NOW)                31.9296 (5.50)      38.7202 (4.55)      33.5805 (5.21)     1.5277 (3.65)      33.2236 (5.30)     1.6323 (3.14)          4;2   29.7792 (0.19)         27           1
test_benchmark_uniformrandommetapathwalk[50-4-2000] (NOW)               31.9304 (5.50)      35.6491 (4.19)      34.3374 (5.33)     0.9837 (2.35)      34.6116 (5.52)     1.2645 (2.43)          7;1   29.1228 (0.19)         24           1
test_benchmark_uniformrandommetapathwalk[50-3-1000] (NOW)               32.7119 (5.63)      39.6607 (4.66)      34.7665 (5.40)     1.3730 (3.28)      34.6718 (5.53)     1.2141 (2.34)          6;2   28.7633 (0.19)         26           1
test_benchmark_uniformrandommetapathwalk[50-2-500] (NOW)                33.3727 (5.74)      37.2157 (4.37)      35.0011 (5.43)     0.8976 (2.15)      35.0400 (5.59)     1.2386 (2.38)          8;0   28.5706 (0.18)         27           1
test_benchmark_uniformrandommetapathwalk[50-3-2000] (NOW)               33.4652 (5.76)      37.6337 (4.42)      35.0256 (5.44)     0.9944 (2.38)      34.8666 (5.56)     1.1132 (2.14)          5;2   28.5505 (0.18)         26           1
test_benchmark_uniformrandommetapathwalk[50-2-1000] (NOW)               33.5675 (5.78)      37.9009 (4.45)      35.3018 (5.48)     1.0373 (2.48)      35.3584 (5.64)     1.2893 (2.48)          7;1   28.3271 (0.18)         25           1
test_benchmark_uniformrandommetapathwalk[50-2-2000] (NOW)               34.0837 (5.87)      39.6224 (4.65)      35.8334 (5.56)     1.1626 (2.78)      35.7461 (5.70)     1.1165 (2.15)          5;2   27.9069 (0.18)         24           1
test_benchmark_uniformrandommetapathwalk[50-4-500] (0040_cc15f17)       54.0867 (9.31)      59.4685 (6.98)      55.9395 (8.68)     1.3827 (3.31)      55.5902 (8.86)     1.9714 (3.79)          5;0   17.8764 (0.12)         18           1
test_benchmark_uniformrandommetapathwalk[50-3-500] (0040_cc15f17)       69.6907 (11.99)     76.1763 (8.94)      72.3166 (11.22)    1.8046 (4.31)      72.1359 (11.50)    2.1411 (4.12)          4;0   13.8281 (0.09)         14           1
test_benchmark_uniformrandommetapathwalk[50-4-1000] (0040_cc15f17)      78.8268 (13.57)     84.8072 (9.96)      82.0019 (12.73)    1.8392 (4.40)      82.0519 (13.08)    2.8974 (5.58)          5;0   12.1948 (0.08)         12           1
test_benchmark_uniformrandommetapathwalk[50-2-500] (0040_cc15f17)       83.8906 (14.44)     88.6003 (10.40)     85.3964 (13.25)    1.1761 (2.81)      85.2431 (13.59)    0.9288 (1.79)          2;1   11.7101 (0.08)         12           1
test_benchmark_uniformrandommetapathwalk[50-3-1000] (0040_cc15f17)      85.2421 (14.67)     90.7248 (10.65)     87.8713 (13.64)    1.8624 (4.45)      87.8735 (14.01)    3.1376 (6.04)          5;0   11.3803 (0.07)         11           1
test_benchmark_uniformrandommetapathwalk[50-4-2000] (0040_cc15f17)      90.9771 (15.66)     98.0309 (11.51)     93.9101 (14.57)    2.1096 (5.04)      93.4466 (14.90)    2.8223 (5.43)          5;0   10.6485 (0.07)         11           1
test_benchmark_uniformrandommetapathwalk[50-2-1000] (0040_cc15f17)      92.2929 (15.88)     96.8610 (11.37)     94.0338 (14.59)    1.2186 (2.91)      94.0808 (15.00)    1.2658 (2.44)          3;1   10.6345 (0.07)         11           1
test_benchmark_uniformrandommetapathwalk[50-3-2000] (0040_cc15f17)      96.9583 (16.69)    103.1501 (12.11)     99.7080 (15.47)    2.4264 (5.80)      99.4320 (15.85)    4.4427 (8.55)          4;0   10.0293 (0.06)         10           1
test_benchmark_uniformrandommetapathwalk[50-2-2000] (0040_cc15f17)     107.2951 (18.47)    115.4023 (13.55)    111.2413 (17.26)    2.7680 (6.62)     110.2651 (17.58)    4.7315 (9.11)          4;0    8.9895 (0.06)          9           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@kjun9
Copy link
Contributor Author

kjun9 commented May 21, 2020

I've switched to contiguous arrays like #1296 - took me a little while to understand how to do it.. 😅 I did try to take out a few small functions to share with the existing code for constructing the adjacency lists, but _create_undirected_adj_lists_by_other_node_type currently still looks a bit complicated

New benchmarks after switching to contiguous arrays:

Metapath2vec walk

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_uniformrandommetapathwalk[10-3-500] (NOW)                 9.8704 (1.0)       13.4906 (1.0)       11.0875 (1.0)      0.7382 (1.05)      11.0106 (1.0)       1.0323 (1.25)         29;2  90.1916 (1.0)          89           1
test_benchmark_uniformrandommetapathwalk[10-4-500] (NOW)                11.4017 (1.16)      17.5578 (1.30)      13.4410 (1.21)     1.3345 (1.89)      13.1116 (1.19)      1.7548 (2.12)         16;2  74.3994 (0.82)         60           1
test_benchmark_uniformrandommetapathwalk[10-2-500] (NOW)                11.4081 (1.16)      18.9774 (1.41)      12.9319 (1.17)     1.2247 (1.74)      12.7304 (1.16)      1.1039 (1.33)         17;4  77.3284 (0.86)         76           1
test_benchmark_uniformrandommetapathwalk[10-2-1000] (NOW)               11.4775 (1.16)      15.0860 (1.12)      12.5705 (1.13)     0.7391 (1.05)      12.3893 (1.13)      1.0939 (1.32)         24;1  79.5516 (0.88)         76           1
test_benchmark_uniformrandommetapathwalk[10-3-1000] (NOW)               11.4792 (1.16)      17.8878 (1.33)      13.0479 (1.18)     1.1452 (1.63)      12.7832 (1.16)      1.0229 (1.23)         12;6  76.6408 (0.85)         76           1
test_benchmark_uniformrandommetapathwalk[10-2-2000] (NOW)               11.4887 (1.16)      16.3850 (1.21)      12.8020 (1.15)     0.9154 (1.30)      12.6393 (1.15)      1.2218 (1.47)         18;1  78.1128 (0.87)         69           1
test_benchmark_uniformrandommetapathwalk[10-4-2000] (NOW)               11.7484 (1.19)      21.2716 (1.58)      14.0348 (1.27)     1.8677 (2.65)      13.5541 (1.23)      2.1136 (2.55)         11;3  71.2514 (0.79)         63           1
test_benchmark_uniformrandommetapathwalk[10-3-2000] (NOW)               12.1010 (1.23)      28.2123 (2.09)      19.0184 (1.72)     3.5185 (5.00)      19.4636 (1.77)      3.8575 (4.65)         22;2  52.5807 (0.58)         74           1
test_benchmark_uniformrandommetapathwalk[10-4-1000] (NOW)               13.8436 (1.40)      25.0714 (1.86)      17.4340 (1.57)     1.8478 (2.62)      17.2061 (1.56)      1.6533 (1.99)         12;3  57.3590 (0.64)         50           1
test_benchmark_uniformrandommetapathwalk[10-4-500] (0046_cd41384)       15.8208 (1.60)      24.6789 (1.83)      17.3714 (1.57)     1.6221 (2.30)      16.8680 (1.53)      1.0430 (1.26)          6;5  57.5660 (0.64)         52           1
test_benchmark_uniformrandommetapathwalk[10-3-500] (0046_cd41384)       18.9521 (1.92)      23.6497 (1.75)      20.4179 (1.84)     1.0053 (1.43)      20.2320 (1.84)      1.1920 (1.44)         12;2  48.9767 (0.54)         45           1
test_benchmark_uniformrandommetapathwalk[10-4-1000] (0046_cd41384)      20.2379 (2.05)      24.6014 (1.82)      21.9907 (1.98)     0.9296 (1.32)      21.9236 (1.99)      0.9018 (1.09)         12;2  45.4737 (0.50)         41           1
test_benchmark_uniformrandommetapathwalk[10-2-500] (0046_cd41384)       22.5255 (2.28)      26.8943 (1.99)      24.2128 (2.18)     0.9924 (1.41)      24.2159 (2.20)      1.1919 (1.44)         12;1  41.3005 (0.46)         41           1
test_benchmark_uniformrandommetapathwalk[10-3-1000] (0046_cd41384)      23.1060 (2.34)      26.6813 (1.98)      24.7956 (2.24)     0.9113 (1.29)      24.7760 (2.25)      1.0807 (1.30)         11;0  40.3297 (0.45)         40           1
test_benchmark_uniformrandommetapathwalk[10-2-1000] (0046_cd41384)      24.3782 (2.47)      28.9344 (2.14)      25.9133 (2.34)     1.1302 (1.60)      25.7071 (2.33)      1.2305 (1.48)         13;2  38.5902 (0.43)         36           1
test_benchmark_uniformrandommetapathwalk[10-4-2000] (0046_cd41384)      24.7837 (2.51)      29.9705 (2.22)      26.1244 (2.36)     1.1029 (1.57)      25.9406 (2.36)      0.8373 (1.01)          6;3  38.2784 (0.42)         38           1
test_benchmark_uniformrandommetapathwalk[10-3-2000] (0046_cd41384)      25.5129 (2.58)      31.6947 (2.35)      27.4617 (2.48)     1.4286 (2.03)      27.5091 (2.50)      1.8986 (2.29)         11;1  36.4144 (0.40)         35           1
test_benchmark_uniformrandommetapathwalk[10-2-2000] (0046_cd41384)      27.3576 (2.77)      33.1049 (2.45)      29.2116 (2.63)     1.2305 (1.75)      28.8563 (2.62)      1.1923 (1.44)          6;2  34.2329 (0.38)         32           1
test_benchmark_uniformrandommetapathwalk[50-4-500] (NOW)                44.7374 (4.53)      64.9497 (4.81)      54.5772 (4.92)     8.0176 (11.38)     53.0025 (4.81)     15.4557 (18.65)         8;0  18.3227 (0.20)         16           1
test_benchmark_uniformrandommetapathwalk[50-3-500] (NOW)                52.1458 (5.28)      76.9201 (5.70)      62.0943 (5.60)     9.9459 (14.12)     56.9634 (5.17)     19.1126 (23.06)         5;0  16.1046 (0.18)         16           1
test_benchmark_uniformrandommetapathwalk[50-4-1000] (NOW)               55.4521 (5.62)      80.9966 (6.00)      61.2577 (5.52)     8.9198 (12.66)     56.6902 (5.15)      4.8868 (5.90)          3;3  16.3245 (0.18)         18           1
test_benchmark_uniformrandommetapathwalk[50-3-1000] (NOW)               56.6112 (5.74)      82.8767 (6.14)      62.1955 (5.61)     7.5411 (10.71)     59.3717 (5.39)      4.4207 (5.33)          2;2  16.0783 (0.18)         18           1
test_benchmark_uniformrandommetapathwalk[50-3-2000] (NOW)               57.8079 (5.86)      67.9448 (5.04)      60.4088 (5.45)     2.5104 (3.56)      59.5477 (5.41)      2.6652 (3.22)          3;1  16.5539 (0.18)         16           1
test_benchmark_uniformrandommetapathwalk[50-4-2000] (NOW)               58.0146 (5.88)      83.6341 (6.20)      66.4221 (5.99)     9.3006 (13.20)     61.7044 (5.60)     13.4114 (16.18)         2;0  15.0552 (0.17)         12           1
test_benchmark_uniformrandommetapathwalk[50-2-2000] (NOW)               58.4879 (5.93)      69.9393 (5.18)      61.0031 (5.50)     3.0216 (4.29)      60.4804 (5.49)      2.1000 (2.53)          1;1  16.3926 (0.18)         12           1
test_benchmark_uniformrandommetapathwalk[50-2-1000] (NOW)               58.8696 (5.96)      81.0091 (6.00)      64.9659 (5.86)     7.2315 (10.27)     62.0181 (5.63)      6.8114 (8.22)          3;2  15.3927 (0.17)         16           1
test_benchmark_uniformrandommetapathwalk[50-2-500] (NOW)                59.1837 (6.00)      76.7491 (5.69)      66.4931 (6.00)     6.2903 (8.93)      64.5412 (5.86)     11.6325 (14.04)         6;0  15.0392 (0.17)         16           1
test_benchmark_uniformrandommetapathwalk[50-4-500] (0046_cd41384)       82.9248 (8.40)      87.2365 (6.47)      85.4376 (7.71)     1.2288 (1.74)      85.6426 (7.78)      1.7431 (2.10)          3;0  11.7045 (0.13)         12           1
test_benchmark_uniformrandommetapathwalk[50-3-500] (0046_cd41384)       97.7590 (9.90)     104.6571 (7.76)     100.8204 (9.09)     2.2971 (3.26)     100.0851 (9.09)      3.6840 (4.45)          3;0   9.9186 (0.11)         10           1
test_benchmark_uniformrandommetapathwalk[50-4-1000] (0046_cd41384)     104.3067 (10.57)    110.5014 (8.19)     107.9361 (9.73)     2.0195 (2.87)     108.2626 (9.83)      2.9150 (3.52)          4;0   9.2647 (0.10)         10           1
test_benchmark_uniformrandommetapathwalk[50-3-1000] (0046_cd41384)     117.2997 (11.88)    122.5132 (9.08)     119.6024 (10.79)    1.8246 (2.59)     120.1795 (10.91)     3.0525 (3.68)          3;0   8.3610 (0.09)          9           1
test_benchmark_uniformrandommetapathwalk[50-2-500] (0046_cd41384)      118.2874 (11.98)    120.6687 (8.94)     119.2230 (10.75)    0.7044 (1.0)      119.3571 (10.84)     0.8288 (1.0)           2;0   8.3876 (0.09)          9           1
test_benchmark_uniformrandommetapathwalk[50-4-2000] (0046_cd41384)     123.7497 (12.54)    129.2173 (9.58)     125.8153 (11.35)    1.6711 (2.37)     125.4899 (11.40)     1.6654 (2.01)          2;1   7.9482 (0.09)          8           1
test_benchmark_uniformrandommetapathwalk[50-2-1000] (0046_cd41384)     124.5510 (12.62)    132.2876 (9.81)     128.5747 (11.60)    2.9129 (4.14)     128.6809 (11.69)     4.9707 (6.00)          3;0   7.7776 (0.09)          8           1
test_benchmark_uniformrandommetapathwalk[50-3-2000] (0046_cd41384)     130.0331 (13.17)    134.7002 (9.98)     132.2468 (11.93)    1.5553 (2.21)     131.9357 (11.98)     2.3324 (2.81)          2;0   7.5616 (0.08)          8           1
test_benchmark_uniformrandommetapathwalk[50-2-2000] (0046_cd41384)     141.6583 (14.35)    146.4903 (10.86)    143.9868 (12.99)    1.6506 (2.34)     144.2513 (13.10)     2.3606 (2.85)          2;0   6.9451 (0.08)          7           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Adjacency list construction times

---------------------------------------------------------------------------------------------------- benchmark 'StellarGraph adjacency lists (time)': 18 tests ----------------------------------------------------------------------------------------------------
Name (time in us)                                                               Min                    Max                   Mean                StdDev                 Median                   IQR            Outliers          OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_adj_list[undirected-100-200] (0048_cd41384)                  24.5480 (1.0)         294.3800 (1.11)         32.3052 (1.0)         16.4745 (1.05)         27.7660 (1.0)          4.9880 (2.45)     654;1219  30,954.8139 (1.0)       11373           1
test_benchmark_adj_list[undirected-100-200] (NOW)                           30.4880 (1.24)        382.8110 (1.45)         37.2370 (1.15)        15.6660 (1.0)          33.2300 (1.20)         2.0370 (1.0)      749;1577  26,855.0183 (0.87)      11578           1
test_benchmark_adj_list[directed-100-200] (0048_cd41384)                    48.8760 (1.99)      3,947.6520 (14.90)        77.9877 (2.41)        76.9466 (4.91)         63.3635 (2.28)        30.3660 (14.91)     113;190  12,822.5266 (0.41)       3146           1
test_benchmark_adj_list[directed-100-200] (NOW)                             49.5970 (2.02)        264.9040 (1.0)          58.5081 (1.81)        18.5747 (1.19)         52.7800 (1.90)         3.5103 (1.72)      355;716  17,091.6572 (0.55)       4485           1
test_benchmark_adj_list[directed_by_other_node_type-100-200]               170.3110 (6.94)        595.8370 (2.25)        193.0913 (5.98)        37.5174 (2.39)        182.1765 (6.56)        11.4760 (5.63)      237;403   5,178.8963 (0.17)       2838           1
test_benchmark_adj_list[undirected_by_other_node_type-100-200]             190.0620 (7.74)        484.7940 (1.83)        210.4009 (6.51)        31.1211 (1.99)        200.3320 (7.22)        12.8848 (6.33)      175;260   4,752.8310 (0.15)       2055           1
test_benchmark_adj_list[undirected-1000-5000] (0048_cd41384)               490.7640 (19.99)     4,467.5190 (16.86)       560.8511 (17.36)      140.8498 (8.99)        528.7230 (19.04)       55.5720 (27.28)      83;147   1,783.0043 (0.06)       1791           1
test_benchmark_adj_list[undirected-1000-5000] (NOW)                        492.8560 (20.08)     1,419.8710 (5.36)        558.8858 (17.30)       92.5383 (5.91)        526.6060 (18.97)       45.8270 (22.50)     189;232   1,789.2745 (0.06)       1874           1
test_benchmark_adj_list[directed-1000-5000] (0048_cd41384)                 757.4370 (30.86)     1,749.9320 (6.61)        905.4496 (28.03)      156.1302 (9.97)        846.4790 (30.49)      168.5962 (82.77)      150;74   1,104.4237 (0.04)       1021           1
test_benchmark_adj_list[directed-1000-5000] (NOW)                          763.3580 (31.10)     7,888.8580 (29.78)       885.9209 (27.42)      345.1872 (22.03)       824.9655 (29.71)      101.1240 (49.64)       18;72   1,128.7689 (0.04)       1106           1
test_benchmark_adj_list[undirected_by_other_node_type-1000-5000]         1,299.5350 (52.94)     2,612.8160 (9.86)      1,417.8116 (43.89)      147.2268 (9.40)      1,376.2400 (49.57)       85.7830 (42.11)       63;66     705.3123 (0.02)        702           1
test_benchmark_adj_list[directed_by_other_node_type-1000-5000]           1,444.1410 (58.83)     2,396.5890 (9.05)      1,593.9811 (49.34)      141.4838 (9.03)      1,545.4120 (55.66)       95.0855 (46.68)       60;51     627.3600 (0.02)        529           1
test_benchmark_adj_list[undirected-20000-100000] (NOW)                  13,148.9710 (535.64)   26,388.4510 (99.62)    17,896.5498 (553.98)   4,053.7007 (258.76)   17,073.9930 (614.92)   6,290.1110 (>1000.0)      16;0      55.8767 (0.00)         43           1
test_benchmark_adj_list[undirected-20000-100000] (0048_cd41384)         13,165.5930 (536.32)   21,529.8700 (81.27)    13,802.7064 (427.26)   1,054.7727 (67.33)    13,615.2810 (490.36)     477.4575 (234.39)        2;3      72.4496 (0.00)         65           1
test_benchmark_adj_list[directed-20000-100000] (0048_cd41384)           20,073.5320 (817.73)   68,412.7400 (258.25)   24,364.9066 (754.21)   8,489.6125 (541.91)   21,277.4305 (766.31)   3,545.3960 (>1000.0)       4;5      41.0426 (0.00)         46           1
test_benchmark_adj_list[directed-20000-100000] (NOW)                    20,092.9070 (818.52)   22,822.2610 (86.15)    21,363.1302 (661.29)     636.8507 (40.65)    21,381.1560 (770.05)     593.6295 (291.42)       14;5      46.8096 (0.00)         43           1
test_benchmark_adj_list[undirected_by_other_node_type-20000-100000]     26,283.7130 (>1000.0)  32,821.3140 (123.90)   27,349.3274 (846.59)   1,235.1034 (78.84)    27,087.4120 (975.56)     924.2280 (453.72)        2;2      36.5640 (0.00)         34           1
test_benchmark_adj_list[directed_by_other_node_type-20000-100000]       33,165.1700 (>1000.0)  36,872.6090 (139.19)   34,440.3798 (>1000.0)  1,118.2480 (71.38)    34,059.4760 (>1000.0)    847.0395 (415.83)        8;5      29.0357 (0.00)         27           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Adjacency list allocations

------------------------------------------------------------------------------------------------------- benchmark 'StellarGraph adjacency lists (peak)': 12 tests -------------------------------------------------------------------------------------------------------
Name (time in s)                                                                              Min                     Max                    Mean              StdDev                  Median               IQR            Outliers     OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_allocation_benchmark_adj_list_peak[undirected-100-200] (NOW)                      7,651.0000 (1.0)        7,915.0000 (1.0)        7,664.2000 (1.0)       59.0322 (inf)        7,651.0000 (1.0)      0.0000 (1.0)           1;1  0.0001 (1.0)          20           1
test_allocation_benchmark_adj_list_peak[undirected-100-200] (0050_cd41384)             7,731.0000 (1.01)       7,995.0000 (1.01)       7,744.2000 (1.01)      59.0322 (inf)        7,731.0000 (1.01)     0.0000 (1.0)           1;1  0.0001 (0.99)         20           1
test_allocation_benchmark_adj_list_peak[undirected_by_other_node_type-100-200]         8,747.0000 (1.14)       9,772.0000 (1.23)       8,798.2500 (1.15)     229.1970 (inf)        8,747.0000 (1.14)     0.0000 (1.0)           1;1  0.0001 (0.87)         20           1
test_allocation_benchmark_adj_list_peak[directed-100-200] (NOW)                        8,752.0000 (1.14)       8,752.0000 (1.11)       8,752.0000 (1.14)       0.0000 (1.0)        8,752.0000 (1.14)     0.0000 (1.0)           0;0  0.0001 (0.88)         20           1
test_allocation_benchmark_adj_list_peak[directed-100-200] (0050_cd41384)               8,912.0000 (1.16)       8,912.0000 (1.13)       8,912.0000 (1.16)       0.0000 (1.0)        8,912.0000 (1.16)     0.0000 (1.0)           0;0  0.0001 (0.86)         20           1
test_allocation_benchmark_adj_list_peak[directed_by_other_node_type-100-200]           8,912.0000 (1.16)       8,912.0000 (1.13)       8,912.0000 (1.16)       0.0000 (1.0)        8,912.0000 (1.16)     0.0000 (1.0)           0;0  0.0001 (0.86)         20           1
test_allocation_benchmark_adj_list_peak[undirected-1000-5000] (NOW)                   64,903.0000 (8.48)      64,967.0000 (8.21)      64,906.2000 (8.47)      14.3108 (inf)       64,903.0000 (8.48)     0.0000 (1.0)           1;1  0.0000 (0.12)         20           1
test_allocation_benchmark_adj_list_peak[undirected-1000-5000] (0050_cd41384)          72,983.0000 (9.54)      73,047.0000 (9.23)      72,986.2000 (9.52)      14.3108 (inf)       72,983.0000 (9.54)     0.0000 (1.0)           1;1  0.0000 (0.11)         20           1
test_allocation_benchmark_adj_list_peak[undirected_by_other_node_type-1000-5000]     106,091.0000 (13.87)    106,891.0000 (13.50)    106,131.0000 (13.85)    178.8854 (inf)      106,091.0000 (13.87)    0.0000 (1.0)           1;1  0.0000 (0.07)         20           1
test_allocation_benchmark_adj_list_peak[directed-1000-5000] (NOW)                    120,428.0000 (15.74)    120,428.0000 (15.22)    120,428.0000 (15.71)      0.0000 (1.0)      120,428.0000 (15.74)    0.0000 (1.0)           0;0  0.0000 (0.06)         20           1
test_allocation_benchmark_adj_list_peak[directed-1000-5000] (0050_cd41384)           125,588.0000 (16.41)    125,588.0000 (15.87)    125,588.0000 (16.39)      0.0000 (1.0)      125,588.0000 (16.41)    0.0000 (1.0)           0;0  0.0000 (0.06)         20           1
test_allocation_benchmark_adj_list_peak[directed_by_other_node_type-1000-5000]       163,517.0000 (21.37)    163,517.0000 (20.66)    163,517.0000 (21.34)      0.0000 (1.0)      163,517.0000 (21.37)    0.0000 (1.0)           0;0  0.0000 (0.05)         20           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------- benchmark 'StellarGraph adjacency lists (size)': 12 tests ---------------------------------------------------------------------------------------------------
Name (time in s)                                                                        Min                    Max                   Mean              StdDev                 Median               IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_allocation_benchmark_adj_list[directed-100-200] (NOW)                         672.0000 (1.0)       1,706.0000 (1.13)        723.7000 (1.0)      231.2094 (64.62)       672.0000 (1.0)      0.0000 (1.0)           1;1  0.0014 (1.0)          20           1
test_allocation_benchmark_adj_list[directed-100-200] (0050_cd41384)                688.0000 (1.02)      1,706.0000 (1.13)        738.9000 (1.02)     227.6317 (63.62)       688.0000 (1.02)     0.0000 (1.0)           1;1  0.0014 (0.98)         20           1
test_allocation_benchmark_adj_list[undirected-100-200] (0050_cd41384)              840.0000 (1.25)      1,506.0000 (1.0)         873.3000 (1.21)     148.9221 (41.62)       840.0000 (1.25)     0.0000 (1.0)           1;1  0.0011 (0.83)         20           1
test_allocation_benchmark_adj_list[undirected-100-200] (NOW)                       840.0000 (1.25)      1,506.0000 (1.0)         873.3000 (1.21)     148.9221 (41.62)       840.0000 (1.25)     0.0000 (1.0)           1;1  0.0011 (0.83)         20           1
test_allocation_benchmark_adj_list[directed_by_other_node_type-100-200]          1,592.0000 (2.37)      3,278.0000 (2.18)      1,676.3000 (2.32)     377.0011 (105.38)    1,592.0000 (2.37)     0.0000 (1.0)           1;1  0.0006 (0.43)         20           1
test_allocation_benchmark_adj_list[undirected_by_other_node_type-100-200]        2,973.0000 (4.42)      4,560.0000 (3.03)      3,052.3500 (4.22)     354.8640 (99.19)     2,973.0000 (4.42)     0.0000 (1.0)           1;1  0.0003 (0.24)         20           1
test_allocation_benchmark_adj_list[directed-1000-5000] (NOW)                    22,674.0000 (33.74)    22,706.0000 (15.08)    22,675.6000 (31.33)      7.1554 (2.00)     22,674.0000 (33.74)    0.0000 (1.0)           1;1  0.0000 (0.03)         20           1
test_allocation_benchmark_adj_list[directed-1000-5000] (0050_cd41384)           22,690.0000 (33.76)    22,706.0000 (15.08)    22,690.8000 (31.35)      3.5777 (1.0)      22,690.0000 (33.76)    0.0000 (1.0)           1;1  0.0000 (0.03)         20           1
test_allocation_benchmark_adj_list[undirected-1000-5000] (0050_cd41384)         24,844.0000 (36.97)    24,908.0000 (16.54)    24,847.2000 (34.33)     14.3108 (4.00)     24,844.0000 (36.97)    0.0000 (1.0)           1;1  0.0000 (0.03)         20           1
test_allocation_benchmark_adj_list[undirected-1000-5000] (NOW)                  24,844.0000 (36.97)    24,908.0000 (16.54)    24,847.2000 (34.33)     14.3108 (4.00)     24,844.0000 (36.97)    0.0000 (1.0)           1;1  0.0000 (0.03)         20           1
test_allocation_benchmark_adj_list[directed_by_other_node_type-1000-5000]       29,590.0000 (44.03)    29,670.0000 (19.70)    29,594.0000 (40.89)     17.8885 (5.00)     29,590.0000 (44.03)    0.0000 (1.0)           1;1  0.0000 (0.02)         20           1
test_allocation_benchmark_adj_list[undirected_by_other_node_type-1000-5000]     38,888.0000 (57.87)    39,368.0000 (26.14)    38,912.0000 (53.77)    107.3313 (30.00)    38,888.0000 (57.87)    0.0000 (1.0)           1;1  0.0000 (0.02)         20           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@kjun9 kjun9 requested a review from huonw May 21, 2020 01:51
Copy link
Member

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome. It's definitely much harder for me to think about this implementation, but it looks like it's not too much slower than the non-node type one, and (based on comparing the benchmarks in the PR description with your latest ones) significantly faster than the old form. 👍

Comment on lines +311 to +313
neigh_counts = sum(
np.bincount(ilocs, minlength=number_of_nodes) for ilocs in node_ilocs_list
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will allocate len(node_iloc_list) extra arrays, because it's not mutating in place like the old form did (in particular it'll do something like (0 + bincount) + bincount, and each of those + will create a whole new array). We could resolve this with something more complicated like:

Suggested change
neigh_counts = sum(
np.bincount(ilocs, minlength=number_of_nodes) for ilocs in node_ilocs_list
)
neigh_counts = np.bincount(node_ilocs_list[0], minlength=number_of_nodes)
for ilocs in node_ilocs_list[1:]:
neigh_counts += np.bincount(ilocs, minlength=number_of_nodes)

But, based on the benchmarks not really changing, I don't think this is worth it, especially because this would be entirely better using numpy/numpy#7998.

Comment on lines +451 to +454
source_types = self.node_data.type_ilocs[self.sources]
target_types = self.node_data.type_ilocs[self.targets]

def _to_directed_adj_list(arr, other_node_types):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The peak memory use may be able to be reduced by creating the ..._types arrays more transiently, so that they don't both exist at once. In particular, I think they can probably moved inside the helper method:

Suggested change
source_types = self.node_data.type_ilocs[self.sources]
target_types = self.node_data.type_ilocs[self.targets]
def _to_directed_adj_list(arr, other_node_types):
def _to_directed_adj_list(node_ilocs, other_node_ilocs):
other_node_types = self.node_data.type_ilocs[other_node_ilocs]

And then call it like _to_directed_adj_list(self.targets, self.sources) (or vice versa) below.

Comment on lines +451 to +454
source_types = self.node_data.type_ilocs[self.sources]
target_types = self.node_data.type_ilocs[self.targets]

def _to_directed_adj_list(arr, other_node_types):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you felt like it, this nested method could become a top level method, like:

def _to_directed_adj_list(self, node_ilocs, other_node_ilocs):
    ...

def _create_directed_adj_lists_by_other_node_type(self):
    return (
        self._to_directed_adj_list(self.targets, self.sources),
        self._to_directed_adj_list(self.sources, self.targets),
    )

for other_node_type in np.unique(other_node_types):

# filter node ilocs based on other node type
arr_filtered = arr[other_node_types == other_node_type]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the various whole-array filters here, this loop is O(edges * node types + nodes * node types); it's probably fine, assuming there's not many node types, but we could potentially consider other approaches, e.g.

  • doing a Pandas groupby, which might be O(edges + nodes * node types), although it'll like have more overhead, e.g.
    for other_node_type, flat_filtered in pd.Series(flat).groupby(other_node_types):
        splits = _get_splits([arr[flat_filtered]], self.number_of_nodes, self._id_index.dtype)
        index[other_node_type] = FlatAdjacencyList(flat_filtered, splits)
  • sorting other_node_types (and other bits too) so that index returned by np.unique(other_node_types, return_index=True) gives the first index of each node type, and thus we can do O(1) slicing instead of O(edges) comparisons

Both of these are likely to have more overhead and potentially be a bit fiddly (I don't know if the first one is the right code), so I don't know if they'll make sense. I don't know if we have a good "real world" dataset with many node types to test this, but this may also indicate that it's not actually a concern (knowledge graphs can end up with thousands of edge types... but only one node type).

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pandas one seems relatively nice and clear, I can try something like that 👍 - I'm still finding it a bit difficult to remember how to wrap my head around the sorting and filtering going on each time I look at it, so I'm inclined to accept a bit of overhead if it makes things easier to follow. I can try something similar for the undirected version too.

Comment on lines +460 to +469
for other_node_type in np.unique(other_node_types):

# filter node ilocs based on other node type
arr_filtered = arr[other_node_types == other_node_type]
splits = _get_splits(
[arr_filtered], self.number_of_nodes, self._id_index.dtype
)

# filter edge ilocs based on other node type
flat_filtered = flat[other_node_types[flat] == other_node_type]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The double indexing here is potentially suboptimal, especially because other_node_types[flat] doesn't change between loops but does have to be recomputed. I wonder if there's another way we could phrase this loop to be slightly more efficient even if we don't completely switch it:

            other_node_types = other_node_types[flat]

            for other_node_type in np.unique(other_node_types):
                # filter edge ilocs based on other node type
                flat_filtered = flat[other_node_types == other_node_type]
                # choose the relevant node ilocs based on the edge ilocs
                arr_filtered = arr[flat_filtered]
                splits = _get_splits(
                    [arr_filtered], self.number_of_nodes, self._id_index.dtype
                )

(Not a GitHub suggestion because the rendering is really ugly.)

This reduces us from doing two O(edges) comparisons to doing one O(edges) one, and one O(selected edges) one. This doesn't change the overall asymptotics of the loop but should reduce the overhead. (Moving the other_node_types[flat] out of the loop could be applied even without changing the order of flat_filtered and arr_filtered.)

self._create_undirected_adj_lists_by_other_node_type()
)

def _create_undirected_adj_lists_by_other_node_type(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is... impressive. It definitely seems super hard to phrase some of this.

Comment on lines +607 to +609
return self._adj_lookup_by_other_node_type(ins=ins, outs=outs).get(
other_node_type_iloc, defaultdict(list)
)[node_iloc]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it the get+defaultdict(list) here is in case there's node types that don't have any incoming edges or any outgoing edges (e.g. all edges might be a->b)? I'm slightly concerned that creating a defaultdict for every call may be adding noticeable overhead, and, also it's returning a list rather than a NumPy array, so consumers may get confused (and/or have to spend more work converting types).

One option might be pre-filling the _edges..._dict_by_other_node_type values with every node type, e.g.:

class EmptyFlatAdjacencyList:
    def __init__(self, dtype):
        self.empty = np.array([], dtype=dtype)

    def __getitem__(self, idx):
        return empty

Or just creating an O(nodes) FlatAdjacencyList, like FlatAdjacencyList(np.array([], dtype=...), np.zeros(len(nodes) + 1))

Another option would be moving the defaulting to a separate code path, like:

adj_list = self._adj_lookup_by_other_node_type(ins=ins, outs=outs).get(other_node_type_iloc)
if adj_list is None:
    return np.array([], dtype=...)
return adj_list[node_iloc]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point, I did want to also allow completely non-existent node types to return empty lists:

g = StellarGraph(...)  # has node types ["A", "B"]
g.neighbors(1, other_node_type="X")  # should return np.array([])

So I might go with the separate code path approach?

Copy link
Member

@huonw huonw May 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for returning empty, rather than emitting an error? Because this is using ilocs, I suspect the conversion to a type iloc should fail for that case?

@@ -1001,14 +1005,30 @@ def f():
@pytest.mark.parametrize(
"num_nodes,num_edges", [(100, 200), (1000, 5000), (20000, 100000)]
)
@pytest.mark.parametrize("force_adj_lists", ["directed", "undirected"])
@pytest.mark.parametrize(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to capture some of performance of the filtering in the inner loop, I wonder if if it's worth also parameterising this benchmark by the number of node types, e.g.

@pytest.mark.parametrize("n_types", [1, 4, 20])

and passing this through to example_benchmark_graph.

(Potentially with a if force_adj_lists in ("directed", "undirected") and n_types != 1: pytest.skip("...") to avoid spending time on retesting uninteresting things.)

@deepakagrawal
Copy link

Is this enhancement already in place? How can I start using this version? The current implementation of UniformRandomMetaPathWalk is still very slow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants