RBTree for set #2313

professorcode1 · 2023-02-17T18:41:48Z

I have rewritten the set API to use a red-black tree, but the degree sequence game internally relies on the assumption that a set iterator is an integer. It's an easy fix: two more integers to keep track of the number of elements iterated over. I'll update it soon.

codecov · 2023-02-26T18:25:27Z

Codecov Report

Merging #2313 (5736bf6) into master (1d0ea4f) will decrease coverage by 0.05%.
The diff coverage is 82.55%.

❗ Current head 5736bf6 differs from pull request most recent head f937448. Consider uploading reports for the commit f937448 to get more accurate results.

@@            Coverage Diff             @@
##           master    #2313      +/-   ##
==========================================
- Coverage   83.53%   83.48%   -0.05%     
==========================================
  Files         376      376              
  Lines       61640    61732      +92     
==========================================
+ Hits        51489    51535      +46     
- Misses      10151    10197      +46

Impacted Files	Coverage Δ
src/core/set.c	`81.56% <80.52%> (-13.34%)`	⬇️
src/cliques/cliques.c	`95.74% <100.00%> (ø)`
src/games/degree_sequence.c	`98.49% <100.00%> (-0.01%)`	⬇️
src/operators/subgraph.c	`94.52% <100.00%> (ø)`

... and 25 files with indirect coverage changes

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1d0ea4f...f937448.

szhorvat · 2023-02-26T21:02:40Z

Unfortunately I won't have time to look at this for a while.

Whoever does the review, please check that there is no performance degradation in the IGRAPH_DEGSEQ_CONFIGURATION_SIMPLE method of igraph_degree_sequence_game(), which relies heavily on sets.

ntamas · 2023-02-26T21:04:40Z

I'll take a look at it early next week.

szhorvat · 2023-02-26T22:05:27Z

Benchmark suggestion:

Create a random graph, record its degree sequence, use that in igraph_degree_sequence_game(). Note that the timings will strongly depend on the specific degree sequence, so set a random seed!

Undirected suggestion:

Barabasi–Albert game, n=100 (tune n as needed), m=2
G(n,m), n=100, m=300-ish (tune m as needed, note that running time explodes in terms of m)
Regular graph with degree of ~7

Directed suggestion:

igraph_static_power_law_game, n=100, m=300-ish (tune this), both exponents set to 2.1
G(n,m), n=100, m=400-450-ish (tune m as needed, note that running time explodes in terms of m)
Regular graph with degree of ~5
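A minimal sketch of the "set a random seed" advice, written in plain C rather than against igraph's API: `rng64`, `rng64_seed`, `rng64_next` and `random_degree_sequence` are hypothetical helpers (an xorshift64 generator and a G(n,m)-style random multigraph with self-loops redrawn), not igraph functions. In a real igraph benchmark one would presumably seed igraph's own RNG and use its generators instead. The point is only that a fixed seed yields the same degree sequence on every run, so both set implementations are timed on identical input.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical seeded generator (Marsaglia's xorshift64); not igraph's RNG. */
typedef struct { uint64_t state; } rng64;

static void rng64_seed(rng64 *r, uint64_t seed) {
    /* xorshift state must never be zero */
    r->state = seed != 0 ? seed : 0x9E3779B97F4A7C15ULL;
}

static uint64_t rng64_next(rng64 *r) {
    uint64_t x = r->state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    r->state = x;
    return x;
}

/* Degree sequence of a random multigraph with n vertices and m edges:
 * endpoints are drawn uniformly (up to a negligible modulo bias),
 * self-loops are redrawn, multi-edges are allowed. */
static void random_degree_sequence(uint64_t seed, int n, int m, int *deg) {
    assert(n >= 2 && m >= 0 && deg != NULL);
    rng64 r;
    rng64_seed(&r, seed);
    memset(deg, 0, (size_t)n * sizeof *deg);
    for (int e = 0; e < m; ) {
        int u = (int)(rng64_next(&r) % (uint64_t)n);
        int v = (int)(rng64_next(&r) % (uint64_t)n);
        if (u == v) {
            continue; /* reject self-loop, draw again */
        }
        deg[u]++;
        deg[v]++;
        e++;
    }
}
```

Two calls with seed 42, n = 100 and m = 300 produce identical arrays whose entries sum to 600 (twice the edge count). Fixing the input sequence does not by itself guarantee identical random draws inside igraph_degree_sequence_game(), which is a concern raised later in this thread.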

ntamas · 2023-03-07T10:20:57Z

FYI, this is not forgotten; I've started writing the benchmark, and hopefully it will be ready today.

ntamas · 2023-03-07T19:22:38Z

Benchmark results are in; it looks like the proposal in this PR did not improve performance, although there's probably still room for optimization.

[Screenshot: benchmark results, old version]

[Screenshot: benchmark results, proposal in this PR]

Compiled in release mode, with LTO. I don't have time to debug this further, but it's probably worth looking at the code with a profiler.

ntamas · 2023-03-07T19:44:57Z

One caveat with the benchmark above is that the implementation of igraph_degree_sequence_game() seems to be very sensitive to whether the algorithm gets "lucky" and finds an appropriate wiring of the graph with no multiple edges. The timings on the same graph, with the same algorithm, vary wildly depending on the random seed being used. There's a slim chance that changing the set implementation to RB trees also changes the random numbers generated in the algorithm, in which case the differences we see are not due to the performance of the RB trees but to the algorithm spending more time trying to find an adequate wiring of the graph. We should probably count how many times igraph_set_contains() (or some other crucial set function) is called in the benchmarks; if there's a huge difference, this could mean that we are not testing the same thing in the two benchmarks, even if the random seed and everything else is the same. Does the RB tree use randomness anywhere?
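One way to do the counting suggested above, sketched on a toy sorted-array set rather than on igraph's own igraph_set_contains(): a global counter incremented on every lookup. `toy_set`, `toy_set_contains` and `contains_calls` are hypothetical names; in igraph the increment would presumably go into the real function, or a wrapper around it, in a benchmark-only build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Benchmark-only instrumentation: reset before a run, print after it. */
static unsigned long contains_calls = 0;

/* Hypothetical toy set: a sorted array searched with binary search. */
typedef struct {
    const int *items; /* sorted ascending, no duplicates */
    size_t size;
} toy_set;

static bool toy_set_contains(const toy_set *s, int e) {
    assert(s != NULL);
    contains_calls++; /* count every lookup */
    size_t lo = 0, hi = s->size;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (s->items[mid] < e) {
            lo = mid + 1;
        } else if (s->items[mid] > e) {
            hi = mid;
        } else {
            return true;
        }
    }
    return false;
}
```

If the same seed yields very different values of `contains_calls` under the two set implementations, the two runs are not doing the same amount of work and the timing comparison is not like-for-like.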

szhorvat · 2023-03-07T19:51:30Z

src/core/set.c

+    if(newNode == NULL){
+        IGRAPH_CHECK_OOM(newNode, "Cannot reserve space for the new set element.");
+    }


IGRAPH_CHECK_OOM already does the NULL-check, so the if is unnecessary.
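To illustrate why the `if` is redundant, here is a self-contained sketch with a hypothetical `MY_CHECK_OOM` macro that is assumed to have roughly the same shape as `IGRAPH_CHECK_OOM`: the macro tests the pointer for NULL itself and returns an out-of-memory code from the enclosing function. The real macro reports the error through igraph's error-handling machinery; `node_create`, `MY_ENOMEM` and the node layout are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error codes, for this sketch only. */
enum { MY_SUCCESS = 0, MY_ENOMEM = 2 };

/* Stand-in with the assumed shape of IGRAPH_CHECK_OOM: the macro itself
 * performs the NULL check and returns an error from the calling function. */
#define MY_CHECK_OOM(ptr, message)                      \
    do {                                                \
        if ((ptr) == NULL) {                            \
            fprintf(stderr, "Error: %s\n", (message));  \
            return MY_ENOMEM;                           \
        }                                               \
    } while (0)

typedef struct set_node {
    long key;
    struct set_node *left, *right;
} set_node;

static int node_create(long key, set_node **out) {
    assert(out != NULL);
    set_node *newNode = malloc(sizeof *newNode);
    /* No enclosing if (newNode == NULL) needed: the macro checks it. */
    MY_CHECK_OOM(newNode, "Cannot reserve space for the new set element.");
    newNode->key = key;
    newNode->left = NULL;
    newNode->right = NULL;
    *out = newNode;
    return MY_SUCCESS;
}
```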

professorcode1 · 2023-03-27T16:29:53Z

Just letting everyone know that I haven't given up on this. I got a job recently, and I need to write a report by this Wednesday to participate in a hackathon. I will profile the code and upload the results here, probably by Sunday and by next Wednesday at the latest. Then I'll continue improving the timings.

professorcode1 · 2023-05-14T16:42:46Z

Hi @szhorvat @ntamas
I am really sorry it took so long to do this; I am still adjusting to life at the new job.

I ran the analysis, and here are the results, including screenshots of the timings and flame graphs from the profiling.

Here are the times for the current igraph set implementation:

1 Degree sequence of undirected k-regular graph, N=1000, k=7, CONFIGURATION_SIMPLE
1.3s 1.2s 0s
2 Degree sequence of undirected BA graph, N=1000, m=1, CONFIGURATION_SIMPLE
0.789s 0.728s 0s
3 Degree sequence of undirected Erdos-Renyi graph, N=150, m=u450, CONFIGURATION_SIMPLE
0.804s 0.742s 0s
4 Degree sequence of GRG graph, N=10000, r=0.013, CONFIGURATION_SIMPLE
0.156s 0.144s 0s
5 Degree sequence of directed k-regular graph, N=1000, k=5, CONFIGURATION_SIMPLE
0.865s 0.798s 0s
6 Degree sequence of directed BA graph, N=500, m=2, CONFIGURATION_SIMPLE
2.22s 2.05s 0s
7 Degree sequence of directed Erdos-Renyi graph, N=15000, m=u45000, CONFIGURATION_SIMPLE
0.653s 0.603s 0s

Here are the times for the simple RB-tree set implementation (commit "set constructor error t"), in the same benchmark order:

1 2.400s 2.400s 0s
2 1.680s 1.680s 0s
3 1.460s 1.460s 0s
4 0.294s 0.294s 0s
5 1.610s 1.610s 0s
6 4.610s 4.610s 0s
7 1.290s 1.290s 0s

Next is the pool implementation. It is roughly 3 times slower than the simple RB-tree implementation.

Lastly, there is the multi-pool implementation, which is 5 times slower than the pool implementation. It works by allocating a new pool rather than reallocating the old pool and updating the addresses. I don't understand why it spends 60% of its time in the reserve function, so it's possible there's a bug that I am not seeing (all the tests pass, though).

Here's something that jumped out at me immediately: in the simple implementation, a large share of the time (44.21%) is spent just deallocating the tree. So if it were somehow possible to run that deallocation asynchronously, the RB tree would be only slightly slower than the current implementation. Does igraph have a standard for threading? If yes, by making the contains function iterative instead of recursive and making the deletion asynchronous, I think it could be as fast as the current implementation.

If not can we please just implement it alongside the current set implementation as igraph_rbtree_t? Developers could opt in to using it when they need the iterator to behave like C++ STL iterators.
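For reference, a sketch of what "iterative instead of recursive" could look like on a plain binary search tree. Red-black balancing is omitted, and `tnode`, `tree_insert`, `tree_contains` and `tree_destroy` are hypothetical names, not the PR's code. The destroy routine uses right rotations to flatten the tree, so it needs neither recursion nor an explicit stack. It still calls free() once per node, so it removes recursion overhead but not the per-node deallocation cost that showed up in the profile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Plain binary search tree node; red-black colour and rebalancing omitted. */
typedef struct tnode {
    long key;
    struct tnode *left, *right;
} tnode;

/* Iterative insert via a pointer to the link being followed.
 * Returns false only if allocation fails. */
static bool tree_insert(tnode **root, long key) {
    assert(root != NULL);
    tnode **link = root;
    while (*link != NULL) {
        if (key < (*link)->key) {
            link = &(*link)->left;
        } else if (key > (*link)->key) {
            link = &(*link)->right;
        } else {
            return true; /* already present */
        }
    }
    tnode *n = malloc(sizeof *n);
    if (n == NULL) {
        return false;
    }
    n->key = key;
    n->left = n->right = NULL;
    *link = n;
    return true;
}

/* Iterative lookup: constant stack usage regardless of tree height. */
static bool tree_contains(const tnode *t, long key) {
    while (t != NULL) {
        if (key < t->key) {
            t = t->left;
        } else if (key > t->key) {
            t = t->right;
        } else {
            return true;
        }
    }
    return false;
}

/* Iterative destroy: rotate left children up until the root has no left
 * child, then free it and continue with its right subtree. No recursion,
 * no explicit stack, but still one free() per node. */
static void tree_destroy(tnode *root) {
    while (root != NULL) {
        if (root->left != NULL) {
            tnode *l = root->left; /* right rotation around root */
            root->left = l->right;
            l->right = root;
            root = l;
        } else {
            tnode *next = root->right;
            free(root);
            root = next;
        }
    }
}
```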

ntamas · 2023-05-15T09:37:22Z

Does igraph have a standard for threading?

No; igraph is inherently single-threaded at the moment and there are no immediate plans to add support for threading in general. Also note that many host languages that igraph embeds itself in (Python, Mathematica and R being the most prominent) are also single-threaded so it's not clear yet how much we could gain by embracing the additional complexity that comes with threads. So far we've been using OpenMP in places in the code where we could immediately benefit from parallelizing a for loop, but your use-case does not seem to fit the bill.

If not can we please just implement it alongside the current set implementation as igraph_rbtree_t?

I don't have a strong opinion here but I'm slightly leaning towards saying no. igraph's goal is not to provide a full general-purpose data structure library for C. If there is a place in igraph's current code where we could benefit from using a red-black tree instead of sets, I'm okay with adding such a data type, but adding a red-black tree just for the sake of having it means more maintenance burden on our end for little gain. I'm pretty sure that if someone badly needs a red-black tree in C, there are plenty of open-source libraries that already provide that and it's unlikely we can do any better than those without putting significant effort in. (Opinions @szhorvat ?).

professorcode1 · 2023-05-15T20:34:35Z

@ntamas I have implemented the indexed pool in igraph as you suggested. It is exactly as slow as the regular pool implementation, with an identical flame graph. I guess each calloc call takes less time when you ask for very few bytes, since the OS doesn't have to move a lot of memory around.
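The indexed pool itself is not shown in this thread, so the following is only a sketch of the general idea under that name, not the PR's implementation: nodes live in one contiguous array that grows with realloc(), children are stored as integer indices instead of pointers, and moving the block therefore never requires patching addresses (the problem the multi-pool variant worked around). Destroying the tree is a single free(). `pool_tree` and its functions are hypothetical; deletion (which would need a free list of slots) and red-black balancing are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NIL (-1) /* "no child" marker, plays the role of NULL */

typedef struct {
    long key;
    int left, right; /* indices into the pool, not pointers */
} pool_node;

typedef struct {
    pool_node *nodes; /* one contiguous block, grown with realloc() */
    int size, capacity;
    int root;
} pool_tree;

static void pool_tree_init(pool_tree *t) {
    assert(t != NULL);
    t->nodes = NULL;
    t->size = t->capacity = 0;
    t->root = NIL;
}

/* Returns the index of a new node, or NIL if growing the pool fails.
 * realloc() may move the block, but indices stay valid. */
static int pool_alloc(pool_tree *t, long key) {
    if (t->size == t->capacity) {
        int cap = t->capacity > 0 ? 2 * t->capacity : 4;
        pool_node *p = realloc(t->nodes, (size_t)cap * sizeof *p);
        if (p == NULL) {
            return NIL;
        }
        t->nodes = p;
        t->capacity = cap;
    }
    int i = t->size++;
    t->nodes[i].key = key;
    t->nodes[i].left = t->nodes[i].right = NIL;
    return i;
}

static bool pool_tree_insert(pool_tree *t, long key) {
    int parent = NIL, cur = t->root;
    while (cur != NIL) {
        parent = cur;
        if (key < t->nodes[cur].key) {
            cur = t->nodes[cur].left;
        } else if (key > t->nodes[cur].key) {
            cur = t->nodes[cur].right;
        } else {
            return true; /* already present */
        }
    }
    int n = pool_alloc(t, key);
    if (n == NIL) {
        return false;
    }
    /* Index through t->nodes again: the block may have moved in pool_alloc(). */
    if (parent == NIL) {
        t->root = n;
    } else if (key < t->nodes[parent].key) {
        t->nodes[parent].left = n;
    } else {
        t->nodes[parent].right = n;
    }
    return true;
}

static bool pool_tree_contains(const pool_tree *t, long key) {
    int cur = t->root;
    while (cur != NIL) {
        if (key < t->nodes[cur].key) {
            cur = t->nodes[cur].left;
        } else if (key > t->nodes[cur].key) {
            cur = t->nodes[cur].right;
        } else {
            return true;
        }
    }
    return false;
}

/* Destroying the whole tree is a single free(), whatever its size. */
static void pool_tree_destroy(pool_tree *t) {
    free(t->nodes);
    pool_tree_init(t);
}
```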

If there is a place in igraph's current code where we could benefit from using a red-black tree instead of sets, I'm okay with adding such a data type, but adding a red-black tree just for the sake of having it means more maintenance burden on our end for little gain.

But I do need the RB tree based sets to implement my ant colony graph coloring algorithm. The reason I cannot use the igraph set is that its iterator doesn't satisfy my use case: if an element is removed from the set, the iterator will point to the wrong element, and I need to delete elements from the set while iterating over it.
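The iterator problem described here shows up with any position-based iterator over a sorted array: removing the element at position i shifts the rest one slot left, so a plain `i++` afterwards skips the element that just moved into slot i. A minimal sketch of one workaround, assuming a toy fixed-capacity `array_set` (hypothetical, not igraph_set_t): advance the position only when nothing was removed. It only covers removing the current element; a tree-based set would typically fetch the successor before deleting the node instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity set stored as a sorted array. */
typedef struct {
    int items[64];
    size_t size;
} array_set;

/* Removes the element at position i; later elements shift one slot left. */
static void array_set_remove_at(array_set *s, size_t i) {
    assert(i < s->size);
    memmove(&s->items[i], &s->items[i + 1],
            (s->size - i - 1) * sizeof s->items[0]);
    s->size--;
}

/* Removes every element for which pred() holds, while iterating.
 * The position only advances when nothing was removed, because after a
 * removal slot i already holds the next element. */
static void array_set_remove_if(array_set *s, bool (*pred)(int)) {
    size_t i = 0;
    while (i < s->size) {
        if (pred(s->items[i])) {
            array_set_remove_at(s, i);
        } else {
            i++;
        }
    }
}

/* Example predicate. */
static bool is_even(int x) {
    return x % 2 == 0;
}
```

Starting from {2, 4, 5, 6, 8, 9}, removing the even elements this way leaves {5, 9}; a naive `for (i = 0; i < size; i++)` loop would instead leave {4, 5, 8, 9}, because 4 and 8 each slide into a slot that has just been visited.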

It seems we have 2 options: either implement the RB tree as an exported igraph data type, or use it only inside the ant colony graph coloring algorithm's file. Please let me know which one you would like me to proceed with.

ntamas · 2023-05-16T10:55:55Z

But I do need the RB tree based sets to implement my ant colony graph coloring algorithm.

I see three possible ways forward:

Just start using std::set in your own implementation of the ant colony graph coloring algorithm, and we can accept that this will be a small part of the igraph library that uses C++. As long as you expose a C API, I think it's okay to use C++ internally. There are already other parts in the igraph library that use C++, but they expose a C API so they can be used from pure C programs.
If you can make your igraph_rbtree_t implementation more performant than an std::set, you can implement your own in the context of the ant colony optimization PR and we can merge that as is. igraph_rbtree_t might or might not be exposed as a public API; we need to decide that together with the dev team.
If you do not want to use std::set but you are not confident that you can make your igraph_rbtree_t at least as performant as an std::set, you can probably make use of the uthash library. This is a hash table so it's suitable to be used as a set as long as the items are hashable, it's pure C, it's header-only, and it supports deletion-safe iteration, so it probably fits your use-case.

professorcode1 · 2023-05-21T17:52:55Z

I have tried 3 more methods to make the set faster (5 in total now). So far, the simplest implementation is still the fastest.

I can't make much sense of the flame graphs. I used them, hypothesized what might be causing the problem, tried to resolve it with another method and failed.

My current hypothesis for why it's slow is that the degree sequence game test creates a lot of individual sets (at least 100, if not 1000) of size 1 and 2 and then discards them, which is why the more complex tree implementation is slower: we are basically comparing how long it takes to allocate 24 bytes for the current implementation vs. 32 bytes for the tree implementation, 1000 times over.

I will implement the std::set wrapper and benchmark that. Just writing this comment to update you on the progress.

professorcode1 · 2023-05-28T11:21:35Z

Hey

I haven't been able to figure out how to integrate C++ into the project. Could you please point me in the right direction?

Thanks.

professorcode1 and others added 10 commits February 12, 2023 16:48

segfault

1e83ddb

set benchmark complete

419e93f

void added to function declarators

09af232

newline added at EOF

a849d8d

style: code reformatting, nitpicking, renaming benchmark

c8e49e0

Merge branch 'igraph:master' into fix/set

1c0cb30

basic set functionality done

95d06da

set internal changed to rb-tree

d0b479a

clicques set api updated

f45111f

degree sequence set api update

cc85010

professorcode1 mentioned this pull request Feb 18, 2023

Sets should be re written using Red Black Trees. #2287

Open

professorcode1 and others added 7 commits February 19, 2023 13:47

robust set iterator

913684a

int to igraph_int

bfd8b8d

set all tests passing

5cee357

internal set doc updated

b88a536

igraph tree print test created

ccf1ef2

delete added

20ffb37

Merge branch 'igraph:master' into fix/set

c1d13ed

professorcode1 added 2 commits February 26, 2023 23:57

set print fixed

a7ffaff

Merge branch 'fix/set' of https://github.com/professorcode1/igraph into fix/set

f6ddbe1

professorcode1 marked this pull request as ready for review February 26, 2023 18:47

ntamas self-assigned this Feb 26, 2023

szhorvat reviewed Mar 7, 2023

professorcode1 added 8 commits May 13, 2023 19:52

set constructor error t

21cb96f

much slower pool implementation

5703961

index on set: 5703961 much slower pool implementation

c05ad69

WIP on set: 5703961 much slower pool implementation

e50cef0

multipool implemented

963439c

minor fix to multi pool implementation

1ea2ba0

memory leak fix

7929a85

memory leak fix

9587a47

professorcode1 added 4 commits May 16, 2023 01:07

index pool implementation working

7f404e4

index pool benchmark added

7d11ad2

empty files for set

5e04df0

Merge branch 'temp' into fix/set

54a5d29

professorcode1 added 8 commits May 19, 2023 21:06

distinct pool

659a6ab

correct itertor implemented

124aadf

iterative delete implemented

2981d84

Merge branch 'temp' into fix/set

970ff90

Merge branch 'temp1' into fix/set

d923e77

empty set files

60e144d

single node multi pool

4b4a9bc

Merge branch 'temp' into fix/set

5736bf6

cpp set implementation

f937448

szhorvat force-pushed the master branch from 41190cc to 9d06a76 Compare December 7, 2023 04:39

szhorvat force-pushed the master branch from 563d594 to 36a707a Compare May 23, 2024 13:39
