Wishlish: Choice of allocation strategy for `vector_append()` #2589

szhorvat · 2024-04-23T11:18:16Z

Currently vector_append() uses an allocation strategy that optimizes for space: it increases storage only by as much as needed. This means that repeatedly appending small vectors to a larger vector will be inefficient.

In contrast, vector_push_back() optimizes for time performance with repeated appends, by doubling storage each time it runs out.

The choice between the two can make a huge difference in performance:

Examples:

However, whenever appending more than a single element to a vector, vector_append() would be more convenient to use, and it could also be more efficient if it used a doubling allocation strategy.

I suggest making it possible to choose which allocation strategy to use: optimize for memory (extend by as much as needed) or optimize for time (double). This can be done either by having two separate functions (my preference) or adding a parameter to vector_append() which chooses between the two strategies.

@ntamas, which one shall we go with?

A third option is of course to just change the allocation strategy of vector_append(). Note that it can't just double storage. vector_append(v1, v2) should use whichever is larger: 2*len(v1) or len(v1) + len(v2).

The text was updated successfully, but these errors were encountered:

szhorvat · 2024-05-12T15:06:16Z

@ntamas Can you give some feedback here?

Another example where this would be useful is in igraph_get_all_simple_paths()

ntamas · 2024-05-14T12:26:46Z

Use two separate functions. This is an implementation detail that most users do not (and do not want to) care about. Adding an extra argument is API-breaking and it forces a cognitive burden on the end user.

szhorvat · 2024-05-14T14:02:42Z

Adding two functions is fine, and also my preference.

I would not call it an implementation detail though as it directly impacts the complexity of algorithms, as we just painfully found out with GraphML.

When I get the time, I'll look to see if there are any uses of vector_append where NOT doubling allocated storage is actually useful. An option would be to always double, and not even provide a memory-efficient version. Yet another option is to change the old function to double, and add a new memory-efficient variant.

ntamas · 2024-05-14T16:36:53Z

I would not call it an implementation detail though

It's not an implementation detail from our perspective as library maintainers, but for the ordinary end user that just needs a dynamic resizable vector it is an implementation detail 99% of the time. That's what I meant.

szhorvat added this to the 1.0 milestone Apr 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wishlish: Choice of allocation strategy for `vector_append()` #2589

Wishlish: Choice of allocation strategy for `vector_append()` #2589

szhorvat commented Apr 23, 2024

szhorvat commented May 12, 2024

ntamas commented May 14, 2024

szhorvat commented May 14, 2024

ntamas commented May 14, 2024

Wishlish: Choice of allocation strategy for vector_append() #2589

Wishlish: Choice of allocation strategy for vector_append() #2589

Comments

szhorvat commented Apr 23, 2024

szhorvat commented May 12, 2024

ntamas commented May 14, 2024

szhorvat commented May 14, 2024

ntamas commented May 14, 2024

Wishlish: Choice of allocation strategy for `vector_append()` #2589

Wishlish: Choice of allocation strategy for `vector_append()` #2589