Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wishlish: Choice of allocation strategy for vector_append() #2589

Open
szhorvat opened this issue Apr 23, 2024 · 4 comments
Open

Wishlish: Choice of allocation strategy for vector_append() #2589

szhorvat opened this issue Apr 23, 2024 · 4 comments
Milestone

Comments

@szhorvat
Copy link
Member

Currently vector_append() uses an allocation strategy that optimizes for space: it increases storage only by as much as needed. This means that repeatedly appending small vectors to a larger vector will be inefficient.

In contrast, vector_push_back() optimizes for time performance with repeated appends, by doubling storage each time it runs out.

The choice between the two can make a huge difference in performance:

Examples:

However, whenever appending more than a single element to a vector, vector_append() would be more convenient to use, and it could also be more efficient if it used a doubling allocation strategy.

I suggest making it possible to choose which allocation strategy to use: optimize for memory (extend by as much as needed) or optimize for time (double). This can be done either by having two separate functions (my preference) or adding a parameter to vector_append() which chooses between the two strategies.

@ntamas, which one shall we go with?

A third option is of course to just change the allocation strategy of vector_append(). Note that it can't just double storage. vector_append(v1, v2) should use whichever is larger: 2*len(v1) or len(v1) + len(v2).

@szhorvat szhorvat added this to the 1.0 milestone Apr 23, 2024
@szhorvat
Copy link
Member Author

@ntamas Can you give some feedback here?

Another example where this would be useful is in igraph_get_all_simple_paths()

@ntamas
Copy link
Member

ntamas commented May 14, 2024

Use two separate functions. This is an implementation detail that most users do not (and do not want to) care about. Adding an extra argument is API-breaking and it forces a cognitive burden on the end user.

@szhorvat
Copy link
Member Author

Adding two functions is fine, and also my preference.

I would not call it an implementation detail though as it directly impacts the complexity of algorithms, as we just painfully found out with GraphML.

When I get the time, I'll look to see if there are any uses of vector_append where NOT doubling allocated storage is actually useful. An option would be to always double, and not even provide a memory-efficient version. Yet another option is to change the old function to double, and add a new memory-efficient variant.

@ntamas
Copy link
Member

ntamas commented May 14, 2024

I would not call it an implementation detail though

It's not an implementation detail from our perspective as library maintainers, but for the ordinary end user that just needs a dynamic resizable vector it is an implementation detail 99% of the time. That's what I meant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants