Speed up syncing #1548

Open
hsanjuan opened this issue Jan 20, 2022 · 2 comments
Labels
kind/enhancement A net-new feature or improvement to an existing feature need/analysis Needs further analysis before proceeding

Comments

@hsanjuan
Collaborator

Downloading and processing 50 deltas that have surpassed 1MB of size takes about 5 minutes (including the download).

That gives about 10 deltas per minute.

Each delta would have around 4000 pins, so we are processing roughly 600 to 700 items per second (about 40,000 pins per minute), including callback notifications to the pintracker. That seems like a very low figure and we should strive to increase it. The more we do, the faster we can sync new nodes.
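
For reference, the arithmetic behind that estimate can be written out as a small Go sketch. The helper name `pinsPerSecond` is made up for illustration; the inputs are the approximate figures quoted above.

```go
package main

import "fmt"

// pinsPerSecond converts a number of deltas processed in a time window
// into an items-per-second rate, given an approximate pins-per-delta count.
// Hypothetical helper, not part of any existing codebase.
func pinsPerSecond(deltas, pinsPerDelta int, minutes float64) float64 {
	return float64(deltas*pinsPerDelta) / (minutes * 60)
}

func main() {
	// Figures from the report: 50 deltas, ~4000 pins each, ~5 minutes.
	rate := pinsPerSecond(50, 4000, 5)
	fmt.Printf("%.0f pins/s\n", rate) // prints "667 pins/s"
}
```

Since the inputs are rough ("about 5 minutes", "around 4000 pins"), the result is only an order-of-magnitude figure.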

Is the bottleneck in the pin-tracker notifications? Or in applying the deltas? Or somewhere else?

@hsanjuan hsanjuan added kind/enhancement A net-new feature or improvement to an existing feature need/analysis Needs further analysis before proceeding labels Jan 20, 2022
@RubenKelevra
Collaborator

RubenKelevra commented Feb 8, 2022

I also noticed that this is slow when I was using the first version of my collab cluster, where I was pinning each file individually.

At least for onboarding we could fetch all deltas and patch the status until we reach the last version (at the time we started) and then notify with the calculated result?

This avoids calling the pin-tracker with half-calculated results that will be overridden just seconds later (at least that's what I understand from the current approach).

We could also use this approach if we need to "catch up" more than one delta, as we would calculate all deltas, merge the results and then notify the tracker.
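
A minimal Go sketch of the "merge first, notify once" idea described above. Everything in it is a simplified assumption for illustration: the `Delta` type, `mergeDeltas`, `notifyOnce`, and the callback signature are hypothetical and are not the cluster's actual types or API. It also ignores real CRDT semantics such as priorities and simply applies deltas in the order given.

```go
package main

import "fmt"

// Delta is a hypothetical, simplified stand-in for a CRDT delta:
// a batch of pins to add and a batch of pins to remove.
type Delta struct {
	Added   []string
	Removed []string
}

// mergeDeltas applies the deltas in order to an in-memory view and returns
// the net result per CID: true means pinned, false means unpinned.
// Nothing is reported to the tracker while merging.
func mergeDeltas(deltas []Delta) map[string]bool {
	net := make(map[string]bool)
	for _, d := range deltas {
		for _, c := range d.Added {
			net[c] = true
		}
		for _, c := range d.Removed {
			net[c] = false
		}
	}
	return net
}

// notifyOnce calls the (hypothetical) tracker callback exactly once per CID
// with its final state and returns how many notifications were sent.
func notifyOnce(net map[string]bool, notify func(cid string, pinned bool)) int {
	for c, pinned := range net {
		notify(c, pinned)
	}
	return len(net)
}

func main() {
	deltas := []Delta{
		{Added: []string{"cidA", "cidB"}},
		{Removed: []string{"cidA"}},
		{Added: []string{"cidC"}},
	}
	net := mergeDeltas(deltas)
	sent := notifyOnce(net, func(cid string, pinned bool) {
		fmt.Println(cid, pinned)
	})
	// Notifying per delta would have produced 4 callbacks; merging gives 3.
	fmt.Println("notifications:", sent)
}
```

Whether this helps in practice depends on how much of the total time is actually spent in the notifications, which this sketch does not measure.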

@hsanjuan
Collaborator Author

I am not sure the issue is results being overridden. Since deltas are processed from latest to newest there is usually no flip on the result.

Skipping notifications altogether and doing them at the end might be an option but I have to find out where exactly it is slow.
