Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make import/export uniform for all storage plugins #2640

Closed
markus2330 opened this issue Apr 22, 2019 · 12 comments
Closed

make import/export uniform for all storage plugins #2640

markus2330 opened this issue Apr 22, 2019 · 12 comments
Milestone

Comments

@markus2330
Copy link
Contributor

Problem: Currently some plugins implement workarounds (see #2419) others do not support all forms of serialization (e.g. those who use fseek with a pos != 0, e.g. csvstorage). Furthermore, operating systems not having /dev/stdin or /dev/stdout are not fully supported.

Solution: The import/export framework should provide a temporary file where storage plugins write to (as they would do normally). The content of this file is copied from/to stdin/stdout.

@kodebach
Copy link
Member

We should let the plugin decide whether it needs the temporary file or not (e.g. via a status flag in the README). Otherwise IPC the way it is done in specload for example will be slowed down significantly, because the processes can no longer read/write in parallel.

We should also avoid creating the temporary file as far as possible. If there is anyway to find the regular file behind stdin/stdout we should use that, as the time spent copying to/from a temporary file might be significant for larger keysets.

@mpranj
Copy link
Member

mpranj commented Apr 22, 2019

I don't think there is a way to find out where something was piped in from. What we can do, is simply use the file path instead of piping it in.

Also, you are definitely right about the slowdown with the temp file. I simply had to use a temp file, because mmap (POSIX) can only work with regular files.

@kodebach
Copy link
Member

I simply had to use a temp file

Yes, I know that. The issue description just made it sound like a temp file should always be used l, which would be a bad solution IMO.

@mpranj
Copy link
Member

mpranj commented Apr 22, 2019

sound like a temp file should always be used l, which would be a bad solution IMO.

I fully agree. Many other plugins work well with non-regular files. They must not be slowed down by this.

@markus2330
Copy link
Contributor Author

I was considering to write an implementation hint that the temporary file should be avoided (if possible via infos/status) or that sendfile should be used. But all of this would be a premature optimization.

Of course it is very valuable if you share your experience here. Do you already know the difference in run-time for running all tests cases if all the content of import/export is copied?

@mpranj
Copy link
Member

mpranj commented Apr 22, 2019

No I have neither implemented nor benchmarked this. Writing to disk will be slower by orders of magnitude by definition for some years to come, until persistent memory with DRAM speed is standard.

@markus2330
Copy link
Contributor Author

Writing to disk will be slower by orders of magnitude by definition for some years to come

This is not so clear as file systems can basically avoid accessing the disk at all as long as fsync is not called. Furthermore temporary files can also be in a tmpfs.

So I would prefer that we have a look where the actual bottleneck with import/export is before we start optimizing.

@kodebach
Copy link
Member

No file system optimization will come close to writing to and reading from a pipe in parallel, like it can be done now in specload. This is simply the case because the temporary file forces a new step (copy from temporary to actual and then to another temporary) in between write and read.

@markus2330 markus2330 assigned MasterToney and unassigned llukask Oct 24, 2019
@markus2330 markus2330 added this to Backlog in 1.0 Nov 13, 2019
@markus2330 markus2330 moved this from Backlog to In Progress in 1.0 Nov 13, 2019
@markus2330 markus2330 removed this from In Progress in 1.0 Nov 13, 2019
@stale
Copy link

stale bot commented Oct 23, 2020

I mark this issue stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping the issue by writing a message here or create a new issue with the remainder of this issue.
Thank you for your contributions 💖

@stale stale bot added the stale label Oct 23, 2020
@stale stale bot removed the stale label Oct 24, 2020
@markus2330
Copy link
Contributor Author

This issue is actually a quite good example, how we were unable to fix a very simple problem (kdb export/import failing for some storage plugins that cannot directly read from stdin/stdout) just because of performance considerations without any knowledge if it is actually relevant (we are speaking here about import/export, not something that slows down application startup). I hope we can improve in our discussions and focus more on the goals in the future.

@stale
Copy link

stale bot commented Oct 24, 2021

I mark this issue stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping the issue by writing a message here or create a new issue with the remainder of this issue.
Thank you for your contributions 💖

@stale stale bot added the stale label Oct 24, 2021
@stale
Copy link

stale bot commented Nov 7, 2021

I closed this issue now because it has been inactive for more than one year. If I closed it by mistake, please do not hesitate to reopen it or create a new issue with the remainder of this issue.
Thank you for your contributions 💖

@stale stale bot closed this as completed Nov 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants