New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make import/export uniform for all storage plugins #2640
Comments
We should let the plugin decide whether it needs the temporary file or not (e.g. via a status flag in the README). Otherwise IPC the way it is done in specload for example will be slowed down significantly, because the processes can no longer read/write in parallel. We should also avoid creating the temporary file as far as possible. If there is anyway to find the regular file behind stdin/stdout we should use that, as the time spent copying to/from a temporary file might be significant for larger keysets. |
I don't think there is a way to find out where something was piped in from. What we can do, is simply use the file path instead of piping it in. Also, you are definitely right about the slowdown with the temp file. I simply had to use a temp file, because mmap (POSIX) can only work with regular files. |
Yes, I know that. The issue description just made it sound like a temp file should always be used l, which would be a bad solution IMO. |
I fully agree. Many other plugins work well with non-regular files. They must not be slowed down by this. |
I was considering to write an implementation hint that the temporary file should be avoided (if possible via infos/status) or that sendfile should be used. But all of this would be a premature optimization. Of course it is very valuable if you share your experience here. Do you already know the difference in run-time for running all tests cases if all the content of import/export is copied? |
No I have neither implemented nor benchmarked this. Writing to disk will be slower by orders of magnitude by definition for some years to come, until persistent memory with DRAM speed is standard. |
This is not so clear as file systems can basically avoid accessing the disk at all as long as fsync is not called. Furthermore temporary files can also be in a tmpfs. So I would prefer that we have a look where the actual bottleneck with import/export is before we start optimizing. |
No file system optimization will come close to writing to and reading from a pipe in parallel, like it can be done now in |
I mark this issue stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping the issue by writing a message here or create a new issue with the remainder of this issue. |
This issue is actually a quite good example, how we were unable to fix a very simple problem (kdb export/import failing for some storage plugins that cannot directly read from stdin/stdout) just because of performance considerations without any knowledge if it is actually relevant (we are speaking here about import/export, not something that slows down application startup). I hope we can improve in our discussions and focus more on the goals in the future. |
I mark this issue stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping the issue by writing a message here or create a new issue with the remainder of this issue. |
I closed this issue now because it has been inactive for more than one year. If I closed it by mistake, please do not hesitate to reopen it or create a new issue with the remainder of this issue. |
Problem: Currently some plugins implement workarounds (see #2419) others do not support all forms of serialization (e.g. those who use fseek with a pos != 0, e.g. csvstorage). Furthermore, operating systems not having /dev/stdin or /dev/stdout are not fully supported.
Solution: The import/export framework should provide a temporary file where storage plugins write to (as they would do normally). The content of this file is copied from/to stdin/stdout.
The text was updated successfully, but these errors were encountered: