Multiple simultaneous mounts #83

Open
Vadiml1024 opened this issue Apr 23, 2022 · 12 comments

Comments

@Vadiml1024

I'm working on a project where we need to mount a LOT of archives simultaneously, and I'm talking about thousands of them.
For the moment, while experimenting with ratarmount, we launch a separate instance of ratarmount for each archive.
This generates ENORMOUS memory pressure on the system (especially when ratarmount is distributed as an AppImage, because each invocation has a separate, non-shared instance of the Python interpreter).
So we are considering adding an option to read a file containing pairs:
"filename" "mountpoint"
and have a single instance of ratarmount create all FuseMount objects for them.
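
A minimal sketch of how such a pairs file could be parsed, assuming one pair per line with shell-style quoting. `parse_mount_pairs` and the example paths are hypothetical and not part of ratarmount; creating the FuseMount objects is left out.

```python
import shlex
from typing import Iterable, List, Tuple


def parse_mount_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse lines of the form: "filename" "mountpoint" (hypothetical helper)."""
    pairs = []
    for number, line in enumerate(lines, start=1):
        # shlex handles the double quotes, so paths may contain spaces.
        fields = shlex.split(line, comments=True)
        if not fields:
            continue  # skip blank lines and comment-only lines
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 fields, got {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


# Hypothetical file contents, given here as a list of lines.
pairs = parse_mount_pairs([
    '"/data/disk1/a.tar" "/mnt/a"',
    '',
    '"/data/disk 2/b.tar.gz" "/mnt/b"',
])
print(pairs)
```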

Do you have any feedback on this idea?

@mxmlnkn
Owner

mxmlnkn commented Apr 23, 2022

I'm not even sure whether FUSE and/or fusepy allow such a setup, but I haven't tried it. For the mount point, it forks off a background process that keeps running. Maybe it works to do that multiple times, but I kinda doubt it.

You could use union mounting (if each archive has a top-level directory) or recursive mounting for that, or are there problems?

In the worst case, if you really want specific mount point locations, you could use symbolic links like this:

# Create two archives at random positions
tar1=$( mktemp --suffix=.tar )
tar2=$( mktemp --suffix=.tar )
sample=$( mktemp )
echo foo > "$sample"
tar -cf "$tar1" "$sample"
tar -cf "$tar2" "$sample"
# Create an intermediary structure to mount both of them
folder=$( mktemp -d )
# Unfortunately, recursive mounting does not follow symlinks, so use hardlinks. I'd almost categorize this as a bug.
ln "$tar1" "$folder/$( basename -- "$tar1" )"
ln "$tar2" "$folder/$( basename -- "$tar2" )"
# Mount recursively
ratarmount --recursive "$folder" mountpoint
# Create links for desired mountpoint locations
ln -s "mountpoint/$( basename -- "$tar1" )" mountpoint1
ln -s "mountpoint/$( basename -- "$tar2" )" mountpoint2

One problem I see with this is that there is no option to specify the recursion depth, but that seems easier to implement than the pair-wise parsing.

Btw what kind of archives are you using and what are your performance requirements? If all file system calls go through the same ratarmount instance, it might bottleneck at some point if there are too many. And when using compressed archives, especially bz2 archives, the default is to use parallel decoding, which increases the memory usage. Use -P 1 to change that.

@Vadiml1024
Author

Vadiml1024 commented Apr 23, 2022 via email

@mxmlnkn
Owner

mxmlnkn commented Apr 24, 2022

> It is an interesting idea, but I'm afraid it will not work in our case,
> because the various archives (and directories) reside on
> different disks,
> so hardlinks are a no-go....

I forgot to update the code and comment. Symbolic links, and therefore multi-disk setups, also work! The problem in my tests was that I created the test files without any extension, which makes the recursive mounting fail because it only looks at extensions first to avoid expensive "disk" or archive/decompressor accesses.

> However, it makes me think about another possibility:
> a VirtualMountSource class,
> which would serve the role of $folder from your example....
> It should not be too complicated to implement....
> So ratarmount would receive (somehow) a list of filenames, create a
> VirtualMountSource, populate it with the supplied filenames,
> and create a FuseMount with this VirtualMountSource...

I think the UnionMounting feature is already very similar. You can do:

ratarmount foo1.tar foo2.tar foo3.tar mountpoint

But this will mount the contents of each archive directly into the mount point. To be more generic, you would want to mount the contents of each archive under a different folder under the mount point. I think that should be doable with a command line flag, and I would prefer this to a new pairwise command line syntax. It would be similar in semantics to 7-zip's "Extract here" vs. "Extract to Folder" options.
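
The "one folder per archive" layout can be sketched as a mapping from folder names under the mount point to archive paths. This is only an illustration, not ratarmount's implementation: `subfolder_layout` is a hypothetical helper, and naming each folder after the archive's file name is an assumption. Archives that share a file name would collide, so the sketch rejects them.

```python
import os
from typing import Dict, List


def subfolder_layout(archives: List[str]) -> Dict[str, str]:
    """Map a folder name under the mount point to each archive (hypothetical helper)."""
    layout: Dict[str, str] = {}
    for archive in archives:
        # Assumption: the folder is named after the archive's file name.
        folder = os.path.basename(archive)
        if folder in layout:
            raise ValueError(
                f"{folder!r} is used by both {layout[folder]!r} and {archive!r}"
            )
        layout[folder] = archive
    return layout


layout = subfolder_layout(["/disk1/foo1.tar", "/disk2/foo2.tar"])
print(layout)
```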

mxmlnkn changed the title from "Multiple simulatanious mounts" to "Multiple simultaneous mounts" on May 4, 2022
@mxmlnkn
Owner

mxmlnkn commented May 28, 2022

> This generates ENORMOUS memory pressure on the system (especially when ratarmount is distributed as an AppImage, because each invocation has a separate, non-shared instance of the Python interpreter)

Are you sure the only problem is the Python interpreter binary? I assume your hands are bound to use that AppImage :/

One other thing I can think of might be the SQLite cache size. There currently is no option for that, but as you have already forked, you can add it yourself inside SQLiteIndexedTar._openSqlDb: PRAGMA cache_size = 16, and maybe repeat for the PAGE_CACHE_SIZE. But if #85 is responsible for the memory usage, then this might also not help much.
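
As a rough sketch of what such a change could look like with Python's sqlite3 module: `open_with_small_cache` is a hypothetical helper, not part of ratarmount, and how it would fit into SQLiteIndexedTar._openSqlDb is an assumption. In SQLite, a positive cache_size is a number of pages, while a negative value is a size in KiB.

```python
import sqlite3


def open_with_small_cache(path: str, cache_pages: int = 16) -> sqlite3.Connection:
    """Open a database and shrink its page cache (hypothetical helper)."""
    connection = sqlite3.connect(path)
    # PRAGMA arguments cannot be bound as parameters, so format a validated int.
    connection.execute(f"PRAGMA cache_size = {int(cache_pages)}")
    return connection


connection = open_with_small_cache(":memory:")
cache_size = connection.execute("PRAGMA cache_size").fetchone()[0]
page_size = connection.execute("PRAGMA page_size").fetchone()[0]
# Upper bound for the page cache of this one connection, in bytes.
print(cache_size, page_size, cache_size * page_size)
```

The setting applies per connection, so it would have to be repeated for every index database that a single ratarmount process opens.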

@Vadiml1024
Author

> Are you sure the only problem is the Python interpreter binary? I assume your hands are bound to use that AppImage :/

Actually, you are right, I can't be sure about it. It was simply the first thing that came to mind when I saw the machine brought to its knees with 1.5K ratarmount instances running.

Now, seeing you mention CACHE_SIZE and PAGE_CACHE_SIZE helps me understand why I've seen (and continue to see, even with a single instance of ratarmount) 160G of virtual memory usage on a 16G machine.
These are probably zero-filled pages without physical memory backing, preallocated by SQLite.

@mxmlnkn
Owner

mxmlnkn commented Feb 20, 2023

I'm not sure whether you are still interested. I have two ideas for realizing something like that:

  1. --disable-union-mount or maybe --mount-in-subfolders. This option would simply mount each archive in an identically named subfolder. I guess that it does not scale up if there are multiple archives with the same name.
  2. --batch-mount <file>. Each line in the given file would be a set of arguments just as if calling ratarmount. All given mountpoints are limited to the actual mount point subfolder.

Example for 2:

# Create a.zip containing foo.txt
# Create /tmp/b.zip containing bar.txt
ratarmount --batch-mount <<HEREDOC all-mounted
--recursive a.zip mounted-a
b.zip mounted-b
a.zip b.zip union-mounted
HEREDOC
tree all-mounted

Expected output:

all-mounted
+- mounted-a
   +- foo.txt
+- mounted-b
   +- bar.txt
+- union-mounted
   +- foo.txt
   +- bar.txt

The --recursive in the first line is only there to exemplify that, in the ideal case, all mount-relevant options can be specified per submount. Of course, some command line arguments are not applicable to these submounts; I'll have to make a list of applicable options.

As for file names with newlines in them, I guess I'll need a second --batch-mount0 similar to find's -print0. In that case, I might take this one step further and have each argument be separated with one \0 and each submount with \0\0. That should cover all kinds of problematic input.
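
The zero-delimited format could be parsed roughly like this. It is a sketch under the assumptions stated above (one \0 between arguments, \0\0 between submounts); `parse_batch_mount0` is a hypothetical name, and empty arguments cannot be represented in this scheme.

```python
import os
from typing import List


def parse_batch_mount0(data: bytes) -> List[List[str]]:
    """Split NUL-delimited input into one argument list per submount (hypothetical)."""
    submounts = []
    for record in data.split(b"\0\0"):
        # Drop empty pieces, e.g. the one after a trailing separator.
        arguments = [os.fsdecode(argument) for argument in record.split(b"\0") if argument]
        if arguments:
            submounts.append(arguments)
    return submounts


data = (
    b"--recursive\0a.zip\0mounted-a\0\0"
    b"b.zip\0mounted-b\0\0"
    b"a.zip\0b.zip\0union-mounted\0\0"
)
submounts = parse_batch_mount0(data)
print(submounts)
```

Because only NUL bytes act as delimiters, a name such as "weird\nname.tar" passes through unchanged.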

@Vadiml1024
Author

Sure, it would be pretty useful in our use case.

@mxmlnkn
Owner

mxmlnkn commented Feb 20, 2023

  3. Same as 2, but instead of specifying a file, each -- in the command line would start a new submount.

Same example for 3:

ratarmount all-mounted --recursive a.zip mounted-a -- b.zip mounted-b -- a.zip b.zip union-mounted

Normally -- is used to stop command line argument parsing and take all remaining arguments verbatim. I'm not sure whether I use that or need that. If there is an archive starting with -- it could also be specified as ./--archive-name.zip instead. Or alternatively, I could use --- to separate submounts. So something like this would work:

ratarmount --batch-mount all-mounted --- --recursive a.zip mounted-a --- --recursive -- --b.zip mounted-b --- a.zip b.zip union-mounted

I'm not yet sure how to specify the outer mount folder location.
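
Splitting the command line on a --- separator could look like the following sketch. `split_submounts` is a hypothetical helper, not an existing ratarmount function; it leaves a plain -- untouched so that it can keep its usual "stop option parsing" meaning inside a submount.

```python
from typing import List


def split_submounts(argv: List[str], separator: str = "---") -> List[List[str]]:
    """Group arguments into submounts at each separator (hypothetical helper)."""
    groups: List[List[str]] = []
    current: List[str] = []
    for argument in argv:
        if argument == separator:
            if current:
                groups.append(current)
            current = []
        else:
            current.append(argument)
    if current:
        groups.append(current)
    return groups


argv = [
    "--batch-mount", "all-mounted",
    "---", "--recursive", "a.zip", "mounted-a",
    "---", "--recursive", "--", "--b.zip", "mounted-b",
    "---", "a.zip", "b.zip", "union-mounted",
]
groups = split_submounts(argv)
for group in groups:
    print(group)
```

Here the first group holds the outer options and the remaining groups are the per-submount argument lists, each of which could then go through the normal argument parser.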

@Vadiml1024
Author

Outer mount folder location: --target mountpoint

@mxmlnkn
Owner

mxmlnkn commented Feb 20, 2023

  4. ratarmount --batch-mount-listen. Instead of specifying a single file, ratarmount can be used like a server. The format would be the same as proposed for the files, but (zero-delimited) lines can be piped to ratarmount after startup, i.e., after the mount point has been created. This also includes lines for unmounting. My first idea was communication via stdin, but it might be more conventional to use a socket, which I don't have much experience with. Or maybe something like named pipes?

    ratarmount --batch-mount-listen special-file all-mounted
    echo "a.zip mounted-a" >> special-file
    echo "--unmount mounted-a" >> special-file
    ...

    Heck, this doesn't even need a socket or named pipe, ratarmount could simply offer a writable special file in the root of the mountpoint that it monitors similar to some files in /sys/.
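
Independent of the transport (stdin, socket, named pipe, or special file), handling the control lines could be sketched as below. `apply_control_line` and the dictionary of active submounts are hypothetical; the line format follows the example above, and real mounting and unmounting are left out.

```python
import shlex
from typing import Dict, List


def apply_control_line(line: str, mounts: Dict[str, List[str]]) -> None:
    """Update the table of active submounts from one control line (hypothetical)."""
    arguments = shlex.split(line)
    if not arguments:
        return  # ignore empty lines
    if arguments[0] == "--unmount":
        for mount_point in arguments[1:]:
            mounts.pop(mount_point, None)
        return
    # As with a normal ratarmount call, the last argument is the mount point.
    *sources, mount_point = arguments
    if not sources:
        raise ValueError(f"no archive given for mount point {mount_point!r}")
    mounts[mount_point] = sources


mounts: Dict[str, List[str]] = {}
for line in ["a.zip mounted-a", "b.zip mounted-b", "--unmount mounted-a"]:
    apply_control_line(line, mounts)
print(mounts)
```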

@Vadiml1024
Author

Great idea

@mxmlnkn
Owner

mxmlnkn commented Feb 21, 2023

ratarmount-manylinux2014_x86_64.AppImage.zip

For now, I have implemented the simplest solution, the --disable-union-mount option. I would be very interested in whether this actually reduces the memory usage you observed. The usage would be:

ratarmount-manylinux2014_x86_64.AppImage --disable-union-mount denormal-paths.zip large.zip mountPoint
tree mountPoint

Possible output:

mountPoint/
├── denormal-paths.zip
│   ├── foo
│   ├── root
│   │   └── bar
│   └── ufo
└── large.zip
    └── 10k-1MiB-files.tar.gz

3 directories, 4 files
