Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Renaming or creation of file fails (Telegram won't start) #999

Open
RazielleS opened this issue Dec 22, 2021 · 14 comments
Open

Renaming or creation of file fails (Telegram won't start) #999

RazielleS opened this issue Dec 22, 2021 · 14 comments

Comments

@RazielleS
Copy link

RazielleS commented Dec 22, 2021

Describe the bug

I have recently changed from nfs4 to virtiofs for some mounts (including /home) in a VM.
Since I changed the mount on the home folder, I get a weird error when starting Telegram in this VM.
It complains something about not being able to open a file:

[2021.12.22 18:20:15] Launched version: 3003000, install beta: [FALSE], alpha: 0, debug mode: [FALSE]
[2021.12.22 18:20:15] Executable dir: /usr/bin/, name: telegram-desktop
[2021.12.22 18:20:15] Initial working dir: /home/raziel/
[2021.12.22 18:20:15] Working dir: /home/raziel/.local/share/TelegramDesktop/
[2021.12.22 18:20:15] Command line: telegram-desktop --
[2021.12.22 18:20:15] Executable path before check: /usr/bin/telegram-desktop
[2021.12.22 18:20:15] Logs started
[2021.12.22 18:20:15] Launcher filename: telegramdesktop.desktop
[2021.12.22 18:20:15] Connecting local socket to /run/user/1000/a0b1cff8440a846e992153710f78f8a0-{87A94AB0-E370-4cde-98D3-ACC110C5967D}...
[2021.12.22 18:20:15] Socket connect error 0, starting server and app...
[2021.12.22 18:20:15] Could not open '/home/raziel/.local/share/TelegramDesktop/log.txt' file to start new logging: No such file or directory
[2021.12.22 18:20:15] Executable path before check: /usr/bin/telegram-desktop
[2021.12.22 18:20:15] FATAL: Could not move logging to '/home/raziel/.local/share/TelegramDesktop/log.txt'!

It looks like it has something to do with mergerfs, because if I change the mount to the non-mergerfs-source, everything works.

It might be related to #989 because it sounds a bit familiar. Although this runs on an Arch Linux Server and client.

I have added an strace of the app and mergerfs, done at the same time.
I hope you can do something with them, because I am a bit lost there.
As far as I understand app.strace, the log file gets deleted and another one is moved in its place. But I don't understand the mergerfs.strace.

I can delete (rm) the mentioned file without any issue, and it even gets recreated when starting Telegram. But Telegram seems to see this differently.

To Reproduce

  • Bind home as mount with virtiofs
  • Start telegram

I could not reproduce it with any default tools like rm, mv, ln or touch

Expected behavior

The file should get copied / moved without errors

System information:

  • OS, kernel version: Linux sec 5.15.10-arch1-1-vfio overflow risk in ioctl.cpp #1 SMP PREEMPT Mon, 20 Dec 2021 17:55:46 +0000 x86_64 GNU/Linux
  • mergerfs version: 2.33.3
  • mergerfs settings
 /mnt/disk1:/mnt/disk2   /mnt/userHdd/   fuse.mergerfs   allow_other,noforget,inodecalc=path-hash,nfsopenhack=all,cache.files=partial,dropcacheonclose=true,category.create=mfs,ignorepponrename=true,fsname=diskHdd  0       0
/mnt/cache:/mnt/userHdd /mnt/user/      fuse.mergerfs   allow_other,noforget,inodecalc=path-hash,nfsopenhack=all,cache.files=partial,dropcacheonclose=true,category.create=epff,func.mkdir=epall,moveonenospc=true,minfreespace=100G,ignorepponrename=true,fsname=disk*      0       0
  • List of drives, filesystems, & sizes:
    • df -h:
diskHdd                  15T  7.9T  5.9T  58% /mnt/userHdd
disk*                    17T  8.8T  6.7T  57% /mnt/user
/dev/sdb1               7.3T  4.0T  3.0T  58% /mnt/disk1
/dev/sdc1               7.3T  4.0T  3.0T  58% /mnt/disk2
/dev/sda1               7.3T  3.8T  3.2T  55% /mnt/parity0
/dev/md127p1            1.8T  905G  835G  53% /mnt/disk
  • lsblk
NAME           MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda              8:0    1   7.3T  0 disk  
└─sda1           8:1    1   7.3T  0 part  /mnt/parity0
sdb              8:16   1   7.3T  0 disk  
└─sdb1           8:17   1   7.3T  0 part  /mnt/disk1
sdc              8:32   1   7.3T  0 disk  
└─sdc1           8:33   1   7.3T  0 part  /mnt/disk2
nvme2n1        259:0    0 931.5G  0 disk  
└─nvme2n1p1    259:1    0 931.5G  0 part  
  └─md127        9:127  0   1.8T  0 raid0 
    └─md127p1  259:7    0   1.8T  0 part  /mnt/cache
                                          /mnt/disk0
nvme0n1        259:2    0 931.5G  0 disk  
└─nvme0n1p1    259:3    0 931.5G  0 part  
  └─md127        9:127  0   1.8T  0 raid0 
    └─md127p1  259:7    0   1.8T  0 part  /mnt/cache
                                          /mnt/disk0
  • A strace of the application having a problem:
    app.strace.txt.gz
    (sorry for zipping it, github didn't allow me to upload another txt file somehow...)

  • strace of mergerfs while app tried to do it's thing:
    mergerfs.strace.txt

@trapexit
Copy link
Owner

The mergerfs trace doesn't show interacting with the log files. Either the temp file or log.txt.

Are you able to reproduce this with Telegram (or better yet simple tooling) without all that setup?

@trapexit
Copy link
Owner

BTW... you need to use use_ino

@RazielleS
Copy link
Author

I had a bunch of other issues with use_ino together with nfs. But since I am moving away from it, I can give it another try.

I now disabled the FS caching of virtiofs (server and client), because I thought maybe I get a better trace then...and instead it works now.
This leaves me very confused on how virtiofs and mergerfs interact, but I am also happy that it works.
I would create an issue at virtiofs...but honestly I am not sure on how to describe the issue anymore.

@trapexit
Copy link
Owner

As the docs mention use_ino is required for inodecalc to make any sense. Otherwise libfuse is just giving out random inode values.

caching could explain some of the situation but I don't know enough about virtiofs to comment without more investigation (which btw is being changed in the kernel to use basically the same codepath as FUSE filesystems)

Caching shouldn't be keeping calls from reaching mergerfs generally. Especially not create calls. Would need more investigation.

@RazielleS
Copy link
Author

I will finish the migration and change use_ino as soon as possible. Maybe it helps.

I will also play around with caching a bit and make a few more traces.
Thank you so far :-)

@RazielleS
Copy link
Author

RazielleS commented Dec 22, 2021

Turned out I got the wrong pid when doing the strace of mergerfs...There are two instances and I got the wrong one :-(

These traces now both show log.txt. Traces done when the error occured (so cache back to normal)
traces.zip

@RazielleS
Copy link
Author

Oh, I also added use_ino, which no visible change so far

@trapexit
Copy link
Owner

use_ino impacts the inode generation. You set inodecalc and that has no meaning/impact unless you enable use_ino.

@trapexit
Copy link
Owner

It looks like the traces are from different times.

@RazielleS
Copy link
Author

You're right. I'll have a look at it tomorrow, looks like I am too tired to manage it right now.

@RazielleS
Copy link
Author

So this is the one. Picked the wrong file yesterday, sorry for this.
Now both of them are from the same time and say something about log.txt.

traces.zip

@trapexit
Copy link
Owner

After the unlink there is no further evidence that mergerfs is being requested to do anything with log.txt. You said if you turn off some caching it works? Can you trace that?

It's calling renameat2 which mergerfs doesn't yet implement but that shouldn't matter as the kernel errors and the software falls back to a link + unlink behavior.

@RazielleS
Copy link
Author

This is a trace with disabled caching, when the app works.

strace.zip

@trapexit
Copy link
Owner

Everything looks the same between the two until after the temporary file for log.txt is unlinked. After that the client tries to chmod the linked log.txt file. When caching is off it succeeds and when on it fails. It fails before it even gets to mergerfs. Interestingly, after the chmod, even in the case where the chmod fails, it tries to open log.txt to append to it. With caching it fails with ENOENT implying part of the path doesn't exist and I'm not seeing anything obvious in mergerfs causing it (or that it even gets called).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants