New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Syncing stuck on XX% on one device, despite the synced files are all ok and checksums match on all devices involved. #9482
Comments
I have removed the sync directory from the Syncthing only from the newest device and readded it again with default settings except Alphabetic download order (pointing to already synced data folder, entering the desired directory ID) and after the scan Syncthing said that everything is synced OK. I guess there is some kind of a bug in Syncthing and it happen already twice for me. I will probably setup a VM for testing purposes and I will try to make another full sync just for my curiosity. If there is any advise what shall I check during that process, please let me know. One think which may be a culprit is that the one directory has approx 200_000 super small files and maybe because of that Syncthing got stuck also with other directories, IDK. This is super weird. |
Shouldn't the Syncthing rescan the whole directory when it sees such error like: |
I have reproduced the issue for the 2nd time. I will preserve the whole Syncthing configs with logs from A, B and C if anybody wants to ask for more details, maybe I will be able to dig through it. Explanation: What I see is that the newest device C got the same name as my device A somehow. Please tell me what should I look for in the logs to help fixing this issue. |
In the config file I have found such definition which points to DEVICE C, but the name of that DEVICE is WRONG
|
Interesting... Why "C" got a name of "A"?
|
The device name is whatever you put in there. If you leave it blank, it will copy the remote device's name as configured there, maybe that wasn't set right at the time? Thank you very much for digging into this. It should prove really helpful to get logs if you can reproduce, although I understand it's tedious. Just maybe better to move it to the forum, as long as the bug isn't clearly identified. It helps keep a better overview in the issue tracker, for when a developer starts working on a fix. Thus we like to have the debugging sessions on the forum and just describe the final findings here. |
IMO the bug is clearly described, isn't it? Why my remote is getting stuck from one of devices point of view? I don't care about the names, I just care about sync stuck and trying to find something useful. The bug is about sync stuck, all other things are extras. As a .net dev myself I like as much info as it can be provided in one place, no too look for it, so I did this way. But It looks like I have to start learning a GO to try to dig into some code and see if something catches my eyes or try to debug it. |
Something is really wrong with my setup or IDK. I'm having the same situation usually with one client all the times, It is getting stuck even with very small transfers like few MiB. I'm going to get rid of it and replace with a fresh new config. |
Again, the same story. I have added a new REMOTE which synced fine (in terms of files integrity), but some of the source devices (one Android, and one Windows 11 PRO) are reporting that there are out of sync files and syncing process pending on their GUIs. Any guidance how I can debug this thing? I can's see anything useful in the logs though, there is no errors, nothing, even I checked all the debug output options. Worth to notice that this time I'm not seeing any log entries: "Puller isn't making sync progress", and the names of the devices seems to be ok (I have recreated one device from scratch - the one which was failing me before, but yet again the new one fails!). What is super strange, that list of 629 out of sync items, shows me that the MOST of the files were modified by some other device which is nowhere to be found or it was not involved to any activities during the synchronization. I had such device before (IYZXYIA - please see previous experiment with sync) but it is no more (I removed it and it is completely gone along with the whole VM), why it is shown here as a "Mod. Device" in that list? And the rest of those files should not be marked as modified. Anyway that list is a lie, because as I wrote everything is already synced which I verified by checksum calculations. Please advise. |
Same here, Syncthing v1.27.5, Linux (64-bit Intel/AMD): |
Same here:
I checked the sha1sum of the two elements, they are the same. Also I did not modified those files since like 2018, so that's strange seeing this popping right now. My device is running syncthing I grepped the out of sync files in the logs: nothing:
(yes the two "out of sync" files are sh3d files). In my side I have a .stignore containing:
in the remote there's no .stignore file. I don't know where to gather more info from here. |
While your issue might be the same one as otherwise reported here, there a lot of changes and some very relevant to components involved here. Generally involving multiple versions increases the problem space a lot and we don't support/troubleshoot those. If you want support/fixes for distro versions, please report to your distros bug tracker. As to the overall issue: We are very aware that there is a problem with "some index information missing", it's also not new. Unfortunately a way to reproduce it reliably and independently has yet to come up. There's no lack of looking into it, multiple maintainers did multiple times. I know this doesn't help anyone and isn't a satisfactory update, unfortunately it's the only one I have. |
@imsodin because I think I can easily reproduce this issue (got it 4 times in a row) I guess I have to learn the GO and how to debug it (I'm a .NET developer and have no knowledge or any experience with the GO, but at least I may try to do so). Can you give me some starting points where should I look first? Or where I can learn how effectively debug the GO please? If I can have some kind of conditional break point thing here than I think that won't be so difficult maybe - at least I learn something new. |
I think I also have this issue now. I have a script that tells me when a last successful 100% sync was finished, and it tells me it was 1 day, 6 hours, 9 minutes ago. Usually a full sync is done every 5 minutes or so. So it likely means that in a recent change, something got broken. Both of my servers run syncthing version v1.27.5 |
The latest RC has a potential fix for this/someting related: https://github.com/syncthing/syncthing/releases/tag/v1.27.6-rc.1 @digital-spinner I'd suggest to try the RC to see if you can still reprodcue. Generally for this "kind of reproducer" (run in a large production system), I think logging is more likely to be helpful than a debugger (there's of course a debugger for go too). As for where to look: That's problematic, because it can be basically anywhere given how elusive it is. The forum thread linked above gives a good example: indexhandler.go is where the additional logging went, as that's a central piece of functionality to exchange index information. The potential fix is in the DB. However it might also concievable be in lib/protocol or many places in lib/model. The hard thing here is not go, it's the domain (I went from go to .NET, went fine, I think the other way around should be even easier). |
I have just updated my instances to 1.27.6 and it works well now ! |
@imsodin thank you, I'm going to test it asap, maybe even today at night, because my sync takes few hours and I need to setup fresh vm for testing on my homelab. |
I saw there is a 1.27.6 version update in place already. I have upgraded my devices accordingly and.... No luck. IDK what is going on.. Also there is no info in the log that puller is not getting any files.. but it should be. |
I think during last synchronization it started to pouring files into a new directories with names of folder ID instead of the name it should, still IDK what is going on here, but I'm going to run it again. |
Another run and it just stopped pumping - view from the device that I'm resyncing. Other deices also report no active transfers. I have a lot of those in the log.. Update: it looks like it started to sync again.. Well let me see what it does. |
Now it went to a crawl speed, when usually it was not respecting the speed limits going the first time. CPU usage is almost nonexistent. There are no new log entries despite those that it finished initial scans and some other sharing chit-chat. It is maxing at 6 MiB/sec almost doing nothing. It should go full speed but it is not. I leave it here. I have a backup so I don't care if it damages something. I wonder if I should recreate all my network from scratch if this kind of things started happening. .... restart from UI crashed / stalled it completely, I restarted the client device and it is going on again at full speed (ignoring speed limits). .... (few minutes later) my linux machine gets disconnected often which was not happening with previous version and Syncthing is randomly stuck on sync also (some of those ends up of disconnection of everything with that machine (xrdp, ssh dies)).. I'm going to create a fresh Windows machine for testing I guess on different env. So far I think it is much worse than it was before with older version. I never had a connection issues of any kind before. |
I was super annoyed with the issues I got as described and I have recreated the env with fewer devices with fresh 3 instances of Syncthing (I have wiped the configuration directories) with the same payload about 600 GiB of data and guess what? Again, the synchronization got stuck. But this time at least it didn't broke my connection (not once) with the 3rd device (which is actually the same as previously but with wiped Syncthing configuration). So no, for me issue still exist as it was before. It super annoys me but I have no time currently to dig into GO LANG world to see what is going on by myself. I guess having the smaller payload will not generate this issue so maybe many people can live with it, but apparently not me as it looks like. |
I did some more testing with fresh instances. What I have noticed the Syncthing is more stable and not causing so many syncing issues when I decreased the devices performance - testing on the VM I just simply decreased the cores amount just to 2 and RAM to 2GB on linux machine and it got synced easier (just one resetup / resync was needed instead of many previously with 6 CPU cores). Another thing is that if the syncing directories are smaller I guess it is also more stable. Anyway the issue persist and it is worse than before IMO. |
It seems I have the same issue. Only one of my 3 devices claims syncing issues. Still the files were transferred correctly. |
I think I'm seeing the same issue. 3 devices (2 Linux, 1 Windows 11) all running V.1.27.6. Each device says it is synced and up to date, but one device constantly shows another device as "Syncing (50%, 8.22GiB)". I observe that the description of the out of sync remote says: If I click on the items link, the "Out of Sync Items" window is empty. Perhaps this is a clue. |
I've noticed the same issue: although all folders appear to be fully synchronized, not all remote devices are (version 1.27.6 for all devices). On my Raspberry Pi (running Linux 64-bit ARM), configured as a "Receive encrypted" folder type, problems arise between this Raspberry Pi and other devices (Windows 11/Android/Linux), as well as from other devices to this Raspberry Pi. However, other devices do not encounter this issue. It seems that the "Receive encrypted" mode is responsible for this malfunction, as I haven't noticed any problems in my case when this mode isn't activated. Additionally, the number of unsynchronized files between device A and the Raspberry Pi differs from that between the Raspberry Pi and device A, and all encounter connection problems with the Raspberry Pi and vice versa. |
I will add that doing a rescan on the folder or restarting Syncthing does not help. However, running Syncthing with the "--reset-database" parameter clears the error. |
Just adding my 2¢ to the issue, I have the same, syncing between multiple machines, one Win10 (Machine A) and one Win11 (Machine B), whilst having an Ubuntu (Machine C) and another Win11 (Machine D) in the following setup:
On A it says that B is out of sync by 290B (2 files, both temporary files already deleted; I really should have excluded them...) Starting syncthing on both A and B with |
What happened?
I was syncing a new device (Windows fresh install) to my pool of devices, and synchronization went well but at the end on that new device Syncthing shows 26% syncing progress and it is completely stuck since 12h+. All other devices show status synced (up to date). The checksum of all files within multiple directories match, so it means that desired content is properly synced, but Syncthing says something else. I tried rescaning, restarting etc. nothing changed. It is just completely stuck.
In the log I see a lot of those entries related to the affected directories and device reporting stuck syncing:
I think the same thing happen to me we another device few hours before, but I was not paying sufficient attention and I just recreated sync pair and resynced one affected folder. But then data transfer was not completed and half of the files were missing on the newest device. Now I have 3 affected folders and a lot of more data which is actually synced.
Also what is worth to mention, there should be some kind of better error message maybe, because it is not possible to reattach missing source device because there is no more, plus only viable info can be found buried in log files.
More info on previous sync stuck problem: https://forum.syncthing.net/t/missing-half-of-the-photo-files-but-the-status-shows-100-completed-on-src-and-target/21841/3
Syncthing version
v1.27.4
Platform & operating system
Windows 11 PRO x64, Linux Ubuntu 22 x64
Browser version
Firefox 124.0.1 (64-bit)
Relevant log output
The text was updated successfully, but these errors were encountered: