Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Messages can take hours to be received #6644

Closed
jeremy447 opened this issue May 15, 2017 · 32 comments
Closed

Messages can take hours to be received #6644

jeremy447 opened this issue May 15, 2017 · 32 comments

Comments

@jeremy447
Copy link

Bug description

Messages can take hours to be received. Note that I don't have google services.

Steps to reproduce

  • Someone send me a message

Actual result:

  • I receive it minutes or hours later

Expected result:

  • I should receive it immediately

Device info

Device: Motorola Razr i
Android version: 4.4 (CM11)
Signal version: 4.5.3

Link to debug log

Don't have one now but can provide it if needed.

Thanks !

@moxie0
Copy link
Contributor

moxie0 commented May 15, 2017

When you installed did it warn you about battery life and all that, or did you remove play services after already having Signal installed? Does the background service appear to be running?

@jeremy447
Copy link
Author

jeremy447 commented May 15, 2017

It didn't warn of anything. I never had play services on my phone since I installed CM11.
I have the notification (drop down info) that say the background connexion is active.

@micressor
Copy link

micressor commented May 16, 2017

@Paviluf I have something similar. Since one of our phones has 4.6.0 installed. The other phone is still on 4.5.3. As a workaround I had to disable/enable mobile data. After that, messages comes immediately. I see the drop down info, that signals background service is running.

Device info

Device: Fairphone 2
Android version: 5.1 (without play services)
Signal version: 4.5.3 and 4.6.0

@ghost
Copy link

ghost commented May 16, 2017

I've been having a similar issue lately. Granted my setup is a tad different.

Device Info

Device: Nexus 5X
Android version: 7.1.2 (CopperheadOS)
Signal version: 4.5.3 (Noise)

If I open running services under developer options and kill all Signal(Noise) related services, then re-open the app, I receive all my missed notifications. I'll try the disable/enable mobile data option next time it happens. I have the app "not optimized" like it should be as well, because I do not have Google Play Services installed.

@moxie0
Copy link
Contributor

moxie0 commented May 17, 2017

It sounds like your device is killing the process. Not sure if there's anything we can do beyond run a foreground service, this is really what GCM is for.

If you're running Noise, that's not our software. You'll have to contact the Noise developers for support.

@micressor
Copy link

I sent a message on Monday at 18:42 which was successfully delivered. See checkmark on the right side. The last message on screenshot I received this morning (Wednesday) with the date of Monday morning 06:39.

Impact: In my case, this happens 1 time every few days with a contact.
Expected result: The last message should arrive no later than Monday 18:42.
Workaround: Disable/Enable mobile data on both side.

In this scenario, I can not prove that the background service of Signal ran on the other side.

messages-can-take-hours-to-be-received-1

@ghost
Copy link

ghost commented May 17, 2017

Also, I may be wrong, but for my situation, it seems to happen once a day around the time I switch from 4G to 3G due to the location of my work. When sending a message I only receive one check (same if others send messages to me), until it finally completes sending which may take hours. It's not exactly when I switch network type, but a little after. Once again this could be completely unrelated as I haven't tested it much myself. Just an observation.

If you're running Noise, that's not our software. You'll have to contact the Noise developers for support.

As a side note, Noise should simply be a re-branding of Signal with 100% the same code. That's the impression I'm under.

@moxie0
Copy link
Contributor

moxie0 commented May 17, 2017

@KehrBehr It is not. I would encourage you to install Signal if you want support here, otherwise please contact the Noise developers.

@paride
Copy link

paride commented May 17, 2017

@moxie0 unless you know something that's not public, Noise is just Signal rebranded with sed and a few s/Signal/Noise lines:

https://github.com/copperhead/Noise/blob/4.5.3/replace.sh

It exists only to avoid violating the trademark.

This is just for your information, in case you didn't know. I perfectly understand your point about not supporting unofficial products.

@paride
Copy link

paride commented May 17, 2017

@moxie0 I'll just add that if you want to see Noise disappear you'll just have to explicitly allow the redistribution of the official Signal binary package, at least of the "website" flavor. I think @thestinger would be happy to stop building Noise and distribute that instead.

@thestinger
Copy link

thestinger commented May 17, 2017

It's really the non-website one that's needed without the self-update mechanism, to instead have the automatic updates via F-Droid and not needing an unknown source enabled. It doesn't really matter. I'll probably grow disinterested and stop building it anyway. It originally existed for a more compelling reason.

@sigenc
Copy link

sigenc commented May 18, 2017

@thestinger pls twitter about it when you stop

@ghost
Copy link

ghost commented May 18, 2017

@moxie0 Update: I removed Noise and installed the Signal apk from your website. The issue still occurred. As I was saying above, it seems to occur when I was switching network types. I went from 4G LTE -> intermittent service for a few minutes (no service to few bars of 3G to no service) -> steady 3G/3.5G (HSDPA). I'm curious if in the time where I'm having intermittent service issues it somehow loses a steady connection? Either way I did what @micressor said and just disabled/enabled mobile data and it fixed the issue instantly. All my signal messages that were hung up on 1 check mark sent right away and I received the ones I missed.

@sigenc
Copy link

sigenc commented May 18, 2017

@KehrBehr Post a log pls

@ghost
Copy link

ghost commented May 18, 2017

@sigenc This good? It happened about 1 - 1.5 hours ago from the time of this post. If those time stamps are correct, then I think the issue happened during the time span of this logfile.

https://gist.github.com/d7ddf7339b300200b8b4a374479b8afd

@moxie0
Copy link
Contributor

moxie0 commented May 19, 2017

@KehrBehr Thanks, where in that log is the time period where you're not receiving messages? From the looks of the log you're connected and sending/receiving keepalives for most of the time.

@ghost
Copy link

ghost commented May 19, 2017

@moxie0 I'd say the whole transition of network types would be between the first first log (06:24:29.628) to around 06:40. I sent a test signal message to one of my contacts at 06:38, and it only gave the one check mark, so I waited a little, turned mobile on/off, and it gave me the double check mark right after that. Around this time I think I also received a message, probably around 6:35 or something, and I didn't receive it until I refreshed the mobile data.

I'll start keeping an eye on it more, and If I can, send another log where I can detail when things were actually happening.

Thanks!

@ghost
Copy link

ghost commented May 23, 2017

@moxie0 Just an update. The issue occurred again. The timeline is as below

06:32 - 06:38 > Intermittent data connection
06:39 > Sent signal message to friend A, only 1 check.
06:41 > Friend A responded. Did not receive response.
06:44 > Still one check on my sent message, refreshed data connection. Received response friend A sent.

After refreshing my connection my message now had the double check and i received the response message that was sent to me at 6:41. It seems the person I sent the Signal message to did receive the message when I initially sent it, but I didn't receive their response until a data refresh.

https://gist.github.com/9ea84f49598b61d297226a30e1654dae

@moxie0
Copy link
Contributor

moxie0 commented May 23, 2017

@KehrBehr Thanks but it looks like this log doesn't go back far enough =/

@ghost
Copy link

ghost commented May 23, 2017

@moxie0 Can you elaborate? The timestamps I was referring to are all listed in the log I posted. I submitted the log right after the issue occurred as well. What else should I be doing to generate a better log for your guys?

Edit: Also, may be unrelated, but I'm posting just in case it may be.

I just received a message half an hour late. I was sent the message around 14:00, and didn't receive it until around 14:34. The log seems to have some exceptions that were thrown.

https://gist.github.com/469798d6ba3d68260114c95239e4c5c5

@christianfl
Copy link

Can confirm the issue.

Wiped my phone, installed a ROM without Play services and installed the Signal web build. It asked me to turn off the battery optimization for Signal which I confirmed. So there is a permanent notification for the background service.

Somethimes I get no new messages and outgoing messages only show one checkmark. This often happens after changing from WIFI to 3G or the other way around. Most of the time, I only get new messages after force closing Signal and opening it again. The issue happens about every 1-2 days.

I have a debug log attached. I sent my last message at 14:14 (which can't be seen in the debug log). I got no new messages until I changed from 3G to WIFI and then opened Signal at 16:17. Then a few messages come (E.g. one from 14:25 and on from 15:50).

Complete Debug log:

https://gist.github.com/christianfl/81764d624765de2e601cc0955300e56b

Hope that helps a bit.

moxie0 added a commit that referenced this issue Jun 2, 2017
@moxie0
Copy link
Contributor

moxie0 commented Jun 2, 2017

I've added additional debug information for the next release. It looks like something is killing the thread, either the OS or some uncaught exception. I suspect the OS, but I've added an uncaught exception handler that should print additional info. Please post a debug log if this reoccurs after the next release, thanks!

@timotk
Copy link

timotk commented Jun 26, 2017

Same as #6447

@jeremy447
Copy link
Author

Any news with the additional debug information ?

@haffenloher
Copy link
Contributor

@Paviluf Looks like no one posted a log with the additional info yet (i.e. a log from Signal v4.7.0 or later).

@ghost
Copy link

ghost commented Jul 16, 2017

I know @moxie0 said he only added additional debug info, but I haven't been experiencing the problem since then. Not sure what has changed on my end, but I've been planning on uploading a log if the issue occurs. It just hasn't yet for me.

@lefcha
Copy link

lefcha commented Oct 2, 2017

I think the problem I have is related to this one so I'm not opening a new issue. This is on the latest Lineage OS 14.1 (Android 7.1.2) and latest version of Signal (4.10.12) from the website, as I don't have GApps and thus no GCM. The phone is a Motorola Moto G 3rd gen (2015).

What happens is that a few minutes after the phone is left locked, it's not possible to receive any messages or calls (I am testing this with another phone that has Signal from Google Play installed). If I pick up the phone, I then receive all the messages or lost calls, and they seem to have been sent a lot earlier (like minutes or hours) according to the time-stamps. Sometimes I receive the messages without picking up the phone, but a lot later like half an hour or more. The phone has a strong WiFi connection all this time.

From the logs it seems Signals keeps the web socket alive for some time, but then an exception happens, and Signal fails to reconnect or setup the web socket again, until I pick the phone, when it establishes a connection again. Here's the debug log from when I started Signal, and left the phone for a few minutes, but I can provide longer logs if needed:

https://gist.github.com/lefcha/9790f9be45158e167aed72c91a10fbb8

I have tried Signal on both the stock Android on this same phone, and on Lineage OS with GApps (so both Signal using GCM) and I didn't observe this behavior, as Signal seemed to receive the messages and calls instantly.

Also, something else, when I connected a USB cable to see the logs as they were printed with adb's logcat, Signal seemed to never lose the web socket connection, as it would manage to keep it alive continuously, and any messages or calls were received with no problems.

I have also set Signal to not be optimized for battery as suggested in the app, I have verified that myself, too, and also another application I have on the phone that keeps a connection open, the K9 Mail, which is setup to use an IMAP IDLE connection for being notified on new messages by the server. It seems to not have any problems and receives any new mail instantly (verified with Thunderbird on the desktop), but only after I also set it as not optimized for battery.

Hopefully, this issue is some bug somewhere, and not the default OS behavior, ie. something that can't be easily fixed...

@moxie0
Copy link
Contributor

moxie0 commented Oct 2, 2017

@lefcha Sounds a lot like Doze to me. We're definitely not doing anything in the app to detect device motion, only the OS would be doing that.

@lefcha
Copy link

lefcha commented Oct 3, 2017

@moxie0 I also was afraid that Doze would be the culprit, and I started looking for some option or way to completely disable Doze on Lineage OS, but...

I then tried to debug this more, after reading the following documentation and articles (no experience with Android development, so hopefully that's the right and latest documentation):

https://developer.android.com/training/monitoring-device-state/doze-standby.html
https://developer.android.com/about/versions/nougat/android-7.0-changes.html
https://www.bignerdranch.com/blog/diving-into-doze-mode-for-developers/

So I connected a USB cable, then I did adb shell dumpsys battery unplug otherwise Doze would not kick in due to the phone being charged. I also verified that Signal is in the white-list with adb shell dumpsys deviceidle whitelist and I can see there user,org.thoughtcrime.securesms,10071.

Then I stepped the phone in Light-Doze mode with adb shell dumpsys deviceidle step light, and adb shell dumpsys deviceidle get light confirmed it's in IDLE state. Signal seemed to be doing fine, so no exceptions, and I could see it sending keep alive every 55 secs or so. Also messages were received instantly from the device.

I then tried to step the phone in Deep-Doze mode by running adb shell dumpsys deviceidle step deep 4 times, so the phone was then in IDLE Deep-Doze (and OVERRIDE Light-Doze). Signal still seemed to be doing fine, like before, sending keep alive and receiving messages instantly.

I then opened the screen to get the phone out of the Doze mode, reset the battery, removed the cable, and left the phone alone. After 2 mins I sent a message to Signal from another phone, and the message was not received. I picked up the phone, message is received at that time, and from the logs I can see the java.io.EOFException I mentioned in my previous message when Signal failed to send keep alive.

One last thing I tried was to disable completely Doze mode on the phone with adb shell dumpsys deviceidle disable all, and then again unplug the cable, and leave the phone on the desk. After 2 mins I send a message from the other phone, and it was again not received by my phone. My thinking is that possibly Doze mode is not the cause of this...

So what goes here? Why does Signal work fine when the cable is connected and I'm debugging it (even though I am not charging it), and also works fine in both Doze modes, but it fails when I leave it on the desk without a cable plugged? And the fact that it seems to fail even when Doze is completely disabled on the phone, could it mean there's some other issue here?

Let me know if I can do something else to debug this further...

@dpapavas
Copy link

dpapavas commented Oct 9, 2017

I've looked into this, together with @lefcha and we've arrived at the following conclusions: The problem doesn't seem to be connected to doze per se, but with device sleep (assuming they're unrelated, as finding precise documentation seems to be difficult).

As can be seen in the documentation on SystemClock.uptimeMillis(), this clock stops ticking as soon as the device enters sleep (say because the user pushed the power button). Furthermore all timed thread blocking functions, such as Thread.sleep() and Object.wait() use this clock so that, as soon as the device falls asleep, all KeepAliveSender threads, which blocked via Thread.sleep() will never resume, until the device wakes up. This results to keep-alives being missed and the connection closing.

The results, as far as I can see, is that a minute or so after the device is put to sleep, the connection is lost and the socket is closed, throwing an EOFException soon afterwards. It is then replaced with a new socket and the cycle repeats. You'd expect any pending messages to arrive with a maximum delay of about a minute in that scenario, as the underlying sockets work, until they time out, but it seems like the connection is lost until the device wakes up, for some reason. I suspect it may have something to do with the Util.wait() call in readRequest, which would also hang as soon as the device enters sleep, but further investigation is required.

In any case, the situation seems to be the same with any other pure-Java (meaning not Android-specific) timing mechanisms I've tried (such as Object.wait(), ScheduledExecutorService, etc.), as they all rely on the same clock. We would need some way to block based on the elapsedRealtime() clock, but there seems to be nothing to make this possible. We have therefore turned to another alternative, which relies on the AlarmManager to schedule the keep-alives, which seems to work fine. We'll try it out for a couple of days and create a pull request so that it can be reviewed, if no problems arise.

Disclaimer: I have, personally, limited experience with Java, Android and the Signal codebase, so that some or all of the above may be inaccurate, or plain silly. Sorry in advance, if that proves to be the case.

dpapavas pushed a commit to dpapavas/Signal-Android that referenced this issue Oct 15, 2017
Added an AlarmManager alarm, to periodically "prod" the
SignalServiceMessagePipe, waking any threads that may be
stuck in blocking wait, while the device sleeping.

Fixes signalapp#6644
// FREEBIE
@dpapavas
Copy link

I have created a couple of PRs, with a proposed fix for the issue. Please refer to them for more details on both the problem and the proposed solution.

dpapavas pushed a commit to dpapavas/Signal-Android that referenced this issue Oct 28, 2017
Added an AlarmManager alarm, to periodically "prod" the
SignalServiceMessagePipe, waking any threads that may be
stuck in blocking wait, while the device sleeping.

Fixes signalapp#6644
// FREEBIE
dpapavas pushed a commit to dpapavas/Signal-Android that referenced this issue Jan 21, 2018
Added an AlarmManager alarm, to periodically "prod" the
SignalServiceMessagePipe, waking any threads that may be
stuck in blocking wait, while the device sleeping.

Fixes signalapp#6644
// FREEBIE
rkohrt pushed a commit to rkohrt/Signal-Android that referenced this issue Feb 3, 2018
@automated-signal
Copy link

GitHub Issue Cleanup:
See #7598 for more information.

@signalapp signalapp locked and limited conversation to collaborators Apr 3, 2018
dpapavas pushed a commit to dpapavas/Signal-Android that referenced this issue Apr 10, 2018
Use the WebSocketAlarm, to periodically wake the WebSocket and with
it, any threads that may be stuck in blocking wait, while the device
is sleeping.

Fixes signalapp#6644
// FREEBIE
dpapavas pushed a commit to dpapavas/Signal-Android that referenced this issue Apr 23, 2018
Specify sleep timing method for the websocket's keep-alive sender,
based on whether GCM is disabled.

Fixes signalapp#6644
// FREEBIE
dpapavas pushed a commit to dpapavas/Signal-Android that referenced this issue Jul 11, 2018
Specify sleep timing method for the websocket's keep-alive sender,
based on whether GCM is disabled.

Fixes signalapp#6644
// FREEBIE
greyson-signal added a commit that referenced this issue Sep 15, 2018
This also fixes the situation where we block group-leave messages,
preventing blocked contacts from leaving groups.

Fixes #7970

Also, this forced us to upgrade libsignal-service, which fixes the
websocket timeout issues. Thanks to @dpapavas!

Fixes #6644
greyson-signal added a commit that referenced this issue Sep 18, 2018
This also fixes the situation where we block group-leave messages,
preventing blocked contacts from leaving groups.

Fixes #7970

Also, this forced us to upgrade libsignal-service, which fixes the
websocket timeout issues. Thanks to @dpapavas!

Fixes #6644
ksander314 pushed a commit to ksander314/Signal-Android that referenced this issue Apr 8, 2019
This also fixes the situation where we block group-leave messages,
preventing blocked contacts from leaving groups.

Fixes signalapp#7970

Also, this forced us to upgrade libsignal-service, which fixes the
websocket timeout issues. Thanks to @dpapavas!

Fixes signalapp#6644
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.