Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect and recycle broken connections on non-GCM devices #8230

Closed
wants to merge 2 commits into from

Conversation

dpapavas
Copy link

@dpapavas dpapavas commented Sep 30, 2018

Contributor checklist

  • Xiaomi Mi 4, Android 7.1.2 (LineageOS 14.1, w/o GCM support)
  • Motorola Moto G3, Android 7.1.2 (LineageOS 14.1, w/o GCM support)
  • My contribution is fully baked and ready to be merged as is
  • I ensure that all the open issues my contribution fixes are mentioned in the commit message of my first commit using the Fixes #1234 syntax

Description

For a detailed description of the problem and proposed fix, see the accompanying PR signalapp/libsignal-service-java#62. I'm fairly certain that this fixes #7638, but the supplied log doesn't allow me to confirm this. I therefore didn't mention it in the commit message.

@dpapavas
Copy link
Author

This likely also fixes #7966 but again, the posted logs are inconclusive.

@Socob
Copy link

Socob commented Oct 1, 2018

Thank you so much for your continued work on this! Let’s just hope it doesn’t take months to get merged again.

(I apologize for making this “pointless” comment.)

@greyson-signal
Copy link
Contributor

FYI MessageRetrievalService is likely going to go through some changes during our process to move our targetSdk to 26. We won't be looking at this until that process is done. Thanks!

@dpapavas
Copy link
Author

dpapavas commented Oct 1, 2018

Thanks for letting me know. I know you have your schedule and priorities, but please consider reviewing the PRs before making any changes to MessageRetrievalService, so that the changes will take the functionality introduced by the patch into account (assuming it is merged at all of course). My changes shouldn't create any problems with respect to API level (in fact, I've added a special code path to address the deprecation of the CONNECTIVITY_ACTION broadcast in level 28), but if MessageRetrievalService is changed, it'll require another rewrite of this patch (which will be the third) plus the necessary testing, which is actually worse.

@greyson-signal
Copy link
Contributor

I'm going to be upfront and say that the highest priority for me is making sure existing things work on targetSdk 26. I'm on a tight timeline. I know you've put effort into this, and I want you to know that the effort is appreciated, we just can't look at it until afterwards. Who knows, maybe the underlying problem you're trying to address will be fixed along the way.

@mejo-
Copy link

mejo- commented Nov 9, 2018

@greyson-signal: just to let you know, the issues with dangling net sockets on non-GCM devices still exist with most recent Signal (4.30.2), so the changes related to targetSdk 26 didn't change anything here.

@dpapavas
Copy link
Author

I'm not sure if the "process to move the targetSdk to 26", that was keeping this PR on hold, is complete or not, but I'll soon be forced to upgrade my patched Signal app, which is based on this, so I rebased this PR on top of the latest sources. A lot seems to have changed; the MessageRetrievalService class is not there any more and there seem to be two message pipes now. I'll be honest: I haven't really investigated what prompted these changes, what purpose they serve and whether my patch would need to be changed somehow in view of them. I've just rebased, more or less mechanically and tested the result.

As it turns out, the two message pipes broke my previous patch so that now Signal probably doesn't work at all again on non-GCM devices. You seem to already have discovered that, as it's mentioned on #8402. The alarm functionality needed to keep the keep-alive pings going on non-GCM devices, as it is now, i.e. in the form of the RealtimeSleepTimer class, is not really in the best position to gracefully handle multiple pipes (or even one for that matter), but it was specifically requested that way by moxie0, so I've patched it as best I could. Now, it'll essentially use one alarm for both pipes, which should also avoid keeping the phone awake more than absolutely necessary.

All I can say about the PR now, is that after testing for a week or so on my phone, it seems to work well enough. I'm not sure if it works as well as it used to; perhaps it doesn't respond as quickly as it did before, but I haven't ever been left without a connection for more than a minute or so (probably thanks to the secondary recycle mechanism mentioned in signalapp/libsignal-service-java#62). I understand all this is less than ideal, but given past experience, I estimate the chances of this PR getting merged are rapidly approaching zero anyway, so I couldn't justify spending any more time and effort on this, than absolutely necessary. The fact that the issue this PR attempted to fix (#7638), has just been closed, even though the issue wasn't fixed, the patch that fixed it was, more or less, ignored and there was, in fact, a regression making Signal completely unusable on non-GCM devices again, only serves to confirm this impression.

@mejo-
Copy link

mejo- commented Dec 15, 2018

FYI MessageRetrievalService is likely going to go through some changes during our process to move our targetSdk to 26. We won't be looking at this until that process is done. Thanks!

@greyson-signal, would you mind writing a short comment on whether you intend to take a look now that the targetSdk migration is done?

@Diapolo
Copy link

Diapolo commented Dec 18, 2018

@greyson-signal This needs to be looked at ASAP, currently I often need to force-quit Signal to get new messages 👎. I'm running LOS 15.1 and my phone i GAPPS-free.

@simonschmeisser
Copy link

@greyson-signal any news?

@five-c-d
Copy link

five-c-d commented May 7, 2019

Folks, please stop pinging signal-foundation devs, Greyson needs to HAVE time in order to review any PRs, and every "+1 metoo" ping eats up a little more of that time...

yes, this is a painful situation for all, but please be constructive in helping mutually get where we all want to be: actual problem actually fixed

...which he cannot spare, he is under enough time-pressure already (Google always revising playStore rules and Whatsapp pouring on endless resources and internal Signal Foundation constraints and quite literally hundreds of other open github-issues, several of which are equally-or-greater in terms of severity/impact). Crash-bugs, for example: https://github.com/signalapp/Signal-Android/issues?q=is%3Aissue+sort%3Aupdated-desc+is%3Aopen+crash which are not merely an unhappy user-experience, but also a security-hole-waiting-to-happen (potentially at least ... if an adversary can send a crafted message that causes the crash in question, and parlay that into a remote arbitrary execution exploit, they would get access to all non-disappeared chat-history and all the on-device metadata and so on).

I feel your pain, this 8230 stuff is a supremely annoying bug for people that experience it: you have spent a lot of time and effort removing the stock ROM from your google-device to replace it with LineageOS or equivalent, and then spent a lot MORE time and effort removing playSvcs and gApps and whatnot from your device. Signalapp mostly-kinda-works, still, even on that highly-privacy-conscious type of extreme setup. But it is not really-actually-working, on certain flavors at least.

8230 pull-request is still open, but as you can see from the volunteer developer's final comment in December 2015, it is NOT ready-to-be-upmerged anymore. Indeed, "this branch has conflicts that must be resolved" now... it flat out cannot be upmerged in the present state, any more. This is an unfortunate situation for all concerned. Greyson cannot accept the PR at this point (making it pointless to code-review it again), Dpapavas is discouraged by the process-flaws, and endusers are unable to get their highly-privacy-conscious setups to reliably use the best-in-class private messenger, signalapp.

Instead of futilely commenting here in 8230, asking for an ETA or news or whatever, which are unlikely to ever occur as comments upthread make clear, please visit the discussion linked below, where an attempt to revive 8230 in a new form is being made. We need people that are impacted by the problem, to please download the April test-APK, and then upload a pair of debuglogs (one debuglog from the current signal4android codebase and another debuglog from the same handset whilst running the patched-APK that tries to solve the broken-connection problem). https://community.signalusers.org/t/call-for-testing-fixing-realtimesleeptimer-to-work-in-concurrent-threads/7189 You "should" be able to login there, the unofficial but highly-active forums, with your existing github credentials ...though at least one person said that single-sign-on was broken for them, a couple months ago?

I don't have a LineageOS-compatible handset, so I'm unable to personally verify the test-APK solves 8230, but I strongly suspect it may. If it doesn't quite, I also strongly suspect it can be improved to fully solve the bug. Every comment here in github on 8230 emails 700+ repo-watchers, people we want to keep watching the repo, since that is what helps prevent any serious privacy-flaws or security-flaws from getting pushed live. It also lowers the chances of serious bugs... but as 8230 clearly demonstrates, does not eliminate that possibility, especially on non-stock-ROMs and especiallyEspecially upon non-stock-sans-gApps-configurations thereof.

Point being, at the end of the day...

I want to see the problem solved, just like you (anybody reading all the way to the bottom of 8230 badly wants this fixed), but the WAY to solve the problem is to help push an active PR over the finish-line, please. Again == https://community.signalusers.org/t/call-for-testing-fixing-realtimesleeptimer-to-work-in-concurrent-threads/7189 , to which conversation dpapavas has already been pinged ("one ping only" just as the famous Sean Connery movie-line says)

@dpapavas
Copy link
Author

dpapavas commented May 7, 2019

Although I'm seeing the call for testing for the first time (probably since there's no mention of me), I did, in fact, see the related patch. Note that the patch in question is not the fix under discussion in this thread. It is a fix for a previous patch of mine, which, as predicted, was broken during the "move to targetSdk 26" mentioned here.

So, just to put things in perspective, the situation is: half a patch was applied after a year of efforts (not only to create it, but mostly to get it applied) then the other half was pending for long enough for the first half to get broken again. Whether this seems like something that could eventually lead to some sort of successful outcome is a matter of interpretation, but one thing can be stated more or less objectively I think: patches take pretty long to apply, so the more separate patches you have the longer you'll have to wait and the greater the chance that some other problem will have cropped up by then.

Now the fix in the call-for-testing link above is already bundled in the present PR. It was duplicated as a separate PR apparently because I "tried to patch too many things at once, so the patch is difficult to apply because there are a lot of changes". Well, while that is a legitimate view in general, under the present circumstances, described above, splitting up a fix into multiple PRs would seem counterproductive.

Furthermore, the patch applied fine when I created it more than four months ago and one of the reasons it was bundled into this PR, was that it would take significantly less effort on my part, effort which I expected to be wasted after the patch was ignored. I feel that four months are enough to conclude that I was right in my expectation. While it currently doesn't apply, I have made the necessary changes, but haven't bothered to push them here, as there's evidently no point. As far as I can see, the problem lies in the fact that the patches get ignored, not in that they don't apply after four months.

TL;DR As I have already stated, I can't see how one could reasonably expect the situation with Signal on non-GCM devices to ever improve, given the history (which includes more than just my efforts), but even assuming there is a way, I wouldn't think splitting up the required fixes into multiple, overlapping patches is a step in the right direction.

@five-c-d
Copy link

five-c-d commented May 7, 2019

but haven't bothered to push them here

Well, can you please push them somewhere? https://github.com/dpapavas/Signal-Android was last modified 2017. I will absolutely take a look, because I want this fixed.

as there's evidently no point. As far as I can see, the problem lies in the fact that the patches get ignored

Yes this is a process-problem. But it is not a people-problem, Greyson is not doing it to be rude, he is just, swamped and unable to test on this kind of config since his Signal-Foundation-provided test-handsets do not include a sans-gApps variant. (He doesn't have time to add such a test-handset either... he is already out of time.)

not in that they don't apply after four months

Yes, correct. I'm not saying you are to blame -- I'm just trying to get people to stop commenting here, asking for updates and ETAs, when those are not going to be forthcoming. Neither yourself nor Greyson is planning to move this 8230 forwards, at present. So no point in pinging a bunch of repo-watchers. (8230 is still open because plans might change, but I don't want it locked-and-limited-to-contributors due to thread-bumping.) https://github.com/signalapp/Signal-Android/blob/master/CONTRIBUTING.md#dont-bump-issues

even assuming there is a way, I wouldn't think splitting up the required fixes into multiple, overlapping patches is a step in the right direction

Well, I have reason to believe this scheme CAN be made to work -- in a nutshell that a series of small tight-focus PRs that each incrementally improves things will be easier for Greyson to review quickly/promptly -- but per above, I would rather explain myself over in signalUsers, since here on github I am emailing 700 repo-watchers, three times today :-)

Although I'm seeing the call for testing for the first time (probably since there's no mention of me), I did, in fact, see the related patch

Yes, that is what I meant by "one ping only" ... @theBoatman and I discussed whether to try and bother you in the forums or not, or to comment in 8230, or whatever, and decided you were probably discouraged enough that we should just ping you in the pr#70, just in case you had not lost all interest.

I can't see how one could reasonably expect the situation with Signal on non-GCM devices to ever improve

I have some ideas. Some of them are probably hair-brained ideas, which won't work in practice. But I think it can be done, in a way that makes life mutually comfy/productive/pleasant for both Greyson and also for volunteer programmers wanting to see their efforts upmerged. I will edit this comment with a link, later, once the new stuff being cooked up, is posted publicly.








edit: as of July 2019, the ideas are still half-finished, but perhaps not half-baked? ;-) You can read all about sigX+sigGesT, over in the signalUsers forum ==

I would strongly prefer not to talk here in 8230 (here is better -- github login credentials are accepted by the forum-software). This is partly, because general discussion is discouraged for the signalapp github repo -- but moreover, even discussion of sigX+sigGesT is mostly off-topic for this 8230 pull-request, as well. With the exception, that there is a "partial-fix-of-the-first-half" (as you call it @dpapavas in the next comment) baked into libsigsvc4j via myself on behalf of theBoatman, that is then compiled into sig4a along with a bunch of other eXperiments.

some thoughts on circumstantial situations

the problem lies in the fact that the patches get ignored...

Yes this is a process-problem. But it is not a people-problem...

casting it as a "process-problem" makes it seem like
it's just a matter of circumstance, or misunderstanding

There are two causes methinks: if the code in any given PR is complex or (even potentially) adds risk, it is time-consuming to code-review it.

Signal Foundation is just a small team (though larger now than in 2018), and have to ruthlessly triage their time -- THAT hard fact is not different than in 2018, for sure.

whether it's just too little time or [programmer teamsize]...
resources are always limited, making it necessary to establish priorities...
getting Signal to run without GCM is [apparently] not a priority

I would not disagree, that other things got higher priority assigned. The paid devs have priorities, and decide what they have time and devs to "officially support" ... plus at a higher level, Signal Foundation people have to decide how many devs to hire, and what projects to assign them unto. Where money is spent, basically. This does not happen in a vacuum though: signalapp is competing against billionaire hypercorps and major governments, who WOULD be very happy to see signalapp crushed and made helpless, PLUS are actively in pursuit of legislative and/or software-industry attack vectors to accomplish that.

So in a nutshell... I believe it is a matter of circumstances. We don't want Signal Foundation bloated with hundreds of paid devs, and thousands of paid SQA professionals: that is too much money, too much risk, it will result in a permanent gotta-fundraise-more situation developing, exceedingly dangerous to the long-term viability of Signal. But if the teamsize does NOT bloat up with a massive headcount... there will literally never be enough time to get sans-gApps-LineageOS and microG and boringPhone and librem5 and XPrivacyLua and all the other interesting-but-esoteric tools of the ultra-paranoid ultra-freedom-loving enduser properly working and properly tested. Unless we come up with some way to build a viable libre-community around signalapp again, that is. See above.

Anyways, sorry to reply to the NEXT comment, in my edit to THIS comment. And since I do really not want to keep cluttering 8230 with this meta-discussion about process-problems (or the meta-meta-discussion about whether there really IS a process-problem rather than just flat lack of interest :-) ...probably best if I just collapse this portion away.

p.s. Despite not wanting to discuss HERE in signalapp github-repos, I'm happy to chat further about these matters, I find them interesting. But since I don't wanna go offtopic in 8230 plus whilst emailing 700 repo-watchers every time I comment either. You can please get to me over on signalUsers if you like, via DM or via https://community.signalUsers.org/t/Quick-questions-on-sigX-sigGesT-all-q-a-welcome/8498 , or if those are not acceptable just ping me in my github.com/five-c-d repo by creating an issue there or something :-)

@dpapavas
Copy link
Author

Well, can you please push them somewhere? https://github.com/dpapavas/Signal-Android was last modified 2017. I will absolutely take a look, because I want this fixed.

I don't update the master branch, only the branches I've created for the PRs. I've just updated the netchangefix-2 branch, which is the base for the this PR. I've been using this for several days now and it seems to work well, although I don't bother to monitor it thoroughly any more.

As for the root of the problem, I never said Greyson is personally responsible somehow. Nevertheless casting it as a "process-problem" makes it seem like it's just a matter of circumstance, or misunderstanding. Whether it's a shortage of handsets (although that doesn't seem to be the case), or whether it's just too little time or manpower, the thing of the matter is, resources are always limited, making it necessary to establish priorities and although it wasn't Greyson, someone evidently decided, consciously, that getting Signal to run without GCM is not a priority. (Reviewing the history of the issue, one might be inclined to make the case that it's not even a matter of priorities, but that patches to make Singal work wihtout GCM are not welcome, but I don't wish to make that case, as it doesn't make a difference really.) One can plainly see that plenty of new features have been worked on and added to Signal during the past 6 months say, features that must have needed a lot of work. Reviewing and merging an already developed and thoroughly tested patch to make Signal work without GCM, would probably have been much less work than any one of them and the fact that it was never done evidently goes to show how low of a priority it really is. In fact, it doesn't seem to be worth the trouble to post an update on the status of the issue, in answer to the several people asking about it.

@stale
Copy link

stale bot commented Jan 26, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the wontfix label Jan 26, 2022
@stale stale bot closed this Feb 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

Network connection seems to get stuck (without Google services)
7 participants