Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resolve potential data race in replaceNotification on batched edit notifications? #5764

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

donaldguy
Copy link

First time contributor checklist

Contributor checklist

  • My commits are rebased on the latest main branch
  • My commits are in nice logical chunks
  • My contribution is fully baked and is ready to be merged as is

... to the best of my knowledge

  • I have tested my contribution on these devices:

I have not tested changes because the situation is a hypothetical and/or difficult to reproduce data race.

That said, whereas the whole effect of the change is re-ordering two occurrences of 2 sequential lines of code to an order more logical, I would expect this to:

  • be a hypothetical-to-practical improvement in the best case and
  • a no-op in the average-to-worst-case no-op

UNLESS a more experienced iOS programmer knows a specific reason for the extant order to be preferred

(and if that is the case, I would suggest that a // XXX: comment or similar is in order to explain the less-intuitive order)


Description

It seems presently possible that a notification can fail-to-cancel & end up left [near-]duplicated instead, as a result of precise unfortunate timing wherein:

which appear presently in that order, in error?

I have not been able to come up with any1 reasoning for this order to be preferred.

Context suggests that previous touching authors (in de61926 and af6ff96 ) errantly believed2 order to be irrelevant (owing to claims of synchronous action that are not met by the underlying iOS API)

My suspicion there could be a good, unspecified reason for this order obscure to me is further weakened by noticing that the analogous calls to removeAll{Delivered,Pending}Notifications happen in [this PR's / the intuitively-safer] pending-then-delivered order.

That has been the case for the removeAll* calls since introduction to present, to wit:

    as introduced in 1bfe691
    Because of the many files involved, and github's collapsing thereof, it seemed worth detailing my blame-findings that

    (^this author hasn't touched the withIdentifiers: lines where they live now—and I thought that represented a preference matching my intuition w/o contradiction—

    but *sigh* in fact, I've belatedly realized they did introduced mismatched delivered-then-pending usage as cancelNotificationonly 23 lines up same file in same commit 🤔

  • left this (inconsistent) way during aforementioned refactor of af6ff96
    diff --git a/SignalMessaging/Notifications/UserNotificationsAdaptee.swift b/SignalMessaging/Notifications/UserNotificationsAdaptee.swift
    index 2aa55ea5129..eff43089277 100644
    --- a/SignalMessaging/Notifications/UserNotificationsAdaptee.swift
    +++ b/SignalMessaging/Notifications/UserNotificationsAdaptee.swift
    [… snip …]
    @@ -391,14 +390,14 @@ class UserNotificationPresenterAdaptee: NSObject, NotificationPresenterAdaptee {
                 return
             }
     
    -        notificationCenter.removeDeliveredNotifications(withIdentifiers: identifiersToCancel)
    -        notificationCenter.removePendingNotificationRequests(withIdentifiers: identifiersToCancel)
    +        Self.notificationCenter.removeDeliveredNotifications(withIdentifiers: identifiersToCancel)
    +        Self.notificationCenter.removePendingNotificationRequests(withIdentifiers: identifiersToCancel)
         }
     
         // This method is thread-safe.
         private func cancelNotificationSync(identifier: String) {
    -        notificationCenter.removeDeliveredNotifications(withIdentifiers: [identifier])
    -        notificationCenter.removePendingNotificationRequests(withIdentifiers: [identifier])
    +        Self.notificationCenter.removeDeliveredNotifications(withIdentifiers: [identifier])
    +        Self.notificationCenter.removePendingNotificationRequests(withIdentifiers: [identifier])
         }
     
         // This method is thread-safe.
    @@ -423,8 +422,8 @@ class UserNotificationPresenterAdaptee: NSObject, NotificationPresenterAdaptee {
         // This method is thread-safe.
         func clearAllNotifications() {
             pendingCancellations.removeAllValues()
    -        notificationCenter.removeAllPendingNotificationRequests()
    -        notificationCenter.removeAllDeliveredNotifications()
    +        Self.notificationCenter.removeAllPendingNotificationRequests()
    +        Self.notificationCenter.removeAllDeliveredNotifications()
         }

In particular,

I believe this race may on occasion be actually occurring when e.g. multiple edits to the same message are being (belatedly) batch processed back-to-back

(after re-establishment of interrupted network connection;

or possibly mere sleep and resume of envelope processing per iOS Background App refresh)


(Except for) hoisting this out for emphasis —and also because apparently the callout parsing is suppressed inside a <details> even where other markdown is processed (and I don't think there is even effective html option to get it back?)

Important

In particular, a friend with whom I frequently converse informed me that this claim [to wit "edits… don't generate push notifications ... at least"] was wrong that they were:

99% sure I've gotten pushes of you editing though -- the same message showing up a few times in a row (same preview text)

... This, with foreknowledge of further investigation suggests that this race (or similar, such as Apple simply not completing requested cancellation for other reasons) is occurring, not just hypothetical?

I went ahead and squashed this into a <details> once I realized how already also long my (originally a ) TL;DR had become.

But it should answer any questions of why I am proposing this (outside of ~mismatch to intuition)

Full Motivation & relevant code path

Oops I was push-spamming actually…

I am an inveterate serial-editor. Both, as the feature is probably mainly intended, for fixing typos and such. AND as a matter of more dramatically rephrasing or expanding yet-unread messages before recipient's first read receipt fires

This last practice was one undertaken intentionally under presumption that (as informed by experience with e.g. Discord, Slack, Facebook-comments, reddit comments, etc) by making my expansions thusly I was avoiding spamming recipient with additional push notifications

(if they were not looking at Signal, but within earshot of device or had it in pocket, etc)

Earlier today, I learned this was (sort of) not true! 😱

<part that="is important-ed above">
In particular, a friend with whom I frequently converse informed me that this claim [to wit "edits… don't generate push notifications ... at least"] was wrong that they were:

99% sure I've gotten pushes of you editing though -- the same message showing up a few times in a row (same preview text)

... This, with foreknowledge of further investigation suggests that this race (or similar, such as Apple simply not completing requested cancellation for other reasons) is occurring, not just hypothetical?
</part>

Embarrassed at being (even) noisier than I thought I was, I decided to dive down this rabbit hole.

I discovered that indeed, the logic is explicitly to generate notifications only for edits of unread but previously delivered messages (in opposition of my intuition) here:

// Only update notifications for unread/unedited message targets
let targetEditState = incoming.message.editState
if targetEditState == .latestRevisionUnread || targetEditState == .none {
self.notificationsManager.notifyUser(
forIncomingMessage: message,
editTarget: incoming.message,
thread: thread,
transaction: tx
)

(assuming indeed that multiple dispatch makes this the path for the definition from L1591 always taken into handleIncomingEnvelope ; as it stands I do not understand when that from L1535 would instead be run - but whereas it does not clearly generate notifications under any circumstance, it is not at issue?)

…or was I?

(yes, but maybe not as bad as I briefly thought)

Now my embarrassment was somewhat relieved when I looked into the rest of the origin commit (66d7592) for that block / the present main pieces of that code path

and discovered:

1. at least I'm spamming quietly?

Stepping directly in from that call to notifyUser, then skipping over the first 117 lines that follow (including early descent into relevant notifyUserInternal), we come eventually on:

performNotificationActionAsync { completion in
let sound = (editTarget != nil) ? nil : self.requestSound(thread: thread)
let notify = {
self.presenter.notify(
category: category,
title: notificationTitle,
body: notificationBody,
threadIdentifier: threadIdentifier,
userInfo: userInfo,
interaction: interaction,
sound: sound,
completion: completion
)
}

with line 686's suppression of noise for this notification at least... maybe4 comforting

and

2. the code clearly intends to avoid doing notification-spam

Picking up from the next line and continuing to end of function:

if let editTarget {
self.presenter.replaceNotification(messageId: editTarget.uniqueId) { didReplaceNotification in
guard didReplaceNotification else {
completion()
return
}
Self.notificationQueue.async {
notify()
}
}
} else {
notify()
}
}
}

its fairly clear that the intention is explicitly to replace the notification with one reflecting any edit (ironic in my particular case where only rarely has what prefix would fit in the push's preview actually changed) and not to generate an additional redundant notification

in particular things seem to prefer to drop (re-)queueing a revised notification entirely over creating duplication (under conditions where replaceNotification doesn't, to its knowledge, succeed - by for example failing to locate a previous version of the message

(as might occur if e.g. user dismissed it but hasn't opened Signal? - in which case it is especially correct to not seemingly "grow back")

... I'm pretty sure, at least

I am inexperienced enough with swift in particular that I'm a little uncertain as to what calling completion() inside the guard's else will do in practice

best as I can tell it will just effect a jump/co-routine-yield to

making that call no more than a ~cleanup / continuation-passing-style-return (before the stack pop return on the next line)

I'm a bit befuddled by why this is written this way, and my second guess is that the completion in proceeding this block makes this some sort of recursive binding making that call a re-entrant retry (but were that the case, I see nothing to stop this from becoming, under-tail-call-optimization, an infinite loop; and it seems contradicted by the call of completion also deeper at end of the happy path

redundancy "should" be impossible

Taking a gander at replaceNotification:

func replaceNotification(messageId: String, completion: @escaping NotificationReplaceCompletion) {
getNotificationsRequests { requests in
let didFindNotification = self.cancelSync(
notificationRequests: requests,
matching: .messageIds([messageId])
)
completion(didFindNotification)
}
}

we are given something of a false assurance by the name cancelSync, a function which

proceeds for a hefty-ish 66 lines,

private func cancelSync(
notificationRequests: [UNNotificationRequest],
matching cancellationType: CancellationType
) -> Bool {
let requestMatchesPredicate: (UNNotificationRequest) -> Bool = { request in
switch cancellationType {
case .threadId(let threadId):
if
let requestThreadId = request.content.userInfo[AppNotificationUserInfoKey.threadId] as? String,
requestThreadId == threadId
{
return true
}
case .messageIds(let messageIds):
if
let requestMessageId = request.content.userInfo[AppNotificationUserInfoKey.messageId] as? String,
messageIds.contains(requestMessageId)
{
return true
}
case .reactionId(let reactionId):
if
let requestReactionId = request.content.userInfo[AppNotificationUserInfoKey.reactionId] as? String,
requestReactionId == reactionId
{
return true
}
case .missedCalls(let threadUniqueId):
if
(request.content.userInfo[AppNotificationUserInfoKey.isMissedCall] as? Bool) == true,
let requestThreadId = request.content.userInfo[AppNotificationUserInfoKey.threadId] as? String,
threadUniqueId == requestThreadId
{
return true
}
case .storyMessage(let storyMessageUniqueId):
if
let requestStoryMessageId = request.content.userInfo[AppNotificationUserInfoKey.storyMessageId] as? String,
requestStoryMessageId == storyMessageUniqueId
{
return true
}
}
return false
}
let identifiersToCancel: [String] = {
notificationRequests.compactMap { request in
if requestMatchesPredicate(request) {
return request.identifier
}
return nil
}
}()
guard !identifiersToCancel.isEmpty else {
return false
}
Logger.info("Removing delivered/pending notifications with identifiers: \(identifiersToCancel)")
Self.notificationCenter.removeDeliveredNotifications(withIdentifiers: identifiersToCancel)
Self.notificationCenter.removePendingNotificationRequests(withIdentifiers: identifiersToCancel)
return true

... but as best as I've noticed is up until the last 2 lines, reordered in this PR, simply staking out early returns for various edge cases, which reifying an edit is not

meanwhile the simpler function, cancelNotificationSync

which is:

  • only these two lines of body
  • after a log ... which is, of note, Logger.warn to cancelSync's Logger.info
  • immediately follows cancelSync in the file
  • is also flipped over in this PR

seems to only be called here to nope out of notify() in case of replacingIdentifier - which in turn happens only in 2 cases:

  • presentIncomingCall
  • presentMissedCall
    ... I can't say I understand exactly what that path is, but its not clear it needs its own abbreviated entry point directly to these 2 lines ... but it has it

is proceeded with a reassuring comment // This method is thread-safe.

...except for the lies (asynchronous system API)

So, I think it's clear that these functions basically claim and their callers expect that by return from cancelSync the notification has been canceled.

But these are synchronous only so far as Signal has control of the situation.

Take a look at the docs for the functions and you'll find:

It's thus clear these are very much not synchronous removals

...and the blind spot (data race)

While it would not shock me if in some cases Apple simply decides that a requested removal is not actually very important (to the end at least of letting any error stand without retry, and definitely no (facilitation for) notification of the caller; but perhaps also like - battery save state, CPU load etc),

if that is not the case (and apple will be very diligent) this mismatch of cancelSync( named code and async-proceeding reification shouldn't actually ever result in redundant notifications like my friend reported

but with the current order of calls, it does seem plausible that you could slip through the situation where:

  • when you first call removeDeliveredNotifications, that there "The method executes asynchronously, returning immediately and removing the specified notifications on a background thread." still-so-far accurately describes the situation (cause in fact the notification is still pending, never delivered)
  • but as/by-the-time you second call removePendingNotificationRequests, its now the case that "the trigger condition for that request has already been met, [so] this method ignores the identifier" (that you are now talking about a delivered notification, never canceled)

That seems... like a tight timing constraint to come out that way, so I'm kinda still inclined to think maybe sometimes Apple de-prioritizes such a retraction to filth, but ... as mentioned top level, the batch-processing scenarios (e.g. upon network reconnect with multiple pending edits to same message) seem at least plausible

but maybe only if you busywait?

If in fact it's all about the data race, then this (grossly over-explained) swapping of lines ought to suffice to eliminate it. And I don't think it could hurt regardless (though again, a more experienced iOS programmer may disagree for reasons yet obscure)

But if the "lazy Apple" scenario is on the table, or if indeed you just really want to deliver on the Sync promise of the cancel function names (and the past tense of the did in didReplaceNotification), then I think you need to actually check that the notification is gone now

as a busy-wait loop over (ideally 1 but possibly multiple) calls to getDeliveredNotifications(completionHandler: ) and an O(N) traversal of the present set ; ditto maybe getPendingNotificationRequests(completionHandler:) (or you can perhaps wait for it to be first delivered, since I don't think Signal is in the habit of scheduling in the later-than-ASAP future?)

This feels ... wasteful, so I understand why you don't already do it (beyond possibly incorrectly assuming its unneeded)

but maybe there are other options?

falling out of the bottom of the <details>

a better way?

by the time you've reached cancelSync, you already did getNotificationsRequests which did exactly one turn each of those O(N) loops over getDeliveredNotifications(completionHandler: ) and getPendingNotificationRequests(completionHandler:)

you are nominally holding a UNNotificationRequest, which might be for a genuinely pending notification, or may have been .requested off of a bonafide UNNotification briefly held in passing

either way, you proceed to (peek at .content.userInfo to differentiate edge cases, but by the time we reach an attempted cancel), reduce down to the NSString that is .identifier

targeted observation?

you held a reference to the actual notification (request) object.

Idk if in modern iOS this is a pass-by-value clone or literally a pass-by-reference handle/delegate on the notification as it exists presented in Notification Center

broadly my impression of Apple's approach in the past is that it might be the latter? (though conformance to `NSCopying may say no?)

So is there something in that that is observable? Does anything happen to it if you had a handle and the async piece of the remove*(withIdentifiers: ) have completed? Perhaps (but perhaps not):

  • the .date of UNNotification gets blanked or set to epoch?
  • the object might be scheduled for GC ... except whereas you (in theory) have a handle to it? Is there a way to hold a WeakRef and await a disappearance?

These sorts of things could at least avoid repeated O(N) fetch for a "make sure its actually gone"; also could potentially duck any busy waiting if there's something there (or indeed up on the UNUserNotificationCenter for NSKeyValueObserving to target more actively?)

differentiated handling?

.trigger

The wording is confusing, but it would appear possibly that the .trigger of the request coming into cancelSync actually should tell you whether you need to call removeDeliveredNotifications OR removePendingNotificationRequests about it, yes?

per that page's

For notifications that the system has delivered, use this property to determine what caused the delivery to occur.

Idk if that means it actually changes or just you know ... is still there

But also regardless the flattening between the two happening in getNotificationsRequests doesn't have to happen per se

.content

As re my specific concern about excess, and indeed potentially unnecessary re-delivery for edits, there is definitely a chance to look into the body of the notification to see whether the edit actually changed it? (presuming either that there is actually any truncation of what's given there, or a windowing heuristic is applicable?)

actual modification?

not for nothing, and I'm sure there are relevant limitations and/or overhead but I do notice that UNMutableNotificationContent is a thing

as well as this whole system(s): https://developer.apple.com/documentation/usernotifications/modifying-content-in-newly-delivered-notifications / https://developer.apple.com/documentation/usernotifications/unnotificationcontent/updating(from:)

so it seems possible that in place modification is not out of the question?

Footnotes

  1. outside of code style preference for alphabetical order iff its equivalent

  2. They might also have known it to be irrelevant due to other features of previous versions of the code since removed. I didn't find those, but I didn't do an exhaustive search and commit messages perhaps hint so

  3. though e.g. github.com/blob/${fullsha}~1/Signal/… is a valid URL, it apparently fouls whatever regex or parser that grants inline-embed rendering to permalinks ( and as must be pretty strictly /[0-9a-f]{n,m}/ as it would not be tricked by/tolerate urlencodes [^ = %5e or ~ = %7e] even with tricksy commensurate reduction of sha prefix-length). Granted idk that the renderer wouldn't correct back to showing 3123842 even if it did work per these urls

  4. whereas my friends are mostly phone-always-on-silent types, I hope, and have yet to comfort myself with tests or more code reading, that "silence" extends to vibration/haptics and e.g. popping up on an apple watch, vibrating wrist.

@donaldguy donaldguy changed the title resolve potential data race in cancelNotifications on batched edit notifications? resolve potential data race in replaceNotification on batched edit notifications? Feb 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant