
read invalid errors sent to prevent message drops #14 #43

Open · wants to merge 23 commits into master

Conversation

@kbrock (Contributor) commented Mar 12, 2013

Hi,

This is basically a reformatted version of #29, which was submitted to address #14. Looking closer, it looks similar to #21.

Glad so many people are solving the problem in a similar way.

I added some specs. Some of them felt like they were just going through the motions rather than really testing anything, but I'm not sure what you want to do about delegated methods anyway. I'm also not sure how close to 100% coverage you want to get.

Is this closer to what you want?

All of these solutions look like they could leverage your branch: https://github.com/grocer/grocer/tree/error_response

Thanks

def payload_too_large?
  encoded_payload.bytesize > MAX_PAYLOAD_SIZE
end

def truncate(field)
Member:

How, or where, is #truncate ever used?

NOTE: It is entirely possible I'm not yet awake enough to realize the answer.

Contributor Author:

You call this on the notification to ensure that the complete payload is not too large.

Member:

By "you", do you mean the client does? I can't find, nor do I recall, us doing this directly in Grocer.

Contributor Author:

lol, sorry. Yes, the client typically calls that on the payload to limit its length.
It was already in there; I fixed it and added some specs. I can remove it if you want.

Member:

@kbrock: It would be nice to have, but I think we should let everyone handle truncating their fields on their own.
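For illustration, a minimal sketch of the kind of client-side truncation being discussed (truncate_field and the byte budget are hypothetical, not grocer's API):

# Hypothetical client-side helper: trim a string field until its
# encoded byte size fits within a budget, e.g. whatever remains of
# MAX_PAYLOAD_SIZE after the rest of the payload. Drops whole
# characters, so multi-byte UTF-8 text is never split mid-character.
def truncate_field(field, max_bytes)
  field = field.dup
  field = field[0...-1] while field.bytesize > max_bytes
  field
end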

@stevenharman (Member):

Please see (and provide feedback on) my comment here: #14 (comment)

@kbrock (Contributor Author) commented Mar 14, 2013

@Aleph7 / @stevenharman

Is it possible to evaluate this pull request independently of the multi-threaded discussion?
Issue #14 seems best for discussing architectural choices.
Even if we do not go the single-threaded route, nailing down some of these implementation decisions can help all of the implementations in question.

Questions:

  1. Should Pusher#push always call the non-blocking select with errors? Basically, merge/remove Pusher#push_and_check.
  2. Do we want an object for the AppleResponse instead of the hash?
  3. Should we include the human-readable text version of the response code?

Any other issues?

Thanks
--kbrock

UPDATE: ErrorResponse has been pulled out and is in master

@kbrock (Contributor Author) commented Mar 15, 2013

Completely rewrote the client reading code to introduce objects and to be more similar to the IO.select solutions proposed by the others.

@@ -26,6 +26,10 @@ Gem::Specification.new do |gem|
  gem.require_paths = ["lib"]
  gem.version = Grocer::VERSION

  if RUBY_PLATFORM =~ /java/
    gem.add_development_dependency "jruby-openssl"
  end
Member:

I think this only runs at build time, so we would have to build a separate grocer-jruby gem or something if we wanted to use this.

Contributor Author:

Agreed.
And to be honest, if the user is running JRuby, they already include this gem.

I added it because the Travis build fails on JRuby without it.

And agreed: creating two different versions of the gem is a pain.
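For what it's worth, one common workaround (sketched here as an assumption, not something this PR does) is to declare the dependency in the Gemfile instead, which is evaluated at install time on the user's actual platform:

# Gemfile -- evaluated on each `bundle install`, so the condition
# reflects the installing Ruby, unlike a conditional in the .gemspec,
# which is frozen at gem build time.
platforms :jruby do
  gem "jruby-openssl"
end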

@stevenharman (Member):

@kbrock, re: your above questions:

  1. Should Pusher#push always call the non-blocking select with errors? Basically, merge/remove Pusher#push_and_check.

Is there any good reason not to always push and check? @vanstee?

As for questions 2 and 3, we now have an abstraction for the error response, and it decodes the status into human-friendly text, so we're good there.

@vanstee (Member) commented Mar 28, 2013

@stevenharman The only reason I can think of is design-related. Do we want to read and handle error responses in between sending notifications, or do we want to make a blocking read in another thread and isolate the error-response-related code there?
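For illustration, a rough sketch of the first option, checking for an error response in between sends (ssl_socket and handle_error are assumptions, not grocer's actual API):

# Sketch: after each write, poll the socket with a zero timeout so the
# common no-error case does not block.
notifications.each do |notification|
  connection.write(notification.to_bytes)
  if IO.select([ssl_socket], [], [], 0) # readable only if Apple replied
    packet = ssl_socket.read(6)         # APNS error responses are 6 bytes
    handle_error(packet)                # hypothetical: parse and react
  end
end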

def can_read?(timeout=nil)
  return unless connected?
  if timeout
    read_sockets, _, _ = IO.select([@ssl], [], [@ssl], timeout)
Member:

In the event timeout is 0, this effectively becomes a non-blocking select, right? I mean, yes, it will block, but the timeout will immediately kick in if there is nothing to read - so it's a very tiny block.
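A quick standalone way to see this behavior, using a local socket pair rather than an APNS connection:

require 'socket'

r, w = UNIXSocket.pair

# Nothing buffered yet: a zero timeout makes IO.select return nil
# immediately instead of blocking.
IO.select([r], [], [], 0)  # => nil

w.write('x')
IO.select([r], [], [], 0)  # => [[r], [], []]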

Member:

I thought I saw someone doing a select on the connection for both reading and writing without a timeout. That way you would always return because the connection should always be writable, but sometimes you'll also get the connection returned in the readable array too. I don't really know much about how that works. Thoughts?

Contributor Author:

Huh, that is a bug.
It should be passing in nil, not 0. I think 0 will block forever.

I had this before:

IO.select([@ssl], [@ssl], [@ssl], timeout)

I changed it to:

@ssl.pending > 0

I think they do the same thing, but I'm unsure. I can change back to the IO.select version.


Why not just use read_nonblock instead of checking can_read? and then reading?

Member:

My understanding is that a [timeout](http://en.wikipedia.org/wiki/Select_(Unix)) of zero will cause the select(2) (which is what ultimately gets called) to time out immediately if there is nothing to read. This is effectively like a read_nonblock, except you don't need to rescue from the exception in the event it can't read.

Of course, I could be wrong here.
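For comparison, a sketch of what the read_nonblock approach requires (socket stands in for the SSL connection):

# read_nonblock raises when no data is available, so callers must
# rescue IO::WaitReadable; IO.select with a zero timeout just returns
# nil in the same situation.
begin
  packet = socket.read_nonblock(6)
rescue IO::WaitReadable
  packet = nil # nothing to read right now
end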

Contributor Author:

I needed to add a special method for read that doesn't connect to a closed server, so that suggests adding something like this.

@Aleph7 I was concerned that it would only read back 4 of the 6 bytes from the wire.

Contributor Author:

Darn:

require 'io/wait'

f = File.open('CHANGELOG.md')
f.close

IO.select([f], [f], [f], 0) # => error
f.ready?                    # => error
f.read_nonblock(6)          # => error

(I found ready? on Stack Overflow.)
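So any of these calls has to be guarded against a closed IO; a minimal sketch (readable? is a hypothetical helper, not grocer's API):

# Sketch: IO.select, #ready?, and #read_nonblock all raise IOError
# once the IO is closed, so check closed? first.
def readable?(io, timeout = 0)
  return false if io.nil? || io.closed?
  !!IO.select([io], [], [io], timeout)
end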

@stevenharman (Member):

@vanstee We're kind of skating around the threading issue in this pull request. The thinking is: we can make things better right now just by reading the errors, and then go for thread-safety in follow-up changes.

Given that, the current default is to NOT block (or at least to block for a very small time, where that time is 0). @kbrock, am I correct on this?

@kbrock (Contributor Author) commented Mar 28, 2013

@stevenharman / @vanstee

There is so much code in common between the single-threaded and multi-threaded solutions.
This pull request may be in the context of a single-threaded implementation, but the intent is to find the nuggets in common and at least get those into the code base.

E.g., the ErrorResponse class that you wrote: regardless of the single/multi-threaded decision, it is needed, so we got it into master.

Optimally, someone will say something like "I like the RingBuffer; let's get that out of here and into a pull request," we'll hash it out (e.g., make it thread-safe), and get it into master.

And then I'll reduce this pull request / rebase it to hell for the 4th(?) time - rinse and repeat.

@vanstee (Member) commented Mar 28, 2013

@kbrock Yeah, small common pieces are going to be the easiest way to move toward an eventual solution.

Just want to take a second again to say thank you so much for working on this! ❤️

@kbrock (Contributor Author) commented Mar 29, 2013

@stevenharman / @vanstee

I am tempted to tear Connection out of my code and just link Pusher directly to SSLConnection.

The retry logic on read is not right - we don't want retry logic there, and we don't want to connect for it.

The retry logic around write closes the connection (without reading an error off of it), which we don't want.

If there is an error on write, most of the time we want to surface that up to the Pusher so it can hit the history or something.

But it sounds a little dangerous to me, and I'm not sure why. Thoughts?

http://redth.info/the-problem-with-apples-push-notification-ser/

@thedaniel:
I'm very interested in using this library, but the bug in #14 is a blocker for me. What's the status of this PR? If I'd like to use grocer in the near future, am I better off hacking in a fix in my own fork, waiting on this branch to land, or building the gem from another branch here / in someone else's fork?

@thedaniel:
ping?

@schmidp commented May 19, 2013

I just pushed my own implementation, based on the ping4v3 branch by @kbrock.
Check it out here: https://github.com/openresearch/grocer/commits/openresearch

It basically returns four arrays to the user of grocer when calling pusher.push:

  • sent: notifications that have been sent
  • maybe_sent: notifications that might have been sent (but, because of Apple's interface, we have no way of knowing whether they were really sent)
  • failed: notifications that we got an error response for (currently this array can contain only one notification)
  • not_sent: notifications that were not sent because they came after a notification that we got an error response for

pusher.push now also expects an array of notifications to be sent, rather than each notification individually.

After the last notification is sent, it waits 5 seconds to see whether Apple returns an error message. If no data is received within the 5 seconds, we assume that all notifications have been sent.

It worked pretty well in our first production tests.
Here is an example using this code:

https://gist.github.com/schmidp/5608224

I prefer the idea of letting the user of grocer handle retries, rather than grocer doing so itself (as @kbrock's branch does).

I would be happy to get some feedback.

@kbrock (Contributor Author) commented May 20, 2013

@schmidp Thanks. I'll look into that; please also look at some recent changes to that branch.

For our purposes, we listen via EventMachine (the rabbitmq gem) and every 5 seconds call the idle method check_and_retry.

After being idle for 10 seconds, we call prune, since no notifications will be waiting on Apple for that long.

I'm torn on the retry work. There is a basic retry for network blips, but there are also more serious retries, like when the internet or Apple has an outage. We currently do that work outside grocer, and I'm looking for a good way to solve it too.

@kbrock (Contributor Author) commented May 20, 2013

@schmidp, are you seeing bad notifications come back on your branch?

For me, Apple did not notify me of the problems until a dozen messages later. Often, this was while a different batch of messages was being sent out.

My first attempt blocked after sending a group of messages out, and this worked great - if you are willing to take the hit. It simplifies things, as you have all the messages in an array right there and can easily resend them.

Once you are no longer willing to block after sending, it gets complicated, as you need to keep state across method calls (hence the History class).
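For context, a minimal sketch of the state such a History class has to keep (the names and structure here are assumptions based on this thread, not the PR's actual code):

# Sketch: remember recently sent notifications by identifier so that,
# when Apple reports an error for identifier N, everything sent after
# N can be found and resent.
class History
  def initialize(max_size = 100)
    @max_size = max_size
    @sent = [] # [identifier, notification] pairs, oldest first
  end

  def remember(notification)
    @sent << [notification.identifier, notification]
    @sent.shift if @sent.size > @max_size # behave like a ring buffer
  end

  # Notifications sent after the one Apple rejected.
  def after(identifier)
    index = @sent.index { |id, _| id == identifier }
    index ? @sent[(index + 1)..-1].map { |_, n| n } : []
  end
end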

@schmidp commented May 20, 2013

Yes. We had sandbox tokens mixed up with production tokens in our database, and our fork handled the errors very well.

Our grocer fork quits sending messages after the first error it experiences and returns the lists of sent, maybe-sent, failed, and unsent notifications.
We currently use a loop to send messages until the unsent array is empty, deleting invalid tokens from the database before calling the pusher with the unsent messages again.

E.g.:

unsent = all_my_notifications_i_want_to_send
while unsent.count > 0
  sent, maybe_sent, failed, unsent = pusher.push(unsent)
  failed.each do |f|
    # delete f.device_token from the DB if f.error_response.status_code == 8
  end
end

failed currently always contains no more than one notification, as we return to the user on every error. The notification in failed has the corresponding error response as an attribute.

If there is no error, we make a blocking read for 5 seconds after the last notification has been sent.
We always send notifications in worker processes, so the 5 seconds shouldn't be a problem.

I also thought about sending an invalid token on purpose as the last notification; this way the final blocking read would only block a few hundred milliseconds.
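A sketch of that trick against this fork's four-array push interface described above (the zeroed device token and the cleanup step are assumptions):

# Sketch: append a deliberately invalid notification so Apple responds
# with an error right away, bounding the final blocking read.
sentinel = Grocer::Notification.new(
  device_token: '0' * 64, # intentionally bad token
  alert:        'ignored'
)
sent, maybe_sent, failed, unsent = pusher.push(notifications + [sentinel])
failed.reject! { |n| n.equal?(sentinel) } # the sentinel's error is expected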


@iwarshak:
@kbrock If I am willing to take a performance hit like you mentioned, blocking after a send, which branch of yours should I use?

My use case is that I am not sending a lot of push notifications, so I definitely want to give each one the best chance of being received (i.e., not dropped because of a bad token in the batch).

Thank you
