
read invalid errors sent to prevent message drops #14 #43

Open · wants to merge 23 commits into master

Conversation

@kbrock (Contributor) commented Mar 12, 2013

Hi,

This is basically a reformatted version of #29, which was submitted to address #14. Looking closer, it looks similar to #21.

Glad so many people are solving the problem in a similar way.

I added some specs. Some of them felt like they were just going through the motions rather than really testing anything, but I'm not sure what you want to do about delegated methods anyway. I'm also not sure how close to 100% coverage you want to get.

Is this closer to what you want?

All of these solutions look like they could leverage your branch: https://github.com/grocer/grocer/tree/error_response

Thanks

def payload_too_large?
  encoded_payload.bytesize > MAX_PAYLOAD_SIZE
end

def truncate(field)
Member:

How, or where, is #truncate ever used?

NOTE: It is entirely possible I'm not yet awake enough to realize the answer.

Contributor Author:

You call this on the notification to ensure that the complete payload is not too large.

Member:

By "you", do you mean the client does? I can't find, nor do I recall, us doing this directly in Grocer.

Contributor Author:

lol, sorry. Yes, the client typically calls that on the payload to limit its length.
It was already in there; I fixed it and added some specs. I can remove it if you want.

Member:

@kbrock: It would be nice to have, but I think we should let everyone handle truncating their fields on their own.
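For illustration, a minimal sketch of the kind of client-side truncation being discussed (truncate_field and the byte budget are hypothetical, not grocer's API):

# Hypothetical client-side helper: trim a string field until its
# encoded byte size fits within a budget, e.g. whatever remains of
# MAX_PAYLOAD_SIZE after the rest of the payload. Drops whole
# characters, so multi-byte UTF-8 text is never split mid-character.
def truncate_field(field, max_bytes)
  field = field.dup
  field = field[0...-1] while field.bytesize > max_bytes
  field
end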

@stevenharman (Member):

Please see (and provide feedback on) my comment here: #14 (comment)

@kbrock (Contributor Author) commented Mar 14, 2013

@Aleph7 / @stevenharman

Is it possible to evaluate this pull request independently of the multi-threaded discussion?
Issue #14 seems best for discussing architectural choices.
Even if we do not go the single-threaded route, nailing down some of these implementation decisions can help all of the implementations in question.

Questions:

  1. Should Pusher#push always call the non-blocking select with errors? Basically, merge/remove Pusher#push_and_check.
  2. Do we want an object for the AppleResponse instead of the hash?
  3. Should we include the human-readable text version of the response code?

Any other issues?

Thanks
--kbrock

UPDATE: ErrorResponse has been pulled out and is in master

@kbrock (Contributor Author) commented Mar 15, 2013

Completely rewrote the client reading code to introduce objects and to be more similar to the IO.select solutions proposed by the others.

@@ -26,6 +26,10 @@ Gem::Specification.new do |gem|
  gem.require_paths = ["lib"]
  gem.version = Grocer::VERSION

  if RUBY_PLATFORM =~ /java/
    gem.add_development_dependency "jruby-openssl"
  end
Member:

I think this only runs at build time, so we would have to build a separate grocer-jruby gem or something if we wanted to use this.

Contributor Author:

Agreed.
And to be honest, if the user is running JRuby, they already include this gem.

I added it because the Travis build fails on JRuby without it.

And agreed: creating two different versions of the gem is a pain.
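For what it's worth, one common workaround (sketched here as an assumption, not something this PR does) is to declare the dependency in the Gemfile instead, which is evaluated at install time on the user's actual platform:

# Gemfile -- evaluated on each `bundle install`, so the condition
# reflects the installing Ruby, unlike a conditional in the .gemspec,
# which is frozen at gem build time.
platforms :jruby do
  gem "jruby-openssl"
end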

@stevenharman (Member):

@kbrock, re: your above questions:

  1. Should Pusher#push always call the non-blocking select with errors? Basically, merge/remove Pusher#push_and_check.

Is there any good reason not to always push and check? @vanstee?

As for questions 2 and 3, we now have an abstraction for the error response, and it decodes the status into human-friendly text, so we're good there.

@vanstee (Member) commented Mar 28, 2013

@stevenharman The only reason I can think of is design-related. Do we want to read and handle error responses in between sending notifications, or do we want to make a blocking read in another thread and isolate the error-response-related code there?
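For illustration, a rough sketch of the first option, checking for an error response in between sends (ssl_socket and handle_error are assumptions, not grocer's actual API):

# Sketch: after each write, poll the socket with a zero timeout so the
# common no-error case does not block.
notifications.each do |notification|
  connection.write(notification.to_bytes)
  if IO.select([ssl_socket], [], [], 0) # readable only if Apple replied
    packet = ssl_socket.read(6)         # APNS error responses are 6 bytes
    handle_error(packet)                # hypothetical: parse and react
  end
end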

def can_read?(timeout=nil)
  return unless connected?
  if timeout
    read_sockets, _, _ = IO.select([@ssl], [], [@ssl], timeout)
Member:

In the event timeout is 0, this effectively becomes a non-blocking select, right? I mean, yes, it will block, but the timeout will immediately kick in if there is nothing to read - so it's a very tiny block.
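A quick standalone way to see this behavior, using a local socket pair rather than an APNS connection:

require 'socket'

r, w = UNIXSocket.pair

# Nothing buffered yet: a zero timeout makes IO.select return nil
# immediately instead of blocking.
IO.select([r], [], [], 0)  # => nil

w.write('x')
IO.select([r], [], [], 0)  # => [[r], [], []]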

Member:

I thought I saw someone doing a select on the connection for both reading and writing without a timeout. That way you would always return because the connection should always be writable, but sometimes you'll also get the connection returned in the readable array too. I don't really know much about how that works. Thoughts?

Contributor Author:

Huh, that is a bug.
It should be passing in nil, not 0. I think 0 will block forever.

I had this before:

IO.select([@ssl], [@ssl], [@ssl], timeout)

I changed it to:

@ssl.pending > 0

I think they do the same thing, but I'm unsure. I can change back to the IO.select version.


Why not just use read_nonblock instead of checking can_read? and then reading?

Member:

My understanding is that a [timeout](http://en.wikipedia.org/wiki/Select_(Unix)) of zero will cause the select(2) (which is what ultimately gets called) to time out immediately if there is nothing to read. This is effectively like a read_nonblock, except you don't need to rescue from the exception in the event it can't read.

Of course, I could be wrong here.
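For comparison, a sketch of what the read_nonblock approach requires (socket stands in for the SSL connection):

# read_nonblock raises when no data is available, so callers must
# rescue IO::WaitReadable; IO.select with a zero timeout just returns
# nil in the same situation.
begin
  packet = socket.read_nonblock(6)
rescue IO::WaitReadable
  packet = nil # nothing to read right now
end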

Contributor Author:

I needed to add a special method for read that doesn't connect to a closed server, so that suggests adding something like this.

@Aleph7 I was concerned that it would only read back 4 of the 6 bytes from the wire.

Contributor Author:

Darn:

require 'io/wait'

f = File.open('CHANGELOG.md')
f.close

IO.select([f], [f], [f], 0) # => error
f.ready?                    # => error
f.read_nonblock(6)          # => error

(I found ready? on Stack Overflow.)
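So any of these calls has to be guarded against a closed IO; a minimal sketch (readable? is a hypothetical helper, not grocer's API):

# Sketch: IO.select, #ready?, and #read_nonblock all raise IOError
# once the IO is closed, so check closed? first.
def readable?(io, timeout = 0)
  return false if io.nil? || io.closed?
  !!IO.select([io], [], [io], timeout)
end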

@stevenharman (Member):

@vanstee We're kind of skating around the threading issue in this pull request. The thinking is: we can make things better right now just by reading the errors, and then go for thread-safety in follow-up changes.

Given that, the current default is to NOT block (or at least to block for a very small time, where that time is 0). @kbrock, am I correct on this?

@kbrock (Contributor Author) commented Mar 28, 2013

@stevenharman / @vanstee

There is so much code in common between the single-threaded and multi-threaded solutions.
This pull request may be in the context of a single-threaded implementation, but the intent is to find the nuggets in common and at least get those into the code base.

E.g., the ErrorResponse class that you wrote: regardless of the single/multi-threaded decision, it is needed, so we got it into master.

Optimally, someone will say something like "I like the RingBuffer; let's get that out of here and into a pull request," we'll hash it out (e.g., make it thread-safe), and get it into master.

And then I'll reduce this pull request / rebase it to hell for the 4th(?) time - rinse and repeat.

@vanstee (Member) commented Mar 28, 2013

@kbrock Yeah, small common pieces are going to be the easiest way to move toward an eventual solution.

Just want to take a second again to say thank you so much for working on this! ❤️

@kbrock (Contributor Author) commented Mar 29, 2013

@stevenharman / @vanstee

I am tempted to tear Connection out of my code and just link Pusher directly to SSLConnection.

The retry logic on read is not right - we don't want retry logic there, and we don't want to connect for it.

The retry logic around write closes the connection (without reading an error off of it), which we don't want.

If there is an error on write, most of the time we want to surface that up to the Pusher so it can hit the history or something.

But it sounds a little dangerous to me, and I'm not sure why. Thoughts?

http://redth.info/the-problem-with-apples-push-notification-ser/

@thedaniel:
I'm very interested in using this library, but the bug in #14 is a blocker for me. What's the status of this PR? If I'd like to use grocer in the near future, am I better off hacking in a fix in my own fork, waiting on this branch to land, or building the gem from another branch here / in someone else's fork?

@thedaniel:
ping?

@schmidp commented May 19, 2013

I just pushed my own implementation, based on the ping4v3 branch by @kbrock.
Check it out here: https://github.com/openresearch/grocer/commits/openresearch

It basically returns four arrays to the user of grocer when calling pusher.push:

  • sent: notifications that have been sent
  • maybe_sent: notifications that might have been sent (but, because of Apple's interface, we have no way of knowing whether they were really sent)
  • failed: notifications that we got an error response for (currently this array can contain only one notification)
  • not_sent: notifications that were not sent because they came after a notification that we got an error response for

pusher.push now also expects an array of notifications to be sent, rather than each notification individually.

After the last notification is sent, it waits 5 seconds to see whether Apple returns an error message. If no data is received within the 5 seconds, we assume that all notifications have been sent.

It worked pretty well in our first production tests.
Here is an example using this code:

https://gist.github.com/schmidp/5608224

I prefer the idea of letting the user of grocer handle retries, rather than grocer doing so itself (as @kbrock's branch does).

I would be happy to get some feedback.

@kbrock (Contributor Author) commented May 20, 2013

@schmidp Thanks. I'll look into that; please also look at some recent changes to that branch.

For our purposes, we listen via EventMachine (the rabbitmq gem) and every 5 seconds call the idle method check_and_retry.

After being idle for 10 seconds, we call prune, since no notifications will be waiting on Apple for that long.

I'm torn on the retry work. There is a basic retry for network blips, but there are also more serious retries, like when the internet or Apple has an outage. We currently do that work outside grocer, and I'm looking for a good way to solve it too.

@kbrock (Contributor Author) commented May 20, 2013

@schmidp, are you seeing bad notifications come back on your branch?

For me, Apple did not notify me of the problems until a dozen messages later. Often, this was while a different batch of messages was being sent out.

My first attempt blocked after sending a group of messages out, and this worked great - if you are willing to take the hit. It simplifies things, as you have all the messages in an array right there and can easily resend them.

Once you are no longer willing to block after sending, it gets complicated, as you need to keep state across method calls (hence the History class).
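For context, a minimal sketch of the state such a History class has to keep (the names and structure here are assumptions based on this thread, not the PR's actual code):

# Sketch: remember recently sent notifications by identifier so that,
# when Apple reports an error for identifier N, everything sent after
# N can be found and resent.
class History
  def initialize(max_size = 100)
    @max_size = max_size
    @sent = [] # [identifier, notification] pairs, oldest first
  end

  def remember(notification)
    @sent << [notification.identifier, notification]
    @sent.shift if @sent.size > @max_size # behave like a ring buffer
  end

  # Notifications sent after the one Apple rejected.
  def after(identifier)
    index = @sent.index { |id, _| id == identifier }
    index ? @sent[(index + 1)..-1].map { |_, n| n } : []
  end
end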

@schmidp commented May 20, 2013

Yes. We had sandbox tokens mixed up with production tokens in our database, and our fork handled the errors very well.

Our grocer fork quits sending messages after the first error it experiences and returns the lists of sent, maybe-sent, failed, and unsent notifications.
We currently use a loop to send messages until the unsent array is empty, deleting invalid tokens from the database before calling the pusher with the unsent messages again.

E.g.:

unsent = all_my_notifications_i_want_to_send
while unsent.count > 0
  sent, maybe_sent, failed, unsent = pusher.push(unsent)
  failed.each do |f|
    # delete f.device_token from the DB if f.error_response.status_code == 8
  end
end

failed currently always contains no more than one notification, as we return to the user on every error. The notification in failed has the corresponding error response as an attribute.

If there is no error, we make a blocking read for 5 seconds after the last notification has been sent.
We always send notifications in worker processes, so the 5 seconds shouldn't be a problem.

I also thought about sending an invalid token on purpose as the last notification; this way the final blocking read would only block a few hundred milliseconds.
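A sketch of that trick against this fork's four-array push interface described above (the zeroed device token and the cleanup step are assumptions):

# Sketch: append a deliberately invalid notification so Apple responds
# with an error right away, bounding the final blocking read.
sentinel = Grocer::Notification.new(
  device_token: '0' * 64, # intentionally bad token
  alert:        'ignored'
)
sent, maybe_sent, failed, unsent = pusher.push(notifications + [sentinel])
failed.reject! { |n| n.equal?(sentinel) } # the sentinel's error is expected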


@iwarshak:
@kbrock If I am willing to take a performance hit like you mentioned, blocking after a send, which branch of yours should I use?

My use case is that I am not sending a lot of push notifications, so I definitely want to give each one the best chance of being received (i.e., not dropped because of a bad token in the batch).

Thank you
