Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(pubsub): ordering keys #26

Merged
merged 6 commits into from Feb 5, 2020
Merged

Conversation

pradn
Copy link
Contributor

@pradn pradn commented Feb 4, 2020

Fixes #6

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Feb 4, 2020
Copy link
Contributor

@plamut plamut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on the original PR, a system test would be very good to have, although that can be added separately.

There is a possibility that the code temporarily deviates from the FlowControl.max_bytes limit in certain usage scenarios - which are believed to be atypical - but I'd like to hear from @kamalaboulhosn for the final ruling on that.

The PR still needs changes from the latest master, GitHub reports that it's out of date and not mergeable yet.

Copy link
Contributor

@kamalaboulhosn kamalaboulhosn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like pubsub/google/cloud/pubsub_v1/subscriber/_protocol/messages_on_hold.py didn't get pulled over.

google/cloud/pubsub_v1/publisher/client.py Outdated Show resolved Hide resolved
google/cloud/pubsub_v1/publisher/client.py Show resolved Hide resolved
@kamalaboulhosn
Copy link
Contributor

As discussed on the original PR, a system test would be very good to have, although that can be added separately.

There is a possibility that the code temporarily deviates from the FlowControl.max_bytes limit in certain usage scenarios - which are believed to be atypical - but I'd like to hear from @kamalaboulhosn for the final ruling on that.

The PR still needs changes from the latest master, GitHub reports that it's out of date and not mergeable yet.

@plamut I'm actually okay with the deviation in this way. Here is the way I think about it: Max bytes flow control is to save the client from OOMing when the messages could vary, but the size of the message is the overwhelmingly interesting thing that takes up memory. Given that that message is already in memory in the client library and buffered, no more memory is being used to send it to the user callback.

If the interesting factor in memory utilization is state that has to be loaded due to the message, e.g., have to load a large record from a database or a file from GCS, then the number of messages is the better thing to use for flow control. For that flow control, I believe we won't deliver more messages unless we can.

@pradn
Copy link
Contributor Author

pradn commented Feb 5, 2020

@kamalaboulhosn I hadn't thought of that max-bytes limit as primarily an OOM-protection feature. I'll keep that in mind.

I actually just now tried the code changes that would need to be made to prevent this overflow. It's not much of a change, but it adds more ifs-ands-and-buts to the design, making it harder to understand. Ie: the invariant that keys in the "pending ordering keys" dict all have messages in flight is diluted: now, there's an extra possibility that the key is waiting to be "activated".

In any case, I prefer to keep things simple for now, because it's only gonna get more complicated when we do an optimization pass and fix bugs.

@pradn pradn added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Feb 5, 2020
@pradn
Copy link
Contributor Author

pradn commented Feb 5, 2020

Going to release a version of the library before merging this branch.

@camerondavison
Copy link

we have been seeing this new assert trigger in production https://github.com/googleapis/python-pubsub/pull/26/files#diff-a9838428343bb40f2ac9b6cf2beb9f77R332-R335 I feel like if there is an error that it should try and create a new batch rather than hit that assert. given the rest of the code if I am not mistaken the assert should possible be moved down below

if not self.will_accept(message):
    return future

so that it will return None and create a new batch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PubSub: Implement ordering keys
5 participants