Handle ZMQError "Address already in use" #30

Draft
wants to merge 8 commits into
base: main

pnuu
Member

@pnuu pnuu commented Oct 29, 2020

This PR fixes an occasional zmq.error.ZMQError: Address already in use that I get in Trollflow2. It happens when a new publisher is created on the same port as one that was closed immediately before, and the OS has not yet had time to free the port.

Minimal code that reproduces the error on the current master branch:

from posttroll.publisher import Publish
while True:
    with Publish("gdal_warper", port=50000) as pub:
        pass

@pnuu pnuu added the bug label Oct 29, 2020
@pnuu pnuu requested a review from mraspaud October 29, 2020 07:42
@pnuu pnuu self-assigned this Oct 29, 2020
@coveralls

coveralls commented Oct 29, 2020

Coverage Status

Coverage increased (+0.2%) to 80.364% when pulling 254c58e on pnuu:bugfix-zmqerror into 67b8f3d on pytroll:master.

@pnuu
Member Author

pnuu commented Oct 29, 2020

I'm ignoring the CodeFactor and Stickler complaints; they should be handled in a separate test-refactoring PR.

@mraspaud
Member

mraspaud commented Nov 2, 2020

How about syncing with the linger timeout? Because I think this is the reason the bind fails.

@pnuu
Member Author

pnuu commented Nov 2, 2020

What does "linger timeout" mean?

@mraspaud
Member

mraspaud commented Nov 2, 2020

We set the linger timeout here: https://github.com/pytroll/posttroll/blob/master/posttroll/publisher.py#L143

Basically, that's how much time we give the socket to shut down all connections cleanly, and it's set here to one second. I suspect that's why you get the error in the first place: the previous connection doesn't have time to shut down cleanly before you create a new one on the same port.

@pnuu
Member Author

pnuu commented Nov 2, 2020

This page says the timeout is in milliseconds: http://api.zeromq.org/2-1:zmq-setsockopt

At the moment I'm still having occasional problems even with a 2-second wait (10 x 0.2 seconds).
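The "10 x 0.2 seconds" wait mentioned above can be sketched as a small retry loop. This is only an illustration and not the code in this PR: the helper name `bind_with_retry` and its parameters are hypothetical. It relies on the failing call raising an exception whose `errno` attribute equals `errno.EADDRINUSE`, which is the case for `OSError` and, at least on Linux, should also hold for `zmq.ZMQError`.

```python
import errno
import time


def bind_with_retry(bind, retries=10, delay=0.2):
    """Call ``bind()`` until it succeeds or ``retries`` attempts have failed.

    Hypothetical helper: only "Address already in use" errors are retried,
    any other exception is re-raised immediately.
    """
    for attempt in range(1, retries + 1):
        try:
            return bind()
        except Exception as err:
            # OSError and zmq.ZMQError both expose the error code as `.errno`.
            if getattr(err, "errno", None) != errno.EADDRINUSE:
                raise
            if attempt == retries:
                # Out of attempts: let the caller see the original error.
                raise
            time.sleep(delay)
```

With pyzmq, the callable could be something like `lambda: sock.bind("tcp://*:50000")`. As the comment above notes, a fixed 2-second budget may still be too short, so the retry only narrows the window and does not close it.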

@mraspaud
Member

mraspaud commented Nov 2, 2020

Oh, OK. Then I suppose TIME_WAIT is the culprit. There's not much we can do about that.

Member

@mraspaud mraspaud left a comment

Thanks for adding this. Some inline comments.

posttroll/tests/test_pubsub.py (outdated)
posttroll/tests/test_pubsub.py (outdated)
posttroll/tests/test_pubsub.py
posttroll/publisher.py (outdated)
@pnuu
Member Author

pnuu commented Nov 6, 2020

I started to make the requested changes and add tests, but have come to the conclusion that this won't really solve the problem I've been having. So, instead of defining a fixed port, I now let the Trollflow2 publisher take any free port from a given range by defining the POSTTROLL_PUB_MIN_PORT and POSTTROLL_PUB_MAX_PORT environment variables.

I can still finish this PR if it is seen as useful. IIRC, only one of the suggestions is still unfinished. @mraspaud?
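The port-range workaround described in the comment above can be illustrated roughly as follows. This is a sketch and not posttroll's implementation: the helper names, the default range, and the inclusive upper bound are all assumptions. Only the two environment variable names come from the comment.

```python
import os
import socket


def port_range_from_env(environ=os.environ):
    """Read the port range from the environment.

    The defaults (49152-65535) are an assumption made for this sketch.
    """
    min_port = int(environ.get("POSTTROLL_PUB_MIN_PORT", 49152))
    max_port = int(environ.get("POSTTROLL_PUB_MAX_PORT", 65535))
    return min_port, max_port


def find_free_port(min_port, max_port, host="127.0.0.1"):
    """Return the first port in [min_port, max_port] that can be bound."""
    for port in range(min_port, max_port + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is busy or not permitted, try the next one
            return port
    raise RuntimeError(f"no free port in range {min_port}-{max_port}")
```

Probing a port and then binding it again later leaves a small race window. pyzmq's `Socket.bind_to_random_port` takes `min_port` and `max_port` arguments and binds the ZeroMQ socket directly, which avoids that gap.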

@pnuu
Member Author

pnuu commented Sep 1, 2021

Pushed the changes in any case, but will convert to a Draft PR.

@pnuu pnuu marked this pull request as draft September 1, 2021 08:08

3 participants