Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor stop_host method to use topology #292

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

petroav
Copy link
Contributor

@petroav petroav commented Dec 7, 2016

  • Use requests instead of urllib because it handles connection reset errors better.
  • Use upload_topology to mimic behavior of stop_host.

'workers': good_hosts}
self.upload_topology(topology)
self.cluster.stop_host(bad_host)
self.cluster.stop_host(self.cluster.slaves[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you've already put the bad host name in the topology, so stopping the host should really do anything. Maybe we don't need stop_host at all, just upload bad worker/coordinator topologies all around. In any case it should be one or the other.

# Change the topology to something that doesn't exist
ips = self.get_ip_address_dict()
down_hostname = self.get_down_hostname()
self.exec_cmd_on_host(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need all three of these sed commands? Doesn't the topology usually just contain the internal host name?

@petroav
Copy link
Contributor Author

petroav commented Dec 13, 2016

@rschlussel updated version that addresses your comments. I also added a commit that fixed the Connection reset errors I was seeing while building the offline installer.

@rschlussel-zz
Copy link
Member

travis failures are real. It looks like you missed a few usages of stop_host

@rschlussel-zz
Copy link
Member

looks good % fixing the test failures.

@rschlussel-zz
Copy link
Member

Can you also revert the timeout increase (and make sure that this fixes the issue)?

- The requests library handles connection resets from the server
  better than urllib.
- Stopping a host is now done implicitly by uploading
  a topology that points to a host that doesn't exist.
  This was done because stopping a container caused nasty
  race conditions.
@rschlussel-zz
Copy link
Member

@petroav what's the status of this?

@petroav
Copy link
Contributor Author

petroav commented Mar 3, 2017

@rschlussel I put this on pause since we were seeing all those UnixPoolConnectionError I thought I would wait until we fixed the intermittent errors before going forward with this. I just saw the Travis has actually been very stable for presto-admin so I might rebase this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants