This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

Redis instances shut down when scheduler restarted #56

Open
eastlondoner opened this issue Mar 30, 2017 · 13 comments

Comments

@eastlondoner
Contributor

We have built mr-redis from the latest master and are running it on DC/OS (using ZooKeeper rather than etcd).

The basics work OK, but when the scheduler is restarted, the existing Redis instances shut down and don't come back.

If you call the /STATUS endpoint, it reports that the Redis instances are up, but Mesos shows they are no longer running.

@eastlondoner
Contributor Author

It looks to me like the failover_timeout logic in mesoslib.go is not quite right.

See here:
http://mesos.apache.org/documentation/latest/high-availability-framework-guide/

  • The recommended settings are much greater than the 60 seconds currently set.

I think the way GetFrameworkID uses the failover timeout is not correct.
For example, if my scheduler has been up for longer than the failover timeout and then restarts, it shouldn't lose the old framework ID (and all the running tasks).

@eastlondoner
Contributor Author

See this PR, which fixes the behaviour when a scheduler is restarted:
#57

@dhilipkumars
Member

The PR looks good to me.

@dhilipkumars
Member

First of all, thanks a lot for the contribution. Glad to hear that you are using mr-redis. I think mr-redis needs leader-follower logic so that more than one instance of this scheduler can run at once for high availability. Would you like to contribute that functionality?

@dhilipkumars
Member

@eastlondoner
How are you running it with DC/OS?
If you have repackaged it, would you be interested in contributing it to Universe as version 01?

@dhilipkumars dhilipkumars reopened this Apr 1, 2017
@eastlondoner
Contributor Author

Hi @dhilipkumars, we're running it by installing the package from Universe, then going into Marathon and changing the Docker image to point at our Docker image: https://hub.docker.com/r/tractableio/mr-redis/

@eastlondoner
Contributor Author

We also had to change the Docker client API version setting in mr-redis to match the version of Docker running on our agents before we built that Docker image.
You can see the code change on my fork. I haven't opened a PR because I think there is a better way of doing it, where mr-redis determines the Docker API version from the DOCKER_HOST environment variable, but I haven't had time to look into it.
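One possible shape for that "better way", sketched under assumptions: DOCKER_HOST only says where the daemon lives, so the version itself would come either from an explicit DOCKER_API_VERSION override (an environment variable the Docker CLI also honours) or from negotiation, i.e. using the lower of the client's maximum supported version and the ApiVersion the daemon reports from its /version endpoint. The function names, the way the daemon version is passed in, and the 1.25 fallback (taken from a later comment in this thread about recent DC/OS) are all hypothetical; this is not how mr-redis currently selects the version.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// defaultAPIVersion is a hypothetical fallback used when the daemon's version is unknown.
const defaultAPIVersion = "1.25"

// parseVersion splits a "major.minor" API version into integers.
func parseVersion(v string) (major, minor int, err error) {
	parts := strings.SplitN(v, ".", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("invalid API version %q", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// lowerVersion returns the older of a and b, comparing numerically so that
// "1.9" is correctly treated as older than "1.10".
func lowerVersion(a, b string) (string, error) {
	amaj, amin, err := parseVersion(a)
	if err != nil {
		return "", err
	}
	bmaj, bmin, err := parseVersion(b)
	if err != nil {
		return "", err
	}
	if amaj < bmaj || (amaj == bmaj && amin <= bmin) {
		return a, nil
	}
	return b, nil
}

// resolveAPIVersion prefers an explicit DOCKER_API_VERSION override; otherwise
// it negotiates down from clientMax to the version the daemon reported.
// getenv is injected (normally os.Getenv) so the logic is easy to test.
func resolveAPIVersion(getenv func(string) string, clientMax, daemonVersion string) (string, error) {
	if v := getenv("DOCKER_API_VERSION"); v != "" {
		if _, _, err := parseVersion(v); err != nil {
			return "", err
		}
		return v, nil
	}
	if daemonVersion == "" {
		daemonVersion = defaultAPIVersion
	}
	return lowerVersion(clientMax, daemonVersion)
}

func main() {
	// "1.24" stands in for the ApiVersion a daemon might report; hypothetical.
	v, err := resolveAPIVersion(os.Getenv, "1.26", "1.24")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using Docker API version", v)
}
```

Negotiating down rather than hardcoding means the same image could run against agents with older or newer Docker daemons, which is the mismatch that forced the rebuild described above.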

@eastlondoner
Contributor Author

I guess I could push a new version to Universe, but I wouldn't want to push something that includes code changes that aren't in this (mainline) repo. Furthermore, for the latest DC/OS I think the Docker API version should be 1.25!

@eastlondoner
Contributor Author

n.b. this is the commit I am concerned about:
eastlondoner@10bdba0

@daguero

daguero commented Feb 26, 2018

@eastlondoner
Hello, I'm trying to access the Docker image at https://hub.docker.com/r/tractableio/mr-redis/ but it is not accessible. Could you give me another option?

Thank you

@daguero

daguero commented Feb 28, 2018

Hi @dhilipkumars, I have the same problem discussed in this issue. I would like to access the Docker image https://hub.docker.com/r/tractableio/mr-redis/ to do some tests.

Thank you

@eastlondoner
Contributor Author

@daguero I don't work at Tractable anymore, and I recall I did some hacky things to make it work that I didn't want to publish.
However, you should be able to build a working Docker image of your own from my fork: https://github.com/eastlondoner/mr-redis

@daguero

daguero commented Mar 1, 2018

@eastlondoner OK, thanks for your help, I'll try it.
