[WIP] Add Nomad strategy #181

Open: wants to merge 2 commits into base: main

mikenomitch

Summary of changes

This PR adds a strategy for clustering on HashiCorp Nomad using Nomad's (relatively new) native service discovery feature.
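
To make the idea concrete, here is a minimal sketch of the step that turns service discovery results into Erlang node names. It is not the code from this PR: the module name `NomadNodesSketch` and the function `node_names/2` are hypothetical, and it assumes the service entries have already been fetched from Nomad and decoded into maps with an `"Address"` key.

```elixir
defmodule NomadNodesSketch do
  @moduledoc """
  Hypothetical helper, for illustration only. It assumes each decoded
  service entry is a map with an "Address" key.
  """

  @doc "Builds unique node names of the form basename@address."
  @spec node_names([map()], String.t()) :: [atom()]
  def node_names(services, node_basename) do
    services
    |> Enum.map(&Map.get(&1, "Address"))
    # Skip entries that carry no address
    |> Enum.reject(&is_nil/1)
    # Several services on one address still map to a single node
    |> Enum.uniq()
    |> Enum.map(fn address -> :"#{node_basename}@#{address}" end)
  end
end

services = [
  %{"ServiceName" => "app", "Address" => "10.0.0.5", "Port" => 4369},
  %{"ServiceName" => "app", "Address" => "10.0.0.6", "Port" => 4369}
]

NomadNodesSketch.node_names(services, "app")
# → [:"app@10.0.0.5", :"app@10.0.0.6"]
```

A polling strategy would call something like this on every `poll_interval` tick and hand the resulting list to libcluster to connect or disconnect nodes.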

This is something I am planning to use on my own projects, and I figured I would open a WIP PR in case this is something you want to upstream. If so, I will clean up the code and add better testing, docs, and typespecs.

I am a bit biased (I work for HashiCorp), but I would love to see Nomad alongside K8s and Rancher as a natively supported orchestrator for libcluster. :)

If you're interested, let me know and I can make this prod-ready. If not, feel free to close it out and I'll just maintain my fork.

Checklist

  • New functions have typespecs, changed functions were updated
  • Same for documentation, including moduledocs
  • Tests were added or updated to cover changes
  • Commits were squashed into a single coherent commit
  • Notes added to CHANGELOG file which describe changes at a high-level

To Dos

  • Ensure this works in prod-like Nomad environment
  • Add typespecs
  • Add tests
  • Improve documentation
  • Clean up code

* `namespace` - The Nomad namespace to query (optional; default: "default")
* `nomad_server_url` - The URL of the Nomad server to query (required; e.g. "https://127.0.0.1:4646")
* `node_basename` - The Erlang node basename (required; e.g. "app")
* `poll_interval` - How often to poll in milliseconds (optional; default: 5_000)
Author

Note: must add "token" to docs
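
For readers who want to try the options listed in the diff, a topology entry might look roughly like the sketch below. The strategy module name `Cluster.Strategy.Nomad` and the topology key `nomad_example` are assumptions (neither appears in this thread); the option names and example values are taken from the documented list.

```elixir
# config/config.exs
import Config

config :libcluster,
  topologies: [
    # Arbitrary topology name, chosen for illustration
    nomad_example: [
      # Assumed module name; check the PR diff for the actual one
      strategy: Cluster.Strategy.Nomad,
      config: [
        namespace: "default",
        nomad_server_url: "https://127.0.0.1:4646",
        node_basename: "app",
        poll_interval: 5_000
        # The author notes a "token" option also exists but is not yet
        # documented, so its exact shape is not shown here.
      ]
    ]
  ]
```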

@HammamSamara

Thanks for your work @mikenomitch, as someone who also looks for a strategy that works with Nomad, I would like to know if you have used it in a production environment, and if I can do so as well. Much appreciated!

@mikenomitch
Author

mikenomitch commented Nov 29, 2022

Hey @HammamSamara, I haven't gotten this working (even in testing) yet but I'm pretty sure it is close. Unfortunately, this was something I was doing for fun on parental leave and now I am back at work full time, so I haven't had a chance to get it done yet.

I believe I ended up getting stuck in the networking stage and couldn't get the apps connecting correctly. I think this was due to me misconfiguring either the Erlang port range (`inet_dist_listen_min`/`inet_dist_listen_max`), my AWS port security rules, or my Nomad networking rules. Got cut off mid-debug though!
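
For anyone picking this up, a minimal sketch of pinning the distribution port range from Elixir config is shown below. The range 9100 to 9155 is an arbitrary illustration, not a value from this thread. This should take effect when the app boots from a `mix release` build, where compile-time config is read at VM start. Under `iex -S mix` the kernel application is already running, so the same settings would need to be passed as VM flags instead, e.g. `iex --erl "-kernel inet_dist_listen_min 9100 inet_dist_listen_max 9155" -S mix`.

```elixir
# config/config.exs
import Config

# Illustrative range only; pick ports that match your Nomad port
# mappings and firewall or security-group rules.
config :kernel,
  inet_dist_listen_min: 9100,
  inet_dist_listen_max: 9155
```

Whatever range is chosen, the same ports (plus the epmd port, 4369 by default) have to be reachable between nodes at every layer mentioned above.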

In case you or anybody else wants to take a crack at this, I can push up my latest code and the jobspecs I was using to test it in case that is helpful.

Side note: I think we'll get around to this task soon on the Nomad team, which should make it easier to open up the whole port range properly.

EDIT: for anybody who would be interested in running with this, here is a link to my Nomad jobspec and parts of my Elixir app where I tried getting this working: https://gist.github.com/mikenomitch/14f3214789f5b3335b466b42721682e4

Also - hope you're liking Nomad! :)

@HammamSamara

Thank you for the great details, and for pointing out the port range issue as well. This is my first time using Nomad with Elixir, and getting this to work is key to convincing Elixir teams to use it more often.

I will use your excellent gist as a starting point to test your work on my setup and post back once I have any tangible results.

P.S. Do you think using a Docker image that contains a production release, instead of running Elixir in development mode (iex -S mix), has anything to do with it? It may well be irrelevant, but I am pointing it out anyway since I am used to precompiled Elixir in prod, which offers control over VM flags and commands to connect to the running system remotely.

@sukidhar

After some testing to understand the problem, I have figured out that it has more to do with container networking than with the Erlang port range. I have personally tested and confirmed that Docker containers within the same bridge network are able to discover and connect to each other.

It turns out that Nomad assumes host-based networking by default, while, according to Docker's documentation, containers on the host are isolated from the network. To let the nodes connect, the Nomad jobspec has to use bridge mode with the Docker driver, and the network bridging has to be configured. Even if we use Consul and do the lookup via DNS or its HTTP API instead of the Nomad HTTP API, the situation is the same as long as the containers are not bridged to each other. Kubernetes does the heavy lifting of setting up a network bridge between containers on multiple hosts. However, with the exec driver I had luck deploying Elixir applications without taking the network bridge approach, which seemed like a bit of a hassle to set up.
