node_class is required for target checks even when datacenter is set #592

protochron opened this issue Jul 6, 2022 · 2 comments

@protochron

It seems that even though the code doesn't require it, the autoscaler will not select any nodes unless node_class is set in a target. The check errors out with the message "failed to query source: no nodes identified within pool". I had hoped to let the autoscaler manage all nodes in a single datacenter, since both the docs and the code imply that is supported.

Example config:

scaling "aws_cluster_policy" {
  enabled = true
  min     = 3
  max     = 20

  policy {
    cooldown            = "2m"
    evaluation_interval = "30s"

    check "cluster_memory" {
      source = "nomad-apm"
      query  = "percentage-allocated_memory"

      strategy "threshold" {
        lower_bound = 90
        delta       = 1
      }
    }

    target "constellations" {
      aws_asg_name = "workers"
      datacenter   = "workers"
      node_purge   = true
      node_class   = "test"
      dry_run      = true
    }
  }
}

Output:

2022-07-06T15:20:58.985Z [DEBUG] policy_eval.worker: fetching current count: id=2efc8c99-1fd5-771b-34ab-05f19e528be4 policy_id=a671710f-0d33-0cd4-9bf8-36cfcf17459e queue=cluster target=workers
2022-07-06T15:20:59.229Z [DEBUG] policy_eval.worker.check_handler: received policy check for evaluation: check=cluster_memory id=2efc8c99-1fd5-771b-34ab-05f19e528be4 policy_id=a671710f-0d33-0cd4-9bf8-36cfcf17459e queue=cluster source=nomad-apm strategy=threshold target=workers
2022-07-06T15:20:59.229Z [DEBUG] policy_eval.worker.check_handler: querying source: check=cluster_memory id=2efc8c99-1fd5-771b-34ab-05f19e528be4 policy_id=a671710f-0d33-0cd4-9bf8-36cfcf17459e queue=cluster source=nomad-apm strategy=threshold target=workers query=node_percentage-allocated_memory//class source=nomad-apm
2022-07-06T15:20:59.229Z [DEBUG] internal_plugin.nomad-apm: performing node pool APM query: query=node_percentage-allocated_memory//class
2022-07-06T15:20:59.233Z [WARN]  policy_eval.worker: failed to run check: id=2efc8c99-1fd5-771b-34ab-05f19e528be4 policy_id=a671710f-0d33-0cd4-9bf8-36cfcf17459e queue=cluster target=workers check=cluster_memory on_error="" on_check_error="" error="failed to query source: no nodes identified within pool"
2022-07-06T15:20:59.233Z [DEBUG] policy_eval.worker: no checks need to be executed: id=2efc8c99-1fd5-771b-34ab-05f19e528be4 policy_id=a671710f-0d33-0cd4-9bf8-36cfcf17459e queue=cluster target=workers

Setting the node_class field fixes the error, and the autoscaler is then able to identify nodes to manage with the policy. The workaround in my case is straightforward: set the same node_class value on every node in the datacenter. But is this behavior intentional? It doesn't seem like it, judging by the code in https://github.com/hashicorp/nomad-autoscaler/blob/main/sdk/helper/scaleutils/nodepool/nodepool.go#L35-L47.
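
As a concrete example of the workaround, I mean roughly the following in each Nomad client's config, where "test" is just an arbitrary label matching the node_class in the target block above:

client {
  enabled = true

  # Arbitrary class label; it only needs to match the node_class
  # used in the scaling policy's target block.
  node_class = "test"
}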

@protochron
Author

#255 seems related, but it's pretty old and I think predates being able to filter nodes by datacenter.

@lgfa29
Contributor

lgfa29 commented Sep 22, 2022

Thanks for the report @protochron, and apologies for taking this long to get back to you.

I will need some time to investigate this further, but yeah, I think node_class is only one of the possible node selection options, so it should be optional.
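
For reference, my working assumption is that a target roughly like this, with datacenter as the only node pool identifier, should be sufficient (I still need to confirm that against the code):

target "constellations" {
  aws_asg_name = "workers"

  # datacenter alone should be enough to identify the node pool,
  # without also requiring node_class.
  datacenter = "workers"
  dry_run    = true
}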

And thanks for pointing out #255, I will try to tackle that documentation gap as well 🙂
