Unable to probe any host for Elasticsearch version #9550

Closed
Ghostbird opened this issue Nov 19, 2020 · 7 comments

Ghostbird commented Nov 19, 2020

I upgraded my docker-compose configuration to Elasticsearch 7.9.3 and Graylog 4.0

After fixing some other issues that occurred due to errors on my side, it started fine. Then I stopped it, to bring it up as a system service, but it didn't work. So I tried to run it manually again, and now every time it spews a stack trace telling me it cannot connect to ElasticSearch.

I've even removed all docker images and containers related to graylog and let docker-compose rebuild everything, but the error stays.

Expected Behavior

Graylog starts and connects to Elasticsearch 7 container.

Current Behavior

Graylog attempts to start, then spews a stack trace that starts with:

graylog_1        | 2020-11-19 08:45:15,229 ERROR: org.graylog2.storage.versionprobe.VersionProbe - Unable to retrieve version from Elasticsearch node: 
graylog_1        | java.net.ConnectException: Failed to connect to elasticsearch/172.23.0.2:9200

…and ends with:

graylog_1        | ################################################################################
graylog_1        | 
graylog_1        | ERROR: Unable to probe any host for Elasticsearch version!
graylog_1        | 
graylog_1        | Please see the following link(s) to help you with this error:
graylog_1        | 
graylog_1        | * http://docs.graylog.org/en/4.0/pages/configuration/elasticsearch.html
graylog_1        | 
graylog_1        | Need further help?
graylog_1        | 
graylog_1        | * Official documentation: http://docs.graylog.org/
graylog_1        | * Community support: https://www.graylog.org/community-support/
graylog_1        | * Commercial support: https://www.graylog.com/support/
graylog_1        | 
graylog_1        | Terminating. :(
graylog_1        | 
graylog_1        | ################################################################################
graylog_graylog_1 exited with code 252

Possible Solution

I'm not sure, but could it be that Graylog tries to connect to the Elasticsearch container before it is fully up and running? Or that it tries to connect to the wrong endpoint? A few seconds after the termination messages, these lines appear in the log:

elasticsearch_1  | {"type": "server", "timestamp": "2020-11-19T08:57:31,347Z", "level": "INFO", "component": "o.e.t.TransportService", "cluster.name": "docker-cluster", "node.name": "bb31f6972abe", "message": "publish_address {localhost/127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}" }
elasticsearch_1  | {"type": "server", "timestamp": "2020-11-19T08:45:31,356Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "docker-cluster", "node.name": "042982e184b2", "message": "Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[graylog_0][0]]]).", "cluster.uuid": "XjGNkRT4RLyy0tbdEHQdeg", "node.id": "KuhVWSrPRb-xY7ED5XrQlA"  }

Steps to Reproduce (for bugs)

Run this docker compose file
Note: I have replaced GRAYLOG_PASSWORD_SECRET, GRAYLOG_ROOT_PASSWORD_SHA2, and GRAYLOG_HTTP_EXTERNAL_URI with the default values from the wiki.

Context

Upgrade from graylog 3 to 4, elasticsearch 6 to 7

Your Environment

Linux 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Graylog Version: 4.0 (docker)
  • Java Version: System version openjdk 11.0.9 2020-10-20, but I think the version in Docker is used instead.
  • Elasticsearch Version: 7.10 (docker)
  • MongoDB Version: 4.4 (docker)
  • Operating System: Ubuntu 20.04.1 LTS
  • Browser version: Irrelevant for this issue
@Ghostbird Ghostbird added the bug label Nov 19, 2020
@dennisoelkers
Member

Hey @Ghostbird,

The issue is that Graylog probes for the ES version on startup. It goes through the list of configured nodes and gives up if none of them is up. Can you make sure that ES is available before GL starts up? If not, you can set the elasticsearch_version configuration setting to 7, which effectively disables auto-sensing the version (please see here for details).

@Ghostbird
Author

Ghostbird commented Nov 19, 2020

@dennisoelkers Thanks for the quick reply. I saw the elasticsearch_version option, but it is not clear to me how to set it through a docker-compose file. Could you assist me with that?
Also, how would I delay the GL startup until after Elasticsearch is running?

Update: I managed to get it working. It turned out that what I needed to know was right at the top here. Adding GRAYLOG_ELASTICSEARCH_VERSION=7 to the environment section of the graylog container worked. Now it still spams a lot of connection refused errors while the elasticsearch container is still starting, but it doesn't terminate, and eventually runs properly.
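
For readers looking for the concrete change, the workaround described above might look roughly like the following docker-compose fragment. This is only a sketch: the service name `graylog` is taken from the log prefixes in this issue, while the image tag and overall layout are assumptions, and all other required Graylog settings are omitted.

```yaml
# Sketch only: the image tag and layout are assumptions,
# not the reporter's actual compose file.
services:
  graylog:
    image: graylog/graylog:4.0   # assumed tag
    environment:
      # Pins the Elasticsearch major version, so Graylog no longer
      # terminates when the startup version probe cannot reach ES.
      - GRAYLOG_ELASTICSEARCH_VERSION=7
```

With this set, Graylog still logs connection errors while Elasticsearch is starting, as noted above, but it keeps running instead of exiting.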

What I find most interesting is that the auto-probe mechanism actually worked once, and then it consistently failed, even with completely new docker containers.

I guess this is mostly a problem for docker-compose file version 3 users, since that version demands that containers handle waits for dependency readiness internally.
Another interesting question is whether it's possible in docker-compose to check the version tag of a dependency and set the environment configuration to the correct version based on that.

@dennisoelkers
Member

Good to hear that you managed to get it up and running! You are saying:

I guess this is mostly a problem for docker-compose file version 3 users, since that version demands that containers handle waits for dependency readiness internally.

Do you have a pointer for me to read up on this? This might force us to change the mechanism to retry instead of back off and bail out.

@jalogisch
Contributor

@dennisoelkers

We might want to include a healthcheck command with a custom check on Elasticsearch in the docker-compose file in our documentation, and in all the others. That way Graylog would start only after a specific command returns "green", that is, after Elasticsearch and MongoDB are reachable.

I'll work something out as this is not super complicated.
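
A healthcheck along the lines suggested above could be sketched as follows. This is not the configuration that ended up in the Graylog documentation; the image tags, the intervals, and the assumption that `curl` is available inside the Elasticsearch image are all illustrative. Note also that `depends_on` with `condition: service_healthy` is accepted by file format 2.1+ and by the versionless Compose Specification, but was dropped from the 3.x file format discussed in this thread. An analogous check for MongoDB is omitted here.

```yaml
# Sketch under assumptions: image tags, timings and the presence of curl
# in the Elasticsearch image are not confirmed by this issue.
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.0   # assumed tag
    healthcheck:
      # Succeeds once the cluster reports at least "yellow" health.
      test: ["CMD-SHELL", "curl -fs 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=5s' || exit 1"]
      interval: 10s
      timeout: 10s
      retries: 12
  graylog:
    image: graylog/graylog:4.0   # assumed tag
    depends_on:
      elasticsearch:
        # Start Graylog only after the healthcheck above passes.
        condition: service_healthy
```

The sketch waits for "yellow" rather than "green" so that a single-node cluster with unassigned replica shards would not block startup forever; switching to `wait_for_status=green` matches the wording above more literally.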

@Ghostbird
Author

Ghostbird commented Nov 19, 2020

Good to hear that you managed to get it up and running! You are saying:

I guess this is mostly a problem for docker-compose file version 3 users, since that version demands that containers handle waits for dependency readiness internally.

Do you have a pointer for me to read up on this? This might force us to change the mechanism to retry instead of back off and bail out.

This comment and the Moby issue comment linked therein give the best summary I could find. Moby is the core of Docker, in case you didn't know. I sure didn't.

TL;DR:
Docker-compose version 3 is aimed at Docker Swarm deployments, where individual running containers are automatically restarted on failure. Liveness checks on dependencies are not very sensible in that architecture. All containers should be built to handle temporary connection issues gracefully.
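
Outside Swarm, a similar restart-on-failure behaviour can be approximated with a restart policy in the compose file. The fragment below is a sketch only, with an assumed image tag. It does not make Graylog tolerate the failed probe; it simply lets Docker start the container again after it exits with a non-zero code such as the 252 shown above.

```yaml
# Sketch only: approximates "restart on failure" without Swarm.
services:
  graylog:
    image: graylog/graylog:4.0   # assumed tag
    # Restart the container whenever it exits with a non-zero exit code.
    restart: on-failure
```

In a Swarm deployment the equivalent setting lives under `deploy.restart_policy` instead.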

@dennisoelkers
Member

Thanks a lot for the link. I will digest and decide if we should make some changes for one of the next versions. For now, you can stick with elasticsearch_version being explicitly set (in fact, it could even help with ES8, it could work ootb with it), it is just less convenient configuration-wise.

I am closing this issue for now. If you have any additional information, feel free to reopen it.

@Ghostbird
Author

Thanks a lot for the link. I will digest and decide if we should make some changes for one of the next versions. For now, you can stick with elasticsearch_version being explicitly set (in fact, it could even help with ES8, it could work ootb with it), it is just less convenient configuration-wise.

I am closing this issue for now. If you have any additional information, feel free to reopen it.

Thanks a lot for the assistance. I'll stick with the GRAYLOG_ELASTICSEARCH_VERSION=7 in the docker compose environment settings. I think it's perfectly fine to close this issue. The remaining use it has now is that other people can find this workaround in case they need it.


3 participants