
Kibana stays read-only when the ES high disk watermark has been exceeded and disk usage later drops back below the limit #13685

Closed
algestam opened this issue Aug 24, 2017 · 22 comments
Labels: Pioneer Program, Team:Operations

Comments

@algestam

Kibana version: 6.0.0-beta1

Elasticsearch version: 6.0.0-beta1

Server OS version: Ubuntu 16.04.2 LTS

Browser version: Chrome 60.0.3112.90

Browser OS version: Windows 10

Original install method (e.g. download page, yum, from source, etc.): Official tar.gz packages

Description of the problem including expected versus actual behavior:

I'm running a single-node Elasticsearch instance, Logstash, and Kibana. Everything runs on the same host in separate Docker containers.

If the high disk watermark is exceeded on the ES host, the following is logged in the elasticsearch log:

[2017-08-24T07:45:11,757][INFO ][o.e.c.r.a.DiskThresholdMonitor] [CSOifAr] rerouting shards: [high disk watermark exceeded on one or more nodes]
[2017-08-24T07:45:41,760][WARN ][o.e.c.r.a.DiskThresholdMonitor] [CSOifAr] flood stage disk watermark [95%] exceeded on [CSOifArqQK-7PBZM_keNoA][CSOifAr][/data/elasticsearch/nodes/0] free: 693.8mb[2.1%], all indices on this node will marked read-only

When this has occurred, changes to the .kibana index will of course fail, since the index cannot be written to. This can be observed by trying to change any setting under Management -> Advanced Settings, where a change to e.g. search:queryLanguage fails with the message Config: Error 403 Forbidden: blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];

[screenshot: index_read_only error in Kibana]

If more disk space is now made available, ES will log that the node has gone back under the high watermark:

[2017-08-24T07:47:11,774][INFO ][o.e.c.r.a.DiskThresholdMonitor] [CSOifAr] rerouting shards: [one or more nodes has gone under the high or low watermark]

One would now assume that it is possible to change Kibana settings again, but trying to make a settings change still fails with the error message:

Config: Error 403 Forbidden: blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];

Steps to reproduce:

  1. Make sure that setting changes can be performed without errors
  2. Fill up the elasticsearch data disk so that the high disk watermark is exceeded (I used fallocate -l9G largefile)
  3. Verify in the ES log that the high disk watermark has been exceeded and the indices have been marked read-only
  4. Perform a setting change and verify that it fails since writes are prohibited
  5. Resolve the high disk watermark condition (which I did with rm largefile)
  6. Verify that the ES log states that the node has gone back under the high disk watermark (and thus should be writable again?)
  7. Perform a setting change; it fails when it should actually succeed (the block itself can be checked directly, see the sketch after this list)
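
A quick way to check whether the block is still in place after step 6 (a minimal sketch, assuming Elasticsearch is reachable on localhost:9200; adjust the endpoint to your setup):

curl -s "http://localhost:9200/.kibana/_settings?filter_path=*.settings.index.blocks"

If the response still contains "read_only_allow_delete": "true", the flood-stage block has not been lifted even though disk space has been freed.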
@Bargs added the Team:Operations label on Sep 6, 2017
@scaarup

scaarup commented Sep 29, 2017

So how do I recover from this? .kibana stays read-only no matter what I do. I have tried to snapshot it, delete it, and restore it from the snapshot - it is still read-only...

@darkpixel

I just ran into this on a test machine. For the life of me I couldn't continue putting data into the cluster. I finally had to blow away all the involved indices.

@sz3n

sz3n commented Nov 20, 2017

I resolved the issue by deleting the .kibana index:
DELETE /.kibana/
I lost certain configurations/visualizations/dashboards, but it unblocked the index.

@xose

xose commented Nov 20, 2017

I just got hit by this. It's not just Kibana: all indices get locked when the disk threshold is reached and are never unlocked when space is freed.

To unlock all indices manually:

curl -XPUT -H "Content-Type: application/json" https://[YOUR_ELASTICSEARCH_ENDPOINT]:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

@algestam
Author

Thanks @xose, I just got hit by this again and was able to recover by using the command you suggested :)

The problem occurred on all indices, not just the .kibana one.

According to the ES logs, the indices were set to read-only due to low disk space on the Elasticsearch host. I run a single host with Elasticsearch, Kibana, and Logstash dockerized together with some other tools. As this problem affects other indices, I think this is more of an Elasticsearch problem, and the behavior seen in Kibana is a symptom of that underlying issue.

@saberkun

This bug is stupid. Can you unbreak it for now? At the very least you should display a warning and list a possible solution. It is really frustrating to have to dig through the JS error log just to find this thread!

@darkpixel

@saberkun You can unbreak it by running the command @xose posted:

curl -XPUT -H "Content-Type: application/json" https://[YOUR_ELASTICSEARCH_ENDPOINT]:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

@saberkun

saberkun commented Nov 27, 2017 via email

@darkpixel

Can you provide additional information? Did you receive an error when running the command? Did the indices unlock and now you're getting a new error message? What error messages are you seeing in your log files now?

@saberkun

saberkun commented Nov 27, 2017 via email

@kesha-antonov

+1
Receiving this error after upgrading from 5.5 to 6.0

@purplesrl

purplesrl commented Nov 27, 2017

+1

ELK 6: I cleared half the drive, but the indices stayed read-only; Logstash was allowed to write again, but Kibana remained read-only.

Managed to solve the issue with the workaround provided by @xose

@harmenverburg

+1, same error for me.

@sangeetawakhale

Same issue for me. It got resolved by the solution given by @xose.

@patodevilla

Same here. All hail @xose.

@darkpixel

darkpixel commented Jan 14, 2018

I just upgraded a single-node cluster from 6.0.0 to 6.1.1 (both ES and Kibana). When I started the services back up, Kibana was throwing:

blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];

Same as last time--I had to delete the .kibana index to get it back up and going. There was also the current logstash index with one of the shards listed as unallocated. I deleted it as well and then got the usual flood of alerts in.

I didn't run out of space--there's ~92 GB out of 120 GB free on this test machine. The storage location is ZFS and a scrub didn't reveal any data corruption.

The only errors in the log appear to be irrelevant:

[2018-01-13T20:48:14,579][INFO ][o.e.n.Node               ] [ripley1] stopping ...
[2018-01-13T20:48:14,597][ERROR][i.n.u.c.D.rejectedExecution] Failed to submit a listener notification task. Event loop shut down?
java.util.concurrent.RejectedExecutionException: event executor terminated
        at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:821) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:327) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:320) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:746) ~[netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:760) [netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:428) [netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:113) [netty-common-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:87) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannelHandlerContext.safeExecute(AbstractChannelHandlerContext.java:1010) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:825) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1027) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:301) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
        at org.elasticsearch.http.netty4.Netty4HttpChannel.sendResponse(Netty4HttpChannel.java:146) [transport-netty4-6.0.0.jar:6.0.0]
        at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.sendResponse(RestController.java:491) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:85) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:81) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.finishHim(TransportBulkAction.java:380) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.onFailure(TransportBulkAction.java:375) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.TransportAction$1.onFailure(TransportAction.java:91) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.finishAsFailed(TransportReplicationAction.java:908) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onClusterServiceClose(TransportReplicationAction.java:891) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onClusterServiceClose(ClusterStateObserver.java:310) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onClose(ClusterStateObserver.java:230) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.service.ClusterApplierService.doStop(ClusterApplierService.java:168) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:85) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.cluster.service.ClusterService.doStop(ClusterService.java:106) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:85) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.node.Node.stop(Node.java:713) [elasticsearch-6.0.0.jar:6.0.0]
        at org.elasticsearch.node.Node.close(Node.java:735) [elasticsearch-6.0.0.jar:6.0.0]
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:89) [lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:76) [lucene-core-7.0.1.jar:7.0.1 8d6c3889aa543954424d8ac1dbb3f03bf207140b - sarowe - 2017-10-02 14:36:35]
        at org.elasticsearch.bootstrap.Bootstrap$4.run(Bootstrap.java:185) [elasticsearch-6.0.0.jar:6.0.0]
[2018-01-13T20:48:14,692][INFO ][o.e.n.Node               ] [ripley1] stopped
[2018-01-13T20:48:14,692][INFO ][o.e.n.Node               ] [ripley1] closing ...
[2018-01-13T20:48:14,704][INFO ][o.e.n.Node               ] [ripley1] closed
[2018-01-13T20:48:39,879][INFO ][o.e.n.Node               ] [ripley1] initializing ...
[2018-01-13T20:48:40,054][INFO ][o.e.e.NodeEnvironment    ] [ripley1] using [1] data paths, mounts [[/scratch/elasticsearch (scratch/elasticsearch)]], net usable_space [92.5gb], net total_space [93.6gb], types [zfs]
[2018-01-13T20:48:40,055][INFO ][o.e.e.NodeEnvironment    ] [ripley1] heap size [989.8mb], compressed ordinary object pointers [true]
[2018-01-13T20:48:40,119][INFO ][o.e.n.Node               ] [ripley1] node name [ripley1], node ID [TvkaGbQpR5KZ-ZScMZN6AQ]
[2018-01-13T20:48:40,119][INFO ][o.e.n.Node               ] [ripley1] version[6.1.1], pid[6942], build[bd92e7f/2017-12-17T20:23:25.338Z], OS[Linux/4.10.0-38-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_151/25.151-b12]
[2018-01-13T20:48:40,120][INFO ][o.e.n.Node               ] [ripley1] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [aggs-matrix-stats]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [analysis-common]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [ingest-common]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [lang-expression]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [lang-mustache]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [lang-painless]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [mapper-extras]
[2018-01-13T20:48:41,315][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [parent-join]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [percolator]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [reindex]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [repository-url]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [transport-netty4]
[2018-01-13T20:48:41,320][INFO ][o.e.p.PluginsService     ] [ripley1] loaded module [tribe]
[2018-01-13T20:48:41,321][INFO ][o.e.p.PluginsService     ] [ripley1] no plugins loaded
[2018-01-13T20:48:43,801][INFO ][o.e.d.DiscoveryModule    ] [ripley1] using discovery type [zen]
[2018-01-13T20:48:44,587][INFO ][o.e.n.Node               ] [ripley1] initialized
[2018-01-13T20:48:44,587][INFO ][o.e.n.Node               ] [ripley1] starting ...
[2018-01-13T20:48:44,587][INFO ][o.e.n.Node               ] [ripley1] starting ...
[2018-01-13T20:48:44,759][INFO ][o.e.t.TransportService   ] [ripley1] publish_address {192.168.42.40:9300}, bound_addresses {[::]:9300}
[2018-01-13T20:48:44,792][INFO ][o.e.b.BootstrapChecks    ] [ripley1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2018-01-13T20:48:47,864][INFO ][o.e.c.s.MasterService    ] [ripley1] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {ripley1}{TvkaGbQpR5KZ-ZScMZN6AQ}{H39AkwwqS_i-fg3Gl5J8QQ}{192.168.42.40}{192.168.42.40:9300}
[2018-01-13T20:48:47,869][INFO ][o.e.c.s.ClusterApplierService] [ripley1] new_master {ripley1}{TvkaGbQpR5KZ-ZScMZN6AQ}{H39AkwwqS_i-fg3Gl5J8QQ}{192.168.42.40}{192.168.42.40:9300}, reason: apply cluster state (from master [master {ripley1}{TvkaGbQpR5KZ-ZScMZN6AQ}{H39AkwwqS_i-fg3Gl5J8QQ}{192.168.42.40}{192.168.42.40:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2018-01-13T20:48:47,884][INFO ][o.e.h.n.Netty4HttpServerTransport] [ripley1] publish_address {192.168.42.40:9200}, bound_addresses {[::]:9200}
[2018-01-13T20:48:47,884][INFO ][o.e.n.Node               ] [ripley1] started
[2018-01-13T20:48:48,326][INFO ][o.e.g.GatewayService     ] [ripley1] recovered [6] indices into cluster_state
[2018-01-13T20:49:01,493][INFO ][o.e.c.m.MetaDataDeleteIndexService] [ripley1] [logstash-2018.01.14/D0f_lDkSQpebPFcey6NHFw] deleting index
[2018-01-13T20:49:18,793][INFO ][o.e.c.m.MetaDataCreateIndexService] [ripley1] [logstash-2018.01.14] creating index, cause [auto(bulk api)], templates [logstash-*], shards [5]/[0], mappings []
[2018-01-13T20:49:18,937][INFO ][o.e.c.r.a.AllocationService] [ripley1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[logstash-2018.01.14][4]] ...]).

@zjhgx

zjhgx commented Feb 7, 2018

+1 same error in 6.1.2

@tylersmalley
Contributor

This is a function of Elasticsearch. Per the Elasticsearch error, "all indices on this node will marked read-only".

To revert this for an index, you can set index.blocks.read_only_allow_delete to null.

More information on this can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html
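
For a single index such as .kibana, that reset looks like the following (a minimal sketch, assuming Elasticsearch on localhost:9200):

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/.kibana/_settings -d '{"index.blocks.read_only_allow_delete": null}'

Setting the value to null removes the block entirely (rather than setting it to false), so the index falls back to the default behaviour.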

@darkpixel

darkpixel commented Mar 27, 2018

FYI - for anyone still running into this, here's a quick one-liner to fix the indices:
curl -s -H "Content-Type: application/json" http://localhost:9200/_cat/indices | awk '{ print $3 }' | sort | xargs -L 1 -I{} curl -s -XPUT -H "Content-Type: application/json" http://localhost:9200/{}/_settings -d '{"index.blocks.read_only_allow_delete": null}'

It grabs a list of all the indices in your cluster, then for each one sends the command to clear the read-only block.
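
A variant of the same one-liner (a sketch, assuming the same localhost endpoint) asks the cat API for just the index name via ?h=index, which avoids relying on the awk column position:

curl -s "http://localhost:9200/_cat/indices?h=index" | xargs -L 1 -I{} curl -s -XPUT -H "Content-Type: application/json" "http://localhost:9200/{}/_settings" -d '{"index.blocks.read_only_allow_delete": null}'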

@outworlder

> FYI - for anyone still running into this, here's a quick one-liner to fix the indices:
> curl -s -H "Content-Type: application/json" http://localhost:9200/_cat/indices | awk '{ print $3 }' | sort | xargs -L 1 -I{} curl -s -XPUT -H "Content-Type: application/json" http://localhost:9200/{}/_settings -d '{"index.blocks.read_only_allow_delete": null}'
> It grabs a list of all the indices in your cluster, then for each one sends the command to clear the read-only block.

I too was doing this until I found @darkpixel's solution (#13685 (comment))

You can apply this setting to _all instead of going index by index. In my case it takes quite a while to do it for hundreds of indices, while setting it on _all takes only a few seconds.

curl -XPUT -H "Content-Type: application/json" https://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

@Frank591

> I resolved the issue by deleting the .kibana index:
> DELETE /.kibana/
> I lost certain configurations/visualizations/dashboards, but it unblocked the index.

Thanks a lot for this workaround. It solved the problem for me.

@Goahnary

Goahnary commented Aug 1, 2019

This worked for me. Both commands were needed to get Kibana working after a fresh install:

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_cluster/settings -d '{ "transient": { "cluster.routing.allocation.disk.threshold_enabled": false } }'
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

This did not require deleting the .kibana index. Works perfectly now!

Source:
https://selleo.com/til/posts/esrgfyxjee-how-to-fix-elasticsearch-forbidden12index-read-only
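
Note that the first of those two commands disables the disk-based allocation thresholds entirely, which removes the protection they provide. Once enough disk space has been freed, the transient override can be cleared so the default behaviour returns (a sketch, assuming Elasticsearch on localhost:9200):

curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_cluster/settings -d '{ "transient": { "cluster.routing.allocation.disk.threshold_enabled": null } }'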
