This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Can't change temp directory using environment variables #71

Open
shawnz opened this issue Sep 4, 2019 · 27 comments
Labels
enhancement New feature or request PerformanceAnalyzer Performance Analyzer related

shawnz commented Sep 4, 2019

Hi there, as documented in issue #70 it is not possible to use the performance-analyzer-agent-cli with a temp directory marked noexec. The elasticsearch startup scripts provide the possibility of using the "ES_TMPDIR" variable to override the temp directory path, however this does not work for the performance-analyzer-agent-cli script because using ES_TMPDIR causes ES_JAVA_OPTS to be clobbered.

See how on line 9 of performance-analyzer-agent-cli, ES_JAVA_OPTS is set with values necessary to boot the performance analyzer, before calling elasticsearch-cli:

https://github.com/opendistro-for-elasticsearch/performance-analyzer/blob/master/packaging/performance-analyzer-agent-cli#L9

Elasticsearch-cli then calls elasticsearch-env to set up the environment:

https://github.com/elastic/elasticsearch/blob/v7.1.1/distribution/src/bin/elasticsearch-cli#L5

Finally elasticsearch-env clobbers ES_JAVA_OPTS when ES_TMPDIR is set:

https://github.com/elastic/elasticsearch/blob/v7.1.1/distribution/src/bin/elasticsearch-env#L88

Note that this ES_TMPDIR is gone in the latest version of the scripts, so maybe they now intend for us to use ES_JAVA_OPTS to set java.io.tmpdir with a -D parameter. However this approach has a different problem: when /etc/sysconfig/elasticsearch is sourced in elasticsearch-env, it similarly clobbers the ES_JAVA_OPTS that performance-analyzer-agent-cli defines. Thus the settings from PA_AGENT_JAVA_OPTS, such as the log4j configuration file, are lost.
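
To make the clobbering concrete, here is a minimal sketch, assuming the env script simply sources a sysconfig file that assigns ES_JAVA_OPTS. The temporary file and the shortened option values are stand-ins for illustration, not the real scripts.

```shell
#!/bin/sh
# Illustrative only: mimics an env script sourcing a sysconfig file.
# The file and the option values below are made up for the demo.

sysconfig=$(mktemp)
cat > "$sysconfig" <<'EOF'
ES_JAVA_OPTS="-Djava.io.tmpdir=/apps/elasticsearch/tmp"
EOF

# What a wrapper such as performance-analyzer-agent-cli sets before
# handing off (shortened to a single option here).
ES_JAVA_OPTS="-Dlog4j.configurationFile=/path/to/log4j2.xml"
before=$ES_JAVA_OPTS

# Sourcing the file replaces the variable instead of appending to it.
. "$sysconfig"
after=$ES_JAVA_OPTS
rm -f "$sysconfig"

echo "before: $before"
echo "after:  $after"
```

After the file is sourced, only the tmpdir option remains and the log4j option is gone, which matches the behavior described above.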

The workaround which we are using, as described in issue #70, is to edit the performance-analyzer-agent-cli script and add the string -Djava.io.tmpdir=/apps/elasticsearch/tmp to the end of PA_AGENT_JAVA_OPTS (where /apps/ is a volume that we control). We then also set the other components of elasticsearch to use that same temp directory with the ES_TMPDIR variable (NOT the ES_JAVA_OPTS variable, otherwise the PA_AGENT_JAVA_OPTS get lost as noted above).

Author

shawnz commented Sep 4, 2019

More info: It seems like they still do use the ES_TMPDIR in new versions, but only for the elasticsearch binary itself and not the other cli scripts, since the other scripts do not need a temp directory. Perhaps the performance-analyzer-agent-cli script could adopt a similar approach since it similarly requires a temp directory.

See: elastic/elasticsearch@f97606e

@aesgithub aesgithub added enhancement New feature or request PerformanceAnalyzer Performance Analyzer related labels Sep 4, 2019
Contributor

sendkb commented Sep 18, 2019

Thanks @shawnz for reporting this issue. We will update the CLI to honor ES_TMPDIR if it is set. We are also thinking of using PA_AGENT_JAVA_OPTS env variable as-is if it is set. Let us know what you think about setting PA_AGENT_JAVA_OPTS from your end.
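
One way the two ideas above could fit together is sketched below. This is a hypothetical illustration, not the actual performance-analyzer-agent-cli: the function name `build_pa_agent_opts` is invented, and the default options are shortened to two flags.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual performance-analyzer-agent-cli.

# Print the Java options the agent would be started with.
build_pa_agent_opts() {
    # Default shortened for illustration; the real script sets more flags.
    default_opts="-Xms64M -Xmx64M"

    # Use PA_AGENT_JAVA_OPTS as-is when the caller has set it.
    opts="${PA_AGENT_JAVA_OPTS:-$default_opts}"

    # Honor ES_TMPDIR when it is set and non-empty.
    if [ -n "${ES_TMPDIR:-}" ]; then
        opts="$opts -Djava.io.tmpdir=$ES_TMPDIR"
    fi

    printf '%s\n' "$opts"
}

# Show the options that result from the current environment.
build_pa_agent_opts
```

With this shape, a user could point the agent at a different temp directory by exporting ES_TMPDIR, or replace the defaults entirely by exporting PA_AGENT_JAVA_OPTS, without editing the script.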

Author

shawnz commented Sep 18, 2019

Hi @sendkb, thank you for your time looking into this issue. Either option would be just fine for me.

Author

shawnz commented Aug 27, 2020

Just a follow up to your previous question: it would indeed be helpful to override other parts of PA_AGENT_JAVA_OPTS as well because that would also let me change the log4j.configurationFile property.

This has recently become an important concern for me because the default location of the log4j config file is such that the file gets overwritten after any elasticsearch updates. So if I could instead put the log4j configuration in a location of my choosing, it would work around that issue.

rdecuir commented Nov 6, 2020

Is there any update on this? I am in the same boat as the OP here. We are using SELinux and marking our tmp dirs as noexec, which is causing elasticsearch.service to fail on startup. We're in AWS using CentOS 7 VMs and installing Open Distro by RPM.

Even after adding -Djava.io.tmpdir=/etc/elasticsearch/tmp to the end of PA_AGENT_JAVA_OPTS, we are still unable to get past this issue. It looks like no matter where I set $ES_TMPDIR or "-Djava.io.tmpdir=/etc/elasticsearch/tmp", elasticsearch ignores the setting and always attempts to write to /tmp for something or other.

Wherever I have defined the new temp directory and gotten elasticsearch to start up, the logs complain about JNA failing and show that the Java option -Djava.io.tmpdir is defined twice, e.g. in /etc/sysconfig/elasticsearch and at the top of jvm.options.

I tried adding "ES_TMPDIR=/etc/elasticsearch/tmp" to the /etc/sysconfig/elasticsearch file, which does seem to get the logs to show that the change was honored, but that causes the JVM to fail and elasticsearch fails to come up.

Unless I mark the /tmp directory as executable, which is not a viable solution for us, nothing I have tried has worked. Is there a time frame for this fix, or another workaround I could try? We can't move forward with transitioning to Open Distro if our tmp directory has to be executable.

Thanks!

Author

shawnz commented Nov 6, 2020

Hey there @rdecuir , there are actually two issues at play here. Elasticsearch fails to start with a noexec tmp directory, and also the performance analyzer agent fails to start with a noexec temp directory. In order to fix both problems, my setup is as follows:

  • Make the change to performance-analyzer-agent-cli to fix the performance analyzer problem, AND

  • Add ES_TMPDIR=/whatever to the /etc/sysconfig/elasticsearch file. DO NOT override ES_JAVA_OPTS in there or I believe that may negate the change.

So what you are describing should be working as far as I am aware. Can you show me the error message? I am wondering if the problem is maybe that "/etc/elasticsearch" is not a suitable location. I am not sure why but in the past I have had issues where elasticsearch refuses to start if I put foreign files in there.

rdecuir commented Nov 6, 2020

Thanks for getting back to me! @shawnz error message below, and I did try a few things based on your suggestion.

  • I did create a new tmp location "/opt/elasticsearch/tmp", chowned that to the elasticsearch user.

  • Added ES_TMPDIR=/opt/elasticsearch/tmp to /etc/sysconfig/elasticsearch

  • Commented out ES_JAVA_OPTS= in /etc/sysconfig/elasticsearch

  • Added -Djava.io.tmpdir=/opt/elasticsearch/tmp to performance-analyzer-agent-cli

But when I run systemctl start elasticsearch.service it still fails to start. I even tried running .../bin/elasticsearch as the elasticsearch user, to try elasticsearch without the performance analyzer, but I still see the same failure.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f420971ec95, pid=24354, tid=24522
#
# JRE version: OpenJDK Runtime Environment AdoptOpenJDK (14.0.1+7) (build 14.0.1+7)
# Java VM: OpenJDK 64-Bit Server VM AdoptOpenJDK (14.0.1+7, mixed mode, tiered, g1 gc, linux-amd64)
# Problematic frame:
# C  [jna8794642219682927580.tmp+0x12c95]  ffi_prep_closure_loc+0x15
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   https://github.com/AdoptOpenJDK/openjdk-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  S U M M A R Y ------------

Command Line: -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -XX:+ShowCodeDetailsInExceptionMessages -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=SPI,COMPAT -Xms32g -Xmx32g -XX:+UseG1GC -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 -Djava.io.tmpdir=/opt/elasticsearch/tmp -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Dclk.tck=100 -Djdk.attach.allowAttachSelf=true -Djava.security.policy=file:///usr/share/elasticsearch/plugins/opendistro_performance_analyzer/pa_config/es_security.policy -XX:MaxDirectMemorySize=17179869184 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch -Des.distribution.flavor=oss -Des.distribution.type=rpm -Des.bundled_jdk=true org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet

Host: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz, 8 cores, 61G, CentOS Linux release 7.8.2003 (Core)
Time: Fri Nov  6 16:00:55 2020 UTC elapsed time: 2 seconds (0d 0h 0m 2s)

---------------  T H R E A D  ---------------

Current thread (0x00007f4aa802a800):  JavaThread "main" [_thread_in_native, id=24522, stack(0x00007f4aafffc000,0x00007f4ab00fd000)]

Stack: [0x00007f4aafffc000,0x00007f4ab00fd000],  sp=0x00007f4ab00fa3a0,  free space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [jna8794642219682927580.tmp+0x12c95]  ffi_prep_closure_loc+0x15
C  [jna8794642219682927580.tmp+0xa2ac]  Java_com_sun_jna_Native_registerMethod+0x51c
j  com.sun.jna.Native.registerMethod(Ljava/lang/Class;Ljava/lang/String;Ljava/lang/String;[I[J[JIJJLjava/lang/reflect/Method;JIZ[Lcom/sun/jna/ToNativeConverter;Lcom/sun/jna/FromNativeConverter;Ljava/lang/String;)J+0
j  com.sun.jna.Native.register(Ljava/lang/Class;Lcom/sun/jna/NativeLibrary;)V+1159
j  com.sun.jna.Native.register(Ljava/lang/Class;Ljava/lang/String;)V+17
j  com.sun.jna.Native.register(Ljava/lang/String;)V+7
j  org.elasticsearch.bootstrap.JNACLibrary.<clinit>()V+73
v  ~StubRoutines::call_stub
V  [libjvm.so+0x7e9b8b]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x3ab
V  [libjvm.so+0x7c68ad]  InstanceKlass::call_class_initializer(Thread*)+0x21d
V  [libjvm.so+0x7c6e66]  InstanceKlass::initialize_impl(Thread*)+0x556
V  [libjvm.so+0xa17a50]  LinkResolver::resolve_static_call(CallInfo&, LinkInfo const&, bool, Thread*)+0x440
V  [libjvm.so+0xa1cb68]  LinkResolver::resolve_invoke(CallInfo&, Handle, constantPoolHandle const&, int, Bytecodes::Code, Thread*)+0x498
V  [libjvm.so+0x7e4d43]  InterpreterRuntime::resolve_invoke(JavaThread*, Bytecodes::Code)+0x2f3
V  [libjvm.so+0x7e5295]  InterpreterRuntime::resolve_from_cache(JavaThread*, Bytecodes::Code)+0x105
j  org.elasticsearch.bootstrap.JNANatives.definitelyRunningAsRoot()Z+8
j  org.elasticsearch.bootstrap.Natives.definitelyRunningAsRoot()Z+18
j  org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Ljava/nio/file/Path;ZZZ)V+7
j  org.elasticsearch.bootstrap.Bootstrap.setup(ZLorg/elasticsearch/env/Environment;)V+72
j  org.elasticsearch.bootstrap.Bootstrap.init(ZLjava/nio/file/Path;ZLorg/elasticsearch/env/Environment;)V+237
j  org.elasticsearch.bootstrap.Elasticsearch.init(ZLjava/nio/file/Path;ZLorg/elasticsearch/env/Environment;)V+13
j  org.elasticsearch.bootstrap.Elasticsearch.execute(Lorg/elasticsearch/cli/Terminal;Ljoptsimple/OptionSet;Lorg/elasticsearch/env/Environment;)V+204
j  org.elasticsearch.cli.EnvironmentAwareCommand.execute(Lorg/elasticsearch/cli/Terminal;Ljoptsimple/OptionSet;)V+218
j  org.elasticsearch.cli.Command.mainWithoutErrorHandling([Ljava/lang/String;Lorg/elasticsearch/cli/Terminal;)V+79
j  org.elasticsearch.cli.Command.main([Ljava/lang/String;Lorg/elasticsearch/cli/Terminal;)I+47
j  org.elasticsearch.bootstrap.Elasticsearch.main([Ljava/lang/String;Lorg/elasticsearch/bootstrap/Elasticsearch;Lorg/elasticsearch/cli/Terminal;)I+3
j  org.elasticsearch.bootstrap.Elasticsearch.main([Ljava/lang/String;)V+29
v  ~StubRoutines::call_stub
V  [libjvm.so+0x7e9b8b]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x3ab
V  [libjvm.so+0x8760fd]  jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) [clone .isra.118] [clone .constprop.255]+0x34d
V  [libjvm.so+0x878754]  jni_CallStaticVoidMethod+0x164
C  [libjli.so+0x58fd]  JavaMain+0xe4d
C  [libjli.so+0x9019]  ThreadJavaMain+0x9

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  com.sun.jna.Native.registerMethod(Ljava/lang/Class;Ljava/lang/String;Ljava/lang/String;[I[J[JIJJLjava/lang/reflect/Method;JIZ[Lcom/sun/jna/ToNativeConverter;Lcom/sun/jna/FromNativeConverter;Ljava/lang/String;)J+0
j  com.sun.jna.Native.register(Ljava/lang/Class;Lcom/sun/jna/NativeLibrary;)V+1159
j  com.sun.jna.Native.register(Ljava/lang/Class;Ljava/lang/String;)V+17
j  com.sun.jna.Native.register(Ljava/lang/String;)V+7
j  org.elasticsearch.bootstrap.JNACLibrary.<clinit>()V+73
v  ~StubRoutines::call_stub
j  org.elasticsearch.bootstrap.JNANatives.definitelyRunningAsRoot()Z+8
j  org.elasticsearch.bootstrap.Natives.definitelyRunningAsRoot()Z+18
j  org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Ljava/nio/file/Path;ZZZ)V+7
j  org.elasticsearch.bootstrap.Bootstrap.setup(ZLorg/elasticsearch/env/Environment;)V+72
j  org.elasticsearch.bootstrap.Bootstrap.init(ZLjava/nio/file/Path;ZLorg/elasticsearch/env/Environment;)V+237
j  org.elasticsearch.bootstrap.Elasticsearch.init(ZLjava/nio/file/Path;ZLorg/elasticsearch/env/Environment;)V+13
j  org.elasticsearch.bootstrap.Elasticsearch.execute(Lorg/elasticsearch/cli/Terminal;Ljoptsimple/OptionSet;Lorg/elasticsearch/env/Environment;)V+204
j  org.elasticsearch.cli.EnvironmentAwareCommand.execute(Lorg/elasticsearch/cli/Terminal;Ljoptsimple/OptionSet;)V+218
j  org.elasticsearch.cli.Command.mainWithoutErrorHandling([Ljava/lang/String;Lorg/elasticsearch/cli/Terminal;)V+79
j  org.elasticsearch.cli.Command.main([Ljava/lang/String;Lorg/elasticsearch/cli/Terminal;)I+47
j  org.elasticsearch.bootstrap.Elasticsearch.main([Ljava/lang/String;Lorg/elasticsearch/bootstrap/Elasticsearch;Lorg/elasticsearch/cli/Terminal;)I+3
j  org.elasticsearch.bootstrap.Elasticsearch.main([Ljava/lang/String;)V+29
v  ~StubRoutines::call_stub

Author

shawnz commented Nov 6, 2020

Interesting. Based on this https://discuss.elastic.co/t/elasticsearch-service-getting-aborted/221857/2 maybe try also setting jna.tmpdir.

I guess that would mean, altogether in /etc/sysconfig/elasticsearch, you would have to comment out ES_TMPDIR and instead set ES_JAVA_OPTS to -Djava.io.tmpdir=/opt/elasticsearch/tmp -Djna.tmpdir=/opt/elasticsearch/tmp. Does that work? Not sure why the necessary options would be different between our configurations though.

I will also note that I am only using JDK 11, not sure if that is relevant to the issue. I am also using Shenandoah GC instead of G1 GC.

EDIT: I also notice you are using 32gb of heap which is not really recommended (although I'm sure that's not related). See here: https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html (note the section about compressed oops)

jordanenglish commented Nov 6, 2020

Hi @shawnz and @rdecuir. I just started following this issue yesterday, as I ran into the same problem with noexec on tmp.

Here are the changes I had to make based on @shawnz's workaround.

/etc/elasticsearch/tmp is the directory I've chosen to use for this. My goal was to get Wazuh installed.

/etc/sysconfig/elasticsearch

# Additional Java OPTS
ES_JAVA_OPTS='-Djava.io.tmpdir=/etc/elasticsearch/tmp'

/usr/share/elasticsearch/bin/performance-analyzer-agent-cli

[   "$@" -Djava.io.tmpdir=/etc/elasticsearch/tmp]

rdecuir commented Nov 6, 2020

Thanks for the suggestions, @shawnz and @jordan. I tried both suggestions and some other things, and am now trying to narrow the workflow to isolate the issues.

Since it's both a performance analyzer and an elasticsearch problem, I've now started testing the following:

I now become the elasticsearch user and am executing the /usr/share/elasticsearch/bin/elasticsearch script.

What I have noticed is:

  • Setting JAVA_OPTS="-Djava.io.tmpdir=/opt/elasticsearch/tmp -Djna.tmpdir=/opt/elasticsearch/tmp" in the /etc/sysconfig/elasticsearch file seems to make no difference: elasticsearch starts up and immediately complains about JNA (error at the bottom of this post). The logs show that the JVM args have tmp set as -Djava.io.tmpdir=/tmp/elasticsearch-0321764032184763, not /opt/elasticsearch/tmp.

  • Setting ES_TMPDIR=/opt/elasticsearch/tmp, either in the /etc/sysconfig/elasticsearch file or on the command line (e.g. ES_TMPDIR=/opt/elasticsearch/tmp /usr/share/elasticsearch/bin/elasticsearch), causes elasticsearch not to start up, even though the logs show that the JVM args have the correct -Djava.io.tmpdir=/opt/elasticsearch/tmp.

I figured working with just elasticsearch might be easier to start with, but I'm still having issues. It seems that whenever elasticsearch does pick up the tmp dir I want, the new tmp dir breaks elasticsearch.
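
When comparing what the logs report against what was configured, it can help to pull the relevant -D values out of the JVM argument string. The sketch below is a small illustrative helper: the function name `jvm_property_values` and the sample argument string are made up, and it assumes no option value contains spaces.

```shell
#!/bin/sh
# Illustrative helper (the function name is made up for this sketch).

# Print every value given for a -D<property>= option, one per line,
# in the order the options appear in the argument string.
jvm_property_values() (
    prefix="-D$1="
    shift
    # Disable globbing so options such as -Xlog:gc* stay literal.
    set -f
    # Unquoted on purpose: split the argument string on whitespace.
    for arg in $*; do
        case $arg in
            "$prefix"*)
                value=${arg#"$prefix"}
                printf '%s\n' "$value"
                ;;
        esac
    done
)

# Sample argument string, shortened and invented for the demo.
args="-Xms32g -Djava.io.tmpdir=/tmp/elasticsearch-1 -Djna.nosys=true -Djava.io.tmpdir=/opt/elasticsearch/tmp"
jvm_property_values java.io.tmpdir "$args"
```

Seeing more than one value for java.io.tmpdir suggests the option is being set in more than one place, which is worth ruling out before testing further.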

More info:

  • working with opendistroforelasticsearch-1.11.0

  • installed java-1.8.0-openjdk-devel on the box as per their documentation

  • added the link from docs ln -s /usr/lib/jvm/java-1.8.0/lib/tools.jar /usr/share/elasticsearch/lib/

  • Can tell by logs opendistro elasticsearch is using its own java 14 library

Any more suggestions or ideas are appreciated. Thank you so much for your help thus far, as it's driving me nuts haha.

Error when I don't set ES_TMPDIR, regardless of whether I set -Djava.io.tmpdir=/opt/elasticsearch/tmp and/or -Djna.tmpdir=/opt/elasticsearch/tmp:

unable to load JNA native support library, native methods will be disabled.
java.lang.UnsatisfiedLinkError: /tmp/elasticsearch-14499571148581708342/jna7185666673918243058.tmp: /tmp/elasticsearch-14499571148581708342/jna7185666673918243058.tmp: failed to map segment from shared object: Operation not permitted
        at java.lang.ClassLoader$NativeLibrary.load0(Native Method) ~[?:?]
        at java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2452) ~[?:?]
        at java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2508) ~[?:?]
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2704) ~[?:?]
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:2637) ~[?:?]
        at java.lang.Runtime.load0(Runtime.java:745) ~[?:?]
        at java.lang.System.load(System.java:1871) ~[?:?]
        at com.sun.jna.Native.loadNativeDispatchLibraryFromClasspath(Native.java:1018) ~[jna-5.5.0.jar:5.5.0 (b0)]
        at com.sun.jna.Native.loadNativeDispatchLibrary(Native.java:988) ~[jna-5.5.0.jar:5.5.0 (b0)]
        at com.sun.jna.Native.<clinit>(Native.java:195) ~[jna-5.5.0.jar:5.5.0 (b0)]
        at java.lang.Class.forName0(Native Method) ~[?:?]
        at java.lang.Class.forName(Class.java:340) ~[?:?]
        at org.elasticsearch.bootstrap.Natives.<clinit>(Natives.java:45) [elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:110) [elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:178) [elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:393) [elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) [elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161) [elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) [elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127) [elasticsearch-cli-7.9.1.jar:7.9.1]
        at org.elasticsearch.cli.Command.main(Command.java:90) [elasticsearch-cli-7.9.1.jar:7.9.1]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126) [elasticsearch-7.9.1.jar:7.9.1]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) [elasticsearch-7.9.1.jar:7.9.1]
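
The "failed to map segment from shared object: Operation not permitted" message above is what a noexec mount typically produces. A rough way to check a candidate temp directory is to try executing a file from it. The sketch below is an assumption-laden proxy (the function name `can_exec_in` is made up, and direct execution of a script is not exactly what JNA does), so treat a pass as a hint rather than proof.

```shell
#!/bin/sh
# Illustrative check (the function name is made up for this sketch).

# Succeed if a file created in the given directory can be executed
# directly, which fails on a noexec mount.
can_exec_in() {
    dir=$1
    # mktemp fails if the directory is missing or not writable.
    probe=$(mktemp "$dir/exec-probe.XXXXXX" 2>/dev/null) || return 1
    printf '#!/bin/sh\nexit 0\n' > "$probe"
    chmod +x "$probe"
    "$probe" 2>/dev/null
    status=$?
    rm -f "$probe"
    return "$status"
}

if can_exec_in "${TMPDIR:-/tmp}"; then
    echo "exec allowed"
else
    echo "exec blocked or directory not usable"
fi
```

Running the check as the elasticsearch user against the directory intended for ES_TMPDIR also confirms that the user can write there.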

@jordanenglish

@rdecuir - do you have the elasticsearch service running? What happens if you start the service as opposed to starting the binary as the user?

This is what journalctl shows when I start my service:

Nov 06 19:19:31 $HOSTNAME systemd[1]: Started Opendistro for Elasticsearch Performance Analyzer.
Nov 06 19:19:31 $HOSTNAME systemd[1]: Starting Elasticsearch...
Nov 06 19:19:32 $HOSTNAME performance-analyzer-agent-cli[3912882]: ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2
Nov 06 19:19:35 $HOSTNAME performance-analyzer-agent-cli[3912882]: 19:19:35.781 [pa-reader] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/elasticsearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/elasticsearch/data/batch_metrics_enabled.conf
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: Nov 06, 2020 7:19:36 PM org.jooq.tools.JooqLogger info
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: INFO:
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]:
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: @@@@@@@@@@@@@@@@  @@        @@@@@@@@@@
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: @@@@@@@@@@@@@@@@@@@@        @@@@@@@@@@
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: @@@@@@@@@@@@@@@@  @@  @@    @@@@@@@@@@
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: @@@@@@@@@@  @@@@  @@  @@    @@@@@@@@@@
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: @@@@@@@@@@        @@        @@@@@@@@@@
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: @@@@@@@@@@        @@        @@@@@@@@@@
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: @@@@@@@@@@    @@  @@  @@@@  @@@@@@@@@@
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: @@@@@@@@@@        @@  @  @  @@@@@@@@@@
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: @@@@@@@@@@        @@        @@@@@@@@@@
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: @@@@@@@@@@@@@@@@@@@@@@@  @@@@@@@@@@@@@
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  Thank you for using jOOQ 3.10.8
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]:
Nov 06 19:19:36 $HOSTNAME performance-analyzer-agent-cli[3912882]: 19:19:36.485 [pa-reader] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/elasticsearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/elasticsearch/data/batch_metrics_enabled.conf
Nov 06 19:19:38 $HOSTNAME performance-analyzer-agent-cli[3912882]: 19:19:38.408 [pa-reader] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/elasticsearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/elasticsearch/data/batch_metrics_enabled.conf

rdecuir commented Nov 6, 2020

@jordanenglish so, if I attempt to run the service it fails and logs look like so:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007facb11b4c95, pid=10437, tid=10603
#
# JRE version: OpenJDK Runtime Environment AdoptOpenJDK (14.0.1+7) (build 14.0.1+7)
# Java VM: OpenJDK 64-Bit Server VM AdoptOpenJDK (14.0.1+7, mixed mode, tiered, g1 gc, linux-amd64)
# Problematic frame:
# C  [jna3963141006250589072.tmp+0x12c95]  ffi_prep_closure_loc+0x15
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   https://github.com/AdoptOpenJDK/openjdk-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  S U M M A R Y ------------

Command Line: -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -XX:+ShowCodeDetailsInExceptionMessages -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=SPI,COMPAT -Xms32g -Xmx32g -XX:+UseG1GC -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 -Djava.io.tmpdir=/opt/elasticsearch/tmp -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Dclk.tck=100 -Djdk.attach.allowAttachSelf=true -Djava.security.policy=file:///usr/share/elasticsearch/plugins/opendistro_performance_analyzer/pa_config/es_security.policy -XX:MaxDirectMemorySize=17179869184 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch -Des.distribution.flavor=oss -Des.distribution.type=rpm -Des.bundled_jdk=true org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet

Host: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz, 8 cores, 61G, CentOS Linux release 7.8.2003 (Core)
Time: Fri Nov  6 19:26:01 2020 UTC elapsed time: 2 seconds (0d 0h 0m 2s)

y0d4a commented Dec 1, 2020

Hi, I have the same problem.
On a new CentOS installation I tried an unattended install, and after the script installed everything, it cannot start elastic for the same reason:

Dec 01 15:11:20 vlxwazuhtest performance-analyzer-agent-cli[8152]: 15:11:20.819 [pa-reader] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/elasticsearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/elasticsearch/data/batch_metrics_enabled.conf
Dec 01 15:11:23 vlxwazuhtest performance-analyzer-agent-cli[8152]: 15:11:23.274 [pa-reader] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/elasticsearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/elasticsearch/data/batch_metrics_enabled.conf

That file does not exist; in the data folder there is only the rca_enabled.conf file.

I tried creating that file with true and with false and got new errors, so it seems that is not the only problem.

@baldy2811

> hi, i got same problem..
> on new centos installation i try untended install and after script install it all, he cannot start elastic with same reason:
>
> Dec 01 15:11:20 vlxwazuhtest performance-analyzer-agent-cli[8152]: 15:11:20.819 [pa-reader] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/elasticsearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/elasticsearch/data/batch_metrics_enabled.conf
> Dec 01 15:11:23 vlxwazuhtest performance-analyzer-agent-cli[8152]: 15:11:23.274 [pa-reader] ERROR com.amazon.opendistro.elasticsearch.performanceanalyzer.reader.ReaderMetricsProcessor - Error reading file '/usr/share/elasticsearch/data/batch_metrics_enabled.conf': java.nio.file.NoSuchFileException: /usr/share/elasticsearch/data/batch_metrics_enabled.conf
>
> that file do not exist, in data folder there is only rca_enabled.conf file
>
> i try to create that file with true and false, and i got new errors, so seems that is not only the problem..

Did you find a fix for that? I have the same problem right now after a fresh install on Ubuntu.

rdecuir commented Dec 30, 2020

Ok so here is what I did to get this to work:

Created dir: /etc/elasticsearch/tmp

Configured the following files:

/etc/sysconfig/elasticsearch

################################
# Elasticsearch
################################

# Elasticsearch home directory
#ES_HOME=/usr/share/elasticsearch

# Elasticsearch Java path
#JAVA_HOME=

# Elasticsearch configuration directory
# Note: this setting will be shared with command-line tools
ES_PATH_CONF=/etc/elasticsearch

# Elasticsearch PID directory
#PID_DIR=/var/run/elasticsearch

# Additional Java OPTS
ES_TMPDIR=/etc/elasticsearch/tmp
TMPDIR=/etc/elasticsearch/tmp

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true

################################
# Elasticsearch service
################################

# SysV init.d
#
# The number of seconds to wait before checking if Elasticsearch started successfully as a daemon process
ES_STARTUP_SLEEP_TIME=5

################################
# System properties
################################

# Specifies the maximum file descriptor number that can be opened by this process
# When using Systemd, this setting is ignored and the LimitNOFILE defined in
# /usr/lib/systemd/system/elasticsearch.service takes precedence
#MAX_OPEN_FILES=65535

# The maximum number of bytes of memory that may be locked into RAM
# Set to "unlimited" if you use the 'bootstrap.memory_lock: true' option
# in elasticsearch.yml.
# When using systemd, LimitMEMLOCK must be set in a unit file such as
# /etc/systemd/system/elasticsearch.service.d/override.conf.
#MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
# When using Systemd, this setting is ignored and the 'vm.max_map_count'
# property is set at boot time in /usr/lib/sysctl.d/elasticsearch.conf
#MAX_MAP_COUNT=262144

/etc/systemd/system/elasticsearch.service.d/elasticsearch.conf

[Service]
LimitMEMLOCK=infinity

/usr/share/elasticsearch/bin/performance-analyzer-agent-cli

#!/bin/sh

PA_AGENT_JAVA_OPTS="-Dlog4j.configurationFile=$ES_HOME/plugins/opendistro_performance_analyzer/pa_config/log4j2.xml \
              -Xms64M -Xmx64M -XX:+UseSerialGC -XX:CICompilerCount=1 -XX:-TieredCompilation -XX:InitialCodeCacheSize=4096 \
              -XX:InitialBootClassLoaderMetaspaceSize=30720 -XX:MaxRAM=400m -Djna.tmpdir=/etc/elasticsearch/tmp -Djava.io.tmpdir=/etc/elasticsearch/tmp"

ES_MAIN_CLASS="com.amazon.opendistro.elasticsearch.performanceanalyzer.PerformanceAnalyzerApp" \
ES_ADDITIONAL_CLASSPATH_DIRECTORIES=performance-analyzer-rca/lib \
ES_JAVA_OPTS=$PA_AGENT_JAVA_OPTS \
 $ES_HOME/bin/elasticsearch-cli \
   "$@"

/etc/elasticsearch/jvm.options

## JVM configuration

################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms32g
-Xmx32g

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly

## G1GC Configuration
# NOTE: G1 GC is only supported on JDK version 10 or later
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
# 10-13:-XX:-UseConcMarkSweepGC
# 10-13:-XX:-UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30

## JVM temporary directory
-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=/var/lib/elasticsearch

# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log

## JDK 8 GC logging
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:/var/log/elasticsearch/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m

## OpenDistro Performance Analyzer
-Dclk.tck=100
-Djdk.attach.allowAttachSelf=true
-Djava.security.policy=file:///usr/share/elasticsearch/plugins/opendistro_performance_analyzer/pa_config/es_security.policy

@jordanenglish

I'm not having any issues at the moment @rdecuir, but do you have a reason for your 32g in Java? I'm just curious.

your /etc/elasticsearch/jvm.options

-Xms32g
-Xmx32g

my /etc/elasticsearch/jvm.options

-Xms1g
-Xmx1g

Were you seeing performance issues that required you to increase this?

@shawnz
Author

shawnz commented Dec 30, 2020

Note that it is a bad idea to set the Java heap above ~30 GB, because you lose an important optimization called compressed oops. You will likely get worse performance by doing that unless you set it MUCH larger than 30 GB, for example 50-60 GB.

Elastic generally recommends setting the min/max heap size to half the physical RAM on the system, but not more than ~30 GB (assuming Elasticsearch is all that's running on the system).

See: https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#heap-size-settings
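
As a rough illustration of that rule of thumb (half of physical RAM, capped at about 30 GB), here is a small sketch. The function name `recommended_heap_gb` and the hard 30 GB default are assumptions made for the example; the exact compressed-oops cutoff depends on the JVM and platform.

```sh
#!/bin/sh
# Hypothetical helper: suggest a heap size in GB for a dedicated
# Elasticsearch node, following "half of RAM, but not more than ~30 GB".
# $1 = physical RAM in GB, $2 = optional cap in GB (assumed default: 30).
recommended_heap_gb() {
    ram_gb=$1
    cap_gb=${2:-30}
    half=$((ram_gb / 2))
    if [ "$half" -gt "$cap_gb" ]; then
        echo "$cap_gb"
    else
        echo "$half"
    fi
}

# Print the matching jvm.options lines for a 64 GB machine.
heap=$(recommended_heap_gb 64)
printf '%s\n' "-Xms${heap}g" "-Xmx${heap}g"
```

For a 64 GB machine this yields 30 rather than 32, which keeps the heap under the cap.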

@rdecuir

rdecuir commented Dec 30, 2020

> Note that it is a bad idea to set the Java heap above ~30 GB, because you lose an important optimization called compressed oops. You will likely get worse performance by doing that unless you set it MUCH larger than 30 GB, for example 50-60 GB.
>
> Elastic generally recommends setting the min/max heap size to half the physical RAM on the system, but not more than ~30 GB (assuming Elasticsearch is all that's running on the system).
>
> See: https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#heap-size-settings

The machines running Elasticsearch have 64 GB of RAM, so we configured the heap to half of that. From what you are saying, would you suggest lowering it to 30 GB? I will read the link shortly. We ingest 5-10 TB into ES a day with a 20+ node cluster.

@rdecuir

rdecuir commented Dec 30, 2020

> I'm not having any issues at the moment @rdecuir, but do you have a reason for your 32g in Java? I'm just curious.
>
> your /etc/elasticsearch/jvm.options
>
> -Xms32g
> -Xmx32g
>
> my /etc/elasticsearch/jvm.options
>
> -Xms1g
> -Xmx1g
>
> Were you seeing performance issues that required you to increase this?

Mostly using previous configurations that were determined before my time, that made sense to me given my limited ES "tuning" experience.

@shawnz
Author

shawnz commented Dec 30, 2020

Indeed, if you have 64 GB of RAM you will probably get better performance by using slightly less than 32 GB of heap, so that your JVM is eligible to use compressed oops.

Compressed oops basically allow Java to use 32-bit pointers even on 64-bit systems, thus saving RAM. Since Java is very object-heavy, cutting the size of pointers in half can provide huge RAM savings in a large Java application.

If you are not able to use compressed oops, then you might expect almost double the memory usage in some cases (that's why it's not recommended to exceed 32 GB of heap unless you can exceed it by a huge amount, like 64 GB).
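
One way to see whether a given heap size still qualifies is to ask the JVM itself with `-XX:+PrintFlagsFinal` and look at the `UseCompressedOops` line. The sketch below assumes a HotSpot-based `java` on the PATH; the function names are made up for the example, and the parser only expects the usual `bool UseCompressedOops = true ...` layout of that output.

```sh
#!/bin/sh
# Hypothetical helper: read `java -XX:+PrintFlagsFinal` output on stdin and
# print the value of UseCompressedOops ("true" or "false").
# Exits non-zero if the flag does not appear in the input.
parse_compressed_oops() {
    awk '$2 == "UseCompressedOops" { print $4; found = 1 } END { exit !found }'
}

# Hypothetical wrapper: ask the local JVM what it would do for a given
# maximum heap size, e.g. "31g" or "32g". Requires java on the PATH.
compressed_oops_for_heap() {
    java "-Xmx$1" -XX:+PrintFlagsFinal -version 2>/dev/null | parse_compressed_oops
}

# Example usage:
#   compressed_oops_for_heap 30g
#   compressed_oops_for_heap 32g
```

Comparing the result for a couple of candidate sizes on the actual host is more reliable than relying on a fixed number, since the cutoff can vary between JVM versions and platforms.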

@rdecuir

rdecuir commented Dec 30, 2020

> Indeed, if you have 64 GB of RAM you will probably get better performance by using slightly less than 32 GB of heap, so that your JVM is eligible to use compressed oops.
>
> Compressed oops basically allow Java to use 32-bit pointers even on 64-bit systems, thus saving RAM. Since Java is very object-heavy, cutting the size of pointers in half can provide huge RAM savings in a large Java application.
>
> If you are not able to use compressed oops, then you might expect almost double the memory usage in some cases (that's why it's not recommended to exceed 32 GB of heap unless you can exceed it by a huge amount, like 64 GB).

Thank you, after I read more of the link you sent I'll drop our setting down and give it a shot.

@jordanenglish

@rdecuir @shawnz - since you guys are using noexec on /tmp, can I assume you also have FIPS enabled on your Elasticsearch hosts?

@rdecuir

rdecuir commented May 12, 2021

@jordanenglish correct, we have FIPS enabled.

@jordanenglish

> @jordanenglish correct, we have FIPS enabled.

Can you elaborate on your setup some? I had to get a new ODFE server stood up and can't get it working unless I disable FIPS on RHEL8.

@rdecuir

rdecuir commented May 13, 2021

> > @jordanenglish correct, we have FIPS enabled.
>
> Can you elaborate on your setup some? I had to get a new ODFE server stood up and can't get it working unless I disable FIPS on RHEL8.

So, we are running on AWS with FIPS-enabled CentOS 7 boxes. We are required to keep FIPS enabled and to keep noexec on the /tmp dir.

I installed Open Distro via yum after adding it to our repo. I got it working with the configuration I provided above in the comments. It took a while of touching everything, but it resolved the issues I was seeing.

Let me know if you need more details or what you're looking for. I'm not sure I can help with a different environment, but I can definitely tell you what I went through and my process.

@jordanenglish

> > > @jordanenglish correct, we have FIPS enabled.
> >
> > Can you elaborate on your setup some? I had to get a new ODFE server stood up and can't get it working unless I disable FIPS on RHEL8.
>
> So, we are running on AWS with FIPS-enabled CentOS 7 boxes. We are required to keep FIPS enabled and to keep noexec on the /tmp dir.
>
> I installed Open Distro via yum after adding it to our repo. I got it working with the configuration I provided above in the comments. It took a while of touching everything, but it resolved the issues I was seeing.
>
> Let me know if you need more details or what you're looking for. I'm not sure I can help with a different environment, but I can definitely tell you what I went through and my process.

Thanks for your feedback. I think that RHEL7 handles signatures on unsigned RPMs differently than RHEL8. Even with the repo available on my RHEL8 box, I cannot do a plain yum install opendistroforelasticsearch.

I have to do the following:

# yum install --downloadonly --destdir /tmp/odfe opendistroforelasticsearch -y
# rpm -ivh --nodigest --nofiledigest /tmp/odfe/*.rpm

@Darkentik

I think a better approach would be a config file for the plugin, instead of working around this with environment variables and modifying service files.
The same problem is mentioned in another issue: #206
