Why Cannot StackTrace Be Found Using stackTraceId After LiveObject Is Enabled? #928

Open
243300852 opened this issue May 9, 2024 · 11 comments
@243300852

No description provided.

@243300852
Author

After live object profiling is enabled, a stack trace occasionally cannot be found in the dictionary.

@apangin
Collaborator

apangin commented May 9, 2024

Thank you for your interest in async-profiler.
To report a bug, please include the following information:

  • OS version, JDK version and async-profiler version;
  • command line to start the profiler;
  • expected behavior versus actual behavior;
  • exact error message if any;
  • sequence of actions that leads to the problem;
  • an example program or any other information that helps to reproduce the issue.

@243300852
Author

243300852 commented May 10, 2024

Thanks for the reply. The details are as follows.

With live object profiling disabled (modes.LIVEOBJECT = false), the memory collection report is correct. The code that builds the command line is shown below.
With live object profiling enabled (modes.LIVEOBJECT = true), memory collection occasionally fails. When the backend parses the received JFR file, the stack trace cannot be found in the dictionary by its stackTraceId. As a result, parsing fails and the Java backend throws a NullPointerException (the error is shown below).

The same problem is visible in IDEA: open the faulty .jfr file and choose Events - Java Application - Allocation in new TLAB - Stack Trace. Many stack traces are empty, whereas they should be non-empty.

I tried different JDK versions, such as JDK 11 and JDK 17.0.1, and all of them have this problem.

(I can't upload images to GitHub due to permissions.)
The command line is as follows:

start,jfr=7,jstackdepth=100,threads=true,event=cpu,interval=50ms,alloc=512k,wall=50ms,live

std::stringstream command;
command << "start,jfr=7"
        << ",jstackdepth=" << CNP_Constants::MAX_FRAME_COUNT()
        << ",threads=true";
if (modes.CPU) {
    command << ",event=cpu,interval="
            << (timeIntvl.empty() ? std::to_string(CNP_Constants::DEFAULT_TIME_INTERVAL()) : timeIntvl) << "ms";
}
if (modes.MEMORY) {
    command << ",alloc="
            << (dataIntvl.empty() ? std::to_string(CNP_Constants::DEFAULT_DATA_INTERVAL()) : dataIntvl) << "k";
}
if (modes.LATENCY) {
    command << ",wall="
            << (timeIntvl.empty() ? std::to_string(CNP_Constants::DEFAULT_TIME_INTERVAL()) : timeIntvl) << "ms"
            << ",filter=RANDOM";
}
if (modes.LIVEOBJECT) {
    if (CNP_AgentGlobal::jvm_version() >= JVM_VERSION_11) {
        command << ",live";
    }
}

The Java backend reports the following error:
java.lang.NullPointerException: Cannot read field "methods" because "stackTrace" is null
at com.huawei.cloud.profiler.cp.ingester.ingest.jfr.SampleConverter.lambda$getLines$1(SampleConverter.java:161)
at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1220)
at com.huawei.cloud.profiler.cp.ingester.ingest.jfr.SampleConverter.getLines(SampleConverter.java:158)
at com.huawei.cloud.profiler.cp.ingester.ingest.jfr.SampleConverter.handleSample(SampleConverter.java:151)
at com.huawei.cloud.profiler.cp.ingester.ingest.jfr.SampleConverter.handleAllocationSample(SampleConverter.java:132)
at com.huawei.cloud.profiler.cp.ingester.ingest.jfr.SampleConverter.parseJfr(SampleConverter.java:185)
at com.huawei.cloud.profiler.cp.ingester.ingest.jfr.JfrIngester.handlePData(JfrIngester.java:89)
at com.huawei.cloud.profiler.cp.ingester.ingest.jfr.JfrIngester.lambda$receiveData$0(JfrIngester.java:52)
at com.huawei.cloud.profiler.cp.ingester.ingest.jfr.WorkerGroup.lambda$submit$0(WorkerGroup.java:52)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:842)

The Java backend code that reads the JFR file is shown below. The error occurs on the line StackTrace stackTrace = (StackTrace)stackTraces.get((long)id);

private Line[] getLines(JfrReader jfrReader, int stackTraceId, Dictionary stackTraces) {
    return (Line[])this.stackToLine.computeIfAbsent(stackTraceId, (id) -> {
        StackTrace stackTrace = (StackTrace)stackTraces.get((long)id);
        long[] methods = stackTrace.methods;
        byte[] types = stackTrace.types;
        Line[] lines = new Line[methods.length];

        for(int i = 0; i < methods.length; ++i) {
            int line = stackTrace.locations[i] >>> 16;
            long methodId = this.getMethodId(types[i], methods[i], jfrReader);
            lines[methods.length - 1 - i] = new Line(methodId, line);
        }

        return lines;
    });
}
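
A minimal sketch of a defensive lookup, assuming the parser can tolerate events that carry no stack trace. The names `SafeStackLookup`, `Trace`, and `framesFor` are hypothetical stand-ins, and a plain `Map` replaces the `Dictionary` used above, so this is an illustration of the null check rather than the backend's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the parser's types; not the real JFR reader API.
final class SafeStackLookup {

    /** Minimal stand-in for a parsed stack trace: method ids, leaf frame first. */
    static final class Trace {
        final long[] methods;

        Trace(long[] methods) {
            this.methods = methods;
        }
    }

    private static final long[] NO_FRAMES = new long[0];

    /**
     * Returns the method ids of the given stack trace in reversed order
     * (root frame first), or an empty array when the id is not present
     * in the dictionary, e.g. stackTraceId == 0.
     */
    static long[] framesFor(Map<Long, Trace> traces, long stackTraceId) {
        Trace trace = traces.get(stackTraceId);
        if (trace == null) {
            // Unknown id: report "no frames" instead of dereferencing null.
            return NO_FRAMES;
        }
        long[] reversed = new long[trace.methods.length];
        for (int i = 0; i < reversed.length; i++) {
            reversed[reversed.length - 1 - i] = trace.methods[i];
        }
        return reversed;
    }

    public static void main(String[] args) {
        Map<Long, Trace> traces = new HashMap<>();
        traces.put(7L, new Trace(new long[] {1L, 2L, 3L}));

        System.out.println(framesFor(traces, 7L).length); // known id
        System.out.println(framesFor(traces, 0L).length); // unknown id
    }
}
```

Returning an empty array keeps the caller's loop unchanged. Whether such events should instead be skipped entirely depends on how the backend aggregates samples.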

@243300852
Author

I suspect there is a bug in this line of code. Why is the second parameter of this recordSample 0 when live is enabled?

void ObjectSampler::recordAllocation(jvmtiEnv* jvmti, JNIEnv* jni, EventType event_type,
                                     jobject object, jclass object_klass, jlong size) {
    AllocEvent event;
    event._total_size = size > _interval ? size : _interval;
    event._instance_size = size;
    event._class_id = lookupClassId(jvmti, object_klass);

    if (_live) {
        u64 trace = Profiler::instance()->recordSample(NULL, 0, event_type, &event);
        live_refs.add(jni, object, size, trace);
    } else {
        Profiler::instance()->recordSample(NULL, event._total_size, event_type, &event);
    }
}

@apangin
Collaborator

apangin commented May 11, 2024

java.lang.NullPointerException: Cannot read field "methods" because "stackTrace" is null
at com.huawei.cloud.profiler.cp.ingester.ingest.jfr.SampleConverter.lambda$getLines$1(SampleConverter.java:161)

This exception does not come from async-profiler. It's a part of Huawei software.
Could you please share a .jfr file that you suspect is wrong?

@apangin
Collaborator

apangin commented May 12, 2024

I found a case when a LiveObject sample with stackTraceId == 0 could have been recorded. Presumably, this is the same issue as you observe.
I pushed the fix, please check if it helps. If not, please provide a corresponding .jfr file.

@apangin apangin added the bug label May 12, 2024
@243300852
Author

243300852 commented May 14, 2024

rightjfrfile.zip
This is a correct JFR file. When it is opened in IDEA, every stack trace is correct.
wrongjfrfile.zip
This is a JFR file recorded with live object profiling enabled. When it is opened in IDEA, some of the stack traces are null. I'm not sure whether this is a bug.
To reproduce: open the JFR file in IDEA and choose Events -> Events by type -> Java Application -> Allocation in new TLAB, then look at the Stack Trace panel on the right. The problem is that the stack trace of some records is empty.

@apangin
Collaborator

apangin commented May 15, 2024

OK, I understand the problem. Both of these .jfr files are valid, but the second one indeed has allocation events without stack traces. The reason is that async-profiler does not currently support "normal" allocation profiling together with "live" object profiling. When live option is set, only profiler.LiveObject events are relevant, but not jdk.ObjectAllocationInNewTLAB. This is the limitation of the current design, but this may be fixed in the future.
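
One way a consumer of such a recording might act on this is sketched below: when the recording was made with the live option, drop jdk.ObjectAllocationInNewTLAB events and keep the rest, including profiler.LiveObject. The `LiveModeFilter` and `Event` types are hypothetical stand-ins for whatever event model the parser uses, and treating the allocation events as safe to discard is an assumption about the backend, not something async-profiler prescribes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event model; real parsers expose their own event classes.
final class LiveModeFilter {

    /** Minimal stand-in for a parsed JFR event. */
    static final class Event {
        final String typeName;
        final long stackTraceId;

        Event(String typeName, long stackTraceId) {
            this.typeName = typeName;
            this.stackTraceId = stackTraceId;
        }
    }

    // Event type name taken from the discussion above.
    static final String TLAB_ALLOC = "jdk.ObjectAllocationInNewTLAB";

    /**
     * Returns the events worth processing. With live profiling enabled,
     * TLAB allocation events are skipped because they may lack stack traces;
     * all other events pass through unchanged.
     */
    static List<Event> relevantEvents(List<Event> events, boolean liveEnabled) {
        List<Event> result = new ArrayList<>();
        for (Event e : events) {
            if (liveEnabled && TLAB_ALLOC.equals(e.typeName)) {
                continue;
            }
            result.add(e);
        }
        return result;
    }
}
```

The `liveEnabled` flag has to come from the side that started the profiler, since this sketch does not try to detect the mode from the recording itself.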

@243300852
Author

243300852 commented May 16, 2024

Thank you, I understand. I found that changing the second parameter of recordSample(NULL, 0, event_type, &event) from 0 to event._total_size fixes this problem. Could you please make this change?

void ObjectSampler::recordAllocation(jvmtiEnv* jvmti, JNIEnv* jni, EventType event_type, jobject object, jclass object_klass, jlong size) {
    AllocEvent event;
    event._total_size = size > _interval ? size : _interval;
    event._instance_size = size;
    event._class_id = lookupClassId(jvmti, object_klass);
    if (_live) {
        u64 trace = Profiler::instance()->recordSample(NULL, 0, event_type, &event);
        live_refs.add(jni, object, size, trace);
    } else {
        Profiler::instance()->recordSample(NULL, event._total_size, event_type, &event);
    }
}

@apangin
Collaborator

apangin commented May 16, 2024

This will break live object profiling.
I know how to support both allocation and live object profiling at the same time, but the fix is not that trivial.

@243300852
Author

Thank you. I look forward to this problem being resolved soon.
