OOM when working with large files because of huge DocumentSymbols response #815

sebthom · 2024-02-29T19:42:14Z

While working on #703 (LSSymbolsContentProvider populating the outline view freezes UI for minutes for large files), I am now facing the issue that when loading large JSON or log files (in my case 23MB with 11k lines), my Eclipse IDE with a 3GB heap exits with an OOM while rendering the outline.

I traced the issue down to the fact that the JSON language server responds to the DocumentSymbols request with a whopping 160MB JSON response, which is then processed by lsp4j's ConcurrentMessageProcessor + StreamMessageProducer. Unfortunately, the StreamMessageProducer isn't really stream processing: it deserializes the JSON string DOM-style into a huge object graph with countless GSON LinkedTreeMap instances, which eventually triggers the OOM.

A first optimization I see is to avoid this string allocation and instead pass an InputStreamReader instance at:

lsp4j/org.eclipse.lsp4j.jsonrpc/src/main/java/org/eclipse/lsp4j/jsonrpc/json/StreamMessageProducer.java

Lines 191 to 193 in c1d4cd3

    
           String content = new String(buffer, headers.charset); 
        
           try { 
        
           	Message message = jsonHandler.parseMessage(content);

In my case that would avoid temporarily blocking 160MB of heap and make it available for the JSON processing.
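A minimal sketch of the idea, assuming the parser entry point can accept a Reader: the body bytes are decoded on the fly while parsing, so no intermediate String copy of the whole message is created. `ReaderBackedParsing`, `ReaderParser`, and `countChars` are hypothetical names for illustration only and are not part of LSP4J; `countChars` merely stands in for a real JSON parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class ReaderBackedParsing {

    /** Hypothetical stand-in for a parser entry point that accepts a Reader. */
    interface ReaderParser<T> {
        T parse(Reader reader) throws IOException;
    }

    /** Decodes the body while parsing instead of materializing a String first. */
    static <T> T parseBody(byte[] buffer, Charset charset, ReaderParser<T> parser) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(buffer), charset)) {
            return parser.parse(reader);
        } catch (IOException e) {
            // Reading from an in-memory buffer should not fail; rethrow unchecked if it does.
            throw new UncheckedIOException(e);
        }
    }

    /** Toy "parser" for demonstration: counts the decoded characters in fixed-size chunks. */
    static Long countChars(Reader reader) throws IOException {
        long n = 0;
        char[] chunk = new char[8192];
        for (int read; (read = reader.read(chunk)) != -1; ) {
            n += read;
        }
        return n;
    }
}
```

The byte buffer itself is still held in memory; only the additional decoded String is avoided.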

However ultimately this will not be enough. The main issue here is that the DocumentSymbols request does not support batching microsoft/language-server-protocol#1533

So I was wondering if it wouldn't make sense in LSP4J to not process incoming messages above a certain size and instead reject them gracefully with a RuntimeException that is properly logged.
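Such a guard could look roughly like the following sketch. It assumes the check runs on the Content-Length header value before the body buffer is allocated; `MessageSizeGuard` and `MessageTooLargeException` are hypothetical names and do not exist in LSP4J, and the limit is deliberately configurable with no built-in default.

```java
final class MessageSizeGuard {

    /** Hypothetical exception type; not part of LSP4J. */
    static final class MessageTooLargeException extends RuntimeException {
        MessageTooLargeException(long contentLength, long maxBytes) {
            super("Rejected message of " + contentLength + " bytes (limit: " + maxBytes + " bytes)");
        }
    }

    private final long maxBytes;

    /** @param maxBytes configured limit in bytes; a value <= 0 disables the check */
    MessageSizeGuard(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Call with the Content-Length value before allocating the body buffer. */
    void check(long contentLength) {
        if (maxBytes > 0 && contentLength > maxBytes) {
            throw new MessageTooLargeException(contentLength, maxBytes);
        }
    }
}
```

A caller that rejects a message this way would still have to skip the rejected body on the input stream so that the framing of subsequent messages stays in sync.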

Btw. I did not encounter these OOMs half a year ago when I opened the eclipse/lsp4e#703. At that time I was using Eclipse 2023-06 and the respective lsp4j version. Can it be that lsp4j allocates more memory in the latest release?

jonahgraham · 2024-02-29T20:05:46Z

(I am quoting this part of your original post because it was added in an edit, so people who only follow email notifications won't see it.)

> Btw. I did not encounter these OOMs half a year ago when I opened the eclipse/lsp4e#703. At that time I was using Eclipse 2023-06 and the respective lsp4j version. Can it be that lsp4j allocates more memory in the latest release?

The only LSP4J change I can think of that would have a small (tiny?) effect is that messages have a new field MessageJsonHandler - see #772. The other change may be if the wired up gson version changed?

As to your main question, I don't have many thoughts on it. Having a configurable max message size seems OK, but I am not sure where that should be configured. Very large JSON messages can translate into enormous objects, or into relatively small ones (for example, if most of the JSON message is simply long strings such as file contents). I don't think LSP4J's jsonrpc should have a default max size.

I wonder if there is a more efficient way (space-wise) to process the JSON input that would avoid all the intermediary objects, i.e. the equivalent of SAX vs DOM XML parsing. Perhaps Jackson? It would expose how much of the LSP4J API leaks out GSON API, I guess.
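To illustrate the SAX-style idea in isolation, here is a toy sketch that scans a JSON stream character by character and counts objects without building any tree, so memory use stays constant regardless of input size. It is not a real JSON parser (it performs no validation) and `StreamingObjectCounter` is a hypothetical name; an actual implementation would more likely build on a streaming API such as Gson's JsonReader.

```java
import java.io.IOException;
import java.io.Reader;

final class StreamingObjectCounter {

    /**
     * Counts '{' characters that occur outside of string literals.
     * Only two flags and a counter are kept, independent of input size.
     */
    static long countObjects(Reader in) throws IOException {
        long objects = 0;
        boolean inString = false;
        boolean escaped = false;
        int c;
        while ((c = in.read()) != -1) {
            if (inString) {
                if (escaped) {
                    escaped = false;      // character after a backslash is taken literally
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;     // closing quote
                }
            } else if (c == '"') {
                inString = true;          // opening quote
            } else if (c == '{') {
                objects++;
            }
        }
        return objects;
    }
}
```

A DOM-style parse of the same input would allocate one map per object; the streaming scan allocates nothing per object, which is the space saving being asked about.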

sebthom · 2024-02-29T22:33:42Z

When thinking about replacing GSON, I recommend checking out https://github.com/fabienrenaud/java-json-benchmark, especially the deserialization performance graphs.

jonahgraham · 2024-02-29T23:06:42Z

Useful link. Thanks for sharing.

> with payloads of 1, 10, 100 and 1000 KB size

We would need to scale the benchmark payloads to 100× that and look at memory usage too.

pisv · 2024-03-01T09:36:12Z

> However ultimately this will not be enough. The main issue here is that the DocumentSymbols request does not support batching microsoft/language-server-protocol#1533

I completely agree. When the number of nodes in the input message is huge, there is only so much LSP4J can do, GSON or not -- the resulting deserialized object graph would still be huge.

However, LSP supports partial results streaming for such use cases, which the DocumentSymbols request also supports:

partial result: DocumentSymbol[] | SymbolInformation[].

So, it does support batching. It is just that both the server and the client need to support this option.

sebthom · 2024-03-01T11:20:02Z

@pisv does LSP4J support partial results? I am wondering how I can convert ls.getTextDocumentService().documentSymbol() used here accordingly.

final var params = new DocumentSymbolParams(LSPEclipseUtils.toTextDocumentIdentifier(documentURI));
symbols = outlineViewerInput.wrapper.execute(ls -> ls.getTextDocumentService().documentSymbol(params));

pisv · 2024-03-01T12:51:06Z

Yes, LSP4J supports partial results, i.e. it provides mapping of all the necessary LSP structures to Java. For a usage example on the client-side, you can have a look at

https://github.com/lxtk-org/lxtk/blob/5f7fb05350299903e0a4f5ae26c56decdacb589b/org.lxtk.lx4e.ui/src/org/lxtk/lx4e/ui/symbols/WorkspaceSymbolSelectionDialog.java#L145-L177

See also AbstractPartialResultProgress and related classes in LXTK.

sebthom · 2024-03-02T11:52:44Z

> Yes, LSP4J supports partial results, i.e. it provides mapping of all the necessary LSP structures to Java. For a usage example on the client-side, you can have a look at
>
> lxtk-org/lxtk@5f7fb05/org.lxtk.lx4e.ui/src/org/lxtk/lx4e/ui/symbols/WorkspaceSymbolSelectionDialog.java#L145-L177
>
> See also AbstractPartialResultProgress and related classes in LXTK.

I had a look at the project. To be honest I don't really grasp it, there are so many layers of abstractions/indirection that I got lost in the code. It looks like you needed to implement a whole infrastructure on top of lsp4j just to get partial results working.

As far as I understand, I need to set partialResultToken and workDoneToken on the request params and can then send the request with the same tokens multiple times.
The server signals the last result by returning the workDoneToken.

A very naive implementation of my current understanding is:

var params = new DocumentSymbolParams(new TextDocumentIdentifier(documentURI));
params.setPartialResultToken(UUID.randomUUID().toString());
params.setWorkDoneToken(UUID.randomUUID().toString());
do {
  List<Either<SymbolInformation,DocumentSymbol>> result = languageServer.getTextDocumentService().documentSymbol(params).get();
} while ( /* when to exit ?? */ );

What I don't understand is how to know when all results have been returned. The documentSymbol() request only returns the list of symbols, but no progress or work-done info.

pisv · 2024-03-03T10:59:04Z

> So far I understand I need to set partialResultToken and workDoneToken on the request params and then can send the request with the same tokens multiple times. The server signals the last result by returning the workDoneToken.
>
> A very naive implementation of my current understanding is:
>
> var params = new DocumentSymbolParams(new TextDocumentIdentifier(documentURI));
> params.setPartialResultToken(UUID.randomUUID().toString());
> params.setWorkDoneToken(UUID.randomUUID().toString());
> do {
>   List<Either<SymbolInformation,DocumentSymbol>> result = languageServer.getTextDocumentService().documentSymbol(params).get();
> } while ( /* when to exit ?? */ );

No, it does not work that way.

First, work done progress is independent of partial result progress, so you do not actually need to set a work done token if all you need is partial results.

Second, you only need to send the request once. Partial results, if the server supports them for this request, will be sent by the server via one or more $/progress notifications (with the same partial result token) before the response is sent. In that case, a successful response will be effectively empty, because all results will have been sent by the progress notifications.

So, the key here is that partial results need to be actually supported by the server for a given request.

Then, you need to ensure that $/progress notifications with partial results are properly handled by the client, and that the results are accepted/accumulated somehow. It can get a bit complicated, but basically, you need to implement LSP4J's LanguageClient.notifyProgress.

I'd suggest reading the following sections of the specification for more information:

(The latter describes client-initiated work done progress, but the same principles apply equally to partial result progress.)
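The accumulation step described above can be sketched as a small token-keyed collector. This is only an illustration under stated assumptions: `PartialResultCollector` and its methods are hypothetical names, not LSP4J API, and the wiring (calling `onProgress` from the client's notifyProgress implementation and `complete` when the response future finishes) is assumed rather than shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class PartialResultCollector<T> {

    // One accumulator per partial result token of an in-flight request.
    private final Map<String, List<T>> pending = new ConcurrentHashMap<>();

    /** Registers a token before the request carrying it is sent. */
    void register(String token) {
        pending.put(token, new ArrayList<>());
    }

    /** Appends one chunk from a progress notification; returns false for unknown tokens. */
    boolean onProgress(String token, List<T> chunk) {
        List<T> acc = pending.get(token);
        if (acc == null) {
            return false;
        }
        synchronized (acc) {
            acc.addAll(chunk);
        }
        return true;
    }

    /** Called once the response arrives; merges accumulated chunks with the (possibly empty) final result. */
    List<T> complete(String token, List<T> finalResult) {
        List<T> acc = pending.remove(token);
        List<T> all = new ArrayList<>();
        if (acc != null) {
            synchronized (acc) {
                all.addAll(acc);
            }
        }
        if (finalResult != null) {
            all.addAll(finalResult);
        }
        return all;
    }
}
```

The request is sent exactly once; arrival of the response, not a loop condition, marks the end of the partial results. Merging the final result as well keeps the collector working with servers that ignore the token and return everything in the response.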

ghentschke · 2024-03-11T10:04:14Z

@sebthom are you working currently on a PR for this issue here or in LSP4E?

sebthom · 2024-03-11T10:17:26Z

@ghentschke no I am not. go for it :-)

This was referenced Feb 29, 2024

Avoid temporary String object creation in StreamMessageProducer #816

Merged

Support limiting the number of outline symbols to avoid UI freezes eclipse/lsp4e#704

Merged

ghentschke mentioned this issue Mar 11, 2024

Performance bottleneck in Ui Thread eclipse/lsp4e#907

Open

ghentschke mentioned this issue Mar 11, 2024

Support partial results for DocumentSymbol eclipse/lsp4e#946

Open

jonahgraham added enhancement help_wanted bug and removed enhancement labels May 14, 2024
