Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List of Events grows indefinitely when one Event contains null message in HttpSender #190

Open
nbetten opened this issue Jul 16, 2021 · 4 comments · May be fixed by #195
Open

List of Events grows indefinitely when one Event contains null message in HttpSender #190

nbetten opened this issue Jul 16, 2021 · 4 comments · May be fixed by #195

Comments

@nbetten
Copy link

nbetten commented Jul 16, 2021

## General Information ## 
Version: 1.8.0
Platform version: <>
Framework version: Java 11
Splunk version: 8.1.1

We have encountered a problem in one of our applications with the heap overflowing caused by stuck Splunk messages. We have analyzed the problem and verified that it is caused by a faulty Layout that fails to serialize some Object properly before passing them on.

The problem chain starts at the method send(...) in the HttpEventCollectorSender as seen below with comments indicating the problem. When send(...) is called with a message that contains a null reference, the HttpEventCollectorEventInfo() gets added to the list of events. However this causes the HttpEventCollectorSender to fail upon send, causing the list to grow with no flushing thereafter.

    public synchronized void send(
            final String severity,
            final String message,
            final String logger_name,
            final String thread_name,
            Map<String, String> properties,
            final String exception_message,
            Serializable marker
    ) {
        // create event info container and add it to the batch
        HttpEventCollectorEventInfo eventInfo =
                new HttpEventCollectorEventInfo(severity, message, logger_name, thread_name, properties, exception_message, marker);
        eventsBatch.add(eventInfo);
        # NullPointerException when a null message gets passed 
        eventsBatchSize += severity.length() + message.length();
        if (eventsBatch.size() >= maxEventsBatchCount || eventsBatchSize > maxEventsBatchSize) {
            flush();
        }
    }

When calling flush with a malformed event flush always fails causing the heap to explode. This is caused by the code snippet at the bottom

    public synchronized void flush() {
        if (eventsBatch.size() > 0) {
            postEventsAsync(eventsBatch);
        }
        // Clear the batch. A new list should be created because events are
        // sending asynchronously and "previous" instance of eventsBatch object
        // is still in use.
        eventsBatch = new LinkedList<>();
        eventsBatchSize = 0;
    }

Events can not get posted because serialization fails if a Layout generates a faulty event.

   public void postEvents(final List<HttpEventCollectorEventInfo> events,
                           final HttpEventCollectorMiddleware.IHttpSenderCallback callback) {
        startHttpClient(); // make sure http client is started
        // convert events list into a string
        StringBuilder eventsBatchString = new StringBuilder();
        for (HttpEventCollectorEventInfo eventInfo : events) {
            eventsBatchString.append(serializer.serialize(eventInfo));
        }
@marthie
Copy link

marthie commented Jul 16, 2021

A NullPointerException from serializer.serialize(...) is not catched in code:

for (HttpEventCollectorEventInfo eventInfo : events) {
  eventsBatchString.append(serializer.serialize(eventInfo)); // throws NullPointerException on a 'null' Reference in eventInfo.message
}

In our case the thrown NullPointerexception is catched in Logback's ch.qos.logback.core.AppenderBase.

As @nbetten mentioned above, the NullPointerException prevents the HttpEventCollectorSender.eventsBatch list from being cleaned up within the HttpEventCollectorSender.flush() method.

@nbetten
Copy link
Author

nbetten commented Jul 19, 2021

So my idea was that faulty Layouts have to be catched by the Splunk Loggers respectively, the easiest solution would be to filter out events that have null messages or cast that null message to an empty String.

    public synchronized void send(
            final String severity,
            final String message,
            final String logger_name,
            final String thread_name,
            Map<String, String> properties,
            final String exception_message,
            Serializable marker
    ) {
        String eventMessage = message == null ? "" : message;
        // create event info container and add it to the batch
        HttpEventCollectorEventInfo eventInfo =
                new HttpEventCollectorEventInfo(severity, eventMessage, logger_name, thread_name, properties, exception_message, marker);
        eventsBatch.add(eventInfo);
        eventsBatchSize += severity.length() + eventMessage.length();
        if (eventsBatch.size() >= maxEventsBatchCount || eventsBatchSize > maxEventsBatchSize) {
            flush();
        }
    }

@nbetten
Copy link
Author

nbetten commented Aug 3, 2021

As for now I will propose a solution to the Splunk Maintainers, since no other ideas were suggested.

@nbetten
Copy link
Author

nbetten commented Aug 17, 2021

Fixed in #195, waiting to be merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants