Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak in javacpp-presets-libpostal libpostal_expand_address method #641

Open
orividrisk opened this issue Aug 27, 2023 · 7 comments
Open

Comments

@orividrisk
Copy link

orividrisk commented Aug 27, 2023

Hi,
When using the javacpp-presets-libpostal libpostal 1.1-1.5.8 I experience a memory leak in my java process.
My code is iterating on a large number of addresses, and over time the java process memory is increasing until it crashes due to OOM.
I have been monitored the jvm heap and non heap memory metrics and there is no issue there.
This leads me to the conclusion that the issue is with the native allocation of the returned address response, which is not deallocating the parsed addresses.
I created an issue in the javacpp-presets repo (mentioned in the libpostal readme), where they replied that it could be a potential libpostal c library memory leak:
bytedeco/javacpp-presets#1407

The following is my code (I am using the default options provided by the library):

try (PointerScope pointerScope = new PointerScope();
             libpostal_address_parser_response_t response = libpostal_parse_address(addressStr, options)) {

            long count = response.num_components();
            Map<String, String> fields = new HashMap<>();

            for (int i = 0; i < count; i++) {
                fields.put(response.labels(i).getString(), response.components(i).getString());
            }

            libpostal_address_parser_response_destroy(response);
            return fields;
 }

image

If any more information is needed I would be happy to provide it.
I would very appreciate your prompt assistance with the issue as it is having a big impact on our production env.

Thank you and have a very nice day.

@orividrisk
Copy link
Author

orividrisk commented Aug 27, 2023

@saudet FYI

@albarrentine
Copy link
Contributor

Ran a few parser tests under valgrind with no memory leaks found on this side.

Can you write the above test program in C, compile it, and run under valgrind --leak-check=full to see if any leaks can be reproduced without roping in any of the JNI bindings?

Not clear on how javacpp-presets interacts with the C functions (it does not use the official jpostal/JNI bindings, instead generating its own automatic JNI bindings from the C API), and whether it's using latest or a tag. If it's not using the latest version of libpostal, would recommend trying that as there were some SSE fixes recently, but none of those were related to a memory leak.

Also noting that libpostal is not thread-safe so should only be used from a single thread.

@orividrisk
Copy link
Author

orividrisk commented Aug 30, 2023

@albarrentine thank you for the quick response.
My C is quite rusty so I don't think I will be able to provide the C equivalent of my test.
I wrote a very simple program using the javacpp-presets presets, which easily reproduces the issue.
This program upon exiting, produces a leak reports using the macOS leaks util( as unfortunately valgrind is no longer supported on more advanced versions of macos).

The program can be found in the following github repo , with instruction in the ReadMe file of how to run it.

In the memory leak report I found these two reoccurring stack trace leaks :

  1. Leak: 0x7fcb71281b90 size=16 zone: MallocStackLoggingLiteZone_0x104f34000 malloc in cstring_array_to_strings C libpostal.1.dylib length: 4 "unit" Call stack: 0x11627653a (???) ??? | 0x104e9130c (libjnipostal.dylib) Java_org_bytedeco_libpostal_global_postal_libpostal_1parse_1address__Lorg_bytedeco_javacpp_BytePointer_2Lorg_bytedeco_libpostal_libpostal_1address_1parser_1options_1t_2 | 0x12f57815c (libpostal.1.dylib) libpostal_parse_address | 0x12f59be9b (libpostal.1.dylib) address_parser_parse | 0x12f5904c7 (libpostal.1.dylib) cstring_array_to_strings | 0x7ff816f73a43 (libsystem_c.dylib) strdup | 0x7ff816f19149 (libsystem_malloc.dylib) _malloc_zone_malloc_instrumented_or_legacy

  2. Leak: 0x7fcb71284780 size=32 zone: MallocStackLoggingLiteZone_0x104f34000 malloc in address_parser_parse C libpostal.1.dylib Call stack: 0x7ff8170c6bd3 (libsystem_pthread.dylib) thread_start | 0x7ff8170cb1d3 (libsystem_pthread.dylib) _pthread_start | 0x104e21529 (libjli.dylib) ThreadJavaMain | 0x104e1ec4f (libjli.dylib) JavaMain | 0x1066742d7 (libjvm.dylib) jni_CallStaticVoidMethod | 0x106670e58 (libjvm.dylib) jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) | 0x10660cb26 (libjvm.dylib) JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*) | 0x116269cc9 (???) ??? | 0x10ef080dc (???) ??? | 0x1168444a0 (???) ??? | 0x104e9130c (libjnipostal.dylib) Java_org_bytedeco_libpostal_global_postal_libpostal_1parse_1address__Lorg_bytedeco_javacpp_BytePointer_2Lorg_bytedeco_libpostal_libpostal_1address_1parser_1options_1t_2 | 0x12f57815c (libpostal.1.dylib) libpostal_parse_address | 0x12f59bd25 (libpostal.1.dylib) address_parser_parse | 0x7ff816f19149 (libsystem_malloc.dylib) _malloc_zone_malloc_instrumented_or_legacy

it looks like in both cases it is related to the address_parser_parse method in the libpostal library.

@saudet for clarification could you please shed some light as to how the JNI binding works including the used libostal version, and if the stack trace above could be related to this memory leak issue?

In addition I attached bellow a java NTM report which shows that the memory leak is not relevant to the JVM managed memory space, and also the report provided by the macOS leaks util

java NMT report.txt

leak report.txt

Please let me know if you need any more information.

@saudet
Copy link

saudet commented Aug 30, 2023

@orividrisk Your small example is pretty big. If you're serious about debugging this, reduce it down to only a few lines.

@albarrentine
Copy link
Contributor

@orividrisk https://github.com/orividrisk/libpostal-poc/blob/bd583351594b729dfa2bb89c6fac8f4ffb67c37d/src/main/java/com/example/test/Address.java#L66 you’re not calling libpostal_address_parser_response_destroy to free the memory allocated in that line, which explains both 1 and 2.

@orividrisk
Copy link
Author

@albarrentine adding libpostal_address_parser_response_destroy did the trick. the memory leak related to libpostal_parse_address was resolved. thank you!

In our production env we are using both address normalization & parsing which both presented with memory leak issue.
In order to isolate the memory issue I started with address parsing. now that it was resolved I moved on to normalization which also present with memory leak issue.

The following is a call stack example of a leak for the macOS leaks util:

Leak: 0x60000236a580 size=64 zone: DefaultMallocZone_0x107ab9000 realloc in utf8_case C libpostal.1.dylib length: 51 "1142 bellview dr wisconsin rapids wi 54494-9293 usa" Call stack: 0x7ff8170c6bd3 (libsystem_pthread.dylib) thread_start | 0x7ff8170cb1d3 (libsystem_pthread.dylib) _pthread_start | 0x107adc529 (libjli.dylib) ThreadJavaMain | 0x107ad9c4f (libjli.dylib) JavaMain | 0x1093112d7 (libjvm.dylib) jni_CallStaticVoidMethod | 0x10930de58 (libjvm.dylib) jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) | 0x1092a9b26 (libjvm.dylib) JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*) | 0x118f06cc9 (???) ??? | 0x111ac4ae4 (???) ??? | 0x1194c628f (???) ??? | 0x107b5292c (libjnipostal.dylib) Java_org_bytedeco_libpostal_global_postal_libpostal_1normalize_1string__Lorg_bytedeco_javacpp_BytePointer_2J | 0x132592d59 (libpostal.1.dylib) normalize_string_latin_languages | 0x132592b4e (libpostal.1.dylib) normalize_string_utf8_languages | 0x13258cd30 (libpostal.1.dylib) utf8_case | 0x7ff816efcde6 (libsystem_malloc.dylib) realloc

For the normalized address I didn't see a method to "destroy" the returned normalized string.

I updated the example github repo to now perform both parsing and normalization.
@saudet please let me know what you think ( I updated the example to be as concise as possible)

The README instructions in the repo can be used to reproduce the issue, with the detailed leaks instances report printed out to console after the process concludes it's run.

normalized address leak report.txt

Please let me know if you need any more information.

@albarrentine
Copy link
Contributor

@orividrisk there's a corresponding function libpostal_expansion_array_destroy(expansions, num_expansions) which cleans up memory for that response. This is used in the expander cli program which builds with libpostal. I'm not seeing any memory leaks for that input either.

@albarrentine albarrentine changed the title Memory leak in libpostal libpostal_parse_address method Memory leak in javacpp-presets-libpostal libpostal_expand_address method Sep 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants