AddressSanitizer: heap-use-after-free on large dataset #217

Open
JonasKellerer opened this issue Jul 14, 2023 · 1 comment
@JonasKellerer

We tried to use csv-parser on a large dataset (8 million lines, 9.9 GB). However, when looping over all rows and executing `row[column_name].get<std::string>()`, we get the following error message:

`
==245==ERROR: AddressSanitizer: heap-use-after-free on address 0x621003c37248 at pc 0x56492659e7ee bp 0x7ffe476e2f20 sp 0x7ffe476e2f10
READ of size 8 at 0x621003c37248 thread T0
#0 0x56492659e7ed in csv::internals::CSVFieldList::operator[](unsigned long) const /mwe/includes/csv_reader.h:7635
#1 0x56492659f298 in csv::CSVRow::get_field(unsigned long) const /mwe/includes/csv_reader.h:7694
#2 0x56492659ea9d in csv::CSVRow::operator[](unsigned long) const /mwe/includes/csv_reader.h:7656
#3 0x56492659ebea in csv::CSVRow::operator[](std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const /mwe/includes/csv_reader.h:7672
#4 0x5649265927c2 in getColumn(std::filesystem::__cxx11::path const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /mwe/src/main.cpp:27
#5 0x564926592eb0 in main /mwe/src/main.cpp:36
#6 0x7f66a36d0d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)
#7 0x7f66a36d0e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f)
#8 0x564926591dc4 in _start (/mwe/build/csvMWE+0x7dc4)

0x621003c37248 is located 328 bytes inside of 4096-byte region [0x621003c37100,0x621003c38100)
freed by thread T107 here:
#0 0x7f66a3cb722f in operator delete(void*, unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:172
#1 0x5649265be954 in __gnu_cxx::new_allocator<csv::internals::RawCSVField*>::deallocate(csv::internals::RawCSVField**, unsigned long) /usr/include/c++/11/ext/new_allocator.h:145
#2 0x5649265b31d6 in std::allocator<csv::internals::RawCSVField*>::deallocate(csv::internals::RawCSVField**, unsigned long) /usr/include/c++/11/bits/allocator.h:199
#3 0x5649265b31d6 in std::allocator_traits<std::allocator<csv::internals::RawCSVField*> >::deallocate(std::allocator<csv::internals::RawCSVField*>&, csv::internals::RawCSVField**, unsigned long) /usr/include/c++/11/bits/alloc_traits.h:496
#4 0x5649265aa73f in std::_Vector_base<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> >::_M_deallocate(csv::internals::RawCSVField**, unsigned long) /usr/include/c++/11/bits/stl_vector.h:354
#5 0x5649265b0692 in void std::vector<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> >::_M_realloc_insert<csv::internals::RawCSVField* const&>(__gnu_cxx::__normal_iterator<csv::internals::RawCSVField**, std::vector<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> > >, csv::internals::RawCSVField* const&) /usr/include/c++/11/bits/vector.tcc:500
#6 0x5649265a6ef2 in std::vector<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> >::push_back(csv::internals::RawCSVField* const&) /usr/include/c++/11/bits/stl_vector.h:1198
#7 0x56492659e97c in csv::internals::CSVFieldList::allocate() /mwe/includes/csv_reader.h:7640
#8 0x5649265a3b65 in void csv::internals::CSVFieldList::emplace_back<unsigned int, unsigned long&>(unsigned int&&, unsigned long&) /mwe/includes/csv_reader.h:5478
#9 0x564926598700 in csv::internals::IBasicCSVParser::push_field() /mwe/includes/csv_reader.h:6972
#10 0x564926598c01 in csv::internals::IBasicCSVParser::parse() /mwe/includes/csv_reader.h:6999
#11 0x5649265c8b49 in csv::internals::StreamParser<std::basic_ifstream<char, std::char_traits<char> > >::next(unsigned long) /mwe/includes/csv_reader.h:6175
#12 0x56492659ceb9 in csv::CSVReader::read_csv(unsigned long) /mwe/includes/csv_reader.h:7496
#13 0x5649265c9335 in bool std::__invoke_impl<bool, bool (csv::CSVReader::*)(unsigned long), csv::CSVReader*, unsigned long>(std::__invoke_memfun_deref, bool (csv::CSVReader::*&&)(unsigned long), csv::CSVReader*&&, unsigned long&&) /usr/include/c++/11/bits/invoke.h:74
#14 0x5649265c913e in std::__invoke_result<bool (csv::CSVReader::*)(unsigned long), csv::CSVReader*, unsigned long>::type std::__invoke<bool (csv::CSVReader::*)(unsigned long), csv::CSVReader*, unsigned long>(bool (csv::CSVReader::*&&)(unsigned long), csv::CSVReader*&&, unsigned long&&) /usr/include/c++/11/bits/invoke.h:96
#15 0x5649265c905e in bool std::thread::_Invoker<std::tuple<bool (csv::CSVReader::*)(unsigned long), csv::CSVReader*, unsigned long> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) /usr/include/c++/11/bits/std_thread.h:253
#16 0x5649265c8ec1 in std::thread::_Invoker<std::tuple<bool (csv::CSVReader::*)(unsigned long), csv::CSVReader*, unsigned long> >::operator()() /usr/include/c++/11/bits/std_thread.h:260
#17 0x5649265c8dbd in std::thread::_State_impl<std::thread::_Invoker<std::tuple<bool (csv::CSVReader::*)(unsigned long), csv::CSVReader*, unsigned long> > >::_M_run() /usr/include/c++/11/bits/std_thread.h:211
#18 0x7f66a3ab22b2 (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc2b2)

previously allocated by thread T107 here:
#0 0x7f66a3cb61c7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
#1 0x5649265c49fd in __gnu_cxx::new_allocator<csv::internals::RawCSVField*>::allocate(unsigned long, void const*) /usr/include/c++/11/ext/new_allocator.h:127
#2 0x5649265bd7a6 in std::allocator<csv::internals::RawCSVField*>::allocate(unsigned long) /usr/include/c++/11/bits/allocator.h:185
#3 0x5649265bd7a6 in std::allocator_traits<std::allocator<csv::internals::RawCSVField*> >::allocate(std::allocator<csv::internals::RawCSVField*>&, unsigned long) /usr/include/c++/11/bits/alloc_traits.h:464
#4 0x5649265b779f in std::_Vector_base<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> >::_M_allocate(unsigned long) /usr/include/c++/11/bits/stl_vector.h:346
#5 0x5649265b0514 in void std::vector<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> >::_M_realloc_insert<csv::internals::RawCSVField* const&>(__gnu_cxx::__normal_iterator<csv::internals::RawCSVField**, std::vector<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> > >, csv::internals::RawCSVField* const&) /usr/include/c++/11/bits/vector.tcc:440
#6 0x5649265a6ef2 in std::vector<csv::internals::RawCSVField*, std::allocator<csv::internals::RawCSVField*> >::push_back(csv::internals::RawCSVField* const&) /usr/include/c++/11/bits/stl_vector.h:1198
#7 0x56492659e97c in csv::internals::CSVFieldList::allocate() /mwe/includes/csv_reader.h:7640
#8 0x5649265a3b65 in void csv::internals::CSVFieldList::emplace_back<unsigned int, unsigned long&>(unsigned int&&, unsigned long&) /mwe/includes/csv_reader.h:5478
#9 0x564926598700 in csv::internals::IBasicCSVParser::push_field() /mwe/includes/csv_reader.h:6972
#10 0x564926598c01 in csv::internals::IBasicCSVParser::parse() /mwe/includes/csv_reader.h:6999
#11 0x5649265c8b49 in csv::internals::StreamParser<std::basic_ifstream<char, std::char_traits<char> > >::next(unsigned long) /mwe/includes/csv_reader.h:6175
#12 0x56492659ceb9 in csv::CSVReader::read_csv(unsigned long) /mwe/includes/csv_reader.h:7496
#13 0x5649265c9335 in bool std::__invoke_impl<bool, bool (csv::CSVReader::*)(unsigned long), csv::CSVReader*, unsigned long>(std::__invoke_memfun_deref, bool (csv::CSVReader::*&&)(unsigned long), csv::CSVReader*&&, unsigned long&&) /usr/include/c++/11/bits/invoke.h:74
#14 0x5649265c913e in std::__invoke_result<bool (csv::CSVReader::*)(unsigned long), csv::CSVReader*, unsigned long>::type std::__invoke<bool (csv::CSVReader::*)(unsigned long), csv::CSVReader*, unsigned long>(bool (csv::CSVReader::*&&)(unsigned long), csv::CSVReader*&&, unsigned long&&) /usr/include/c++/11/bits/invoke.h:96
#15 0x5649265c905e in bool std::thread::_Invoker<std::tuple<bool (csv::CSVReader::*)(unsigned long), csv::CSVReader*, unsigned long> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) /usr/include/c++/11/bits/std_thread.h:253
#16 0x5649265c8ec1 in std::thread::_Invoker<std::tuple<bool (csv::CSVReader::*)(unsigned long), csv::CSVReader*, unsigned long> >::operator()() /usr/include/c++/11/bits/std_thread.h:260
#17 0x5649265c8dbd in std::thread::_State_impl<std::thread::_Invoker<std::tuple<bool (csv::CSVReader::*)(unsigned long), csv::CSVReader*, unsigned long> > >::_M_run() /usr/include/c++/11/bits/std_thread.h:211
#18 0x7f66a3ab22b2 (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc2b2)

Thread T107 created by T0 here:
#0 0x7f66a3c58685 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
#1 0x7f66a3ab2388 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc388)
#2 0x56492659d23d in csv::CSVReader::read_row(csv::CSVRow&) /mwe/includes/csv_reader.h:7536
#3 0x56492659e70a in csv::CSVReader::iterator::operator++() /mwe/includes/csv_reader.h:7605
#4 0x5649265928ad in getColumn(std::filesystem::__cxx11::path const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /mwe/src/main.cpp:25
#5 0x564926592eb0 in main /mwe/src/main.cpp:36
#6 0x7f66a36d0d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)

SUMMARY: AddressSanitizer: heap-use-after-free /mwe/includes/csv_reader.h:7635 in csv::internals::CSVFieldList::operator[](unsigned long) const
Shadow bytes around the buggy address:
0x0c428077edf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c428077ee00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c428077ee10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c428077ee20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c428077ee30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c428077ee40: fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd
0x0c428077ee50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c428077ee60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c428077ee70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c428077ee80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c428077ee90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==245==ABORTING
`
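Reading the trace: the main thread (T0) reads through `CSVFieldList::operator[]`, while the worker thread (T107) is inside `std::vector::push_back`, called from `CSVFieldList::allocate()`, and frees the vector's old 4096-byte buffer during reallocation. The following is a minimal sketch of that reallocation step only, independent of csv-parser. `GrowthReport` and `grow_past_capacity` are hypothetical names, and the `std::vector<int*>` merely stands in for the vector of `RawCSVField*` seen in the trace.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: records what one push_back past capacity does.
struct GrowthReport {
    std::size_t capacity_before;
    std::size_t capacity_after;
    bool storage_moved;
};

// Stand-in for the vector of block pointers in the trace; this is not the
// library's code, only an illustration of std::vector growth.
GrowthReport grow_past_capacity() {
    std::vector<int*> blocks;
    blocks.reserve(4);

    // Fill the vector until size() == capacity().
    while (blocks.size() < blocks.capacity()) {
        blocks.push_back(nullptr);
    }

    const std::size_t before = blocks.capacity();
    // Keep the old address as an integer so the stale pointer is never used.
    const auto old_addr = reinterpret_cast<std::uintptr_t>(blocks.data());

    // No spare capacity is left, so this push_back allocates a new buffer,
    // moves the elements over, and frees the old buffer. Any pointer or
    // reference into the old buffer held elsewhere is now dangling.
    blocks.push_back(nullptr);

    const auto new_addr = reinterpret_cast<std::uintptr_t>(blocks.data());
    return GrowthReport{before, blocks.capacity(), old_addr != new_addr};
}
```

If another thread indexes into the old buffer between the free and its next read, without synchronization, that would produce exactly the kind of heap-use-after-free ASan reports here.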

The problem disappears when `std::this_thread::sleep_for(std::chrono::nanoseconds(1));` is called in the same loop.
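That a sleep makes the error go away suggests a timing-dependent race rather than a deterministic bug, so the sleep most likely only hides it. As a sketch of what synchronizing the two sides could look like (an assumption on my part, not the library's actual design or fix), a single mutex can guard both the growing `push_back` and the read. `GuardedList` and `read_while_writing` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical container, not csv-parser's CSVFieldList: one mutex guards
// both the growth path and the read path.
class GuardedList {
public:
    void push_back(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(value);  // may reallocate; readers are locked out
    }

    // Copies element i into out and returns true, or returns false if the
    // element has not been written yet.
    bool try_get(std::size_t i, int& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (i >= items_.size()) return false;
        out = items_[i];  // copy while the lock is held
        return true;
    }

private:
    mutable std::mutex mutex_;
    std::vector<int> items_;
};

// A worker thread appends 0..n-1 while the calling thread reads the same
// indices concurrently; returns the sum of the values read.
long long read_while_writing(int n) {
    GuardedList list;
    std::thread producer([&list, n] {
        for (int i = 0; i < n; ++i) list.push_back(i);
    });

    long long sum = 0;
    int value = 0;
    for (std::size_t i = 0; i < static_cast<std::size_t>(n);) {
        if (list.try_get(i, value)) {
            sum += value;
            ++i;
        } else {
            std::this_thread::yield();  // element not produced yet
        }
    }

    producer.join();
    return sum;
}
```

Calling `read_while_writing(1000)` returns 499500 (the sum of 0 through 999) regardless of how the two threads interleave, because the reader never touches the vector while it is being reallocated.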

For reproducibility, I have put an MWE here:
https://drive.google.com/file/d/1M_PJLlhxs8JTmIGEcDNCBAeBqxqmdNBC/view?usp=drive_link

Just extract it and run `docker build . --tag=mwe`, then `docker run -it mwe`, and inside the container run `./runAndBuild.sh`.

@vincentlaucsb
Owner

Thanks for your report, I'll take a look
