Segfault while using ColPack coloring method #59

Open
Saelyos opened this issue Jun 10, 2020 · 8 comments

Comments

@Saelyos

Saelyos commented Jun 10, 2020

Hello,

I'm experiencing random segfaults when using the ColPack coloring method to compute a sparse Jacobian. I can reproduce them with the sparse example. To make them occur more often, I replaced this line of sparse.cpp:

Run( colpack_jacobian, "colpack_jacobian" );

with

for (int i = 0; i < 10000; i++) {
    Run( colpack_jacobian, "colpack_jacobian" );
}

This problem might be related to this issue: coin-or/Adol-C/19, and in that case the problem could come from ColPack. I'm not completely convinced it's the same issue, because the bug in Adol-C only appears with column coloring and not with row coloring, and CppAD uses row coloring.

Additional information

OS: Debian 10
CppAD version: 20200000.2

@bradbell
Contributor

bradbell commented Dec 4, 2020

Sorry for the slow response. I must have missed the e-mail informing me of this issue.

I have tried to reproduce this error (on a Fedora 33 system) and cannot. I think it may be an issue with the version of ColPack that is linked to CppAD in the Debian release. Here is what works for me and should work for you on your system:

  • I built a local copy of cppad as follows:
git clone https://github.com/coin-or/CppAD.git cppad.git
cd cppad.git
git checkout 20200000.2
bin/get_colpack.sh
cd build
libdir=$(find prefix -name 'libColPack.*' | head -1 | sed -e 's|prefix/\([^/]*\)/.*|\1|')
cmake -D colpack_prefix=$(pwd)/prefix -D cmake_install_libdirs="$libdir" ..
make check_example_sparse

All the tests passed (for me). I then edited the file ../example/sparse/sparse.cpp as follows:

  • Below the line // This line is used by test_one.sh, I added the following text:
for (int i = 0; i < 10000; i++) {
    Run( colpack_jacobian, "colpack_jacobian" );
}
# if 0
  • Above the line // check for memory leak, I added the following text:
# endif

I then re-ran the following command (in the build directory):

make check_example_sparse

This time I got 10,000 lines with

colpack_jacobian    OK
  • I then ran the command:
valgrind --leak-check=yes example/sparse/example_sparse 

And got the following message at the end:

==1563847== HEAP SUMMARY:
==1563847==     in use at exit: 0 bytes in 0 blocks
==1563847==   total heap usage: 1,590,060 allocs, 1,590,060 frees, 683,153,528 bytes allocated
==1563847== 
==1563847== All heap blocks were freed -- no leaks are possible
==1563847== 
==1563847== For lists of detected and suppressed errors, rerun with: -s
==1563847== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@Saelyos
Author

Saelyos commented Jan 15, 2021

I've followed the exact same steps as you, and I still have some segfaults when I change the code to launch the test 10000 times.
With the valgrind command I got the following message:

==24200== HEAP SUMMARY:
==24200==     in use at exit: 5,472 bytes in 11 blocks
==24200==   total heap usage: 1,790,071 allocs, 1,790,060 frees, 685,990,776 bytes allocated
==24200== 
==24200== 2,128 bytes in 7 blocks are possibly lost in loss record 4 of 5
==24200==    at 0x4837B65: calloc (vg_replace_malloc.c:752)
==24200==    by 0x40116D1: allocate_dtv (dl-tls.c:286)
==24200==    by 0x401203D: _dl_allocate_tls (dl-tls.c:532)
==24200==    by 0x5072B95: allocate_stack (allocatestack.c:621)
==24200==    by 0x5072B95: pthread_create@@GLIBC_2.2.5 (pthread_create.c:669)
==24200==    by 0x504AD61: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==24200==    by 0x5041E09: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==24200==    by 0x48B3A29: ColPack::BipartiteGraphPartialColoring::PartialDistanceTwoRowColoring_OMP() (in /usr/lib/x86_64-linux-gnu/libColPack.so.0.0.0)
==24200==    by 0x48B48A7: ColPack::BipartiteGraphPartialColoringInterface::PartialDistanceTwoColoring(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (in /usr/lib/x86_64-linux-gnu/libColPack.so.0.0.0)
==24200==    by 0x4855BA3: CppAD::local::cppad_colpack_general(CppAD::vector<unsigned long>&, unsigned long, unsigned long, CppAD::vector<unsigned int*> const&) (cppad_colpack.cpp:73)
==24200==    by 0x15A140: void CppAD::local::color_general_colpack<CppAD::local::sparse::list_setvec, CppAD::vector<unsigned long> >(CppAD::local::sparse::list_setvec const&, CppAD::vector<unsigned long> const&, CppAD::vector<unsigned long> const&, CppAD::vector<unsigned long>&) (color_general.hpp:268)
==24200==    by 0x190811: unsigned long CppAD::ADFun<double, double>::SparseJacobianFor<CppAD::vector<double>, CppAD::local::sparse::list_setvec, CppAD::vector<unsigned long> >(CppAD::vector<double> const&, CppAD::local::sparse::list_setvec&, CppAD::vector<unsigned long> const&, CppAD::vector<unsigned long> const&, CppAD::vector<double>&, CppAD::sparse_jacobian_work&) (sparse_jacobian.hpp:415)
==24200==    by 0x190120: unsigned long CppAD::ADFun<double, double>::SparseJacobianForward<CppAD::vector<double>, std::vector<std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >, std::allocator<std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> > > >, CppAD::vector<unsigned long> >(CppAD::vector<double> const&, std::vector<std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >, std::allocator<std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> > > > const&, CppAD::vector<unsigned long> const&, CppAD::vector<unsigned long> const&, CppAD::vector<double>&, CppAD::sparse_jacobian_work&) (sparse_jacobian.hpp:784)
==24200== 
==24200== LEAK SUMMARY:
==24200==    definitely lost: 0 bytes in 0 blocks
==24200==    indirectly lost: 0 bytes in 0 blocks
==24200==      possibly lost: 2,128 bytes in 7 blocks
==24200==    still reachable: 3,344 bytes in 4 blocks
==24200==         suppressed: 0 bytes in 0 blocks
==24200== Reachable blocks (those to which a pointer was found) are not shown.
==24200== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==24200== 
==24200== For counts of detected and suppressed errors, rerun with: -v
==24200== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

(There were no segfaults when running under Valgrind; all the tests reported OK.)

Anyway, I was using ColPack because my benchmarks had shown better performance with it, but that was probably just due to different compilation flags for ColPack and CppAD. So now I'm using CppAD's default coloring method, and everything works perfectly.

@bradbell
Contributor

@Saelyos Do you get the error if you use the master branch?

@Saelyos
Author

Saelyos commented Jan 19, 2021

Yes, I've followed the same steps with master, and I still get the error.

@bradbell
Contributor

I changed my loop to execute 100,000 times:

cppad.git>git diff
diff --git a/example/sparse/sparse.cpp b/example/sparse/sparse.cpp
index e9bf09f5b..cddacaa29 100644
--- a/example/sparse/sparse.cpp
+++ b/example/sparse/sparse.cpp
@@ -64,6 +64,10 @@ int main(void)
     CppAD::test_boolofvoid Run(group, width);
 
     // This line is used by test_one.sh
+       for (int i = 0; i < 100000; i++) {
+               Run( colpack_jacobian, "colpack_jacobian" );
+       }
+# if 0
 
     // BEGIN_SORT_THIS_LINE_PLUS_2
     // external compiled tests
@@ -102,6 +106,7 @@ int main(void)
     Run( sparse2eigen,              "sparse2eigen" );
 # endif
     //
+# endif
     // check for memory leak
     bool memory_ok = CppAD::thread_alloc::free_all();
     // print summary at end
cppad.git>

Then in the build/example/sparse directory I executed the command:

sparse>valgrind ./example_sparse > junk
==31944== Memcheck, a memory error detector
==31944== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==31944== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==31944== Command: ./example_sparse
==31944== 
==31944== 
==31944== HEAP SUMMARY:
==31944==     in use at exit: 0 bytes in 0 blocks
==31944==   total heap usage: 15,900,125 allocs, 15,900,125 frees, 6,703,210,377 bytes allocated
==31944== 
==31944== All heap blocks were freed -- no leaks are possible
==31944== 
==31944== For lists of detected and suppressed errors, rerun with: -s
==31944== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Perhaps you could figure out how to change the file bin/get_colpack.sh so that it builds a debug version of the ColPack library, and then run the program in a debugger. That might give us some more information.

@xhzheng1026

@Saelyos
When I used Adol-C, I encountered the same problem as you. Have you solved it? Or does the problem go away if I just use CppAD instead?

@Saelyos
Author

Saelyos commented May 5, 2022

I haven't solved this problem, and I haven't had time to investigate why it fails when I use ColPack with CppAD or Adol-C.
Fortunately, using CppAD without ColPack works perfectly for me.

@xhzheng1026

Thank you so much!
