Bear is stuck #443

Open
marxin opened this issue Jan 31, 2022 · 21 comments
Labels
need help This is something that I can't do myself.

Comments

@marxin

marxin commented Jan 31, 2022

I can't build GCC with bear; it gets stuck at the end of compilation:

bear -- make -j16
...
libtool: link: ( cd ".libs" && rm -f "libgfortran.la" && ln -s "../libgfortran.la" "libgfortran.la" )
make[3]: Leaving directory '/dev/shm/objdir/x86_64-pc-linux-gnu/libgfortran'
make[2]: Leaving directory '/dev/shm/objdir/x86_64-pc-linux-gnu/libgfortran'
make[1]: Leaving directory '/dev/shm/objdir'

Here is the process tree:
Screenshot from 2022-01-31 12-40-28

I'm using the latest release:

$ bear --version
bear 3.0.18
@rizsotto
Owner

rizsotto commented Feb 2, 2022

Hey Martin, I need more context in order to help. Could you fill out the issue template? Also, could you run the command with the verbose flag and attach the output? Thanks!

@marxin
Author

marxin commented Feb 3, 2022

All right, so info from the issue template would be:

uname -a
Linux marxinbox.suse.cz 5.16.2-1-default #1 SMP PREEMPT Mon Jan 24 18:27:48 UTC 2022 (0d710a8) x86_64 x86_64 x86_64 GNU/Linux

Bear comes from the openSUSE Tumbleweed distribution, installed as a regular package.

Using --verbose leads to a different error:

$ bear --verbose -- make
...
g++: fatal error: cannot execute 'cc1plus': execvp: No such file or directory
...

bear.log.txt

@rizsotto
Owner

rizsotto commented Feb 3, 2022

Interesting log!!!

I've seen the message g++: fatal error: cannot execute 'cc1plus': execvp: No such file or directory when the PATH environment variable is empty. And from the logs I see it is empty from the very first commands after Bear executes make.

Is it possible that the Makefile empties the environment?

Is this a GCC build? This was problematic on Kali Linux too. GCC executes cc1 or cc1plus, which is not in the PATH (but GCC knows its location). I would expect the GCC driver program (gcc, cc, g++, c++, etc.) to execute cc1plus with a full path. What it does instead is call execvp with only the program name, which is supposed to be searched for in the PATH.

I don't really know how to fix empty PATH execution for GCC. I know that it works without Bear. (/usr/bin/env - /usr/bin/gcc -c /dev/null works just fine.)
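
To illustrate the lookup difference discussed here, a minimal sketch follows. It involves neither GCC nor Bear: fake-cc1plus is a hypothetical stand-in script placed in a temporary directory that is not on PATH, much like cc1plus lives outside PATH in a GCC install. Whether the GCC driver really relies on a PATH search is exactly what is in question in this thread, so this only shows the general mechanism.

```shell
#!/bin/sh
# Create a stand-in helper in a directory that is NOT on PATH.
# "fake-cc1plus" is a made-up name used for illustration only.
tooldir=$(mktemp -d)
printf '#!/bin/sh\necho ran\n' > "$tooldir/fake-cc1plus"
chmod +x "$tooldir/fake-cc1plus"

# Bare name: the lookup searches PATH (as execvp would) and finds nothing.
if fake-cc1plus >/dev/null 2>&1; then
    bare_lookup=found
else
    bare_lookup=missing
fi

# Full path: no PATH search is needed, so the helper runs.
full_path_output=$("$tooldir/fake-cc1plus")

echo "bare name: $bare_lookup"
echo "full path: $full_path_output"

# Clean up the temporary directory.
rm -rf "$tooldir"
```

If the driver passes a full path, as the strace output later in this thread suggests, the contents of PATH should not matter for finding cc1plus.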

@rizsotto rizsotto added the "need help" label (This is something that I can't do myself.) Feb 3, 2022
@marxin
Author

marxin commented Feb 3, 2022

So what's weird is that without the --verbose argument it works (until the end, where it gets stuck).

Plus, I think the GCC driver uses execve with a full path, if I see correctly:

strace -f -s 512 g++ -fcf-protection -fno-PIE -c  -DIN_GCC_FRONTEND -DIN_GCC_FRONTEND -g       -DIN_GCC -fPIC    -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -Ic-family -I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/c-family -I/home/marxin/Programming/gcc/gcc/../include -I/home/marxin/Programming/gcc/gcc/../libcpp/include -I/home/marxin/Programming/gcc/gcc/../libcody  -I/home/marxin/Programming/gcc/gcc/../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libdecnumber/bid -I../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libbacktrace   -o c-family/c-common.o -MT c-family/c-common.o -MMD -MP -MF c-family/.deps/c-common.TPo /home/marxin/Programming/gcc/gcc/c-family/c-common.cc 2>&1 | grep execv
execve("/usr/bin/g++", ["g++", "-fcf-protection", "-fno-PIE", "-c", "-DIN_GCC_FRONTEND", "-DIN_GCC_FRONTEND", "-g", "-DIN_GCC", "-fPIC", "-fno-exceptions", "-fno-rtti", "-fasynchronous-unwind-tables", "-W", "-Wall", "-Wno-narrowing", "-Wwrite-strings", "-Wcast-qual", "-Wmissing-format-attribute", "-Woverloaded-virtual", "-pedantic", "-Wno-long-long", "-Wno-variadic-macros", "-Wno-overlength-strings", "-fno-common", "-DHAVE_CONFIG_H", "-I.", "-Ic-family", "-I/home/marxin/Programming/gcc/gcc", "-I/home/marxin/Programming/gcc/gcc/c-family", "-I/home/marxin/Programming/gcc/gcc/../include", "-I/home/marxin/Programming/gcc/gcc/../libcpp/include", "-I/home/marxin/Programming/gcc/gcc/../libcody", "-I/home/marxin/Programming/gcc/gcc/../libdecnumber", "-I/home/marxin/Programming/gcc/gcc/../libdecnumber/bid", "-I../libdecnumber", "-I/home/marxin/Programming/gcc/gcc/../libbacktrace", "-o", "c-family/c-common.o", "-MT", "c-family/c-common.o", "-MMD", "-MP", "-MF", "c-family/.deps/c-common.TPo", "/home/marxin/Programming/gcc/gcc/c-family/c-common.cc"], 0x7fffffffdbf8 /* 81 vars */) = 0
[pid 23829] execve("/usr/lib64/gcc/x86_64-suse-linux/11/cc1plus", ["/usr/lib64/gcc/x86_64-suse-linux/11/cc1plus", "-quiet", "-I", ".", "-I", "c-family", "-I", "/home/marxin/Programming/gcc/gcc", "-I", "/home/marxin/Programming/gcc/gcc/c-family", "-I", "/home/marxin/Programming/gcc/gcc/../include", "-I", "/home/marxin/Programming/gcc/gcc/../libcpp/include", "-I", "/home/marxin/Programming/gcc/gcc/../libcody", "-I", "/home/marxin/Programming/gcc/gcc/../libdecnumber", "-I", "/home/marxin/Programming/gcc/gcc/../libdecnumber/bid", "-I", "../libdecnumber", "-I", "/home/marxin/Programming/gcc/gcc/../libbacktrace", "-MMD", "c-family/c-common.d", "-MF", "c-family/.deps/c-common.TPo", "-MP", "-MT", "c-family/c-common.o", "-D_GNU_SOURCE", "-D", "IN_GCC_FRONTEND", "-D", "IN_GCC_FRONTEND", "-D", "IN_GCC", "-D", "HAVE_CONFIG_H", "/home/marxin/Programming/gcc/gcc/c-family/c-common.cc", "-quiet", "-dumpdir", "c-family/", "-dumpbase", "c-common.cc", "-dumpbase-ext", ".cc", "-mtune=generic", "-march=x86-64", "-g", "-Wextra", "-Wall", "-Wno-narrowing", "-Wwrite-strings", "-Wcast-qual", "-Wsuggest-attribute=format", "-Woverloaded-virtual", "-Wpedantic", "-Wno-long-long", "-Wno-variadic-macros", "-Wno-overlength-strings", "-fcf-protection=full", "-fPIC", "-fno-exceptions", "-fno-rtti", "-fasynchronous-unwind-tables", "-fno-common", "-o", "/tmp/ccUlqdms.s"], 0x501ec0 /* 85 vars */ <unfinished ...>
[pid 23829] <... execve resumed>)       = 0
[pid 23829] write(3, "builtin_dgettext\"\n.LC1001:\n\t.string\t\"__builtin_dwarf_cfa\"\n.LC1002:\n\t.string\t\"__builtin_dwarf_sp_column\"\n.LC1003:\n\t.string\t\"__builtin_eh_return\"\n\t.align 8\n.LC1004:\n\t.string\t\"__builtin_eh_return_data_regno\"\n.LC1005:\n\t.string\t\"__builtin_execl\"\n.LC1006:\n\t.string\t\"__builtin_execlp\"\n.LC1007:\n\t.string\t\"__builtin_execle\"\n.LC1008:\n\t.string\t\"__builtin_execv\"\n.LC1009:\n\t.string\t\"__builtin_execvp\"\n.LC1010:\n\t.string\t\"__builtin_execve\"\n.LC1011:\n\t.string\t\"__builtin_exit\"\n.LC1012:\n\t.string\t\"__builtin_expect\"\n\t.align 8\n.LC10"..., 4096) = 4096
[pid 23830] execve("/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/as", ["/usr/lib64/gcc/x86_64-suse-linux/11/../../../../x86_64-suse-linux/bin/as", "-I", ".", "-I", "c-family", "-I", "/home/marxin/Programming/gcc/gcc", "-I", "/home/marxin/Programming/gcc/gcc/c-family", "-I", "/home/marxin/Programming/gcc/gcc/../include", "-I", "/home/marxin/Programming/gcc/gcc/../libcpp/include", "-I", "/home/marxin/Programming/gcc/gcc/../libcody", "-I", "/home/marxin/Programming/gcc/gcc/../libdecnumber", "-I", "/home/marxin/Programming/gcc/gcc/../libdecnumber/bid", "-I", "../libdecnumber", "-I", "/home/marxin/Programming/gcc/gcc/../libbacktrace", "--gdwarf-5", "--64", "-o", "c-family/c-common.o", "/tmp/ccUlqdms.s"], 0x501ec0 /* 85 vars */ <unfinished ...>
[pid 23830] <... execve resumed>)       = 0

@marxin
Author

marxin commented Feb 3, 2022

And yes, cc1plus is really not on PATH:

$ which cc1plus
which: no cc1plus in (/home/marxin/bin/valgrind/bin:/home/marxin/.local/bin:/home/marxin/bin:/usr/local/bin:/usr/bin:/bin:/home/marxin/Programming/gcc-util/boilerplate:/home/marxin/Programming/gcc-util/dumps:/home/marxin/Programming/script-misc)

@rizsotto
Owner

rizsotto commented Feb 3, 2022

Nice catch. Bear reports execvp for that execution; maybe that is the problem. I will look at it on the weekend.

@rizsotto
Owner

rizsotto commented Feb 4, 2022

I'm trying to reproduce it with the latest master, but the bug does not show up... The verbose log shows that it's using execvp and that cc1 is called with a full path.

This is what I'm running on Fedora or Arch:

$ /usr/bin/env - ../Bear.install/bin/bear -- /usr/bin/env - /usr/bin/gcc -c /tmp/empty.c
$ cat compile_commands.json
[
  {
    "arguments": [
      "/usr/bin/gcc",
      "-c",
      "/tmp/empty.c"
    ],
    "directory": "/tmp",
    "file": "/tmp/empty.c"
  }
]

@dkm

dkm commented Feb 10, 2022

Not sure this is related, but I've never been able to do a parallel build of gcc with bear (same symptom as @marxin: the build is stuck at the end). I usually do a sequential build when I'm not doing any dev. (IIRC, @philberty had the same issue.)
Happy to help if needed.

@rizsotto
Owner

@dkm there are two issues you mention:

  • Bear stuck at the end. This might just be citnames running slowly on the event file. Bear executes two binaries. First, intercept collects the executed process names and writes them into an event file. (This event file is 4 GB for a Linux kernel compilation.) Then citnames reads the event file, filters out the compiler calls, detects duplicates, and formats the compilation database entries. (This process is single-threaded and takes 1-2 min for the Linux kernel.) To check whether this is the case for you, run intercept and citnames separately (as bear would do).
  • Can't run parallel builds. intercept runs a gRPC service to collect the executions, and gRPC has an open bug about not closing file descriptors fast enough. (This applies if your build fails with a "not enough file descriptor" message.) As a workaround, increase the max file descriptor limit.
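
The file descriptor workaround mentioned above can be sketched as follows. It uses only the shell's ulimit builtin: raising the soft limit up to the hard limit needs no extra privileges and applies to the current shell and the processes it starts. Whether this helps with a particular hang is an assumption to verify, not a confirmed fix.

```shell
#!/bin/sh
# Inspect the current limits on open file descriptors.
soft_limit=$(ulimit -Sn)
hard_limit=$(ulimit -Hn)
echo "soft=$soft_limit hard=$hard_limit"

# Raise the soft limit to the hard limit for this shell and its children.
ulimit -Sn "$hard_limit"
new_soft_limit=$(ulimit -Sn)
echo "soft limit is now $new_soft_limit"

# Then start the build from this same shell, for example:
#   bear -- make -j16
```

Going beyond the hard limit requires changing the system configuration, which is outside the scope of this sketch.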

@rizsotto
Owner

I've seen intercept be slow in the past because it was not able to write the entries fast enough. (The root cause was that intercept used SQLite to store events, but that was removed in recent versions.) If you are running the compilation on a remote or slow drive, that can make intercept look stuck.

Things to try out:

  • Run intercept and citnames separately instead of calling bear. You will see in which phase it gets stuck.
  • Run intercept with --verbose, which will show its progress. Then we can see whether the build is still going on or is stuck somewhere in intercept.
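
Running the two phases by hand might look like the sketch below. The --output and --input flags follow the Bear 3.x man pages as far as I know, so treat them as assumptions and check intercept --help and citnames --help on your install (the binaries may also not be on PATH, depending on the package). The file name events.json and the helper run_phase are arbitrary choices for this example. The script is a dry run by default and only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Dry run by default: print the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}
events_file=events.json    # arbitrary name for the intermediate event file

# Print a command, remember it, and execute it only when DRY_RUN=0.
run_phase() {
    last_command="$*"
    echo "+ $last_command"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Phase 1: record every process executed during the build.
run_phase intercept --output "$events_file" -- make -j16

# Phase 2: turn the recorded events into a compilation database.
run_phase citnames --input "$events_file" --output compile_commands.json
```

If the first command never returns, the hang is in the interception phase; if it returns and the second one is slow, citnames is the one to look at.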

@theIDinside

theIDinside commented Apr 7, 2022

I have a similar problem; my problem arises whenever I use more than 6 jobs for make.

@hwti

hwti commented Nov 17, 2022

I get the same issue at various gcc build steps when used with crosstool-ng, even with only one job.
Maybe there is an issue with command redirection, since killing a cut - process terminates everything (obviously with a failure).

During the build, even the main intercept process uses quite a lot of CPU, and it creates a giant >1GB file (logging all commands, including mv for example) which later gets reduced to 3MB.

Bear 2.4.4 works, and is a lot faster.

@dkm

dkm commented Dec 28, 2022

FWIW, I've tried again this morning using a freshly built bear: same behavior as described above (also tested with the latest Debian package, 3.0.20-1+b3). But I'm not sure it really picks everything up from my local install, since I see:

└─ bear -- make -j8 all
   └─ intercept --library /usr/$LIB/bear/libexec.so --wrapper /usr/lib/x86_64-linux-gnu/bear/wrap
      ├─ intercept --library /usr/$LIB/bear/libexec.so --wrapper /usr/lib/x86_64-linux-gnu/bear/w
      ├─ intercept --library /usr/$LIB/bear/libexec.so --wrapper /usr/lib/x86_64-linux-gnu/bear/w

The bear command is the correct one. Removing the Debian package seems to correct this, so maybe there's something to be fixed in the search routine?

The process never seems to finish. I don't see any CPU activity and still have plenty of free RAM, so I'm not sure what's wrong:

    1[||                                                                         1.3%] Tasks: 166, 1077 thr, 232 kthr; 1 running
    2[|                                                                          0.6%] Load average: 1.24 4.76 4.86 
    3[|                                                                          0.6%] Uptime: 17:45:36
    4[|||                                                                        2.6%]
    5[|                                                                          0.6%]
    6[||                                                                         1.3%]
    7[|||                                                                        1.9%]
    8[|||                                                                        2.5%]
    9[|||                                                                        1.9%]
   10[|||                                                                        1.9%]
   11[|                                                                          0.6%]
   12[|                                                                          0.6%]
  Mem[||||||||||||||||||||||||||||||||||||                                5.12G/47.0G]
  Swp[                                                                       0K/48.8G]

    PID△USER       PRI  NI  VIRT   RES   SHR S  CPU% MEM%   TIME+  Command                                                                                                   
3212785 dkm         20   0  6284  3944  3600 S   0.0  0.0  0:00.00 │  │  └─ bear -- make -j8 all                                                                             
3212786 dkm         20   0 3771M 43836 15588 S   0.0  0.1  1:11.61 │  │     └─ intercept --library /home/dkm/local/stow/bear/lib/bear/libexec.so --wrapper /home/dkm/local/st
3212787 dkm         20   0 3771M 43836 15588 S   0.0  0.1  0:00.03 │  │        ├─ intercept --library /home/dkm/local/stow/bear/lib/bear/libexec.so --wrapper /home/dkm/local
3212788 dkm         20   0 3771M 43836 15588 S   0.0  0.1  0:00.00 │  │        ├─ intercept --library /home/dkm/local/stow/bear/lib/bear/libexec.so --wrapper /home/dkm/local
3212789 dkm         20   0 3771M 43836 15588 S   0.0  0.1  0:01.00 │  │        ├─ intercept --library /home/dkm/local/stow/bear/lib/bear/libexec.so --wrapper /home/dkm/local
3212790 dkm         20   0 3771M 43836 15588 S   0.0  0.1  0:00.00 │  │        ├─ intercept --library /home/dkm/local/stow/bear/lib/bear/libexec.so --wrapper /home/dkm/local
3212791 dkm         20   0 3771M 43836 15588 S   0.0  0.1  0:00.00 │  │        ├─ intercept --library /home/dkm/local/stow/bear/lib/bear/libexec.so --wrapper /home/dkm/local
3212792 dkm         20   0 3771M 43836 15588 S   0.0  0.1  0:00.00 │  │        ├─ intercept --library /home/dkm/local/stow/bear/lib/bear/libexec.so --wrapper /home/dkm/local
3212793 dkm         20   0 3771M 43836 15588 S   0.0  0.1  0:00.00 │  │        ├─ intercept --library /home/dkm/local/stow/bear/lib/bear/libexec.so --wrapper /home/dkm/local
3212794 dkm         20   0 3771M 43836 15588 S   0.0  0.1  0:00.00 │  │        ├─ intercept --library /home/dkm/local/stow/bear/lib/bear/libexec.so --wrapper /home/dkm/local
3212795 dkm         20   0 3771M 43836 15588 S   0.0  0.1  0:00.00 │  │        ├─ intercept --library /home/dkm/local/stow/bear/lib/bear/libexec.so --wrapper /home/dkm/local
....

@snprajwal

snprajwal commented Dec 29, 2022

+1, I'm facing the same issue with gccrs. If I use bear -- make -j8, the compilation succeeds, but the process never exits and like @dkm said, no CPU or RAM consumed either. It just sorta indefinitely hangs. I also noticed that the compile_commands.events.json was around 253M for me. Upon CTRL-C-ing the process, the compile_commands.json contained 14 entries.

@rizsotto
Owner

Thanks guys for this report. Is there a way that you could identify the command that hung the build? And create a minimal example which could help me to reproduce this error? That would be a great help to fix this bug.

@dkm

dkm commented Jan 4, 2023

Sure, I'll see what I can find, thanks @rizsotto

@dkm

dkm commented Jan 4, 2023

Is there a way to see what is being executed? I have a stuck bear but don't really know how to dig into it :)

@snprajwal

> Is there a way to see what is being executed? I have a stuck bear but don't really know how to dig into it :)

Maybe try bear -vvvv -- make?

@dkm

dkm commented Jan 4, 2023

I've got a stuck bear, with the last lines on the term being:

[16:48:00.206738, cs, 2849623] [pid: 2849622] recognition failed: No tools recognize this execution.                                                                                                                
[16:48:00.207043, cs, 2849623] compilation entries created. [size: 0]                                                                                                                                               
[16:48:00.207048, cs, 2849623] compilation entries to output. [size: 0]                                                                                                                                             
[16:48:00.207176, cs, 2849623] compilation entries written. [size: 0]                                                                                                                                               
[16:48:00.207186, cs, 2849623] succeeded with: 0                                                                                                                                                                    
[16:48:00.208142, br, 2694891] Process wait request: done. [pid: 2849623]                                                                                                                                           
[16:48:00.208207, br, 2694891] Running citnames finished. [Exited with 0]                                                                                                                                           
[16:48:00.224768, br, 2694891] succeeded with: 0 

The log is rather big and contains my whole environment, so I'm not very comfortable putting it here. I can give it to someone for debugging :)

The last command seems to be:

[16:48:00.206412, cs, 2849623] [pid: 2849622] execution: {"executable":"/bin/bash","arguments":["/bin/bash","-c","test -f config.h || make \"AR_FLAGS=rc\" \"CC_FOR_BUILD=gcc\" \"CFLAGS=-g -O2  -m32\" \"CXXFLAGS=-g -O2 -D_GNU_SOURCE  -m32\" \"CFLAGS_FOR_BUILD=-g -O2\" \"CFLAGS_FOR_TARGET=-g -O2\" \"INSTALL=/usr/bin/install -c\" \"INSTALL_DATA=/usr/bin/install -c -m 644\" \"INSTALL_PROGRAM=/usr/bin/install -c\" \"INSTALL_SCRIPT=/usr/bin/install -c\" \"JC1FLAGS=\" \"LDFLAGS=-m32\" \"LIBCFLAGS=-g -O2  -m32\" \"LIBCFLAGS_FOR_TARGET=-g -O2\" \"MAKE=make\" \"MAKEINFO=makeinfo --split-size=5000000   \" \"PICFLAG=\"
...

@rizsotto
Owner

rizsotto commented Jan 4, 2023

Thanks @dkm for this update.

I will test whether that command alone can cause the build to get stuck.

But what I find strange is that the output you've pasted here reports that the whole build process finished. The intercept and citnames processes finished, and even bear, which was calling these processes, finished. (The succeeded with... lines are literally the last lines in the main function of these tools.)

@dkm

dkm commented Jan 5, 2023

Oh, maybe some thread was started with an incorrect parameter and is preventing the process from finishing until it is joined?
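
One way to check the thread hypothesis on Linux is to list the threads of the stuck process under /proc. This is a minimal sketch, assuming a Linux /proc filesystem; list_threads is a hypothetical helper name, and the PID would be that of the hanging bear or intercept process. The demonstration below uses the current shell's PID so that it runs anywhere.

```shell
#!/bin/sh
# Print one line per thread of a process: thread id, state, and name.
list_threads() {
    pid=$1
    for task in /proc/"$pid"/task/*; do
        # Skip entries that vanished or cannot be read.
        [ -r "$task/status" ] || continue
        tid=${task##*/}
        state=$(sed -n 's/^State:[[:space:]]*//p' "$task/status")
        name=$(sed -n 's/^Name:[[:space:]]*//p' "$task/status")
        echo "$tid $state $name"
    done
}

# Demonstration on the current shell; replace $$ with the PID of the stuck process.
thread_report=$(list_threads "$$")
thread_count=$(printf '%s\n' "$thread_report" | wc -l)
echo "$thread_report"
echo "threads: $thread_count"
```

Threads that are still listed after the "succeeded with" line has been printed would be consistent with an unjoined thread keeping the process alive, though that alone does not prove it.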

6 participants