Bazel getting stuck while building #748

Closed
mnett opened this issue Jan 2, 2016 · 6 comments

mnett commented Jan 2, 2016

I've been having trouble building things with Bazel lately. Bazel appears to get stuck (with no CPU load) at essentially random build steps, including during the loading and analysis phases. This happens both when I build targets from my own repository and during the bootstrap step of the Bazel installation. I've also tried releases 0.1.0 and 0.1.2, without success so far.

For example, running ./compile.sh from HEAD just now got stuck at the following point (I aborted it after several minutes without any progress):

[mnett@singularity ~/Development/bazel] (master) Sat Jan 02 22:40:25 $ ./compile.sh
INFO: You can skip this first step by providing a path to the bazel binary as second argument:
INFO:    ./compile.sh build /path/to/bazel
🍃  Building Bazel from scratch............
🍃  Building Bazel with Bazel.
.Extracting Bazel installation...
Sending SIGTERM to previous Bazel server (pid=26879)... done.
.......
INFO: Found 1 target...
[2 / 45] Writing file src/main/java/com/google/devtools/build/lib/libbazel-main.jar-2.params

If I'm interpreting the strace output below correctly, this looks like a deadlock.

[mnett@singularity ~] Sat Jan 02 22:40:58 $ sudo strace -p 26879
[sudo] password for mnett: 
Process 26879 attached
futex(0x7f059b3319d0, FUTEX_WAIT, 26880, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=30672, si_uid=1000} ---
futex(0x7f059a5028c0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn()                          = 202
futex(0x7f059b3319d0, FUTEX_WAIT, 26880, NULL <unfinished ...>
+++ exited with 143 +++
[mnett@singularity ~] Sat Jan 02 22:45:38 $ sudo strace -p 30715
Process 30715 attached
futex(0x7f50b497b9d0, FUTEX_WAIT, 30716, NULL

The step at which the deadlock happens seems arbitrary, and running the build command repeatedly eventually succeeds. I couldn't find any existing issues that seem related, and at this point I'm unsure how to debug the problem further.

Do you have any suggestions?

mnett (author) commented Jan 3, 2016

Quick update on the situation: After cold starting Bazel multiple times (due to rebooting the OS) it seems the problem has disappeared entirely. However, let me leave the bug open for now.

janakdr (contributor) commented Jan 4, 2016

Hit Ctrl-\ while Bazel is "stuck" and you'll get a thread dump. Posting that here may help us diagnose the problem. I'll update the user manual to mention this feature.
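Ctrl-\ makes the terminal send SIGQUIT; as a later comment in this thread shows, Bazel forwards it to the server JVM, which writes a stack trace for every thread to jvm.out. For readers who want the same information from inside their own Java code, here is a minimal sketch using the standard java.lang.management API. It is not Bazel's implementation, and the class name ThreadDumpSketch is hypothetical. It also asks the JVM whether any threads form a true lock cycle, which helps separate a real deadlock from a thread that is simply waiting on I/O (the latter is not reported by findDeadlockedThreads).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: a rough, in-process equivalent of the dump a JVM prints on SIGQUIT.
public class ThreadDumpSketch {

    /** Builds a text dump of every live thread's name, state, and stack frames. */
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // Only request lock details the running JVM actually supports; otherwise dumpAllThreads throws.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Number of threads caught in a monitor/lock cycle; 0 means no true deadlock was found. */
    public static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length; // the JVM returns null when there is no deadlock
    }

    public static void main(String[] args) {
        System.out.print(dump());
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
    }
}
```

A thread blocked in a native socket read still shows up as RUNNABLE here while using no CPU, so the stack frames, not the state alone, tell you what it is waiting for.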


mnett (author) commented Jan 5, 2016

Thanks Janak, that's really helpful. Since the problem has gone away, shall we mark this as not reproducible for the time being?

philwo (member) commented Jan 5, 2016

@janakdr Could you please close this bug once the documentation has been updated?
@mnett Sounds good - please reopen if you reproduce it again in the future!

janakdr added a commit that referenced this issue Jan 7, 2016
…file, to aid in diagnosing issue #748.

--
MOS_MIGRATED_REVID=111360258
janakdr closed this as completed in 6e24a37 Jan 7, 2016
mnett (author) commented Jan 17, 2016

Hi,

The problem cropped up again, and this time I got a thread dump after Bazel got stuck mid-analysis:

INFO: Loading complete.  Analyzing...
^\
Sending SIGQUIT to JVM process 3802 (see /home/mnett/.cache/bazel/_bazel_mnett/630833098f940540d7dc43bb06a2df6a/server/jvm.out).

See https://gist.github.com/mnett/e71b2a9e5a36382194b0 for the dump.

@janakdr, can you reopen the issue?

janakdr (contributor) commented Jan 17, 2016

Looks like you had two stack traces there. The first one looks like it was taken during loading. The only work being done is:

"skyframe-evaluator 120" #221 prio=5 os_prio=0 tid=0x00007f0b1802a000 nid=0xfbd runnable [0x00007f0ad1ad8000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:593)
    at sun.security.ssl.InputRecord.read(InputRecord.java:532)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
    - locked <0x00000000e67fbf60> (a java.lang.Object)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    - locked <0x00000000e68aa508> (a sun.security.ssl.AppInputStream)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    - locked <0x00000000e68aa520> (a java.io.BufferedInputStream)
    at sun.net.www.http.ChunkedInputStream.fastRead(ChunkedInputStream.java:244)
    at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:689)
    - locked <0x00000000e68aa548> (a sun.net.www.http.ChunkedInputStream)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3336)
    at org.eclipse.jgit.util.io.UnionInputStream.read(UnionInputStream.java:145)
    at org.eclipse.jgit.transport.SideBandInputStream.read(SideBandInputStream.java:143)
    at org.eclipse.jgit.transport.PackParser.fill(PackParser.java:1131)
    at org.eclipse.jgit.transport.PackParser.access$000(PackParser.java:97)
    at org.eclipse.jgit.transport.PackParser$InflaterStream.read(PackParser.java:1664)
    at java.io.InputStream.read(InputStream.java:101)
    at org.eclipse.jgit.transport.PackParser.whole(PackParser.java:983)
    at org.eclipse.jgit.transport.PackParser.indexOneObject(PackParser.java:916)
    at org.eclipse.jgit.transport.PackParser.parse(PackParser.java:487)
    at org.eclipse.jgit.internal.storage.file.ObjectDirectoryPackParser.parse(ObjectDirectoryPackParser.java:194)
    at org.eclipse.jgit.transport.PackParser.parse(PackParser.java:448)
    at org.eclipse.jgit.transport.BasePackFetchConnection.receivePack(BasePackFetchConnection.java:762)
    at org.eclipse.jgit.transport.BasePackFetchConnection.doFetch(BasePackFetchConnection.java:363)
    at org.eclipse.jgit.transport.TransportHttp$SmartHttpFetchConnection.doFetch(TransportHttp.java:779)
    at org.eclipse.jgit.transport.BasePackFetchConnection.fetch(BasePackFetchConnection.java:301)
    at org.eclipse.jgit.transport.BasePackFetchConnection.fetch(BasePackFetchConnection.java:291)
    at org.eclipse.jgit.transport.FetchProcess.fetchObjects(FetchProcess.java:245)
    at org.eclipse.jgit.transport.FetchProcess.executeImp(FetchProcess.java:161)
    at org.eclipse.jgit.transport.FetchProcess.execute(FetchProcess.java:122)
    at org.eclipse.jgit.transport.Transport.fetch(Transport.java:1138)
    at org.eclipse.jgit.api.FetchCommand.call(FetchCommand.java:130)
    at org.eclipse.jgit.api.CloneCommand.fetch(CloneCommand.java:193)
    at org.eclipse.jgit.api.CloneCommand.call(CloneCommand.java:133)
    at com.google.devtools.build.lib.bazel.repository.GitCloneFunction.compute(GitCloneFunction.java:126)
    at com.google.devtools.build.skyframe.ParallelEvaluator$Evaluate.run(ParallelEvaluator.java:938)

So it's waiting for an HTTP download, I guess. The second dump was taken during analysis, and all the threads appear to be doing useful work. Was the long delay before Bazel printed the "Analyzing" message, or after?
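The first dump shows a Skyframe evaluator thread inside JGit's CloneCommand (called from GitCloneFunction), parked in the native socketRead0 frame while reading an HTTPS response. Such a thread is reported as RUNNABLE yet burns no CPU, which is consistent with the "stuck with no CPU load" symptom. The self-contained sketch below, which is an illustration and not Bazel or JGit code (the class name ReadTimeoutSketch is hypothetical), reproduces that situation locally: a peer accepts the connection but never sends data, so read() would block indefinitely unless a read timeout is set with Socket.setSoTimeout. Whether Bazel's git fetch configured any such timeout at the time is not stated in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo: a blocking read from a silent peer, bounded by a read timeout.
public class ReadTimeoutSketch {

    /**
     * Connects to a loopback server that accepts but never writes, then reads.
     * Returns true if the read gave up with SocketTimeoutException.
     */
    public static boolean readTimesOut(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);          // port 0: any free port
             Socket client = new Socket(loopback, server.getLocalPort());    // handshake completes via the backlog
             Socket accepted = server.accept()) {                            // server side stays silent
            // Without this call, read() below would wait forever, like the socketRead0 frame in the dump.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read();   // no data and no EOF arrive, so this blocks until the timeout fires
                return false;
            } catch (SocketTimeoutException e) {
                return true; // the read was abandoned after timeoutMillis instead of hanging
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

A thread dump taken while read() is still blocked would show the same socketRead0 frame at the top of the stack, with no lock cycle involved, which is why such a hang looks like a deadlock from the outside without being one.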
