Jenkins: Retry Failed Builds #2967

Closed
3 of 11 tasks
sanssecours opened this issue Sep 16, 2019 · 36 comments

Comments

@sanssecours
Member

sanssecours commented Sep 16, 2019

Description

Currently the Jenkins build fails quite often for various reasons. This issue lists some of the current problems.

Failures

| Branch | Failure Reason | Failed Build Job/Stage |
| --- | --- | --- |
| PR #2932 | Maven build | debian-unstable-clang-asan |
| master | Homepage build | Deploy Website |
| master | Homepage build | Deploy Website |
| PR #2945 | Internal compiler error | build-elektra-web-base |
| master | CMake install failure | debian-stretch-full |
| master | Workspace removal failure | Main builds |
| master | Workspace removal failure | Main builds |
| master | Workspace removal failure | Main builds |
| master | Workspace removal failure | Main builds |
| master | Workspace removal failure | Main builds |
| master | Workspace removal failure | Main builds |
| PR #2945 | Haskell build failure | debian-stretch-full-optimizations-off |
| PR #2945 | APT install failed | build-elektra-website |
| PR #2932 | Maven build | debian-unstable-clang-asan |
| master | Timeout | debian-stretch-full-mmap-asan |
| PR #2975 | Timeout | debian-buster-mingw-w64 |
| master | Homepage build | Deploy Website |
| master | Homepage build | Deploy Website |
| master | Timeout | debian-buster-full |
| master | Haskell build failure | debian-stretch-full-ini |
| master | Timeout | debian-unstable-full |
| master | Failing tests | debian-buster-full |
| master | Internal compiler error | build-elektra-web-base |
| master | Homepage build | Deploy Website |
| master | Homepage build | Deploy Website |
| master | Homepage build | Deploy Website |
| master | Homepage build | Deploy Website |
| PR #2998 | Timeout, Connection problems | build-elektra-web-base, debian-buster-full-i386 |
| master | Maven build | debian-unstable-clang-asan |
| PR #2998 | Timeout | build-elektra-website-backend |
| master | Connection problems | build-elektra-web-base |
| master | Homepage build | Deploy Website |
| master | Maven build | debian-unstable-full-clang |
| master | Git commit failure | buildPackage/debian/buster |
| master | Git commit failure | buildPackage/debian/buster |
| master | Git commit failure | buildPackage/debian/buster, buildPackage/debian/stretch |
| master | Git commit failure | buildPackage/debian/buster |
| master | Git commit failure | buildPackage/debian/buster |

Failing Tests

| Test | Location | Times Failed |
| --- | --- | --- |
| check_external_example_codegen_econf | debian-buster-full | 1 |
| check_external_example_codegen_menu | debian-buster-full | 1 |
| check_external_example_codegen_tree | debian-buster-full | 1 |
| check_external_example_highlevel | debian-buster-full | 1 |
| check_spec | debian-buster-full | 1 |
| testkdb_ensure | debian-buster-full | 1 |
@markus2330
Contributor

Thank you for collecting the issues!

For the maven builds we already have an issue: #2855

@sanssecours
Member Author

For the maven builds we already have an issue: #2855

I know 😊. I already added a link in the issue description.

@markus2330
Contributor

Thank you for this elaborate research. We now need to fix one issue after the other.

@markus2330
Contributor

For the Haskell problems, we can remove the Haskell bindings/plugins. They are not maintained anyway.

@sanssecours sanssecours added this to the 1.0.0 milestone Sep 26, 2019
@markus2330 markus2330 removed this from the 1.0.0 milestone Sep 26, 2019
@markus2330
Contributor

Haskell will be removed in #3017

@markus2330
Contributor

Failures of docker pull in the website stage occur quite often now.

@markus2330 markus2330 mentioned this issue Oct 6, 2019
@dominicjaeger
Contributor

I just got connection problems for build-elektra-web-base, too.

3d070e3209ce: Retrying in 1 second

error creating overlay mount to /home/_docker/overlay2/e9563564b9365114c47d90b7e8d307565225097a525e6b1b866a2da2877b2aa8/merged: device or resource busy

script returned exit code 1

This is a full log.

@markus2330 markus2330 mentioned this issue Oct 13, 2019
@dominicjaeger
Contributor

Failures of docker pull in the website stage occur quite often now.

Do you mean all the retrying and waiting after "Pulling from build-elektra-web-base" (log)?

Additionally, I think this error is new: test_service_convertengine fails during Starting build/hub.libelektra.org/build-elektra-website-backend (log 2)

@markus2330
Contributor

Yes, I agree, test_service_convertengine is not reported here yet. Actually, we can disable the test, as the service is not modified anyway.

@sanssecours is there a procedure for adding new tests to the above list?

@sanssecours
Member Author

@sanssecours is there a procedure for adding new tests to the above list?

Nope. I already gave up on modifying the list, since the Jenkins build fails too often. I would recommend we just open an issue for each specific problem.

@markus2330
Contributor

For issues related to source code, I agree. For issues related to Docker/Jenkins instability, it is enough to collect them here, since there is very little we can do apart from the migration we are already working on, which unfortunately takes longer than expected. It would be nice if @Mistreated could give more information about the status, maybe in #160.

@markus2330
Contributor

Additionally, I think this error is new: test_service_convertengine fails during Starting build/hub.libelektra.org/build-elektra-website-backend (log 2)

Can you please report that separately? The fix is to disable the tests.

@dominicjaeger
Contributor

Can you please report that separately?

Done, see #3086

@markus2330
Contributor

I think our best bet to make our lives much easier is to "fix" these problems using https://wiki.jenkins.io/display/JENKINS/Naginator+Plugin

Then Jenkins will restart failed jobs several times. I think we could try 5 restarts before giving up?

@Mistreated Can you implement this also on the old server? Or is this too risky?

Before we implement this, however, we need the new Jenkins Node as otherwise the queue will get too long.
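A minimal sketch of the retry policy proposed above, written in Python purely for illustration: run a job, and if it fails, run it again, up to 5 attempts in total. The helper `run_with_retries` and its parameters are hypothetical; this is not how the Naginator plugin is implemented or configured, it only models the behaviour we would get from it.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(job: Callable[[], T], max_attempts: int = 5,
                     delay_seconds: float = 0.0) -> T:
    """Run `job` up to `max_attempts` times and return its first successful result.

    Hypothetical helper: every exception is treated as a possibly transient
    failure, which is the same assumption an automatic rebuild makes.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as error:
            last_error = error
            print(f"attempt {attempt}/{max_attempts} failed: {error}")
            if attempt < max_attempts:
                time.sleep(delay_seconds)  # optional pause before the next attempt
    # Every attempt failed, so the failure is probably not transient.
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    outcomes = iter([IOError("docker pull failed"), IOError("timeout"), "ok"])

    def flaky_job() -> str:
        outcome = next(outcomes)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    print(run_with_retries(flaky_job))  # succeeds on the third attempt
```

The trade-off is the one raised above: a job that fails persistently now runs up to five times before it is reported, so it occupies the build queue several times longer, which is why the additional node is needed first.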

@dbulatovic
Contributor

I updated the node. It should work now. If something goes wrong you can update me here again.

@sanssecours
Member Author

If something goes wrong you can update me here again.

Looks like docker pull fails on hetzner-jenkins1, since the node does not have enough free space:

Cannot contact hetzner-jenkins1: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
failed to register layer: ApplyLayer exit status 1 stdout: stderr: write /usr/lib/git-core/git-credential-store: disk quota exceeded


@dbulatovic
Contributor

Looks like docker pull fails on hetzner-jenkins1, since the node does not have enough free space:

Node updated.

@sanssecours
Member Author

Build jobs on hetzner-jenkins1 seem to fail because of permission-related problems:

Resource: Could not create directory '/.config'. Reason: Permission denied. Identity: uid: 47000, euid: 47000, gid: 47000, egid: 47000


@dbulatovic
Contributor

I updated the node again; there shouldn't be any permission issues anymore.

@dbulatovic
Contributor

Why does Jenkins want to create a '/.config' directory and not just '.config'?
There is a .config directory inside '/home/jenkins/', but it wants to create the .config folder in '/'.

I don't think user 'jenkins' should be able to do that.

@markus2330
Contributor

@Mistreated please also make a PR to actually test if the builds work now.

Why does Jenkins want to create a '/.config' directory and not just '.config'?
There is a .config directory inside '/home/jenkins/', but it wants to create the .config folder in '/'.

This might happen if the home directory of the user is /. Did you look into /etc/passwd, maybe something is wrong there?

@dbulatovic
Contributor

This might happen if the home directory of the user is /. Did you look into /etc/passwd, maybe something is wrong there?

'jenkins:x:47000:47000::/home/jenkins:/bin/sh'

All looks fine, even in the logs of the node:

'HOME='/home/jenkins' '
'NOTE: Relative remote path resolved to: /home/jenkins/.'

@markus2330
Contributor

Debugging would be easier if we could see a PR with the whole log.

@dbulatovic
Contributor

Master node is down.

Debugging would be easier if we could see a PR with the whole log.

#3134

@sanssecours
Member Author

Master node is down.

Thank you for the information. I deleted all log information for old pull requests and re-enabled the node. Unfortunately, the amount of free space on the Jenkins master is still very low (~3.9G).
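Since low disk space keeps coming up (here on the master, and as "disk quota exceeded" on hetzner-jenkins1), a pre-flight check could warn before it breaks a build. A minimal Python sketch using the standard library's `shutil.disk_usage`; the function names, the path, and the 10 GiB threshold are hypothetical. Caveat: `shutil.disk_usage` reports free space on the file system, so it will not notice a per-user disk quota running out.

```python
import shutil

GIB = 1024 ** 3
# Hypothetical threshold; pick one that fits the size of the Docker images involved.
MIN_FREE_BYTES = 10 * GIB


def free_gib(path: str) -> float:
    """Return the free space of the file system containing `path`, in GiB."""
    return shutil.disk_usage(path).free / GIB


def has_enough_free_space(path: str, min_free_bytes: int = MIN_FREE_BYTES) -> bool:
    """True if the file system containing `path` has at least `min_free_bytes` free."""
    return shutil.disk_usage(path).free >= min_free_bytes


if __name__ == "__main__":
    workspace = "/"  # hypothetical: point this at the Jenkins home or Docker data directory
    if not has_enough_free_space(workspace):
        print(f"warning: only {free_gib(workspace):.1f} GiB free on {workspace}")
```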

@markus2330
Contributor

@Mistreated I moved the discussion about the hetzner node to #3138. This issue is about temporary failures in the build server, not about wrong setup of the build server.

@sanssecours
Member Author

Looks like building Docker images does not work on hetzner-jenkins1:

stderr: error: could not lock config file .git/config: Disk quota exceeded

I disabled the node.

@dbulatovic
Contributor

It was just the disk quota being exceeded; I did not want to overprovision the node. I cleaned it up now, and it's up again.

@markus2330
Contributor

markus2330 commented Nov 8, 2019

Two more tests that sometimes fail (#3168):

 27/134 MemCheck  #23: testcpp_contextual_thread ........***Exception: Other  2.59 sec
Running main() from /opt/gtest/googletest/src/gtest_main.cc
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from test_contextual_thread
[ RUN      ] test_contextual_thread.instanciation

/home/jenkins/workspace/libelektra_PR-3168-L5JHIPUUQR3TWFGKHQIDK6HHW6QAMSQXWJC5ZUZMBLDMLTYA2ENA@2/src/bindings/cpp/tests/testcpp_contextual_thread.cpp:70: Failure

Expected equality of these values:
  ks.lookup ("user/hello").getString ()
    Which is: "8"
  "5"
terminate called without an active exception
60/254 Test  #57: testio_glib .................................***Failed    5.08 sec

BINDING TEST-SUITE

==================

test basics
test idle
test timer
testTimerShouldCallbackOnce (warning): measured 316ms, expected 250ms - deviation 66ms.
testTimerShouldCallbackAtIntervals (warning): measured 343ms, expected 250ms - deviation 93ms.
testTimerShouldCallbackAtIntervals (warning): measured 322ms, expected 250ms - deviation 72ms.
testTimerShouldCallbackAtIntervals (warning): measured 338ms, expected 250ms - deviation 88ms.
../src/bindings/io/test/test_timer.c:273: error in testTimerShouldChangeInterval: timer was not called the required amount of times
test file descriptor
test mix
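The warnings above compare the measured timer delay with the expected 250 ms interval. A short Python sketch that extracts them from such a log shows the timer firing 66 to 93 ms late, i.e. up to about 37% past the interval, which is plausibly the kind of scheduling jitter that makes timing-based tests flaky on a busy build node. The regular expression assumes exactly the warning format shown in this log, and `timer_deviations` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Matches lines such as:
# testTimerShouldCallbackOnce (warning): measured 316ms, expected 250ms - deviation 66ms.
WARNING = re.compile(
    r"(\w+) \(warning\): measured (\d+)ms, expected (\d+)ms - deviation (-?\d+)ms\."
)


def timer_deviations(log: str) -> List[Tuple[str, int, float]]:
    """Return (test name, deviation in ms, deviation relative to expected) per warning."""
    results = []
    for match in WARNING.finditer(log):
        name, measured, expected, _reported = match.groups()
        deviation = int(measured) - int(expected)
        results.append((name, deviation, deviation / int(expected)))
    return results


LOG = """\
testTimerShouldCallbackOnce (warning): measured 316ms, expected 250ms - deviation 66ms.
testTimerShouldCallbackAtIntervals (warning): measured 343ms, expected 250ms - deviation 93ms.
testTimerShouldCallbackAtIntervals (warning): measured 322ms, expected 250ms - deviation 72ms.
testTimerShouldCallbackAtIntervals (warning): measured 338ms, expected 250ms - deviation 88ms.
"""

for name, deviation, relative in timer_deviations(LOG):
    print(f"{name}: {deviation} ms late ({relative:.0%})")
```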

@markus2330
Contributor

Yet another error in https://build.libelektra.org/jenkins/blue/organizations/jenkins/libelektra/detail/master/12/pipeline/

Step 12/31 : RUN curl -o cppcms-${CPPCMS_VERSION}.tar.bz -L         "https://sourceforge.net/projects/cppcms/files/cppcms/${CPPCMS_VERSION}/cppcms-${CPPCMS_VERSION}.tar.bz2/download"     && tar -xjvf cppcms-${CPPCMS_VERSION}.tar.bz     && mkdir cppcms-${CPPCMS_VERSION}/build     && cd cppcms-${CPPCMS_VERSION}/build     && cmake ..     && make -j ${PARALLEL}     && make install     && cd /app/deps     && rm -Rf cppcms-${CPPCMS_VERSION}

 ---> Running in f5ed5e42a480

curl: (92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)

The command '/bin/sh -c curl -o cppcms-${CPPCMS_VERSION}.tar.bz -L         "https://sourceforge.net/projects/cppcms/files/cppcms/${CPPCMS_VERSION}/cppcms-${CPPCMS_VERSION}.tar.bz2/download"     && tar -xjvf cppcms-${CPPCMS_VERSION}.tar.bz     && mkdir cppcms-${CPPCMS_VERSION}/build     && cd cppcms-${CPPCMS_VERSION}/build     && cmake ..     && make -j ${PARALLEL}     && make install     && cd /app/deps     && rm -Rf cppcms-${CPPCMS_VERSION}' returned a non-zero code: 92

script returned exit code 92

I am afraid https://wiki.jenkins.io/display/JENKINS/Naginator+Plugin is the only big step forward.

Unfortunately, it will not fix the problems for Travis or Cirrus.
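If automatic retrying is introduced, one way to keep it from masking real bugs is to retry only when the log contains a known infrastructure error. As far as I know, the Naginator plugin can restrict rebuilds to builds whose output matches a regular expression; the Python sketch below only illustrates that idea, and the pattern list and `looks_transient` helper are hypothetical, assembled from error messages quoted in this issue.

```python
from __future__ import annotations

import re
from typing import Sequence

# Hypothetical patterns, taken from infrastructure errors quoted in this issue.
TRANSIENT_PATTERNS = [
    re.compile(r"HTTP/2 stream \d+ was not closed cleanly"),  # curl exit code 92
    re.compile(r"Cannot contact \S+: hudson\.remoting\."),    # agent connection lost
    re.compile(r"error creating overlay mount .*device or resource busy"),
]


def looks_transient(log: str, patterns: Sequence[re.Pattern[str]] = TRANSIENT_PATTERNS) -> bool:
    """Return True if `log` matches any known transient infrastructure error."""
    return any(pattern.search(log) for pattern in patterns)
```

Test failures such as the testcpp_contextual_thread assertion deliberately do not match, so they would still fail the build on the first attempt. Disk quota errors are also left out on purpose, since retrying does not help until space is freed.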

@markus2330 markus2330 changed the title Jenkins: Build Fails Often Jenkins: Retry Failed Builds Nov 10, 2019
@dominicjaeger
Contributor

Should we update "Times Failed" in the start post? check_external_example_codegen_econf is failing quite often at the moment.

@markus2330
Contributor

Trying to update the start post or trying to fix all these issues is hopeless. We need automatic retrying. I hope @Mistreated will implement this soon on our new server.

markus2330 pushed a commit that referenced this issue Nov 16, 2019
@markus2330 markus2330 mentioned this issue Nov 16, 2019
@markus2330
Contributor

What do you think about #3224?

markus2330 pushed a commit that referenced this issue Nov 16, 2019
darddan pushed a commit to darddan/libelektra that referenced this issue Nov 17, 2019
darddan pushed a commit to darddan/libelektra that referenced this issue Nov 17, 2019
@markus2330
Contributor

markus2330 commented Apr 12, 2020

Problems are solved now. Please open new issues if builds still fail.
