Add an option to wait for custom startup script to finish before launching the agent #346

Open
wants to merge 12 commits into
base: develop

@caseyduquettesc caseyduquettesc commented Jul 19, 2022

Fixes #338

Adds a new option to the instance configuration that delays launching the agent until the google-startup-scripts service has finished running.

Agent log - launcher checks startup status and the startup script has already finished
Jul 19, 2022 9:48:14 PM null
FINEST: Instance vl843h is running and ready...
Jul 19, 2022 9:48:14 PM null
INFO: Launching instance: vl843h
Jul 19, 2022 9:48:14 PM null
INFO: bootstrap
Jul 19, 2022 9:48:14 PM null
INFO: Getting keypair...
Jul 19, 2022 9:48:14 PM null
INFO: Using autogenerated keypair
Jul 19, 2022 9:48:14 PM null
INFO: Authenticating as jenkins
Jul 19, 2022 9:48:14 PM null
INFO: Connecting to 10.240.0.130 on port 22, with timeout 10000.
Jul 19, 2022 9:48:24 PM null
INFO: Failed to connect via ssh: The kexTimeout (10000 ms) expired.
Jul 19, 2022 9:48:24 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 9:48:29 PM null
INFO: Connecting to 10.240.0.130 on port 22, with timeout 10000.
Jul 19, 2022 9:48:37 PM null
INFO: Failed to connect via ssh: There was a problem while connecting to 10.240.0.130:22
Jul 19, 2022 9:48:37 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 9:48:42 PM null
INFO: Connecting to 10.240.0.130 on port 22, with timeout 10000.
Jul 19, 2022 9:48:42 PM null
INFO: Failed to connect via ssh: There was a problem while connecting to 10.240.0.130:22
Jul 19, 2022 9:48:42 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 9:48:47 PM null
INFO: Connecting to 10.240.0.130 on port 22, with timeout 10000.
Jul 19, 2022 9:48:47 PM null
INFO: Failed to connect via ssh: There was a problem while connecting to 10.240.0.130:22
Jul 19, 2022 9:48:47 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 9:48:52 PM null
INFO: Connecting to 10.240.0.130 on port 22, with timeout 10000.
Jul 19, 2022 9:48:52 PM null
INFO: Failed to connect via ssh: There was a problem while connecting to 10.240.0.130:22
Jul 19, 2022 9:48:52 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 9:48:57 PM null
INFO: Connecting to 10.240.0.130 on port 22, with timeout 10000.
Jul 19, 2022 9:48:57 PM null
INFO: Connected via SSH.
Jul 19, 2022 9:48:57 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 9:48:58 PM null
INFO: Configured startup script is finished.
Jul 19, 2022 9:48:58 PM null
INFO: Verifying: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -fullversion
openjdk full version "11.0.7+10-post-Ubuntu-2ubuntu218.04"
Jul 19, 2022 9:48:58 PM null
INFO: Copying agent.jar to: /var/lib/jenkins/workspace
Jul 19, 2022 9:48:58 PM null
INFO: Launching Jenkins agent via plugin SSH: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -jar /var/lib/jenkins/workspace/agent.jar
<===[JENKINS REMOTING CAPACITY]===>Remoting version: 4.10.1
This is a Unix agent
Evacuated stdout
Agent successfully connected and online
Agent log - launcher checks startup status while the startup script is sleeping for 60s
Jul 19, 2022 10:46:51 PM null
FINEST: Instance 3pi9fj is running and ready...
Jul 19, 2022 10:46:51 PM null
INFO: Launching instance: 3pi9fj
Jul 19, 2022 10:46:51 PM null
INFO: bootstrap
Jul 19, 2022 10:46:51 PM null
INFO: Getting keypair...
Jul 19, 2022 10:46:51 PM null
INFO: Using autogenerated keypair
Jul 19, 2022 10:46:51 PM null
INFO: Authenticating as jenkins
Jul 19, 2022 10:46:51 PM null
INFO: Connecting to 10.240.1.147 on port 22, with timeout 10000.
Jul 19, 2022 10:47:01 PM null
INFO: Failed to connect via ssh: The kexTimeout (10000 ms) expired.
Jul 19, 2022 10:47:01 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 10:47:06 PM null
INFO: Connecting to 10.240.1.147 on port 22, with timeout 10000.
Jul 19, 2022 10:47:07 PM null
INFO: Failed to connect via ssh: There was a problem while connecting to 10.240.1.147:22
Jul 19, 2022 10:47:07 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 10:47:12 PM null
INFO: Connecting to 10.240.1.147 on port 22, with timeout 10000.
Jul 19, 2022 10:47:12 PM null
INFO: Failed to connect via ssh: There was a problem while connecting to 10.240.1.147:22
Jul 19, 2022 10:47:12 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 10:47:17 PM null
INFO: Connecting to 10.240.1.147 on port 22, with timeout 10000.
Jul 19, 2022 10:47:17 PM null
INFO: Failed to connect via ssh: There was a problem while connecting to 10.240.1.147:22
Jul 19, 2022 10:47:17 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 10:47:22 PM null
INFO: Connecting to 10.240.1.147 on port 22, with timeout 10000.
Jul 19, 2022 10:47:22 PM null
INFO: Failed to connect via ssh: There was a problem while connecting to 10.240.1.147:22
Jul 19, 2022 10:47:22 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 10:47:27 PM null
INFO: Connecting to 10.240.1.147 on port 22, with timeout 10000.
Jul 19, 2022 10:47:28 PM null
INFO: Connected via SSH.
Jul 19, 2022 10:47:28 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:47:28 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:47:28 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:47:33 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:47:33 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:47:33 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:47:38 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:47:38 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:47:38 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:47:43 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:47:43 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:47:43 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:47:48 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:47:48 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:47:48 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:47:53 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:47:53 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:47:53 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:47:58 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:47:58 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:47:58 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:48:03 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:48:03 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:48:03 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:48:08 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:48:08 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:48:08 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:48:13 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:48:13 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:48:13 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:48:18 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:48:18 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:48:18 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:48:23 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:48:23 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:48:23 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:48:28 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:48:28 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:48:28 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:48:33 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:48:33 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:48:33 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:48:38 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:48:38 PM null
WARNING: google-startup-scripts has not finished
Jul 19, 2022 10:48:38 PM null
INFO: Waiting for the startup script to finish. Sleeping 5.
Jul 19, 2022 10:48:43 PM null
INFO: Verifying: exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d "=" -f 2)
Jul 19, 2022 10:48:43 PM null
INFO: Configured startup script is finished.
Jul 19, 2022 10:48:43 PM null
INFO: Verifying: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -fullversion
openjdk full version "11.0.7+10-post-Ubuntu-2ubuntu218.04"
Jul 19, 2022 10:48:43 PM null
INFO: Copying agent.jar to: /var/lib/jenkins/workspace
Jul 19, 2022 10:48:43 PM null
INFO: Launching Jenkins agent via plugin SSH: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -jar /var/lib/jenkins/workspace/agent.jar
<===[JENKINS REMOTING CAPACITY]===>Remoting version: 4.10.1
This is a Unix agent
Evacuated stdout
Agent successfully connected and online
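
The retry pattern visible in the log above (run the check, sleep 5 seconds, try again) can be sketched in shell as follows. This is only an illustration, not the plugin's Java implementation: `wait_until` is a hypothetical helper, and the attempt limit is an assumption that does not appear in the log.

```shell
# Hypothetical helper: repeat a check until it succeeds or the attempts run out.
# Arguments: maximum attempts, seconds to sleep between attempts, then the check command.
wait_until() {
    attempts=$1
    interval=$2
    shift 2
    i=1
    while :; do
        # Stop as soon as the check succeeds.
        if "$@"; then
            return 0
        fi
        # Give up once the attempt budget (an assumption of this sketch) is used.
        if [ "$i" -ge "$attempts" ]; then
            return 1
        fi
        echo "Waiting for the startup script to finish. Sleeping $interval."
        sleep "$interval"
        i=$((i + 1))
    done
}

# Usage on a real instance (not executed here; my_startup_check is hypothetical):
#   wait_until 60 5 my_startup_check
```

Sleeping only between attempts means a failing final check returns immediately instead of waiting one more interval.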
Agent log - option disabled and launcher does not check startup status
Jul 19, 2022 10:54:22 PM null
FINEST: Instance o8pom3 is running and ready...
Jul 19, 2022 10:54:22 PM null
INFO: Launching instance: o8pom3
Jul 19, 2022 10:54:22 PM null
INFO: bootstrap
Jul 19, 2022 10:54:22 PM null
INFO: Getting keypair...
Jul 19, 2022 10:54:22 PM null
INFO: Using autogenerated keypair
Jul 19, 2022 10:54:22 PM null
INFO: Authenticating as jenkins
Jul 19, 2022 10:54:22 PM null
INFO: Connecting to 10.240.0.11 on port 22, with timeout 10000.
Jul 19, 2022 10:54:32 PM null
INFO: Failed to connect via ssh: The kexTimeout (10000 ms) expired.
Jul 19, 2022 10:54:32 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 10:54:37 PM null
INFO: Connecting to 10.240.0.11 on port 22, with timeout 10000.
Jul 19, 2022 10:54:38 PM null
INFO: Failed to connect via ssh: There was a problem while connecting to 10.240.0.11:22
Jul 19, 2022 10:54:38 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 10:54:43 PM null
INFO: Connecting to 10.240.0.11 on port 22, with timeout 10000.
Jul 19, 2022 10:54:43 PM null
INFO: Failed to connect via ssh: There was a problem while connecting to 10.240.0.11:22
Jul 19, 2022 10:54:43 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 10:54:48 PM null
INFO: Connecting to 10.240.0.11 on port 22, with timeout 10000.
Jul 19, 2022 10:54:48 PM null
INFO: Failed to connect via ssh: There was a problem while connecting to 10.240.0.11:22
Jul 19, 2022 10:54:48 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 10:54:53 PM null
INFO: Connecting to 10.240.0.11 on port 22, with timeout 10000.
Jul 19, 2022 10:54:54 PM null
INFO: Failed to connect via ssh: There was a problem while connecting to 10.240.0.11:22
Jul 19, 2022 10:54:54 PM null
INFO: Waiting for SSH to come up. Sleeping 5.
Jul 19, 2022 10:54:59 PM null
INFO: Connecting to 10.240.0.11 on port 22, with timeout 10000.
Jul 19, 2022 10:54:59 PM null
INFO: Connected via SSH.
Jul 19, 2022 10:54:59 PM null
INFO: Verifying: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -fullversion
openjdk full version "11.0.7+10-post-Ubuntu-2ubuntu218.04"
Jul 19, 2022 10:54:59 PM null
INFO: Copying agent.jar to: /var/lib/jenkins/workspace
Jul 19, 2022 10:54:59 PM null
INFO: Launching Jenkins agent via plugin SSH: /usr/lib/jvm/java-11-openjdk-amd64/bin/java -jar /var/lib/jenkins/workspace/agent.jar
<===[JENKINS REMOTING CAPACITY]===>Remoting version: 4.10.1
This is a Unix agent
Evacuated stdout
Agent successfully connected and online
  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

Comment on lines 129 to 132
// default value will be 0 until it finishes
"exit $(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic | cut -d \"=\" -f 2)",
// not "initializing" or "starting" - benefit is doesn't require google-startup-scripts
// "[ \"$(systemctl is-system-running)\" != \"starting\" ]",
Author

Do we think it's better to check that google-startup-scripts has finished, or that systemctl is-system-running does not return "initializing" or "starting"? Those seem to be the two best ways on my Ubuntu 18/20 machines, but I'm curious what others think here.
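
For comparison, the two candidate checks could be sketched in shell roughly as below. The helper names `exit_timestamp_set` and `system_done_booting` are hypothetical and not part of this PR. The sketch compares the property value as text instead of passing it to `exit`, which is a design choice made here because shell exit statuses are typically limited to 0-255, so a large timestamp would not survive intact.

```shell
# Hypothetical helper: succeeds once the property line reports a non-zero exit timestamp.
# Input looks like "ExecMainExitTimestampMonotonic=123456"; 0 means the main process
# has not exited yet. Empty or non-numeric values are treated as "not finished".
exit_timestamp_set() {
    value=${1#*=}
    case "$value" in
        ''|0|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical helper: succeeds unless systemd still reports an early boot state.
# An empty state (for example, the command failed) is treated as "not done".
system_done_booting() {
    case "$1" in
        ''|initializing|starting) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage on a real instance (not executed here):
#   exit_timestamp_set "$(systemctl show google-startup-scripts --property ExecMainExitTimestampMonotonic)"
#   system_done_booting "$(systemctl is-system-running)"
```

The first check depends on the google-startup-scripts unit being present, while the second only depends on systemd's overall state, which matches the trade-off noted in the code comment under review.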

Hi @caseyduquettesc, thank you. I think it's fine to just check that google-startup-scripts has finished.

When can this be merged? I was hoping I could use it pretty soon.

Author

This will need a project maintainer to review. I'll try asking in the Slack channel.

@caseyduquettesc caseyduquettesc marked this pull request as ready for review July 20, 2022 06:11
@caseyduquettesc
Author

Sorry for the direct ping, but @donmccasland or @alecharp, do you know who would be best to review this? Thanks.

@TheSushiChef

Is this feature coming any time soon? We'd really love to have it, as we currently hit a race condition where the pipeline starts building on the agent before the startup script finishes logging in, and the build breaks on first execution.

@ogonzalez-sd

This would be great. Our current workaround is to have the script stop sshd, then start it again at the end.

Thank you!
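
A minimal sketch of the workaround described above, under stated assumptions: the SSH unit is assumed to be named `ssh` (as on Ubuntu; other images may use `sshd`), and `run_startup_with_ssh_closed` and `provision_agent` are hypothetical names. It is not taken from the commenter's actual script.

```shell
# Assumption: the SSH service unit is named "ssh"; override SSH_UNIT if it differs.
SSH_UNIT=${SSH_UNIT:-ssh}
# SYSTEMCTL can be overridden (for example with "echo") for a dry run.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Hypothetical wrapper: stop SSH, run the given provisioning command, then start SSH again.
run_startup_with_ssh_closed() {
    $SYSTEMCTL stop "$SSH_UNIT"
    status=0
    "$@" || status=$?
    # Start SSH again even if provisioning failed, so the instance stays reachable.
    $SYSTEMCTL start "$SSH_UNIT"
    return "$status"
}

# Usage inside a startup script (not executed here; provision_agent is hypothetical):
#   run_startup_with_ssh_closed provision_agent
```

Because the launcher keeps retrying SSH until it connects, keeping the service down until the end of the script makes the agent launch wait without any change to the plugin.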

@MTSchnepper

Any chance we could get codeowners to take a look at this PR? This feature would solve a lot of problems we're facing with setting up Jenkins on GCE.

Development

Successfully merging this pull request may close these issues.

Feature request: Option to delay agent connection to allow custom startup script to finish
5 participants