Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved connection progress #7385

Merged
merged 7 commits into from
May 23, 2024
Merged

Conversation

jedevc
Copy link
Member

@jedevc jedevc commented May 15, 2024

Continued progress exploration! I just kinda kept pulling on #7375, and kept finding more things to tidy up πŸŽ‰

Fixes:

This changes a few things about the initial connection:

  • Changes timeouts, to provide a sane upper limit on the amount of time allowed to provision an engine, e.g. pulling+starting a docker container (10 minutes)
  • Splits up the connect step into separate parts - this allows for improved debugability if this step is hanging - we can see exactly which part is being slow.
    • Removes noisy "OK!" message from session connection - this isn't really needed anymore, success is indicated by the completion of the span.
  • Makes sure that Connected to engine messages correctly appear with our pretty progress (they weren't being displayed)
  • Restores support for Connected to cloud messages.
  • Ensures that all logging messages respect the detected terminal color profile.

Before:

$ dagger call --source=.:default helm test -v
βœ” connect 0.8s
βœ” initialize 1.1s
βœ” ModuleSource.resolveFromCaller: ModuleSource! 0.1s
  βœ” upload /home/jedevc/Documents/Projects/dagger from dragon (client id: kh8jcn3x5ql8ba7z8dhurxnzi) (exclude: **/.git) (include: **/go.mod, **/go.sum, **/go.work, **/go.work.sum, **/vendor/, **/*.go, dagger.json, ci/**/*) 0.1s
βœ” ModuleSource.resolveDirectoryFromCaller(path: ".", viewName: "default"): Directory! 0.3s
  βœ” upload /home/jedevc/Documents/Projects/dagger from dragon (client id: kh8jcn3x5ql8ba7z8dhurxnzi) (exclude: bin, .git, **/node_modules, **/.venv, **/__pycache__) 0.3s
βœ” dagger(
    source: βœ” blob(digest: "sha256:cbe8a7976450916e3ed0ebadd6eb143df283cc4f4610601bbbb0a32a48662059", mediaType: "application/vnd.oci.image.layer.v1.tar+zstd", size: 5153392, uncompressed: "sha256:a6464e8f195d11a5f4f4588cd8ab8362b85ed8870953b8d66863c637ab9a3c80"): Directory! 0.0s
  ): Dagger! 0.8s
βœ” Dagger.helm: DaggerHelm! 0.6s
βœ” DaggerHelm.test: Void 1.5s
  βœ” Container.from(address: "alpine/helm:3.12.3"): Container! 0.8s
    βœ” HTTP GET 0.3s
    βœ” remotes.docker.resolver.HTTPRequest 0.4s
      βœ” HTTP HEAD 0.4s

<nil>

After:

$ dagger call --source=.:default helm test -v
12:11:45 INF Connected to cloud url=https://dagger.cloud/traces/8d82a45c63f06f1cdd68607d76263869
12:11:46 INF Connected to engine name=cda0040f8104 version=v0.11.4

βœ” connect 0.8s
  βœ” starting engine 0.6s
  βœ” connecting to engine 0.1s
  βœ” starting session 0.2s
βœ” initialize 0.9s
βœ” dagger(
    source: βœ” blob(digest: "sha256:cbe8a7976450916e3ed0ebadd6eb143df283cc4f4610601bbbb0a32a48662059", mediaType: "application/vnd.oci.image.layer.v1.tar+zstd", size: 5153392, uncompressed: "sha256:a6464e8f195d11a5f4f4588cd8ab8362b85ed8870953b8d66863c637ab9a3c80"): Directory! 0.0s
  ): Dagger! 0.6s
βœ” Dagger.helm: DaggerHelm! 0.6s
βœ” DaggerHelm.test: Void 1.1s
  βœ” Container.from(address: "alpine/helm:3.12.3"): Container! 0.4s
    βœ” remotes.docker.resolver.HTTPRequest 0.4s
      βœ” HTTP HEAD 0.4s

<nil>

@jedevc jedevc requested review from vito and aluzzardi May 15, 2024 11:18
"go.opentelemetry.io/otel/trace"
)

func URLForTrace(ctx context.Context) (string, bool) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fun little idea about this - this function could imply the existence of a URLForSpan function πŸŽ‰

On a failure, we could print out the URL of a failed span (or even use OSC 8 to display it inline!

telemetry/url.go Outdated Show resolved Hide resolved
This improves visibility of what happens at each step, which can really
help users if the engine is failing to start for some reason.

Signed-off-by: Justin Chadwell <me@jedevc.com>
If the engine can't be provision in 10 minutes, it's likely something is
wrong. 10 minutes is chosen because it might be feasible that on a slow
internet connection, the image might take several minutes to pull.

Signed-off-by: Justin Chadwell <me@jedevc.com>
This is already visually indicated with the wrapping span completing in
a progress view - there's no need to show this additional little
message.

Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
If the primary span logs have mixed stdout/stderr, we were accidentally
not printing out the required trailing newline:

	12:35:47 WRN failed to resolve image; falling back to leftover engine error="GET https://registry.dagger.io/v2/engine/manifests/dev-fc0a2c7a85e412b1995e2f088571b56415a3d807: MANIFEST_UNKNOWN: manifest unknown"
	<nil>%

This was because we were incorrectly checking for the presence of a
newline - we should be checking if the last log element has a trailing
newline, not that *any* log element contains a newline.

Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
@jedevc jedevc force-pushed the engine-starting-telemetry branch from e2b5dc1 to b84b9ba Compare May 21, 2024 14:35
Signed-off-by: Justin Chadwell <me@jedevc.com>
@@ -243,6 +228,7 @@ func (fe *Frontend) renderMessages(out *termenv.Output, full bool) (bool, error)
fe.messagesView.SetHeight(10)
}
_, err := fmt.Fprint(out, fe.messagesView.View())
fmt.Fprintln(out)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this print on purpose?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup without this, there's no break between the messages view and the progress.

When we finalRender later, we already have this, so we need this one for consistency.

Copy link
Member

@gerhard gerhard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! The Engine info & Dagger Cloud Trace URL are particular nice touches. The structure of the output is also much neater. Great job πŸ‘

First 1m 30s - top is this change, bottom is v0.11.4

pr-7385-first-1m30s-nvidia.mp4

Last 1m - top is this change, bottom is v0.11.4

pr-7385-last-1m.mp4

@jedevc jedevc merged commit 417743d into dagger:main May 23, 2024
93 checks passed
@jedevc jedevc deleted the engine-starting-telemetry branch May 23, 2024 08:08
@helderco
Copy link
Contributor

Sorry for being late here, but I think the "Connected to engine" line should be at debug level.

@jedevc
Copy link
Member Author

jedevc commented May 23, 2024

@helderco I think that's fair - the only issue is that then this will only display with the --debug flag, and turning that on produces far too much internal logging. Ideally, some of the internal logging would be set to our ExtraDebug level/trace.

Previously, we always displayed the engine connection information in plain progress, it was just missing in the TUI progress.

Is this worth blocking on? I don't want to remove it in the plain progress, I think it would be a regression there, but I can definitely see it being not as useful for the local TUI (it gets repetitive there).

@helderco
Copy link
Contributor

helderco commented May 23, 2024

The thing is, we used to have both the version and the cloud url in a single, more compact line, but it was constantly being flagged as noisy, especially by Solomon. Maybe having it at the top is better, but at least the version is not necessary to see every time. Doesn't have to be a debug record, can you put it at -v?

@jedevc
Copy link
Member Author

jedevc commented May 23, 2024

Yeah, that's possible requires a little bit of undoing some of the refactors of this PR - I've put something somewhat reasonable in c2f2cec (#7272).

vikram-dagger pushed a commit to vikram-dagger/dagger that referenced this pull request May 24, 2024
* fix: split out connect step to separate steps

This improves visibility of what happens at each step, which can really
help users if the engine is failing to start for some reason.

Signed-off-by: Justin Chadwell <me@jedevc.com>

* fix: add timeout of 10 minutes to start engine

If the engine can't be provision in 10 minutes, it's likely something is
wrong. 10 minutes is chosen because it might be feasible that on a slow
internet connection, the image might take several minutes to pull.

Signed-off-by: Justin Chadwell <me@jedevc.com>

* chore: remove OK! message when starting the engine

This is already visually indicated with the wrapping span completing in
a progress view - there's no need to show this additional little
message.

Signed-off-by: Justin Chadwell <me@jedevc.com>

* feat: log engine/cloud connection logs

Signed-off-by: Justin Chadwell <me@jedevc.com>

* fix: trailing newline with combined stdout/stderr

If the primary span logs have mixed stdout/stderr, we were accidentally
not printing out the required trailing newline:

	12:35:47 WRN failed to resolve image; falling back to leftover engine error="GET https://registry.dagger.io/v2/engine/manifests/dev-fc0a2c7a85e412b1995e2f088571b56415a3d807: MANIFEST_UNKNOWN: manifest unknown"
	<nil>%

This was because we were incorrectly checking for the presence of a
newline - we should be checking if the last log element has a trailing
newline, not that *any* log element contains a newline.

Signed-off-by: Justin Chadwell <me@jedevc.com>

* fix: ensure terminal profile propagates to context/global loggers

Signed-off-by: Justin Chadwell <me@jedevc.com>

* fix: render messages first to match finalRender style

Signed-off-by: Justin Chadwell <me@jedevc.com>

---------

Signed-off-by: Justin Chadwell <me@jedevc.com>
vikram-dagger pushed a commit to vikram-dagger/dagger that referenced this pull request May 24, 2024
* fix: split out connect step to separate steps

This improves visibility of what happens at each step, which can really
help users if the engine is failing to start for some reason.

Signed-off-by: Justin Chadwell <me@jedevc.com>

* fix: add timeout of 10 minutes to start engine

If the engine can't be provision in 10 minutes, it's likely something is
wrong. 10 minutes is chosen because it might be feasible that on a slow
internet connection, the image might take several minutes to pull.

Signed-off-by: Justin Chadwell <me@jedevc.com>

* chore: remove OK! message when starting the engine

This is already visually indicated with the wrapping span completing in
a progress view - there's no need to show this additional little
message.

Signed-off-by: Justin Chadwell <me@jedevc.com>

* feat: log engine/cloud connection logs

Signed-off-by: Justin Chadwell <me@jedevc.com>

* fix: trailing newline with combined stdout/stderr

If the primary span logs have mixed stdout/stderr, we were accidentally
not printing out the required trailing newline:

	12:35:47 WRN failed to resolve image; falling back to leftover engine error="GET https://registry.dagger.io/v2/engine/manifests/dev-fc0a2c7a85e412b1995e2f088571b56415a3d807: MANIFEST_UNKNOWN: manifest unknown"
	<nil>%

This was because we were incorrectly checking for the presence of a
newline - we should be checking if the last log element has a trailing
newline, not that *any* log element contains a newline.

Signed-off-by: Justin Chadwell <me@jedevc.com>

* fix: ensure terminal profile propagates to context/global loggers

Signed-off-by: Justin Chadwell <me@jedevc.com>

* fix: render messages first to match finalRender style

Signed-off-by: Justin Chadwell <me@jedevc.com>

---------

Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Vikram Vaswani <vikram@dagger.io>
@aluzzardi
Copy link
Member

@jedevc This is awesome, thank you!

One post merge nit: If we're running with a "legacy" token, instead of not printing, any chance we could say something like "Blah blah trace sent to https://dagger.cloud/. Please rotate your Cloud token to get the trace URL from the CLI"

@aluzzardi
Copy link
Member

@vito @jedevc Some minor UX nitpicks: When using tty progress, the URL gets "wiped out" eventually. And with plain progress it's at the very beginning, which works well in CI logs starting from the top but it's harder to find in a terminal.

Thoughts on how we could make it more prominent?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants