
Individual log is not retrievable via the name returned in the log list #514

Open
iainsproat opened this issue Jun 26, 2023 · 9 comments · May be fixed by #713
Labels
kind/bug Categorizes issue or PR as related to a bug.


@iainsproat

Expected Behavior

curl -k https://speckle.internal:8088/apis/results.tekton.dev/v1alpha2/parents/bfeb4b5d45/results/47373980-cbcb-4b57-8746-40cc5fd75649/logs

This results in the following as expected:

{"records":[{"name":"bfeb4b5d45/results/47373980-cbcb-4b57-8746-40cc5fd75649/logs/77a71268-1422-39f4-8b37-8e5264c6f2ae", "id":"58f21208-d056-4506-a1c8-124b3e34c830", "uid":"58f21208-d056-4506-a1c8-124b3e34c830", "data":{"type":"results.tekton.dev/v1alpha2.Log", "value":"base64encodedlogs"}, "etag":"58f21208-d056-4506-a1c8-124b3e34c830-1687799787144731088", "createdTime":"2023-06-26T17:16:27.144731Z", "createTime":"2023-06-26T17:16:27.144731Z", "updatedTime":"2023-06-26T17:16:27.144731Z", "updateTime":"2023-06-26T17:16:27.144731Z"}], "nextPageToken":""}`

I expect that using the .records[0].name would return the individual log record that can be seen in the list, e.g.:

curl -k https://speckle.internal:8088/apis/results.tekton.dev/v1alpha2/parents/bfeb4b5d45/results/47373980-cbcb-4b57-8746-40cc5fd75649/logs/77a71268-1422-39f4-8b37-8e5264c6f2ae

I may be doing something incorrectly here, but this seems to match the API documentation.

What I expect to have returned is the following:

{"name":"bfeb4b5d45/results/47373980-cbcb-4b57-8746-40cc5fd75649/logs/77a71268-1422-39f4-8b37-8e5264c6f2ae", "id":"58f21208-d056-4506-a1c8-124b3e34c830", "uid":"58f21208-d056-4506-a1c8-124b3e34c830", "data":{"type":"results.tekton.dev/v1alpha2.Log", "value":"base64encodedlogs"}

Actual Behavior

Retrieving the .records[0].name and attempting to access the resource results in the log record not being found.

curl -k https://speckle.internal:8088/apis/results.tekton.dev/v1alpha2/parents/bfeb4b5d45/results/47373980-cbcb-4b57-8746-40cc5fd75649/logs/77a71268-1422-39f4-8b37-8e5264c6f2ae
{"error":{"code":5,"message":"Log doesn't exist","details":[]}}

Steps to Reproduce the Problem

Steps are detailed above.

Additional Info

  • Kubernetes version:

    Output of kubectl version:

    Client Version: v1.27.3
    Kustomize Version: v5.0.1
    Server Version: v1.27.3
    
  • Tekton Pipeline version:

    Client version: 0.31.1
    Pipeline version: v0.48.0
    
@iainsproat iainsproat added the kind/bug Categorizes issue or PR as related to a bug. label Jun 26, 2023
@iainsproat
Author

The API pod logs, in case they help:

{"level":"error","ts":1687800526.2499058,"caller":"v1alpha2/logs.go:61","msg":"no logs exist for bfeb4b5d45/results/47373980-cbcb-4b57-8746-40cc5fd75649/logs/77a71268-1422-39f4-8b37-8e5264c6f2ae","stacktrace":"github.com/tektoncd/results/pkg/api/server/v1alpha2.(*Server).GetL
og\n\tgithub.com/tektoncd/results/pkg/api/server/v1alpha2/logs.go:61\ngithub.com/tektoncd/results/proto/v1alpha2/results_go_proto._Logs_GetLog_Handler\n\tgithub.com/tektoncd/results/proto/v1alpha2/results_go_proto/api_grpc.pb.go:579\ngithub.com/grpc-ecosystem/go-grpc-promethe
us.(*ServerMetrics).StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_metrics.go:121\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49\ngithub.
com/grpc-ecosystem/go-grpc-middleware/auth.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/auth/auth.go:66\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/cha
in.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware/logging/zap.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/logging/zap/server_interceptors.go:53\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.co
m/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware/tags.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/tags/interceptors.go:39\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServ
er.func1.1.1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:58\ngoogle.golang.org/grpc.(*Server).processStreamingRPC\n\tgo
ogle.golang.org/grpc@v1.53.0/server.go:1627\ngoogle.golang.org/grpc.(*Server).handleStream\n\tgoogle.golang.org/grpc@v1.53.0/server.go:1708\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\tgoogle.golang.org/grpc@v1.53.0/server.go:965"}

@sayan-biswas
Contributor

@iainsproat Which log type did you set in the config, File or S3?
Can you share the log message from the UpdateLog method?
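For example, something along these lines should surface it; the deployment name and namespace are assumptions for a default install, so adjust them to match your environment:

kubectl -n tekton-pipelines logs deployment/tekton-results-api | grep -i updatelog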

@sayan-biswas
Contributor

Also, logs are not enabled by default. Although the API can be enabled from the config, separate configuration is required: for the File type a PVC needs to be deployed, and for the S3 type the S3-related parameters in the config need to be set.
There is a recently merged PR which makes this easier through kustomize:
https://github.com/tektoncd/results/blob/main/config/README.md#deploying-with-a-kustomization
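As a quick sanity check, something like the following shows whether the log storage settings are present at all; the ConfigMap name, namespace, and key prefixes are assumptions here, so verify them against the README above:

kubectl -n tekton-pipelines get configmap tekton-results-api-config -o jsonpath='{.data.config}' | grep -E 'LOGS_|S3_'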

@iainsproat
Author

iainsproat commented Jun 27, 2023

Thanks for responding, @sayan-biswas.

Your suggestion to look at the logs for the UpdateLog method pointed me in the right place. There appears to be a race condition in the Tekton Results Watcher between its deletion of a completed TaskRun and its upload of that TaskRun's logs.

I am running the Tekton Results Watcher with the following flag: -completed_run_grace_period -1s. The negative value immediately 'garbage collects' completed TaskRuns and PipelineRuns, as suggested by the documentation.

In the Tekton Watcher logs I can see the garbage collection occurring successfully:

{"level":"info","time":"2023-06-27T10:21:24.301Z","logger":"watcher","caller":"dynamic/dynamic.go:216","msg":"Deleting object","commit":"26df57c-dirty","knative.dev/traceid":"e55448ea-474f-449f-8f5e-6b1800c70bb5","knative.dev/key":"789b32327d/hello-task-run","results.tekton.dev/kind":"TaskRun","results.tekton.dev/result":"789b32327d/results/24f8d7c5-0caa-4999-bd4c-52ffedcf6f43","results.tekton.dev/record":"789b32327d/results/24f8d7c5-0caa-4999-bd4c-52ffedcf6f43/records/24f8d7c5-0caa-4999-bd4c-52ffedcf6f43","results.tekton.dev/gracePeriod":"-1s","results.tekton.dev/uid":"24f8d7c5-0caa-4999-bd4c-52ffedcf6f43","results.tekton.dev/time-taken-seconds":0}

However, some time later the Tekton Watcher logs the following failure:

{"level":"error","time":"2023-06-27T10:21:24.320Z","logger":"watcher","caller":"dynamic/dynamic.go:319","msg":"Error streaming log","commit":"26df57c-dirty","knative.dev/traceid":"e55448ea-474f-449f-8f5e-6b1800c70bb5","knative.dev/key":"789b32327d/hello-task-run","results.tekton.dev/kind":"TaskRun","namespace":"789b32327d","kind":"TaskRun","name":"hello-task-run","error":"error reading from tkn reader: Unable to get TaskRun: taskruns.tekton.dev \"hello-task-run\" not found","stacktrace":"github.com/tektoncd/results/pkg/watcher/reconciler/dynamic.(*Reconciler).sendLog.func1\n\tgithub.com/tektoncd/results/pkg/watcher/reconciler/dynamic/dynamic.go:319"}

As the issue is intermittent rather than consistent, I am confident that S3 is configured correctly. I can query the blob storage provider directly and see that the data for other Tekton TaskRuns and PipelineRuns (which presumably won the race condition!) has been stored, and it can be queried via the API.

My expectation is that garbage collection would wait until all logs are confirmed as successfully stored before deleting the completed TaskRun or PipelineRun.

In the meantime, as a workaround, I'll increase the value of the -completed_run_grace_period flag to a larger positive value to allow sufficient time for the logging to complete before garbage collection.
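For example, something along the following lines should do it; the deployment name and namespace are assumptions for a default install, and 10m is just an illustrative value:

kubectl -n tekton-pipelines edit deployment tekton-results-watcher

Then change the watcher argument from -completed_run_grace_period -1s to -completed_run_grace_period 10m.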

I've attached the entire log output for both the API and the Watcher. During this period I started Tekton Results, applied a Task, and then applied a TaskRun.
tekton-watcher-logs.txt
tekton-api-logs.txt

@sayan-biswas
Contributor

sayan-biswas commented Jun 27, 2023

@iainsproat Yes, this happens because the log streaming from the watcher to the API server only starts after the PipelineRun/TaskRun moves to a completed or failed status. If the pod gets deleted by aggressive pruning, such as immediate deletion, it creates a race condition.
I'm not sure what can be done about fetching the logs, but the pruning conditions can be modified to wait for the streaming to complete even if the grace period is set to -1s.

@sayan-biswas
Contributor

/assign

@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@ramessesii2
Member

/assign
