
Individual log is not retrievable via the name returned in the log list #514

Open
iainsproat opened this issue Jun 26, 2023 · 9 comments · May be fixed by #713
Labels
kind/bug Categorizes issue or PR as related to a bug.


@iainsproat

Expected Behavior

curl -k https://speckle.internal:8088/apis/results.tekton.dev/v1alpha2/parents/bfeb4b5d45/results/47373980-cbcb-4b57-8746-40cc5fd75649/logs

This results in the following as expected:

{"records":[{"name":"bfeb4b5d45/results/47373980-cbcb-4b57-8746-40cc5fd75649/logs/77a71268-1422-39f4-8b37-8e5264c6f2ae", "id":"58f21208-d056-4506-a1c8-124b3e34c830", "uid":"58f21208-d056-4506-a1c8-124b3e34c830", "data":{"type":"results.tekton.dev/v1alpha2.Log", "value":"base64encodedlogs"}, "etag":"58f21208-d056-4506-a1c8-124b3e34c830-1687799787144731088", "createdTime":"2023-06-26T17:16:27.144731Z", "createTime":"2023-06-26T17:16:27.144731Z", "updatedTime":"2023-06-26T17:16:27.144731Z", "updateTime":"2023-06-26T17:16:27.144731Z"}], "nextPageToken":""}`

I expect that using the .records[0].name would return the individual log record that can be seen in the list, e.g.:

curl -k https://speckle.internal:8088/apis/results.tekton.dev/v1alpha2/parents/bfeb4b5d45/results/47373980-cbcb-4b57-8746-40cc5fd75649/logs/77a71268-1422-39f4-8b37-8e5264c6f2ae

I may be doing something incorrectly here, but this seems to match the API documentation.

What I expect to have returned is the following:

{"name":"bfeb4b5d45/results/47373980-cbcb-4b57-8746-40cc5fd75649/logs/77a71268-1422-39f4-8b37-8e5264c6f2ae", "id":"58f21208-d056-4506-a1c8-124b3e34c830", "uid":"58f21208-d056-4506-a1c8-124b3e34c830", "data":{"type":"results.tekton.dev/v1alpha2.Log", "value":"base64encodedlogs"}

Actual Behavior

Retrieving the .records[0].name and attempting to access the resource results in the log record not being found.

curl -k https://speckle.internal:8088/apis/results.tekton.dev/v1alpha2/parents/bfeb4b5d45/results/47373980-cbcb-4b57-8746-40cc5fd75649/logs/77a71268-1422-39f4-8b37-8e5264c6f2ae
{"error":{"code":5,"message":"Log doesn't exist","details":[]}}

Steps to Reproduce the Problem

Steps are detailed above.

Additional Info

  • Kubernetes version:

    Output of kubectl version:

    Client Version: v1.27.3
    Kustomize Version: v5.0.1
    Server Version: v1.27.3
    
  • Tekton Pipeline version:

    Client version: 0.31.1
    Pipeline version: v0.48.0
    
@iainsproat iainsproat added the kind/bug Categorizes issue or PR as related to a bug. label Jun 26, 2023
@iainsproat
Author

The API pod logs, in case they help:

{"level":"error","ts":1687800526.2499058,"caller":"v1alpha2/logs.go:61","msg":"no logs exist for bfeb4b5d45/results/47373980-cbcb-4b57-8746-40cc5fd75649/logs/77a71268-1422-39f4-8b37-8e5264c6f2ae","stacktrace":"github.com/tektoncd/results/pkg/api/server/v1alpha2.(*Server).GetL
og\n\tgithub.com/tektoncd/results/pkg/api/server/v1alpha2/logs.go:61\ngithub.com/tektoncd/results/proto/v1alpha2/results_go_proto._Logs_GetLog_Handler\n\tgithub.com/tektoncd/results/proto/v1alpha2/results_go_proto/api_grpc.pb.go:579\ngithub.com/grpc-ecosystem/go-grpc-promethe
us.(*ServerMetrics).StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_metrics.go:121\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49\ngithub.
com/grpc-ecosystem/go-grpc-middleware/auth.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/auth/auth.go:66\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/cha
in.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware/logging/zap.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/logging/zap/server_interceptors.go:53\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.co
m/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware/tags.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/tags/interceptors.go:39\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServ
er.func1.1.1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:58\ngoogle.golang.org/grpc.(*Server).processStreamingRPC\n\tgo
ogle.golang.org/grpc@v1.53.0/server.go:1627\ngoogle.golang.org/grpc.(*Server).handleStream\n\tgoogle.golang.org/grpc@v1.53.0/server.go:1708\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\tgoogle.golang.org/grpc@v1.53.0/server.go:965"}

@sayan-biswas
Contributor

@iainsproat Which log type did you set in the config, File or S3?
Can you share the log message from the UpdateLog method?
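For example, something along these lines should surface it; the deployment name and namespace are assumptions for a default install, so adjust them to match your environment:

kubectl -n tekton-pipelines logs deployment/tekton-results-api | grep -i updatelog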

@sayan-biswas
Contributor

Also, logs are not enabled by default. Although the API can be enabled from the config, separate configuration is required: for the File type a PVC needs to be deployed, and for the S3 type the S3-related parameters in the config need to be set.
There is a recently merged PR which makes this easier through kustomize:
https://github.com/tektoncd/results/blob/main/config/README.md#deploying-with-a-kustomization
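As a quick sanity check, something like the following shows whether the log storage settings are present at all; the ConfigMap name, namespace, and key prefixes are assumptions here, so verify them against the README above:

kubectl -n tekton-pipelines get configmap tekton-results-api-config -o jsonpath='{.data.config}' | grep -E 'LOGS_|S3_'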

@iainsproat
Author

iainsproat commented Jun 27, 2023

Thanks for responding, @sayan-biswas.

Your suggestion to look at the logs for the UpdateLog method pointed me in the right place. There appears to be a race condition in the Tekton Results Watcher between its deletion of a completed TaskRun and its upload of that TaskRun's logs.

I am running the Tekton Results Watcher with the following flag: -completed_run_grace_period -1s. The negative value immediately 'garbage collects' completed TaskRuns and PipelineRuns, as suggested by the documentation.

In the Tekton Watcher logs I can see the garbage collection occurring successfully:

{"level":"info","time":"2023-06-27T10:21:24.301Z","logger":"watcher","caller":"dynamic/dynamic.go:216","msg":"Deleting object","commit":"26df57c-dirty","knative.dev/traceid":"e55448ea-474f-449f-8f5e-6b1800c70bb5","knative.dev/key":"789b32327d/hello-task-run","results.tekton.dev/kind":"TaskRun","results.tekton.dev/result":"789b32327d/results/24f8d7c5-0caa-4999-bd4c-52ffedcf6f43","results.tekton.dev/record":"789b32327d/results/24f8d7c5-0caa-4999-bd4c-52ffedcf6f43/records/24f8d7c5-0caa-4999-bd4c-52ffedcf6f43","results.tekton.dev/gracePeriod":"-1s","results.tekton.dev/uid":"24f8d7c5-0caa-4999-bd4c-52ffedcf6f43","results.tekton.dev/time-taken-seconds":0}

However, some time later the Tekton Watcher logs the following failure:

{"level":"error","time":"2023-06-27T10:21:24.320Z","logger":"watcher","caller":"dynamic/dynamic.go:319","msg":"Error streaming log","commit":"26df57c-dirty","knative.dev/traceid":"e55448ea-474f-449f-8f5e-6b1800c70bb5","knative.dev/key":"789b32327d/hello-task-run","results.tekton.dev/kind":"TaskRun","namespace":"789b32327d","kind":"TaskRun","name":"hello-task-run","error":"error reading from tkn reader: Unable to get TaskRun: taskruns.tekton.dev \"hello-task-run\" not found","stacktrace":"github.com/tektoncd/results/pkg/watcher/reconciler/dynamic.(*Reconciler).sendLog.func1\n\tgithub.com/tektoncd/results/pkg/watcher/reconciler/dynamic/dynamic.go:319"}

As the issue is intermittent rather than consistent, I am confident that S3 is configured correctly. I can query the blob storage provider directly and see that the data for other Tekton TaskRuns and PipelineRuns (which presumably won the race condition!) has been stored, and it can be queried via the API.

My expectation is that garbage collection would wait until all logs are confirmed as successfully stored before deleting the completed TaskRun or PipelineRun.

In the meantime, as a workaround, I'll increase the value of the -completed_run_grace_period flag to a larger positive value to allow sufficient time for the logging to complete before garbage collection.
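For example, something along the following lines should do it; the deployment name and namespace are assumptions for a default install, and 10m is just an illustrative value:

kubectl -n tekton-pipelines edit deployment tekton-results-watcher

Then change the watcher argument from -completed_run_grace_period -1s to -completed_run_grace_period 10m.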

I've attached the entire log output for both the API and the Watcher. During this period I started Tekton Results, applied a Task, and then applied a TaskRun.
tekton-watcher-logs.txt
tekton-api-logs.txt

@sayan-biswas
Contributor

sayan-biswas commented Jun 27, 2023

@iainsproat Yes, this happens because the log streaming from the watcher to the API server only starts after the PipelineRun/TaskRun moves to a completed or failed status. If the pod gets deleted by aggressive pruning, such as immediate deletion, it creates a race condition.
I'm not sure what can be done about fetching the logs, but the pruning conditions can be modified to wait for the streaming to complete even if the grace period is set to -1s.

@sayan-biswas
Contributor

/assign

@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@ramessesii2
Member

/assign
