Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Trace IDs to log lines #624

Draft
wants to merge 4 commits into
base: release-v1.0.4
Choose a base branch
from

Conversation

shulin-sq
Copy link
Contributor

@shulin-sq shulin-sq commented Apr 24, 2024

What type of PR is this?

Which issue does this PR fix:
#603

related conversation: uber-go/zap#654

What does this PR do / Why do we need it:

If an issue # is not available please add repro steps and logs from aws-gateway-controller showing the issue:

Testing done on this change:

running in a development environment:

{"level":"info","ts":"2024-04-24T02:23:58.515Z","logger":"controller.route","caller":"controllers/route_controller.go:157","msg":"reconcile starting","type":"route","name":"lattice-redacted-srv-http","trace_id":"bc67d345-3b33-499a-9e3e-329e4dff92b9"}
{"level":"info","ts":"2024-04-24T02:23:58.515Z","logger":"controller.route","caller":"controllers/route_controller.go:277","msg":"Found aws-vpc-lattice for Route for lattice-redacted-srv-http, redacted-srv","trace_id":"bc67d345-3b33-499a-9e3e-329e4dff92b9"}
{"level":"info","ts":"2024-04-24T02:23:58.515Z","logger":"controller.route","caller":"controllers/route_controller.go:323","msg":"reconcile, adding or updating","name":"lattice-redacted-srv-http","trace_id":"bc67d345-3b33-499a-9e3e-329e4dff92b9"}
{"level":"info","ts":"2024-04-24T02:23:58.524Z","logger":"controller.iam-auth-policy","caller":"controllers/iamauthpolicy_controller.go:85","msg":"reconcile starting","type":"iamauthpolicy","name":"lattice-redacted-srv-http","trace_id":"ccdf0c8a-6fd0-43d6-86d4-5e4840e4a28c"}
{"level":"info","ts":"2024-04-24T02:23:58.524Z","logger":"controller.iam-auth-policy","caller":"controllers/iamauthpolicy_controller.go:96","msg":"reconcile IAM policy","req":"redacted-srv/lattice-redacted-srv-http","targetRef":{"group":"gateway.networking.k8s.io","kind":"HTTPRoute","name":"lattice-redacted-srv-http"},"tra
ce_id":"ccdf0c8a-6fd0-43d6-86d4-5e4840e4a28c"}
{"level":"info","ts":"2024-04-24T02:23:58.525Z","logger":"controller.route","caller":"gateway/model_build_lattice_service.go:122","msg":"Setting customer-domain-name: redacted-srv.global.dev.sq.lattice for route lattice-redacted-srv-http-redacted-srv","trace_id":"bc67d345-3b33-499a-9e3e-329e4dff92b9"}
{"level":"info","ts":"2024-04-24T02:24:11.292Z","logger":"controller.iam-auth-policy","caller":"controllers/iamauthpolicy_controller.go:114","msg":"reconciled IAM policy","req":"redacted-srv/lattice-redacted-srv-http","targetRef":{"group":"gateway.networking.k8s.io","kind":"HTTPRoute","name":"lattice-redacted-srv-http"},"i
sDeleted":false,"trace_id":"ccdf0c8a-6fd0-43d6-86d4-5e4840e4a28c"}
{"level":"info","ts":"2024-04-24T02:24:11.292Z","logger":"controller.iam-auth-policy","caller":"controllers/iamauthpolicy_controller.go:87","msg":"reconcile completed","name":"lattice-redacted-srv-http","type":"iamauthpolicy","trace_id":"ccdf0c8a-6fd0-43d6-86d4-5e4840e4a28c"}

Automation added to e2e:

Will this PR introduce any new dependencies?:

Will this break upgrades or downgrades. Has updating a running cluster been tested?:

Does this PR introduce any user-facing change?:


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Comment on lines +106 to +113
ctx = gwlog.NewTrace(ctx)
gwlog.AddMetadata(ctx, "type", "accesslogpolicy")
gwlog.AddMetadata(ctx, "name", req.Name)

r.log.Infow(ctx, "reconcile starting", gwlog.GetMetadata(ctx)...)
defer func() {
r.log.Infow(ctx, "reconcile completed", gwlog.GetMetadata(ctx)...)
}()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's an example of how I attempted to implement somehting like https://stripe.com/blog/canonical-log-lines

at the end of the reconcile event we can print out all the metadata saved in the context so far. eg this can include information like if issues occurred during the reconcile attempt that were ignorable, or decisions made while servicing the event.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see this being super useful.

@@ -265,12 +273,12 @@ func (r *accessLogPolicyReconciler) buildAndDeployModel(
if err != nil {
return nil, err
}
r.log.Debugw("Successfully built model", "stack", jsonStack)
r.log.Debugw(context.TODO(), "Successfully built model", "stack", jsonStack)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for log lines that currently do not have a context in the scope, I added a todo for now

@@ -38,7 +38,7 @@ var ZeroTransitionTime = metav1.NewTime(time.Time{})
func (h *enqueueRequestsForGatewayEvent) Create(ctx context.Context, e event.CreateEvent, queue workqueue.RateLimitingInterface) {
gwNew := e.Object.(*gateway_api.Gateway)

h.log.Infof("Received Create event for Gateway %s-%s", gwNew.Name, gwNew.Namespace)
h.log.Infof(ctx, "Received Create event for Gateway %s-%s", gwNew.Name, gwNew.Namespace)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these can also be effectively todo's. tracing starts in the controller. I made this decision since I couldn't figure out how to push the context into the reconcile queue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only way we could maybe do that is by creating our own implementation of a NamespacedName that also includes the origin trace ID (generated here). Then we could push that object into the queue, cast it, and pull it out during Reconcile. We'd probably want to create a new trace ID for each reconcile operation anyway. Probably not something we need to solve right now.

"log"
"os"

"go.uber.org/zap"
"go.uber.org/zap/zapcore"
)

type Logger = *zap.SugaredLogger
type TracedLogger struct {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is where most of the changes are

@@ -0,0 +1,65 @@
package gwlog
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file is how I add the metadata fields into the context

Copy link
Contributor

@erikfuller erikfuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together. It's nicely encapsulated and easy to understand, which takes the pressure off getting it perfect round one.

I don't have any major concerns with the approach, just added a couple comments/questions.


func NewTrace(ctx context.Context) context.Context {
currID := uuid.New()
return context.WithValue(context.WithValue(ctx, traceID, currID.String()), metadata, newMetadata())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we'd have to recurse one level each time we wanted to get to the trace ID. Would it be more performant to have a struct with the metadata + the traceID? Alternatively, should the traceID just be part of the metadata? Or is there maybe some other benefit to nesting them like this I'm not seeing?

@@ -38,7 +38,7 @@ var ZeroTransitionTime = metav1.NewTime(time.Time{})
func (h *enqueueRequestsForGatewayEvent) Create(ctx context.Context, e event.CreateEvent, queue workqueue.RateLimitingInterface) {
gwNew := e.Object.(*gateway_api.Gateway)

h.log.Infof("Received Create event for Gateway %s-%s", gwNew.Name, gwNew.Namespace)
h.log.Infof(ctx, "Received Create event for Gateway %s-%s", gwNew.Name, gwNew.Namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only way we could maybe do that is by creating our own implementation of a NamespacedName that also includes the origin trace ID (generated here). Then we could push that object into the queue, cast it, and pull it out during Reconcile. We'd probably want to create a new trace ID for each reconcile operation anyway. Probably not something we need to solve right now.

Comment on lines +106 to +113
ctx = gwlog.NewTrace(ctx)
gwlog.AddMetadata(ctx, "type", "accesslogpolicy")
gwlog.AddMetadata(ctx, "name", req.Name)

r.log.Infow(ctx, "reconcile starting", gwlog.GetMetadata(ctx)...)
defer func() {
r.log.Infow(ctx, "reconcile completed", gwlog.GetMetadata(ctx)...)
}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see this being super useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants