Support monolithic deployment mode #722

Merged

Conversation

@andreasgerstmayr (Collaborator) commented Dec 21, 2023

Support Tempo monolithic deployment mode with a new TempoMonolithic CR.
Partially resolves #710.

  • In-memory storage
  • PV storage
  • Ingestion OTLP/gRPC
  • Ingestion OTLP/HTTP
  • Jaeger UI
  • Overlay config
  • Management State
  • Tests

Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com>
@codecov-commenter commented Dec 21, 2023

Codecov Report

Attention: 207 lines in your changes are missing coverage. Please review.

Comparison is base (2d4c168) 77.76% compared to head (c5eba0d) 76.78%.

Files Patch % Lines
controllers/tempo/tempomonolithic_controller.go 29.89% 58 Missing and 10 partials ⚠️
internal/manifests/monolithic/statefulset.go 77.13% 47 Missing and 4 partials ⚠️
apis/tempo/v1alpha1/tempomonolithic_webhook.go 56.89% 25 Missing ⚠️
controllers/tempo/common.go 61.90% 17 Missing and 7 partials ⚠️
internal/manifests/monolithic/configmap.go 71.42% 17 Missing and 3 partials ⚠️
internal/manifests/mutate.go 63.63% 6 Missing and 2 partials ⚠️
internal/manifests/monolithic/build.go 70.00% 4 Missing and 2 partials ⚠️
controllers/tempo/tempostack_create_or_update.go 71.42% 1 Missing and 1 partial ⚠️
internal/manifests/config/configmap.go 60.00% 2 Missing ⚠️
controllers/tempo/tempostack_controller.go 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #722      +/-   ##
==========================================
- Coverage   77.76%   76.78%   -0.99%     
==========================================
  Files          68       77       +9     
  Lines        5155     5733     +578     
==========================================
+ Hits         4009     4402     +393     
- Misses        949     1110     +161     
- Partials      197      221      +24     
Flag Coverage Δ
unittests 76.78% <68.68%> (-0.99%) ⬇️

Flags with carried forward coverage won't be shown.


@andreasgerstmayr marked this pull request as ready for review January 11, 2024 14:05
}

// MonolithicObservabilityMetricsSpec defines the metrics settings of the Tempo deployment.
type MonolithicObservabilityMetricsSpec struct {
Collaborator:
Can't we reuse something from the microservices type?

e.g. the whole observability spec

type ObservabilitySpec struct {

Collaborator (author):
That would be inconsistent with the

<feature>:
  enabled: true

style of the CR. I'd prefer to migrate the TempoStack, maybe in the next CRD version? Shouldn't be too difficult to create a conversion webhook for this.

spec:                                    # TempoMonolithicSpec defines the desired state of TempoMonolithic.
  observability:                         # Observability defines observability configuration for the Tempo deployment
    metrics:                             # Metrics defines the metrics configuration of the Tempo deployment
      prometheusRules:                   # ServiceMonitors defines the PrometheusRule configuration
        enabled: false                   # Enabled defines if the operator should create PrometheusRules for this Tempo deployment
      serviceMonitors:                   # ServiceMonitors defines the ServiceMonitor configuration
        enabled: false                   # Enabled defines if the operator should create ServiceMonitors for this Tempo deployment

Collaborator:
So we agreed on using the enabled field, which is better supported in tools like kustomize and kubectl edit (the empty structs are removed). Are we going to reuse some parts from the monolithic APIs?

Collaborator (author):
Yes, we can reuse a TLS struct, the multitenancy structs, ManagementStateType, LimitSpec and the storage secret.
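For illustration, a minimal sketch of that reuse (the field names, json tags, and markers below are assumptions, not the final API; ManagementStateType and LimitSpec are the existing shared types mentioned above):

// Sketch only: embedding existing shared v1alpha1 types in the TempoMonolithic spec.
type TempoMonolithicSpec struct {
	// ManagementState reuses the existing ManagementStateType.
	//
	// +kubebuilder:validation:Optional
	ManagementState ManagementStateType `json:"managementState,omitempty"`

	// Limits reuses the existing LimitSpec.
	//
	// +kubebuilder:validation:Optional
	Limits *LimitSpec `json:"limits,omitempty"`
}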


// Default implements webhook.Defaulter so a webhook will be registered for the type.
func (r *TempoMonolithic) Default() {
log := ctrl.Log.WithName("tempomonolithic-webhook")
Collaborator:
How is this rendered in the logs? Isn't it too long?

Collaborator (author):
It'll print this line:

{"level":"debug","ts":"2024-01-15T19:07:08.746058013+01:00","logger":"tempomonolithic-webhook","msg":"running defaulter webhook","name":"sample"}

if debug logs are enabled (go run ./main.go --zap-log-level=debug start).
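For reference, a minimal sketch of the pattern under discussion (the message and key are taken from the log line above; the V(1) call is an assumption and typically renders as level debug with the zap logger):

import ctrl "sigs.k8s.io/controller-runtime"

// Default implements webhook.Defaulter; the named logger produces the
// "logger":"tempomonolithic-webhook" field in the log line above.
func (r *TempoMonolithic) Default() {
	log := ctrl.Log.WithName("tempomonolithic-webhook")
	// Only emitted when debug logging is enabled (--zap-log-level=debug).
	log.V(1).Info("running defaulter webhook", "name", r.Name)

	// ... apply defaults ...
}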

Collaborator (author):
But as we'll set the defaults in the reconcile loop now, I'm removing this log statement.

I'll keep it there for the validating webhook, which is still in use.


// MonolithicTracesStorageSpec defines the traces storage for the Tempo deployment.
type MonolithicTracesStorageSpec struct {
// Backend defines the backend for storing traces. Default: memory
Collaborator:
I am a bit concerned about using in-memory as the default.

Upstream uses PV as the default, and in-memory can easily be overlooked and use significant resources in the cluster.

Collaborator (author):
If you're only concerned about OOM situations, the memory counts towards the container resource limit:

While tmpfs is very fast be aware that, unlike disks, files you write count against the memory limit of the container that wrote them

https://kubernetes.io/docs/concepts/storage/volumes/#emptydir

I'd prefer to keep memory as default, as it's great for quick testing/demos/showcases (and is the default for jaeger all-in-one), and changing it to pv is easy (and can/should be mentioned in the docs).
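To make that concrete, a minimal sketch (an assumption about how the memory backend could be implemented, not code from this PR): a memory-medium emptyDir (tmpfs) volume, whose contents are charged against the container's memory limit as described in the Kubernetes docs quoted above.

import corev1 "k8s.io/api/core/v1"

// memoryStorageVolume is an illustrative helper, not part of this PR.
func memoryStorageVolume() corev1.Volume {
	return corev1.Volume{
		Name: "tempo-storage", // illustrative name
		VolumeSource: corev1.VolumeSource{
			// Data written to a memory-medium emptyDir counts against the
			// memory limit of the container that wrote it.
			EmptyDir: &corev1.EmptyDirVolumeSource{
				Medium: corev1.StorageMediumMemory,
			},
		},
	}
}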

// Observability defines observability configuration for the Tempo deployment
//
// +kubebuilder:validation:Optional
Observability *MonolithicObservabilitySpec `json:"observability,omitempty"`
Collaborator:
Asking again: can we reuse some of the structs from the microservices type?

Collaborator (author):
We can; the question is whether we value consistency inside the same CR or consistency between the two CRs more.

spec:
  observability:                         # Observability defines observability configuration for the Tempo deployment
    metrics:                             # Metrics defines the metrics configuration of the Tempo deployment
      prometheusRules:                   # ServiceMonitors defines the PrometheusRule configuration
        enabled: false                   # Enabled defines if the operator should create PrometheusRules for this Tempo deployment
      serviceMonitors:                   # ServiceMonitors defines the ServiceMonitor configuration
        enabled: false

vs

spec:
  observability:                         # ObservabilitySpec defines how telemetry data gets handled.
    grafana:                             # Grafana defines the Grafana configuration for operands.
      createDatasource: false            # CreateDatasource specifies if a Grafana Datasource should be created for Tempo.
      instanceSelector:                  # InstanceSelector specifies the Grafana instance where the datasource should be created.
    metrics:                             # Metrics defines the metrics configuration for operands.
      createPrometheusRules: false       # CreatePrometheusRules specifies if Prometheus rules for alerts should be created for Tempo components.
      createServiceMonitors: false       # CreateServiceMonitors specifies if ServiceMonitors should be created for Tempo components.
    tracing:                             # Tracing defines a config for operands.
      jaeger_agent_endpoint: "localhost:6831" # JaegerAgentEndpoint defines the jaeger endpoint data gets send to.
      sampling_fraction: ""

Collaborator (author):
The first example is consistent with the rest of the Monolithic CR, and allows additional settings for prometheusRules or serviceMonitors in the future.
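A minimal sketch of that wrapper style (the type name below is an assumption; only the enabled field is taken from the YAML above):

// Sketch only: wrapping Enabled in its own struct leaves room to add more
// ServiceMonitor settings later without breaking the API.
type MonolithicObservabilityMetricsServiceMonitorsSpec struct {
	// Enabled defines if the operator should create ServiceMonitors for
	// this Tempo deployment.
	//
	// +kubebuilder:validation:Optional
	Enabled bool `json:"enabled,omitempty"`
}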

Collaborator:
I would prefer consistency within the same CRD. Just one question: is it possible to have some sort of consolidation in the future, in order to get both? That would imply a breaking change though.

Collaborator (author):
Yes, I was thinking of maybe doing this change in v1alpha2 of TempoStack if we have a consensus.

// Ingress defines the ingress configuration for Jaeger UI
//
// +kubebuilder:validation:Optional
Ingress *MonolithicJaegerUIIngressSpec `json:"ingress,omitempty"`
Collaborator:
Can the ingress spec be reused from the microservices? There are more settings users might want to configure.

Collaborator (author):
Already implemented in #755:

spec:
  jaegerui:                              # JaegerUI defines the Jaeger UI configuration
    enabled: false                       # Enabled defines if the Jaeger UI should be enabled
    ingress:                             # Ingress defines the ingress configuration for Jaeger UI
      annotations:                       # Annotations defines the annotations of the Ingress object.
        "key": ""
      enabled: false                     # Enabled defines if an Ingress object should be created for Jaeger UI
      host: ""                           # Host defines the hostname of the Ingress object.
      ingressClassName: ""               # IngressClassName is the name of an IngressClass cluster resource. Ingress controller implementations use this field to know whether they should be serving this Ingress resource.
    route:                               # Route defines the route configuration for Jaeger UI
      annotations:                       # Annotations defines the annotations of the Ingress object.
        "key": ""
      enabled: false                     # Enabled defines if a Route object should be created for Jaeger UI
      host: ""                           # Host defines the hostname of the Ingress object.
      ingressClassName: ""               # IngressClassName is the name of an IngressClass cluster resource. Ingress controller implementations use this field to know whether they should be serving this Ingress resource.
      termination: "edge"                # Termination specifies the termination type. Default: edge.

vs spec of TempoStack:

      jaegerQuery:                       # JaegerQuerySpec defines Jaeger Query specific options.
        enabled: false                   # Enabled is used to define if Jaeger Query component should be created.
        ingress:                         # Ingress defines Jaeger Query Ingress options.
          annotations:                   # Annotations defines the annotations of the Ingress object.
            "key": ""
          host: ""                       # Host defines the hostname of the Ingress object.
          ingressClassName: ""           # IngressClassName is the name of an IngressClass cluster resource. Ingress controller implementations use this field to know whether they should be serving this Ingress resource.
          route:                         # Route defines OpenShift Route specific options.
            termination: ""              # Termination specifies the termination type. By default "edge" is used.
          type: ""                       # Type defines the type of Ingress for the Jaeger Query UI. Currently ingress, route and none are supported.

(ingressClassName should have been under ingress, as it doesn't apply to route)

Do we value consistency inside the same CR or consistency between the two CRs more?

Collaborator:
Same comment: I'd prefer consistency within the same CR.

}

func (m *ImmutableErr) Error() string {
return fmt.Sprintf("update to immutable field %s is forbidden", m.field)
Collaborator:
Shouldn't it print the existing and desired values as well?

Collaborator (author):
I did initially, but printing structs with fmt.Sprintf("%v", some_struct) is an unreadable mess if the struct is big.

Collaborator:
Shall we then remove the unused fields from the error struct?

Collaborator (author):
I updated the message to show the result of cmp.Diff() now. It's still a bit unreadable as a single line, but after replacing the \n and \t characters we get a nice diff in the logs, and we already had a dependency on this library anyway (https://github.com/google/go-cmp).
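A minimal sketch of that approach (the existing/desired fields and their names are assumptions, not the actual struct in this PR):

import (
	"fmt"

	"github.com/google/go-cmp/cmp"
)

// ImmutableErr sketch: carrying the existing and desired values lets the
// error message embed a diff instead of dumping whole structs.
type ImmutableErr struct {
	field    string
	existing any
	desired  any
}

func (m *ImmutableErr) Error() string {
	// cmp.Diff is multi-line; it lands on a single log line, but becomes a
	// readable diff again once \n and \t are expanded.
	return fmt.Sprintf("update to immutable field %s is forbidden, diff: %s",
		m.field, cmp.Diff(m.existing, m.desired))
}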

// ExtraConfig defines any extra (overlay) configuration for components
//
// +kubebuilder:validation:Optional
ExtraConfig *MonolithicExtraConfigSpec `json:"extraConfig,omitempty"`
Collaborator:
Any reason for not reusing the same type as the microservices? The only reason I can think of is that the microservices could include other configs in the future. If that is the reason, I'm OK with it.

Collaborator (author):
I think we worked on the same feature at the same time. I'll check if I can reuse the struct and logic.

Collaborator (author):
I've updated the PR and reused the struct and logic from the TempoStack now.
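For illustration only (the field name and JSON type below are assumptions about the reused struct, not its actual definition), such an overlay struct typically holds raw Tempo configuration that is merged over the operator-generated config:

import apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"

// ExtraConfigSpec sketch: raw configuration overlaid on top of the
// operator-generated Tempo configuration.
type ExtraConfigSpec struct {
	// Tempo holds additional Tempo configuration to overlay.
	//
	// +kubebuilder:validation:Optional
	Tempo apiextensionsv1.JSON `json:"tempo,omitempty"`
}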

}

// MonolithicStorageSpec defines the storage for the Tempo deployment.
type MonolithicStorageSpec struct {
Collaborator:
One question: why does this have its own spec with only one structure inside? Or why not use MonolithicTracesStorageSpec directly? Is this to mimic the Tempo configuration?

Collaborator (author):
Yes, it's to mimic the Tempo configuration. I thought Tempo might have plans to store other things in the future, so I'll keep the same layering here as well.

But I don't have very strong opinions on this; I could remove that extra layer if you like.
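A minimal sketch of the layering being discussed (the traces field name is an assumption based on the type names shown in the review context):

// MonolithicStorageSpec sketch: a thin wrapper mirroring Tempo's own storage
// configuration layout, leaving room for other storage kinds later.
type MonolithicStorageSpec struct {
	// Traces defines the storage configuration for traces.
	Traces MonolithicTracesStorageSpec `json:"traces"`
}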

// OTLP defines the ingestion configuration for OTLP
//
// +kubebuilder:validation:Optional
OTLP *MonolithicIngestionOTLPSpec `json:"otlp,omitempty"`
Collaborator:
Will it be the only protocol supported?

Collaborator (author):
I want to make the gateway a drop-in feature, so the service ports should not change whether the gateway is enabled or not. That means I can only support protocols which the gateway also supports.
AFAICS the gateway only supports otlp/grpc and otlp/http, right?

With the general move to the OTEL SDK and the OTEL Collector, I think it's fine to only support OTLP.
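As a sketch of that constraint (the port names are assumptions; 4317 and 4318 are the conventional OTLP gRPC/HTTP ports), the service can expose the same two ports whether or not the gateway sits in front:

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// otlpServicePorts is an illustrative helper: the ports stay identical with
// or without the gateway, keeping the gateway a drop-in feature.
func otlpServicePorts() []corev1.ServicePort {
	return []corev1.ServicePort{
		{Name: "otlp-grpc", Port: 4317, TargetPort: intstr.FromInt(4317), Protocol: corev1.ProtocolTCP},
		{Name: "otlp-http", Port: 4318, TargetPort: intstr.FromInt(4318), Protocol: corev1.ProtocolTCP},
	}
}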

@andreasgerstmayr merged commit d8afdec into grafana:main Jan 26, 2024
11 checks passed
Successfully merging this pull request may close these issues: Support Tempo deployments in monolithic mode