section 5 content updated
Anusha Narapureddy authored and Anusha Narapureddy committed Mar 11, 2024
1 parent 2582c4e commit 3be8dd9
Showing 5 changed files with 35 additions and 38 deletions.
2 changes: 1 addition & 1 deletion 01-welcome-setup.md
@@ -49,7 +49,7 @@ workshop-control-plane Ready control-plane 75s v1.27.3
```

### Cleanup

k de
```bash
kind delete cluster --name=workshop
```
2 changes: 1 addition & 1 deletion 03-auto-instrumentation.md
@@ -121,7 +121,7 @@ The `Instrumentation` CR does not instrument the workloads. The instrumentation

```bash
kubectl patch deployment frontend-deployment -n tutorial-application -p '{"spec": {"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-sdk":"true"}}}} }'
kubectl patch deployment backend1-deployment -n tutorial-application -p '{"spec": {"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-python":"true"}}}} }'
kubectl patch deployment frontend-deployment -n tutorial-application -p '{"spec": {"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-sdk":"true"}}}} }'
kubectl patch deployment backend2-deployment -n tutorial-application -p '{"spec": {"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-java":"true"}}}} }'
kubectl get pods -n tutorial-application -w
...
41 changes: 22 additions & 19 deletions 05-sampling.md
@@ -10,17 +10,10 @@ This tutorial step covers the basic usage of the OpenTelemetry Collector on Kube

## Sampling, what does it mean and why is it important?

Sampling refers to the practice of selectively capturing and recording traces of requests flowing through a distributed system, rather than capturing every single request. It is crucial in distributed tracing systems because modern distributed applications often generate a massive volume of requests and transactions, which can overwhelm the tracing infrastructure or lead to excessive storage costs if every request is
traced in detail.
Sampling refers to the practice of selectively capturing and recording traces of requests flowing through a distributed system, rather than capturing every single request. It is crucial in distributed tracing systems because modern distributed applications often generate a massive volume of requests and transactions, which can overwhelm the tracing infrastructure or lead to excessive storage costs if every request is traced in detail.

For example, a medium sized setup producing ~1M traces per minute can result in a cost of approximately $250,000 per month. (Note that this depends on your infrastructure costs, the SaaS provider you choose, the amount of metadata, etc.)

To get a better feel for the cost, you may want to play with some SaaS cost calculators.

- TODO
- TODO
- TODO

For more details, check the [official documentation](https://opentelemetry.io/docs/concepts/sampling/).

### How can we now reduce the number of traces?
@@ -40,19 +33,23 @@ Update the sampling % in the Instrumentation CR and restart the deployment for t
https://github.com/pavolloffay/kubecon-eu-2024-opentelemetry-kubernetes-tracing-tutorial/blob/d4b917c1cc4a411f59ae5dd770b22de1de9f6020/app/instrumentation-head-sampling.yaml#L13-L15

```yaml
kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2024-opentelemetry-kubernetes-tracing-tutorial/app/instrumentation-head-sampling.yaml
kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2024-opentelemetry-kubernetes-tracing-tutorial/main/app/instrumentation-head-sampling.yaml
kubectl rollout restart deploy -n tutorial-application
kubectl get pods -w -n tutorial-application
```

<details>
<summary>Jaeger's Remote Sampling extension</summary>

TODO:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/jaegerremotesampling/README.md
See the pod spec for one of the deployments:

```bash
kubectl describe pod backend2-deployment-64ddcc76fd-w85zh -n tutorial-application
```

https://opentelemetry.io/docs/languages/sdk-configuration/general/#otel_traces_sampler_arg

</details>
```diff
Environment:
OTEL_TRACES_SAMPLER: parentbased_traceidratio
- OTEL_TRACES_SAMPLER_ARG: 1
+ OTEL_TRACES_SAMPLER_ARG: 0.5
```
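
To give a feel for what lowering `OTEL_TRACES_SAMPLER_ARG` from 1 to 0.5 does under `parentbased_traceidratio`, here is a minimal sketch of a ratio decision derived from the trace ID, wrapped in a parent-based rule. All function names are hypothetical, and the threshold arithmetic is an assumption made for illustration, not the SDK's actual code.

```python
import random
from typing import Optional

# Assumption for this sketch: only the low 64 bits of the 128-bit trace ID are compared.
TRACE_ID_MASK = (1 << 64) - 1


def should_sample(trace_id: int, ratio: float) -> bool:
    """Deterministic decision: the same trace ID always gets the same answer."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_MASK) < bound


def parent_based_decision(parent_sampled: Optional[bool], trace_id: int, ratio: float) -> bool:
    """Follow the parent's decision if there is one; otherwise apply the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return should_sample(trace_id, ratio)


def sampled_fraction(ratio: float, n: int = 100_000, seed: int = 42) -> float:
    """Fraction of n random trace IDs kept at the given ratio."""
    rng = random.Random(seed)
    kept = sum(should_sample(rng.getrandbits(128), ratio) for _ in range(n))
    return kept / n
```

With a ratio of 0.5, `sampled_fraction(0.5)` should land close to one half, while child spans keep following whatever their parent decided, so traces are not cut in the middle.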

### Tailbased Sampling

@@ -61,7 +58,7 @@ Tail sampling is where the decision to sample a trace takes place by considering
Deploy the opentelemetry collector with `tail_sampling` enabled.

```shell
kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2024-opentelemetry-kubernetes-tracing-tutorial/backend/05-tail-sampling-collector.yaml
kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2024-opentelemetry-kubernetes-tracing-tutorial/main/backend/05-tail-sampling-collector.yaml
kubectl get pods -n observability-backend -w
```
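
As a rough illustration of the tail-sampling idea, the sketch below keeps a trace if any of its spans has an ERROR status or if the whole trace took longer than 500 ms, mirroring the `keep-errors` and `keep-slow-traces` policies that appear in this workshop's collector configuration. The `Span` class and the `keep_trace` function are hypothetical names used only for illustration; this is not how the Collector's `tail_sampling` processor is implemented.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Span:
    start_ms: int   # span start time in milliseconds
    end_ms: int     # span end time in milliseconds
    status: str     # "OK", "ERROR", or "UNSET"


def keep_trace(spans: Sequence[Span], threshold_ms: int = 500) -> bool:
    """Decide on a complete trace, after all of its spans have arrived."""
    if not spans:
        return False
    has_error = any(s.status == "ERROR" for s in spans)
    # Trace duration: earliest span start to latest span end.
    duration_ms = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    return has_error or duration_ms > threshold_ms
```

The key difference from head sampling is visible in the signature: the decision takes the full list of spans, which is why the Collector has to buffer traces for a while before deciding.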

@@ -103,10 +100,16 @@ Requires two deployments of the Collector, the first layer routing all the spans
Apply the YAML below to deploy a layer of Collectors containing the load-balancing exporter in front of collectors performing tail-sampling:

```shell
kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2024-opentelemetry-kubernetes-tracing-tutorial/backend/05-scale-otel-collectors.yaml
kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2024-opentelemetry-kubernetes-tracing-tutorial/main/backend/05-scale-otel-collectors.yaml
kubectl get pods -n observability-backend -w
```
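
Tail sampling only works if every span of a trace reaches the same Collector instance, which is what the load-balancing layer is for. The sketch below shows the idea with a plain hash-modulo choice. The `pick_backend` function and the backend names are hypothetical, and the real load-balancing exporter may use a different, more robust scheme (for example consistent hashing).

```python
import hashlib
from typing import Sequence


def pick_backend(trace_id: str, backends: Sequence[str]) -> str:
    """Map a trace ID to one backend so all spans of a trace land together."""
    if not backends:
        raise ValueError("at least one backend is required")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

A simple modulo like this reshuffles most traces whenever the number of backends changes, which is one reason a production router would prefer a scheme that moves fewer keys on scale-up or scale-down.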

<TODO: Add screenshot>

### Advanced Topic: Jaeger's Remote Sampling extension

TODO:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/jaegerremotesampling/README.md


[Next steps](./06-RED-metrics.md)
10 changes: 8 additions & 2 deletions app/instrumentation-head-sampling.yaml
@@ -24,5 +24,11 @@ spec:
value: http://otel-collector.observability-backend.svc.cluster.local:4318
java:
env:
- name: OTEL_LOGS_EXPORTER
value: otlp
- name: OTEL_INSTRUMENTATION_METHODS_INCLUDE
value: io.opentelemetry.dice.DiceApplication[main];
- name: OTEL_INSTRUMENTATION_HTTP_SERVER_CAPTURE_RESPONSE_HEADERS
value: Content-Type,Date
# - name: OTEL_INSTRUMENTATION_TOMCAT_ENABLED
# value: "false"
# - name: OTEL_INSTRUMENTATION_SERVLET_ENABLED
# value: "false"
18 changes: 3 additions & 15 deletions backend/05-scale-otel-collectors.yaml
@@ -21,20 +21,8 @@ spec:
endpoint: 0.0.0.0:4318
processors:
# Sample 100% of traces with ERROR-ing spans (omit traces with all OK spans)
# and traces which have a duration longer than 500ms
tail_sampling:
decision_wait: 10s # time to wait before a sampling decision is made
num_traces: 100 # number of traces to be kept in memory
expected_new_traces_per_sec: 10 # expected rate of new traces per second
policies:
- name: keep-errors
type: status_code
status_code: {status_codes: [ERROR]}
- name: keep-slow-traces
type: latency
latency: {threshold_ms: 500}
batch:
exporters:
debug:
verbosity: detailed
@@ -62,7 +50,7 @@ spec:
pipelines:
traces:
receivers: [otlp]
processors: [tail_sampling]
processors: [batch]
exporters: [otlp/traces]
metrics:
receivers: [otlp]