task: adding local deploy targets #6

Merged
merged 4 commits on Mar 12, 2024
39 changes: 27 additions & 12 deletions README.md
@@ -1,26 +1,41 @@
# Decisive Engine deployment

Creates an AWS CDK stack that deploys an EKS cluster with the following components:

- AWS Cert-Manager for managing TLS certificates
- OpenTelemetry Operator for deploying the OpenTelemetry Collector
- Prometheus for monitoring the cluster
- MyDecisive API and MyDecisive Engine UI, which are the main components of the Decisive Engine

## Deployment steps

- `make config` configures aws client, cdk stack and Otel CR
- `make bootstrap` bootstraps cdk stack deployment
- `make install` runs cdk stack deployment
- `make clean` cleans up environment, removes log files
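
For example, an end-to-end AWS deployment run might look like the following (a sketch; it assumes your AWS credentials and default region are already configured in the shell):

```bash
# Sketch: full AWS deployment, start to finish
make config      # configure the aws client, cdk stack, and Otel CR
make bootstrap   # bootstrap the cdk stack deployment
make install     # run the cdk stack deployment
make clean       # remove log files when finished
```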

## Destroy stack

- `cdk destroy`

## Useful commands

- `npm run build` compile typescript to js
- `npm run watch` watch for changes and compile
- `npm run test` perform the jest unit tests
- `cdk deploy` deploy this stack to your default AWS account/region
- `cdk diff` compare deployed stack with current state
- `cdk synth` emits the synthesized CloudFormation template

## Local deployment commands

- `make -f ./make/Makefile-local-recipes create-mdai` deploy a local cluster from scratch
- `make -f ./make/Makefile-local-recipes delete-mdai` deletes mdai cluster deployed locally and all artifacts associated
- `make -f ./make/Makefile-local-recipes delete-mdai-all` deletes mdai cluster deployed locally and all artifacts associated, plus helm charts
- `make -f ./make/Makefile-local-recipes update-mdai-collector` updates the mdai collector to the latest configuration

_Make sure to update your `.bashrc` or `.zshrc` file with the following:_

```bash
export GOBIN=${GOBIN:-$(go env GOPATH)/bin}
```
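
A typical first run of the local workflow, from the repo root, might look like this (a sketch):

```bash
# Sketch: stand up the local MDAI cluster and verify the core pods
export GOBIN=${GOBIN:-$(go env GOPATH)/bin}
make -f ./make/Makefile-local-recipes create-mdai
kubectl get pods   # the operator, prometheus, mdai-api, and mdai-console pods should appear
```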
2 changes: 1 addition & 1 deletion jobs/cronjob-logs.yaml
@@ -3,7 +3,7 @@ kind: CronJob
metadata:
name: telemetrygen-logs
spec:
# Change this to the frequency you desire -- currently it's set to 1 minute
schedule: "*/1 * * * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 1
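To change the frequency without editing the file, a patch along these lines should work (a sketch; standard five-field cron syntax, here every 15 minutes):

```bash
# Sketch: bump the logs generator to every 15 minutes via a merge patch
kubectl patch cronjob telemetrygen-logs \
  -p '{"spec":{"schedule":"*/15 * * * *"}}'
```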
2 changes: 1 addition & 1 deletion jobs/cronjob-metrics.yaml
@@ -3,9 +3,9 @@ kind: CronJob
metadata:
name: telemetrygen-metrics
spec:
# Change this to the frequency you desire -- currently it's set to 1 minute
schedule: "*/1 * * * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 1
Contributor comment: curious why this was removed
failedJobsHistoryLimit: 1
jobTemplate:
spec:
1 change: 1 addition & 0 deletions jobs/cronjob-traces.yaml
@@ -3,6 +3,7 @@ kind: CronJob
metadata:
name: telemetrygen-traces
spec:
# Change this to the frequency you desire -- currently it's set to 5 minutes
schedule: "*/5 * * * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 1
80 changes: 80 additions & 0 deletions make/Makefile-local-install
@@ -0,0 +1,80 @@
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~ LOCAL DIST INSTALL SCRIPTS ~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# ~~~~~~~~~~~ INSTALL DEPENDENCY SCRIPT ~~~~~~~~~~~~~~

# Install system requirements for MDAI
# .SILENT: mdai-system-reqs
.PHONY: mdai-system-reqs
mdai-system-reqs:
@echo "🟢 Start mdai-system-reqs..."
@go version || brew list go || brew install go
@kubectl version || brew list kubectl || brew install kubectl
@npm -v || brew list npm || brew install npm
@cdk version || brew list aws-cdk || brew install aws-cdk
@docker -v || brew list --cask docker || brew install --cask docker
@docker pull otel/opentelemetry-collector:0.95.0
@helm version || brew list helm || brew install helm
@kind version || brew list kind || brew install kind
@echo "✅ Complete mdai-system-reqs!"

# Add helm chart repos available on system / machine
# .SILENT: mdai-add-helm-charts
.PHONY: mdai-add-helm-charts
mdai-add-helm-charts:
@echo "🟢 Start mdai-add-helm-charts..."
@helm repo list | grep prometheus-community || helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
@helm repo list | grep open-telemetry || helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
@helm repo list | grep mydecisive || helm repo add mydecisive https://decisiveai.github.io/mdai-helm-charts
@echo "✅ Complete mdai-add-helm-charts!"

# Install helm charts to cluster
.SILENT: mdai-install-helm-charts
.PHONY: mdai-install-helm-charts
mdai-install-helm-charts:
@echo "🟢 Start mdai-install-helm-charts..."
helm repo update
helm upgrade -f ./templates/prometheus-values.yaml prometheus prometheus-community/prometheus --install --wait
Contributor comment: I fixed a file path here, changing prometheus_values.yaml to prometheus-values.yaml when I did my tabs fixes. However, I think the values in that file might actually break the metrics scraping 😬 Mine no longer seem to work, but that could be specific to my setup.
helm upgrade opentelemetry-operator open-telemetry/opentelemetry-operator --install --set admissionWebhooks.certManager.enabled=false --set admissionWebhooks.certManager.autoGenerateCert=true --wait
helm upgrade mdai-api mydecisive/mdai-api --version 0.0.3 --install
helm upgrade mdai-console mydecisive/mdai-console --version 0.0.6 --install
@echo "✅ Complete mdai-install-helm-charts!"

# ~~~~~~~~~~~~~~~~~ CREATE ACTION RULES ~~~~~~~~~~~~~~~~~

# Creates a cluster for an MDAI engine to be created
.SILENT: create-mdai-cluster
.PHONY: create-mdai-cluster
create-mdai-cluster:
@kind get clusters | grep -q mdai-local || kind create cluster --name mdai-local
kubectl cluster-info --context kind-mdai-local

# Installs the cluster dependencies: system requirements, helm repos, and helm charts
.SILENT: mdai-install-cluster-dependencies
.PHONY: mdai-install-cluster-dependencies
mdai-install-cluster-dependencies: \
mdai-system-reqs mdai-add-helm-charts mdai-install-helm-charts

# Wait for the required pods to become ready
.SILENT: mdai-wait-for-pods
.PHONY: mdai-wait-for-pods
mdai-wait-for-pods:
@echo "🟢 Start mdai-wait-for-pods..."
kubectl -n default wait --for condition=ready pod -l app.kubernetes.io/name=opentelemetry-operator
kubectl -n default wait --for condition=ready pod -l app.kubernetes.io/component=controller-manager
@echo "✅ Complete mdai-wait-for-pods!"

# deploy collector config
.SILENT: mdai-deploy-config
.PHONY: mdai-deploy-config
mdai-deploy-config:
kubectl apply -f ./templates/otel-collector.yaml

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~ LOCAL INSTALL RECIPE ~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.SILENT: install-all
.PHONY: install-all
install-all: create-mdai-cluster mdai-install-cluster-dependencies mdai-wait-for-pods mdai-deploy-config
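
The install-all stages can also be run one at a time when debugging, for example (a sketch):

```bash
# Sketch: re-run a single install stage instead of the whole recipe
make -f ./make/Makefile-local-install mdai-install-helm-charts
make -f ./make/Makefile-local-install mdai-wait-for-pods
```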
29 changes: 29 additions & 0 deletions make/Makefile-local-recipes
@@ -0,0 +1,29 @@
# ~~~~~~~~~~~~~~~~~ RECIPES ~~~~~~~~~~~~~~~~~

.SILENT: create-mdai
.PHONY: create-mdai
create-mdai:
@echo "🏁 Create MDAI Engine started..."
@time $(MAKE) -f ./make/Makefile-local-install install-all
@echo "🐙 Create MDAI Engine completed successfully!"

.SILENT: delete-mdai
.PHONY: delete-mdai
delete-mdai:
@echo "🐙 Destroy MDAI Engine started..."
@$(MAKE) -f ./make/Makefile-local-uninstall uninstall-all
@echo "🪦 Destroy MDAI Engine completed successfully!"

.SILENT: delete-mdai-all
.PHONY: delete-mdai-all
delete-mdai-all:
@echo "🐙 Destroy MDAI Engine started..."
@$(MAKE) -f ./make/Makefile-local-uninstall uninstall-all-artifacts
@echo "🪦 Destroy MDAI Engine completed successfully!"

.SILENT: update-mdai-collector
.PHONY: update-mdai-collector
update-mdai-collector:
@echo "🐙 Update started"
kubectl apply -f ./templates/otel-collector.yaml
@echo "🐙 Update done"
55 changes: 55 additions & 0 deletions make/Makefile-local-uninstall
@@ -0,0 +1,55 @@
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~ LOCAL DIST UNINSTALL SCRIPTS ~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# delete collector config
.SILENT: delete-mdai-config
.PHONY: delete-mdai-config
delete-mdai-config:
kubectl delete -f ./templates/otel-collector.yaml --ignore-not-found=true

# delete cluster
.SILENT: delete-mdai-cluster
.PHONY: delete-mdai-cluster
delete-mdai-cluster:
kind delete cluster --name mdai-local

# uninstall helm releases
.SILENT: uninstall-helm-releases
.PHONY: uninstall-helm-releases
uninstall-helm-releases:
helm uninstall mdai-console mdai-api prometheus opentelemetry-operator --ignore-not-found

# uninstall helm chart repos
.SILENT: uninstall-helm-artifacts
.PHONY: uninstall-helm-artifacts
uninstall-helm-artifacts:
# helm repo remove exits nonzero when a repo is already gone; the leading '-' tells make to move on
@echo "Executing uninstall-helm-artifacts..."
-helm repo remove open-telemetry
-helm repo remove prometheus-community
-helm repo remove mydecisive
@echo "Successfully ran uninstall-helm-artifacts!"

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~ LOCAL UNINSTALL RECIPE ~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# TODO: move this to a cluster management makefile
.SILENT: delete-config
.PHONY: delete-config
delete-config: delete-mdai-config

# TODO: evaluate if command is needed
.SILENT: uninstall-helm-repos
.PHONY: uninstall-helm-repos
uninstall-helm-repos: uninstall-helm-artifacts


.SILENT: uninstall-all
.PHONY: uninstall-all
uninstall-all: uninstall-helm-releases delete-mdai-cluster

.SILENT: uninstall-all-artifacts
.PHONY: uninstall-all-artifacts
uninstall-all-artifacts: uninstall-helm-releases uninstall-helm-artifacts delete-mdai-cluster
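
Choosing between the two teardown paths from the recipes file (a sketch): `delete-mdai` keeps the helm repos registered for the next create, while `delete-mdai-all` removes them too.

```bash
# Sketch: routine teardown (keeps helm chart repos registered)
make -f ./make/Makefile-local-recipes delete-mdai
# Sketch: full teardown (also removes the helm chart repos)
make -f ./make/Makefile-local-recipes delete-mdai-all
```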
51 changes: 51 additions & 0 deletions templates/otel-collector.yaml
@@ -0,0 +1,51 @@
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: gateway
spec:
  ports:
    - name: promexporter   # prometheus exporter endpoint
      port: 9464
      protocol: TCP
    - name: metrics1       # collector self-telemetry
      port: 8888
      protocol: TCP
    - name: metrics2       # OTLP gRPC receiver
      port: 4317
      protocol: TCP
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    extensions:
      zpages:
        endpoint: 0.0.0.0:55679
    processors:
      batch:
        send_batch_size: 1024
        send_batch_max_size: 1500
        timeout: 0s
    exporters:
      debug:
        # verbosity: detailed
      prometheus:
        endpoint: "0.0.0.0:9464"
        resource_to_telemetry_conversion:
          enabled: true
        enable_open_metrics: true
    service:
      extensions: [zpages]
      pipelines:
        logs:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
        metrics:
          receivers: [otlp]
          processors: [batch]
          # prometheus included so the exporter on :9464 actually receives data
          exporters: [debug, prometheus]
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
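
To exercise these pipelines by hand (outside the telemetrygen cronjobs), something like the following should work; it assumes telemetrygen is installed locally and that the operator created the default `gateway-collector` service for the CR above:

```bash
# Sketch: send a few test traces straight to the collector's OTLP gRPC port
kubectl port-forward svc/gateway-collector 4317:4317 &
telemetrygen traces --otlp-insecure --otlp-endpoint localhost:4317 --traces 10
```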
9 changes: 6 additions & 3 deletions templates/prometheus-values.yaml
@@ -39,12 +39,15 @@ serverFiles:
- job_name: otel-collector
  honor_labels: true
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # keep only pods labeled as an opentelemetry-collector component that opt in via prometheus.io/scrape
    - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_component, __meta_kubernetes_pod_annotation_prometheus_io_scrape]
      separator: ;
      regex: opentelemetry-collector;true
      action: keep
    # drop the OTLP receiver ports (4317/4318); they serve no metrics
    - source_labels: [__address__]
      regex: '.*:(431[78])'
      action: drop
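
To confirm this job actually discovers the collector, the Prometheus targets API can be queried through a port-forward (a sketch; assumes the chart's default service name `prometheus-server`):

```bash
# Sketch: check that the otel-collector scrape job has active targets
kubectl port-forward svc/prometheus-server 9090:80 &
curl -s 'http://localhost:9090/api/v1/targets?state=active' | grep otel-collector
```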