Merge pull request #6 from DecisiveAI/local-deployment
task: adding local deploy targets
arcanez committed Mar 12, 2024
2 parents 2e57eb5 + 0609f34 commit edff27a
Showing 9 changed files with 251 additions and 17 deletions.
39 changes: 27 additions & 12 deletions README.md
@@ -1,26 +1,41 @@
# Decisive Engine deployment

Creates an AWS CDK stack that deploys an EKS cluster with the following components:

- AWS Cert-Manager for managing TLS certificates
- OpenTelemetry Operator for deploying the OpenTelemetry Collector
- Prometheus for monitoring the cluster
- MyDecisive API and MyDecisive Engine UI, which are the main components of the Decisive Engine

## Deployment steps

- `make config` configures the AWS client, the CDK stack, and the OTel CR
- `make bootstrap` bootstraps the CDK stack deployment
- `make install` runs the CDK stack deployment
- `make clean` cleans up the environment and removes log files

## Destroy stack

- `cdk destroy`

## Useful commands

- `npm run build` compiles TypeScript to JavaScript
- `npm run watch` watches for changes and recompiles
- `npm run test` runs the Jest unit tests
- `cdk deploy` deploys this stack to your default AWS account/region
- `cdk diff` compares the deployed stack with the current state
- `cdk synth` emits the synthesized CloudFormation template

## Local Deployment commands

- `make -f ./make/Makefile-local-recipes create-mdai` deploys a local cluster from scratch
- `make -f ./make/Makefile-local-recipes delete-mdai` deletes the locally deployed mdai cluster and all associated artifacts
- `make -f ./make/Makefile-local-recipes delete-mdai-all` deletes the locally deployed mdai cluster and all associated artifacts, plus the helm chart repos
- `make -f ./make/Makefile-local-recipes update-mdai-collector` updates the mdai collector to the latest configuration

_Make sure to update your `.bashrc` or `.zshrc` file with the following:_

```bash
export GOBIN=${GOBIN:-$(go env GOPATH)/bin}
```
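
The `${GOBIN:-...}` form in the export above is standard shell parameter expansion: an existing, non-empty `GOBIN` is kept, and the Go default is used only as a fallback. A minimal sketch of that behavior follows. `resolve_gobin` and the `/tmp/example-gopath` path are hypothetical names used for illustration; the path stands in for the output of `go env GOPATH` so the snippet runs without Go installed.

```shell
#!/bin/sh
# Hypothetical helper: prints the directory GOBIN would resolve to.
# $1 = current GOBIN value (may be empty), $2 = stand-in for `go env GOPATH`.
resolve_gobin() (
  GOBIN=$1
  gopath=$2
  # Same expansion as in the README: keep GOBIN if it is set and non-empty,
  # otherwise fall back to "<GOPATH>/bin".
  printf '%s\n' "${GOBIN:-$gopath/bin}"
)

resolve_gobin "" /tmp/example-gopath               # GOBIN empty: falls back
resolve_gobin /opt/tools/bin /tmp/example-gopath   # GOBIN already set: kept
```

Because of the fallback, adding the line to a shell profile should not override a `GOBIN` that is already exported elsewhere.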
2 changes: 1 addition & 1 deletion jobs/cronjob-logs.yaml
@@ -3,7 +3,7 @@ kind: CronJob
 metadata:
   name: telemetrygen-logs
 spec:
-  # Change this to the frequency you desire -- currently it's set to 5 minutes
+  # Change this to the frequency you desire -- currently it's set to 1 minute
   schedule: "*/1 * * * *"
   concurrencyPolicy: Forbid
   successfulJobsHistoryLimit: 1
2 changes: 1 addition & 1 deletion jobs/cronjob-metrics.yaml
@@ -3,9 +3,9 @@ kind: CronJob
metadata:
  name: telemetrygen-metrics
spec:
  # Change this to the frequency you desire -- currently it's set to 1 minute
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
1 change: 1 addition & 0 deletions jobs/cronjob-traces.yaml
@@ -3,6 +3,7 @@ kind: CronJob
 metadata:
   name: telemetrygen-traces
 spec:
+  # Change this to the frequency you desire -- currently it's set to 5 minutes
   schedule: "*/5 * * * *"
   concurrencyPolicy: Forbid
   successfulJobsHistoryLimit: 1
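
The `schedule` fields in these CronJobs use standard cron syntax, where the first field is the minute and `*/N` means every Nth minute starting at 0. As an illustration only, the hypothetical helper `cron_step_minutes` below lists the minutes of each hour on which a `*/N` minute field fires.

```shell
#!/bin/sh
# Hypothetical helper: list the minutes (0-59) matched by a "*/N" minute field.
cron_step_minutes() (
  step=$1
  # A step below 1 is not meaningful here and would never terminate.
  [ "$step" -ge 1 ] || exit 1
  minute=0
  out=""
  while [ "$minute" -lt 60 ]; do
    out="$out $minute"
    minute=$((minute + step))
  done
  # Strip the leading space before printing.
  printf '%s\n' "${out# }"
)

cron_step_minutes 5    # the telemetrygen-traces schedule, "*/5 * * * *"
cron_step_minutes 15
```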
80 changes: 80 additions & 0 deletions make/Makefile-local-install
@@ -0,0 +1,80 @@
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~ LOCAL DIST INSTALL SCRIPTS ~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# ~~~~~~~~~~~ INSTALL DEPENDENCY SCRIPT ~~~~~~~~~~~~~~

# Install system requirements for MDAI
# Each check is a separate recipe line: joining them with trailing backslashes
# would merge them into one `||` chain that stops at the first success.
# .SILENT: mdai-system-reqs
.PHONY: mdai-system-reqs
mdai-system-reqs:
	@echo "🟢 Start mdai-system-reqs..."
	@go version || brew list go || brew install go
	@kubectl version || brew list kubectl || brew install kubectl
	@npm -v || brew list npm || brew install npm
	@cdk version || brew list aws-cdk || brew install aws-cdk
	@docker -v || brew list --cask docker || brew install --cask docker
	@docker pull otel/opentelemetry-collector:0.95.0
	@helm version || brew list helm || brew install helm
	@kind version || brew list kind || brew install kind
	@echo "✅ Complete mdai-system-reqs!"

# Add helm chart repos available on system / machine
# .SILENT: mdai-add-helm-charts
.PHONY: mdai-add-helm-charts
mdai-add-helm-charts:
	@echo "🟢 Start mdai-add-helm-charts..."
	@helm repo list | grep prometheus-community || helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
	@helm repo list | grep open-telemetry || helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
	@helm repo list | grep mydecisive || helm repo add mydecisive https://decisiveai.github.io/mdai-helm-charts
	@echo "✅ Complete mdai-add-helm-charts!"

# Install helm charts to cluster
.SILENT: mdai-install-helm-charts
.PHONY: mdai-install-helm-charts
mdai-install-helm-charts:
	@echo "🟢 Start mdai-install-helm-charts..."
	helm repo update
	helm upgrade -f ./templates/prometheus-values.yaml prometheus prometheus-community/prometheus --install --wait
	helm upgrade opentelemetry-operator open-telemetry/opentelemetry-operator --install --set admissionWebhooks.certManager.enabled=false --set admissionWebhooks.certManager.autoGenerateCert=true --wait
	helm upgrade mdai-api mydecisive/mdai-api --version 0.0.3 --install
	helm upgrade mdai-console mydecisive/mdai-console --version 0.0.6 --install
	@echo "✅ Complete mdai-install-helm-charts!"

# ~~~~~~~~~~~~~~~~~ CREATE ACTION RULES ~~~~~~~~~~~~~~~~~

# Creates a local kind cluster for the MDAI engine (skipped if it already exists)
.SILENT: create-mdai-cluster
.PHONY: create-mdai-cluster
create-mdai-cluster:
	@kind get clusters | grep -q mdai-local || kind create cluster --name mdai-local
	kubectl cluster-info --context kind-mdai-local

# Installs the cluster dependencies: system requirements, helm repos, and helm charts
.SILENT: mdai-install-cluster-dependencies
.PHONY: mdai-install-cluster-dependencies
mdai-install-cluster-dependencies: \
  mdai-system-reqs mdai-add-helm-charts mdai-install-helm-charts

# Wait for the required pods to become ready
.SILENT: mdai-wait-for-pods
.PHONY: mdai-wait-for-pods
mdai-wait-for-pods:
	@echo "🟢 Start mdai-wait-for-pods..."
	kubectl -n default wait --for condition=ready pod -l app.kubernetes.io/name=opentelemetry-operator
	kubectl -n default wait --for condition=ready pod -l app.kubernetes.io/component=controller-manager
	@echo "✅ Complete mdai-wait-for-pods!"

# Deploy the collector config
.SILENT: mdai-deploy-config
.PHONY: mdai-deploy-config
mdai-deploy-config:
	kubectl apply -f ./templates/otel-collector.yaml

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~ LOCAL INSTALL RECIPE ~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.SILENT: install-all
.PHONY: install-all
install-all: create-mdai-cluster mdai-install-cluster-dependencies mdai-wait-for-pods mdai-deploy-config
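
The `mdai-system-reqs` target follows a check-then-install pattern: probe for a tool, and install it through Homebrew only when the probe fails. A minimal shell sketch of that guard is shown below. The function name `ensure_tool` and the `DRY_RUN` switch are hypothetical and not part of the repository; with `DRY_RUN=1` (the default here) the function only reports what it would install.

```shell
#!/bin/sh
# Hypothetical guard: succeed if the command exists, otherwise install its
# Homebrew formula (or only report it when DRY_RUN=1).
# $1 = command to probe, $2 = formula name (defaults to the command name).
DRY_RUN=${DRY_RUN:-1}

ensure_tool() (
  cmd=$1
  formula=${2:-$1}
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "found: $cmd"
  elif [ "$DRY_RUN" = "1" ]; then
    echo "missing: $cmd (would run: brew install $formula)"
  else
    brew install "$formula"
  fi
)

# Each tool is checked on its own line, so one tool being present
# does not skip the checks for the others.
ensure_tool sh
ensure_tool cdk aws-cdk
```

Keeping one call per tool mirrors the intent of the Makefile target, where every dependency should be verified regardless of the others.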
29 changes: 29 additions & 0 deletions make/Makefile-local-recipes
@@ -0,0 +1,29 @@
# ~~~~~~~~~~~~~~~~~ RECIPES ~~~~~~~~~~~~~~~~~

.SILENT: create-mdai
.PHONY: create-mdai
create-mdai:
	@echo "🏁 Create MDAI Engine started..."
	@time $(MAKE) -f ./make/Makefile-local-install install-all
	@echo "🐙 Create MDAI Engine completed successfully!"

.SILENT: delete-mdai
.PHONY: delete-mdai
delete-mdai:
	@echo "🐙 Destroy MDAI Engine started..."
	@$(MAKE) -f ./make/Makefile-local-uninstall uninstall-all
	@echo "🪦 Destroy MDAI Engine completed successfully!"

.SILENT: delete-mdai-all
.PHONY: delete-mdai-all
delete-mdai-all:
	@echo "🐙 Destroy MDAI Engine started..."
	@$(MAKE) -f ./make/Makefile-local-uninstall uninstall-all-artifacts
	@echo "🪦 Destroy MDAI Engine completed successfully!"

.SILENT: update-mdai-collector
.PHONY: update-mdai-collector
update-mdai-collector:
	@echo "🐙 Update started"
	kubectl apply -f ./templates/otel-collector.yaml
	@echo "🐙 Update done"
55 changes: 55 additions & 0 deletions make/Makefile-local-uninstall
@@ -0,0 +1,55 @@
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~ LOCAL DIST UNINSTALL SCRIPTS ~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# Delete the collector config
.SILENT: delete-mdai-config
.PHONY: delete-mdai-config
delete-mdai-config:
	kubectl delete -f ./templates/otel-collector.yaml --ignore-not-found=true

# delete cluster
.SILENT: delete-mdai-cluster
.PHONY: delete-mdai-cluster
delete-mdai-cluster:
	kind delete cluster --name mdai-local

# uninstall helm releases
.SILENT: uninstall-helm-releases
.PHONY: uninstall-helm-releases
uninstall-helm-releases:
	helm uninstall mdai-console mdai-api prometheus opentelemetry-operator --ignore-not-found

# Remove the helm chart repos
.SILENT: uninstall-helm-artifacts
.PHONY: uninstall-helm-artifacts
uninstall-helm-artifacts:
# TODO: figure out how to continue when one `helm repo remove` fails
	@echo "Executing uninstall-helm-artifacts..."
	helm repo remove open-telemetry
	helm repo remove prometheus-community
	helm repo remove mydecisive
	@echo "Successfully ran uninstall-helm-artifacts!"

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~ LOCAL UNINSTALL RECIPE ~~~~~~~~~~~~~~~~~~~
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# TODO: move this to a cluster management makefile
.SILENT: delete-config
.PHONY: delete-config
delete-config: delete-mdai-config

# TODO: evaluate if command is needed
.SILENT: uninstall-helm-repos
.PHONY: uninstall-helm-repos
uninstall-helm-repos: uninstall-helm-artifacts


.SILENT: uninstall-all
.PHONY: uninstall-all
uninstall-all: uninstall-helm-releases | delete-mdai-cluster

.SILENT: uninstall-all-artifacts
.PHONY: uninstall-all-artifacts
uninstall-all-artifacts: uninstall-helm-releases uninstall-helm-artifacts delete-mdai-cluster
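
The `todo` note in `uninstall-helm-artifacts` asks how to carry on when one `helm repo remove` fails. In GNU Make, prefixing a recipe line with `-` tells make to ignore that command's error; in plain shell the same effect comes from handling each failure and continuing. The sketch below shows the shell variant. `remove_all` and `fake_remove` are hypothetical stand-ins (the latter replaces `helm repo remove` so the example runs without helm).

```shell
#!/bin/sh
# Hypothetical stand-in for `helm repo remove`: fails for one repo name
# to simulate a repo that was never added.
fake_remove() {
  [ "$1" != "prometheus-community" ]
}

# Try to remove every repo, log failures, and keep going.
# The exit status is the number of failed removals.
# $1 = command used to remove a repo, remaining args = repo names.
remove_all() (
  remover=$1
  shift
  failures=0
  for repo in "$@"; do
    if "$remover" "$repo"; then
      echo "removed: $repo"
    else
      echo "skipped: $repo (remove failed)"
      failures=$((failures + 1))
    fi
  done
  exit "$failures"
)

remove_all fake_remove open-telemetry prometheus-community mydecisive || true
```

Whether a failed removal should be silently ignored or reported at the end is a design choice; the sketch reports the count so the caller can decide.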
51 changes: 51 additions & 0 deletions templates/otel-collector.yaml
@@ -0,0 +1,51 @@
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: gateway
spec:
  ports:
    - name: promexporter
      port: 9464
      protocol: TCP
    - name: metrics1
      port: 8888
      protocol: TCP
    - name: metrics2
      port: 4317
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    extensions:
      zpages:
        endpoint: 0.0.0.0:55679
    processors:
      batch:
        send_batch_size: 1024
        send_batch_max_size: 1500
        timeout: 0s
    exporters:
      debug:
        # verbosity: detailed
      prometheus:
        endpoint: "0.0.0.0:9464"
        resource_to_telemetry_conversion:
          enabled: true
        enable_open_metrics: true
    service:
      extensions: [zpages]
      pipelines:
        logs:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
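
A note on the `batch` settings in this template: going by the batch processor's documented behavior, a `timeout` of `0s` means data is forwarded immediately and `send_batch_size` is effectively ignored, so `send_batch_max_size` alone caps the size of an outgoing batch. The arithmetic of that cap can be sketched as follows. `split_batch` is a hypothetical helper, not collector code, and it models only the size limit under that assumption.

```shell
#!/bin/sh
# Hypothetical model of send_batch_max_size: split an incoming batch of
# $1 items into outgoing batches of at most $2 items each.
split_batch() (
  remaining=$1
  max=$2
  # A cap below 1 would never make progress in this model.
  [ "$max" -ge 1 ] || exit 1
  out=""
  while [ "$remaining" -gt 0 ]; do
    if [ "$remaining" -gt "$max" ]; then
      chunk=$max
    else
      chunk=$remaining
    fi
    out="$out $chunk"
    remaining=$((remaining - chunk))
  done
  printf '%s\n' "${out# }"
)

split_batch 3200 1500   # sizes of the outgoing batches for 3200 items
split_batch 900 1500    # a batch under the cap is passed through whole
```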
9 changes: 6 additions & 3 deletions templates/prometheus-values.yaml
@@ -39,12 +39,15 @@ serverFiles:
       - job_name: otel-collector
         honor_labels: true
         tls_config:
           insecure_skip_verify: true
         kubernetes_sd_configs:
           - role: pod
         relabel_configs:
-          - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name, __meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_annotation_prometheus_io_scrape, __meta_kubernetes_pod_container_port_number]
+          - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_component, __meta_kubernetes_pod_annotation_prometheus_io_scrape]
             separator: ;
-            regex: test-collector-collector;8888;true;8888
+            regex: opentelemetry-collector;true
             action: keep
+          - source_labels: [__address__]
+            regex: '.*:(431[78])'
+            action: drop
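
The two relabel rules in the `otel-collector` scrape job work as a pair. Prometheus joins the `source_labels` values with the separator, matches the result against a fully anchored regex, and then keeps or drops the target according to `action`. The first rule keeps only pods labelled as `opentelemetry-collector` with scraping enabled; the second drops addresses on ports 4317 and 4318, the default OTLP gRPC and HTTP ports. A rough shell model of that logic follows; `should_scrape` is a hypothetical function written for illustration, not something Prometheus provides.

```shell
#!/bin/sh
# Hypothetical model of the two relabel rules for one discovered target.
# $1 = app.kubernetes.io/component pod label
# $2 = prometheus.io/scrape pod annotation
# $3 = target address (host:port)
should_scrape() (
  component=$1
  scrape=$2
  address=$3

  # Rule 1 (action: keep): join the source labels with ";" and keep the
  # target only if the whole string matches the regex.
  joined="$component;$scrape"
  printf '%s\n' "$joined" | grep -Eqx 'opentelemetry-collector;true' || exit 1

  # Rule 2 (action: drop): discard targets whose address ends in :4317 or :4318.
  if printf '%s\n' "$address" | grep -Eqx '.*:(431[78])'; then
    exit 1
  fi
  exit 0
)

should_scrape opentelemetry-collector true 10.0.0.5:8888 && echo "scraped"
should_scrape opentelemetry-collector true 10.0.0.5:4317 || echo "dropped"
```

`grep -x` is used to mimic the full-string anchoring that Prometheus applies to relabel regexes.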
