The goal of this repo is to see how elaborate an Elasticsearch query can be for our purpose. We will be running a few hundred async jobs simultaneously and sending the logs to the EFK stack.
Each log will be one of the kinds below:
- job start
- action
- job finish
All logs share a fixed format, and for the sake of uniformity axe-logger was used rather than env-logger.
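The exact fixed format lives in the sample application; purely as an illustration, the three kinds could surface in Elasticsearch as documents along these lines (the field names here are hypothetical, not the actual schema):

```json
{"uuid": "job-0001", "kind": "job start", "timestamp": "2021-06-01T00:00:00.000Z"}
{"uuid": "job-0001", "kind": "action", "timestamp": "2021-06-01T00:00:00.250Z"}
{"uuid": "job-0001", "kind": "job finish", "timestamp": "2021-06-01T00:00:00.900Z"}
```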
The goals we want to achieve are:
1. group logs by uuid
2. measure the time between each pair of consecutive logs within a group from 1
3. measure the time between selected logs within a group from 1
4. export the data from 1, 2, and 3 to CSV
Once 1, 2, and 3 are done, how the exported data gets analyzed depends on how we use that data, so manual scripting is necessary. Manual scripting is not the goal of this PoC and is left to the team as further work.
To be as specific as possible: this has only been tested on my local machine running Ubuntu 20.04.
- Run elasticsearch and kibana locally
https://jinhokwon.github.io/devops/elasticsearch/elasticsearch-docker/
docker run -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" --name elasticsearch7 docker.elastic.co/elasticsearch/elasticsearch:7.9.1
docker run -d --link elasticsearch7:elasticsearch -p 5601:5601 --name kibana7 docker.elastic.co/kibana/kibana:7.9.1
- Run minikube
check the minikube default setting
minikube start
- Deploy fluentbit
Since the Fluent Bit pod running inside minikube needs to access the Elasticsearch container outside the minikube Docker network, you need to change the fluent_bit/config.yaml file so that the host points to the actual IP address of host.minikube.internal. In my case it is 192.168.49.1, but it could be different on your machine.
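For orientation, the part of fluent_bit/config.yaml that matters here is the Elasticsearch output section; a rough sketch (Host is the value from my machine, and Index is assumed to match the tracing index deleted later in this README — check your own config.yaml for the real values):

```ini
[OUTPUT]
    Name  es
    Match *
    Host  192.168.49.1
    Port  9200
    Index tracing
```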
kubectl apply -f fluent_bit/role.yaml
kubectl apply -f fluent_bit/config.yaml
kubectl apply -f fluent_bit/daemonset.yaml
- Deploy our sample application in minikube
kubectl apply -f sample.yaml
- Copy and paste queries from the query files to the Kibana dashboard
Query files are located in the queries/ directory; the file numbers correspond to goals 1, 2, and 3 above: grouping logs by uuid, measuring the time between each pair of consecutive logs within a group, and measuring the time between selected logs within a group.
The queries were written in JSON since we will probably use an Elasticsearch client to make API calls for scripting, and in that setting Painless scripting is actually very painful.
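As an illustration of the shape such a query takes, collecting the unique uuids (goal 1) is typically a terms aggregation. The field name `uuid` and the `.keyword` sub-field are assumptions about the mapping, not copied from the queries/ files:

```json
{
  "size": 0,
  "aggs": {
    "unique_uuids": {
      "terms": { "field": "uuid.keyword", "size": 10000 }
    }
  }
}
```

With `"size": 0`, Elasticsearch returns only the aggregation buckets (one per uuid) and skips the hits themselves.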
This may vary based on our needs:
1. send a request to get the unique uuids (using script1)
2. send a request for each unique uuid to get its logs sorted by timestamp (using script2)
3. calculate the necessary data (latency, etc.) based on the data from 2
4. aggregate all the data from 3?
// simple pseudocode (runnable sketch; bodies stubbed with todo!())
struct Log;
struct MeaningfulData;
struct Goal;

fn get_list_of_uuids() -> Vec<String> { todo!() }
fn send_request_for_uuid(uuid: String) -> Vec<Log> { todo!() }
fn calculate_data(logs: Vec<Log>) -> MeaningfulData { todo!() }
fn analyze_meaningful_data(data: Vec<MeaningfulData>) -> Goal { todo!() }

fn main() {
    let uuids = get_list_of_uuids();
    let mut meaningful_datas = vec![];
    for uuid in uuids {
        meaningful_datas.push(calculate_data(send_request_for_uuid(uuid)));
    }
    let _goal = analyze_meaningful_data(meaningful_datas);
    // done
}
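For goals 2 and 3, the calculate_data step reduces to simple timestamp arithmetic once the logs for one uuid arrive sorted. A minimal sketch (the `Log` fields and millisecond timestamps here are assumptions, not the actual log schema):

```rust
// Hypothetical log record; the real fields come from the Elasticsearch hits.
struct Log {
    #[allow(dead_code)]
    kind: String,       // "job start", "action", or "job finish"
    timestamp_ms: u64,  // epoch milliseconds, assumed sorted ascending
}

// Total job duration plus the gap between each consecutive pair of logs.
fn latencies(logs: &[Log]) -> (u64, Vec<u64>) {
    let total = logs.last().map(|l| l.timestamp_ms).unwrap_or(0)
        - logs.first().map(|l| l.timestamp_ms).unwrap_or(0);
    let gaps = logs
        .windows(2)
        .map(|w| w[1].timestamp_ms - w[0].timestamp_ms)
        .collect();
    (total, gaps)
}

fn main() {
    let logs = vec![
        Log { kind: "job start".into(), timestamp_ms: 1_000 },
        Log { kind: "action".into(), timestamp_ms: 1_250 },
        Log { kind: "job finish".into(), timestamp_ms: 1_900 },
    ];
    let (total, gaps) = latencies(&logs);
    println!("total={}ms gaps={:?}", total, gaps); // total=900ms gaps=[250, 650]
}
```

Measuring time between selected logs only (goal 3) is the same arithmetic after filtering `logs` by `kind`.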
- Delete the existing index with
curl -X DELETE 'http://localhost:9200/tracing'
The role of this demo is to show how this PoC could function as a tool for analyzing data:
- send a query to get the unique job_ids
- get the list of unique job_ids and compare (maybe make a validator)
- send a request for each job_id and format the data
- calculate the data we want and save it
- visualize
cargo run --bin demo
Then the result will both be saved as a CSV file and printed to the console.
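The CSV export side (goal 4) needs nothing beyond the standard library; a sketch with hypothetical column names (uuid, total_ms, action_count), not the demo's actual output format:

```rust
use std::fs::File;
use std::io::Write;

// Write one row per job: uuid, total duration in ms, number of action logs.
// The columns are illustrative; pick whatever the analysis needs.
fn write_csv(path: &str, rows: &[(String, u64, usize)]) -> std::io::Result<()> {
    let mut f = File::create(path)?;
    writeln!(f, "uuid,total_ms,action_count")?;
    for (uuid, total_ms, actions) in rows {
        writeln!(f, "{},{},{}", uuid, total_ms, actions)?;
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    write_csv("jobs.csv", &[("job-0001".to_string(), 900, 3)])
}
```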
Reference on from/size pagination: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-from-size.html