This repository has been archived by the owner on Mar 30, 2023. It is now read-only.
Francesco Poldi edited this page Mar 30, 2018 · 8 revisions

Elasticsearch How-To

Zero Point

First of all, you have to download the two main tools: Elasticsearch and Kibana.

Important notes:

  1. Elasticsearch requires at least Java 8; the Oracle JDK version 1.8.0_131 is recommended;
  2. Starting with version 6.0.0, Kibana only supports 64-bit operating systems, so if you are using an earlier version you should upgrade, or simply create the index before indexing data.

Elasticsearch is basically a search engine and Kibana is a tool for data visualization. We will index some data into the former and build a dashboard with the latter.

Now everything is ready to go.

Initial setup

Since it is Kibana that connects to Elasticsearch, let's run Elasticsearch first.

Expected Elasticsearch's output:

[2018-03-30T17:32:46,525][INFO ][o.e.n.Node] [T7Twj0J] started

Expected Kibana's output:

log [15:45:50.267] [info][status][plugin:elasticsearch@6.2.2] Status changed from yellow to green - Ready

If you are not getting these outputs, I suggest you dig into the corresponding documentation.
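If you would rather check both services from a script than read the logs, a minimal Python sketch could look like the one below. It assumes the default addresses used in this guide (localhost:9200 for Elasticsearch and localhost:5601 for Kibana) and no authentication; the helper name is_up is made up for illustration, and it only tests that an HTTP 200 comes back, not that the cluster is healthy.

```python
import urllib.request

# Default addresses used in this guide; adjust them if you changed the configs.
SERVICES = {
    "elasticsearch": "http://localhost:9200",
    "kibana": "http://localhost:5601",
}


def is_up(url, timeout=3.0):
    """Return True if an HTTP GET on `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts and urllib's URLError/HTTPError.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        state = "is up" if is_up(url) else "is NOT reachable"
        print(f"{name} {state} at {url}")
```

A False result only says the service did not answer on that address; the logs above remain the place to look for the reason.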

Now that everything is up and running:

  1. Index some data: python3.6 tweep.py --elasticsearch localhost:9200 -u user --database tweep.db (the --database argument is optional; --elasticsearch is mandatory, and the value shown here is the default setting, as in our case);

  2. Now we can create the index (which I have already prepared): open your browser and go to http://localhost:5601 (again, this is the default value), open the Dev Tools tab, copy and paste the contents of index.json, then click the green arrow. The expected output is:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "tweep"
}
  3. Go to the Management tab, Index Patterns, Create Index Pattern, enter tweep as the Index Pattern and choose datestamp as the time field;

  4. Go to the Discover tab, choose tweep and you should see something like this:

[Screenshot: the Discover tab with tweep selected]
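The index can also be created without the Dev Tools console, by sending the same body to Elasticsearch over HTTP. The sketch below is only an illustration under a few assumptions: that Elasticsearch listens on localhost:9200, that the settings and mappings from index.json are available as a plain JSON object (the file itself is not reproduced here), and that the function names are hypothetical. It builds the PUT request and checks the reply against the expected output shown above.

```python
import json
import urllib.request


def build_create_index_request(body, index="tweep", host="localhost:9200"):
    """Build (but do not send) the PUT request that creates `index`."""
    return urllib.request.Request(
        url=f"http://{host}/{index}",
        data=json.dumps(body).encode("utf-8"),
        # Elasticsearch 6.x rejects request bodies without a content type.
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def is_acknowledged(response_text, index="tweep"):
    """Check a create-index reply against the expected output."""
    reply = json.loads(response_text)
    return (
        reply.get("acknowledged") is True
        and reply.get("shards_acknowledged") is True
        and reply.get("index") == index
    )


def create_index(body, index="tweep", host="localhost:9200"):
    """Send the request; returns True when Elasticsearch acknowledges it."""
    request = build_create_index_request(body, index=index, host=host)
    with urllib.request.urlopen(request) as response:
        return is_acknowledged(response.read().decode("utf-8"), index=index)
```

Calling create_index(body) with the settings and mappings taken from index.json should return True on a fresh node; if the index already exists, Elasticsearch answers with an HTTP error and urlopen raises instead.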

Visualizations setup

We now have some data to play with, but we need to visualize it to get some meaning out of it.

Here is a histogram based on daily activity:

[Screenshot: daily activity histogram]

How to:

  • Visualize tab, then the blue + symbol;
  • Vertical bar;
  • select tweep;
  • X-Axis, Aggregation: Terms, Field: hour, Order By: Term, Order: Ascending, Size: 24;
  • click on Add sub-buckets, Split Series, Sub Aggregation: Terms, Field: username, Order By: Count, Order: Descending (or Ascending, depending on your needs), Size: 5 (for the top 5 or the "worst" 5);
  • click on the blue arrow.

You can do the same for weekly activity: just replace the X-Axis settings Field: hour and Size: 24 with Field: day and Size: 7.
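For reference, the settings above correspond roughly to a nested terms aggregation in Elasticsearch. The Python sketch below builds such a search body so you can see what the visualization asks for; it is not the exact request Kibana generates. It assumes Elasticsearch 6.x (where terms are ordered by _key) and that username is aggregatable in your mapping; the function and bucket names are invented for illustration.

```python
import json


def activity_aggregation(field="hour", size=24, top_users=5, descending=True):
    """Build a search body mirroring the vertical bar settings."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            "activity": {
                # X-Axis: Terms on `field`, Order By: Term, Order: Ascending
                "terms": {"field": field, "size": size, "order": {"_key": "asc"}},
                "aggs": {
                    "users": {
                        # Split Series: Terms on username, Order By: Count
                        "terms": {
                            "field": "username",
                            "size": top_users,
                            "order": {"_count": "desc" if descending else "asc"},
                        }
                    }
                },
            }
        },
    }


daily = activity_aggregation()                      # Field: hour, Size: 24
weekly = activity_aggregation(field="day", size=7)  # Field: day, Size: 7
print(json.dumps(weekly, indent=2))
```

Either body can be pasted into the Dev Tools console after a GET tweep/_search line to compare the buckets with what the chart shows.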

Pie Charts for top users: you can base this on likes, retweets and replies.

Following the previous steps:

  • create a Pie chart;
  • Split Slices, Aggregation: Terms, Field: username;
  • etc...

Important: write _exists_:likes, _exists_:retweets or _exists_:replies as the query, so that only documents carrying the field you are ranking on are counted.

You should see something like this:

[Screenshot: pie chart of top users by likes]
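The pie chart can be thought of as a filtered terms aggregation. The Python sketch below combines the _exists_ query with a terms aggregation on username. Several things here are assumptions: the steps above leave the remaining settings open ("etc..."), so the sketch uses the default document count as slice size and an arbitrary Size of 5, and the function and bucket names are hypothetical.

```python
RANK_FIELDS = ("likes", "retweets", "replies")


def top_users_query(rank_field="likes", size=5):
    """Build a search body for a 'top users' pie chart on `rank_field`."""
    if rank_field not in RANK_FIELDS:
        raise ValueError(f"rank_field must be one of {RANK_FIELDS}")
    return {
        "size": 0,  # buckets only, no individual documents
        # Same filter as typing _exists_:<field> as the Kibana query.
        "query": {"query_string": {"query": f"_exists_:{rank_field}"}},
        "aggs": {
            # Split Slices: Terms on username (slice size = document count).
            "top_users": {"terms": {"field": "username", "size": size}}
        },
    }


likes_body = top_users_query("likes")
print(likes_body["query"]["query_string"]["query"])
```

Swapping "likes" for "retweets" or "replies" gives the bodies for the other two charts.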

Dashboard setup

Pretty easy:

  • select Dashboard tab;
  • create new one;
  • add previously created visualizations.

Now you have a basic setup. More is about to come.