
SS_Yab

Tasks

  • Sahamyab Crawler - NSQ Producer
  • Tweet Preprocessing - Cassandra DBMS
  • Elastic - Kibana dashboard - Redis
  • Flask dashboard
  • ML model
  • Clickhouse DBMS - Superset visualization

Prerequisites

NSQ
pynsq (pip package)
colorama (pip package)
requests (pip package)
openjdk-8
Cassandra
cassandra-driver (pip package)
hazm (pip package)
nltk (pip package)
elasticsearch (pip package)
redis (pip package)
wordcloudfa (pip package)
jwt (pip package)
psutil (pip package)
flask-login(pip package)
docker

Installing

- NSQ

1- Download the latest NSQ binaries from the official NSQ site.
2- Extract the archive and add its bin folder to your system PATH variable.
3- Install the prerequisite libraries:

$ pip install pynsq colorama requests
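
For reference, here is a minimal producer sketch (not the project's sahamyab_producer.py) that publishes messages to nsqd over its HTTP API using requests. The topic name, the Sahamyab endpoint URL, and the "items" response key are assumptions:

import json
import time
import requests

NSQD_HTTP = "http://127.0.0.1:4151"  # nsqd's default HTTP port
TOPIC = "tweets"                     # assumed topic name
API_URL = "https://www.sahamyab.com/guest/twiter/list?v=0.1"  # assumed endpoint

def fetch_tweets():
    # Some public APIs reject requests without a browser-like User-Agent.
    resp = requests.get(API_URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    resp.raise_for_status()
    return resp.json().get("items", [])

def publish(tweet):
    # POST the serialized tweet to nsqd's /pub endpoint for the topic.
    requests.post(NSQD_HTTP + "/pub", params={"topic": TOPIC},
                  data=json.dumps(tweet).encode("utf-8"), timeout=5)

if __name__ == "__main__":
    seen = set()
    while True:
        for tweet in fetch_tweets():
            if tweet.get("id") not in seen:  # naive de-duplication
                seen.add(tweet.get("id"))
                publish(tweet)
        time.sleep(5)  # poll interval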

- Preprocess

1- Install the prerequisite libraries:

$ pip install hazm  # nltk is installed automatically as a dependency
$ pip install https://github.com/sobhe/hazm/archive/master.zip --upgrade

2- Download the NLP prerequisite resource.zip and extract it in the project folder.
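
As a rough illustration of what the preprocessing step does with hazm (the actual pipeline lives in the project's preprocessing code; the cleaning rules here are assumptions):

import re
from hazm import Normalizer, word_tokenize, Stemmer

normalizer = Normalizer()
stemmer = Stemmer()

def preprocess(text):
    text = normalizer.normalize(text)          # unify Persian characters and spacing
    text = re.sub(r"https?://\S+", " ", text)  # strip URLs
    tokens = word_tokenize(text)
    return [stemmer.stem(t) for t in tokens]

print(preprocess("این یک متن آزمایشی است http://example.com"))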

- Cassandra

1- Install OpenJDK 8:

$ sudo apt-get install openjdk-8-jdk
$ export JAVA_HOME=path_to_java_home

2- Install Cassandra:

$ echo "deb https://downloads.apache.org/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
$ curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install cassandra

3- Install Cassandra Python Driver (cassandra-driver):

$ pip install cassandra-driver
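
A minimal sketch of storing a tweet with cassandra-driver (the keyspace, table, and columns are illustrative, not the project's actual schema):

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])  # local node started with `cassandra -R`
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sahamyab
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("sahamyab")
session.execute("""
    CREATE TABLE IF NOT EXISTS tweets (
        id text PRIMARY KEY,
        send_time text,
        content text
    )
""")

session.execute(
    "INSERT INTO tweets (id, send_time, content) VALUES (%s, %s, %s)",
    ("1", "2020-07-01T12:00:00", "sample tweet"),
)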

- Elasticsearch & Kibana

1- Install Elasticsearch & Kibana (we are using version 7.8.0).

For Ubuntu, follow these steps:

  • Elasticsearch:
$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
$ sudo apt-get install apt-transport-https
$ echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
$ sudo apt-get update && sudo apt-get install elasticsearch
  • Kibana:
$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
$ sudo apt-get install apt-transport-https
$ echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
$ sudo apt-get update && sudo apt-get install kibana

2- Install the Python Elasticsearch client:

$ python -m pip install elasticsearch
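
A minimal sketch of indexing and searching a tweet with the 7.x client (the index name and document fields are illustrative):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

doc = {"id": "1", "send_time": "2020-07-01T12:00:00", "content": "sample tweet"}
es.index(index="tweets", id=doc["id"], body=doc)

hits = es.search(index="tweets", body={"query": {"match": {"content": "tweet"}}})
print(hits["hits"]["total"])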

- Redis

1- Install Redis

$ sudo apt update
$ sudo apt install redis-server

2- In the Redis config file (/etc/redis/redis.conf), change supervised no to supervised systemd so Redis runs at system start-up. Then restart the server:

$ sudo systemctl restart redis.service

3- Install redis-py:

$ pip install redis
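
A minimal sketch of a possible Redis use in this pipeline, counting symbol mentions in a sorted set (the key name and use case are assumptions):

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

r.zincrby("symbol_counts", 1, "شستا")                      # bump a symbol's counter
top = r.zrevrange("symbol_counts", 0, 9, withscores=True)  # top 10 symbols
print(top)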

- Clickhouse

Create and run the ClickHouse container:

$ docker network create -d bridge sahamyab
$ docker run -d -p 8123:8123 -p 9000:9000 --network="sahamyab" --name clickhouse --ulimit nofile=262144:262144 yandex/clickhouse-server
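
ClickHouse also exposes an HTTP interface on port 8123, so you can talk to the container with requests alone; a minimal sketch (table name and schema are illustrative):

import requests

CH_URL = "http://localhost:8123"

def query(sql):
    resp = requests.post(CH_URL, data=sql)
    resp.raise_for_status()
    return resp.text

query("CREATE TABLE IF NOT EXISTS tweets "
      "(id String, send_time DateTime, content String) "
      "ENGINE = MergeTree() ORDER BY send_time")
query("INSERT INTO tweets VALUES ('1', '2020-07-01 12:00:00', 'sample tweet')")
print(query("SELECT count() FROM tweets"))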

- Apache Superset

Create and run the Superset container:

$ docker run --detach -p 8080:8088 --name superset --network="sahamyab" amancevice/superset

Usage

1- In one shell, start nsqlookupd:

$ nsqlookupd

2- In another shell, start nsqd:

$ nsqd --lookupd-tcp-address=127.0.0.1:4160

3- In another shell, start nsqadmin:

$ nsqadmin --lookupd-http-address=127.0.0.1:4161

4- Now run the Sahamyab tweet crawler/producer:

$ python sahamyab_producer.py

5- You can run an example program for consuming tweets:

$ python sahamyab_consumer_example.py
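
A minimal pynsq consumer sketch (the topic name is an assumption; the project's consumer.py does its own processing):

import nsq

def handler(message):
    print(message.body.decode("utf-8"))  # raw JSON published by the producer
    return True                          # returning True marks the message as finished

reader = nsq.Reader(
    message_handler=handler,
    lookupd_http_addresses=["http://127.0.0.1:4161"],
    topic="tweets",    # assumed topic name
    channel="example",
    lookupd_poll_interval=15,
)
nsq.run()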

Note: to use consumer.py you must first start the following services:
Cassandra:

$ sudo cassandra -R

Elasticsearch:

$ sudo /bin/systemctl daemon-reload
$ sudo /bin/systemctl enable elasticsearch.service
$ sudo systemctl start elasticsearch.service

Kibana:

$ sudo /bin/systemctl daemon-reload
$ sudo /bin/systemctl enable kibana.service
$ sudo systemctl start kibana.service

You also need to import dashboard.ndjson into Kibana (Saved Objects).

Redis:
If you applied the config change above, Redis should already be running; if not:

$ sudo systemctl start redis.service

Clickhouse:

$ python3 clickhouse_consumer.py

Superset:

Go to localhost:8080

Add ClickHouse under Sources -> Databases using the URI clickhouse://clickhouse, and add the sahamyab table under Sources -> Tables.

Go to Manage -> Import Dashboards and import superset_dashboard.json, found in the resources folder of the project.

Flask dashboard:

$ cd flask_dashboard
$ sudo python3 run.py

License

This project is licensed under the GPLv2 - see the LICENSE.md file for details.
