BigData

This is an attempt to create a basic library for Big Data in Python.

Plan

web crawler -> cluster -> map-reduce -> NoSQL -> visualization

Execution

Scrapy -> HDFS -> Spark -> Elasticsearch -> React (JS) client

TODO: web crawler (Scrapy) -> cluster (HDFS) -> map-reduce (Spark) -> NoSQL (Elasticsearch) -> SQL (SQLite) -> visualization (matplotlib)

Projects

moto prices

Pseudo Code

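for site in sites_list:
    for div_element in site.div_elements():
        recurse(div_element)            # descend into nested divs
    if div_element is None:             # leaf reached: no nested divs remain
        for element in html_element.text():
            type = diagnose(element)    # filter/diagnose which column the value belongs to
            sql.insert("INSERT INTO moto_list (type, value) VALUES (?, ?);", type, element)
HDFS.save_file("moto_list.db")
json = Spark.process(HDFS.get("moto_list.db"))
Elastic.save(json)
react.fetch(json).visualize()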
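
The first half of the pseudo code (walk the page, classify each text value, insert it into SQL) can be tried without a cluster. The sketch below is one possible reading of it, using only the Python standard library (html.parser and sqlite3) instead of Scrapy. The names TextCollector, diagnose, scrape_to_sqlite, the moto_list table, and the sample HTML are all assumptions made for illustration; the HDFS, Spark, and Elasticsearch steps are left out because they need running services.

```python
import sqlite3
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text found inside <div> and <span> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of div/span elements currently open
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("div", "span"):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("div", "span") and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.texts.append(text)


def diagnose(text):
    """Hypothetical placeholder: guess which column a scraped value belongs to."""
    if text.endswith("$"):
        return "price"
    if text.isdigit() and len(text) == 4:
        return "year"
    return "model"


def scrape_to_sqlite(html, conn):
    """Parse one page and insert every (type, value) pair into moto_list."""
    collector = TextCollector()
    collector.feed(html)
    conn.execute("CREATE TABLE IF NOT EXISTS moto_list (type TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO moto_list (type, value) VALUES (?, ?)",
        [(diagnose(t), t) for t in collector.texts],
    )
    conn.commit()
    return collector.texts


# Made-up page fragment, only to exercise the functions above.
SAMPLE_HTML = "<div><div><span>kawasaki</span><span>2002</span></div><span>200$</span></div>"
conn = sqlite3.connect(":memory:")
scrape_to_sqlite(SAMPLE_HTML, conn)
rows = conn.execute("SELECT type, value FROM moto_list ORDER BY rowid").fetchall()
print(rows)
```

Using a parameterized INSERT (the ? placeholders) keeps dirty scraped strings from breaking the SQL statement. Connecting to a file path instead of ":memory:" would produce the moto_list.db file that the later steps upload to HDFS.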

Tasks

iterate over every span
filter by value (even if the value is dirty)
insert into SQL according to the filter
save the SQL database to HDFS
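
The filtering task above (classify each value even when it is dirty) could look roughly like the sketch below. It is only an assumption about what the filter might do: the column names follow the year/cc/price/color/model example in this README, while the regular expressions, the COLOR_NAMES set, and the classify function are hypothetical. A bare number such as 400 is ambiguous (it could be cc or a price), so this sketch lets it fall through to "model"; a real site would need the surrounding label to decide.

```python
import re

# Hypothetical patterns for illustration only.
HEX_COLOR = re.compile(r"^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$")
COLOR_NAMES = {"white", "black", "red", "blue", "green", "yellow", "silver"}


def classify(raw):
    """Return (column, cleaned_value) for one scraped string, or (None, None)."""
    text = " ".join(raw.split())          # collapse stray whitespace and newlines
    if not text:
        return None, None
    lowered = text.lower()
    if HEX_COLOR.match(text) or lowered in COLOR_NAMES:
        return "color", lowered
    if "$" in text:
        digits = re.sub(r"[^\d]", "", text)   # drop "$", commas, spaces
        return ("price", int(digits)) if digits else (None, None)
    cc_match = re.fullmatch(r"(\d+)\s*cc", lowered)
    if cc_match:
        return "cc", int(cc_match.group(1))
    if re.fullmatch(r"(19|20)\d{2}", text):
        return "year", int(text)
    return "model", lowered


for sample in ["  200$ ", "#FFF", "400 cc", "2002", " Kawasaki\n", "   "]:
    print(repr(sample), "->", classify(sample))
```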

find a site with moto prices
scrape models & prices
save to HDFS
map-reduce: process, mine, analyze (ML?)
save to Elasticsearch
serve with Flask
visualize in React

model => range of prices

index	year	cc	price	color	model
1	2002	400	200$	#FFF	kawasaki
2	2003	200			ninja
3	2002	400		white
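
The "model => range of prices" goal can be sketched as a small map-reduce in plain Python before moving it to Spark. Everything below is illustrative: the records list mixes the one priced row from the table with made-up values, and map_record and reduce_ranges are hypothetical names. Rows with a missing model or price, like rows 2 and 3 in the table, are skipped.

```python
from functools import reduce

# Illustrative records only; they are not real scraped data.
records = [
    {"model": "kawasaki", "price": "200$"},
    {"model": "kawasaki", "price": "350$"},   # made-up value for the example
    {"model": "ninja", "price": None},        # missing price: skipped
    {"model": None, "price": None},           # missing model: skipped
    {"model": "ninja", "price": "500$"},      # made-up value for the example
]


def map_record(record):
    """Map step: emit (model, (price, price)), or nothing for incomplete rows."""
    model, price = record.get("model"), record.get("price")
    if not model or not price:
        return []
    value = int(price.rstrip("$"))
    return [(model, (value, value))]


def reduce_ranges(acc, pair):
    """Reduce step: widen the (min, max) range stored for each model."""
    model, (low, high) = pair
    if model in acc:
        cur_low, cur_high = acc[model]
        acc[model] = (min(cur_low, low), max(cur_high, high))
    else:
        acc[model] = (low, high)
    return acc


pairs = [pair for record in records for pair in map_record(record)]
price_ranges = reduce(reduce_ranges, pairs, {})
print(price_ranges)
```

In PySpark the same shape would likely map onto an RDD map followed by reduceByKey, with the (min, max) merge as the reducing function.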
