RT-UBER-NYC-TAXI

What's New (06/04/2018)

  • All of the API clients, i.e. Uber, Lyft, and QueryAPIandStoreCSV, have been rewritten to use the RxJava2 constructs Flowable and Single, with retry support and delayed propagation of stream errors (see the sketch after this list).
  • Requests are scheduled on Schedulers.io().
  • No messy Executors or callback hell.
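
A minimal sketch of that pattern, assuming RxJava2 on the classpath; fetchEstimate is a hypothetical stand-in for the real Uber/Lyft client calls, not code from this repository:

```java
import io.reactivex.Flowable;
import io.reactivex.Single;
import io.reactivex.schedulers.Schedulers;

public class PriceQuerySketch {

    // Hypothetical stand-in for a real client call; the actual project
    // wraps the Uber/Lyft HTTP clients instead of returning a placeholder.
    static Single<String> fetchEstimate(String provider) {
        return Single.fromCallable(() -> provider + ": <price estimate>")
                .subscribeOn(Schedulers.io()) // run the request on the io() pool
                .retry(3);                    // retry transient failures up to 3 times
    }

    public static void main(String[] args) {
        // mergeDelayError queries both providers in parallel and holds back
        // any error until the other stream has delivered its result.
        Flowable.mergeDelayError(
                fetchEstimate("uber").toFlowable(),
                fetchEstimate("lyft").toFlowable())
            .blockingSubscribe(System.out::println, Throwable::printStackTrace);
    }
}
```

Here mergeDelayError is what delays the stream errors: a failure from one provider does not cut off the other provider's results.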

General Information

  • This project contains the code to query the Uber API for price information.
  • It also contains the code for the Lyft API.
  • It also contains Spark and MLlib code to build predictive models.
  • Each directory holds a specific part of the project:
  • lyft-client contains Java code to invoke the Lyft API. It has support for rate limiting (see the sketch after this list).
  • uber-client contains Java code to invoke the Uber API. It also has support for rate limiting.
  • QueryAPIandStoreCSV contains the code to invoke both the Uber and Lyft APIs in parallel.
  • flume contains the flume script.
  • lyft-analytics contains Spark and MLlib analytics code for the Lyft data source.
  • uber-analytics contains Spark and MLlib analytics code for the Uber data source.
  • nyc-yellow-taxi-analytics contains Spark and MLlib analytics code for the NYC yellow taxi data source.
  • data contains part of the data obtained by querying the Uber and Lyft APIs.
  • scripts contains the Hive and Impala queries used for filtering the data.
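
The clients' actual rate-limiting code is not reproduced here; the sketch below shows one way such throttling could look, using Guava's RateLimiter (an assumption for illustration, not necessarily the mechanism lyft-client and uber-client use):

```java
import com.google.common.util.concurrent.RateLimiter;

public class ThrottledClientSketch {

    // Illustrative limit of 2 requests per second; the real ceiling would
    // depend on the Uber/Lyft API quotas.
    private final RateLimiter limiter = RateLimiter.create(2.0);

    public String getEstimate(String endpoint) {
        limiter.acquire();        // blocks until a permit is available
        return callApi(endpoint); // hypothetical HTTP call
    }

    // Placeholder for the real HTTP request.
    private String callApi(String endpoint) {
        return "response from " + endpoint;
    }

    public static void main(String[] args) {
        ThrottledClientSketch client = new ThrottledClientSketch();
        for (int i = 0; i < 5; i++) {
            System.out.println(client.getEstimate("/v1/estimates/price"));
        }
    }
}
```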

Running the code

  • To test the Uber and Lyft APIs, you can run the main class in lyft-client (LyftClientUtilTest.java) or uber-client (UberClientUtil.java). These jars are not runnable on their own.
  • Remember that you still have to fill in the values for the property keys in the resources folder of each client.
  • The main code that invokes the above APIs to gather data is under RT-UBER-NYC-TAXI/QueryAPIAndStoreCSV/build/libs/QueryAPIAndStoreCSV1-0.jar, which you can run. You need to specify a sample data file located in the data folder, for example: java -jar QueryAPIandStoreCSV-1.0.jar -d data -f query-data.csv
  • You can then create tables in Hive using db_scripts.sql.
  • Flume is not required, but you can use it. For flume, just run the flume agent: flume-ng agent --conf conf -f flume.conf -n flume-hive-ingest
  • Running the Spark code for any of the three modules likewise requires that you create the Hive tables and load the data; the scripts directory contains the necessary scripts. After that, run the Test*.class in each project. On the first run the model is trained on that data and stored; from the next call onwards execution is fast, since the stored model is used for predictions (see the sketch after this list).
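
The train-once, load-thereafter behavior could look roughly like the sketch below. This is a minimal illustration assuming Spark MLlib's LinearRegressionModel and toy in-memory data; it is not the actual model type or loading code used by the three modules, where training data comes from the Hive tables instead:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.regression.LinearRegressionModel;
import org.apache.spark.mllib.regression.LinearRegressionWithSGD;

public class ModelCacheSketch {

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("fare-model").setMaster("local[*]"));
        String modelPath = "models/fare-model"; // hypothetical location

        LinearRegressionModel model;
        try {
            // Fast path: reuse the model stored by an earlier run.
            model = LinearRegressionModel.load(sc.sc(), modelPath);
        } catch (Exception firstRun) {
            // Slow path: train once on the (here toy) data, then persist.
            model = LinearRegressionWithSGD.train(trainingData(sc).rdd(), 100);
            model.save(sc.sc(), modelPath);
        }

        // The stored model can now serve predictions cheaply.
        System.out.println(model.predict(Vectors.dense(2.0, 1.0)));
        sc.stop();
    }

    // Toy data; in the real modules this would come from the Hive tables.
    private static JavaRDD<LabeledPoint> trainingData(JavaSparkContext sc) {
        return sc.parallelize(Arrays.asList(
                new LabeledPoint(9.5, Vectors.dense(1.2, 0.0)),
                new LabeledPoint(14.0, Vectors.dense(2.5, 1.0))));
    }
}
```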

About

Analysis on Uber and NYC taxi fares
