Taraana30

Demo


Taraana30 is a web scraper that calculates the weekly Top 30 Bollywood Songs from various platforms: radiomirchi.com, gaana.com, hungama.com, jiosaavn.com, and wynk.in.

The Web Scraper scrapes each platform's list/playlist for the data and saves it in the data folder: ./data/<date of previous week's saturday>/ (more on <date of previous week's saturday> later).

Usage

The Web Scraper provides 2 main commands:

  1. To get just top_30.csv and candidates.csv:
    $ python main.py

Run the above command from the root of the folder to produce the top_30.csv and candidates.csv files in the data folder: ./data/<date of previous week's saturday>/top_30.csv and ./data/<date of previous week's saturday>/candidates.csv

  2. To get all .csv files from the scraper:
    $ python main.py --all

Run the above command from the root of the folder to produce:

  • candidates.csv
  • top_30.csv
  • gaana.csv
  • hungama.csv
  • mirchi.csv
  • saavn.csv
  • wynk.csv

in the ./data/<date of previous week's saturday>/ folder, creating the folder first if it does not already exist.

<date of previous week's saturday>: The scraper's folder naming scheme uses the date of the previous Saturday to distinguish between weeks. The date is in the format DD-MM-YYYY. (The week changes on Sunday 00:00:00, i.e., even if the script is run on a Saturday, the folder name will be the date of the previous Saturday.)
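
For reference, here is a minimal sketch (not the repository's code) of how the previous Saturday's date can be derived; the function name previous_saturday is only illustrative:

from datetime import date, timedelta

def previous_saturday(today=None):
    # weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
    today = today or date.today()
    # Most recent Saturday strictly before today; a Saturday run goes back a full week
    days_back = (today.weekday() - 5) % 7 or 7
    return (today - timedelta(days=days_back)).strftime("%d-%m-%Y")

print(previous_saturday(date(2019, 6, 30)))  # Sunday   -> 29-06-2019
print(previous_saturday(date(2019, 6, 29)))  # Saturday -> 22-06-2019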

Example:

./
├── data
    ├── 22-06-2019
    │   ├── top_30.csv
    │   └── candidates.csv
    └── 29-06-2019
        ├── top_30.csv
        ├── candidates.csv
        ├── gaana.csv
        ├── hungama.csv
        ├── mirchi.csv
        ├── saavn.csv
        └── wynk.csv

Installation

Currently the tool is only available as a GitHub repository and can only be used from there.

  1. Fork and Clone to your machine

  2. Activate the pipenv shell: pipenv shell

  3. Run pipenv install --ignore-pipfile to install all dependencies on your machine. The main dependencies are:

    • beautifulsoup4
    • requests
    • lxml

Technical Aspects

  • Language: Python v3.7.3

  • Scraping Module: BeautifulSoup v4.7.1 with lxml parser

  • I/O request Module: requests v2.22.0

  • Misc: This project is a collection of 5 scrapers, one for each platform:

    • radiomirchi.com
    • gaana.com
    • hungama.com
    • jiosaavn.com
    • wynk.in

    To speed up the scraping process, particularly to hide the delay of the various I/O requests that fetch the page source from each platform, multi-threading is used: 1 thread per scraper (i.e., a thread pool of 5 threads), as sketched below.
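
    A minimal sketch of this thread-pool pattern (the scrape function and its return value are placeholders, not the repository's actual API):

    from concurrent.futures import ThreadPoolExecutor

    def scrape(platform):
        # Placeholder for a real per-platform scraper (requests + BeautifulSoup)
        return platform, []

    platforms = ["mirchi", "gaana", "hungama", "saavn", "wynk"]

    # One thread per scraper, so the five I/O-bound requests overlap
    with ThreadPoolExecutor(max_workers=5) as pool:
        results = dict(pool.map(scrape, platforms))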

Note: The data/22-06-2019 folder in the repository is just an example/sample folder, included so you can see the output of the script. The .csv files produced by the scraper are not meant to be committed (they are ignored via the *.csv entry in the .gitignore file). If you want to keep the generated .csv files under version control, remove *.csv from the .gitignore file.

Using as a Package

When using Taraana30 as a package, import the main module and call the taraana30() function on it.

from Taraana import main

# This function will just write the .csv files and will not return anything
main.taraana30()

The taraana30() function takes 1 optional argument, all_files, which controls which .csv files are created:

  • all_files=True: this is the same as passing the --all argument to the script from the terminal (see the example below)
  • all_files=False: (default) this is the same as running the script without any argument
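
For example, to generate all the .csv files from Python (equivalent to python main.py --all):

from Taraana import main

# Same as running `python main.py --all` from the terminal
main.taraana30(all_files=True)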

Note: The taraana30() function creates the ./data/<date of previous week's saturday>/ folder relative to the current working directory of the script it is called from (it uses the os.getcwd() function to find the current working directory and creates the ./data/<date of previous week's saturday>/ directory inside it).
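
A quick illustration of where the output lands (the /home/user/project path is hypothetical):

import os
from Taraana import main

os.chdir("/home/user/project")  # hypothetical working directory
main.taraana30()                # writes to /home/user/project/data/<date of previous week's saturday>/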


