Mine-Youtube-Data

Code that mines data from the YouTube comment sections of any playlist.

A Data Mining Script

Note: This is not an application or a module to be downloaded and used as is. It is more of a blueprint for mining data. I created it for my own use, but it is public, and anyone can use it if they can figure out how. To use it, one has to read through the code and understand it, because the code must be supplied with a private API key.

This uses the YouTube Data API. Data must be collected over the course of several days, since the YouTube Data API has a daily quota. get_data.py is to be run each day, once the quota has renewed, until all data has been collected. The given key is used until its quota is exhausted each day, unless there is no more data to collect. Do not change the key and run again before cleaning the state. Doing so may result in the key getting banned from using the YouTube Data API.
For reference, quota exhausted.json contains the response given by YouTube when the quota has been used up, and comments disabled.json contains the response given by YouTube when the comments for a video are disabled.
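The two reference responses suggest a simple decision per failed request: stop for the day, or skip the video. Below is a minimal sketch of that decision, not the logic in get_data.py. It assumes the error body has the usual Google API shape (an "error" object holding an "errors" list whose entries carry a "reason") and that the reasons are "quotaExceeded" and "commentsDisabled"; check both against the two JSON files in this repository. The function names and the sample payloads are hypothetical.

```python
import json


def first_error_reason(response_text):
    """Return the first 'reason' found in an API error body, or None."""
    try:
        body = json.loads(response_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(body, dict):
        return None
    error = body.get("error")
    if not isinstance(error, dict):
        return None
    for entry in error.get("errors", []):
        if isinstance(entry, dict) and entry.get("reason"):
            return entry["reason"]
    return None


def next_action(response_text):
    """Map an error body to 'stop', 'skip', or 'unknown'."""
    reason = first_error_reason(response_text)
    if reason == "quotaExceeded":      # assumed reason: daily quota used up
        return "stop"
    if reason == "commentsDisabled":   # assumed reason: video has no comments
        return "skip"
    return "unknown"


# Illustrative payloads only; the real ones are in the repository's JSON files.
QUOTA_SAMPLE = json.dumps(
    {"error": {"code": 403, "errors": [{"reason": "quotaExceeded"}]}}
)
DISABLED_SAMPLE = json.dumps(
    {"error": {"code": 403, "errors": [{"reason": "commentsDisabled"}]}}
)

if __name__ == "__main__":
    print(next_action(QUOTA_SAMPLE))
    print(next_action(DISABLED_SAMPLE))
```

Both cases come back as HTTP 403, so the status code alone cannot tell them apart; the reason string is what separates "come back tomorrow" from "move on to the next video".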
get_data.py does the actual data mining; cab_1729.py contains all the details of what to mine.
to_json.py converts the data to a more human-readable format. It does not convert all the data, however: images are also stored in the data files, and they are not converted to JSON.
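As an illustration of why images are left out, here is a minimal sketch of a JSON export that drops binary values. It is not the code in to_json.py. It assumes images are held as bytes objects inside nested dicts and lists, which the README does not confirm; the record layout and function names are hypothetical.

```python
import json

_SKIP = object()  # sentinel marking a value that should be left out


def strip_binary(value):
    """Return a copy of value with bytes-like entries removed."""
    if isinstance(value, (bytes, bytearray)):
        return _SKIP
    if isinstance(value, dict):
        cleaned = {}
        for key, item in value.items():
            result = strip_binary(item)
            if result is not _SKIP:
                cleaned[str(key)] = result  # JSON keys must be strings
        return cleaned
    if isinstance(value, (list, tuple)):
        results = (strip_binary(item) for item in value)
        return [r for r in results if r is not _SKIP]
    return value


def dump_readable(data):
    """Serialize data as indented JSON, without binary fields."""
    return json.dumps(strip_binary(data), indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical record layout, for illustration only.
    record = {
        "video1": {
            "title": "Example",
            "thumbnail": b"\x89PNG...",
            "comments": ["first", "second"],
        }
    }
    print(dump_readable(record))
```

JSON has no binary type, so raw image bytes would have to be base64-encoded to fit; dropping them keeps the output readable.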
Data is stored in shelve files with the same name as the playlist. The state shelf stores how much data has been mined, so the next run can pick up where the last one stopped.
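The resume-across-days pattern can be sketched with Python's shelve module. This is a minimal illustration, not the layout used by get_data.py: the "next_index" key, the per-item keys, the budget parameter standing in for the daily quota, and the function name are all hypothetical.

```python
import os
import shelve
import tempfile


def mine_in_batches(items, state_path, data_path, budget):
    """Process up to `budget` items, resuming from the saved position.

    Returns the number of items processed in this run.
    """
    with shelve.open(state_path) as state, shelve.open(data_path) as data:
        start = state.get("next_index", 0)
        end = min(start + budget, len(items))
        for index in range(start, end):
            # Stand-in for the real API call and its result.
            data[str(index)] = items[index].upper()
            # Record progress after every item so a crash loses nothing.
            state["next_index"] = index + 1
        return end - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        state_file = os.path.join(folder, "state")
        data_file = os.path.join(folder, "my_playlist")
        videos = ["a", "b", "c", "d", "e"]
        # Each call plays the role of one day's run with a quota of 2.
        while mine_in_batches(videos, state_file, data_file, budget=2):
            pass
        with shelve.open(data_file) as data:
            print(sorted(data.items()))
```

Writing the position after each item, rather than once at the end, means an interrupted run resumes at the first unprocessed item instead of repeating work and spending quota twice.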
It is recommended not to change anything in the files before all the data has been mined, except the semaphore counts given in get_data.py.
The logs directory is for storing the output log, which is not written by default. I personally use tmux-logging for that.
storage is a folder for backing up the data.
The .bat files are for quickly handling backups.
tests.py is for storing the tests.
To use the data, it is almost necessary to read the code or open the data shelf with Python to understand the format in which the data is stored.
All the filtering is done locally, so the filtering code stays completely private. The script goes through all the available comments in the playlist, so YouTube's servers only see the API key, the IP address, and the playlist ID.
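A minimal sketch of what filtering on your own machine can look like once the comments are stored. The comment layout (a dict with a "text" field) and the helper names are hypothetical; the real format has to be read from the data shelf.

```python
def mentions(keyword):
    """Build a predicate matching comments that contain keyword."""
    needle = keyword.casefold()

    def predicate(comment):
        return needle in comment.get("text", "").casefold()

    return predicate


def filter_comments(comments, predicate):
    """Keep only the comments for which predicate returns True."""
    return [comment for comment in comments if predicate(comment)]


if __name__ == "__main__":
    # Hypothetical comment records, for illustration only.
    stored = [
        {"author": "a", "text": "Great TUTORIAL, thanks"},
        {"author": "b", "text": "First!"},
        {"author": "c", "text": "This tutorial helped a lot"},
    ]
    for comment in filter_comments(stored, mentions("tutorial")):
        print(comment["author"], comment["text"])
```

Because the predicate runs over data already on disk, the search terms never appear in any request sent to the API.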

