Hello, I want to thank you for this code. I've used it for my university project.
As an additional feature, I've written a small Python script, "reviewsInLanguages.py", that collects reviews written in languages other than English. If you find it acceptable, please add it to your code base. Run the script after the following command, which appends the scraper output to an input file:
python3 amazon-reviews-scraper/amazon_comments_scraper.py -s "text to be searched" \
&>> ../outputfiles/input.txt
import sys

import langid
import pandas as pd

# a code by Raghvendra Pratap Singh
# M.Sc. student, Dublin City University, Ireland, 2019-20
#
# usage:
# python3 reviewsInLanguages.py <inputfile> <two letter language> <output.csv>
#
# example:
# python3 reviewsInLanguages.py inputs_dir/God_Talks_with_Arjuna_01012017.txt hi outputs_dir/God_Talks_with_Arjuna.csv

# Two-letter language codes accepted by this script
ListOfLanguages = ['af','am','an','ar','as','az','be','bg','bn','br','bs','ca','cs','cy','da','de','dz','el','en','eo','es','et','eu','fa','fi','fo','fr','ga','gl','gu','he','hi','hr','ht','hu','hy','id','is','it','ja','jv','ka','kk','km','kn','ko','ku','ky','la','lb','lo','lt','lv','mg','mk','ml','mn','mr','ms','mt','nb','ne','nl','nn','no','oc','or','pa','pl','ps','pt','qu','ro','ru','rw','se','si','sk','sl','sq','sr','sv','sw','ta','te','th','tl','tr','ug','uk','ur','vi','vo','wa','xh','zh','zu']

# Validate the command line before reading any input
if len(sys.argv) != 4:
    print("usage: python3 reviewsInLanguages.py <inputfile> <two letter language> <output.csv>")
    sys.exit(1)

fileValue, language, outputFile = sys.argv[1:4]

if len(language) != 2:
    print("please enter the language with length of 2 characters")
    sys.exit(1)
if language not in ListOfLanguages:
    print("Please check https://pypi.org/project/langid/1.1dev/ and if your input language is available there, add it to ListOfLanguages")
    sys.exit(1)

# Keep only the lines that langid classifies as the requested language
reviews = []
with open(fileValue, 'r', encoding='utf-8') as file1:
    for line in file1:
        line = line.strip()  # strips the newline character
        # langid.classify returns a (language, score) tuple
        if line and langid.classify(line)[0] == language:
            reviews.append(line)

df = pd.DataFrame(reviews)
df.to_csv(outputFile, encoding='utf-8')
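The filtering step of the script can also be pulled out into a small function, so it can be checked without scraper output or an installed langid. This is only a sketch: `filter_by_language`, its `classify` parameter, and `print_reviews` are hypothetical names that are not part of the script above. It also assumes the classifier returns a `(language, score)` tuple, as `langid.classify` does.

```python
from typing import Callable, Iterable, List, Tuple


def filter_by_language(
    lines: Iterable[str],
    language: str,
    classify: Callable[[str], Tuple[str, float]],
) -> List[str]:
    """Return the non-empty lines whose detected language equals `language`.

    Hypothetical helper: `classify` is any callable that maps a string to a
    (language_code, score) tuple, for example langid.classify.
    """
    kept = []
    for line in lines:
        text = line.strip()  # drop the trailing newline and surrounding spaces
        if text and classify(text)[0] == language:
            kept.append(text)
    return kept


def print_reviews(input_path: str, language: str) -> None:
    """Hypothetical entry point: print the matching reviews from a file."""
    import langid  # imported lazily so the helper above works without langid

    with open(input_path, "r", encoding="utf-8") as handle:
        for review in filter_by_language(handle, language, langid.classify):
            print(review)
```

For example, `print_reviews("../outputfiles/input.txt", "hi")` would print the lines of the scraper output that langid classifies as Hindi.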
Note: A better approach would be to run this command through a scheduler in Linux, such as cron.
It worked well for me, and I collected 2900+ reviews.