This is the implementation of our ICT2107 Hadoop project, built by:
| Student | Student ID |
|---|---|
| Fabian Chua | 2101506 |
| Shaun Sartra Varghese | 2102172 |
| Pang Ka Ho | 2102047 |
| Norman Chia | 2100686 |
| Wang Qixian | 2101751 |
You can view the dashboard demo at https://ict2107-team-p3-6-project.vercel.app/
- Run `pip install -r requirements.txt` to install the crawler's dependencies.
- Change the Glassdoor URL on line 142 of `GlassDoorCrawler.py` to the company you want to crawl.
- Run `GlassDoorCrawler.py`.
- Crawled reviews are automatically saved to a CSV file named after the company.
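The save step can be sketched as follows (a minimal illustration, not the actual crawler code; `save_reviews` and the sample data are hypothetical, but the column order matches the CSV format the Hadoop jobs expect):

```python
import csv

# Column order expected by the downstream Hadoop jobs.
FIELDS = ["Summary", "Date", "JobTitle", "AuthorLocation",
          "OverallRating", "Pros", "Cons"]

def save_reviews(reviews, company):
    """Write a list of review dicts to '<company>.csv' (hypothetical helper)."""
    path = f"{company}.csv"
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(reviews)
    return path
```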
- Ensure that the input directory, the output directory, and the list of negative and positive words are present in HDFS. (You may use the word list we used, found at Backend > SentimentAnalysis > words.csv.)
- The input directory should contain one or more CSV files as input data, in the following format: `Summary,Date,JobTitle,AuthorLocation,OverallRating,Pros,Cons`
- Navigate to the folder that contains the `lexiconSa` jar file.
- Usage: `hadoop jar group_p3_6_lexiconSa.jar org.example.WordComparisonAnalysis <input_dir> <output_dir> <words.csv>`
- Example usage:
  `hadoop jar group_p3_6_lexiconSa.jar org.example.WordComparisonAnalysis hdfs://localhost:9000/user/shaunv/project/wordMapInput/ hdfs://localhost:9000/user/shaunv/project/wordMapOutput/ hdfs://localhost:9000/user/shaunv/project/words.csv`
- Output will be written to the `<output_dir>` specified, in HDFS.
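The core idea of the job, lexicon-based scoring, can be sketched in Python (an illustration of the approach only, not the Java implementation; the mini word lists below are hypothetical stand-ins for `words.csv`):

```python
def lexicon_score(text, positive, negative):
    """Count positive and negative lexicon hits in a review;
    return (pos_count, neg_count, label)."""
    words = text.lower().split()
    pos = sum(1 for w in words if w in positive)
    neg = sum(1 for w in words if w in negative)
    label = ("positive" if pos > neg
             else "negative" if neg > pos
             else "neutral")
    return pos, neg, label

# Hypothetical mini-lexicon; the real job reads its word list from HDFS.
POSITIVE = {"good", "great", "friendly"}
NEGATIVE = {"bad", "stressful", "toxic"}
```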
⚠️ Running the CoreNLP analysis on large datasets is VERY resource-intensive and takes a VERY long time; we recommend using a smaller test dataset!
- Ensure that the input and output directories are present in HDFS.
- The input directory should contain one or more CSV files as input data, in the following format: `Summary,Date,JobTitle,AuthorLocation,OverallRating,Pros,Cons`
- Navigate to the folder that contains the `nlpSa` jar file.
- Usage: `hadoop jar group_p3_6_nlpSa.jar org.example.GlassdoorSentimentAnalysis <input_dir> <output_dir>`
- Example usage:
  `hadoop jar group_p3_6_nlpSa.jar org.example.GlassdoorSentimentAnalysis hdfs://localhost:9000/user/shaunv/project/nlpSaInput/ hdfs://localhost:9000/user/shaunv/project/nlpSaOutput/`
- Output will be written to the `<output_dir>` specified, in HDFS.
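One way to produce the smaller test dataset recommended above is to keep only the first N data rows of a crawled CSV before uploading it to HDFS. A sketch, assuming the CSV carries a header row (file names here are hypothetical):

```python
import csv

def sample_csv(src, dst, n):
    """Copy the header plus the first n data rows of src into dst."""
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        writer.writerow(next(reader))  # keep the header row
        for i, row in enumerate(reader):
            if i >= n:
                break
            writer.writerow(row)
```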
- Ensure that the input directory and the list of stopwords are present in HDFS. (You may use the stopwords we used, found at Backend > TopicModelling > mallet > stopwords.txt.)
- The input directory should contain one or more CSV files as input data, in the following format: `Summary,Date,JobTitle,AuthorLocation,OverallRating,Pros,Cons`
- Navigate to the folder that contains the topic modelling jar file.
- Usage: `hadoop jar group_p3_6_tm.jar <input_dir> <output_dir> <stopwords.txt>`
- Example usage:
  `hadoop jar group_p3_6_tm.jar project/input project/output project/stopwords.txt`
- Output will be written to the `<output_dir>` specified, in HDFS.
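The stopword filtering applied before topic modelling can be sketched as follows (an illustration of the preprocessing idea, not the Mallet-based implementation; the stopword set below is a hypothetical stand-in for `stopwords.txt`):

```python
import re

def tokenize(text, stopwords):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stopwords]

# Hypothetical mini stoplist; the real job reads stopwords.txt from HDFS.
STOPWORDS = {"the", "and", "a", "of", "is", "to"}
```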