Bro-ssessment

Project Structure

Bro-ssessment/
├── *.py                    # TO BE REFACTORED, a collection of scripts that perform analysis tasks
├── brossessment            # Internal python modules, such as DB models
├── data                    # Raw documents
│   ├── posts
│   │   ├── <posts_1>.excel
│   │   ├── <posts_2>.excel
│   │   └── <...more...posts>
│   ├── syllabus
│   │   ├── <class_1>.txt
│   │   ├── <class_2>.txt
│   │   └── <...more...syllabus>
│   └── <...more...>
├── infra                   # OPTIONAL, handy scripts to spin up a Postgres instance with zero effort
├── misc                    # A collection of utility scripts for preprocessing raw documents
└── sql                     # SQL scripts used to create database tables

Behaviour Metrics

Off-topic posts

We use Latent Semantic Analysis (LSA) to compare each course syllabus with posts from the discussion forum and identify off-topic posts. In the PeppeR project, all syllabi are provided as .pdf and .docx files. We provide a handy script that converts them into plain text files, which are easier to parse.

We use the textract library to extract text from those files. Follow its installation instructions first. A list of supported file extensions is available in its documentation.

Place all your files under data/syllabus, following the naming convention <course_id>.{pdf,docx,doc,etc}. Then run:

$ python misc/pdf_2_txt.py data/syllabus
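
For orientation, the conversion step can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual contents of misc/pdf_2_txt.py: the function name convert_dir and its injectable extract parameter are hypothetical, and it assumes textract.process(path) returns the extracted text as bytes.

```python
from pathlib import Path


def convert_dir(src_dir, extract=None):
    """Write <course_id>.txt next to every non-.txt file in src_dir.

    `extract` takes a file path (str) and returns the text as bytes.
    It defaults to textract.process, imported lazily (assumption: the
    real script calls textract in a similar way).
    """
    if extract is None:
        import textract  # third-party; see its installation instructions
        extract = textract.process

    written = []
    for path in sorted(Path(src_dir).iterdir()):
        # Skip directories, hidden files, and files that are already text.
        if not path.is_file() or path.name.startswith("."):
            continue
        if path.suffix.lower() == ".txt":
            continue
        text = extract(str(path)).decode("utf-8", errors="replace")
        out_path = path.with_suffix(".txt")
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written
```

Calling convert_dir("data/syllabus") would then leave one <course_id>.txt per syllabus in the same folder. Passing a custom extract callable makes the function easy to try out without textract installed.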

Run the off-topic analysis:

$ python off_topic_analysis.py
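
To illustrate the idea behind Latent Semantic Analysis, here is a small dependency-free sketch. It is not the code in off_topic_analysis.py: all function names are hypothetical, the TF-IDF weighting is one common choice among several, and a real pipeline would more likely rely on a numerical library for the decomposition. The sketch builds TF-IDF vectors for the syllabus and the posts, eigen-decomposes the document Gram matrix (equivalent to a truncated SVD of the term-document matrix), and compares each post with the syllabus in the reduced space.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF mapping (term -> weight) per document."""
    tokenised = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        vec = {t: n * (math.log(len(docs) / doc_freq[t]) + 1.0)
               for t, n in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def jacobi_eigh(matrix, sweeps=50, tol=1e-12):
    """Eigen-decompose a small symmetric matrix with Jacobi rotations.

    Returns (eigenvalues, v), where column i of v is the i-th eigenvector.
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    pairs = [(p, q) for p in range(n) for q in range(p + 1, n)]
    for _ in range(sweeps):
        if all(abs(a[p][q]) < tol for p, q in pairs):
            break
        for p, q in pairs:
            if abs(a[p][q]) < tol:
                continue
            theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
            c, s = math.cos(theta), math.sin(theta)
            for m in (a, v):  # right-multiply by the rotation J
                for row in m:
                    row[p], row[q] = c * row[p] - s * row[q], s * row[p] + c * row[q]
            for k in range(n):  # left-multiply a by J^T
                a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def cosine(x, y):
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / norm if norm > 1e-12 else 0.0


def syllabus_similarity(syllabus, posts, k=2):
    """Cosine similarity of each post to the syllabus in a k-dim LSA space."""
    vectors = tfidf_vectors([syllabus] + list(posts))
    # Gram matrix of the documents: G = A^T A for the term-document matrix A.
    gram = [[sum(w * b.get(t, 0.0) for t, w in a.items()) for b in vectors]
            for a in vectors]
    eigenvalues, v = jacobi_eigh(gram)
    top = sorted(range(len(eigenvalues)), key=lambda i: -eigenvalues[i])[:k]
    # Row j of V_k * Sigma_k is document j in the latent space.
    embed = [[math.sqrt(max(eigenvalues[i], 0.0)) * row[i] for i in top] for row in v]
    return [cosine(embed[0], e) for e in embed[1:]]
```

A call such as syllabus_similarity(syllabus_text, post_texts) returns one score per post. Posts with a low score share little latent vocabulary with the syllabus and are candidates for the off-topic label; the cut-off is a judgment call that this sketch does not settle.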

Parse CCS result

$ python misc/extract_ccs.py

Produce features.csv for modeling

# Run the script; the CSV file will be available inside the data/ folder
$ python extract_features.py

Then you may use the analysis.Rmd file to train the model.
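
Before moving on to modeling, it can help to sanity-check the generated file. The helper below is a hypothetical sketch: the name summarize_csv is made up for illustration, and the path data/features.csv is an assumption based on the description above.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        return list(reader.fieldnames or []), row_count
```

For example, columns, rows = summarize_csv("data/features.csv") gives the header names and the number of data rows, which is a quick way to confirm that the feature extraction produced output.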

Development

# Setup python virtual environment
$ pip3 install --user virtualenv
$ virtualenv venv --python=python3
$ source venv/bin/activate

# Install python dependencies
$ pip install -r requirements.txt

# Setup environment variable
$ cp .env.example .env
$ source .env
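
Note that `source .env` only makes variables visible to Python processes if the file exports them. As a fallback, the variables can be loaded from inside Python. The following is a minimal sketch, assuming the file contains plain KEY=VALUE or export KEY=VALUE lines; the function name load_env_file is hypothetical and the actual format of .env.example may differ.

```python
import os


def load_env_file(path=".env", environ=os.environ):
    """Copy KEY=VALUE pairs from a dotenv-style file into `environ`.

    Variables that are already set are left untouched.
    Returns the pairs parsed from the file.
    """
    parsed = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Ignore blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            if line.startswith("export "):
                line = line[len("export "):]
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip()
            # Drop one pair of matching surrounding quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            parsed[key] = value
            environ.setdefault(key, value)
    return parsed
```

Calling load_env_file() at the top of a script keeps values from the shell in charge while filling in anything the shell did not provide.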

