Annrison/BigVis: Brand-centric network comprising 600 million articles from the MakeUp board on PTT

Description

This project was created for the BigVis seminar held by National Sun Yat-sen University in 2021.

In this project, we analyzed a text network built from 600 million articles on the PTT MakeUp board and used SparkR and Shiny to develop an interactive front-end interface for visualizing the data.

Tools

R, Shiny, SparkR

Data processing workflow

1. Preprocessing in the cloud

We used Hadoop on a cloud server to preprocess the large document collection. In this step, we excluded documents that were too short and scored the importance of each word with TF-IDF.
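The README does not include the Hadoop job itself. As a rough single-machine illustration of the same two operations, the Python sketch below drops short documents and then computes TF-IDF. The cutoff `MIN_TOKENS`, the input format (documents already segmented into token lists), and the choice to keep each word's highest TF-IDF across documents are assumptions made for illustration, not the project's documented settings.

```python
import math
from collections import Counter

# Assumed cutoff for "too short"; the README does not give the real value.
MIN_TOKENS = 5


def tfidf_scores(docs, min_tokens=MIN_TOKENS):
    """Filter out short documents, then return each word's highest TF-IDF.

    docs: list of documents, each already segmented into a list of tokens.
    """
    kept = [d for d in docs if len(d) >= min_tokens]
    n_docs = len(kept)
    # Document frequency: how many kept documents contain each word.
    df = Counter(w for d in kept for w in set(d))
    scores = {}
    for d in kept:
        counts = Counter(d)
        for word, count in counts.items():
            tf = count / len(d)                # term frequency within this document
            idf = math.log(n_docs / df[word])  # rarer words get larger weights
            scores[word] = max(scores.get(word, 0.0), tf * idf)
    return scores
```

A word that appears in every kept document gets an IDF of zero, so generic filler words naturally fall to the bottom of the ranking.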

2. Local server

After obtaining the word-importance scores, we:

Selected words with high TF-IDF scores and grouped them into six categories, which are described below.
Built word-sentence and word-document matrices to construct the brand-centric network. Nodes are words, and edges come in two types: the number of times a word pair co-occurs, or the correlation between the two words.
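One way to derive both edge types from a binary word-sentence matrix is sketched below in Python. It counts the sentences in which two keywords appear together and computes their Pearson correlation over the 0/1 occurrence vectors (the phi coefficient). The function name `build_edges`, the use of sentences rather than documents as the unit, and phi as the correlation measure are all assumptions; the README does not say which correlation the project used.

```python
import math
from itertools import combinations


def build_edges(sentences, vocab):
    """Return co-occurrence counts and phi correlations for every keyword pair.

    sentences: list of token lists; vocab: the selected keywords.
    Each sentence is treated as a 0/1 indicator row over the vocabulary.
    """
    n = len(sentences)
    # Sparse word-sentence matrix: for each word, the indices of sentences containing it.
    occurs = {w: {i for i, s in enumerate(sentences) if w in s} for w in vocab}
    cooc, corr = {}, {}
    for a, b in combinations(sorted(vocab), 2):
        n_a, n_b = len(occurs[a]), len(occurs[b])
        n_ab = len(occurs[a] & occurs[b])  # sentences containing both words
        cooc[(a, b)] = n_ab
        # Pearson correlation of two binary vectors (phi coefficient).
        denom = math.sqrt(n_a * (n - n_a) * n_b * (n - n_b))
        corr[(a, b)] = (n * n_ab - n_a * n_b) / denom if denom else 0.0
    return cooc, corr
```

Co-occurrence favors frequent words, while correlation also accounts for how often each word appears on its own, which is one reason to offer both edge types.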

How to use

Create the network

Choose a brand in the left sidebar.
Choose the number of nodes displayed on the page (4 to 32 nodes).
Choose the edge relation type (correlation or co-occurrence).
Set the threshold for the word relation; only word pairs whose relation value exceeds the threshold are displayed.
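The controls above amount to a filter over a precomputed edge list. The Python sketch below shows one plausible version: keep the chosen brand plus its strongest neighbors up to the requested node count, then drop edges at or below the threshold. `select_subnetwork` is a hypothetical helper, and ranking neighbors by their relation to the brand is an assumption about how the app picks nodes.

```python
def select_subnetwork(edges, brand, n_nodes, threshold):
    """Pick the brand and its strongest neighbors, then keep edges above threshold.

    edges: dict mapping (word_a, word_b) -> relation value
    (co-occurrence count or correlation, depending on the selected type).
    """
    if not 4 <= n_nodes <= 32:
        raise ValueError("n_nodes must be between 4 and 32")
    # Relation value between the brand and each word directly linked to it.
    to_brand = {}
    for (a, b), value in edges.items():
        if brand in (a, b):
            to_brand[b if a == brand else a] = value
    # The brand node plus its (n_nodes - 1) strongest neighbors.
    top = sorted(to_brand, key=to_brand.get, reverse=True)[: n_nodes - 1]
    nodes = {brand, *top}
    kept = {
        (a, b): value
        for (a, b), value in edges.items()
        if a in nodes and b in nodes and value > threshold
    }
    return nodes, kept
```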

View the article

Click the edge between a word pair you are interested in, and the sentences containing that pair are displayed in the bottom-right sidebar.
The sentences are sorted by post date; clicking a sentence shows the full article in the bottom-left sidebar.
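A minimal Python sketch of the lookup behind the article bar, assuming each sentence record carries hypothetical `text`, `post_date`, and `article_id` fields. It uses plain substring matching, which suits unsegmented Chinese text but can also match a keyword inside a longer word. The sort direction (newest first by default) is an assumption, since the README only says the sentences are sorted by post date.

```python
def sentences_for_pair(sentences, word_a, word_b, newest_first=True):
    """Return sentence records containing both words, sorted by post date.

    sentences: list of dicts with (hypothetical) keys 'text', 'post_date'
    and 'article_id'; post_date can be any comparable value, such as a
    datetime.date or an ISO-8601 date string.
    """
    hits = [s for s in sentences if word_a in s["text"] and word_b in s["text"]]
    return sorted(hits, key=lambda s: s["post_date"], reverse=newest_first)


def full_article(articles, sentence):
    """Look up the full article for a clicked sentence (articles: id -> text)."""
    return articles[sentence["article_id"]]
```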

Interactive interface

Network

The category of the keywords

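brand: Name of the makeup brand, such as M.A.C, Benefit, Dior
feature: Features of the product, such as 持久力 (longevity), 廣感 (glow), 自然光 (natural glow)
product: Name of the makeup product, such as 染眉膏 (brow mascara), 眉筆 (brow pencil), 口紅 (lipstick)
condition: Effect observed when trying the product, such as 致痘 (causes breakouts), 偏乾 (drying), 卡粉 (cakey)
problem: The user's skin or makeup concerns when applying makeup, such as 黑眼圈 (dark circles), 乾肌 (dry skin), 油肌 (oily skin)
emotion: The user's feelings about a product, such as 心動 (tempted), 必買 (must-buy), 燒到 (hooked, i.e. strongly wanting to buy)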
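Assigning each keyword to one of these six categories can be as simple as a lexicon lookup, for example to color network nodes by category. The Python sketch below uses only the example keywords listed above; `CATEGORY_LEXICON` and `categorize` are hypothetical names, and the project's full keyword lists are not published in this README.

```python
# Hypothetical lexicon built only from the example keywords in this README.
CATEGORY_LEXICON = {
    "brand": {"m.a.c", "benefit", "dior"},
    "feature": {"持久力", "廣感", "自然光"},
    "product": {"染眉膏", "眉筆", "口紅"},
    "condition": {"致痘", "偏乾", "卡粉"},
    "problem": {"黑眼圈", "乾肌", "油肌"},
    "emotion": {"心動", "必買", "燒到"},
}


def categorize(word, lexicon=CATEGORY_LEXICON):
    """Return the category of a keyword, or None if it is not in any list."""
    key = word.lower()  # brand names may appear in mixed case
    for category, words in lexicon.items():
        if key in words:
            return category
    return None
```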

Article bar

The article and the sentences containing the selected word pair are displayed in this format.
