Scripts for processing and mining (classic) literature and other text data, such as screenplays

lhehnke/text-mining-literature

text-mining-literature

Scripts for processing and mining (classic) literature and PDF files

Description: text_mining_dracula

The script covers

  • downloading and processing public domain works from the Project Gutenberg collection with the gutenbergr package
  • transforming works into a tidy format
  • mining works by
    • calculating and plotting word frequencies
    • plotting word and comparison clouds
    • conducting sentiment analyses (NRC lexicon)

using the example of Bram Stoker's Dracula.
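The tidy-format step listed above stores one word per row, which reduces word frequencies to a simple count. The scripts themselves are written in R. What follows is only a minimal Python sketch of the idea, not their actual code. The regex tokenizer and the tiny `STOP_WORDS` set are simplifying assumptions (real stop-word lists are much longer), and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "i", "was", "it", "is"}


def tidy_tokens(lines):
    """Return one (line_number, word) row per token, lower-cased.

    The regex is a simplification: it keeps letters and apostrophes
    and drops punctuation and digits.
    """
    rows = []
    for line_number, line in enumerate(lines, start=1):
        for word in re.findall(r"[a-z']+", line.lower()):
            rows.append((line_number, word))
    return rows


def word_frequencies(rows, stop_words=STOP_WORDS):
    """Count words after removing stop words, most frequent first.

    Ties keep the order in which words first appeared.
    """
    counts = Counter(word for _, word in rows if word not in stop_words)
    return counts.most_common()
```

For example, `word_frequencies(tidy_tokens(["The Count was a tall old man.", "The Count smiled."]))` ranks `count` first with two occurrences, because "the", "was" and "a" are dropped as stop words.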

Description: text_mining_the_room

Corresponding blog post: https://lhehnke.github.io/notes/2018/01/25/text_mining_the_room

The script covers

  • downloading, importing and processing PDF files in R
  • transforming PDF files into a tidy format
  • mining PDF files by
    • calculating and plotting word frequencies
    • conducting sentiment analyses (NRC and Bing lexicons)
    • plotting word and comparison clouds
    • visualizing the most frequent positive and negative words (Bing lexicon)

using the script of The Room, often called the worst film ever made (written, directed and produced by and starring Tommy Wiseau).

Source: https://theroomscriptblog.files.wordpress.com/2016/04/the-room-original-script-by-tommy-wiseau.pdf
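A Bing-style sentiment analysis labels each word as positive or negative by joining the tokens against a lexicon and counting the matches. The following is a minimal Python sketch of that idea (the scripts themselves are in R). It assumes a hypothetical six-word `LEXICON` in place of the real Bing lexicon and already-tokenized, lower-cased words. The function names are illustrative only.

```python
from collections import Counter

# Hypothetical mini-lexicon standing in for the real Bing lexicon.
LEXICON = {
    "love": "positive", "great": "positive", "happy": "positive",
    "bad": "negative", "lie": "negative", "hate": "negative",
}


def sentiment_counts(words, lexicon=LEXICON):
    """Inner-join words with the lexicon and count each (word, label) pair."""
    return Counter((w, lexicon[w]) for w in words if w in lexicon)


def top_words_by_sentiment(counts, n=10):
    """Return up to n of the most frequent words for each sentiment label."""
    by_label = {}
    for (word, label), count in counts.most_common():
        bucket = by_label.setdefault(label, [])
        if len(bucket) < n:
            bucket.append((word, count))
    return by_label


def net_sentiment(counts):
    """Positive minus negative match count, a common summary score."""
    pos = sum(c for (_, label), c in counts.items() if label == "positive")
    neg = sum(c for (_, label), c in counts.items() if label == "negative")
    return pos - neg
```

With `words = ["i", "love", "you", "great", "bad", "love", "lie"]`, `net_sentiment` returns 1 (three positive matches, two negative), and `top_words_by_sentiment(..., n=1)` picks `love` as the most frequent positive word. The NRC lexicon differs in that one word can be associated with several emotion categories, so the same join can return more than one row per word.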

