Import Microsoft Word Transcript into R : Shorter Method
trinker edited this page Aug 23, 2012
If your transcripts are in Microsoft Word format, this tutorial demonstrates one procedure for cleaning and importing your data into R for use with `qdap`. This method is shorter and automates most of the parsing for the researcher. If this method (which relies on `read.transcript`) fails, the researcher will have to use the alternative method and do the parsing by hand.
###The following video demonstrates how to clean a Microsoft Word based transcript and read it into R.
Video
MS Word Transcript and R Script (zip file)
library(qdap)
dat <- read.transcript(file = "Test.xlsx", header = FALSE,
col.names=c("person", "dialogue"))
htruncdf(dat, width = 50) #view the head with columns truncated to 50 characters
#use rm_row to remove between row annotations
dat <- rm_row(dataframe = dat, search.column = "person", terms = c("[Cro", "[St"))
dat #look at it
#use column number instead
rm_row(dat, 1, c("[Cro", "[St"))
#The dash argument: see also ellipsis & quote2bracket arguments
args(read.transcript) #function arguments
dat <- read.transcript(file = "Test.xlsx", header = FALSE,
col.names=c("person", "dialogue"), dash = "(pause)")
left.just(rm_row(dat, 1, c("[Cro", "[St")), 2)
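To make the row-removal step less opaque, here is a minimal base R sketch of the idea behind the `rm_row` calls above. It assumes that `rm_row` drops every row whose search column begins with one of the supplied terms. The data frame and the helper `drop_annotation_rows` are hypothetical and are not part of qdap.

```r
# Hypothetical stand-in for a transcript that has already been read into R
dat <- data.frame(
  person = c("Teacher", "[Cross talk 00:01:10]", "Student", "[Student discussion]"),
  dialogue = c("Let's read.", NA, "Okay.", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a qdap function): keep only the rows whose
# search column does NOT start with any of the given terms
drop_annotation_rows <- function(dataframe, search.column, terms) {
  values <- as.character(dataframe[[search.column]])
  # one logical vector per term, combined with OR
  is_annotation <- Reduce(`|`, lapply(terms, function(term) startsWith(values, term)))
  dataframe[!is_annotation, , drop = FALSE]
}

clean <- drop_annotation_rows(dat, "person", c("[Cro", "[St"))
clean
```

Because `startsWith` compares literal prefixes, `"[St"` removes the annotation row but leaves the speaker `"Student"` in place. The column can be given by name or by number, mirroring the two `rm_row` calls above.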
The `bracketX` and `bracketXtract` functions
examp2 <- structure(list(person = structure(c(1L, 2L, 1L, 3L), .Label = c("bob",
"greg", "sue"), class = "factor"), text = c("I love chicken [unintelligible]!",
"Me too! (laughter) It's so good.[interupting]", "Yep it's awesome {reading}.",
"Agreed. {is so much fun}")), .Names = c("person", "text"), row.names = c(NA,
-4L), class = "data.frame")
examp2
bracketX(examp2$text, 'square')
bracketX(examp2$text, 'curly')
bracketX(examp2$text)
examp2
bracketXtract(examp2$text, 'square')
bracketXtract(examp2$text, 'curly')
bracketXtract(examp2$text)
paste2(bracketXtract(examp2$text, 'curly'), " ")
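For readers who want to see what removing versus extracting bracketed text amounts to, the following is a rough base R sketch using regular expressions. It is not how `bracketX` or `bracketXtract` are implemented, and it does not tidy up leftover spacing, so its output may differ from theirs. The variable names are chosen for illustration only.

```r
# Same example sentences as in examp2$text
text <- c("I love chicken [unintelligible]!",
          "Me too! (laughter) It's so good.[interupting]",
          "Yep it's awesome {reading}.",
          "Agreed. {is so much fun}")

# Removal: delete anything between square brackets, brackets included
no_square <- gsub("\\[[^]]*\\]", "", text)

# Extraction: collect every curly-brace annotation, one list element per sentence
curly <- regmatches(text, gregexpr("\\{[^}]*\\}", text))

no_square
curly
```

Sentences without a matching bracket come back unchanged from `gsub`, and `regmatches` returns an empty character vector for them.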