Import Microsoft Word Transcript Into R : Longer Safer Method
trinker edited this page Aug 22, 2012 · 4 revisions
If your transcripts are in Microsoft Word format, this tutorial demonstrates one procedure for cleaning your data and importing it into R for use with qdap.
### The following video demonstrates how to clean a Microsoft Word-based transcript and read it into R.

------------------------INSERT VIDEO UPON APPROVAL---------------

name | rich char | replacement |
---|---|---|
ellipsis | … | ... or (pause) |
left curly quote | “ | " |
right curly quote | ” | " |
left curly apostrophe | ‘ | ' |
right curly apostrophe | ’ | ' |
en dash | – | ... or (pause) |
em dash | — | ... or (pause) |
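
The replacements in this table can also be applied after the text is already in R. The following is a minimal base R sketch, not a qdap function: `replace_rich_chars` is a hypothetical helper name, dashes and ellipses are all mapped to `...` (the table equally allows `(pause)`), and curly double quotes are assumed to become straight double quotes.

```r
# Hypothetical helper (not part of qdap): swap Word's rich characters
# for plain ASCII equivalents.
rich  <- c("\u2026", "\u201C", "\u201D", "\u2018", "\u2019", "\u2013", "\u2014")
plain <- c("...",    "\"",     "\"",     "'",      "'",      "...",    "...")

replace_rich_chars <- function(x) {
  # fixed = TRUE treats each rich character as a literal, not a regex
  for (i in seq_along(rich)) {
    x <- gsub(rich[i], plain[i], x, fixed = TRUE)
  }
  x
}

# Made-up line of transcript text containing several rich characters
replace_rich_chars("\u201CWait\u2026 it\u2019s here\u201D \u2014 ok")
```

Doing the replacement in Word before export, as the tutorial describes, avoids encoding problems at read time; a helper like this is only a fallback for text that slipped through.
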
bracket types | names |
---|---|
<text> | angle |
(text) | round |
{text} | curly |
[text] | square |
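library(qdap); library(gdata)
# doc depends on the name of the researcher's document
doc <- "TCH 7 Pre-data Les 2, Year 1, 1-15-09.csv"
# read the transcript with no header row, stripping padding around fields
dat1 <- read.csv(doc, header = FALSE, strip.white = TRUE, sep = ",",
    as.is = FALSE, na.strings = " ")
# truncate every cell to 80 characters for easier viewing
truncdf(dat1, 80)
# first 15 rows, cells truncated to 80 characters
htruncdf(dat1, 15, 80)
# same preview using the default row count and width
htruncdf(dat1)
# left-justify column 2 (the text) of the truncated preview
left.just(htruncdf(dat1, 15, 80), 2)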
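
To try the import step without the researcher's file, the same `read.csv` arguments can be exercised on inline text. This is a small sketch under assumptions: the two rows are made up, `stringsAsFactors = FALSE` is used so the columns stay character, and `substr` is only a rough base R stand-in for a truncated preview, not what `truncdf` does internally.

```r
# Two made-up rows standing in for a transcript exported from Word as .csv
csv_text <- "  greg  ,I love chicken\n sue ,Me too\n"

dat <- read.csv(text = csv_text, header = FALSE, strip.white = TRUE,
                stringsAsFactors = FALSE)

names(dat)  # with header = FALSE the columns are named V1, V2
dat$V1      # strip.white removed the padding around the unquoted names

# Rough preview: the first 10 characters of each text cell
substr(dat$V2, 1, 10)
```
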
The `bracketX` and `bracketXtract` functions
examp2 <- structure(list(person = structure(c(1L, 2L, 1L, 3L), .Label = c("bob",
"greg", "sue"), class = "factor"), text = c("I love chicken [unintelligible]!",
"Me too! (laughter) It's so good.[interupting]", "Yep it's awesome {reading}.",
"Agreed. {is so much fun}")), .Names = c("person", "text"), row.names = c(NA,
-4L), class = "data.frame")
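examp2
# bracketX removes bracketed text: one bracket type, or all types by default
bracketX(examp2$text, 'square')
bracketX(examp2$text, 'curly')
bracketX(examp2$text)
# the original data frame is unchanged
examp2
# bracketXtract pulls the bracketed text out instead of removing it
bracketXtract(examp2$text, 'square')
bracketXtract(examp2$text, 'curly')
bracketXtract(examp2$text)
# collapse the extracted pieces for each row into a single string
paste2(bracketXtract(examp2$text, 'curly'), " ")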
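
For readers curious about what removal and extraction involve, here is a rough base R approximation built on regular expressions. It is a sketch, not qdap's implementation: `bracket_patterns`, `strip_brackets`, and `extract_brackets` are hypothetical names, nested brackets are not handled, and the output will differ from `bracketX` and `bracketXtract` in details such as spacing and whether delimiters are kept.

```r
# Hypothetical lookup from bracket name to a regex matching one bracketed span
bracket_patterns <- c(
  angle  = "<[^>]*>",
  round  = "\\([^)]*\\)",
  curly  = "\\{[^}]*\\}",
  square = "\\[[^]]*\\]"
)

# Remove bracketed spans of the requested type(s); all types by default
strip_brackets <- function(x, type = names(bracket_patterns)) {
  for (p in bracket_patterns[type]) x <- gsub(p, "", x)
  trimws(gsub("\\s+", " ", x))  # collapse the spaces left behind
}

# Return the bracketed spans (delimiters included) found in each element
extract_brackets <- function(x, type) {
  regmatches(x, gregexpr(bracket_patterns[[type]], x))
}

txt <- c("I love chicken [unintelligible]!", "Yep it's awesome {reading}.")
strip_brackets(txt, "square")
extract_brackets(txt, "curly")
```

`extract_brackets` returns a list with one element per input string, so rows without a match yield an empty character vector rather than being dropped.
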