Code Matrix [cm] Family of Functions

trinker edited this page Sep 22, 2012 · 22 revisions

Often we want to code transcripts according to some coding scheme and then use the coded data to generate descriptive statistics, create visualizations and run statistical analyses. The cm family of qdap functions helps convert coded transcripts into a code matrix.

The following tutorial will walk the user through the use of cm_blank, cm_transform, cm_range.temp, cm_fill, cm2long and cm_combine.

###The basic process for creating a code matrix (cm)

  1. use cm_blank with a list of codes to generate a blank code matrix
  2. either fill in the codes in a .csv file and read it back in, or use cm_fill to feed in a list of range codes
  3. use the code matrix for analysis; for analysis requiring long format use cm2long
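
The three steps above can be sketched in base R, without qdap, to show the shape of the data at each stage. This is an illustrative sketch only, not qdap's implementation: the toy `words` vector, the `code_matrix` and `long` objects, and the range codes are all made up for the example.

```r
# Toy transcript: one row per word (hypothetical data, not the tutorial's)
words <- c("gregory", "on", "my", "word", "we", "will", "not", "carry", "coals")
codes <- c("nm", "sr", "fc")

# Step 1: blank code matrix, one row per word and one 0/1 column per code
code_matrix <- data.frame(
    word.num = seq_along(words),
    text = words,
    matrix(0L, nrow = length(words), ncol = length(codes),
        dimnames = list(NULL, codes)),
    stringsAsFactors = FALSE
)

# Step 2: fill from a list of range codes (word numbers that carry each code)
coded <- list(nm = 1, sr = c(3, 4), fc = 7:9)
for (code in names(coded)) {
    code_matrix[coded[[code]], code] <- 1L
}

# Step 3: the wide matrix feeds simple summaries directly ...
code_counts <- colSums(code_matrix[, codes])

# ... and can be reshaped to long format (one row per word/code pair)
long <- data.frame(
    word.num = rep(code_matrix$word.num, times = length(codes)),
    code = rep(codes, each = nrow(code_matrix)),
    value = unlist(code_matrix[, codes], use.names = FALSE)
)
long <- long[long$value == 1L, c("word.num", "code")]
```

The wide layout is convenient for counting and dummy-variable models, while the long layout is the one plotting functions usually expect, which is why the last step of the process converts between them.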

###The following video demonstrates how to use the code matrix (cm) family of functions: http://youtu.be/9GxnEhsRINk

To download the .pdf of the truncated Romeo and Juliet transcript used in this analysis click here.

R code used in the code matrix (cm) family functions tutorial

library(qdap)                                                       #load qdap
browseURL("https://dl.dropbox.com/u/61803503/SampsonGregory.pdf")   #the coded transcript
dat <- head(rajSPLIT, 5)[, -8]                                         #create a fake data set

codes <- qcv(nm, sr, cr, spr, fc)                                   #our decided upon codes
(dat1 <- cm_blank(dat, "dialogue", codes = codes))                  #create a blank code matrix
cm_blank(dat, "dialogue", codes = codes, transpose = TRUE)[, 1:12]  #transposed version
cm_blank(dat, "dialogue", codes = codes, csv = TRUE)                #write standard to csv
cm_blank(dat, "dialogue", codes = codes, csv = TRUE,                #write transposed to csv
    transpose = TRUE, file.name = "test")[, 1:12]

#########################
# TWO METHODS TO CODING #
#===========================================================
#method 1: dummy code in a csv either long or wide format

#read in the coded standard csv
dat2 <- read.csv(file = "http://dl.dropbox.com/u/61803503/test1.csv", strip.white = TRUE, as.is=FALSE)

#read in the coded transposed csv
dat3 <- read.csv("http://dl.dropbox.com/u/61803503/test2.csv", strip.white = TRUE, as.is=FALSE)
cm_transform(dat3, "dialogue")  #transform the transposed data frame

#method 2: use the word.num column/row to range code
cm_range.temp(codes)  #generate a blank range coding template for the codes
coded <- list(
    nm = 1,
    sr = c(3, 16, 36),
    cr = c(5, 19, 23),
    spr = c(27, 30),
    fc = c(8:15, 25:40)
)

dat.fill <- cm_fill(dat1, coded) #now feed the range codes to cm_fill

#===========================================================

#Using cm_combine to combine codes into a parent node
dat.fill <- cm_combine(dat.fill, combined.columns = list(other = c("nm", "fc"),  refs = 10:12))

#        OR, written over several lines (run only one of the two calls)

dat.fill <- cm_combine(dat.fill, 
    combined.columns = list(
        other = c("nm", "fc"),  
        refs = 10:12
    )
)

#====
#Using cm2long to make a Gantt plot 
#First use gantt with all codes and grouping variables
dat.gantt <- with(dat.fill, gantt(text, 
    list(person, tot, act, sex, fam.aff, died, word.num, nm, 
    sr, cr, spr, fc, other, refs), plot = FALSE))

#take just the last two columns (start end) and splice onto the filled
#matrix data frame; use cm2long with codes to reshape the data
NEW <- cm2long(data.frame(dat.fill, dat.gantt[, c("start", "end")]), 
    code.vars = qcv(terms="nm sr cr spr fc"), no.code="nc")

NEW2 <- cm2long(data.frame(dat.fill, dat.gantt[, c("start", "end")]), 
    code.vars = qcv(terms="other refs"), no.code="nc")

library(ggplot2)
gantt_wrap(NEW, "code", fill.var="person")
#====

gantt_wrap(NEW, "code", facet.var="person")
gantt_wrap(NEW, "code", facet.var="person", fill.var="person")
#====

gantt_wrap(NEW2, "code", facet.var="person")
gantt_wrap(NEW2, "code",  fill.var="person")
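
As a rough illustration of what combining codes into a parent node means in the cm_combine step of the script above, the sketch below builds a parent column in base R that is 1 wherever at least one of its child codes is 1. This is an assumption about the intent of cm_combine based on the comment in the script, not its actual source; `toy` and `combine_codes` are hypothetical names introduced for the example.

```r
# Hypothetical dummy-coded matrix: rows are words, columns are codes
toy <- data.frame(
    nm = c(1L, 0L, 0L, 0L),
    sr = c(0L, 1L, 0L, 0L),
    fc = c(0L, 0L, 1L, 1L)
)

# Hypothetical helper: add one parent column per entry of `combined`,
# set to 1 where at least one of the child columns is 1
combine_codes <- function(cm, combined) {
    for (parent in names(combined)) {
        children <- cm[, combined[[parent]], drop = FALSE]
        cm[[parent]] <- as.integer(rowSums(children) > 0)
    }
    cm
}

toy2 <- combine_codes(toy, list(other = c("nm", "fc")))
toy2$other
```

Keeping the child columns alongside the new parent column means both levels of the coding scheme stay available for later reshaping and plotting.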

###Gantt plots produced from the script above

[Figures: Gantt Plot 1, Gantt Plot 2, Gantt Plot 2]