
Abstract Parsing in Meta-Analysis


The process of parsing abstracts for inclusion/exclusion decisions can be largely automated with thoughtful researcher choices of abstract-parsing vocabulary and the qdap package's termco.a function.
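As a quick taste of the idea (a toy sketch with fabricated abstracts; the full worked example appears later on this page), termco.a takes a text variable, a grouping variable, and a named list of search terms, and returns counts of term matches per group:

library(qdap)
#three fabricated abstracts, for illustration only
absts <- c("A study of reading comprehension strategies.",
    "Brain imaging during mental arithmetic.",
    "A qualitative study of teacher pedagogy.")
termco.a(absts, 1:3, match.list = list(reading = c(" read", " comprehen")),
    ignore.case = TRUE)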

The steps for importing a reference library into R and parsing it with termco.a depend on the researcher's choices and on the citation program the researcher uses (EndNote, Zotero, and JabRef [BibTeX] are some of the key programs researchers use). The following series of videos and scripts is meant as a choose-your-own-destiny style tutorial rather than a linear sequence of steps to pass through. The following list is a general pathway a researcher may take (the eventual goal is to get your reference file into .csv format):

The following open-source tools are used in this tutorial: Zotero, JabRef, Mendeley

###General Outline of the Process

  1. If you use EndNote, export your EndNote (.enl) library to Zotero (.ris) [go to step 2]

  2. If you use Zotero, export your Zotero (.ris) library to BibTeX (.bib) format [go to step 3]

  3. If you use BibTeX (.bib; I use JabRef), export to .csv and then import into R

  4. Make sure you also gathered references from nontraditional databases (e.g., Google Scholar, ProQuest Dissertations and Theses), as these may be sources of unpublished work and dissertations. Because nonsignificant results often go unpublished and wind up on Google Scholar, including these types of work in your analysis can help reduce publication bias.

  5. Remove duplicates. Mendeley is particularly good at removing duplicates (a base R alternative is sketched just after this list). For a series of videos on using Mendeley click here
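If you would rather deduplicate inside R, here is a minimal base R sketch. It assumes a hypothetical data frame refs with a Title column and catches exact title matches only, so Mendeley's fuzzy matching remains the more thorough option:

#refs is a hypothetical data frame of references with a Title column
key <- tolower(gsub("[[:punct:][:space:]]+", " ", refs$Title))  #normalize titles
refs <- refs[!duplicated(key), ]        #keep the first copy of each title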

###Gathering Reference Search Results from Various Databases

Exporting From Database to Zotero (.ris)
Exporting From Google Scholar to Zotero (.ris)
Exporting From Database to JabRef (.bib)
Exporting From ProQuest to Zotero (.ris)

###Preparing References for Importing Into R

Exporting EndNote Library (.enl) to Zotero (.ris)
Exporting Zotero Library (.ris) to JabRef Library (.bib)
Exporting JabRef Library (.bib) to .csv

###Importing References Into R and Cleaning

library(qdap)
url_dl("ref_test.csv") 
options(width=10000)

#header = FALSE allows all columns to be read in
x <- read.csv("ref_test.csv", header=FALSE, row.names=NULL, stringsAsFactors = FALSE)
htruncdf(x, 20)
truncdf(x)

colnames(x)[1:26] <- as.character(unlist(x[1, 1:26]))
x <- x[-1, ]; rownames(x) <- NULL   #remove first row (this was the header)

#remove any empty columns and rows
FUN <- function(x) !all(is.na(x))   #function to detect blank columns
x <- x[, sapply(x, FUN)]            #remove blank columns
#function to strip stray quotation marks and runs of trailing periods
metaclean <- function(x) gsub('\"', "", gsub("\\.(?=\\.*$)", "", x, perl=TRUE))
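#a quick check of metaclean's behavior:
#metaclean("A \"quoted\" abstract....")   returns: A quoted abstract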
x <- rm_empty_row(x)                #remove blank rows

htruncdf(x, 20)  #use this to make decisions about which columns to keep/paste
#create an index of the columns containing the abstract and key terms pieces
index <- which(colnames(x) %in% qcv(V27, V31))   
truncdf(x[, index])

z <- data.frame(id=1:nrow(x), x[, 1:which(colnames(x) == "Year")], 
    abstract = metaclean(scrubber(paste2(x[, index[1]:index[2]]))), 
    stringsAsFactors = FALSE) 

truncdf(z, 10)                      #view it
z$abstract                          #the abstract

#remove symbols etc
parse.symb <- c("{", "}", "(", ")", "/", "-")  #vector of removal terms
z$abstract <- mgsub(parse.symb, " ", z$abstract)
v <- split(z, scrubber(z$abstract) %in% c("", " "))     #separate rows with blank abstracts
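#v is now a two-element list: v[["FALSE"]] holds rows with usable abstracts;
#v[["TRUE"]] holds rows whose abstracts came through blank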

# delete("ref_test.csv")            #delete the sample csv file

###Using qdap to Analyze Abstracts

This is a continuation of the Importing References Into R and Cleaning script.

Video to Accompany the Script Below

library(qdap)
url_dl("ref_test_clean.csv") 
v <- list(read.csv("ref_test_clean.csv", 
    row.names=NULL, stringsAsFactors = FALSE), NA)
options(width=10000)

#generate word lists (dictionaries) to exclude/include terms
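#note: the leading spaces in the terms below act as crude word boundaries
#(e.g., " male" will not match the "male" inside "female")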
matches <- list(
    gender = c(" male", " female", " women", " man ", " men ", " boy", " girl"),
    brain = c(" brain", " cogni", " process", " mental"),
    reading = c(" read ", " reads ", " reading ", " comprehen", " strat", " skill"),
    teach = c(" teach", " taught", " instruct", " pedagogy")
)

a <- with(v[[1]], termco.a(abstract, id, match.list  = matches, 
    short.term=TRUE, ignore.case = TRUE))
a
names(a)
head(a$raw, 20)

b <- with(v[[1]], termco.a(abstract, id, match.list  = unlist(matches), 
    short.term=TRUE, ignore.case = TRUE))
b
head(b$raw, 20)
termco2mat(b$raw)   #convert the raw termco output to a matrix
v[[2]]  # <- find abstracts for these ones (they were missing)            

htruncdf(a$raw)
All <- rowSums(a$raw[, 3:6]) > 0    #articles matching at least one category
brain <- a$raw[, 4] > 0             #articles matching the brain category
reading <- a$raw[, 5] > 0           #articles matching the reading category

#count how many of the 4 categories each article matched (0 to 4)
Reduce(`+`, lapply(a$raw[, 3:6], function(x) x > 0))
#notice that no article contained every category (n = 4)
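#an equivalent one-liner: rowSums(a$raw[, 3:6] > 0)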

p <- z[brain, ]
truncdf(p)
write.table(p, file = "foo.csv",  sep = ",", col.names = T, 
    row.names=F, qmethod = "double") 

truncdf(z[All, ])

# delete("ref_test_clean.csv")            #delete the sample csv file

###Using qdap to Classify an Article as Qualitative or Quantitative

Video to Accompany the Script Below

code to come
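Until that script is posted, one plausible approach (my sketch, not the author's forthcoming code) is to reuse the termco.a dictionary pattern from the previous section with methodology vocabulary; the term lists here are guesses that should be refined against your own corpus:

#hypothetical qualitative/quantitative dictionaries; tune these to your field
methodology <- list(
    qualitative = c(" qualitative", " interview", " ethnograph", " case stud", " grounded theory"),
    quantitative = c(" quantitative", " regression", " anova ", " effect size", " randomiz")
)
m <- with(v[[1]], termco.a(abstract, id, match.list = methodology,
    short.term = TRUE, ignore.case = TRUE))
head(m$raw, 20)   #classify each article by which dictionary dominates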