Releases: trinker/qdap

qdap Version 2.2.0

04 Oct 06:07

NEWS

Versioning

Releases will be numbered with the following semantic versioning format:

<major>.<minor>.<patch>

And constructed with the following guidelines:

  • Breaking backward compatibility bumps the major (and resets the minor
    and patch)
  • New additions that do not break backward compatibility bump the minor
    (and reset the patch)
  • Bug fixes and misc. changes bump the patch
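
The guidelines above can be sketched in a few lines of base R. This is only an illustration: `bump_version` is a hypothetical helper, not something qdap ships, and it assumes a plain three-part numeric version string.

```r
# Hypothetical helper: bump one component of a "<major>.<minor>.<patch>" string
# and reset the lower components, following the guidelines above.
bump_version <- function(version, level = c("patch", "minor", "major")) {
  level <- match.arg(level)
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3L, !anyNA(parts))
  names(parts) <- c("major", "minor", "patch")
  if (level == "major") {
    parts[] <- c(parts[["major"]] + 1L, 0L, 0L)
  } else if (level == "minor") {
    parts[c("minor", "patch")] <- c(parts[["minor"]] + 1L, 0L)
  } else {
    parts[["patch"]] <- parts[["patch"]] + 1L
  }
  paste(parts, collapse = ".")
}

bump_version("2.1.1", "minor")  # new feature, no breakage
bump_version("1.3.6", "major")  # backward-incompatible change
```

Base R's `package_version()` can then be used to compare two such strings, for example `package_version("2.2.0") > package_version("2.1.1")`.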

CHANGES IN qdap VERSION 2.2.0

BUG FIXES

  • bag_o_words did not make use of the bag_o_words2 helper function, which has
    finer-grained control of the output. Arguments passed via ... were ignored
    but are now respected.
  • fry threw an error if a group contained < 300 words but had enough text to
    generate 2 text chunks of 100 words each (caught by S. Enrico P. Indiogine).
    This has been fixed: such groups are now dropped and a warning is given.
  • phrase_net threw an error caused by dplyr's (0.3) approach to subsetting
    columns. Previously a vector was returned; now a tbl_df object is returned
    (tidyverse/dplyr#587). This was addressed by using explicit df[[index]]
    rather than df[, index].
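
The difference between the two subsetting forms can be seen with base R alone. The sketch below uses a made-up data frame; `drop = FALSE` stands in for the tbl_df behavior described above, since a base data.frame drops to a vector by default.

```r
df <- data.frame(word = c("a", "b"), n = c(1L, 2L), stringsAsFactors = FALSE)

# `[[` always extracts the column itself, whatever the data frame class.
by_double_bracket <- df[["word"]]

# `[, j]` depends on the method: a base data.frame drops to a vector by
# default, while drop = FALSE (used here to mimic tbl_df) keeps a data frame.
by_single_default <- df[, "word"]
by_single_nodrop  <- df[, "word", drop = FALSE]

is.character(by_double_bracket)  # column vector
is.data.frame(by_single_nodrop)  # one-column data frame
```

Code that expects a vector is therefore safer with `[[`, because its result does not depend on how the object's `[` method treats single columns.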

NEW FEATURES

  • chunker added to break text, optionally by grouping variables, into equal
    chunks. The chunk size can be specified by giving number of words to be in
    each chunk or the number of chunks.
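
As a rough illustration of chunking by a fixed number of words, here is a base-R sketch. `chunk_words` is a hypothetical helper and is not qdap's chunker; it ignores grouping variables and simply lets the final chunk be shorter when the words do not divide evenly.

```r
# Hypothetical helper, not qdap's chunker: split one string into chunks of
# `n_words` words each (the last chunk may be shorter).
chunk_words <- function(text, n_words) {
  words  <- strsplit(trimws(text), "\\s+")[[1]]
  groups <- ceiling(seq_along(words) / n_words)
  unname(vapply(split(words, groups), paste, character(1), collapse = " "))
}

chunks <- chunk_words("the quick brown fox jumps over the lazy dog", n_words = 4)
chunks
```

Specifying the number of chunks instead would amount to deriving `n_words` from the total word count before splitting.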

IMPROVEMENTS

  • all_words gains char.keep and char2space arguments to enable retention
    of characters and multi-word phrases. These features are passed to
    freq_terms as well. Suggested by stackoverflow's lawyeR
    (http://stackoverflow.com/a/26162401/1000343).

CHANGES

  • rm_url has been moved into its own canned regex pattern extraction/replacer
    package named qdapRegex.
  • name2sex now uses the gender package to predict sex. This makes the
    function slightly slower but much more accurate than previous versions.
    Because of this increased accuracy and dependence on gender, the arguments
    pred.sex, fuzzy.match, and database are no longer necessary and have
    been removed.

qdap Version 2.1.1

02 Aug 14:52

CHANGES IN qdap VERSION 2.1.1

BUG FIXES

  • syllable_count returned the sentence (recycled) in the words column of the
    output. This behavior has been fixed. See GitHub issue #188 for details.
  • syn returned antonyms for some words. This was caused by the dictionary:
    qdapDictionaries::key.syn contained antonyms and elements that were error
    messages (character). This has been fixed. Reference issue #190. (Jingjing Zou)
  • The pres_debates2012 data set contained three errors in speech attribution.
    This has been corrected and the turn of talk (tot) as well.
  • word_stats would throw an error if no poly-syllable words existed. This has
    been corrected (reported by Nicolas Turenne).

NEW FEATURES

  • qdap_df and %&% added to mimic some of the functionality of dplyr's
    tbl_df and chaining pipe in a more specific, less flexible, qdap oriented
    way.
  • Text added to view and change the text.var attribute of a data.frame of
    the class qdap_df.
  • cumulative generic method added to view cumulative scores over time.
  • formality picks up a cumulative method.
  • polarity picks up a cumulative method.
  • end_mark picks up a class (end_mark), plot method, and a cumulative
    method.
  • syllable_sum, polysyllable_sum, and combo_syllable_sum pick up a
    class, plot method, and a cumulative method.
  • wfm becomes a generic method currently applied to a text.var that is:
    character, factor (coerced to character), or wfdf.
  • unbag added as a complement to bag_o_words and friends for undoing string
    splitting. A convenience wrapper for paste(collapse = " ").
  • as.Corpus.TermDocumentMatrix, as.Corpus.DocumentTermMatrix, and
    as.Corpus.wfm added to convert a matrix format to a tm::Corpus.
  • exclude becomes a generic method for various classes. Functionality is the
    same but with improved code readability.
  • check_spelling_interactive, check_spelling, which_misspelled, and
    correct allow the user to identify potentially misspelled words and
    optionally suggest replacements.
  • random_data & random_sent added to generate random sentence data sets and
    vectors.
  • comma_spacer added to ensure strings with commas contain a space after them.
  • check_text added to identify potential problems in text.
  • replace_ordinal added to convert ordinal representations of 1 through 100 to
    strictly ordinal text (e.g., "1st" becomes "first").
  • A vignette: Cleaning Text & Debugging was added to assist users with
    cleaning and debugging problems in qdap.
  • pronoun_type, subject_pronoun_type, and object_pronoun_type added to
    examine usage of subject/object pronouns by grouping variable.
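
The unbag entry above describes a wrapper around paste(collapse = " "). A base-R round trip shows the idea; the splitting step here uses strsplit as a stand-in and is not bag_o_words itself.

```r
# Split a string into words, then undo the split with paste(collapse = " "),
# which is the operation the notes say unbag wraps.
sentence <- "I like green eggs"
words    <- strsplit(sentence, " ", fixed = TRUE)[[1]]
rejoined <- paste(words, collapse = " ")

identical(rejoined, sentence)
```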

MINOR FEATURES

IMPROVEMENTS

  • wfm gains a speedup through generic classes and tm package integration
    (strip is no longer used in wfm).
  • as.tdm.character and as.dtm.character gain a speed boost with a tm
    package integration.
  • Added message to as.data.frame.Corpus for missing end-marks suggesting the
    use of: sent.split = FALSE.
  • as.Corpus family of functions didn't necessarily respect document names and
    sometimes used numeric sequence instead. The introduction of a reader via
    tm::readTabular has fixed this.
  • sentSplit now gives warnings for text that may contain anomalies such as:
    non-ASCII characters, factors, missing punctuation, empty cells, and no
    alphabetic characters found.
  • read.transcript now gives a warning when reading from a .docx file and the
    separator (sep) used is still found in the text as this may indicate the
    data did not split correctly.
  • dispersion_plot now takes a named list of vectors of terms as the argument to
    match.terms. The vectors are combined as a unified theme named with the
    names of the list supplied to match.terms.

CHANGES

  • as.data.frame.Corpus's default value for sent.split is now FALSE.
  • The state column in the qdap::DATA2 data-set is now character (previously
    factor).

qdap Version 2.1.0

15 Jun 20:21

CHANGES IN qdap VERSION 2.1.0

BUG FIXES

  • new_project did not copy the .Rprofile over into the new project. This has
    been fixed. Reference issue #184.
  • sentiment_frame coerced words to factor. stringsAsFactors = FALSE has
    been added to prevent this.
  • polarity did not work on > 1 grams due to a bug in sentiment_frame
    converting character to factor (chewth). See GitHub issue #185 for details.
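
The factor coercion behind the two fixes above is ordinary data.frame behavior and can be reproduced in base R. The terms below are made up; stringsAsFactors is set explicitly in both calls because its default differs across R versions.

```r
terms <- c("not good", "very bad")

# With stringsAsFactors = TRUE the words become a factor (integer codes plus
# levels), which breaks code that expects plain character strings.
as_factor <- data.frame(words = terms, stringsAsFactors = TRUE)

# With stringsAsFactors = FALSE the column stays character.
as_char <- data.frame(words = terms, stringsAsFactors = FALSE)

class(as_factor$words)
class(as_char$words)
```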

NEW FEATURES

  • unique_by added to allow the user to find terms unique to individual
    elements of a grouping variable.
  • build_qdap_vignette replaces the temporary place holder version of the
    Introduction to qdap vignette. This function will replace the (1) HTML,
    (2) source, & (3) R code found in browseVignettes(package = 'qdap').

MINOR FEATURES

  • sub_holder picks up an alpha.type argument that allows the user to specify
    whether alpha or numeric keys should be used.
  • replace_number picks up a remove argument that removes numbers from text.

IMPROVEMENTS

  • qheat becomes a generic method. This means some of the internal function
    class checking has been moved to individual methods for those classes.
    Additionally, qheat now works with logical matrices/data.frames.
  • The tm package compatibility functions have been renamed in a more R-ish
    way and take the form of generic methods for specific classes. For example,
    df2tm_corpus becomes as.Corpus. Here is a complete list of changes:
    • df2tm_corpus is now as.Corpus
    • tm_corpus2df is now as.data.frame
    • as.wfm is now a generic method
    • tm_corpus2wfm is now as.wfm
    • tm2qdap is now as.wfm
    • tdm is now as.tdm or as.TermDocumentMatrix
    • dtm is now as.dtm or as.DocumentTermMatrix
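
For readers unfamiliar with the pattern, "generic methods for specific classes" refers to S3 dispatch. The sketch below uses a hypothetical generic, `as_counts`, purely to show the mechanism; it is not one of the qdap functions listed above.

```r
# Hypothetical generic (not part of qdap) showing S3 dispatch: one function
# name, with the work done by a method chosen from the class of `x`.
as_counts <- function(x, ...) UseMethod("as_counts")

as_counts.character <- function(x, ...) {
  words <- unlist(strsplit(tolower(x), "\\s+"))
  table(words[nzchar(words)])
}

# Factors are coerced to character and handed to the character method.
as_counts.factor <- function(x, ...) as_counts(as.character(x), ...)

as_counts.default <- function(x, ...) {
  stop("no as_counts method for class ", class(x)[1])
}

counts <- as_counts(factor("a b a"))
counts
```

Moving class checks into methods like these is what lets a single name such as as.wfm accept several input types.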

CHANGES

  • colsplit2df and colpaste2df no longer convert character columns to factor.
  • df2tm_corpus is deprecated. It will be removed in a subsequent version of
    qdap. Use as.Corpus instead.
  • tm_corpus2df is deprecated. It will be removed in a subsequent version of
    qdap. Use as.data.frame instead.
  • tm2qdap is deprecated. It will be removed in a subsequent version of
    qdap. Use as.wfm instead.
  • tm_corpus2wfm is deprecated. It will be removed in a subsequent version of
    qdap. Use as.wfm instead.
  • tdm is deprecated. It will be removed in a subsequent version of qdap.
    Use as.tdm or as.TermDocumentMatrix instead.
  • dtm is deprecated. It will be removed in a subsequent version of qdap.
    Use as.dtm or as.DocumentTermMatrix instead.
  • The Introduction to qdap .Rmd vignette has been moved to an internal
    directory. The HTML version is not built by default. This saves CRAN space
    and time checking the package source. The file has been replaced with a
    temporary place holder that contains instructions for building the actual
    vignette. The user may also use the build_qdap_vignette directly.
  • qdap incorporates the changes from the tm package version: 0.6:
    http://cran.r-project.org/web/packages/tm/news.html Reference issue #187.
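
A common way to keep an old name working while steering users to its replacement is a thin wrapper around base R's .Deprecated(). The sketch below uses hypothetical function names; it is not taken from qdap's source and only illustrates the general pattern behind the deprecations listed above.

```r
# Hypothetical functions illustrating the pattern; not qdap's definitions.
new_counter <- function(x) length(x)

old_counter <- function(x) {
  .Deprecated("new_counter")  # warns and names the replacement
  new_counter(x)              # then delegates to the new function
}

# The old name still returns the same result, with a deprecation warning.
result <- suppressWarnings(old_counter(letters))
result
```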

qdapTools Version 2.0.0.b

31 May 15:25

CHANGES IN qdap VERSION 2.0.0

The qdapTools package now houses several former qdap functions. While
qdapTools is a Dependency and all of these functions will be accessible to
the qdap user there is a break in backward compatibility if these functions
are included in code. For this reason this release is a major bump of qdap.

BUG FIXES

  • replace_number did not replace single-digit numbers. Spotted by Ben Bolker.
    This behavior has been fixed and unit testing added for this function. See
    issue #178.

NEW FEATURES

  • sub_holder added; this function holds the place for particular character
    values, allowing the user to manipulate the vector and then revert the place
    holders back to the original values.
  • Network method added to make network plots of select qdap objects.
  • qtheme, theme_nightheat, theme_duskheat, theme_norah, theme_cafe,
    theme_grayscale, theme_badkitchen, and theme_hipster added to style
    Network plots.
  • polarity picks up a Network method.
  • formality picks up a Network method.
  • qdap officially begins utilizing the testthat package for unit testing,
    though only a few functions have begun the process, more will be added over
    time.

MINOR FEATURES

IMPROVEMENTS

CHANGES

  • The qdapTools package now houses the following former qdap functions:
    hash, %ha%, hash_look, hms2sec, id, lookup, %l%, %l+%, %l*%,
    repo2github, sec2hms, text2color, url_dl, v_outer, list2df,
    matrix2df, vect2df, list_df2df, list_vect2df, counts2list,
    vect2list, & mtabulate. These functions will continue to be available to
    qdap users in interactive mode (qdapTools is a Dependency and thus these
    functions are loaded into the workspace by default). This will allow this
    bundle of functions to be used outside of qdap without calling the larger qdap
    package per the request of Kirill Muller (see issue #165).
  • As scheduled, the dissimilarity function has been removed from the qdap
    package to avoid conflict with the tm package. Use the Dissimilarity
    function instead.

qdap Version 2.0.0

25 Apr 04:31

Initial 2.0.0 bump:

CHANGES IN qdap VERSION 2.0.0

The qdapTools package now houses several former qdap functions. While
qdapTools is a Dependency and all of these functions will be accessible to
the qdap user there is a break in backward compatibility if these functions
are included in code. For this reason this release is a major bump of qdap.

CHANGES

  • The qdapTools package now houses the following former qdap functions:
    hash, %ha%, hash_look, hms2sec, id, lookup, %l%, %l+%, %l*%,
    repo2github, sec2hms, text2color, url_dl, v_outer. These functions
    will continue to be available to qdap users in interactive mode (qdapTools
    is a Dependency and thus these functions are loaded into the workspace by
    default). This will allow this bundle of functions to be used outside of
    qdap without calling the larger qdap package per the request of Kirill Muller
    (see issue #165).
  • The dissimilarity function has been removed from the qdap package to avoid
    conflict with the tm package. Use the Dissimilarity function instead.

qdap Version 1.3.6

24 Apr 20:49

CHANGES IN qdap VERSION 1.3.6

MINOR FEATURES

  • polarity picks up a constrain argument that constrains the polarity values
    to be between -1 and 1.
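
These notes do not state which transformation constrain applies. As a minimal sketch of what squashing an unbounded score into the range -1 to 1 can look like, the hypothetical function below uses a rescaled logistic curve; treat the formula as an assumption for illustration, not as polarity's documented behavior.

```r
# Assumption: a rescaled logistic curve, chosen only for illustration.
# It maps any real-valued score into the open interval (-1, 1) and keeps 0 at 0.
constrain_score <- function(x) 2 / (1 + exp(-x)) - 1

scores <- constrain_score(c(-10, -1, 0, 1, 10))
scores
```

Any monotone, bounded function would preserve the ordering of the raw scores while limiting their range.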

IMPROVEMENTS

  • polarity's equation now uses primes on the de-amplifiers before they're
    confined to be >= -1. This avoids confusion in the indicator function that
    took the de-amplifiers variable and returned the same variable.
  • dist_tab's frequency columns used a capital F in Freq. This was not
    consistent across all column names and has been changed to lower case.

CHANGES

  • polarity_frame is deprecated and will be removed in a subsequent release.
    Please use sentiment_frame instead.