01 Sep 09:39

ablaette

8859d8d

Nested Boxes Latest

New features

Using the corpus class throughout is an opportunity to keep the corpus ID
together with the registry directory of a corpus. And as we are able now to
handle corpora defined in different registry files, the temporary registry
directory is not necessary any more. It still exists, yet only for temporary
corpora and corpora that are described by registry files that cannot be
modified, i.e. corpora shipped in packages. The test corpus of the polmineR
package is an important example of this scenario.
get_token_stream() now has an argument min_length.
registry_*() functions are superseded by RcppCWB::corpus_* functions and
throw a warning that they are deprecated.
The REUTERS corpus is not included in the package any more: There was an
identical copy of the REUTERS corpus included in the RcppCWB package. All
examples and unit tests now use use(pkg = "RcppCWB", corpus = "REUTERS") to
make the REUTERS corpus available.
size() works for partition/subcorpus with s-attribute that is a child
of the s-attribute the object is based on #216.
The trim()-method for context objects has a new argument fn for
supplying a (trimming) function to be applied to all match contexts.
A new s-attribute "protocol_date" has been added to sample corpus
"GERMAPARLMINI", so that sample data for nested corpus data is available. To
prevent confusion between s-attributes "protocol_date" (at protocol-level) and
"date" (at speaker-level), argument s_attribute_date is stated explicitly in
all examples.
Method size() has been refactored to work with nested corpora.
Method encoding() and replace method encoding<- are defined for call
and quosure objects to get and adjust the encoding, replacing a previously
unexported function .recode_call().
The subset() methods for corpus and subcorpus objects now handle
expressions for subsetting as quosures, laying the ground to program against
subset(), see respective update of the examples, #212.
Indexing bundle objects with single square brackets is now implemented.
Indexing with double brackets while supplying multiple values for i
is deprecated. The aim is consistent behavior: indexing a bundle with [
will always return a bundle, and indexing with [[ will always get a single object
from the list of objects. #214
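The indexing contract described in the last item can be illustrated with a toy S4 class. This is only a sketch: `toy_bundle` and its slot `objects` are hypothetical names chosen for illustration and are not polmineR's bundle implementation.

```r
library(methods)

# Hypothetical stand-in for a bundle: an S4 class wrapping a list of objects.
setClass("toy_bundle", representation(objects = "list"))

# Single brackets: always return a bundle, even for a single index.
setMethod("[", "toy_bundle", function(x, i, j, ..., drop = TRUE) {
  new("toy_bundle", objects = x@objects[i])
})

# Double brackets: return exactly one object from the list.
setMethod("[[", "toy_bundle", function(x, i, j, ...) {
  stopifnot(length(i) == 1L)  # multiple values for i are not accepted here
  x@objects[[i]]
})

b <- new("toy_bundle", objects = list(a = 1:3, b = letters, c = TRUE))
sub <- b[c("a", "b")]  # a toy_bundle holding two objects
one <- b[["b"]]        # the character vector itself
```

Keeping the return type of `[` stable means code that iterates over a subset does not need a special case for subsets of length one.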

Minor improvements

The use() function now has an additional argument corpus to specify which
corpus from a package shall be loaded (#138).
The get_token_stream()-method for partition_bundle objects is more memory
efficient (no exhaustion for big corpora) and faster.
Significantly improved performance of split()-method for corpus objects.
The split()-method for corpus objects offers a progress bar.
as.speeches() for corpus objects has new argument subset, offering a
significantly faster approach than the method for subcorpus objects in many
cases.
The size() method will return NA and issue a telling warning if the slots
corpus and registry_dir of the corpus object are not filled (#222).
get_token_stream() will return a list of integer values if decode is
FALSE (#213).
After applying trim() on a context object using arguments positivelist
or negativelist, the count slot as reported by length was not updated.
Fixed. (#220)
The enrich() method for context objects has a new argument stat for
creating / updating the data.table in the slot stat.
Method subset() for subcorpus objects has been debugged to work with
nested corpora.
New option polmineR.mdsub configures substitutions that are applied on
markdown documents to prevent presence of characters that would be
misinterpreted as formatting instructions. Fixes #166.
The messages issued by check_cqp_query() now include a hint that argument
check can be used to omit checking the CQP syntax to prevent false positives.
Addresses #171.

Bug fixes

The ability of cooccurrences() (and context()) to process more than one
p-attribute had been lost temporarily. Fixed. #208.
Removed a bug for hits() method for partition objects #215.
After applying trim() on a context object using arguments positivelist
or negativelist, the count statistics reported in the stat slot were not
updated. Fixed. (#220)
Structural attributes do not disappear any more after adding tooltips to a
kwic object #218.
Method subset() would not work reliably with argument regex if more than
one expression is passed #212. Fixed.
terms() did not work for subcorpus objects. Fixed. #209
When applying as.speeches() on a subcorpus, the date may have been missing
from the object names. Fixed. #219
Fixed an issue that minNchar in the noise() method would work exactly the
way opposite to the way intended #211.
The slot registry_dir of a cooccurrences_bundle derived from a
partition_bundle was not filled, resulting in an error of the show()-method
for the cooccurrences_bundle. Fixed #222.

Documentation

The documentation of the cooccurrences() method now includes example code
for creating a table using DT::datatable() with buttons for exporting tables
(to Excel, for instance).

05 May 21:16

ablaette

v0.8.6

7734c2a

Yellow Submarine

New Features

The dispersion() method now accepts an argument fill, a logical value to
explicitly control whether zero matches for a value of a structural
attribute should be reported (#160). The performance of adding columns (required only if
two structural attributes are provided) is improved substantially by using the
reference semantics of the data.table package. If many columns are added at once,
a warning issued by the data.table package is supplemented by a further
explanatory warning of the polmineR package. Filling up the data.table was
previously limited to freq = FALSE; this limitation is lifted.
The html() method is implemented for remote_subcorpus objects.
The hits() method is implemented for remote_corpus and remote_subcorpus
class (#160).
A new S4 class ranges is introduced to manage ranges of corpus positions for
query matches. This is a preparatory step to remove an inconsistency from the
hits class that mixed two very different usages (getting ranges of corpus positions for
matches and getting counts).
A new S4 method ranges serves as the constructor to prepare a ranges class
object. In combination with as.data.table(), it replaces former functionality
of hits() without argument s_attribute.
The output of the hits() method is altered, making it much more consistent
than previously: The method will consistently return a hits object.
The method hits() has a new argument fill that will report zeros for
combinations of s-attributes with no matches for a query.
The argument subset for the subset method for remote_corpus objects can
now be a call (#162); this is a basis for passing vectors to an OpenCPU server.
p_attributes() is implemented for remote_corpus and remote_partition.
A new regions() method (for corpus class objects to start with) returns a
regions class object with a regions matrix (slot cpos) with regions for an
s-attribute (#176).
The get_token_stream()-method for regions and matrix objects will now
accept a logical argument split. If TRUE, a list of character vectors is
returned. The envisaged use case is a fast decoding of sentences (#176).
An encoding() method has been defined if argument object is missing.
Calling encoding() will return the session character set. If it cannot be
determined using localeToCharset(), a UTF-8 session charset will be assumed.
Internally, encoding() replaces a direct call of localeToCharset() to avoid
errors that have occurred on GitHub Actions with Ubuntu 20.04 (#188).
If the session character set cannot be guessed by localeToCharset() (NA
return value), a startup message will issue a warning that 'UTF-8' is assumed
(#188).
The size() method is now able to handle nested s-attributes.
The trim() method for context objects will now accept a matrix with ranges
as positivelist argument.
The highlight() method now accepts matrix objects as elements of the list
of items to be highlighted. A matrix is treated as a set of regions, such as resulting
from cpos(). Thus it is possible to highlight matches for CQP queries.
The package now requires at least RcppCWB v0.5.2, which includes a much more
efficient worker for token contexts for the context() method.
The count()-method for partition_bundle objects failed with an opaque
error message if there were no query matches at all. There is now a check for
this scenario and the expected table is returned (zero values throughout).
The corpus class is now a superclass for the textstat class, starting to
create a more coherent class structure in general. This is an important
preparatory step to be able to keep all registry files in the temporary registry
directory. To avoid a confusion in the class system resulting from the coerce
method from partition to corpus objects, this coerce method (defined by
setAs()) has been removed. The get_template()-method for partition objects
using this coerce method has been removed - as it inherits the method anyway, it
is not needed any more. See #201.
The kwic tab of the shiny app included in the package exposes the improved
capabilities to determine the context of a query match based on an s-attribute
(argument region) and to consider the changing value of an s-attribute as
a boundary of a context (argument boundary). New menu "boundary" and radio
buttons, conditional on presence of s-attributes "s" and/or "p".
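The fallback for the session character set described above (use localeToCharset(), assume UTF-8 if it returns NA) can be sketched in base R. The helper name `session_charset` is hypothetical; this is a minimal illustration and not polmineR's actual encoding() method.

```r
# Hypothetical helper: guess the session charset, falling back to UTF-8.
session_charset <- function() {
  # localeToCharset() may return several candidates; use the first one
  guess <- utils::localeToCharset()[1]
  if (is.na(guess)) {
    # The charset could not be determined from the locale: assume UTF-8
    "UTF-8"
  } else {
    guess
  }
}

cs <- session_charset()
```

Returning a definite value rather than NA avoids errors in later calls such as iconv() that expect a valid encoding name.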

Minor Improvements

If arguments sAttribute or pAttribute (instead of s_attribute and
p_attribute) are still used with dispersion() method, a warning is issued
declaring that the argument is deprecated.
Examples in packages that depend on polmineR would have faced the issue that
loading/re-loading the package in several examples would not be possible, as the
mechanism of cleaning up between examples would trigger a removal of polmineR's
temporary directories but not their re-creation. Removing temporary files is now
moved from polmineR's .onDetach() to .onUnload() (#164).
Significant improvement of the performance of the as.phrases() method (#172).
The as.corpusEnc() auxiliary function will now check whether non-convertible
characters lead to an NA result and issue a warning explaining how this can be
avoided (#151).
Significant performance improvement of the context() method for matrix
objects if arguments left and right are named integer vectors. All
context() methods benefit from the improved performance of this worker for creating
contexts for query matches.
New coerce-method to derive matrix with ranges from a context object.
The enrich() method for context objects will now perform an in-place
operation when adding new s-attributes.
The as.cqp() function includes arguments check and warn for running
check_cqp_query() on queries.
The context() method for matrix objects includes a new argument boundary
and relies on a new function RcppCWB::region_matrix_context().
Default value of argument verbose of context()-methods is now FALSE.
The as.corpusEnc() auxiliary function now includes a test whether input
character vector includes unexpected encodings and issues a warning if this is
the case.
The cpos() method will now check for accidental leading and/or trailing
whitespace and remove it for token lookup. Note that hits(), count() and
dispersion() will report queries without removing whitespace.
Internals of the count()-method for partition_bundle objects will be much
more efficient when many columns with zero matches need to be added. The
implementation avoids a data.table warning when the bulk action of adding new
columns exceeds the number of columns reserved by data.table objects.
The DESCRIPTION file does not state "LazyData: yes" any more, as the package
does not have a data directory.
Typo in messages of trim() is removed (#197).
encoding() relies on l10n_info() before using localeToCharset() as a
matter of performance and robustness (#196).
Class corpus has a new slot registry_dir. This is a preparatory step that
will facilitate managing corpora described by registry files in different
registry directories.
Constructor corpus() for corpus-class objects has an argument
registry_dir that will be required to distinguish corpora described by
registry files in different registry directories.
The package now relies on the fs package to handle directories and paths.
Slots in S4 classes are not fs_path classes.
Internally, functions registry_get_home() and registry_get_encoding() have
been replaced by RcppCWB functions cl_charset_name() and corpus_data_dir()
with equivalent result, but faster due to immediate access to C representation
of the corpus.
The corpus() method will deduce the registry directory from the C representation
of the corpus if possible.
An inefficiency in the implementation of as.markdown() has been removed,
making fulltext display (using read() or html()) much faster.
Calling corpus() without any arguments now returns an expanded data.frame
reporting all slots of the corpus class objects, skipping only the data
directory of the corpus.
The cpos() method for matrix objects that turns a matrix with corpus
positions into a vector of integer values now relies on a C-level
implementation newly included in the RcppCWB package, that is significantly
faster than the best possible implementation in R.
The table generated by kwic() shows row numbers, which is convenient
when referring to specific rows (#184).
The as.cqp() now checks whether argument query meets the expectation that
it is a query (#191).
The method make_region_matrix(), which has been used internally only, has
been removed. RcppCWB::s_attr_regions() replaces the functionality.
The as.speeches() method had not yet been implemented for nested corpora. A
limited rewrite makes this work now (#198).
Inconsistencies and unnecessary limitations of the get_token_stream() method
for partition_bundle objects have been addressed: Multiple p-attributes can be
used without providing phrases at the same time (#142) and using the subset
argument does not depend on using phrases either (#141).
The as.sparseMatrix() method is now also defined for DocumentTermMatrix
objects (was available previously only for TermDocumentMatrix objects).
If a vector of queries is named, these names are now used consistently by the
hits() method (#195).
get_type() for subcorpus_bundle returns NULL if no type is defined as a
matter of consistency (#169).
If an expression for subsetting a corpus/subcorpus includes invalid
s-attributes, the warning is telling and NULL is returned (#179).
The cooccurrences option...

29 Sep 15:04

PolMine

v0.8.5

5da8861

Putty Knive

New Features

A new decode() method for data.table objects shall serve as a more user-friendly access to the efficiency of the RcppCWB::cl_cpos2str() function.
The data.frame returned when calling corpus() will now include a column with the encoding of the corpus.

Bug fixes

The warn argument of the get_template()-method remained unused, so that a warning message was issued even if warn was FALSE, which led to a set of warning messages when calling corpus(). The argument is used as intended now and defaults to FALSE.
The as.markdown()-method for subcorpus objects now uses an (internal) default template accessible via polmineR:::default_template, if no template is defined for a corpus.
The registry_get_encoding() function returned a length-one character vector if the regular expression to extract the charset corpus property did not yield a match. To prevent errors, it now returns "latin1" as the CWB standard encoding (#159).

23 Jul 11:14

PolMine

v0.8.4

3262d21

Unicorn Dream

Minor Improvements

The knit_print()-method for textstat objects does not accept the three dots argument any more. As an installation of pandoc is necessary to include resulting htmlwidget in an html document, the method will check now whether pandoc is available. If not, a formatted data.table is returned.
The knit_print()-method for kwic objects does not have the pagelength argument any more as it has been unused. The pagelength is controlled by the option polmineR.pagelength. Internally, the method will call the method for the textstat superclass of the kwic class, which is newly robust against a missing installation of pandoc.
Any Unicode characters that could be detected have been removed from the documentation to avoid warnings on the CRAN Solaris test machine (#156).

Bug Fixes

The chisquare() method needs to increase the number of digits temporarily, but failed to revert to the original value as expected. One implication was that rounding the values in data.table objects would fail, and rounding in general yielded very strange results (#155). Fixed.
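A common base R pattern for changing an option temporarily and restoring it reliably is to combine options() with on.exit(). The sketch below uses a hypothetical helper `with_more_digits`; it illustrates the pattern only and is not the code of chisquare().

```r
# Hypothetical helper: run a function with a temporarily raised digits option.
with_more_digits <- function(fun, digits = 15L) {
  old <- options(digits = digits)    # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore on exit, even if fun() fails
  fun()
}

before <- getOption("digits")
inside <- with_more_digits(function() getOption("digits"))
after <- getOption("digits")
```

Because the restore step is registered with on.exit(), the original value comes back even when the wrapped computation stops with an error.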

18 Dec 09:03

PolMine

v0.8.0

6abb27c

Caterpillar Mambo

New Features

The corpus class has been put in a shape to become the default point of
departure of most workflows. All core methods are now available for the
corpus class, and have been implemented newly if necessary, e.g. show()
and size()-method. The constructor method for a corpus object, the
corpus() method, will now check whether the character vector with the corpus
ID refers to an available corpus, whether all letters are upper case and
issue informative warnings and error messages.
The s_attributes()-method for corpus objects has been reworked: It will decode
binary files directly, without reliance on the corpus library functions, which is
significantly faster.
The Corpus reference class is now obsolete after the introduction of the
S4 corpus class. To maintain the functionality not covered otherwise,
new generics get_info and show_info have been introduced and defined
for the corpus class.
Methods available for the subcorpus class have been expanded so that this
class can supersede the partition class: Methods newly available are
cpos(), count(), p_attributes(), s_attributes(), get_token_stream(),
and size(). Technically, there is a virtual slice-class, from which
subcorpus inherits (methods called via callNextMethod()).
A new subset()-method for the corpus and subcorpus classes to generate subcorpora
(i.e. subcorpus objects) has been introduced. It outperforms the
partition() method. The subset()-method for corpus and subcorpus objects
will be the default way to work with non standard evaluation in a manner that
feels "R-ish" (#40).
The zoom()-method that has been introduced experimentally has
been dropped again in favor of the subset()-method to get subcorpus objects
from corpus and subcorpus objects. A set of experimental methods for an
initial check of the feasibility of a non-standard evaluation approach to
the generation of subcorpora has been dropped (methods $, ==, !=,
zoom for corpus-class).
To facilitate the transition from the partition class (inheriting from
the textstat class) to the subcorpus class (inheriting from the textstat
class), there is a new coerce()-method to turn a partition object into
a subcorpus object.
A new remote_corpus-class is the basis for accessing remote
corpora. A remote_subcorpus can be derived from a remote_corpus. Methods
available for remote corpora and subcorpora remain limited at this stage.
Consolidation of the class system: For all the S4 classes in the package, multiple
contains have been checked, and multiple contains have been removed.
The subcorpus_bundle class now inherits from partition_bundle. This is not
intended to be a long-term solution, but facilitates the implementation of new
workflows based on the subcorpus class rather than the partition class.
Calling the polmineR shiny app via polmineR did not have safeguards if
the suggested packages shiny and shinythemes were not installed. Now
there will be a conditional installation of the packages required for running
the shiny app.
The somewhat odd class CorpusOrSubcorpus has been removed. The ngrams-method
now applies for corpus and subcorpus objects.
The pipe operator of the magrittr package is imported now, and magrittr has moved
from a suggested package to a required package.
The label()-method, present for a while, is superseded by an edit()-method now.
It will call a shiny gadget either using DataTables or Handsontable. The former
Labels reference class has been turned into a S4 class, because the
desired reference logic can also be achieved with a data.table in a slot of
the labels class.
The table-slot of the kwic class has been renamed as stat slot (a data.table),
so that the kwic class can now inherit from the textstat class. The
enrich()-method for objects of class kwic now includes a new argument
extra that will add extra tokens to the left of the windows for concordances so
that qualitative inspections for query hits can work with more context.
The as.TermDocumentMatrix() and the as.DocumentTermMatrix()-methods are now
also defined for kwic objects. They work exactly the same as for the context
class. To avoid having to write new methods, a new neighborhood virtual class has
been introduced. The aforementioned methods are defined for the virtual class and
are available for context and kwic class objects.
Added CQP functionality to count tab in shiny app, and to the dispersion tab.
There is now a basic implementation of get_token_stream() for a partition_bundle
object.
The Cooccurrences()-method is now available for subcorpus-objects (#88).
There is a new coerce method to turn a kwic-object into a context-object.
The neighborhood virtual class could be discarded again, and a bug could be removed
that left an enrich()-operation for kwic objects (argument p_attribute)
ineffectual (#103).

Minor changes

Added a new argument regex to the cpos()-method (for corpus objects), which
will interpret argument query as a regular expression. This may be faster than
taking query as an outright CQP query.
The configure-script in the package that would adjust paths in the registry files
for the corpora included in the package for documentation and testing purposes has
been removed. Having switched to a temporary registry directory, it has lost
its function.
The version of the data.table package now required is 1.12.2, because previous
versions did not allow adding columns to a new data.table.
Implemented the possibility to use multiple queries in dispersion-method (#92).
To keep up with the renaming of functions and arguments in the package, "sAttributes"
and "pAttributes" in the polmineR shiny app have been renamed ("s_attributes",
and "p_attributes", respectively).
The shiny app module for kwic output will not show p_attribute and positivelist
by default.
The format()-method is used to create proper output in the cooccurrences of the
shiny app.
User names that include non-ASCII characters were a persistent problem on Windows
machines (#66). The solution now is to check for non-ASCII characters in the path
to the data directory, and to use the "old" short DOS path if necessary. The worker is
a modified registry()-function.
The ordering of the table for ll-method had been somewhat mixed up, which is repaired
now. Tokens with NA values for the ll-test will show up at the end of the table.
The registry_move()-function, used only internally at this stage, is exported now
so that it can be used by other packages.
The return value of the get_token_stream()-method for regions objects was a
data.table. The behavior is now in line with the other get_token_stream() methods.
The tempcorpus()-method and the tempcorpus class have been removed from the package,
having become utterly deprecated.
The summary()-method for partition-class objects has been turned into a method
for the count-class, to eliminate an inconsistency. The example of a workflow has been
moved to the documentation object for the count-class.
The browse()-method has not proven to be useful and has been removed from the package.
A new browse()-function is introduced to throw a warning, if browse should be
called nevertheless.
A refactoring of the split()-method for partition-objects improved the readability
of the code, but the performance gain is minimal.
A new kwic_bundle-class has been introduced, a list of kwic objects can be turned
into this new class using as.bundle.
The context()-method will now take again as input character vectors for the arguments
left and right to expand to the left and right boundaries of the designated
region (#87).
Rework of the way messages are printed to make it easy to implement notifications in
the shiny environment.
Default highlighting when a positivelist is supplied has been removed from the
kwic()-method. This ensures that subsequent highlighting operations can assign
new colors (#38).
Implemented feature request for dispersion() that results are reported for all
values of structural attributes, including those with zero matches. (#104)
Performance improved for the cpos-method for matrix which unfolds a matrix with regions
of corpus positions, useful for operations that require many calls.
The count-method for partition_bundle has been reworked and is much faster and more
memory efficient.
as.TermDocumentMatrix() for partition_bundle optimized to work efficiently
with large corpora.
Introduction of a context,matrix-method to have a unified auxiliary function
to create contexts.
The as.corpusEnc()-function uses the localeToCharset()-function from the utils
package to determine the charset of input strings. On RStudio Server, we have seen
cases when the return value is NA. Then it will be assumed that the locale is UTF-8.
Functionality to highlight terms in kwic display has been restored for the shiny app.
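The check for non-ASCII characters in a path, as described above for Windows user names, can be sketched in base R. The helpers `has_non_ascii` and `safe_path` are hypothetical names; using utils::shortPathName() to obtain the short DOS path is an assumption about how such a fallback could look, not a statement about the modified registry() function.

```r
# Hypothetical helper: TRUE for each path containing a non-ASCII character.
has_non_ascii <- function(path) {
  vapply(
    enc2utf8(path),
    function(p) any(utf8ToInt(p) > 127L),  # code points above 127 are non-ASCII
    logical(1),
    USE.NAMES = FALSE
  )
}

# Hypothetical helper for a single path: fall back to the short DOS path
# on Windows if needed. shortPathName() exists on Windows only, so it is
# called only inside the Windows branch.
safe_path <- function(path) {
  if (.Platform$OS.type == "windows" && has_non_ascii(path)) {
    utils::shortPathName(path)
  } else {
    path
  }
}

plain <- has_non_ascii("C:/Users/Andreas/corpora")
accented <- has_non_ascii("C:/Users/Andr\u00e9/corpora")
```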

Bug fixes

Removed a bug in the context()/kwic() method that led to superfluous words in the
right context.
Removed a bug that occurred with the as.data.frame()-method for kwic-objects
when no metadata were added.
The count()-method for partition_bundle-objects did not perform iconv() if
necessary - this has been corrected.
Indexing the concord...

15 Jan 12:24

PolMine

v0.7.11

80eb455

Bright Side

polmineR 0.7.11

NEW FEATURES

A Cooccurrences()-method and a Cooccurrences-class have been migrated from the (experimental) polmineR.graph package to polmineR to generate and manage all cooccurrences in a corpus/partition. A cooccurrences()-method produces a subset of a Cooccurrences-class object and is the basis for ensuring that results are identical.
New functionality to make using corpora more robust when paths include special characters: There is now a temporary data directory which is a subdirectory of the per-session temporary directory. A new function data_dir() will return this temporary data directory. The use()-function will now check for non-ASCII characters in the path to binary corpus data and move the corpus data to the temporary data directory (a subdirectory of the directory returned by data_dir()), if necessary. An argument tmp added to use() will force using a temporary directory. The temporary files are removed when the package is detached.
Experimental functionality for a non-standard evaluation approach to create subcorpora via a zoom()-method. See documentation for (new) corpus-class (?"corpus-class") and extended documentation for partition-class (?"partition-class"). A new corpus()-method for character vector serves as a constructor. This is a beginning of somewhat re-arranging the class structure: The regions-class now inherits from the new corpus-class, and a new subcorpus-class inherits from the regions-class.
A new function check_cqp_query() offers a preliminary check whether a CQP query may be faulty. It is used by the cpos()-method, if the new argument check is TRUE. All higher-level functions calling cpos() also include this new argument. Faulty queries may still cause a crash of the R session, but the most common source is prevented now, hopefully.
A format()-method is defined for textstat, cooccurrences, and features, moving the formatting of tables out of the view(), and print()-methods. This will be useful when including tables in Rmarkdown documents.

MINOR IMPROVEMENTS

Startup messages reporting the package version of polmineR and the registry path are omitted now.
The functions registry() and data_dir() now accept an argument pkg. The functions will return the path to the registry directory / the data directory within a package, if the argument is used.
The data.table-package used to be imported entirely, now the package is imported selectively. To avoid namespace conflicts, the former S4 method as.data.table() is now a S3 method. Warnings appearing if the data.table package is loaded after polmineR are now omitted.
The coerce()-methods to turn textstat, cooccurrences, features and kwic objects into htmlwidgets now set a pageLength.
New methods for partition_bundle objects: [[<-, $, $<-
Rework of indexing textstat objects.
A slot p_attribute has been added to the kwic-class; kwic()-methods and methods to process kwic-objects are now able to use the attribute thus indicated, and not just the p-attribute "word".
A new size()-method for context-objects will return the size of the corpus of interest (coi) and the reference corpus (ref).
New encoding()-method for character vector.
New name()-method for character vector.
A new count()-method for context-objects will return the data.table in the stat-slot with the counts for the tokens in the window.
The decode()-function replaces a decode()-method and can be applied to partitions. The return value is a data.table which can be coerced to a tibble, serving as an interface to tidytext (#37).
The ngrams()-method will work for corpora, and a new show()-method for textstat-object generates a proper output (#27).

BUG FIXES

Any usage of tempdir() is wrapped into normalizePath(..., winslash = "/"), to avoid mixture of file separators in a path, which may cause problems on Windows systems.
In the calculation of cooccurrences, the node has previously been included in the window size. This has been corrected.
The kwic()-method for corpora returned one surplus token to the left and to the right of the query. The excess tokens are now removed.
The object returned by the kwic()-method for character-objects method did not include the correct position of matches in the cpos slot. Corrected.
Bug removed that occurred when the context window reached beyond the beginning or end of a corpus (#48).
When generating a partition_bundle using the as.speeches()-method, an error could occur when an empty partition had been generated accidentally. The error has been removed. (#50)
The as.VCorpus()-method is not available if the tm-package has been loaded previously. A coerce method (as(OBJECT, "VCorpus")) solves the issue. The as.VCorpus()-method is still around, but serves as a wrapper for the formal coerce-method (#55).
The argument verbose as used by the use()-method did not have any effect. Now, messages are not reported as would be expected, if verbose is FALSE. On this occasion, we took care that corpora that are activated are now reported in capital letters, which is consistent with the uppercase logic you need to follow when using corpora. (#47)
A new check prevents an error that has occurred when a token queried by the context()-method would occur at the very beginning or very end of a corpus and the window would transgress the beginning / end of the corpus without being checked (#44).
The as.speeches()-function caused an error when the type of the partition was not defined. Solved (#57).
To deal with issues resulting from an unset locale, there is a check during startup whether the locale is unset (i.e. 'C') (#39).
There was a difficulty to generate a TermDocumentMatrix from a partition_bundle if the partitions in the partition_bundle were not named. The fix is to assign integer numbers as names to the partitions (#58).
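The first fix in this list, wrapping tempdir() into normalizePath(..., winslash = "/"), can be tried out in base R. The subdirectory name "registry" below is only an illustrative assumption.

```r
# Normalize the per-session temporary directory so that only forward
# slashes are used as file separators (this matters on Windows).
tmp <- normalizePath(tempdir(), winslash = "/")

# file.path() joins with "/", so the combined path has a single separator style.
registry_tmp <- file.path(tmp, "registry")  # hypothetical subdirectory name
```

On Unix-like systems the call leaves the separators unchanged, so the same code can be used on all platforms.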

DOCUMENTATION FIXES

Substantial rework of the documentation of the ll(), and chisquare()-methods to make the statistical procedure used transparent.
Expanded documentation for cooccurrences()-method to explain subsetting results vs applying positivelist/negativelist (#28).
Wrote some documentation for the round()-method for textstat-objects that will show up in documentation of textstat class.
Improved documentation of the mail()-method (#31).
In the examples for the decode()-function, the REUTERS corpus replaces the GERMAPARLMINI corpus to reduce the time consumed when checking the package.

01 Oct 18:00

PolMine

v0.7.10

28e5da8

Bachelor's Delight

polmineR 0.7.10

NEW FEATURES

The package now offers a simplified and seamless workflow for dictionary-based sentiment analysis: The weigh()-method has been implemented for the classes count and count_bundle. Via inheritance, it will also be available for the partition- and partition_bundle-classes. Then, a new summary()-method for partition-class objects is introduced. If the object has been weighed, the list that is returned will include a report on weights. There is an example that explains the workflow.
The partition_bundle-method for context-objects has been reworked entirely (and is working again); a new partition-method for context-objects has been introduced. Both steps are intended for workflows for dictionary-based sentiment analysis.
The highlight()-method is now implemented for class kwic. You can highlight words in the neighborhood of a node that are part of a dictionary.
A new knit_print()-method for textstat- and kwic-objects offers a seamless inclusion of analyses in Rmarkdown documents.
A coerce()-method to turn a kwic-object into an htmlwidget has been singled out from the show()-method for kwic-objects. Now it is possible to generate an htmlwidget from a kwic object, and to include the widget in an Rmarkdown document.
A new coerce()-method turns textstat-objects into an htmlwidget (DataTable), which is very useful for Rmarkdown documents such as slides.
A new argument height for the html()-method allows defining a scroll box. This is useful to embed fulltext output in an Rmarkdown document.
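
The idea behind dictionary-based weighing of counts can be sketched in base R. This is a conceptual illustration with toy data, not the implementation of the weigh()-method; the data frames, the column names and the weights are assumptions made for the example.

```r
# Toy token counts, standing in for the counts held by a count object.
counts <- data.frame(
  word = c("good", "bad", "crisis", "the"),
  count = c(3L, 1L, 1L, 10L),
  stringsAsFactors = FALSE
)

# Hypothetical sentiment dictionary with one weight per term.
dictionary <- data.frame(
  word = c("good", "bad", "crisis"),
  weight = c(1, -1, -1),
  stringsAsFactors = FALSE
)

# Left join: tokens that are not in the dictionary get the weight NA.
weighed <- merge(counts, dictionary, by = "word", all.x = TRUE)

# Aggregate score: sum of count * weight over the dictionary terms only.
score <- sum(weighed$count * weighed$weight, na.rm = TRUE)
score
```

Keeping unweighted tokens in the table (with NA) rather than dropping them makes it possible to report both the overall size and the share of tokens covered by the dictionary.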

MINOR IMPROVEMENTS

The partition_bundle-class, rather than inheriting from bundle-class directly, will now inherit from the count_bundle-class.
The use()-function is limited now to activating the corpus in data packages. Having introduced the session registry, switching registry directories is not needed any more.
The as.regions()-function has been turned into an as.regions()-method to have a more generic tool.
Some refactoring of the context-method, so that full use of data.table speeds up things.
The highlight()-method allows definitions of terms to be highlighted to be passed in via three dots (...); no explicit list is necessary.
A new as.character()-method for kwic-class objects is introduced.

BUG FIXES

The size_coi-slot (coi for corpus of interest) of the context-object included the node; the node (i.e. matches for queries) is excluded now from the count of size_coi.
When calling use(), the registry directory is reset for CQP, so that the corpora in the package that have been activated can be used with CQP syntax.
The script configure.win has been removed so that installation works on Windows without an installation of Rtools.
Bug removed from s_attributes()-method for partition-objects: "fast track" was activated without preconditions.
Bug removed that would swallow metadata/s-attributes to be displayed in kwic-output after highlighting.
As a matter of consistency, the argument meta has been renamed to s_attributes for the kwic()-method for context-objects, and for the enrich()-method for kwic-objects.
To avoid confusion (with argument s_attributes), the argument s_attribute to check for integrity within a struc has been renamed to boundary.

DOCUMENTATION FIXES

Documentation for kwic-objects has been reworked thoroughly.

09 Jul 14:17

PolMine

v0.7.9

457771f

Jeanne d'Arc

The most visible change of polmineR v0.7.9 may be that the package moves to a snake_case coding style. This is increasingly the state of the art, and feels much more intuitive when working with the arguments 's_attributes' and 'p_attributes' (rather than pAttributes and sAttributes). Functions/methods are fully backwards compatible, so old code should not break.
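
One common way to keep a renamed argument backwards compatible is to catch the legacy spelling via three dots. The following base R sketch shows this pattern; get_attributes() is a hypothetical function for illustration, and the sketch makes no claim about how polmineR handles the old argument names internally.

```r
# Hypothetical function, not polmineR code: accepts the snake_case argument
# and, for backwards compatibility, the legacy camelCase spelling via `...`.
get_attributes <- function(s_attributes = NULL, ...) {
  dots <- list(...)
  if ("sAttributes" %in% names(dots)) {
    # Old code passed sAttributes; map it to the new argument name.
    s_attributes <- dots[["sAttributes"]]
  }
  s_attributes
}

# New-style and old-style calls yield the same result.
new_style <- get_attributes(s_attributes = c("date", "speaker"))
old_style <- get_attributes(sAttributes = c("date", "speaker"))
identical(new_style, old_style)
```

Because "sAttributes" is not a prefix of "s_attributes", R's partial argument matching does not interfere, and the legacy name reliably ends up in the dots.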

The package now uses a session registry directory, which is a subdirectory of the temporary session directory. This has become mandatory, because CRAN policies do not allow resetting paths within a package once it has been installed. But it is very useful, because switching registry directories can now be avoided. The use()-function will now add the corpora in an R data package to the session registry. So this is a good start to work with multiple corpora wrapped in various packages. This involves a set of new functions:

A (new) registry_move()-function is used to copy files to the tmp registry;
The (new) registry()-function will get the temporary registry directory;
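
The mechanism of a session registry can be sketched with base R file operations. The directory name "session_registry", the corpus ID "reuters" and the content of the stand-in registry file are assumptions for illustration; this is not the code of registry_move() or registry().

```r
# Hypothetical directory name; the layout actually used by polmineR may differ.
session_registry <- file.path(tempdir(), "session_registry")
dir.create(session_registry, showWarnings = FALSE)

# Stand-in for a registry file shipped in a data package.
pkg_registry_file <- tempfile(pattern = "reuters_")
writeLines(c("NAME \"Reuters sample\"", "ID reuters"), pkg_registry_file)

# Copy the file into the session registry. CWB registry files are named
# after the corpus ID in lowercase.
target <- file.path(session_registry, "reuters")
copied <- file.copy(from = pkg_registry_file, to = target, overwrite = TRUE)

# The session registry now describes the corpus, while the file in the
# (possibly read-only) package directory is left untouched.
list.files(session_registry)
```

Working on a copy in the temporary session directory is what makes it unnecessary to modify paths inside an installed package.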

A set of changes makes working with bundle objects more versatile and robust:

There is a new as.list()-method for bundle objects, to access the list in the slot objects;
as.bundle() is more generic now, so that any kind of object can be coerced to a bundle;
The as.speeches()-method has been turned into a function that allows a partition or a corpus as input;

The new version upgrades the count-class. So the count()-method will serve as a constructor for a count object, if no query is provided. This is particularly useful when working with count_bundle-objects.

Minor new features

There is a new is.partition()-function (a logical check);
A new argument 'type' has been added to partition_bundle()-method;
A new method get_type() introduced to make getting corpus type more robust.
A new partition_bundle()-method for partition_bundle-objects has been introduced;

Bug fixes

s_attributes() for partition-objects in line with RcppCWB requirements (no negative values of strucs);
count() repaired for multiple p-attributes;
bug removed causing a crash for as.markdown()-method when cutoff is larger than number of tokens;
a bug removed that has prevented the name<- method from working properly for bundle objects;
for count() for partition_bundle-objects, the column 'partition' will be a character vector now (not factor)
bug removed that has caused a crash when cutoff is larger than number of tokens in a partition when calling get_token_stream

Enjoy!

18 May 16:26

PolMine

v0.7.8

9f52a3c

Panda Belly

upon loading the package, new check that data directories are set correctly in registry files to make sure that sample data in pre-compiled packages can be used
startup messages adjusted slightly
first version that works with sample data without complications

04 Oct 19:57

PolMine

v0.7.5

0838088

v0.7.5

class 'Regions' renamed to class 'regions' as a matter of consistency
data type of slot cpos of class 'regions' is a matrix now
rework and improved documentation for decode- and encode-methods
new functions copy.corpus and rename.corpus
as.DocumentTermMatrix-method checks for strucs with value -1
improved as.speeches-method: reordering of speeches, default values
blapply-method: verbose output will be suppressed if progress is TRUE

Releases: PolMine/polmineR
