- `docsimil()` for `list` objects is now robust when char filtering results in empty docs.
- Method `detect_duplicates()` renamed to `docsimil()`.
- Function `as_docgroups()` renamed to `docgroups()`.
- Function `minimize_vocabulary()` made more generic and renamed to `charfilter()`.
- Method `nchars()` renamed to `charcount()`.
- Function `duplicates_get_groups()` renamed to `as_docgroups()`.
- Argument `s_attribute` of method `detect_duplicates()` used generically. A new column with the name of the s-attribute to be used as metadata will be added.
- Dropped method `duplicates_encode()`: it is better to use `cwbtools::s_attribute_encode()` directly, without a wrapper.
- Fixed a bug in the `nchars()` method for `corpus` objects. Unit test added.
- Significant performance improvement for the `nchars()` method for `corpus` objects.
- `Duplicates$get_comparisons()` dropped. It was necessary when computing similarities was much less parsimonious and is irrelevant after the switch to `proxyC::simil()`.