Releases: SebKrantz/collapse
collapse version 2.0.14
- Updated 'collapse and sf' vignette to reflect the recent support for units objects, and added a few more examples.
- Fixed a bug in `join()` where a full join silently became a left join if there are no matches between the tables (#574). Thanks @D3SL for reporting.
- Added function `group_by_vars()`: a standard evaluation version of `fgroup_by()` that is slimmer and safer for programming, e.g. `data |> group_by_vars(ind1) |> collapg(custom = list(fmean = ind2, fsum = ind3))`. Or, using magrittr:

  ```r
  library(magrittr)
  set_collapse(mask = "manip") # for fgroup_vars -> group_vars

  data %>%
    group_by_vars(ind1) %>% {
      add_vars(
        group_vars(., "unique"),
        get_vars(., ind2) %>% fmean(keep.g = FALSE) %>% add_stub("mean_"),
        get_vars(., ind3) %>% fsum(keep.g = FALSE) %>% add_stub("sum_")
      )
    }
  ```
- Added function `as_integer_factor()` to turn factors/factor columns into integer vectors. `as_numeric_factor()` already exists, but is memory inefficient for most factors where levels can be integers.
- `join()` now internally checks if the rows of the joined datasets match exactly. This check, using `identical(m, seq_row(y))`, is inexpensive, but, if `TRUE`, saves a full subset and deep copy of `y`. Thus `join()` now inherits the intelligence already present in functions like `fsubset()`, `roworder()` and `funique()` - a key for efficient data manipulation is simply doing less.
- In `join()`, if `attr = TRUE`, the `count` option to `fmatch()` is always invoked, so that the attribute attached always has the same form, regardless of `verbose` or `validate` settings.
- `roworder[v]()` has optional setting `verbose = 2L` to indicate if `x` is already sorted, making the call to `roworder[v]()` redundant.
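As a rough illustration of three of the 2.0.14 items above (the full join without matches, `as_integer_factor()`, and the row-match check), here is a minimal sketch. The toy tables `x`, `y`, `y2` and the factor `f` are made up for this example, and the last lines only mimic the kind of check described for `join()`; they are not its internal code.

```r
library(collapse)

# Two toy tables with no overlapping keys (hypothetical data).
x <- data.frame(id = 1:3, a = c("u", "v", "w"))
y <- data.frame(id = 4:6, b = c(1.5, 2.5, 3.5))

# A full join should keep the rows of both tables, even without matches.
full <- join(x, y, on = "id", how = "full", verbose = 0)
nrow(full)

# A factor whose levels are integer-like, converted to an integer vector.
f <- factor(c(10, 20, 10))
as_integer_factor(f)

# The kind of check described above: if the match vector is exactly
# 1, 2, ..., nrow(y), then y does not need to be subset at all.
y2 <- data.frame(id = 1:3, b = c(1.5, 2.5, 3.5))
m <- fmatch(x$id, y2$id)
identical(m, seq_row(y2))
```

The point of the last check is that comparing two integer vectors is cheap relative to subsetting and copying every column of `y`.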
collapse version 2.0.13
- collapse now explicitly supports xts/zoo and units objects and concurrently removes an additional check in the `.default` method of statistical functions that called the matrix method if `is.matrix(x) && !inherits(x, "matrix")`. This was a smart solution to account for the fact that xts objects are matrix-based but don't inherit the `"matrix"` class, thus wrongly calling the default method. The same is the case for units, but here, my recent more intensive engagement with spatial data convinced me that this should be changed. For one, under the previous heuristic solution, it was not possible to call the default method on a units matrix, e.g., `fmean.default(st_distance(points_sf))` called `fmean.matrix()` and yielded a vector. This should not be the case. Secondly, aggregation e.g. `fmean(st_distance(points_sf))` or `fmean(st_distance(points_sf), g = group_vec)` yielded a plain numeric object that lost the units class (in line with the general attribute handling principles). Therefore, I have now decided to remove the heuristic check within the default methods, and explicitly support zoo and units objects. For Fast Statistical Functions, the methods are `FUN.zoo <- function(x, ...) if(is.matrix(x)) FUN.matrix(x, ...) else FUN.default(x, ...)` and `FUN.units <- function(x, ...) if(is.matrix(x)) copyMostAttrib(FUN.matrix(x, ...), x) else FUN.default(x, ...)`. While the behavior for xts/zoo remains the same, the behavior for units is enhanced, as now the class is preserved in aggregations (the `.default` method preserves attributes except for ts), and it is possible to manually invoke the `.default` method on a units matrix and obtain an aggregate statistic. This change may impact computations on other matrix based classes which don't inherit from `"matrix"` (mts does inherit from `"matrix"`, and I am not aware of any other affected classes, but user code like `m <- matrix(rnorm(25), 5); class(m) <- "bla"; fmean(m)` will now yield a scalar instead of a vector. Such code must be adjusted to either `class(m) <- c("bla", "matrix")` or `fmean.matrix(m)`). Overall, the change makes collapse behave in a more standard and predictable way, and enhances its support for units objects central in the sf ecosystem.
- `fquantile()` now also preserves the attributes of the input, in line with `quantile()`.
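The dispatch change described for 2.0.13 can be tried out with the `"bla"` example from the note. This is a small sketch assuming collapse 2.0.13 or later; the class name `"bla"` is purely illustrative.

```r
library(collapse)

set.seed(1)
m <- matrix(rnorm(25), 5)
class(m) <- "bla"            # hypothetical class that does not inherit "matrix"

length(fmean(m))             # default method: one overall mean
length(fmean.matrix(m))      # matrix method called explicitly: one mean per column

class(m) <- c("bla", "matrix")
length(fmean(m))             # S3 dispatch now reaches the matrix method
```

Either adjustment at the end restores column-wise results; adding `"matrix"` to the class vector is the less intrusive option when `fmean()` is called in many places.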
collapse version 2.0.12
- Fixes some issues with signed int overflows inside hash functions and possible protect bugs flagged by RCHK. With few exceptions these fixes are cosmetic to appease the C/C++ code checks on CRAN.
collapse version 2.0.11
- An article on collapse has been submitted to the Journal of Statistical Software. The preprint is available through arXiv.
- Removed magrittr from most documentation examples (using base pipe).
- Improved `plot.GRP` a little bit - on request of JSS editors.
collapse version 2.0.10
- Fixed a bug in `fmatch()` when matching integer vectors to factors. This also affected `join()`.
- Improved cross-platform compatibility of OpenMP flags. Thanks @kalibera.
- Added `stub = TRUE` argument to the grouped_df methods of Fast Statistical Functions supporting weights, to be able to remove or alter prefixes given to aggregated weights columns if `keep.w = TRUE`. Globally, users can call `set_collapse(stub = FALSE)` to disable this prefixing in all statistical functions and operators.
collapse version 2.0.9
- Added functions `na_locf()` and `na_focb()` for fast basic C implementations of these procedures (optionally by reference). `replace_na()` now also has a `type` argument which supports options `"locf"` and `"focb"` (default `"const"`), similar to `data.table::nafill`. The implementation also supports character data and list-columns (`NULL`/empty elements). Thanks @BenoitLondon for suggesting (#489). I note that `na_locf()` exists in some other packages (such as imputeTS) where it is implemented in R and has additional options. Users should utilize the flexible namespace i.e. `set_collapse(remove = "na_locf")` to deal with this.
- Fixed a bug in weighted quantile estimation (`fquantile()`) that could lead to wrong/out-of-range estimates in some cases. Thanks @zander-prinsloo for reporting (#523).
- Improved right join such that join column names of `x` instead of `y` are preserved. This is more consistent with the other joins when join columns in `x` and `y` have different names.
- More fluent and safe interplay of 'mask' and 'remove' options in `set_collapse()`: it is now seamlessly possible to switch from any combination of 'mask' and 'remove' to any other combination without the need of setting them to `NULL` first.
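For the missing-value functions added in 2.0.9, a short sketch on a made-up numeric vector shows the two fill directions and the corresponding `replace_na()` call. It uses default arguments only and does not cover the by-reference or list-column cases mentioned above.

```r
library(collapse)

x <- c(NA, 1, NA, NA, 4, NA)

na_locf(x)                     # last observation carried forward
na_focb(x)                     # first observation carried back
replace_na(x, type = "locf")   # the same fill via replace_na()
replace_na(x, 0)               # default type = "const": replace NA with a constant
```

Note that a leading `NA` has nothing to carry forward and a trailing `NA` has nothing to carry back, so each direction leaves one end of this vector unfilled.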
collapse version 2.0.8
- In `pivot(..., values = [multiple columns], labels = "new_labels_column", how = "wider")`, if the columns selected through `values` already have variable labels, they are concatenated with the new labels provided through `"new_labels_column"` using `" - "` as a separator (similar to `names` where the separator is `"_"`).
- `whichv()` and operators `%==%`, `%!=%` now properly account for missing double values, e.g. `c(NA_real_, 1) %==% c(NA_real_, 1)` yields `c(1, 2)` rather than `2`. Thanks @eutwt for flagging this (#518).
- In `setv(X, v, R)`, if the type of `R` is greater than `X`, e.g. `setv(1:10, 1:3, 9.5)`, then a warning is issued that conversion of `R` to the lower type (real to integer in this case) may incur loss of information. Thanks @tony-aw for suggesting (#498).
- `frange()` has an option `finite = FALSE`, like `base::range`. Thanks @MLopez-Ibanez for suggesting (#511).
- `varying.pdata.frame(..., any_group = FALSE)` now unindexes the result (as should be the case).
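The `%==%`, `setv()` and `frange()` items from 2.0.8 can be checked interactively along these lines. The vectors are made up, and the comments restate what the notes above describe rather than guaranteed output.

```r
library(collapse)

# Missing doubles are matched by %==%: per the note, both positions are returned.
x <- c(NA_real_, 1)
x %==% x

# Replacing integer elements with a double: the note says this now warns
# about the real -> integer conversion. seq_len(10) + 0L gives a fresh vector,
# since setv() modifies its first argument by reference.
v <- seq_len(10) + 0L
setv(v, 1:3, 9.5)

# frange() with and without infinite values.
r <- c(1, Inf, 3, -Inf)
frange(r)                  # default finite = FALSE
frange(r, finite = TRUE)   # ignores infinite values, like range(r, finite = TRUE)
```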
collapse version 2.0.7
- Fixed bug in full join if `verbose = 0`. Thanks @zander-prinsloo for reporting.
- Added argument `multiple = FALSE` to `join()`. Setting `multiple = TRUE` performs a multiple-matching join where a row in `x` is matched to all matching rows in `y`. The default `FALSE` just takes the first matching row in `y`.
- Improved recode/replace functions. Notably, `replace_outliers()` now supports option `value = "clip"` to replace outliers with the respective upper/lower bounds, and also has option `single.limit = "mad"` which removes outliers exceeding a certain number of median absolute deviations. Furthermore, all functions now have a `set` argument which fully applies the transformations by reference.
- Functions `replace_NA` and `replace_Inf` were renamed to `replace_na` and `replace_inf` to make the namespace a bit more consistent. The earlier versions remain available.
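A brief sketch of two of the 2.0.7 additions, using made-up data: the difference between the default first-match join and `multiple = TRUE`, and clipping with `replace_outliers()`. The hard limits `c(0, 10)` are an arbitrary choice for illustration.

```r
library(collapse)

x <- data.frame(id = c(1, 2))
y <- data.frame(id = c(1, 1, 2), v = c("a", "b", "c"))

# Default: each row of x takes only its first match in y.
first_only <- join(x, y, on = "id", verbose = 0)
# multiple = TRUE: each row of x is paired with every matching row of y.
all_pairs <- join(x, y, on = "id", multiple = TRUE, verbose = 0)
nrow(first_only)
nrow(all_pairs)

# Clip values outside [0, 10] to the nearest bound instead of setting them to NA.
z <- c(-5, 2, 50)
replace_outliers(z, limits = c(0, 10), value = "clip")
```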
collapse version 2.0.6
- Fixed a serious bug in `qsu()` where higher order weighted statistics were erroneous, i.e. whenever `qsu(x, ..., w = weights, higher = TRUE)` was invoked, the 'SD', 'Skew' and 'Kurt' columns were wrong (if `higher = FALSE` the weighted 'SD' is correct). The reason is that there appears to be no straightforward generalization of Welford's Online Algorithm to higher-order weighted statistics. This was not detected earlier because the algorithm was only tested with unit weights. The fix involved replacing Welford's Algorithm for the higher-order weighted case by a 2-pass method, that additionally uses long doubles for higher-order terms. Thanks @randrescastaneda for reporting.
- Fixed some unexpected behavior in `t_list()` where names 'V1', 'V2', etc. were assigned to unnamed inner lists. It now preserves the missing names. Thanks @orgadish for flagging this.
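To make the idea of a 2-pass method concrete, here is a base R sketch of weighted skewness and kurtosis. The function `weighted_moments()` is hypothetical and written only for illustration: it uses simple population-type moment ratios, and the normalization and precision handling inside `qsu()` may differ.

```r
# Two-pass weighted moments (illustrative only; not collapse's C implementation).
weighted_moments <- function(x, w) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  sw <- sum(w)
  mu <- sum(w * x) / sw        # pass 1: weighted mean
  d  <- x - mu                 # pass 2: deviations from that mean
  m2 <- sum(w * d^2) / sw      # weighted central moments
  m3 <- sum(w * d^3) / sw
  m4 <- sum(w * d^4) / sw
  c(mean = mu, skew = m3 / m2^1.5, kurt = m4 / m2^2)
}

x <- c(1, 2, 3, 4, 10)
weighted_moments(x, w = rep(1, 5))        # unit weights
weighted_moments(x, w = c(2, 1, 1, 1, 1)) # first observation counted twice
```

Testing with non-unit weights is what exposes errors of the kind described above; under this frequency-weight definition, a weight of 2 should give the same result as duplicating the observation.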
collapse version 2.0.5
- In `join()`, if `y` is an expression, e.g. `join(x = mtcars, y = subset(mtcars, mpg > 20))`, then its name is not extracted but just set to `"y"`. Before, the name of `y` would be captured as `as.character(substitute(y))[1] = "subset"` in this case. This is an improvement mainly for display purposes, but could also affect code if there are duplicate columns in both datasets and `suffix` was not provided in the `join` call: before, y-columns would be renamed using a (non-sensible) `"_subset"` suffix, but now using a `"_y"` suffix. Note that this only concerns cases where `y` is an expression rather than a single object.
- Small performance improvements to `%[!]in%` operators: `%!in%` now uses `is.na(fmatch(x, table))` rather than `fmatch(x, table, 0L) == 0L`, and `%in%`, if exported using `set_collapse(mask = "%in%"|"special"|"all")`, is `as.logical(fmatch(x, table, 0L))` instead of `fmatch(x, table, 0L) > 0L`. The new versions are faster because comparison operators `>`, `==` with integers additionally need to check for `NA`'s (= the smallest integer in C).
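The old and new formulations of the `%[!]in%` operators described above should agree on every input. The following sketch checks this on a small made-up example; `x` and `tab` are arbitrary integer vectors.

```r
library(collapse)

x   <- c(1L, 5L, 3L, 9L)
tab <- c(3L, 1L, 7L)

# Negated membership: newer form vs. older form.
is.na(fmatch(x, tab))
fmatch(x, tab, 0L) == 0L

# Membership: newer form vs. older form.
as.logical(fmatch(x, tab, 0L))
fmatch(x, tab, 0L) > 0L

# The exported operator itself.
x %!in% tab
```

With `nomatch = 0L`, `fmatch()` never returns `NA`, so coercing the positions with `as.logical()` is enough and the comparison step can be skipped.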