Major overhaul from last version (1.3.2, last updated 2016-10-06).
Visible differences are support for BEDMatrix and fewer cases in which association p-values are NA
.
Internally there was major code restructuring, and added unit tests for all functions.
-
User-facing changes: Functions
gcat
/gcatest
/gcat.stat
- added support for BEDMatrix objects for the genotype matrix
X
.- This consumes lower memory when the number of loci
m
is very large, so it enables analysis of larger datasets.
- This consumes lower memory when the number of loci
- Fixed some cases where the test statistic (the delta deviance) and ultimately the p-values were
NA
orNaN
and are no longer missing.- One common case is when fitted probabilities were zero or one, which used to lead to
NaN
deviances when their correct contribution was instead zero (because the limit ofp*log(p)
asp
goes to zero is zero, not0 * (-Inf) = NaN
). - Other
NA
andNaN
cases are avoided in thelfa
functionaf_snp
(fixed in lfa 2.0.0.9000, 2020-09-18) used to estimate the individual-specific allele frequencies used here to compute the delta deviance. However, in rare cases the logistic regression inaf_snp
fails to converge or there are other problems, resulting inNA
values propagated to GCATest's test statistic and p-values. - Otherwise, the new delta deviance code (function
delta_deviance_snp
) is more numerically-stable than before.
- One common case is when fitted probabilities were zero or one, which used to lead to
- added support for BEDMatrix objects for the genotype matrix
-
Internal changes
- Separated R functions into one source file each.
- Added more input checks to all functions.
- Added
.gitignore
files from another project. - Added unit tests for all functions using
testthat
. - Removed internal
assoc
C code- Previously only used for genotype data without missingness (so practically not on real datasets)
- Was entirely redundant with
lfa::af_snp
, which is now called in all cases instead. - Had bugs concerning handling of p == 0 or 1 cases that are better handled in
assoc_snp
R code
- Minor scattered changes solely to pass latest
R CMD check
requirements.
- Documentation updates:
- Fixed links to functions, in many cases these were broken because of incompatible mixed Rd and markdown syntax (now markdown is used more fully).
- Added internal tests for deviance calculations against
stats::glm
. - Deviance code (internal
delta_deviance_snp
) now returnsNA
instead of stopping when an "impossible" case is encountered (when the genotypex
is non-zero but the fitted probabilities under either null or alternative model are zero, or the alternative allele dosage (x-2
) has the same problem). These cases are clearly model fitting failures, and can arise for common ill-defined problems, particularly under binaryadjustment
variables passed togcat
together with rare variants; these individual cases are not handled any better bystats::glm
, so it seemed most sensible to returnNA
at such loci and not stop.
- Added function
delta_deviance_lf
, which calculates the delta deviance from two logistic models and the genotype matrix data. This function is a more general version ofgcat.stat
(which uses the new function internally), to essentially consider models that differ by more than one degree of freedom. It was written in particular for an external application in mind, namely thejackstraw
package. - Internal function
assoc_snp
was renamed todelta_deviance_snp_lf
and its last argument changed to match that ofdelta_deviance_lf
(alternative logistic factors instead of trait).
- Function
delta_deviance_lf
debugged case where eitherLF0
orLF1
is a column matrix. Previously these 1-column matrices were getting dropped to a vector incorrectly, which resulted in the mysterious error message "Error: argument is of length zero". This 1-column case is not typically observed ingcatest
, but is common in the reverse-dependentjackstraw
package.
- Lots of minor changes for Bioconductor update.
- DESCRIPTION:
- Updated to
Authors@R
. - Lengthened "Description" paragraph.
- Increased R dependency from 3.2 to 4.0.
- Updated to
- Reformatted this
NEWS.md
slightly to improve its automatic parsing. - Added examples for function
delta_deviance_lf
. - Updated vignette to reflect that
lfa::read.bed
has been deprecated in favor ofgenio::read_plink
andBEDMatrix
objects. - Updated
README.md
, including corrections to examples. - Updated citations:
README.md
: only had GCATest paper link, now has full citation and also full LFA citation.- Vignette: used to point to LFA arXiv preprint, now points to published paper.
inst/CITATION
: didn't exist! Now includes both LFA and GCATest papers.
- Added
LICENSE.md
. - Internal changes:
- All unexported functions are now prefixed with a period.
- Replaced
1:x
withseq_len(x)
several functions. - Reformatted all code with package
reformatR
and otherwise match Bioconductor guidelines.
- DESCRIPTION:
README.md
upgraded links from http to https- Minor doc reformatting automatically performed by
roxygen2
.
- Version bump for bioconductor devel.
- Commented out various excessive tests against
glm
, which differ more often than expected due to poor or lack of convergence. - Removed unused LaTeX package dependencies from vignette to prevent errors restricted to specific testing platforms.
- Fixed
..density..
deprecation warning in vignette plot.
- Commented out two more strict tests (for non-negative deviances) that fail too often on bioconductor.
- Commented out one more strict test (NA deviances) that fail too often on bioconductor.