Merge pull request #341 from lindbrook/release
update DESCRIPTION and README
lindbrook committed Oct 10, 2023
2 parents b70a2a6 + 1decf2f commit f1cf06e
Showing 3 changed files with 171 additions and 17 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Package: packageRank
Type: Package
Title: Computation and Visualization of Package Download Counts and Percentiles
Version: 0.8.1.9008
Date: 2023-10-08
Version: 0.8.2
Date: 2023-10-10
Authors@R: person("lindbrook", email = "lindbrook@gmail.com",
role = c("aut", "cre"))
Maintainer: lindbrook <lindbrook@gmail.com>
78 changes: 73 additions & 5 deletions README.Rmd
@@ -26,8 +26,9 @@ You can read more about the package the sections below:
* [II Download Rank Percentiles](#ii---download-rank-percentiles) describes how `packageRank()` makes use of rank percentiles. This nonparametric statistic computes the percentage of packages with fewer downloads than yours (e.g., your package is in the 74th percentile). This facilitates comparison and helps you to locate your package in the overall distribution of [CRAN](https://CRAN.R-project.org/) package downloads.
* [III Inflation Filters](#iii---inflation-filters) describes the five different filter functions used to remove software and behavioral artifacts that inflate _nominal_ download counts. This functionality is offered in `packageRank()` and `packageLog()` but _not_, for computational reasons, in `cranDownloads()`.
* [IV Availability of Results](#iv---availability-of-results) discusses when results become available and how to use `logInfo()` to check the availability of today's results.
* [V Data Fixes](#v---data-fixes) discusses two functions, `fixDate_2012()` and `fixCranlogs()`, which address data problems with logs from 2012 and 2013.
* [VI Et Cetera](#vi---et-cetera) discusses country code top-level domains (e.g., countryPackage() and packageCountry()), the use of memoization, the effect of time zones, the internet connection time out problem, and the recent (and ongoing) spikes in the download of the Windows version of the R application on Sundays and Wednesdays.
* [V Data Fixes A](#v---data-fixes-a) discusses two functions, `fixDate_2012()` and `fixCranlogs()`, which address data problems with logs from 2012 and 2013.
* [VI Data Fixes B](#vi---data-fixes-b) discusses a "doubling" of package and R application download counts that appeared in the second half of September through the beginning of October 2023. By default, a fix is incorporated in `packageRank::cranDownloads()`.
* [VII et cetera](#vii---et-cetera) discusses country code top-level domains (e.g., `countryPackage()` and `packageCountry()`), the use of memoization, the effect of time zones, the internet connection time out problem, and the spikes in the download of the Windows version of the R application on Sundays and Wednesdays between 06 November 2022 and 19 March 2023.

### getting started

@@ -731,10 +732,11 @@ $status
[1] "Today's log is typically posted by 09:00 PST (01 Feb 17:00 GMT)."
```

### V - data fixes
### V - data fixes A

For the historically minded, there are two data fixes to note. The first stems from problems with the logs when Posit/RStudio first began posting them. The second stems from how ['cranlogs'](https://CRAN.R-project.org/package=cranlogs) works. The fixes work and are documented in two functions:
For the historically minded, there are two data fixes to note. The first stems from problems with the logs when Posit/RStudio first began posting them. The second stems from how ['cranlogs'](https://CRAN.R-project.org/package=cranlogs) works.

The fixes are coded in two functions:

#### `fixDate_2012()`

@@ -756,7 +758,73 @@ Because ['cranlogs'](https://CRAN.R-project.org/package=cranlogs) relies on the

I've patched this overcounting problem in [`packageRank::cranDownloads()`](https://github.com/lindbrook/packageRank/blob/master/R/cranDownloads.R) via [fixCranlogs()](https://github.com/lindbrook/packageRank/blob/master/R/fixCranlogs.R). This function recomputes the data using the actual logs when any of the eight problematic dates are requested. The details about the 8 days and `fixCranlogs()` can be found [here](https://github.com/lindbrook/packageRank/blob/master/docs/logs.md).

### VI - et cetera
### VI - data fixes B

Recently, two additional data problems have emerged. First, from 2023-09-19 through 2023-10-01, the download counts for R packages (and the total number of CRAN downloads, computed via `packages = NULL`) returned by `cranlogs::cran_downloads()` are twice what one would expect when looking at the actual log(s).

For example:

```{r, cranlogs_2x}
cranlogs::cran_downloads(packages = "sna", from = "2023-09-19", to = "2023-09-19")
```

The corresponding Posit/RStudio log shows half the number of downloads:

```{r, packagRank_1x}
nrow(packageRank::packageLog(packages = "sna", date = "2023-09-19"))
```

This is an across-the-board effect for all packages. Here's the overview for the total CRAN download counts:

```
         date   Posit cranlogs ratio
1  2023-09-15 6479353  6479353     1
2  2023-09-16 3516904  3516904     1
3  2023-09-17 3534662  3534662     1
4  2023-09-18 7309822  7309822     1
5  2023-09-19 7608886 15217772     2
6  2023-09-20 7488178 14976356     2
7  2023-09-21 6862071 13724142     2
8  2023-09-22 6410593 12821186     2
9  2023-09-23 4011634  8023268     2
10 2023-09-24 3548594  7097188     2
11 2023-09-25 6845864 13691728     2
12 2023-09-26 7204419 14408838     2
13 2023-09-27 7188019 14376038     2
14 2023-09-28 6526022 13052044     2
15 2023-09-29 5653322 11306644     2
16 2023-09-30 3165387  6330774     2
17 2023-10-01 3277506  6555012     2
18 2023-10-02 6268556  6268556     1
19 2023-10-03 6732379  6732379     1

Details and code for replication can be found in issue [#68](https://github.com/r-hub/cranlogs/issues/68) in the `cranlogs` GitHub repository.
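
The ratio column in the table above can be reproduced offline. What follows is a minimal sketch, not the code from the issue: it copies four rows of totals from the table, and the names `totals`, `posit` and `doubled` are made up for illustration.

```r
# Totals copied from the table above (a subset of rows, for brevity).
# The column names are illustrative assumptions, not package output.
totals <- data.frame(
  date = as.Date(c("2023-09-18", "2023-09-19", "2023-10-01", "2023-10-02")),
  posit = c(7309822, 7608886, 3277506, 6268556),
  cranlogs = c(7309822, 15217772, 6555012, 6268556)
)

# Ratio of the 'cranlogs' count to the count taken from the Posit/RStudio log.
totals$ratio <- totals$cranlogs / totals$posit

# Dates on which 'cranlogs' reports exactly twice the log count.
doubled <- totals$date[totals$ratio == 2]
format(doubled)
```

With these four rows, only the two dates inside the 2023-09-19 to 2023-10-01 window are flagged.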

Second, from 2023-09-13 through 2023-10-02, the download counts for the R application returned by `cranlogs::cran_downloads(packages = "R")` are also twice what one would expect when looking at the actual log(s). There are, however, two exceptions: 1) on 2023-09-28 the counts are identical but for a "rounding error" possibly due to an NA value and 2) on 2023-09-30 there is actually a three-fold difference.

Here are the respective count ratios:

```
    2023-09-12 2023-09-13 2023-09-14 2023-09-15 2023-09-16 2023-09-17 2023-09-18 2023-09-19
osx          1          2          2          2          2          2          2          2
src          1          2          2          2          2          2          2          2
win          1          2          2          2          2          2          2          2
    2023-09-20 2023-09-21 2023-09-22 2023-09-23 2023-09-24 2023-09-25 2023-09-26 2023-09-27
osx          2          2          2          2          2          2          2          2
src          2          2          2          2          2          2          2          2
win          2          2          2          2          2          2          2          2
    2023-09-28 2023-09-29 2023-09-30 2023-10-01 2023-10-02 2023-10-03
osx   1.000000          2          3          2          2          1
src   1.000801          2          3          2          2          1
win   1.000000          2          3          2          2          1
```

Details and code for replication can be found in issue [#69](https://github.com/r-hub/cranlogs/issues/69).

Assuming the logs are "correct", why these two problems emerged is unclear. For now, `packageRank::cranDownloads()` fixes both by default via the `fix.cranlogs = TRUE` argument.
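
To make the idea of a correction concrete, here is a hedged sketch of what undoing the package-count doubling could look like. It is an assumption for illustration only, not how `packageRank::cranDownloads()` implements `fix.cranlogs`: the helper `halve_doubled()` is hypothetical, it covers only the package window (2023-09-19 through 2023-10-01), and it ignores the R application exceptions noted above.

```r
# Hypothetical helper, NOT packageRank's implementation of `fix.cranlogs`.
# It halves counts whose dates fall inside the doubled window for packages.
halve_doubled <- function(x, start = as.Date("2023-09-19"),
                          end = as.Date("2023-10-01")) {
  affected <- x$date >= start & x$date <= end
  x$count[affected] <- x$count[affected] / 2
  x
}

# 'cranlogs' totals copied from the table above (a subset of rows).
raw <- data.frame(
  date = as.Date(c("2023-09-18", "2023-09-19", "2023-10-01", "2023-10-02")),
  count = c(7309822, 15217772, 6555012, 6268556)
)

fixed <- halve_doubled(raw)
fixed$count
```

For these rows, the halved counts match the Posit column of the table, while dates outside the window are left unchanged.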

### VII - et cetera

For those interested in directly using the [Posit/RStudio download logs](http://cran-logs.rstudio.com/), this section describes some issues that may be of use.

106 changes: 96 additions & 10 deletions README.md
@@ -1,7 +1,7 @@

<!-- README.md is generated from README.Rmd. Please edit that file -->
[![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/packageRank)](https://cran.r-project.org/package=packageRank)
[![GitHub\_Status\_Badge](https://img.shields.io/badge/GitHub-0.8.1.9008-red.svg)](https://github.com/lindbrook/packageRank/blob/master/NEWS.md)
[![GitHub\_Status\_Badge](https://img.shields.io/badge/GitHub-0.8.2-red.svg)](https://github.com/lindbrook/packageRank/blob/master/NEWS.md)
## packageRank: compute and visualize package download counts and rank percentiles

[‘packageRank’](https://CRAN.R-project.org/package=packageRank) is an R
@@ -32,14 +32,19 @@ You can read more about the package the sections below:
- [IV Availability of Results](#iv---availability-of-results) discusses
when results become available and how to use `logInfo()` to check the
availability of today’s results.
- [V Data Fixes](#v---data-fixes) discusses two
- [V Data Fixes A](#v---data-fixes-a) discusses two
functions, `fixDate_2012()` and `fixCranlogs()`, which address data
problems with logs from 2012 and 2013.
- [VI Et Cetera](#vi---et-cetera) discusses country code top-level
domains (e.g., countryPackage() and packageCountry()), the use of
- [VI Data Fixes B](#vi---data-fixes-b) discusses a “doubling” of
package and R application download counts that appeared in the second
half of September through the beginning of October 2023. By default, a
fix is incorporated in `packageRank::cranDownloads()`.
- [VII et cetera](#vii---et-cetera) discusses country code top-level
domains (e.g., `countryPackage()` and `packageCountry()`), the use of
memoization, the effect of time zones, the internet connection time
out problem, and the recent (and ongoing) spikes in the download of
the Windows version of the R application on Sundays and Wednesdays.
out problem, and the spikes in the download of the Windows version of
the R application on Sundays and Wednesdays between 06 November 2022
and 19 March 2023.

### getting started

@@ -901,13 +906,14 @@ logInfo()
$status
[1] "Today's log is typically posted by 09:00 PST (01 Feb 17:00 GMT)."

### V - data fixes
### V - data fixes A

For the historically minded, there are two data fixes to note. The first
stems from problems with the logs when Posit/RStudio first began posting
them. The second stems from how
[‘cranlogs’](https://CRAN.R-project.org/package=cranlogs) works. The
fixes work and are documented in two functions:
[‘cranlogs’](https://CRAN.R-project.org/package=cranlogs) works.

The fixes are coded in two functions:

#### `fixDate_2012()`

@@ -963,7 +969,87 @@ eight problematic dates are requested. The details about the 8 days and
`fixCranlogs()` can be found
[here](https://github.com/lindbrook/packageRank/blob/master/docs/logs.md).

### VI - et cetera
### VI - data fixes B

Recently, two additional data problems have emerged. First, from
2023-09-19 through 2023-10-01, the download counts for R packages (and
the total number of CRAN downloads, computed via `packages = NULL`)
returned by `cranlogs::cran_downloads()` are twice what one would expect
when looking at the actual log(s).

For example:

``` r
cranlogs::cran_downloads(packages = "sna", from = "2023-09-19", to = "2023-09-19")
>         date count package
> 1 2023-09-19  1524     sna
```

The corresponding Posit/RStudio log shows half the number of downloads:

``` r
nrow(packageRank::packageLog(packages = "sna", date = "2023-09-19"))
> [1] 762
```

This is an across-the-board effect for all packages. Here’s the overview
for the total CRAN download counts:

             date   Posit cranlogs ratio
    1  2023-09-15 6479353  6479353     1
    2  2023-09-16 3516904  3516904     1
    3  2023-09-17 3534662  3534662     1
    4  2023-09-18 7309822  7309822     1
    5  2023-09-19 7608886 15217772     2
    6  2023-09-20 7488178 14976356     2
    7  2023-09-21 6862071 13724142     2
    8  2023-09-22 6410593 12821186     2
    9  2023-09-23 4011634  8023268     2
    10 2023-09-24 3548594  7097188     2
    11 2023-09-25 6845864 13691728     2
    12 2023-09-26 7204419 14408838     2
    13 2023-09-27 7188019 14376038     2
    14 2023-09-28 6526022 13052044     2
    15 2023-09-29 5653322 11306644     2
    16 2023-09-30 3165387  6330774     2
    17 2023-10-01 3277506  6555012     2
    18 2023-10-02 6268556  6268556     1
    19 2023-10-03 6732379  6732379     1

Details and code for replication can be found in issue
[\#68](https://github.com/r-hub/cranlogs/issues/68) in the `cranlogs`
GitHub repository.

Second, from 2023-09-13 through 2023-10-02, the download counts for the
R application returned by `cranlogs::cran_downloads(packages = "R")` are
also twice what one would expect when looking at the actual log(s).
There are, however, two exceptions: 1) on 2023-09-28 the counts are
identical but for a “rounding error” possibly due to an NA value and 2)
on 2023-09-30 there is actually a three-fold difference.

Here are the respective count ratios:

        2023-09-12 2023-09-13 2023-09-14 2023-09-15 2023-09-16 2023-09-17 2023-09-18 2023-09-19
    osx          1          2          2          2          2          2          2          2
    src          1          2          2          2          2          2          2          2
    win          1          2          2          2          2          2          2          2
        2023-09-20 2023-09-21 2023-09-22 2023-09-23 2023-09-24 2023-09-25 2023-09-26 2023-09-27
    osx          2          2          2          2          2          2          2          2
    src          2          2          2          2          2          2          2          2
    win          2          2          2          2          2          2          2          2
        2023-09-28 2023-09-29 2023-09-30 2023-10-01 2023-10-02 2023-10-03
    osx   1.000000          2          3          2          2          1
    src   1.000801          2          3          2          2          1
    win   1.000000          2          3          2          2          1

Details and code for replication can be found in issue
[\#69](https://github.com/r-hub/cranlogs/issues/69).

Assuming the logs are “correct”, why these two problems emerged is
unclear. For now, `packageRank::cranDownloads()` fixes both by default
via the `fix.cranlogs = TRUE` argument.

### VII - et cetera

For those interested in directly using the [Posit/RStudio download
logs](http://cran-logs.rstudio.com/), this section describes some issues
