
May need to scale histogram of observed distances when there are unequal bins #110

Open
lenthomas opened this issue Oct 6, 2021 · 1 comment


@lenthomas
Member

Distance 1.0.4.9002, mrds 2.2.5.9000, R 4.1.1.

When the data are binned, and there are unequal bin widths, plotting the histogram without detection function/pdf (plot.ds using option which = 1) gives count frequency on the y-axis:

library(Distance)
data("wren_snapshot")
bin.cutpoints.100m <- bin.cutpoints <- c(0, 10, 20, 30, 40, 60, 80, 100)
conversion.factor <- convert_units("meter", NULL, "hectare")
wrensnap.hn.t100 <- ds(data=wren_snapshot, key="hn", adjustment=NULL, 
                       transect="point", cutpoints=bin.cutpoints.100m,
                       convert.units=conversion.factor)
plot(wrensnap.hn.t100, which = 1, pdf = TRUE)

[image: plot produced with which = 1, count frequency on the y-axis]

However, it is not clear to me that this is the correct thing to do, as when we plot with the detection function superimposed, the y-axis is scaled to account for the bin width:

plot(wrensnap.hn.t100, which = 2, pdf = TRUE)

[image: plot produced with which = 2, detection function superimposed]

In this circumstance, base R gives a warning message:

hist(x, breaks = c(0, 10, 20, 30, 40, 60, 80, 100), freq = TRUE)

Gives

Warning message:
In plot.histogram(r, freq = freq1, col = col, border = border, angle = angle,  :
  the AREAS in the plot are wrong -- rather use 'freq = FALSE'

and if you leave freq at its default (which is FALSE when the breaks are unequally spaced) then it plots the correct density. So perhaps we should either issue a warning if people choose which = 1 and the bin widths aren't all the same, or change to plotting density rather than count on the y-axis?
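
For reference, a minimal sketch of the density scaling that hist() applies when bins are unequal: each bar height is count / (n * bin width), so bar areas sum to 1. The counts below are made up for illustration and are not the wren data; the cutpoints are the ones used above.

```r
# Cutpoints from the example above; the counts are hypothetical
breaks <- c(0, 10, 20, 30, 40, 60, 80, 100)
counts <- c(5, 12, 20, 25, 40, 30, 18)

widths <- diff(breaks)

# Density scaling: divide each count by (total count * bin width)
dens <- counts / (sum(counts) * widths)

# Rebuild pseudo-observations at the bin midpoints so hist() can be compared
mids <- breaks[-length(breaks)] + widths / 2
x <- rep(mids, times = counts)
h <- hist(x, breaks = breaks, plot = FALSE)

# hist() reports the same densities, and the bar areas sum to 1
stopifnot(isTRUE(all.equal(h$density, dens)))
stopifnot(isTRUE(all.equal(sum(dens * widths), 1)))
```

With these numbers the 40-60 bin holds the most observations but, once divided by its width of 20, is no longer the tallest bar, which is the difference between the two plots in question.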

@dill
Contributor

dill commented Oct 7, 2021

I've thought previously we should just remove which=1 plotting. I've rarely seen it used in the wild.

There are a few reasons the bins look different between these two plots. One is probably down to the hist warning you mention, but there's also the scaling done to match the area under the detection function to the area of the histogram (see the fairly heinous internals of plot.ds for details).

Including the warning seems fine, though I think people might read which=1 effectively as a bar chart and expect bins with more observations to be taller (than they would be after accounting for uneven bin size).
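
A rough sketch of what such a check could look like, under the assumption that the cutpoints are available as a numeric vector at plot time. Both function names are hypothetical and are not part of Distance or mrds; where this would hook into plot.ds is left open.

```r
# Hypothetical helper (not in Distance/mrds): TRUE if bin widths differ
has_unequal_bins <- function(cutpoints, tol = sqrt(.Machine$double.eps)) {
  widths <- diff(cutpoints)
  any(abs(widths - widths[1]) > tol)
}

# Hypothetical check to run before drawing a count histogram (which = 1)
check_count_histogram <- function(cutpoints) {
  if (has_unequal_bins(cutpoints)) {
    warning("bin widths are unequal: bar heights are counts, ",
            "so bar areas are not comparable between bins")
    return(invisible(FALSE))
  }
  invisible(TRUE)
}

# Warns for the cutpoints used above, silent for equal-width bins
check_count_histogram(seq(0, 100, by = 10))
```

The tolerance guards against floating point noise in cutpoints built with seq() or arithmetic, so equal-width bins such as seq(0, 1, by = 0.1) are not flagged by mistake.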

@dill dill self-assigned this Apr 4, 2022
@dill dill added this to the CRAN 1.0.8 milestone Nov 3, 2022
@dill dill removed their assignment Nov 22, 2022