Stratum names which come after "Total" alphabetically cause NA's in bootstrap dht output #158

Closed
erex opened this issue Feb 25, 2023 · 2 comments

erex commented Feb 25, 2023

Comment by Laura: the anomalous effort for transect PPWS24-2016 arose because the transect appeared in the data twice, once with its observations and once without any, as though it had been surveyed but nothing was seen. The NA values for the YEAR2016 stratum turned out to be a separate issue caused by alphabetical ordering.

While digging into a bootdht anomaly raised by a user and reported in Distance issue #157, I viewed the data frame generated during the bootstrap resample by bootdht_resample_data. The survey has a stratified design; the problem seems to arise for the first stratum but not for the remaining strata.

Here is a bit of the data frame, note the effort associated with transect PPWS24-2016:

[Image: excerpt of the resampled data frame]

For unknown reasons, the data frame contains 143 records for transect PPWS24-2016, although there are only 2 detections on that transect. An error of non-unique effort associated with this transect is trapped, and the estimate of abundance for the stratum (2016) is set to NA.

The hypothesis is that the true bug lies in bootdht_resample_data, the function that draws the sample. Indeed, bootdht_resample_data creates a superabundance of copies of transect PPWS24-2016, each with a different length, which does not happen for other transects.

The dataset causing this problem belongs to the user and can be provided upon request.
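The "non-unique effort" condition described above can be checked directly on any data frame of detections. The following is a minimal sketch in base R, not the check that the Distance package itself performs; the data values are hypothetical, and the column names `Sample.Label` and `Effort` are assumed to follow the usual flatfile layout.

```r
# Hypothetical resampled data: one row per detection, effort repeated on each row.
resampled <- data.frame(
  Sample.Label = c("T1", "T1", "T2", "T2", "T2"),
  Effort       = c(2.0, 2.0, 1.5, 3.0, 1.5),
  distance     = c(10, 25, 5, 40, 12),
  stringsAsFactors = FALSE
)

# Count the distinct effort values recorded for each transect.
n_efforts <- tapply(resampled$Effort, resampled$Sample.Label,
                    function(x) length(unique(x)))

# A transect with more than one distinct effort value is suspect.
bad_transects <- names(n_efforts)[n_efforts > 1]
print(bad_transects)
```

Running a check like this on the original flatfile, before bootstrapping, would flag a transect that appears with conflicting effort values.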

Milou reports success when taking numbers out of Region.Labels:

In addition to changing the Sample.Labels to unique values within each Region.Label, I also replaced the Region.Labels YEAR2016, YEAR2018, etc. with A, B, C, D in the original flatfile CSVs. This seemed to do the trick. I tried it on other combinations of data as well (other species and other PA/year combinations), and it all worked and seems to produce sensible results. For example, for BSD in PPWS:

                            Dhat2016  Dhat2018  Dhat2020  Dhat2022
Mean from bootstrap             1.71      0.95      2.33      3.11
Median from bootstrap           1.69      0.94      2.31      3.09
Point Estimate from model       1.80      0.94      2.31      3.08
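
The relabelling Milou describes could be done in R rather than by editing the CSV files by hand. This is only a sketch under assumptions: the data frame `flat` and its transect labels are hypothetical, and prefixing the stratum name is just one way to make `Sample.Label` unique (it makes labels unique across strata, which also makes them unique within each stratum).

```r
# Hypothetical flatfile excerpt with transect labels reused across strata.
flat <- data.frame(
  Region.Label = c("YEAR2016", "YEAR2016", "YEAR2018", "YEAR2018"),
  Sample.Label = c("PPWS01", "PPWS02", "PPWS01", "PPWS02"),
  stringsAsFactors = FALSE
)

# Make transect labels unique by prefixing the (original) stratum name.
flat$Sample.Label <- paste(flat$Region.Label, flat$Sample.Label, sep = "_")

# Map each stratum name to a letter (A, B, ...) in sorted order.
# LETTERS has 26 entries, so this mapping only covers up to 26 strata.
old_names <- sort(unique(flat$Region.Label))
new_names <- setNames(LETTERS[seq_along(old_names)], old_names)
flat$Region.Label <- unname(new_names[flat$Region.Label])

print(flat)
```

Keeping `new_names` around makes it possible to translate the letters back to the original stratum names when reporting results.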
@lenthomas lenthomas added the bug label Mar 7, 2023
@lenthomas lenthomas added this to the Distance 1.0.8 milestone Mar 7, 2023

LHMarshall commented Mar 14, 2023

@lenthomas @erex OK, I found the bug. It wasn't the numbers in the stratum names: the initial letter "Y" of YEAR comes after "T" for Total, so at some point the Total value gets placed above the stratum values, which line 65 of the bootit function (see bootdht_bootit.R) does not expect.

[Image]

As this user has found, a workaround is to use stratum names that start with letters before "T" in the alphabet. Once the stratum names have been modified, the bootstrap results are consistent with the model estimates from the initial model fit.
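
The ordering effect described in this comment can be reproduced in isolation. The sketch below does not reproduce the code in bootdht_bootit.R; it only illustrates, with hypothetical label vectors, how alphabetical sorting moves "Total" relative to the stratum names and why code that relies on position would then pick the wrong entry.

```r
# Hypothetical labels as they might appear in a results table.
strata <- c("YEAR2016", "YEAR2018", "Total")

# method = "radix" sorts in the C locale, so the result does not depend on
# the machine's collation settings. "T" sorts before "Y", so "Total" is first.
sorted_y <- sort(strata, method = "radix")
print(sorted_y)

# With names that start before "T", "Total" stays last after sorting.
sorted_a <- sort(c("A", "B", "Total"), method = "radix")
print(sorted_a)

# Code that assumes "Total" is the last entry gets a stratum instead.
last_y <- sorted_y[length(sorted_y)]
last_a <- sorted_a[length(sorted_a)]
print(c(last_y, last_a))
```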

@LHMarshall LHMarshall changed the title from “Resampling transects for bootstrap error” to “Stratum names which come after "Total" alphabetically cause NA's in bootstrap dht output” Mar 14, 2023

erex commented Mar 14, 2023

Good that you found this idiosyncrasy. Do we need to document that users cannot have stratum names beginning with letters that follow "T" in the alphabet, or can the code be modified to prevent this problem?
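
On the second option, one general way to make such code independent of stratum names is to look rows up by label instead of by position. The following is a sketch of that idea only, with a hypothetical `estimates` table and made-up values; it is not the change that was made in the package.

```r
# Hypothetical estimates table, sorted so that "Total" comes first.
estimates <- data.frame(
  Label    = c("Total", "YEAR2016", "YEAR2018"),
  Estimate = c(30, 10, 20),
  stringsAsFactors = FALSE
)

# The order that downstream code expects: strata first, "Total" last.
expected_order <- c("YEAR2016", "YEAR2018", "Total")

# match() finds each expected label's row, so the result no longer
# depends on how the table happened to be sorted.
reordered <- estimates[match(expected_order, estimates$Label), ]
rownames(reordered) <- NULL
print(reordered)
```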

LHMarshall added a commit that referenced this issue Dec 14, 2023
Re-order the values to match the table
LHMarshall added a commit that referenced this issue Dec 22, 2023
Re-order the values to match the table