Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bootstrapping a stratified design fails under usual conditions #157

Open
erex opened this issue Feb 24, 2023 · 5 comments
Open

Bootstrapping a stratified design fails under usual conditions #157

erex opened this issue Feb 24, 2023 · 5 comments
Assignees
Labels

Comments

@erex
Copy link
Member

erex commented Feb 24, 2023

There appears to be a tacit assumption by bootdht regarding naming of Sample.Label. Based upon question from the list, bootdht will return a data frame with 0 columns and 0 rows when Sample.Label names are not unique across strata.

Usual numbering for transects in two strata might be 1, ..., 20 in Stratum 1 and 1, ..., 20 in Stratum 2. If this is the nature of the data provided to bootdht it will fail with the resul of 0 columns and 0 rows in the returned data frame.

Workaround solution is to append the Region.Label onto the Sample.Label, as in

easy$Sample.Label <- paste0(easy$Sample.Label, "-", substr(easy$Region.Label, 5, 8))

Nevertheless, this behaviour either needs to be changed or documented. I encountered this problem elsewhere and mentioned it in this issue

@LHMarshall
Copy link
Member

@erex do you have code to reproduce this example? I tried to reproduce it based on your description but was unsuccessful. I couldn't see code to reproduce it in the other referenced issues either but maybe I missed something. I have also tested when both Region.Label and Sample.Label are either both numeric or both character and it runs fine under both scenarios. Thanks.

# generate some data to test on
set.seed(753)
dat <- data.frame(object = 1:60, Sample.Label = rep(1:10,6),
                  Area = 100, Effort = 1000)
dat$Region.Label <- c(rep("StrataA", 30), rep("StrataB", 30))
dat$distance <- abs(rnorm(nrow(dat), 0, 25))
dat$size <- rpois(nrow(dat), 20)

# Check we have repeated Sample.Label's over strata as described above
unique(dat[,c("Region.Label", "Sample.Label")])

conversion.factor <- convert_units("meter", "meter", "square kilometer")

fit.ds <- ds(data=dat,
             truncation=50,
             key="hn",
             adjustment=NULL,
             convert_units=conversion.factor,
             formula=~size)

set.seed(225)
  easybootn <- bootdht(model=fit.ds,
                                        flatfile=dat,
                                        summary_fun=bootdht_Nhat_summarize,
                                        convert_units=conversion.factor,
                                        sample_fraction=1,
                                        nboot=3, cores=1,
                                        progress_bar = "none")
  
  # Make check table
  check.tab <- easybootn %>% dplyr::group_by(Label) %>%
    dplyr::summarize(LCI=quantile(Nhat,probs=0.025,na.rm=TRUE),
                     UCI=quantile(Nhat,probs=0.975,na.rm=TRUE))

check.tab
# A tibble: 3 × 3
  Label       LCI     UCI
  <chr>     <dbl>   <dbl>
1 StrataA  64763. 106952.
2 StrataB  60080. 100407.
3 Total   124844. 207359.

@LHMarshall
Copy link
Member

This situation does cause issues but not the issue described above. When the sampler ID's are not distinct across strata then object ID's are all associated with each occurrence of the sampler ID irrelevant of which strata the object was associated with even if it differs from the sampler ID.

LHMarshall added a commit that referenced this issue Dec 20, 2023
Initial steps to building testing. I do not know what effects there might be of changing the labelling of transects at this stage. Possibly nothing as replicated are renamed any way.
@LHMarshall
Copy link
Member

Add some documentation to bootdht to say that transect ID's need to be unique across all strata (not just within strata)

LHMarshall added a commit that referenced this issue Dec 20, 2023
@LHMarshall
Copy link
Member

When this is fixed remember to remove the additional text from the manual.

@LHMarshall
Copy link
Member

Demonstrating the code failing

# These are the resampled transects
Browse[2]> samples
$StrataA
 [1]  3  3 10  2  6  5  4  6  9 10

$StrataB
 [1]  5  3  9  9  9  3  8 10  7 10

Browse[2]> n
debug at C:/Users/lhm/Documents/Github/Distance/R/bootdht_resample_data.R#44: obs_per_sample <- obs_per_sample[unlist(samples)]
Browse[2]> n
debug at C:/Users/lhm/Documents/Github/Distance/R/bootdht_resample_data.R#47: obs <- obs_per_sample

#These are the object ID's extracted for each of the resampled transects. 

Browse[2]> obs_per_sample
$`3`
[1]  3 13 23 33 43 53

$`3`
[1]  3 13 23 33 43 53

$`10`
[1] 10 20 30 40 50 60

$`2`
[1]  2 12 22 32 42 52

$`6`
[1]  6 16 26 36 46 56

$`5`
[1]  5 15 25 35 45 55

$`4`
[1]  4 14 24 34 44 54

$`6`
[1]  6 16 26 36 46 56

$`9`
[1]  9 19 29 39 49 59

$`10`
[1] 10 20 30 40 50 60

$`5`
[1]  5 15 25 35 45 55

$`3`
[1]  3 13 23 33 43 53

$`9`
[1]  9 19 29 39 49 59

$`9`
[1]  9 19 29 39 49 59

$`9`
[1]  9 19 29 39 49 59

$`3`
[1]  3 13 23 33 43 53

$`8`
[1]  8 18 28 38 48 58

$`10`
[1] 10 20 30 40 50 60

$`7`
[1]  7 17 27 37 47 57

$`10`
[1] 10 20 30 40 50 60

# We can see that objects associated with the transect ID in both strata are being selected

Browse[2]> bootdat[bootdat$object %in% obs_per_sample$'3',]
   object Sample.Label Area Effort Region.Label  distance size ref.object
3       3            3  100   1000      StrataA 52.208953   25          3
13     13            3  100   1000      StrataA  5.462644   26         13
23     23            3  100   1000      StrataA 31.673722   17         23
33     33            3  100   1000      StrataB 19.394133   16         33
43     43            3  100   1000      StrataB  6.637318   18         43
53     53            3  100   1000      StrataB 32.102037   13         53

LHMarshall added a commit that referenced this issue Dec 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants