Distance analysing binned data using arguments instead of distend / distbegin when distance is not in dataset #144

Open
LHMarshall opened this issue Nov 8, 2022 · 6 comments

@LHMarshall
Member

When both distbegin / distend are supplied in the dataset along with the arguments cutpoints and width in the function call, you get a warning message saying that distbegin / distend are being ignored. In this case there was no distance column, so it was unclear what the detection function was being fitted to.

# There is no distance column
> dat[1,]
  Region.Label Area Sample.Label Species Effort distbegin distend size object
1       forest    1        FP123    YBBU      3        20      30    1      1

> # Don't know why this works as there is no column distance so shouldn't work
> x1<-ds(data = dat, transect = "point", formula=~1, key = "hn", 
+       adjustment = NULL, truncation =list(left=0,right=30), 
+       cutpoints = c(0,5,10,15,20,30), convert_units = conversion.factor)
data already has distend and distbegin columns, removing them and appling binning as specified by cutpoints.
Fitting half-normal key function
AIC= 167.049

Also, the results are inconsistent between the following two models, when they should be identical:

> # Try to achieve analysis with truncation of 30
> dat2 <- dat
> # Make a distance column
> dat2$distance <- (dat$distbegin+dat$distend)/2
> # Re-cut data as per bins
> x3<-ds(data = dat2, transect = "point", formula=~1, key = "hn", 
+        adjustment = NULL, truncation =list(left=0,right=30), 
+        cutpoints = c(0,5,10,15,20,30), convert_units = conversion.factor)
data already has distend and distbegin columns, removing them and appling binning as specified by cutpoints.
Fitting half-normal key function
AIC= 167.049
> plot(x3)
> # Now try same analysis with using distbegin / distend
> # Need to subset data
> dat3 <- dat[dat$distend <= 30,]
> View(dat3)
> x4<-ds(data = dat3, transect = "point", formula=~1, key = "hn", 
+        adjustment = NULL, convert_units = conversion.factor)
Columns "distbegin" and "distend" in data: performing a binned analysis...
Fitting half-normal key function
AIC= 159.573
> plot(x4)

x3 model plot:
[image]

x4 model plot (note the strange additional point at distance 5):
[image]

@LHMarshall LHMarshall added this to the CRAN 1.0.7 milestone Nov 8, 2022
@LHMarshall LHMarshall self-assigned this Nov 8, 2022
@LHMarshall LHMarshall added the bug label Nov 8, 2022
@LHMarshall LHMarshall modified the milestones: CRAN 1.0.7, CRAN 1.0.8 Nov 8, 2022
@lenthomas
Member

My suggestion is that we check input data and not allow users to have a distance column and distbegin + distend in the same data frame they pass in to ds or ddf (in mrds). Once we check for and eliminate this, we will solve a bunch of problems. This may also help solve issue #147.
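
A minimal sketch of such an input check, in base R. The function name `check_distance_columns` is hypothetical and is not part of Distance or mrds; the sketch only assumes the column names used in this issue (`distance`, `distbegin`, `distend`).

```r
# Hypothetical helper: refuse data frames that carry both an exact-distance
# column and binned-distance columns, since it is then ambiguous which one
# the detection function should be fitted to.
check_distance_columns <- function(data) {
  has_exact <- "distance" %in% names(data)
  has_bins  <- all(c("distbegin", "distend") %in% names(data))
  if (has_exact && has_bins) {
    stop("data has both a 'distance' column and 'distbegin'/'distend' ",
         "columns; supply exact or binned distances, not both",
         call. = FALSE)
  }
  invisible(data)
}

# Binned-only data passes the check
binned <- data.frame(distbegin = c(20, 0), distend = c(30, 5))
check_distance_columns(binned)

# Data with both kinds of column is rejected
both <- cbind(binned, distance = c(25, 2.5))
result <- try(check_distance_columns(both), silent = TRUE)
```

As noted in the replies, a check like this would not catch the case reported here, where the data has no distance column to begin with.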

@LHMarshall
Member Author

Checking the data for distance, distbegin and distend columns doesn't do anything to fix this scenario, as there was no distance column in the data to start with. Early on in the ds function, if there is no distance column, one is created using the distend and distbegin columns. It was this created column that was then used with the specified cutpoints to make new distbegin and distend columns in the data.
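
The behaviour described above can be illustrated with a small base R sketch. This is not the code inside `ds`; it assumes the stand-in distance is the bin midpoint (as in the `dat2` example earlier in this issue), and the column names `distbegin_new` / `distend_new` are hypothetical.

```r
# Toy binned data with no distance column (values are made up)
dat <- data.frame(distbegin = c(20, 0, 10), distend = c(30, 5, 15))
cutpoints <- c(0, 5, 10, 15, 20, 30)

# Step 1 (assumption): derive a stand-in distance from the bin midpoint
dat$distance <- (dat$distbegin + dat$distend) / 2

# Step 2: re-bin that stand-in distance using the user-supplied cutpoints.
# cut() with labels = FALSE returns the index of the interval each value
# falls into; intervals are (lower, upper], with the lowest edge included.
bin <- cut(dat$distance, breaks = cutpoints,
           include.lowest = TRUE, labels = FALSE)

# Step 3: the original distbegin/distend are replaced by the new bin edges
rebinned <- data.frame(distbegin_new = cutpoints[bin],
                       distend_new   = cutpoints[bin + 1])
```

When the original bins coincide with the supplied cutpoints, the midpoints fall back into the same intervals. If they do not coincide, the stand-in distance silently decides which new bin each observation lands in.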

@lenthomas
Member

OK, good point. My suggestion then is that we should not be adding a distance column to the dataset. No doubt it's being done so some other code works - but I think (without looking into the details) we'd be better off changing that other code so that it's robust to not having a distance column. Having a fake distance column puts us in danger that it will be analyzed somewhere as an exact distance when it is not. I appreciate this will be more work and so puts this issue down the priority list. (I still think we should check for distance and distend/distbegin when users pass in data frames and not allow both, as well as the above.)

@lenthomas
Member

Just to note that Laura mentioned for this particular dataset, the distbegin and distend are related to the same underlying set of cutpoints for all observations. This is not a case where there are different distance intervals for each observation - although clearly our code needs to be robust to that.
Given that all the bins are the same in this dataset, I don't know why the third code example produces a different result.
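
Whether every observation's interval comes from one shared set of cutpoints can be checked directly. A minimal base R sketch follows; `bins_match_cutpoints` is a hypothetical helper, not a Distance function, and it assumes the cutpoints are sorted and that bin edges are stored exactly (no floating-point drift).

```r
# Hypothetical helper: TRUE if every (distbegin, distend) pair is a pair of
# adjacent values in `cutpoints`, i.e. all observations share one binning.
bins_match_cutpoints <- function(data, cutpoints) {
  lower <- match(data$distbegin, cutpoints)  # position of each lower edge
  upper <- match(data$distend, cutpoints)    # position of each upper edge
  !anyNA(lower) && !anyNA(upper) && all(upper - lower == 1)
}

cutpoints <- c(0, 5, 10, 15, 20, 30)

# Every interval is one of the shared bins
shared <- data.frame(distbegin = c(20, 0, 10), distend = c(30, 5, 15))
shared_ok <- bins_match_cutpoints(shared, cutpoints)

# The second interval (0 to 10) spans two of the shared bins
mixed <- data.frame(distbegin = c(20, 0), distend = c(30, 10))
mixed_ok <- bins_match_cutpoints(mixed, cutpoints)
```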

@lenthomas
Member

One short-term thing to do here is to add documentation under distbegin and distend to discourage users from using these columns when they have a fixed set of cutpoints that applies to the whole survey.

LHMarshall added a commit that referenced this issue Dec 5, 2023
Also amended tests to comply with this.
Reference #144
LHMarshall added a commit that referenced this issue Dec 14, 2023
@LHMarshall
Member Author

I have updated the documentation, but as the next step is a big fix, I have moved this to the next release milestone.

LHMarshall added a commit that referenced this issue Dec 22, 2023
Also amended tests to comply with this.
Reference #144
LHMarshall added a commit that referenced this issue Dec 22, 2023