verify_SingleGrainData(): Bugs and wishes #113

Open
DirkMittelstrass opened this issue Jan 2, 2023 · 9 comments

@DirkMittelstrass

Hi Sebastian! First of all, I'd like to thank you for all the effort you have put into maintaining the Luminescence package for more than a decade now. Your package makes it so much easier to analyse usual and unusual luminescence dating data sets.

However, I encountered some issues with the verify_SingleGrainData() function that I'd like to address:

1. There is an error in the Details section. It states that the validation criterion is:

abs(mean(x) - var(x)) > threshold

while the true criterion is actually:

var(x) / mean(x) > threshold

Either the documentation or the code is outdated and needs revision.
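To make the difference between the two criteria concrete, here is a minimal base-R sketch of the ratio-based criterion as described above (the function name, example data, and the threshold of 10 are illustrative assumptions, not taken from the package):

```r
# Sketch of the ratio criterion var(x) / mean(x) > threshold.
# A curve carrying a real signal is overdispersed relative to pure
# Poisson noise, where the variance roughly equals the mean.
ratio_criterion <- function(x, threshold = 10) {
  var(x) / mean(x) > threshold
}

signal_counts <- c(1200, 950, 600, 310, 150, 80, 40, 25, 15, 10)  # decaying signal
noise_counts  <- c(9, 11, 10, 8, 12, 10, 9, 11, 10, 10)           # background only

ratio_criterion(signal_counts)  # TRUE: variance far exceeds the mean
ratio_criterion(noise_counts)   # FALSE: var/mean is close to 1
```

Note that abs(mean(x) - var(x)) and var(x)/mean(x) scale very differently with signal intensity, so the two formulations are not interchangeable and the documentation should state the one actually implemented.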

2. The clean-up procedure does not work as expected, at least for my data set. If I set the arguments cleanup = TRUE, cleanup_level = "aliquot", I always get back:

[verify_SingleGrainData()] RLum.Analysis object reduced to records: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

But nothing was deleted, even if I selected a very high threshold leading to plenty of fails. Only if I set cleanup_level = "curve" are the records deleted as expected. Please mail me if you need the data set.

3. A good grain may fail the validation check if the grain either has no natural dose or emits no residual signal during zero-dose measurements (i.e. the recuperation test). Thus, it should be possible to allow some records to fail the check without removing the whole grain/aliquot from the data set. The simplest solution would be to add an argument allowed_fails = 2 or an argument ignore_zero_dose = TRUE. The ignore_zero_dose argument could check for info$IRR_TIME == 0.
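A hedged sketch of how the two proposed arguments could interact, in plain R; keep_grain, allowed_fails, and ignore_zero_dose are hypothetical names suggested here, not existing parameters of verify_SingleGrainData():

```r
# Hypothetical grain-level decision: tolerate a limited number of failed
# records, optionally skipping zero-dose (natural/recuperation) records.
keep_grain <- function(ratios, irr_times, threshold = 10,
                       allowed_fails = 2, ignore_zero_dose = TRUE) {
  passed <- ratios > threshold
  if (ignore_zero_dose)
    passed <- passed[irr_times > 0]   # drop records with IRR_TIME == 0
  sum(!passed) <= allowed_fails
}

# A grain whose only failing record is the zero-dose recuperation test:
keep_grain(ratios = c(0.5, 120, 95, 80), irr_times = c(0, 10, 20, 40))  # TRUE
```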

4. It would be cool not to get hundreds of plots back if plot = TRUE. A histogram or a result matrix (X = log(ratio), Y = grain no.) might be more convenient.

Btw: I'm ok with it if you assign this change request to me :-).

Session info

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8    LC_MONETARY=German_Germany.utf8
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gridExtra_2.3          OSLdecomposition_1.0.0 knitr_1.40             Luminescence_0.9.20   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9        digest_0.6.30     grid_4.2.2        R6_2.5.1          gtable_0.3.1      evaluate_0.18     highr_0.9        
 [8] httr_1.4.4        cli_3.4.1         rlang_1.0.6       rstudioapi_0.14   data.table_1.14.4 rmarkdown_2.17    tools_4.2.2      
[15] fastmap_1.1.0     xfun_0.34         parallel_4.2.2    compiler_4.2.2    htmltools_0.5.3  

@RLumSK

RLumSK commented Jan 2, 2023

1. There is an error in the Details section. It states that the validation criterion is:

abs(mean(x) - var(x)) > threshold

while the true criterion is actually:

var(x) / mean(x) > threshold

This might indeed be a documentation error. I remember that I first had a different criterion and later went back to the ratio, but obviously forgot to update the documentation.

@RLumSK

RLumSK commented Jan 2, 2023

2. The clean-up procedure does not work as expected, at least for my data set. If I set the arguments cleanup = TRUE, cleanup_level = "aliquot", I always get back:

[verify_SingleGrainData()] RLum.Analysis object reduced to records: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

But nothing was deleted, even if I selected a very high threshold leading to plenty of fails. Only if I set cleanup_level = "curve" are the records deleted as expected. Please mail me if you need the data set.

Here I am not sure what kind of data you have. The selection should work; at least from the code, I cannot see an obvious reason.

Update (2023-01-03): OK, I did not get this yesterday: this behaviour makes sense because you do not want to remove curves if you are interested in analysing the entire sequence; otherwise, the sequence is no longer complete. This means that with `cleanup_level = "aliquot"` all curves of one aliquot would need to fail, otherwise the algorithm retains the aliquot.
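A minimal sketch of the aliquot-level rule described here (assumed logic, not the package code): a single passing curve keeps the complete sequence.

```r
# cleanup_level = "aliquot": the aliquot survives unless ALL curves fail,
# so the measurement sequence is never left incomplete.
keep_curves_aliquot <- function(passed) {
  rep(any(passed), length(passed))  # one passing curve saves every curve
}

keep_curves_aliquot(c(TRUE, FALSE, FALSE))   # all TRUE: sequence kept intact
keep_curves_aliquot(c(FALSE, FALSE, FALSE))  # all FALSE: aliquot removed
```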

@RLumSK

RLumSK commented Jan 2, 2023

3. A good grain may fail the validation check if the grain either has no natural dose or emits no residual signal during zero-dose measurements (i.e. the recuperation test). Thus, it should be possible to allow some records to fail the check without removing the whole grain/aliquot from the data set. The simplest solution would be to add an argument allowed_fails = 2 or an argument ignore_zero_dose = TRUE. The ignore_zero_dose argument could check for info$IRR_TIME == 0.

This is not the case: if a valid signal was found for one grain/position pair, nothing is discarded (depending on the cleanup level).

Update (2023-01-03): Please see my previous comment. It is related to the cleanup level aliquot vs curve.

@RLumSK

RLumSK commented Jan 2, 2023

4. It would be cool not to get hundreds of plots back if plot = TRUE. A histogram or a result matrix (X = log(ratio), Y = grain no.) might be more convenient.

The function returns a single plot with a threshold line and points. I admit that a histogram would be more sensible. However, if the function is called on a list of objects, you will indeed get one plot for each analysis object, which might be too much. So we would need a different plot for the self-call option. Would you like to implement that?

@RLumSK

RLumSK commented Jan 2, 2023

Btw: I'm ok with it if you assign this change request to me :-).

Yes, I would like that. First, however, I would like to see code that reproduces the behaviour (except for the documentation error) so that we can write tests for those scenarios.

For a pull request, please check out branch 113-verify_singlegraindata-bugs-and-wishes and create a pull request against it.

Update (2023-01-03): If I am not mistaken, only point 1 needs to be fixed, point 4 is nice to have, and I would appreciate you taking over to implement it.

@RLumSK RLumSK linked a pull request Jan 2, 2023 that will close this issue
@RLumSK RLumSK removed a link to a pull request Jan 2, 2023
@DirkMittelstrass

Update (2023-01-03): OK, I did not get this yesterday: this behaviour makes sense because you do not want to remove curves if you are interested in analysing the entire sequence; otherwise, the sequence is no longer complete. This means that with `cleanup_level = "aliquot"` all curves of one aliquot would need to fail, otherwise the algorithm retains the aliquot.

Well, that makes sense. However, I still see multiple issues here:

  1. The console output text does not say this. I would fix/improve the output text.
  2. From a pure application point of view, I would expect that a grain needs at least sufficient test-dose signals and higher-dose regenerative-dose signals. With the algorithm as it is, it is hard to delete insufficient but existing grains. My idea would be to accept a numeric allowed-fails input as an additional value for cleanup_level.
  3. The resulting data structure when an RLum.Analysis object is processed is inconsistent, see the screenshot below. I would fix that and maybe keep an empty record slot, what do you think?

(screenshot: bug_data_structure)

@RLumSK

RLumSK commented Jan 3, 2023

Thanks @DirkMittelstrass. I appreciate the detailed discussion!

  • The console output text does not say this. I would fix/improve the output text.

It is written in the documentation, but I agree that a better terminal output may help users understand the behaviour of the function without reading the manual.

  • From a pure application point of view, I would expect that a grain needs at least sufficient test-dose signals and higher-dose regenerative-dose signals.

This question was intensively discussed before the current implementation. It sounds odd, but you may have very weird grains or technical issues in single-grain measurements that cause a ratio to fall below the threshold even though you still want to retain the grain. Hence the current implementation.

With the algorithm as it is, it is hard to delete insufficient but existing grains. My idea would be to accept a numeric allowed-fails input as an additional value for cleanup_level.

This is, however, a sensible suggestion. The default should be NULL (current behaviour) so as not to break other people's code.
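A sketch of how such a backward-compatible default could look (keep_record_set and allowed_fails are hypothetical names proposed in this thread, not package API):

```r
# allowed_fails = NULL reproduces the current all-or-nothing behaviour:
# a record set is kept as long as not every record fails.
keep_record_set <- function(n_fails, n_records, allowed_fails = NULL) {
  if (is.null(allowed_fails))
    return(n_fails < n_records)  # current rule: only drop if all records fail
  n_fails <= allowed_fails       # proposed rule: tolerate a fixed number of fails
}

keep_record_set(n_fails = 3, n_records = 15)                     # TRUE
keep_record_set(n_fails = 3, n_records = 15, allowed_fails = 2)  # FALSE
```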

  • The resulting data structure when an RLum.Analysis object is processed is inconsistent, see the screenshot below. I would fix that and maybe keep an empty record slot, what do you think?

It is just easier to process afterwards, but I agree it is more logical to have an RLum.Analysis object with zero records instead of an RLum.Results object that comes out of the blue. Please write a unit test beforehand to make sure that we do not break functionality depending on it. This means extending tests/testthat/test_verify_SingleGrainData.R.

@RLumSK

RLumSK commented Feb 9, 2023

@DirkMittelstrass Just double-checking: have I scared you off, or do you need help with the code changes?

@DirkMittelstrass

@RLumSK No, you have not scared me off, other things were just more urgent. It is still on my list!
