Replicate QC flag #393

Open
6 tasks
cristinamullin opened this issue Jan 22, 2024 · 0 comments
Labels
Future Improvement (Minimum viable function complete, issue includes potential future improvements), Module 1, QAQC

Comments

Collaborator

cristinamullin commented Jan 22, 2024

Is your feature request related to a problem? Please describe.

Users of TADA have noted that it would be useful to incorporate replicate field samples into water quality data analysis by flagging routine field sample measurements whose associated replicate measurements fall outside a user-defined window of precision (relative percent difference or absolute difference). A two-stage data quality indicator may be appropriate, in which low values should fall within an absolute difference limit and high values within a relative percent difference (RPD) limit. RPD is the difference between the routine sample result and its associated replicate sample result, expressed as a percentage of their mean. For example, some water quality analysts consider an RPD/CV above 20% to be a potentially concerning lack of precision, especially for non-particulate analytes. However, acceptable RPDs can vary widely depending on the characteristic being analyzed and the sampling method, so it is best for users to define their own level of RPD acceptability. In addition, a tiered approach may be more appropriate: the widely used 20% RPD limit applies to results above XX-times the detection limit, and an absolute difference limit applies to results near or below the detection limit (e.g., phosphorus). An absolute difference approach is more appropriate for samples close to the detection limit, because even small absolute differences can show up as large relative percent differences that "fail" the 20% RPD test.
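To make the detection-limit effect concrete, here is a minimal sketch of the RPD calculation. It is written in Python purely for illustration and is not TADA code; the function name `relative_percent_difference` and the example concentrations are assumptions, and results are assumed to be non-negative.

```python
def relative_percent_difference(routine: float, replicate: float) -> float:
    """Absolute difference of a pair as a percentage of the pair mean.

    Hypothetical helper; assumes non-negative results (e.g., concentrations).
    """
    if routine < 0 or replicate < 0:
        raise ValueError("results are assumed to be non-negative")
    if routine == replicate:
        return 0.0  # identical results (including 0 and 0) agree exactly
    pair_mean = (routine + replicate) / 2
    return abs(routine - replicate) / pair_mean * 100


# Two made-up pairs with the same absolute difference (0.01 units)
high = relative_percent_difference(1.00, 1.01)  # well above a detection limit
low = relative_percent_difference(0.01, 0.02)   # near a detection limit

print(round(high, 1))  # about 1% RPD
print(round(low, 1))   # about 67% RPD, which would "fail" a 20% test
```

The same 0.01-unit disagreement passes easily at the higher concentration and fails badly at the lower one, which is the motivation for the absolute difference tier.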

For example, when nutrient concentrations are close to the detection limit, it becomes nearly impossible to have a low RPD. In this scenario, high RPDs are acceptable because if you stand back and look at ALL the data, and not just the replicates, the data may agree perfectly well that nutrients are very low. DO NOT throw out data if RPD is >20% unless you have good reason, or you will potentially bias your data toward high concentrations. QA procedures should not bias statistical analyses of the data. Note that a modest error in a measurement will have a much smaller effect than implementing a QA process that builds in bias.
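The bias described above can be seen with a few made-up numbers. The sketch below is Python for illustration only (not TADA code); the pairs and the helper name `rpd` are hypothetical, and the 20% cutoff is the commonly cited value from this issue.

```python
from statistics import mean


def rpd(a: float, b: float) -> float:
    """Relative percent difference of a non-negative pair (hypothetical helper)."""
    if a == b:
        return 0.0
    return abs(a - b) / ((a + b) / 2) * 100


# Hypothetical (routine, replicate) pairs; invented values for illustration
pairs = [(0.01, 0.02), (0.02, 0.01), (0.50, 0.52), (1.00, 1.05)]

# Mean of the routine results using all of the data
all_routine = [routine for routine, _ in pairs]
mean_all = mean(all_routine)

# Mean after discarding every pair whose RPD exceeds 20%
kept_routine = [routine for routine, rep in pairs if rpd(routine, rep) <= 20]
mean_kept = mean(kept_routine)

print(mean_all, mean_kept)
```

Only the two low-concentration pairs exceed 20% RPD, so discarding them removes exactly the low results and raises the mean of what remains.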

Describe the solution you'd like

Write a new function to flag paired replicates using a tiered approach: apply the widely used 20% RPD limit to results above XX-times the detection limit, and apply an absolute difference limit to results near or below the detection limit (e.g., phosphorus). An absolute difference approach is more appropriate for samples close to the detection limit, because even small absolute differences can show up as large relative percent differences that "fail" the 20% RPD test.
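One possible shape for the tiered logic is sketched below in Python, as a language-neutral illustration rather than a proposed TADA API. Everything named here is an assumption: the function `flag_replicate_pair`, its parameters, the "Pass"/"Flag" labels, and the choice to pick the tier from the pair mean. The detection-limit multiplier (the "XX" above) and the absolute difference limit are left as required user inputs because this issue does not fix their values.

```python
def flag_replicate_pair(
    routine: float,
    replicate: float,
    *,
    detection_limit: float,
    dl_multiplier: float,    # the unspecified "XX"; must be supplied by the user
    abs_limit: float,        # user-defined absolute difference limit
    rpd_limit: float = 20.0, # widely used default, user-adjustable
) -> tuple[str, str]:
    """Return (tier used, "Pass" or "Flag") for one routine/replicate pair.

    Assumes a positive detection limit and non-negative results.
    """
    diff = abs(routine - replicate)
    pair_mean = (routine + replicate) / 2

    # Assumption: the pair mean decides which tier applies
    if pair_mean > dl_multiplier * detection_limit:
        within = diff / pair_mean * 100 <= rpd_limit
        tier = "RPD"
    else:
        within = diff <= abs_limit
        tier = "Absolute difference"

    return tier, "Pass" if within else "Flag"


# Hypothetical settings and values, for illustration only
settings = dict(detection_limit=0.01, dl_multiplier=5, abs_limit=0.02)

results = [
    flag_replicate_pair(0.01, 0.02, **settings),  # near the detection limit
    flag_replicate_pair(1.00, 1.05, **settings),  # high value, close agreement
    flag_replicate_pair(1.00, 1.30, **settings),  # high value, poor agreement
]
print(results)
```

A flag here only marks a pair for review. Consistent with the caution above, it is not meant as a rule for discarding data.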

Additional context

What are replicate samples and how are they used in water analyses?

Replicate field samples are samples taken to assess the reproducibility of the sampling technique or analytical method. They are independently carried through all the steps of the sampling and measurement process in an identical manner to their associated routine field sample and used to measure the precision of the total sampling method.

Theoretically, the analysis of a replicate field sample should yield a result very similar to that of its associated routine field sample. If the results are not the same or acceptably similar, it could signal possible contamination or other issues in the sampling chain. However, water quality can vary at very small scales, so a field replicate can conflate analytical precision with small-scale variability. Field replicates tell you the potential for your method to yield the same results at a single time and place, to the extent that you are actually in exactly the same place, the few seconds (or any defined time window) from one sample to the next do not matter, and the water isn’t moving. Be careful about labeling data as imprecise or bad based on this alone.

See Issue Paper: https://usepa.sharepoint.com/:w:/r/sites/AutomatedDataAnalysisWorkingGroup/_layouts/15/Doc.aspx?sourcedoc=%7B12716121-CFA6-4845-88B0-F1C88070B29C%7D&file=IssuePaper_ReplicateSamples_July2023.docx&action=default&mobileredirect=true

And notes: https://usepa.sharepoint.com/:w:/r/sites/AutomatedDataAnalysisWorkingGroup/_layouts/15/Doc.aspx?sourcedoc=%7B4151F130-8C47-4A57-9D56-9A90D92FF74A%7D&file=TADAWorkingGroup_Jul2023.docx&action=default&mobileredirect=true

Reminders for TADA contributors addressing this issue

New features should include all of the following work:

  • Create the function/code.

  • Document all code using comments to describe what it does.

  • Create tests in tests folder.

  • Create help file using roxygen2 above code.

  • Create working examples in help file (via roxygen2).

  • Add to appropriate vignette (or create new one).

cristinamullin added the Module 1 and Future Improvement (Minimum viable function complete, issue includes potential future improvements) labels on Feb 12, 2024