227 tcpl data processing vignette updates accounting for updates to tcplfit2 #240

gracezhihuizhao · 2024-04-18T20:21:21Z

Working on updating the Data Processing vignette as part of the larger ticket #210

tcplfit2-related content updates:

- Added a description of the BMD bounds
- Added a statement about the default t-distribution for error
- Mentioned the poly2 biphasic updates

Plot and table updates:

- Added biphasic poly2 to Table 7 and Figure 2
- Updated Figure 3, Table 9, and Figure 4 with the updated mc_vignette object

… Added explanations of new poly2 function and default error function.

…nto 227-tcpl-data-processing-vignette-updates-accounting-for-updates-to-tcplfit2

madison-feshuk · 2024-05-20T19:45:08Z

@gracezhihuizhao I am getting the following error when knitting:

processing file: Data_processing.Rmd
|.............................. | 58% [unnamed-chunk-35] [unnamed-chunk-17]
Quitting from lines 725-866 [unnamed-chunk-35] (Data_processing.Rmd)

Error:
! object 'conc' not found
Backtrace:

mc3[, :=(logc, log10(conc))]
data.table:::[.data.table(mc3, , :=(logc, log10(conc)))
base::eval(jsub, SDenv, parent.frame())
base::eval(jsub, SDenv, parent.frame())
Execution halted
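
The failing call expects a `conc` column in `mc3`. As a minimal sketch, a guard like the one below would fail earlier and with a clearer message when that column is missing. It uses a base R data frame as a stand-in for the real data.table; the helper name `add_logc` and the toy values are hypothetical and not part of tcpl or the vignette.

```r
# Hypothetical helper: add a log10-concentration column, or stop early
# with a message that names the missing column.
add_logc <- function(dat) {
  if (!"conc" %in% names(dat)) {
    stop("column 'conc' not found; check that the example data are current",
         call. = FALSE)
  }
  dat$logc <- log10(dat$conc)
  dat
}

# Toy stand-in for mc3 (values are made up for illustration).
mc3_new <- data.frame(spid = c("a", "a"), conc = c(0.1, 10))
mc3_new <- add_logc(mc3_new)

# A data set without 'conc' triggers the descriptive error instead of
# failing inside the later transformation.
mc3_old <- data.frame(spid = "a", logc = -1)
res <- tryCatch(add_logc(mc3_old), error = function(e) conditionMessage(e))
```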

madison-feshuk · 2024-05-20T19:58:20Z

It looks like the mc3 data was not updated with the latest invitrodb data pull. "data-raw/db_cred.R" does not exist in the package... https://github.com/USEPA/CompTox-ToxCast-tcpl/blob/dev/data-raw/mc_vignette.R#L23

Consider removing that line and replacing it with: tcplConf(user = "_dataminer", pass = "pass", db = invitrodb, host = "ccte-mysql-res.epa.gov", drvr = "MySQL")
This connection can then be commented out after the new files are pulled. For consistency, I would prefer not to use the abbreviation "ivtdb".

madison-feshuk · 2024-05-20T20:20:55Z

mc_vignette.Rout

This .Rout file needs to be removed from the main tree.

madison-feshuk · 2024-05-20T20:24:45Z

vignettes/Data_processing.Rmd

@@ -955,7 +960,7 @@ htmlTable(output,

 ```

-Each of these models assumes the background response is zero and the absolute response (or initial response) is increasing.  Upon completion of the model fitting, each model gets a success designation: 1 if the model optimization converges, 0 if the optimization fails, and NA if 'nofit' was set to TRUE within <font face="CMTT10"> tcplfit2_core </font> function from <font face="CMTT10"> tcplFit2 </font>.  Similarly, if the Hessian matrix was successfully inverted then 1 is returned to indicate a successful covariance calculation (cov); otherwise 0 is returned.  Finally, in cases where 'nofit' was set to TRUE (within <font face="CMTT10"> tcplFit2::tcplfit2_core </font>) or the model fit failed the Akaike information criterion (aic), root mean squared error (rme), model estimated responses (modl), model parameters (parameters), and the standard deviation of model parameters (parameter sds) are set to NA.  A complete list of model output parameters is provided in Table 8 below.
+Most of these models assumes the background response is zero and the absolute response (or initial response) is increasing. In other words, most of these models are able to fit a monotonic curve for either direction. Polynomial 2 model is an exception because it has two parameterizations. By default, the biphasic parameterization will be used. Biphasic Polynomial 2 is able to fit curve to responses that are increasing first and then decreasing, and vice versa (assuming the background response is zero). In applications in which biphasic responses are not reasonable, polynomial 2 can be fitted using the monotonic only parameterization. Upon completion of the model fitting, each model gets a success designation: 1 if the model optimization converges, 0 if the optimization fails, and NA if 'nofit' was set to TRUE within <font face="CMTT10"> tcplfit2_core </font> function from <font face="CMTT10"> tcplFit2 </font>.  Similarly, if the Hessian matrix was successfully inverted then 1 is returned to indicate a successful covariance calculation (cov); otherwise 0 is returned.  Finally, in cases where 'nofit' was set to TRUE (within <font face="CMTT10"> tcplFit2::tcplfit2_core </font>) or the model fit failed the Akaike information criterion (aic), root mean squared error (rme), model estimated responses (modl), model parameters (parameters), and the standard deviation of model parameters (parameter sds) are set to NA.  A complete list of model output parameters is provided in Table 8 below.


Please see the updated text with minor edits:

Most of these models assume the background response is zero and the absolute response (or initial response) is increasing. In other words, most of these models are able to fit a monotonic curve in either direction. The polynomial 2 model is an exception because it has two parameterizations. By default, the biphasic parameterization will be used in tcpl. A biphasic polynomial 2 model is able to fit a curve to responses that increase first and then decrease, or vice versa (assuming the background response is zero). In applications in which biphasic responses are not reasonable, polynomial 2 can be fit using the monotonic-only parameterization.

Upon completion of the model fitting, each model gets a success designation: 1 if the model optimization converges, 0 if the optimization fails, and NA if 'nofit' was set to TRUE within the tcplfit2_core function from tcplfit2. Similarly, if the Hessian matrix was successfully inverted, then 1 is returned to indicate a successful covariance calculation (cov); otherwise 0 is returned. Finally, in cases where 'nofit' was set to TRUE (within tcplfit2::tcplfit2_core) or the model fit failed, the Akaike information criterion (aic), root mean squared error (rme), model estimated responses (modl), model parameters (parameters), and the standard deviation of model parameters (parameter sds) are set to NA. A complete list of model output parameters is provided in Table 8 below.
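
To illustrate the difference between the two parameterizations, here is a minimal base R sketch. The functional forms used, a(x/b + (x/b)^2) for the monotonic-only version and b1*x + b2*x^2 for the biphasic version, are assumptions based on my reading of tcplfit2 and should be checked against its documentation. The function names `poly2_mono` and `poly2_biphasic` and all parameter values are hypothetical.

```r
# Assumed monotonic-only form: a * (x/b + (x/b)^2), with b > 0.
poly2_mono <- function(x, a, b) a * (x / b + (x / b)^2)

# Assumed biphasic form: b1 * x + b2 * x^2. Opposite signs of b1 and b2
# let the curve rise and then fall (or fall and then rise).
poly2_biphasic <- function(x, b1, b2) b1 * x + b2 * x^2

# Made-up concentrations for illustration.
conc <- c(0.1, 1, 10, 50, 100)

# Increases over the whole concentration range.
mono <- poly2_mono(conc, a = 20, b = 50)

# Rises to a peak at x = -b1 / (2 * b2) = 50, then falls back toward zero.
biph <- poly2_biphasic(conc, b1 = 2, b2 = -0.02)

# Both forms equal zero at x = 0, matching the zero-background assumption.
```

With these illustrative values, the monotonic curve increases at every tested concentration, while the biphasic curve peaks at the fourth concentration and returns to zero at the highest one.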

madison-feshuk · 2024-05-20T20:30:07Z

vignettes/Data_processing.Rmd

@@ -1269,8 +1275,7 @@ tcplMthdAssign(

 As described previously, since the continuous hit call is the product of three proportional weights, and the resulting value is between 0 and 1.  The higher the hitcall (i.e. close to 1) the more plausible the concentration-response series indicates true biological activity in the measured response (i.e. 'active' hit).

-For each concentration series several point-of-departure (POD) estimates are calculated for the winning model.  The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) ($\mathit{bmd}$), (2) the activity concentration at $50\%$ of the maximal response ($\mathit{ac50}$), (3) the activity concentration at the efficacy cutoff ($\mathit{acc}$), (4) the activity concentration at $10\%$ of the maximal response, and (5) the concentration at $5\%$ of the maximal response.  Though there are several other potency estimates calculated as part of the level 5 pipeline these five are the major POD estimates. The POD estimates mentioned in here are summarized in Figure 4.
-
+For each concentration series several point-of-departure (POD) estimates are calculated for the winning model.  The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) ($\mathit{bmd}$), (2) the activity concentration at $50\%$ of the maximal response ($\mathit{ac50}$), (3) the activity concentration at the efficacy cutoff ($\mathit{acc}$), (4) the activity concentration at $10\%$ of the maximal response, and (5) the concentration at $5\%$ of the maximal response.  Though there are several other potency estimates calculated as part of the level 5 pipeline these five are the major POD estimates. The POD estimates mentioned in here are summarized in Figure 4. It is to note that the winning model can return a $\mathit{bmd}$ estimate that fails far out of the test concentration range, so bounds are placed to censor the estimate values. The lower and upper bounds for $\mathit{bmd}$ estimates are $0.1*\text{the lowest test concentration}$ and $10*\text{the the highest test concentration}$, respectively. If the calculated $\mathit{bmd}$ estimate is below or above the lower or the upper bounds, the value at the bound will be returned as the $\mathit{bmd}$ estimate instead.


Consider this updated text:

For each concentration series, several potency or point-of-departure (POD) estimates are calculated for the winning model. The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) ($\mathit{bmd}$), (2) the activity concentration at $50\%$ of the maximal response ($\mathit{ac50}$), (3) the activity concentration at the efficacy cutoff ($\mathit{acc}$), (4) the activity concentration at $10\%$ of the maximal response, and (5) the concentration at $5\%$ of the maximal response. Several additional potency and uncertainty estimates are included as part of level 5. The POD estimates mentioned here are summarized in Figure 4. The winning model may return a $\mathit{bmd}$ estimate that falls outside of the tested concentration range, so bounds are placed to censor the estimate values. The lower and upper bounds for $\mathit{bmd}$ estimates are $0.1 \times \text{the lowest test concentration}$ and $10 \times \text{the highest test concentration}$, respectively. If the calculated $\mathit{bmd}$ estimate is below the lower bound or above the upper bound, the value at that bound will be returned as the bounded $\mathit{bmd}$ estimate instead.
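
The bounding rule described here can be sketched in a few lines of base R. This is only an illustration of the censoring logic as stated in the text, not the tcplfit2 implementation; the helper name `bound_bmd` and the concentrations are hypothetical.

```r
# Hypothetical helper: censor a bmd estimate to the interval
# [0.1 * lowest tested concentration, 10 * highest tested concentration].
bound_bmd <- function(bmd, conc) {
  lower <- 0.1 * min(conc)
  upper <- 10 * max(conc)
  pmin(pmax(bmd, lower), upper)
}

# Made-up test concentrations for illustration.
conc <- c(0.03, 0.1, 0.3, 1, 3, 10, 30, 100)

low  <- bound_bmd(0.0001, conc)  # below the lower bound, returns 0.003
mid  <- bound_bmd(5, conc)       # within the bounds, returned unchanged
high <- bound_bmd(5000, conc)    # above the upper bound, returns 1000
```

Using `pmin` and `pmax` keeps the helper vectorized, so it can be applied to a whole column of estimates that share the same concentration series.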


madison-feshuk

This looks great! Additional data processing vignette updates will be made as described in this ticket: #210

gracezhihuizhao added 3 commits April 12, 2024 15:47

Added description and plot of biphasic poly2 to table 7 and figure 2;…

31b8129

… Added explanations of new poly2 function and default error function.

Merge branch 'dev' of https://github.com/USEPA/CompTox-ToxCast-tcpl i…

261b7ef

…nto 227-tcpl-data-processing-vignette-updates-accounting-for-updates-to-tcplfit2

Updated the plottings with updated mc_vignette object.

33bfe6f

gracezhihuizhao requested review from kpaulfriedman, brown-jason, sedavid01 and madison-feshuk April 18, 2024 20:21

gracezhihuizhao linked an issue Apr 18, 2024 that may be closed by this pull request

tcpl Data Processing Vignette Updates Accounting for Updates to tcplfit2 #227

Open

11 tasks

Re-created mc_vignette example data sets.

b2d3c24

madison-feshuk reviewed May 20, 2024


gracezhihuizhao added 2 commits May 21, 2024 13:28

remove this file from the main tree

6c951dc

made some textual edits to the paragraphs describing biphasic poly2 a…

d812443

…nd bmd bounds

madison-feshuk approved these changes May 21, 2024
