227 tcpl data processing vignette updates accounting for updates to tcplfit2 #240

Open
wants to merge 6 commits into
base: dev

Conversation

gracezhihuizhao

Working on updating the Data Processing vignette as part of the larger ticket #210

tcplfit2 related content updates:

  • Added a description of the BMD bounds
  • Added a statement about the default t distribution for error
  • Mentioned the poly2 biphasic updates

Plots and table updates:

  • Added biphasic poly2 to Table 7 and Figure 2
  • Updated Figure 3, Table 9, and Figure 4 with updated mc_vignette object.

… Added explanations of new poly2 function and default error function.
…nto 227-tcpl-data-processing-vignette-updates-accounting-for-updates-to-tcplfit2
@madison-feshuk
Collaborator

@gracezhihuizhao getting the following error when knitting:

processing file: Data_processing.Rmd
|.............................. | 58% [unnamed-chunk-35] [unnamed-chunk-17]
Quitting from lines 725-866 [unnamed-chunk-35] (Data_processing.Rmd)

Error:
! object 'conc' not found
Backtrace:

  1. mc3[, `:=`(logc, log10(conc))]
  2. data.table:::`[.data.table`(mc3, , `:=`(logc, log10(conc)))
  3. base::eval(jsub, SDenv, parent.frame())
  4. base::eval(jsub, SDenv, parent.frame())

Execution halted

@madison-feshuk
Collaborator

madison-feshuk commented May 20, 2024

Looks like the mc3 data was not updated with the latest invitrodb data pull. "data-raw/db_cred.R" does not exist in the package: https://github.com/USEPA/CompTox-ToxCast-tcpl/blob/dev/data-raw/mc_vignette.R#L23

Consider removing that line and replacing it with: tcplConf(user = "_dataminer", pass = "pass", db = "invitrodb", host = "ccte-mysql-res.epa.gov", drvr = "MySQL")
This connection can then be commented out after the new files are pulled. For continuity, I would avoid the acronym "ivtdb".

mc_vignette.Rout Outdated
Collaborator

This Rout file needs to be removed from the main tree.

@@ -955,7 +960,7 @@ htmlTable(output,

```

-Each of these models assumes the background response is zero and the absolute response (or initial response) is increasing. Upon completion of the model fitting, each model gets a success designation: 1 if the model optimization converges, 0 if the optimization fails, and NA if 'nofit' was set to TRUE within <font face="CMTT10"> tcplfit2_core </font> function from <font face="CMTT10"> tcplFit2 </font>. Similarly, if the Hessian matrix was successfully inverted then 1 is returned to indicate a successful covariance calculation (cov); otherwise 0 is returned. Finally, in cases where 'nofit' was set to TRUE (within <font face="CMTT10"> tcplFit2::tcplfit2_core </font>) or the model fit failed the Akaike information criterion (aic), root mean squared error (rme), model estimated responses (modl), model parameters (parameters), and the standard deviation of model parameters (parameter sds) are set to NA. A complete list of model output parameters is provided in Table 8 below.
+Most of these models assumes the background response is zero and the absolute response (or initial response) is increasing. In other words, most of these models are able to fit a monotonic curve for either direction. Polynomial 2 model is an exception because it has two parameterizations. By default, the biphasic parameterization will be used. Biphasic Polynomial 2 is able to fit curve to responses that are increasing first and then decreasing, and vice versa (assuming the background response is zero). In applications in which biphasic responses are not reasonable, polynomial 2 can be fitted using the monotonic only parameterization. Upon completion of the model fitting, each model gets a success designation: 1 if the model optimization converges, 0 if the optimization fails, and NA if 'nofit' was set to TRUE within <font face="CMTT10"> tcplfit2_core </font> function from <font face="CMTT10"> tcplFit2 </font>. Similarly, if the Hessian matrix was successfully inverted then 1 is returned to indicate a successful covariance calculation (cov); otherwise 0 is returned. Finally, in cases where 'nofit' was set to TRUE (within <font face="CMTT10"> tcplFit2::tcplfit2_core </font>) or the model fit failed the Akaike information criterion (aic), root mean squared error (rme), model estimated responses (modl), model parameters (parameters), and the standard deviation of model parameters (parameter sds) are set to NA. A complete list of model output parameters is provided in Table 8 below.
Collaborator

Please see updated text with minor edits: Most of these models assume the background response is zero and the absolute response (or initial response) is increasing. In other words, most of these models are able to fit a monotonic curve in either direction. The polynomial 2 model is an exception because it has two parameterizations. By default, the biphasic parameterization will be used in tcpl. A biphasic polynomial 2 model is able to fit a curve to responses that increase first and then decrease, or vice versa (assuming the background response is zero). In applications in which biphasic responses are not reasonable, polynomial 2 can be fit using the monotonic-only parameterization.

Upon completion of the model fitting, each model gets a success designation: 1 if the model optimization converges, 0 if the optimization fails, and NA if 'nofit' was set to TRUE within the tcplfit2_core function from tcplfit2. Similarly, if the Hessian matrix was successfully inverted, then 1 is returned to indicate a successful covariance calculation (cov); otherwise 0 is returned. Finally, in cases where 'nofit' was set to TRUE (within tcplfit2::tcplfit2_core) or the model fit failed, the Akaike information criterion (aic), root mean squared error (rme), model estimated responses (modl), model parameters (parameters), and the standard deviation of model parameters (parameter sds) are set to NA. A complete list of model output parameters is provided in Table 8 below.
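
To make the monotonic versus biphasic distinction concrete, here is a minimal base R sketch. The two functional forms, the function names, and the parameter values are assumptions chosen for illustration; they are not taken from this PR and should be checked against the tcplfit2 documentation before being reused in the vignette.

```r
# Illustrative only: assumed functional forms, not the tcplfit2 API.
# Monotonic-only form (assumed): a * (x/b + (x/b)^2), with a and b positive.
poly2_monotonic <- function(x, a, b) a * (x / b + (x / b)^2)

# Biphasic form (assumed): b1 * x + b2 * x^2, with b1 and b2 free in sign.
poly2_biphasic <- function(x, b1, b2) b1 * x + b2 * x^2

conc <- c(0.1, 1, 10, 30, 100)  # hypothetical test concentrations

mono <- poly2_monotonic(conc, a = 2, b = 50)
biph <- poly2_biphasic(conc, b1 = 1, b2 = -0.01)

# The monotonic curve only increases over the tested range.
mono_is_increasing <- all(diff(mono) > 0)

# The biphasic curve rises and then falls (it peaks at conc = 50 here).
biph_changes_direction <- any(diff(biph) > 0) && any(diff(biph) < 0)
```

Both forms return zero at a concentration of zero, which matches the zero-background assumption stated above.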

@@ -1269,8 +1275,7 @@ tcplMthdAssign(

As described previously, since the continuous hit call is the product of three proportional weights, and the resulting value is between 0 and 1. The higher the hitcall (i.e. close to 1) the more plausible the concentration-response series indicates true biological activity in the measured response (i.e. 'active' hit).

-For each concentration series several point-of-departure (POD) estimates are calculated for the winning model. The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) ($\mathit{bmd}$), (2) the activity concentration at $50\%$ of the maximal response ($\mathit{ac50}$), (3) the activity concentration at the efficacy cutoff ($\mathit{acc}$), (4) the activity concentration at $10\%$ of the maximal response, and (5) the concentration at $5\%$ of the maximal response. Though there are several other potency estimates calculated as part of the level 5 pipeline these five are the major POD estimates. The POD estimates mentioned in here are summarized in Figure 4.

+For each concentration series several point-of-departure (POD) estimates are calculated for the winning model. The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) ($\mathit{bmd}$), (2) the activity concentration at $50\%$ of the maximal response ($\mathit{ac50}$), (3) the activity concentration at the efficacy cutoff ($\mathit{acc}$), (4) the activity concentration at $10\%$ of the maximal response, and (5) the concentration at $5\%$ of the maximal response. Though there are several other potency estimates calculated as part of the level 5 pipeline these five are the major POD estimates. The POD estimates mentioned in here are summarized in Figure 4. It is to note that the winning model can return a $\mathit{bmd}$ estimate that fails far out of the test concentration range, so bounds are placed to censor the estimate values. The lower and upper bounds for $\mathit{bmd}$ estimates are $0.1*\text{the lowest test concentration}$ and $10*\text{the the highest test concentration}$, respectively. If the calculated $\mathit{bmd}$ estimate is below or above the lower or the upper bounds, the value at the bound will be returned as the $\mathit{bmd}$ estimate instead.
Collaborator

Consider updated text: For each concentration series, several potency or point-of-departure (POD) estimates are calculated for the winning model. The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) ($\mathit{bmd}$), (2) the activity concentration at $50\%$ of the maximal response ($\mathit{ac50}$), (3) the activity concentration at the efficacy cutoff ($\mathit{acc}$), (4) the activity concentration at $10\%$ of the maximal response, and (5) the concentration at $5\%$ of the maximal response. Several additional potency and uncertainty estimates are included as part of level 5. The POD estimates mentioned here are summarized in Figure 4. The winning model may return a $\mathit{bmd}$ estimate that falls outside of the tested concentration range, so bounds are placed to censor the estimate values. The lower and upper bounds for $\mathit{bmd}$ estimates are $0.1*\text{the lowest test concentration}$ and $10*\text{the highest test concentration}$, respectively. If the calculated $\mathit{bmd}$ estimate is below the lower bound or above the upper bound, the value at the bound will be returned as the bounded $\mathit{bmd}$ estimate instead.
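
The bounding rule described above can be sketched in a few lines of base R. This is a hedged illustration of the stated rule only: the helper name `bound_bmd`, its arguments, and the example concentrations are hypothetical and are not the tcplfit2 implementation.

```r
# Hypothetical helper illustrating the censoring rule described above.
# It is not the tcplfit2 implementation.
bound_bmd <- function(bmd, conc, lower_factor = 0.1, upper_factor = 10) {
  if (is.na(bmd)) return(NA_real_)        # nothing to bound
  lower <- lower_factor * min(conc)       # 0.1 * lowest test concentration
  upper <- upper_factor * max(conc)       # 10 * highest test concentration
  min(max(bmd, lower), upper)             # clamp the estimate to [lower, upper]
}

conc <- c(0.03, 0.1, 1, 10, 100)          # hypothetical test concentrations

bmd_below  <- bound_bmd(0.0001, conc)     # below the lower bound, returns the lower bound
bmd_inside <- bound_bmd(5, conc)          # inside the bounds, returned unchanged
bmd_above  <- bound_bmd(5000, conc)       # above the upper bound, returns the upper bound
```

With these example concentrations the bounds are 0.003 and 1000, so only estimates outside that interval are replaced by the value at the bound.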

Collaborator

@madison-feshuk madison-feshuk left a comment

This looks great! Additional data processing vignette updates will be made as described in this ticket: #210

Development

Successfully merging this pull request may close these issues.

tcpl Data Processing Vignette Updates Accounting for Updates to tcplfit2
2 participants