227 tcpl data processing vignette updates accounting for updates to tcplfit2 #240
base: dev
… Added explanations of new poly2 function and default error function.
…nto 227-tcpl-data-processing-vignette-updates-accounting-for-updates-to-tcplfit2
@gracezhihuizhao getting the following error when knitting: processing file: Data_processing.Rmd Error:
Looks like the mc3 data was not updated with the latest invitrodb data pull. "data-raw/db_cred.R" does not exist in the package... https://github.com/USEPA/CompTox-ToxCast-tcpl/blob/dev/data-raw/mc_vignette.R#L23 Consider removing that line and updating it to: tcplConf(user = "_dataminer", pass = "pass", db = invitrodb, host = "ccte-mysql-res.epa.gov", drvr = "MySQL")
mc_vignette.Rout
This Rout needs to be removed from main tree
vignettes/Data_processing.Rmd
@@ -955,7 +960,7 @@ htmlTable(output,
- Each of these models assumes the background response is zero and the absolute response (or initial response) is increasing. Upon completion of the model fitting, each model gets a success designation: 1 if the model optimization converges, 0 if the optimization fails, and NA if 'nofit' was set to TRUE within <font face="CMTT10"> tcplfit2_core </font> function from <font face="CMTT10"> tcplFit2 </font>. Similarly, if the Hessian matrix was successfully inverted then 1 is returned to indicate a successful covariance calculation (cov); otherwise 0 is returned. Finally, in cases where 'nofit' was set to TRUE (within <font face="CMTT10"> tcplFit2::tcplfit2_core </font>) or the model fit failed the Akaike information criterion (aic), root mean squared error (rme), model estimated responses (modl), model parameters (parameters), and the standard deviation of model parameters (parameter sds) are set to NA. A complete list of model output parameters is provided in Table 8 below.
+ Most of these models assumes the background response is zero and the absolute response (or initial response) is increasing. In other words, most of these models are able to fit a monotonic curve for either direction. Polynomial 2 model is an exception because it has two parameterizations. By default, the biphasic parameterization will be used. Biphasic Polynomial 2 is able to fit curve to responses that are increasing first and then decreasing, and vice versa (assuming the background response is zero). In applications in which biphasic responses are not reasonable, polynomial 2 can be fitted using the monotonic only parameterization. Upon completion of the model fitting, each model gets a success designation: 1 if the model optimization converges, 0 if the optimization fails, and NA if 'nofit' was set to TRUE within <font face="CMTT10"> tcplfit2_core </font> function from <font face="CMTT10"> tcplFit2 </font>. Similarly, if the Hessian matrix was successfully inverted then 1 is returned to indicate a successful covariance calculation (cov); otherwise 0 is returned. Finally, in cases where 'nofit' was set to TRUE (within <font face="CMTT10"> tcplFit2::tcplfit2_core </font>) or the model fit failed the Akaike information criterion (aic), root mean squared error (rme), model estimated responses (modl), model parameters (parameters), and the standard deviation of model parameters (parameter sds) are set to NA. A complete list of model output parameters is provided in Table 8 below.
Please see updated text with minor edits:
Most of these models assume the background response is zero and the absolute response (or initial response) is increasing. In other words, most of these models are able to fit a monotonic curve in either direction. The polynomial 2 model is an exception because it has two parameterizations. By default, the biphasic parameterization will be used in tcpl. A biphasic polynomial 2 model is able to fit a curve to responses that first increase and then decrease, or vice versa (assuming the background response is zero). In applications in which biphasic responses are not reasonable, polynomial 2 can be fit using the monotonic-only parameterization.
Upon completion of the model fitting, each model gets a success designation: 1 if the model optimization converges, 0 if the optimization fails, and NA if 'nofit' was set to TRUE within the tcplfit2_core function from tcplFit2. Similarly, if the Hessian matrix was successfully inverted then 1 is returned to indicate a successful covariance calculation (cov); otherwise 0 is returned. Finally, in cases where 'nofit' was set to TRUE (within tcplFit2::tcplfit2_core) or the model fit failed, the Akaike information criterion (aic), root mean squared error (rme), model estimated responses (modl), model parameters (parameters), and the standard deviation of model parameters (parameter sds) are set to NA. A complete list of model output parameters is provided in Table 8 below.
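To make the distinction between the two polynomial 2 parameterizations concrete, here is a minimal base R sketch. The helper names (`poly2_monotonic`, `poly2_biphasic`, `is_monotonic`) and the parameter values are hypothetical and chosen only for illustration. The functional forms, a*(x/b + (x/b)^2) for the monotonic case and b1*x + b2*x^2 for the biphasic case, are assumed here and should be checked against the tcplfit2 documentation before being quoted in the vignette.

```r
# Assumed functional forms (illustrative only, not taken from this PR):
#   monotonic: f(x) = a * (x / b + (x / b)^2)
#   biphasic:  f(x) = b1 * x + b2 * x^2
poly2_monotonic <- function(x, a, b) a * (x / b + (x / b)^2)
poly2_biphasic  <- function(x, b1, b2) b1 * x + b2 * x^2

# Hypothetical concentration series and parameter values
conc <- c(0.1, 1, 10, 100)
mono <- poly2_monotonic(conc, a = 1, b = 50)
biph <- poly2_biphasic(conc, b1 = 1, b2 = -0.01)

# TRUE when the responses never change direction
is_monotonic <- function(y) all(diff(y) >= 0) || all(diff(y) <= 0)

print(data.frame(conc = conc, monotonic = mono, biphasic = biph))
print(is_monotonic(mono))
print(is_monotonic(biph))
```

With these illustrative values the monotonic form only increases over the tested range, while the biphasic form rises and then falls back toward zero at the highest concentration. Both forms return zero at zero concentration, consistent with the zero-background assumption.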
vignettes/Data_processing.Rmd
@@ -1269,8 +1275,7 @@ tcplMthdAssign(
  As described previously, since the continuous hit call is the product of three proportional weights, and the resulting value is between 0 and 1. The higher the hitcall (i.e. close to 1) the more plausible the concentration-response series indicates true biological activity in the measured response (i.e. 'active' hit).
- For each concentration series several point-of-departure (POD) estimates are calculated for the winning model. The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) ($\mathit{bmd}$), (2) the activity concentration at $50\%$ of the maximal response ($\mathit{ac50}$), (3) the activity concentration at the efficacy cutoff ($\mathit{acc}$), (4) the activity concentration at $10\%$ of the maximal response, and (5) the concentration at $5\%$ of the maximal response. Though there are several other potency estimates calculated as part of the level 5 pipeline these five are the major POD estimates. The POD estimates mentioned in here are summarized in Figure 4.
+ For each concentration series several point-of-departure (POD) estimates are calculated for the winning model. The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) ($\mathit{bmd}$), (2) the activity concentration at $50\%$ of the maximal response ($\mathit{ac50}$), (3) the activity concentration at the efficacy cutoff ($\mathit{acc}$), (4) the activity concentration at $10\%$ of the maximal response, and (5) the concentration at $5\%$ of the maximal response. Though there are several other potency estimates calculated as part of the level 5 pipeline these five are the major POD estimates. The POD estimates mentioned in here are summarized in Figure 4. It is to note that the winning model can return a $\mathit{bmd}$ estimate that fails far out of the test concentration range, so bounds are placed to censor the estimate values. The lower and upper bounds for $\mathit{bmd}$ estimates are $0.1*\text{the lowest test concentration}$ and $10*\text{the the highest test concentration}$, respectively. If the calculated $\mathit{bmd}$ estimate is below or above the lower or the upper bounds, the value at the bound will be returned as the $\mathit{bmd}$ estimate instead.
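The bmd censoring rule in the added text can be illustrated with a short base R sketch. `censor_bmd` is a hypothetical helper written only for this illustration and is not a tcpl or tcplfit2 function. The 0.1 and 10 multipliers are the ones stated in the added text, and the concentration series is made up.

```r
# Hypothetical helper: clamp a bmd estimate to
# [0.1 * lowest tested concentration, 10 * highest tested concentration].
censor_bmd <- function(bmd, conc) {
  lower <- 0.1 * min(conc)
  upper <- 10 * max(conc)
  min(max(bmd, lower), upper)
}

# Made-up concentration series
conc <- c(0.03, 0.1, 0.3, 1, 3, 10, 30, 100)

below  <- censor_bmd(1e-5, conc)  # far below the tested range: lower bound returned
inside <- censor_bmd(5, conc)     # within the bounds: returned unchanged
above  <- censor_bmd(5000, conc)  # far above the tested range: upper bound returned

print(c(below = below, inside = inside, above = above))
```

An estimate inside the bounds passes through unchanged, so the censoring only affects estimates extrapolated well beyond the tested concentrations.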
Consider updated text: For each concentration series, several potency or point-of-departure (POD) estimates are calculated on the winning model. The major estimates include: (1) the activity concentration at the specified benchmark response (BMR) (
This looks great! Additional data processing vignette updates will be made as described in this ticket: #210
Working on updating the Data Processing vignette as part of the larger ticket #210
tcplfit2 related content updates:
Plots and table updates: