Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added options immedate & replace to eng_SQL #2128

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

AWKruijt
Copy link

@AWKruijt AWKruijt commented May 18, 2022

As discussed here: fixes rstudio/rmarkdown#2241

I haven't added test files as these changes require access to an sql server with permission to create temporay tables. Haven't found that in an open database.

I'm excited about this! :)

Anne-Wil added 3 commits May 17, 2022 14:50
add initial attempt at SQL chunk option immediate
Improved handling of options$immediate and additional options$replace - these edits work in a knitr::knit_engines$set( ) style test. 

Immediate = T passes param immediate = T to dbExecute() / dbGetQuery(). Immedate is not implemented for query's with options$max.print set. 
I can't be sure if options$immediate also works icw options$params - I can't get params to run at all on my server. Passing values from local environment using ?value works however, also icw the immediate option. 

Replace = T adds code to drop an existing temporary table before running the into # statement. I thought this might be handy but leave it to others to decide on whether this is truly needed (I also don't really know if this implementation works on other servers than the one I use).
I've also included an error message to catch attempted replacement of non-temporary tables but error handling doesn't work yet as expected - when attempting to replace a table with a name not starting with #, the generic server error message appears: ([SQL Server]CREATE TABLE permission denied in database 'DWH'. ) - perhaps someone else will now how to fix, though perhaps we'll not want to implement the replace option at all. 

Example code: 
```{r qvalue}
StadiumCode <- '0A'
```


```{sql, immediate = T, replace = T}
SELECT  * 
into #tum
FROM statistiek.vw_dimStadium
where StadiumCode = ?StadiumCode

```

```{sql fetch}
SELECT   *
FROM #tum
```
Fixed error handling on temporary table check. What a difference a space makes...
@AWKruijt
Copy link
Author

AWKruijt commented May 18, 2022

reptable = gsub('(^.into[[:space:]]+)(#.+)([[:space:]]+from.$)', '\\2', sql, ignore.case = T)
on line 618 fixes the error handling on attempts to replace non-temporary tables.

@CLAassistant
Copy link

CLAassistant commented May 18, 2022

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Anne-Wil seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@cderv cderv added next Issues/PRs considered for the next release feature Feature requests labels May 18, 2022
@AWKruijt
Copy link
Author

AWKruijt commented May 18, 2022

Just checking (I'm really inexperienced >.<): I'm seeing this super inviting 'Update branch' button (with alternate update with rebase) - do I push that or do I just wait now? :)

@cderv
Copy link
Collaborator

cderv commented May 18, 2022

You can push that to update but that is not necessary. We'll do it ourself, or just merge the branch. Your change target in code part that has not been modified in some time, and tests are passing ok. So you can leave that way, and just wait for us to review. I added that in my TODO list. Thanks !

Copy link
Owner

@yihui yihui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know much about SQL, so I only reviewed the coding style. BTW, the changes with pure white spaces in this PR need to be discarded (if you don't know how, we can handle this before we merge).

Thanks!

R/engine.R Outdated
Comment on lines 612 to 617
immediate= F
if(isTRUE(options$immediate)) {
immediate= T
replace= F
if(isTRUE(options$replace)) {
replace= T
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
immediate= F
if(isTRUE(options$immediate)) {
immediate= T
replace= F
if(isTRUE(options$replace)) {
replace= T
immediate = isTRUE(options$immediate)
replace = isTRUE(options$replace)
if (immediate && replace) {

R/engine.R Show resolved Hide resolved
Copy link
Collaborator

@cderv cderv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

I made a few comments to share a few thoughts.
Currently, with this change :

  • We are passing immediate no matter what
  • We are setting the default value to FALSE whereas it seems to be NULL in DBI which has a different meaning.

I don't really know which backend that would impact or not, but I think it would be safer to prevent this somehow using more conditional logic. I can help tweak the PR if you do not know how to do it.

Thanks

R/engine.R Outdated
Comment on lines 613 to 617
if(isTRUE(options$immediate)) {
immediate= T
replace= F
if(isTRUE(options$replace)) {
replace= T
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would use sql. prefix on those new options, to go with existing sql.print and sql.show_interpolated.

@yihui I don't know if we can end up with a rule on under which condition which should use engine.opts, bare chunk option name or prefixed option name. I think we should try being consistent. sql.show_interpolated is not really documented so we could also support show_interpolated to be consistent.

Probably than the chunk engine is enough to scope the variable (advanced usage of chunk options hooks would require checking the engine, and then checking the option in case of same option name used by two different engine with different meaning - may never happen though).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's use the sql. prefix for the new options here.

I don't know if we can end up with a rule on under which condition which should use engine.opts, bare chunk option name or prefixed option name. I think we should try being consistent.

I agree. It has been a mess, which I don't like, but I feel it's tough to decide which is better:

```{sql, engine.opts = list(immediate = TRUE, replace = TRUE)}
```{sql, sql.immediate = TRUE, sql.replace = TRUE}

chunk engine is enough to scope the variable (advanced usage of chunk options hooks would require checking the engine, and then checking the option in case of same option name used by two different engine with different meaning - may never happen though).

That's also what I think.

data = tryCatch({
if (is_sql_update_query(query)) {
DBI::dbExecute(conn, query)
DBI::dbExecute(conn, query, immediate = immediate)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to take precaution here.
DBI::dbExecute() is a generic and immediate argument, is indeed among the specifications. However, immediate is not part of the generic argument to improve compatibility with the different backends.
https://dbi.r-dbi.org/reference/dbexecute#additional-arguments-1

The following arguments are not part of the dbExecute() generic (to improve compatibility across backends) but are part of the DBI specification:

  • params (default: NULL)
  • immediate (default: NULL)

They must be provided as named arguments. See the "Specification" sections for details on their usage

This is valid for other calls too.

data = tryCatch({
if (is_sql_update_query(query)) {
DBI::dbExecute(conn, query)
DBI::dbExecute(conn, query, immediate = immediate)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to take precaution here.
DBI::dbExecute() is a generic and immediate argument, is indeed among the specifications. However, immediate is not part of the generic argument to improve compatibility with the different backends.
https://dbi.r-dbi.org/reference/dbexecute#additional-arguments-1

The following arguments are not part of the dbExecute() generic (to improve compatibility across backends) but are part of the DBI specification:

  • params (default: NULL)
  • immediate (default: NULL)

They must be provided as named arguments. See the "Specification" sections for details on their usage

This is valid for other calls too.

R/engine.R Outdated
@@ -608,13 +609,26 @@ eng_sql = function(options) {
max.print = -1
sql = one_string(options$code)
params = options$params

immediate= F
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DBI seems to default to NULL for dbExecute and dbGetQuery, which means
https://dbi.r-dbi.org/reference/dbexecute#specification-for-the-immediate-argument-1

The default NULL means that the backend should choose whatever API makes the most sense for the database

So setting FALSE, will change behavior for all the DB that backends that will use TRUE by default.

Maybe we should stick to that, and pass immediate = TRUE or FALSE when the chunk option is explicitly set ?

R/engine.R Outdated
@@ -608,13 +609,26 @@ eng_sql = function(options) {
max.print = -1
sql = one_string(options$code)
params = options$params

immediate= F
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DBI seems to default to NULL for dbExecute and dbGetQuery, which means
https://dbi.r-dbi.org/reference/dbexecute#specification-for-the-immediate-argument-1

The default NULL means that the backend should choose whatever API makes the most sense for the database

So setting FALSE, will change behavior for all the DB that backends that will use TRUE by default.

Maybe we should stick to that, and pass immediate = TRUE or FALSE when the chunk option is explicitly set ?

- added prefixes sql. to immediate and replace options
- set NULL as default for sql.immediate
- handling of user input FALSE as well as TRUE for sql.immediate
- added stop when attempting to replace (sql.replace == T) when not executing immediately (sql.immedate == F) 
- removed all-whitespaced lines as well as I could 

To be discussed: implement non-passing of parameter immediate when options$sql.immediate is not set.
@AWKruijt
Copy link
Author

Sorry for the radio silence - things where happening. Thanks for all the input! I've implemented as much as I could - see commit 358d552.

A big one that is open for debate still is whether/how to make it so that paramater immediate is NOT passed to dbGetQuery when the user did not set it. I suppose that's what we want, given that it's cleanest if we also don't pass the NULL value if sql.immediate is not actively invoked. For option params the code has a split based on length(params) == 0 and separately defined calls to DBI::dbGetQuery() with and without passing of params. We could expand that into four calls (with/without immediate and/or params), but perhaps something along the lines of a do.call with a list of arguments could be preferable? I did try every trick I could think of along the lines of DBI::dbGetQuery(conn, sql, if(exists('immedate'){immedate = immediate}) but no joy :/

@cderv cderv self-assigned this Jun 15, 2022
@cderv cderv self-requested a review June 15, 2022 14:38
Fix regex for lifting temporary table name to replace when sql.replace =T + a correction to the getQuery call with params where  `sql` was used instead of `query`.
@AWKruijt
Copy link
Author

Alas, I notice that in practice the replace option doesn't always work flawlessly due to the regex to lift the tempary table name from the query also picking up new lines (/n) immediately after the #table name. Fixed the regex and made another correction to the getQuery call with params where I had made it use sql instead of query.

Copy link
Collaborator

@cderv cderv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand why we need the exist() for retrieving the option. I commented about that.

We could expand that into four calls (with/without immediate and/or params), but perhaps something along the lines of a do.call with a list of arguments could be preferable? I did try every trick I could think of along the lines of DBI::dbGetQuery(conn, sql, if(exists('immedate'){immedate = immediate}) but no joy :/

After looking closer to DBI, I think the most important thing is not changing the default. So that

dbExecute(con, immediate = options$sql.immediate)

correctly pass NULL value. I don't think that methods that don't have immediate will break because of the ... - it will just be unused argument in this. So we may be fine passing it anyway.

Otherwise, the do.call alternative seems like a good one, with a list of argument containing immediate or not depending on the option. Did you try that already ?

Also, did you come up with test document for this PR ? Just curious to avoid making a new one.

Thanks

R/engine.R Outdated
Comment on lines 613 to 617
if(isTRUE(options$immediate)) {
immediate= T
replace= F
if(isTRUE(options$replace)) {
replace= T
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would use sql. prefix on those new options, to go with existing sql.print and sql.show_interpolated.

@yihui I don't know if we can end up with a rule on under which condition which should use engine.opts, bare chunk option name or prefixed option name. I think we should try being consistent. sql.show_interpolated is not really documented so we could also support show_interpolated to be consistent.

Probably than the chunk engine is enough to scope the variable (advanced usage of chunk options hooks would require checking the engine, and then checking the option in case of same option name used by two different engine with different meaning - may never happen though).

Comment on lines +611 to +612
immediate = NULL
if(exists("sql.immediate", where = options) ) { immediate = options$sql.immediate }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need exist here ?

Suggested change
immediate = NULL
if(exists("sql.immediate", where = options) ) { immediate = options$sql.immediate }
immediate = options$sql.immediate

would be enough, wouldn't it ? If sql.immediate is not set then it will be NULL

options = list()
options$sql.immediate
#> NULL

Did I missed something ?

@@ -608,13 +608,26 @@ eng_sql = function(options) {
max.print = -1
sql = one_string(options$code)
params = options$params
immediate = NULL
if(exists("sql.immediate", where = options) ) { immediate = options$sql.immediate }
if(exists('sql.replace', where = options) ) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. This is not a usual pattern. I am not sure I see why we can't retrieve the options and then check ?

immediate = options$sql.immediate
replace = options$sql.replace

if (isTRUE(replace) && isTRUE(immediate)) {
 do_somethings
}
... 

What did I miss ?

immediate = NULL
if(exists("sql.immediate", where = options) ) { immediate = options$sql.immediate }
if(exists('sql.replace', where = options) ) {
if(!isTRUE(immediate)) knitr:::stop2("To replace a temprary table, option sql.immediate has to be set to TRUE).")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(!isTRUE(immediate)) knitr:::stop2("To replace a temprary table, option sql.immediate has to be set to TRUE).")
if(!isTRUE(immediate)) knitr:::stop2("To replace a temporary table, option sql.immediate has to be set to TRUE).")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Feature requests next Issues/PRs considered for the next release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FR] Run SQL chunk with DBI option immediate = TRUE ?
4 participants