Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'rows affected' output when using sql engine #2051

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

edalfon
Copy link

@edalfon edalfon commented Sep 25, 2021

Adds a default output for sql chunks where is_sql_update_query evaluates to TRUE, as proposed in #2050.

It does it simply by:

  • Commit 1: propagating DBI::DBI::dbExecute's result to data, so that users can set output.var and get the number of affected rows assigned to a variable.
  • Commit 2: adding a default output, indicating the number of rows affected.

I included a line in NEWS.md describing this. However, I did not include a test. I was not familiar with knitr's testing procedures, but I learned from this recent PR (#1987) that there is one major test for knitr's sql engine in another repository (https://raw.githubusercontent.com/yihui/knitr-examples/master/115-engine-sql.Rmd). This already covers the changes in this PR. I used it to test the changes and everything works. So I didn't really think it was necessary to include additional tests.

A final note: as discussed in knitr's issue #1637, I think a similar change would be needed on RStudio's side, to make it also work when running chunks interactively (RStudio's Run current chunk, Run all, etc.). I haven't looked into that, though, and I have no idea who to poke for this, but I'd be happy to take a look.

Huge thanks for knitr and hope this helps a tiny bit.

also return data ("scalar numeric that specifies the number of rows affected") in case of an is_sql_update_query, and not just SELECT-like queries
and print by default a message with number of rows affected
@CLAassistant
Copy link

CLAassistant commented Sep 25, 2021

CLA assistant check
All committers have signed the CLA.

@cderv cderv linked an issue Sep 27, 2021 that may be closed by this pull request
3 tasks
Copy link
Collaborator

@cderv cderv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a quick look. This looks fine overall but my main concern is about knitr writing text output. It is not the same to write a resulting table than writing text including a data result. I believe we must not presuppose what a user would like to have written as body text in its output document. Also, it seems the output text is not really relevant. Let's look into that.

First, I am all for saving data for the result of dbExecute() so that it can be saved in output_var and then used later.

Regarding printing, I think it would be ok to have like a informative message() printed, written in english and formatted as we want, but that would output like code result by default. knitr already give such messages in several places. However, it is not the same to return as body text (using results = 'asis'), all the more a formatted text in English with no way to customize. See my inline comments for why.

If we want to stay with the idea of outputting body text (results = 'asis'), we could offer configuration so that it can be customized. This is done already for the table when no sql.print function is provided. options$tab.cap can be specified to customize the table caption. Otherwise a default caption is shown.
See line 611 in the engine.R file for the SQL engine.
However, a table caption is not the same a text in a report. Personally, I would prefer we output a log-like message, like a message.

If we think customization, a user could save the result in output_var, and format itself the text he would like to show, as he would do with any other chunk output I believe, using the variable created and inline r chunks. So is this important to print a formatted output asis ?

Also, regarding the textual output, I am thinking about your comment

it seems SQLite does not actually return the number of rows affected (apparently always returns 0). Yet, this is pretty standard in most RDBMS and it is also the default in DBI::dbExecute, so I guess this is just a quirk specific to SQLite.

If we are not sure that all RDBMS outputs a number of rows, I guess it would be puzzling to have written in your report

Number of affected rows: 0

after a update query... 🤔

From your RStudio projects (https://rstudio.cloud/project/2931829) (thanks for this by the way, that is awesome!), it seems that not all requests are returning a number of rows. For example,

DROP TABLE IF EXISTS toy;

after having created and computed on toy also returns 0 which writes in the output as Markdown text that 0 zeros are affected. It could be puzzling to users.

image

I have published your HTML document into R Pubs to easily see it: https://rpubs.com/cderv/knitr-pr-2051

So overall, about the SQL engine, my thinking is :

  • We should stay clause to what the DB specialized tools are doing.
  • DBI::DBexecute() returns a value, and it seems fair that we allow it to be accessible to the user
  • DBI::DBexecute() does not print log message, and just return value. So we would in knitr do more than the tools offers. However, it makes sense to a publishing to tool to format some output (like we do with tables). But we must do this properly without assuming too much on what a user would like.

After this long review, does anyone of you two @edalfon or @yihui are sharing my concerns ?

@@ -1,3 +1,7 @@
- The `sql` engine now produces an output for update-like SQL queries, indicating the number of rows affected (as returned by DBI::dbExecute). So far, `knitr` produces a nicely formatted output for queries returning a result set (typically SELECT...), but gives no feedback at all for queries making changes to the DB (e.g. INSERT|UPDATE|DELETE|CREATE|DROP; see not exported function knitr:::is_sql_update_query). For such queries, `knitr` now shows the number of affected rows. You can also set the chunk option `output.var` to assign the number of affected rows to a variable.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so you know, this should be under the last header 1 you see.
1.35 is the next version so this should be moved under.

I'll do it before merging.

Comment on lines +636 to +640
if (is.numeric(data) && length(data) == 1 && is.null(varname)) {
options$results = 'asis'
# format = "fg" instead of "d". Row counts on DB may be greater than integer max value
output = paste0("Number of affected rows: ", formatC(data, format = "fg", big.mark = ','))
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To follow the style of the previous output assignment, and making it more clear about the different output possible, I would use a else here for the previous if

Suggested change
if (is.numeric(data) && length(data) == 1 && is.null(varname)) {
options$results = 'asis'
# format = "fg" instead of "d". Row counts on DB may be greater than integer max value
output = paste0("Number of affected rows: ", formatC(data, format = "fg", big.mark = ','))
}
else if (is.numeric(data) && length(data) == 1 && is.null(varname)) {
options$results = 'asis'
# format = "fg" instead of "d". Row counts on DB may be greater than integer max value
paste0("Number of affected rows: ", formatC(data, format = "fg", big.mark = ','))
}

@@ -633,6 +633,11 @@ eng_sql = function(options) {

} else print(display_data) # fallback to standard print
})
if (is.numeric(data) && length(data) == 1 && is.null(varname)) {
options$results = 'asis'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know if we want to force results = 'asis' here. The user would have not control over it.

I understand why it makes sens to have it as markdown text in the output document, but it may not please everyone who would prefer to have it code output.

I don't really know but usually users have control over how the output is show. 🤔

@yihui what do you think ?

if (is.numeric(data) && length(data) == 1 && is.null(varname)) {
options$results = 'asis'
# format = "fg" instead of "d". Row counts on DB may be greater than integer max value
output = paste0("Number of affected rows: ", formatC(data, format = "fg", big.mark = ','))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to previous comment, just wondering about internationalization:

  • We are enforcing a english sentence to be output in a report. This is not perfect for non English report. With result = 'asis', this sentence would be shown as usual text. If I am writing a French report, that would be weird to have such English output in the middle of other text, wouldn't it ?
    That is why at least, not enforcing asis output seems better to me
  • Also, about big.mark = ",", I am just wondering if this is necessary. It is not something usual worldwide, and could even be confusing for some international user. For example, we don't have that in Europe, we often have no big mark or a space. A comma is used as decimal too. However, the context of number of rows makes 12,500 clear enough for a european even if it can be confusing. We would use 12 500 or just 12500. Is it something common for such update query output ?

Maybe we could / should make at least all that configurable ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree!

What I did was trying to keep it similar to how the caption is formatted (line 614 on the same file)

rows_formatted = formatC(rows, format = "d", big.mark = ',')

but you are right that in that case, the user has options to configure the output

@edalfon
Copy link
Author

edalfon commented Sep 27, 2021

Many thanks for reviewing this. I agree with pretty much everything you said. In particular, that saving data for the result of dbExecute() is key. And although I share your concerns on printing, perhaps I just feel less strongly about them.

And the reason I feel less strongly about those concerns is that I think the most common use cases for this output are:

  • to get quick feedback while interactively trying out queries
  • to leave documented the result of data processing on the DB side

But I did not think such an output would be mainly used in -let's say- a public facing document (e.g. final report or the like). In that case, as you said, the users can set output.var and format and use it as they wish.

But for the two use cases above, I thought a quick default without configuring anything else would be useful (the user simply keep using a {sql} chunk and there you go). Therefore, I put a quick-and-dirty approach in that second commit.

I totally share your concerns. It's not cool forcing results = 'asis', hard-coding output in English and a particular number format. Ideally, the user should be able to customize all that. But I was not sure it was worth doing it. After all, if the users do want a particular format for that info, they can do it via output.var. So, I guess one should weigh the benefits of doing that vs. the costs in terms of code complexity, maintainability, and so on. Perhaps I perceived that the costs outweigh the benefits. But you are for sure in a better position to make such assessment.

@edalfon
Copy link
Author

edalfon commented Sep 27, 2021

a note on this

Also, regarding the textual output, I am thinking about your comment

it seems SQLite does not actually return the number of rows affected (apparently always returns 0). Yet, this is pretty standard in most RDBMS and it is also the default in DBI::dbExecute, so I guess this is just a quirk specific to SQLite.

If we are not sure that all RDBMS outputs a number of rows, I guess it would be puzzling to have written in your report

My bad here!!! SQLite do also return the number of rows affected. I wrote that after a couple of failed attempts, but I cannot really reproduce them now (perhaps I just looked at the results of DROP TABLE, which are in fact usually 0).

Having said that, DBI::dbExecute can indeed return puzzling results. For example, when you combine several queries, I've seen weird numbers returned by DBI::dbExecute with a Postgres backend (PostgreSQL do let you send multiple statements at once, but duckdb, for example, throws an error. SQLite, I think, only processes the first statement and ignores the rest). But my take on this is that's DBI's deal and usage, and knitr cannot and should not solve that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

produce output for UPDATE-like queries using sql engine
3 participants