Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explain NULLs when teaching count #324

Closed
wants to merge 2 commits into from

Conversation

chrisroat
Copy link

The removes a small oversight in the current explanation, that columns with NULL values are not counted. It also teases the fuller explanation of NULL and aggregation that comes later.

Closes #305

The removes a small oversight in the current explanation, that columns with NULL values are not counted.  It also teases the fuller explanation of NULL and aggregation that comes later.
@henrykironde
Copy link
Contributor

@r4space @remram44 any input on this?

@remram44
Copy link
Contributor

"how many values there are" is still a bit ambiguous, it could be read as "how many distinct values" (count(distinct reading)).

- counts rows, not values
- be more specific about the NULL in the person column of the dataset
@chrisroat
Copy link
Author

Good point -- changed it to rows. Also better explained (I think) the NULL in the person column.

@chrisroat
Copy link
Author

Any additional suggestions?

1 similar comment
@chrisroat
Copy link
Author

Any additional suggestions?

@chrisroat
Copy link
Author

If this change is not desired, I can drop this pull request.

@PaulHancock
Copy link

I would suggest noting that all these aggregation functions ignore null values in their computations (otherwise avg, sum, would result in null).
For the most part we get exactly what we intend when running these functions, with the exception of count.

The count function will count the number of non-null entries when run on a column, but will count the number of rows when run with count(*) (even if the rows contain nulls).

So count(colname) should be used when you want to know the number of non-null entries, where as count(*) should be used to count the number of rows total.

The current text:

We used count(reading) here, but we could just as easily have counted quant or any other field in the table, or even used count(*), since the function doesn’t care about the values themselves, just how many values there are.

Is therefore not correct, and should be updated.

I think the changes in 4e1bcb5, and 7fff12b should be accepted.

@zkamvar
Copy link
Contributor

zkamvar commented May 8, 2023

Thank you for your contribution. This lesson has migrated to use The Carpentries Workbench, but unfortunately, due to various factors, the Maintainers of this lesson were unable to address this pull request before the transition. Because of this, we had to close your pull request.

Please note that this does not mean that your contribution was not valued. There are many reasons why a pull request is not merged. It's important to remember that the Maintainers are first and foremost people---people who maintain this lesson on a voluntary basis. Sometimes pull requests become stale because other responsibilities take precedence. Thank you for taking the time to open the pull request in the first place.

If you wish to contribute again, you will need to delete and re-fork your repository.

How to contribute

If you wish to contribute, you will need to use the following steps to delete,
re-fork, and re-create your pull request (aka the burn it all down
strategy
):

📹 How to update (delete) your fork https://carpentries.github.io/workbench/faq.html#update-fork-from-styles

  1. Save your edits on locally or in a scratch space.
  2. Delete your fork
  3. Create a new fork or use the "edit" button on the page you wish to edit.
  4. Apply your changes (NOTE The Workbench uses a different syntax. Here is a Transition Guide from Styles to Workbench for your reference).

Questions

If you have any questions or would like assistance, please contact @core-team-curriculum (email: curriculum@carpentries.org) or you can respond to this message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ep. 6: Explanation of count() could be improved.
5 participants