updated notes for data analysis pipelines
ttimbers committed Mar 4, 2024
1 parent 56f24e7 commit 6cb4ef6
Showing 18 changed files with 2,861 additions and 1,105 deletions.
184 changes: 179 additions & 5 deletions docs/_sources/materials/lectures/08-reproducible-reports.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -126,6 +126,16 @@
"<img src=\"img/viz-md-button.png\" width=125>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Quarto in VS Code\n",
"\n",
"You can also use Quarto in VS Code with R or Python. If you decide to use this editor, it is highly recommended that you use the [VS Code Quarto extension](https://marketplace.visualstudio.com/items?itemName=quarto.quarto). This will allow you to preview the rendered document, similar to how this can be done in RStudio.\n",
"In VS Code, to preview the rendered document you click \"Preview\" (instead of \"Render\" as in RStudio), which is located at the top right-hand side of the document you are working on."
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand Down Expand Up @@ -397,7 +407,7 @@
"\n",
"In particular, with the assigning figure numbers, we want to do this in a automated way so that if you change the order that figures show up in your report during the editing process, you do not have to manually renumber them.\n",
"\n",
"The syntax below takes our Markdown figure and gives it a caption (via adding the caption between the `[ ]`), gives it a label so that Quarto will do automated numbering of the figure and so you can cross reference it in the text (via `#fig-some_name`), and specifies the size of the figure to be 50% (via `width=50%`).\n",
"The syntax below takes our Markdown figure and gives it a caption (via adding the caption between the `[ ]`), gives it a label so that Quarto will do automated numbering of the figure and so you can cross reference it in the text (via `#fig-some-name`), and specifies the size of the figure to be 50% (via `width=50%`).\n",
"\n",
"```\n",
"![The Banff International Research Station campus in February 2024.](img/banff.png){#fig-banff width=50%}\n",
Expand Down Expand Up @@ -464,7 +474,7 @@
"\n",
"````\n",
"```{r}\n",
"#| label: tbl-my_data\n",
"#| label: tbl-my-data\n",
"#| tbl-cap: Some relevant description about the data in my table.\n",
"\n",
"table_to_display <- read_csv(\"my_data.csv\")\n",
Expand All @@ -474,7 +484,7 @@
"\n",
"> Note that we only showed the example in one language because the code chunk options for this are the same in both R and Python.\n",
"\n",
"To cross reference the table in the narrative of the report, we write `@tbl-my_data` when we want to refer to it. That will change to Table 1 in the rendered report, if the table named `tbl-my_data` is the first table embedded in the report. This works in both R and Python."
"To cross reference the table in the narrative of the report, we write `@tbl-my-data` when we want to refer to it. That will change to Table 1 in the rendered report, if the table named `tbl-my-data` is the first table embedded in the report. This works in both R and Python."
]
},
{
Expand Down Expand Up @@ -552,7 +562,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## A helpful hint for successfully working with R Markdown documents\n",
"## A helpful hint for successfully working with Quarto documents\n",
"\n",
"Given that you need to render the entire document to see your Markdown and LaTeX rendered,\n",
"it is important to render often as you make changes.\n",
Expand All @@ -564,6 +574,170 @@
"and then will be able to easily identify and fix your errors."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Quarto reports in subdirectories\n",
"\n",
"When working with a more complex project, \n",
"it is a good practice to split our files up into subdirectories.\n",
"This means that our Quarto documents often end up in a directory\n",
"called `reports` or `analysis` or something similar.\n",
"This best practice can lead to headaches when rendering the document,\n",
"as the relative paths for loading in data artifacts (figures and tables)\n",
"can change depending on where the document is executed from.\n",
"\n",
"In this course we will adopt the strategy that Quarto documents \n",
"will be executed from the subdirectory that live in \n",
"(e.g., `reports` or `analysis` or something similar).\n",
"This is actually Quarto's default when using the `quarto render ...` command.\n",
"To ensure that the preview works correctly with this setup, however,\n",
"we need to add an empty `_quarto.yml` file to our project root directory.\n",
"This can be done by running `touch _quarto.yml` in the terminal in your project root.\n",
"This document can be used to set many kinds of Quarto configurations \n",
"(see all [here](https://quarto.org/docs/projects/quarto-projects.html#project-metadata)),\n",
"including saving the rendered documents to different directories \n",
"(which can be useful if you plan to serve up your report as a nice, \n",
"human readable HTML document on the web as [documented here](https://quarto.org/docs/publishing/github-pages.html))."
]
},
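{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of such a configuration (an illustration, not part of the original course materials), the `_quarto.yml` below uses Quarto's `output-dir` project option to save rendered documents to a separate directory. The directory name `docs` is an assumption chosen for this example, and an empty `_quarto.yml` remains sufficient for the preview to work.\n",
"\n",
"```\n",
"# _quarto.yml (in the project root)\n",
"project:\n",
"  # hypothetical output location; any directory name can be used\n",
"  output-dir: docs\n",
"```"
]
},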
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Strategies for collaborative writing with Quarto\n",
"\n",
"Although Quarto is wonderful for incorporating code, \n",
"and code-generated research artifacts in a reproducible manner, \n",
"it is not as user-friendly for collaborative writing as other tools \n",
"(e.g., Google docs). \n",
"In particular, the lack of real-time collaboration \n",
"and possibilities for merge conflicts that cannot be automatically resolved\n",
"are the greatest challenges.\n",
"Thus, to make collaborative writing efficient and effective with Quarto,\n",
"we need to take a thoughtful approach to using it for this purpose.\n",
"Clear communication helps, in particular with avoiding merge conflicts,\n",
"however we can do more than that to make this work even better.\n",
"We outline below, two strategies we can incorporate, \n",
"which function to help reduce the number of merge conflicts \n",
"that cannot be automatically resolved,\n",
"and create a cohesive document, even with multiple authors.\n",
"\n",
"### Child documents\n",
"\n",
"When multiple authors are editing a Quarto document\n",
"at the same time in collaborative writing, \n",
"there ends up being a good chance that a merge conflict will occur,\n",
"and potentially one that cannot be automatically resolved.\n",
"One way to avoid this is using communication to assign different authors\n",
"to work on different sections of the document - \n",
"reducing the chance that they will be editing the same lines at the same time.\n",
"With this strategy however, \n",
"this still can happen with sections that are adjacent to each other\n",
"as Git uses the line numbers, \n",
"not section headers to decide which changes are overappling changes.\n",
"\n",
"Thus, an even better strategy is the use of child documents.\n",
"Child documents are documents that are sourced into the main (parent)\n",
"document when the document is rendered (you get one single document rendered, \n",
"which includes all the content from the parent and child documents). \n",
"A good idea in collaborative writing is to split each section into a \n",
"separate child document. That way, each person working on a section is working on a separate document - minimizing greatly the number of potential merge conflicts.\n",
"\n",
"The [Includes syntax](https://quarto.org/docs/authoring/includes.html) is the way to incorporate a child document \n",
"into a parent document in Quarto.\n",
"Below we show an example parent document (`breast_cancer_predictor_report.qmd`) \n",
"which uses child documents to split the sections of the report into separate documents.\n",
"\n",
"```\n",
"---\n",
"title: \"Predicting breast cancer from digitized images of breast mass\"\n",
"author: \"Tiffany A. Timbers, Joel Ostblom & Melissa Lee\"\n",
"format:\n",
" pdf\n",
" toc: true\n",
" toc-depth: 2\n",
"bibliography: references.bib\n",
"execute:\n",
" echo: false\n",
" warning: false\n",
"---\n",
"\n",
"## Abstract\n",
"\n",
"{{< include _abstract.qmd >}}\n",
"\n",
"## Introduction\n",
"\n",
"{{< include _introduction.qmd >}}\n",
"\n",
"## Methods\n",
"\n",
"{{< include _methods.qmd >}}\n",
"\n",
"## Results & Discussion\n",
"\n",
"{{< include _results-and-discussion.qmd >}}\n",
"\n",
"## References\n",
"```\n",
"\n",
"This document would live in a project structure something like this\n",
"(only zooming in on the reports sub-directory:\n",
"\n",
"```\n",
"project/ \n",
"├── data/ \n",
"├── reports/\n",
"│ ├── _abstract.qmd\n",
"│ ├── _introduction.qmd\n",
"│ ├── _methods.qmd\n",
"│ ├── _results-and-discussion.qmd\n",
"│ └── breast_cancer_predictor_report.qmd \n",
"├── src/ \n",
"├── doc/ \n",
"├── README.md\n",
"└── Dockerfile\n",
"```\n",
"\n",
"And when rendered, \n",
"(by running `quarto render reports/breast_cancer_predicotr_report.qmd`)\n",
"a single PDF document would be rendered from the 5 separate `.qmd` files.\n",
"\n",
"### Smoothing\n",
"\n",
"One downside to splitting collaborative writing into sections,\n",
"is the the resulting document reads like it was written by several authors\n",
"(because it was)!\n",
"Additionally, \n",
"the transitions between different sections of the report are often quite disjoint and abrupt (Stout, 2022).\n",
"This is really undesirable\n",
"as it makes it more difficult for the reader to understand the report.\n",
"To counteract this effect of drafting a manuscript collaboratively by sections,\n",
"we need to employ a smoothing process to blend the writing styles from the various \n",
"authors, so that in the end, it will read as one consistently styled document,\n",
"similar to documents written by a single author (Stout, 2022).\n",
"\n",
"Smoothing is a step, after the initial draft is generated by separate authors,\n",
"where the authors trade section assignments and edit and revise a different section\n",
"than which they initially drafted. \n",
"It is also advisable after this step of smoothing, \n",
"to do another step, where the entire manuscript is read by at least 2-3 authors \n",
"where they pay particular attention to section transitions.\n",
"This extra work, results in a more cohesive, clear and easy to read document, \n",
"and should be used in any collaborative writing project (Stout, 2022).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"\n",
"Sara Stoudt (2022) Collaborative Writing Workflows in the Data-Driven Classroom: A Conversation Starter, Journal of Statistics and Data Science Education, 30:3, 282-288, DOI: 10.1080/26939169.2022.2082602"
]
},
{
"cell_type": "code",
"execution_count": null,
Expand Down Expand Up @@ -591,7 +765,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
"version": "3.12.1"
}
},
"nbformat": 4,
Expand Down
