Badge + docs updates (#348)
* Badge on docs updates

* default to CCDS template

* Style updates

* darken links a bit

* block quotes

* formatting and bare ccds

* Apply suggestions from code review

Co-authored-by: Chris Kucharczyk <chris@drivendata.org>

---------

Co-authored-by: Chris Kucharczyk <chris@drivendata.org>
pjbull and chrisjkuch committed Mar 16, 2024
1 parent e51553f commit 6b9eb7c
Showing 9 changed files with 267 additions and 26 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -26,7 +26,7 @@ pip install cookiecutter-data-science
To start a new project, run:

```bash
ccds https://github.com/drivendata/cookiecutter-data-science
ccds
```

[![asciicast](https://asciinema.org/a/244658.svg)](https://asciinema.org/a/244658)
13 changes: 12 additions & 1 deletion ccds/__main__.py
@@ -23,7 +23,18 @@
from cookiecutter import cli
from cookiecutter import main as api_main # noqa: F401 referenced by tests

main = cli.main

def default_ccds_main(f):
"""Set the default for the cookiecutter template argument to the CCDS template."""

def _main(*args, **kwargs):
f.params[1].default = "https://github.com/drivendata/cookiecutter-data-science"
return f(*args, **kwargs)

return _main


main = default_ccds_main(cli.main)


if __name__ == "__main__":
Binary file added docs/docs/ccds.png
76 changes: 75 additions & 1 deletion docs/docs/css/extra.css
@@ -1,3 +1,23 @@
:root {
--md-primary-fg-color: #328F97;
--md-primary-fg-color--light: #328F97;
--md-primary-fg-color--dark: #328F97;

--md-accent-fg-color: #328F97;

--md-footer-bg-color: white;
--md-footer-fg-color: #222;
--md-footer-fg-color--light: #222;
--md-footer-fg-color--lighter: #222;
}

.md-typeset {
-webkit-print-color-adjust: exact;
color-adjust: exact;
font-size: 0.85rem;
line-height: 1.4;
}

.md-typeset h1 {
font-weight: 800;
color: #222;
@@ -10,6 +30,47 @@
color: #222;
}

.md-typeset a {
color: #297c82;
word-break: break-word;
}

.md-typeset code {
font-size: .8em;
background-color: #f5f5f5;
color: #193d3d;
}

.md-typeset .admonition.info,
.md-typeset details.info {
border-color: #328F97;
}
.md-typeset .info > .admonition-title,
.md-typeset .info > summary {
background-color: #328F9726;
}

.md-typeset .info > .admonition-title::before,
.md-typeset .info > summary::before {
background-color: #328F97;
}

.md-header__title {
font-family: "Space Mono";
font-weight: 400;
font-style: normal;
font-size: 0.9rem;
}

.md-typeset > p, .md-typeset > ul, .md-typeset > ol, .md-typeset > blockquote, .md-typeset > div.admonition {
max-width: 35rem;
}

.md-typeset blockquote {
font-size: 1.0rem;
font-weight: 300;
}

#termynal {
/* 40 lines of 2ex */
height: 80ex !important;
@@ -31,4 +92,17 @@
.inline-input,
.default-text {
display: inline-block !important;
}
}

.md-logo img {
height: 3rem !important;
}

.md-header, .md-footer, .md-footer-meta {
color: #222;
background-color: white;
}

.md-nav__link--active {
font-weight: 600;
}
44 changes: 24 additions & 20 deletions docs/docs/index.md
@@ -2,21 +2,21 @@

_A logical, flexible, and reasonably standardized project structure for doing and sharing data science work._

[![tests](https://github.com/drivendata/cookiecutter-data-science/workflows/tests/badge.svg?branch=v2)](https://github.com/drivendata/cookiecutter-data-science/actions/workflows/tests.yml?query=branch%3Av2)
<a target="_blank" href="https://cookiecutter-data-science.drivendata.org/">
<img src="https://img.shields.io/badge/CCDS-Project%20template-328F97?logo=cookiecutter" />
</a>

## Quickstart

!!! info "Changes in v2"

Cookiecutter Data Science v2 now requires installing the new `cookiecutter-data-science` Python package, which extends the functionality of the [`cookiecutter`](https://cookiecutter.readthedocs.io/en/stable/README.html) templating utility. Use the provided `ccds` command-line program instead of `cookiecutter`.
Cookiecutter Data Science v2 requires Python 3.7+. Since this is a cross-project utility application, we recommend installing it with [pipx](https://pypa.github.io/pipx/). Installation command options:

=== "With pipx (recommended)"

```bash
pipx install cookiecutter-data-science

# From the parent directory where you want your project
ccds https://github.com/drivendata/cookiecutter-data-science
ccds
```

=== "With pip"
@@ -25,7 +25,7 @@ _A logical, flexible, and reasonably standardized project structure for doing an
pip install cookiecutter-data-science
# From the parent directory where you want your project
ccds https://github.com/drivendata/cookiecutter-data-science
ccds
```

=== "With conda (coming soon!)"
@@ -34,7 +34,7 @@ _A logical, flexible, and reasonably standardized project structure for doing an
# conda install cookiecutter-data-science -c conda-forge

# From the parent directory where you want your project
# ccds https://github.com/drivendata/cookiecutter-data-science
# ccds
```

=== "Use the v1 template"
@@ -46,33 +46,37 @@ _A logical, flexible, and reasonably standardized project structure for doing an
cookiecutter https://github.com/drivendata/cookiecutter-data-science -c v1
```

## Installation

Cookiecutter Data Science v2 requires Python 3.7+. Since this is a cross-project utility application, we recommend installing it with [pipx](https://pypa.github.io/pipx/). Installation command options:
!!! info "Changes in v2"

```bash
# With pipx from PyPI (recommended)
pipx install cookiecutter-data-science
Cookiecutter Data Science v2 now requires installing the new `cookiecutter-data-science` Python package, which extends the functionality of the [`cookiecutter`](https://cookiecutter.readthedocs.io/en/stable/README.html) templating utility. Use the provided `ccds` command-line program instead of `cookiecutter`.

# With pip from PyPI
pip install cookiecutter-data-science

# With conda from conda-forge (coming soon)
# conda install cookiecutter-data-science -c conda-forge
```

## Starting a new project

Starting a new project is as easy as running this command at the command line. No need to create a directory first; the cookiecutter will do it for you.

```bash
ccds https://github.com/drivendata/cookiecutter-data-science
ccds
```

The `ccds` command-line tool defaults to the Cookiecutter Data Science template, but you can pass your own template as the first argument if you want.
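
For example, to point `ccds` at a different template (the URL below is a placeholder for your own repository, not a real one):

```bash
# Hypothetical: use your own cookiecutter template instead of the CCDS default
ccds https://github.com/your-org/your-custom-template
```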


## Example

<!-- TERMYNAL OUTPUT -->


Now that you've got your project, you're ready to go! You should do the following:

- **Check out the directory structure** below so you know what's in the project and how to use it.
- **Read the [opinions](opinions.md)** that are baked into the project so you understand best practices and the philosophy behind the project structure.
- **Read the [using the template](using-the-template.md) guide** to understand how to get started on a project that uses the template.


Enjoy!


## Directory structure

The directory structure of your new project will look something like this (depending on the settings that you choose):
136 changes: 136 additions & 0 deletions docs/docs/using-the-template.md
@@ -0,0 +1,136 @@
# Using the template

You've [created](index.md#starting-a-new-project) your project. You've [read the opinions section](opinions.md). You're ready to start doing some work.

Here's a quick guide to the kinds of things we do once our project is ready to go. We'll walk through this example using Git and GitHub for version control and Jupyter notebooks for exploration, but you can use whatever tools you like.

## Set up version control

Often, we start by initializing a `git` repository to track the code we write in version control and collaborate with teammates. At the command line, the following commands turn the folder into a git repository, add all of the files and folders created by CCDS to source control (except for what is in the `.gitignore` file), and make an initial commit to the repository.

```bash
# From inside your newly created project directory
git init
git add .
git commit -m "CCDS defaults"
```

We usually commit the entire default CCDS structure so that the changes we make to it are easy to track in version history.

Now that the default layout is committed, you should push it to a shared repository. You can do this through the interface of whatever source control platform you use. This may be GitHub, GitLab, Bitbucket, or something else.

If you use GitHub and have the [gh CLI tool](https://cli.github.com/), you can easily create a new repository for the project from the command line.

```bash
gh repo create
```

You'll be asked a series of questions to set up the repository on GitHub. Once you're done, you'll be able to push the changes in your local repository to GitHub.
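
For example, assuming your remote is configured and your default branch is named `main` (both are assumptions about your setup):

```bash
# Publish the local repository to the new GitHub remote
git push -u origin main
```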

## Create a Python virtual environment

We often use Python for our data science projects, and we use a virtual environment to manage the packages each project depends on. This keeps one project's packages separate from another's, which is especially important when you are working on multiple projects at the same time.

Cookiecutter Data Science supports [a few options](opinions.md#build-from-the-environment-up) for Python virtual environment management, but no matter which you choose, you can create an environment with the following command:

```bash
make create_environment
```

Once the environment is created, you'll want to make sure to activate it. You'll have to do this following the instructions for your specific environment manager. We recommend using a shell prompt that shows you which environment you are in, so you can easily tell if you are in the right environment, for example [starship](https://starship.rs/). You can also use the command `which python` to make sure that your shell is pointing to the version of Python associated with your virtual environment.
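
As a rough illustration, activation might look like one of the following, depending on your environment manager (the environment name and path here are assumptions, not template defaults):

```bash
# virtualenv-style managers: source the activate script in the project directory
source .venv/bin/activate

# conda: activate the environment created by `make create_environment`
conda activate my_project

# verify that the shell now resolves Python from the virtual environment
which python
```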

Once you are sure that your environment is activated in your shell, you can install the packages you need for your project. You can do this with the following command:

```bash
make requirements
```

## Add your data

There's no universal advice for how to manage your data, but here are some recommendations for starting points depending on where the data comes from:

- **Flat files (e.g., CSVs or spreadsheets) that are static** - Put these files into your `data/raw` folder and then run `make sync_data_up` to push the raw data to your cloud provider.
- **Flat files that change and are extracted from somewhere** - Add a Python script to your source module in `data/make_dataset.py` that downloads the data and puts it in the `data/raw` folder. Then you can use this to get the latest data and push it up to your cloud host as it changes (being careful not to [overwrite your raw data](opinions.md#data-analysis-is-a-directed-acyclic-graph)).
- **Databases you connect to with credentials** - Store your credentials in `.env`. We recommend adding a `db.py` file or similar to your `data` module that connects to the database and pulls data; a minimal sketch follows this list. If your queries generally fit into memory, you can just have functions in `db.py` that load the data you use in analysis. If not, you'll want to add a script like the above to download the data to the `data/raw` folder.
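
A minimal sketch of what such a `db.py` might look like, assuming a SQLAlchemy-compatible database and a `DATABASE_URL` entry in `.env` (the variable name and the `load_query` helper are our own illustration, not part of the template):

```python
# db.py: loose sketch; assumes sqlalchemy, pandas, and python-dotenv are installed
import os

import pandas as pd
import sqlalchemy
from dotenv import load_dotenv

load_dotenv()  # read DATABASE_URL (and any other secrets) from the project's .env file

engine = sqlalchemy.create_engine(os.environ["DATABASE_URL"])


def load_query(query: str) -> pd.DataFrame:
    """Run a SQL query against the project database and return the result as a DataFrame."""
    return pd.read_sql(query, engine)
```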

## Check out a branch

We'll talk about code review later, but it's a good practice to use feature branches and pull requests to keep your development organized. Now that you have source control configured, you can check out a branch to work with:

```bash
git checkout -b initial-exploration
```

## Open a notebook

!!! note

The following assumes you're using a Jupyter notebook; the specific commands for another notebook tool may look a little different, but the process guidance still applies.

Now you're ready to do some analysis! Make sure that your project-specific environment is activated (you can check with `which jupyter`) and run `jupyter notebook notebooks` to open a Jupyter notebook in the `notebooks/` folder. You can start by creating a new notebook and doing some exploratory data analysis. We often name notebooks with a scheme that looks like this:

```
0.01-pjb-data-source-1.ipynb
```

- `0.01` - Helps keep work in chronological order. The structure is `PHASE.NOTEBOOK`. `NOTEBOOK` is just the Nth notebook in that phase to be created. For phases of the project, we generally use a scheme like the following, but you are welcome to design your own conventions:
- `0` - Data exploration - often just for exploratory work
- `1` - Data cleaning and feature creation - often writes data to `data/processed` or `data/interim`
- `2` - Visualizations - often writes publication-ready viz to `reports`
- `3` - Modeling - training machine learning models
- `4` - Publication - Notebooks that get turned directly into reports
- `pjb` - Your initials; this is helpful for knowing who created the notebook and helps prevent collisions when people work in the same notebook.
- `data-source-1` - A description of what the notebook covers

Now that you have your notebook going, start your analysis!

## Refactoring code into shared modules

As your project goes on, you'll want to refactor your code in a way that makes it easy to share between notebooks and scripts. We recommend creating a module in the `{{ cookiecutter.module_name }}` folder that contains the code you use in your project. This is a good way to make sure that you can use the same code in multiple places without having to copy and paste it.

Because the default structure is a Python package that is installed by default, you can do the following to make that code available within a Jupyter notebook.

First, we recommend turning on the `autoreload` extension. This makes Jupyter always go back to the source code for the module rather than caching it in memory. If your notebook isn't reflecting the latest changes to a `.py` file, try restarting the kernel and making sure `autoreload` is on. We add a cell at the top of the notebook with the following:

```
%load_ext autoreload
%autoreload 2
```

Now all your code should be importable. At the start of the CCDS project, you picked a module name. It's the same name as the folder in the root project directory. For example, if the module name were `my_project`, you could use the code by importing it like:

```python
from my_project.data import make_dataset

data = make_dataset()
```

Now it should be easy to do any refactoring you need to make your code more modular and reusable.


## Make your code reviewable

We try to review every line of code written at DrivenData. Data science code in particular has the risk of executing without erroring, but not being "correct" (for example, you use standard deviation in a calculation rather than variance). We've found the best way to catch these kinds of mistakes is a second set of eyes looking at the code.

Right now on GitHub, it is hard to observe and comment on changes that happen in Jupyter notebooks. We develop and maintain a tool called [`nbautoexport`](https://nbautoexport.drivendata.org/stable/) that automatically exports a `.py` version of your Jupyter notebook every time you save it. This means that you can commit both the `.ipynb` and the `.py` to source control so that reviewers can leave line-by-line comments on your notebook code. To use it, you will need to add `nbautoexport` to your requirements file and then run `make requirements` to install it.
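
For example (assuming a pip-style `requirements.txt`; the file name depends on your chosen dependency manager):

```bash
echo "nbautoexport" >> requirements.txt  # add the dependency
make requirements                        # install it into the environment
```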

Once `nbautoexport` is installed, you can set it up for your project with the following commands at the command line:

```bash
nbautoexport install
nbautoexport configure notebooks
```

Once you're done with your work, you'll want to add it to a commit and push it to GitHub so you can open a pull request. You can do that with the following commands:

```bash
git add . # stage all changed files to include them in the commit
git commit -m "Initial exploration" # commit the changes with a message
git push # publish the changes
```

Now you'll be able to [create a Pull Request in GitHub](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request).

## Changing the `Makefile`

There's no magic in the Makefile. We often add project-specific commands or update the existing ones over the course of a project. For example, we've added scripts to generate reports with pandoc, build and serve documentation, publish static sites from assets, package code for distribution, and more.
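
As a loose example of the kind of rule we might add (the target name and paths are hypothetical):

```makefile
## Render the final report to PDF with pandoc
report: reports/report.md
	pandoc reports/report.md -o reports/report.pdf
```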
10 changes: 8 additions & 2 deletions docs/mkdocs.yml
@@ -9,14 +9,20 @@ theme:
features:
- navigation.instant
- toc.integrate
logo: logo.svg
logo: ccds.png
name: material
custom_dir: overrides
palette:
primary: black
primary: custom
accent: custom
font:
text: Work Sans
code: Space Mono
nav:
- Home: index.md
- Why ccds?: why.md
- Opinions: opinions.md
- Using the template: using-the-template.md
- Contributing: contributing.md
- Related projects: related.md
- v1 Template: v1.md
7 changes: 7 additions & 0 deletions docs/overrides/main.html
@@ -0,0 +1,7 @@
{% extends "base.html" %}

{% block extrahead %}
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Space+Mono:ital,wght@0,400;0,700;1,400;1,700&display=swap" rel="stylesheet">
{% endblock %}
