shawnsanto/computing-bootcamp-2019

Duke University :: Department of Statistical Science Computing Bootcamp 2019

This is a five-hour computing bootcamp for incoming Ph.D. and M.S. students in the Department of Statistical Science at Duke University. These materials are adapted from the 2018 bootcamp by Mine Çetinkaya-Rundel and Colin Rundel.

The workshop will cover the following topics:

Introduction to the DSS and Duke computing ecosystems

  • Account activation and access to departmental servers
  • Discussion of how to responsibly use distributed computing resources
  • Docker containers and Duke VM
    • RStudio
    • Jupyter Notebook

Introduction to reproducible research

  • Recognize the problems that reproducible research helps address, with a brief discussion of case studies that went wrong and how reproducible practices might have helped
  • Identify pain points in getting your analysis to be reproducible
  • The role of documentation, sharing, automation, and organization in making your research more reproducible
  • Introduce some tools to solve these problems, specifically R / RStudio / R Markdown

Organizing your project to facilitate reproducible research

  • Organize projects and folders to enable reproducibility and reusability
  • Understand the structure of data files and the importance of documenting all changes made
  • Create a reproducible project workflow using R / RStudio / R Markdown
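
As an illustration of the project-organization goals above, here is a minimal sketch in Python that creates a folder skeleton for an analysis project. The folder names in `LAYOUT` and the helper `create_project` are hypothetical choices made for this example; the bootcamp does not prescribe a particular layout.

```python
from pathlib import Path
import tempfile

# Hypothetical folder names for illustration only; adapt to your own project.
LAYOUT = ["data/raw", "data/processed", "scripts", "output/figures", "docs"]


def create_project(root):
    """Create the folder skeleton under `root` and return the created paths."""
    root = Path(root)
    created = []
    for sub in LAYOUT:
        path = root / sub
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)

    # A README is a natural place to document data sources and changes to them.
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text("# Project title\n\nDocument data sources and all changes here.\n")
    return created


if __name__ == "__main__":
    # Demonstrate in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for p in create_project(Path(tmp) / "my-analysis"):
            print(p.relative_to(tmp))
```

Keeping raw data in its own folder, separate from processed data and outputs, makes it easier to rerun an analysis from scratch and to see which files were derived from which.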

Version control

  • Introduce Git and GitHub
  • Initialize a project directory, understand the Git workflow, and create a pull request to a remote repository
  • Discuss the role of version control in reproducibility
  • Discuss version control best practices

R / RStudio and R Markdown

  • Navigate R Markdown and RStudio
  • Analyze data and create graphics with the tidyverse package
  • Discuss workflow

Python and Jupyter Notebook

  • Navigate Jupyter notebooks
  • Introduce Python basics, control flow, and functions
  • Discuss popular Python packages, including NumPy, SciPy, pandas, matplotlib, seaborn, scikit-learn, and TensorFlow
  • Discuss similarities and differences between Python and R
  • Discuss how to leverage the best of R and Python
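
To give a flavor of the Python basics listed above (functions and control flow), here is a short sketch that uses only the standard library. The function names `summarize` and `label` are invented for this example and are not taken from the bootcamp materials.

```python
def summarize(values):
    """Return the count, mean, and sample variance of a list of numbers."""
    n = len(values)
    if n < 2:
        # Sample variance needs at least two observations.
        raise ValueError("need at least two values")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return {"n": n, "mean": mean, "variance": variance}


def label(values, threshold):
    """Label each value as 'below', 'equal', or 'above' relative to threshold."""
    labels = []
    for x in values:
        if x < threshold:
            labels.append("below")
        elif x == threshold:
            labels.append("equal")
        else:
            labels.append("above")
    return labels


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(summarize(data))
    print(label(data, threshold=2))
```

R users may notice that indentation, rather than braces, delimits blocks in Python, and that indexing starts at 0 rather than 1.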

Acknowledgments

Git / GitHub

Other

Python

R
