mwpennell/ubc-biol548o-w2020

ubc-biol548o-w2020

BIOL 548O is a short module designed to help students work with datasets more effectively and efficiently.

Instructor

Dr. Matthew Pennell
Assistant Professor, Department of Zoology
Email: pennell@zoology.ubc.ca
Office hours: By appointment
Office: Biodiversity 208

Logistics

The module will run from February 4th to March 5th, 2020. Note that these dates differ slightly from those in the UBC course calendar. There will be no classes during Reading Week.

Classes will be in Biosciences 4223 on Tuesdays and Thursdays from 1500-1630.

Grades

Grade breakdown

80% Homework assignments
20% In-class participation

A note on grading philosophy

This module is focused on skill development. I recognize that students will be coming from very different academic backgrounds and will have various levels of experience with the tools we are working with. And that's great -- we are all here to learn! As such, assessment for this module will be primarily about the process (are you putting effort into developing your skillset?) and not the product (how elegant is your code?).

Course content and schedule

This module is designed to be primarily a "workshop"-style course. I will expect you to have read the assigned materials beforehand. During class, I will review some key points and we'll work through problems together.

It is not necessary to bring your own datasets to work with; I know that many of you may be just starting your studies or otherwise do not currently have datasets that need cleaning up. However, if you already have data, either from your own thesis work or from some other lab project, please bring it along -- it is far more motivating and interesting to work with data you really care about.

Note: Much of the course material is adapted from the Data Carpentry for Biologists course developed by Ethan White and Zachary Brym.

Lecture 1 - Feb 4

In the first lecture, we are going to:

  1. Run through the objectives of the module so you can get a sense of where we are going;

  2. Discuss the data management challenges that you face (or will likely face) when working with data specific to your research topic;

  3. Take a brief tour of RStudio + git/GitHub and learn how we can make them talk to one another.

In preparation for the first lecture, I would ask you to please do the following:

  1. Download and install both the R base system and the RStudio Desktop IDE. Note that installing RStudio does not automatically install R;

  2. Download and install git;

  3. If you haven't already, set up an account on GitHub and send your username to the instructor (pennell@zoology.ubc.ca).

Lecture 2 - Feb 6

Topic: Best practices for version control and project organization

Readings:
Git Basics in RStudio

Lecture

Assignment:
Complete Exercises 1-4 in the Lecture.

Additional readings:
Happy Git with R (ebook) by the UBC STAT 545 instructors

Lecture 3 - Feb 11

Topic: Principles of tidy data

Readings:
R for Data Science - Tidy Data

Lecture

Additional readings:
Data Organization in Spreadsheets (general paper)
Data Organization in Spreadsheets (for Ecologists)

Lecture 4 - Feb 13

Topic: Transforming Data in R

Readings:
R for Data Science - Transforming Data

Lecture

Additional readings:
Data Carpentry: dplyr

Lecture 5 - Feb 25

Topic: Relational databases

Readings:
R for Data Science - Relational Data

Lecture

Additional readings:
Data Carpentry - Working with SQL databases in R

Lecture 6 - Feb 27

Topic: Working with text (or, why regular expressions are your best friends)

Readings:
R for Data Science - Strings

Lecture

Lecture 7 - Mar 3

Topic: Scripting - Part I - Functions

Readings:
R for Data Science - Functions

Lecture

Lecture 8 - Mar 5

Topic: Scripting - Part II - Conditionals and Iteration

Readings:
R for Data Science - Iteration

Lecture

Additional resources

Books

I regularly refer to all three of the following books and cannot recommend them strongly enough.

  1. Hadley Wickham. Advanced R (ebook)

  2. Garrett Grolemund and Hadley Wickham. R for Data Science (ebook)

  3. Stephen Haddock and Casey Dunn. Practical Computing for Biologists

About

Course materials for BIOL 548O: Dealing with Data
