
Course Overview

Welcome to the Data Wrangling with R course series

The purpose of this course is to introduce you to essential R packages and functions that will make your life easier when it comes time to explore, clean, transform, and summarize your data. This course will include a series of lessons for scientists with little to no experience in R.

Course objectives

  • Learn how to navigate RStudio.
  • Learn how to load different types of data formats.
  • Get acquainted with the tidyverse packages, especially dplyr.
  • Become familiar with functions useful for cleaning, transforming, and summarizing data.

While this course will not make you an expert R programmer or full-fledged data analyst, it will help you learn how to analyze real-life, messy data and prepare it for visualization and further analyses.

Course Expectations

This course will include a series of eight one-hour lessons over the course of five weeks. Each lesson will be held virtually using the Webex platform on Tuesdays and Thursdays at 1 pm. Each lesson will be followed immediately by a one-hour help session. Help sessions will be structured around a set of practice problems for you to test your new skills, though we welcome all questions!

Lesson 1: Introduction to R, RStudio, and the Tidyverse

This will be a no-coding introduction to R, RStudio, and the Tidyverse. In this lesson, we will review some of the advantages of using R for data analysis and will get you acquainted with the RStudio environment. The help session will be devoted to getting everyone connected to the course on DNAnexus.

Lesson 2: Getting started with R

Lesson 2 will focus on some of the basics of R programming, including naming and assigning R objects, recognizing and using R functions, understanding data types and classes, and becoming familiar with R programming syntax.
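As a rough preview of these basics, here is a minimal sketch in base R. The object names and values (sample_count, gene_name, expression) are made up for illustration and are not taken from the course materials.

```r
# Assign values to named objects with the assignment operator <-
sample_count <- 12L               # the L suffix makes this an integer
gene_name <- "TP53"               # a character string
expression <- c(2.5, 3.1, 4.8)    # c() combines values into a numeric vector

# Call a function: the function name followed by arguments in parentheses
mean_expression <- mean(expression)

# Ask R which class each object belongs to
class(sample_count)   # "integer"
class(gene_name)      # "character"
class(expression)     # "numeric"
```

Typing an object's name at the console, for example mean_expression, prints its value.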

Lesson 3: Importing and reshaping data

In Lesson 3, we will learn how to import simple and complex data and how to avoid common mistakes. We will also learn how to reshape data, for example, from wide to long format, with tidyr.
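To give a sense of what reshaping from wide to long looks like, the sketch below uses pivot_longer() from tidyr. The small data frame and its column names (sample, day1, day2) are invented for illustration; the lesson itself may use different data and functions.

```r
library(tidyr)

# A small "wide" table: one row per sample, one column per day (made-up values)
wide <- data.frame(
  sample = c("A", "B"),
  day1 = c(1.2, 0.8),
  day2 = c(1.9, 1.1)
)

# Reshape to "long" format: one row per sample-day combination
long <- pivot_longer(
  wide,
  cols = c(day1, day2),        # the columns to stack
  names_to = "day",            # new column holding the old column names
  values_to = "measurement"    # new column holding the values
)

long
```

The long table has four rows (two samples by two days) and three columns: sample, day, and measurement.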

Lesson 4: Data Visualization with ggplot2

Lesson 4 will be a brief reprieve from data wrangling. In this lesson, we will learn the basics of plotting with ggplot2.
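For orientation, a basic ggplot2 plot might look like the sketch below. It uses mtcars, a data set that ships with R, purely as a stand-in; the course may use other data.

```r
library(ggplot2)

# Build a scatter plot in layers: data and aesthetic mappings first, then a geom
scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Printing the object draws the plot
print(scatter)
```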

Lesson 5: Introducing dplyr and the pipe

In Lesson 5, we will learn how to improve code interpretability with the pipe %>% from the magrittr package. We will also learn how to merge and filter data frames.
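The sketch below illustrates how the pipe can make a sequence of steps easier to read. It assumes that "merge" refers to dplyr's join functions (left_join() here) and uses two invented data frames, samples and counts.

```r
library(dplyr)

# Two made-up tables that share a sample_id column
samples <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  treatment = c("control", "drug", "drug")
)
counts <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  count = c(10, 25, 40)
)

# Without the pipe: nested calls must be read from the inside out
nested <- filter(left_join(samples, counts, by = "sample_id"), treatment == "drug")

# With the pipe: each step reads top to bottom
piped <- samples %>%
  left_join(counts, by = "sample_id") %>%   # merge the two tables
  filter(treatment == "drug")               # keep only drug-treated samples

piped
```

Both versions return the same two rows (s2 and s3); the piped version simply lists the steps in the order they happen.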

Lesson 6: Continue data wrangling with dplyr

In Lesson 6, we will continue to wrangle data using dplyr. This lesson will focus on functions such as group_by(), arrange(), summarize(), and mutate().
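As a small illustration of how these four functions can be combined, the sketch below works on an invented data frame named measurements; the column names and values are assumptions made for the example.

```r
library(dplyr)

# Made-up measurements for two treatment groups
measurements <- data.frame(
  treatment = c("control", "control", "drug", "drug"),
  count = c(10, 14, 25, 35)
)

summary_table <- measurements %>%
  mutate(log2_count = log2(count)) %>%   # add a new column
  group_by(treatment) %>%                # define the groups
  summarize(                             # collapse each group to one row
    mean_count = mean(count),
    mean_log2_count = mean(log2_count),
    n_samples = n()
  ) %>%
  arrange(desc(mean_count))              # sort with the largest mean first

summary_table
```

The result has one row per treatment, with the drug group (mean count 30) listed before the control group (mean count 12).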

Lesson 7: Lesson Review

In Lesson 7, we will review many of the important concepts we learned throughout the course.

Lesson 8: Working with your own data

Lesson 8 will be a BYOD (bring your own data) class. You will have two hours to work on your own data and get help accordingly. If you do not have your own data, we will provide a data set and practice questions for you to test your wrangling skills.

Required Course Materials

To participate in this class, you will need your government-issued computer and a reliable internet connection. You do not need to download or install any software to participate in the class.