Course Overview
Welcome to the Data Wrangling with R course series
The purpose of this course is to introduce you to essential R packages and functions that will make your life easier when it comes time to explore, clean, transform, and summarize your data. This course will include a series of lessons for scientists with little to no experience in R.
Course objectives
- Learn how to navigate RStudio.
- Learn how to load different types of data formats.
- Get acquainted with the tidyverse packages, especially dplyr.
- Become familiar with functions useful for cleaning, transforming, and summarizing data.
While this course will not make you an expert R programmer or full-fledged data analyst, it will help you learn how to analyze real-life, messy data and prepare it for visualization and further analyses.
Course Expectations
This course consists of eight one-hour lessons held over five weeks. Each lesson will be held virtually on the Webex platform on Tuesdays and Thursdays at 1 pm. Each lesson will be followed immediately by a one-hour help session. Help sessions will be structured around a set of practice problems that let you test your new skills, though we welcome all questions!
Lesson 1: Introduction to R, RStudio, and the Tidyverse
This will be a no-coding introduction to R, RStudio, and the Tidyverse. In this lesson, we will review some of the advantages of using R for data analysis and get you acquainted with the RStudio environment. The help session will be devoted to getting everyone connected to the course on DNAnexus.
Lesson 2: Getting started with R
Lesson 2 will focus on some of the basics of R programming, including naming and assigning R objects, recognizing and using R functions, understanding data types and classes, and becoming familiar with R syntax.
Lesson 3: Importing and reshaping data
In Lesson 3, we will learn how to import simple and complex data and how to avoid common mistakes. We will also learn how to reshape data, for example, from wide to long format, with tidyr.
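As a preview of what reshaping from wide to long looks like, here is a minimal sketch using `pivot_longer()` from tidyr. The data frame and its column names (`sample`, `day1`, `day2`) are invented for illustration and are not course materials.

```r
library(tidyr)

# Hypothetical wide table: one row per sample, one column per time point
wide <- data.frame(
  sample = c("A", "B"),
  day1 = c(5.1, 4.8),
  day2 = c(6.3, 5.9)
)

# Gather the day columns into two columns: the day label and its measurement
long <- pivot_longer(
  wide,
  cols = c(day1, day2),
  names_to = "day",
  values_to = "measurement"
)

print(long)
```

In the long format each row holds a single measurement, which is the layout most tidyverse functions expect.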
Lesson 4: Data Visualization with ggplot2
Lesson 4 will be a brief reprieve from data wrangling. In this lesson, we will learn the basics of plotting with ggplot2.
Lesson 5: Introducing dplyr and the pipe
In Lesson 5, we will learn how to improve code interpretability with the pipe %>% from the magrittr package. We will also learn how to merge and filter data frames.
Lesson 6: Continue data wrangling with dplyr
In Lesson 6, we will continue to wrangle data using dplyr. This lesson will focus on functions such as group_by(), arrange(), summarize(), and mutate().
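To give a sense of how these functions fit together, the sketch below chains them with the pipe. The `measurements` data frame and its columns are hypothetical, made up only for this preview.

```r
library(dplyr)

# Hypothetical measurements, invented for illustration
measurements <- data.frame(
  group = c("control", "control", "treated", "treated"),
  value = c(2, 4, 10, 14)
)

summary_table <- measurements %>%
  group_by(group) %>%                              # one group per condition
  summarize(mean_value = mean(value), n = n()) %>% # collapse each group to one row
  mutate(scaled = mean_value / max(mean_value)) %>% # add a derived column
  arrange(desc(mean_value))                        # sort largest mean first

print(summary_table)
```

Each step takes the result of the previous one, so the code reads top to bottom in the order the operations happen.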
Lesson 7: Lesson Review
In Lesson 7, we will review many of the important concepts covered throughout the course.
Lesson 8: Working with your own data
Lesson 8 will be a BYOD (bring your own data) class. You will have two hours to work on your own data and get help as needed. If you do not have your own data, we will provide a data set and practice questions for you to test your wrangling skills.
Required Course Materials
To participate in this class, you will need your government-issued computer and a reliable internet connection. You do not need to download or install any software.