Course Details

  • Date: June 7th, 2022 - July 7th, 2022
  • Time: 1:00 pm - 2:00 pm
  • Location: Online Webinar
  • Presenter(s): Alex Emmons (BTEP)

Welcome to the Data Wrangling with R course series! The purpose of this course is to introduce you to essential R packages and functions that will make your life easier when it comes time to explore, clean, transform, and summarize your data. Around 50-80% of a data scientist's time is often said to be devoted to data wrangling, or the act of getting data into a specific format. We can reduce some of this time simply by becoming more familiar with the packages and tools dedicated to tidying, transforming, and summarizing data. In R, one such collection of packages is known as the tidyverse, which will be the focus of this course.

This series will include 8 lessons over 5 weeks. Each lesson will be held virtually using the Webex platform on Tuesdays and Thursdays at 1 pm. Each lesson will be followed immediately by a one-hour help session. Help sessions will be structured around a set of practice problems for you to test your new skills, though we welcome all questions!

Registering here will register you for all 8 lessons. You do not need to register for each individual lesson. If you decide to register for this series after the start of the course, please send us an email at ncibtep@nih.gov, and we will register you.

No experience with R is necessary to attend this course. The first few lessons will be focused on getting acquainted with R and RStudio. Moreover, you will not need to install R on your computer for this class. Instead, we will be using R through DNAnexus, a cloud platform for bioinformatics analysis. Upon registering for the class, register for a free DNAnexus account at https://www.dnanexus.com. You will need to send your username to ncibtep@nih.gov to finish setting up your DNAnexus account for course access. Even if you already have a DNAnexus account, please send your username to ncibtep@nih.gov.

In this series, you will learn how to navigate RStudio, assign objects and use functions, and clean, transform, and summarize data. The last lesson in this series will be devoted to you and your data. If you do not have your own data, we will provide a data set and practice questions for you to test your wrangling skills.

Lesson 1, June 7th, 2022, Introduction to R, RStudio, and the Tidyverse

This will be a no-coding introduction to R, RStudio, and the Tidyverse. In this lesson, we will review some of the advantages of using R for data analysis and will get you acquainted with the RStudio environment. The help session will be devoted to getting everyone connected to the course on DNAnexus.

Lesson 2, June 9th, 2022, Getting started with R

Lesson 2 will focus on some of the basics of R programming, including naming and assigning R objects, recognizing and using R functions, understanding data types and classes, and becoming familiar with R programming syntax.
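As a rough preview of these basics, here is a minimal sketch in base R. The object names and values are made up for illustration and are not taken from the course materials.

```r
# Assign values to named objects with the assignment operator <-
sample_id <- "patient_01"        # a character value
read_count <- 1520L              # an integer value (note the L suffix)
expression <- c(2.5, 3.1, 4.8)   # a numeric vector built with c()

# Call a function: its name followed by arguments in parentheses
avg_expression <- mean(expression)
rounded <- round(avg_expression, digits = 1)

# Inspect data types and classes
class(sample_id)
class(read_count)
class(expression)
is.numeric(expression)
```

Running each line in the RStudio console and checking the Environment pane is one way to see how objects are created and stored.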

Lesson 3, June 14th, 2022, Importing and reshaping data

In Lesson 3, we will learn how to import simple and complex data and how to avoid common mistakes. We will also learn how to reshape data (for example, from wide to long format) with tidyr.
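To give a sense of what reshaping from wide to long looks like, the sketch below uses tidyr's pivot_longer() on a small, hypothetical table. The data frame and its column names are assumptions chosen for illustration only.

```r
library(tidyr)

# Hypothetical wide table: one row per gene, one column per sample
wide <- data.frame(
  gene = c("geneA", "geneB"),
  sample1 = c(10, 20),
  sample2 = c(15, 25)
)

# Reshape to long format: one row per gene-sample combination
long <- pivot_longer(
  wide,
  cols = c(sample1, sample2),  # columns to gather
  names_to = "sample",         # new column holding the old column names
  values_to = "count"          # new column holding the values
)

long
```

The long table has four rows (two genes by two samples) and three columns: gene, sample, and count.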

Lesson 4, June 16th, 2022, Data Visualization with ggplot2

Lesson 4 will be a brief reprieve from data wrangling. In this lesson, we will learn the basics of plotting with ggplot2.
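For orientation, a basic ggplot2 plot might look something like the sketch below. The data frame and column names are hypothetical and only serve to show the general pattern of data, aesthetic mappings, and a geometry layer.

```r
library(ggplot2)

# Hypothetical data: expression measured at several time points
plot_data <- data.frame(
  hours = c(0, 2, 4, 6, 8),
  expression = c(1.0, 1.8, 3.1, 4.2, 4.9)
)

# Map columns to the x and y axes, then add a point layer and axis labels
p <- ggplot(plot_data, aes(x = hours, y = expression)) +
  geom_point() +
  labs(x = "Time (hours)", y = "Expression")

p  # printing the object draws the plot
```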

Lesson 5, June 21st, 2022, Introducing dplyr and the pipe

In Lesson 5, we will learn how to improve code interpretability with the pipe (%>%) from the magrittr package. We will also learn how to merge and filter data frames.
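The sketch below illustrates the idea with two small, hypothetical data frames (the names counts and metadata and their columns are assumptions). It writes the same merge-then-filter operation twice, once as nested function calls and once with the pipe, using dplyr's inner_join() and filter().

```r
library(dplyr)  # dplyr makes the %>% pipe from magrittr available

# Hypothetical tables sharing a "sample" column
counts <- data.frame(
  sample = c("s1", "s2", "s3"),
  count = c(120, 45, 300)
)
metadata <- data.frame(
  sample = c("s1", "s2", "s3"),
  treatment = c("control", "treated", "treated")
)

# Nested calls: read from the inside out
nested <- filter(inner_join(counts, metadata, by = "sample"), count > 100)

# Piped version: read from top to bottom
piped <- counts %>%
  inner_join(metadata, by = "sample") %>%
  filter(count > 100)

piped
```

Both versions keep the two samples with a count above 100 (s1 and s3); the piped form simply lists the steps in the order they happen.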

Lesson 6, June 23rd, 2022, Continue data wrangling with dplyr

In Lesson 6, we will continue to wrangle data using dplyr. This lesson will focus on functions such as group_by(), arrange(), summarize(), and mutate().
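As a small preview of how these four functions can fit together, here is a sketch on a hypothetical data frame. The table, its columns, and the choice of summaries are illustrative assumptions, not the course's own example.

```r
library(dplyr)

# Hypothetical measurements for two treatment groups
expr <- data.frame(
  treatment = c("control", "control", "treated", "treated"),
  count = c(10, 20, 40, 60)
)

result <- expr %>%
  mutate(log_count = log2(count)) %>%    # add a new column
  group_by(treatment) %>%                # define groups
  summarize(                             # one summary row per group
    mean_count = mean(count),
    mean_log_count = mean(log_count),
    n = n()
  ) %>%
  arrange(desc(mean_count))              # sort rows, largest mean first

result
```

The result has one row per treatment group, with the treated group (mean count 50) listed before the control group (mean count 15).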

Lesson 7, July 5th, 2022, Lesson Review

In Lesson 7, we will review many of the important concepts covered throughout the course.

Lesson 8, July 7th, 2022, Working with your own data

Lesson 8 will be a BYOD (bring your own data) class. You will have two hours to work on your own data and get help accordingly. If you do not have your own data, we will provide a data set and practice questions for you to test your wrangling skills.

Course materials will be updated before each lesson here.

Meeting Link: https://cbiit.webex.com/cbiit/j.php?MTID=m21dc5f9c2cb503ff6bf96ce52d57d9d5