Welcome to the NCI/CCR Dataquest class!

This class supports learners who have licenses to the Dataquest.io platform, available through CCR. If you are interested in obtaining a Dataquest license and joining the class, please send an email to ncibtep@nih.gov.

We will be posting material here to help you navigate the Dataquest platform and apply the skills you are learning to biological problems.

The Question and Answer (Q & A) forum is where you can post any questions you have as you work on Dataquest. Learners can both ask questions and post answers for other learners.

Class Sessions

Session One: May 7, 2020 @ 10 AM, Recording

Session Two: June 9 @ 1 PM, Recording and chat transcript

Session Three: June 23 @ 1 PM, Recording

Topic: Finding your way in Dataquest and Beyond. We’ll talk about the Dataquest platform and next steps to apply what you’ve learned.

Session Four: August 4, Tues @ 2 PM, Recording
Topic: Practical Python Programming by Example. Using the exercise of converting a nucleotide sequence into an amino acid sequence, Peter FitzGerald will address issues encountered when writing real code to solve real-world problems.

Example code is available here: programs.tar
Slides are available as a PDF: Programming class Slides
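
For learners who want a feel for the Session Four exercise before opening the class materials, here is a minimal sketch of converting a nucleotide sequence into an amino acid sequence. It is not the code from programs.tar; the function name `translate`, the `stop_at_stop` parameter, and the use of "X" for unrecognized codons are assumptions made for illustration. It uses the standard genetic code and reads only the first reading frame.

```python
# Illustrative sketch only; not the class's example code.
# Standard genetic code, listed for all 64 codons in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build a codon -> amino acid lookup table ("*" marks a stop codon).
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(sequence, stop_at_stop=True):
    """Translate a DNA or RNA sequence, reading codons from the first base."""
    # Normalize: upper case, treat RNA as DNA, drop spaces and line breaks.
    seq = "".join(sequence.upper().replace("U", "T").split())
    protein = []
    # Step through complete codons; leftover trailing bases are ignored.
    for start in range(0, len(seq) - 2, 3):
        codon = seq[start:start + 3]
        # Codons with ambiguous or unexpected letters become "X" (assumption).
        amino_acid = CODON_TABLE.get(codon, "X")
        if amino_acid == "*" and stop_at_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```

Real sequences raise the kinds of issues the session covers, such as mixed case, line breaks, ambiguous bases, and lengths that are not a multiple of three; the sketch handles each of these in the simplest way it can.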

Biostar/Dataquest Learners

For users who are taking both the Biostars and Dataquest classes: installing Jupyter Notebook the way Dataquest shows you will disable your Biostars class bioinformatics environment. Please follow these instructions to create a separate Dataquest environment.

  1. Set up an environment for the Dataquest course
    • conda create -n dataquest python=3.7
  2. Install jupyter in the Dataquest environment:
    • conda activate dataquest
    • conda install jupyter -y
  3. To open the jupyter notebook from the Dataquest environment, run:
    • jupyter notebook
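
After following the steps above, you may want to confirm that a notebook or script is really running from the separate environment. The sketch below is one possible check, assuming the environment is named `dataquest` as in step 1 and that conda sets the `CONDA_DEFAULT_ENV` variable on activation; the helper names `active_conda_env` and `in_expected_env` are made up for this example.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if unset."""
    environ = os.environ if environ is None else environ
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV")


def in_expected_env(expected="dataquest", environ=None):
    """Check whether the active conda environment matches the expected name."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    print("Python executable:", sys.executable)
    print("Active conda environment:", active_conda_env())
    if not in_expected_env():
        print("Not in the dataquest environment; run: conda activate dataquest")
```

Run it in a terminal after `conda activate dataquest`, or paste the functions into a notebook cell and call `in_expected_env()`.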


    1. Rosalind.info: a platform for learning bioinformatics through problem solving. Enroll in the class here.
    2. Python for Biologists
      • Be sure to check out the Programming Articles: Sequence Similarity Search, Counting Bases in a Sequence, 29 Common Beginner Errors on One Page
    3. Please take the class survey here

Unix Resources

      1. Here is a link to a Unix cheat sheet. It is a great reference for learning Unix commands.
      2. Unix bootcamp: an overview of basic Unix commands
      3. Data Carpentry course, “Introduction to the Command Line for Genomics”
      4. Instructions for “Logging into Biowulf” from your Mac or PC.

You can find Dataquest at https://dataquest.io

The Course Catalog lists 62 courses and 4 Paths:

      1. Data Analyst in Python
      2. Data Scientist in Python
      3. Data Engineer
      4. Data Analyst in R (here are the recommended courses, in order)
        • Introduction to Programming in R
        • Intermediate R Programming
        • Data Visualization in R
        • Data Cleaning in R
        • Data Cleaning in R: Advanced
        • SQL Fundamentals
        • Intermediate SQL in R
        • Statistics Fundamentals in R
        • Statistics Intermediate in R: Averages and Variability
        • Probability: Fundamentals in R
        • Conditional Probability in R
        • Hypothesis Testing in R
        • Linear Regression Modeling in R
        • Machine Learning Fundamentals in R

Each path has a recommended list of courses, and every course has learning objectives, known as “Course Missions”.

However, you can take whichever courses you like, in whatever order you prefer. We can offer guidance on which courses to take depending on your goals.

For example, if you want to learn Unix so you can work on Biowulf, the NIH high-performance Unix cluster, we recommend that you take both of these courses:

      • Elements of the Command Line
      • Text Processing in the Command Line

If you want to learn R so you can work with Seurat, the single-cell RNA-Seq analysis package, we recommend you start here:

      • Introduction to Programming in R
      • Intermediate R Programming
      • Data Visualization in R