ncibtep@nih.gov

Bioinformatics Training and Education Program

Class: Bioinformatics for Beginners using the Biostar Handbook

This web page contains all the information you need to participate in the “Bioinformatics for Beginners using the Biostar Handbook” class.

As of October 2020, this class has ended. If you are interested in a future Bioinformatics for Beginners class, please send an email to ncibtep@nih.gov.

This page will remain on the BTEP web site as a reference for the topics that were covered.

Instructor: Amy Stonelake

Assistant Instructors: Peter Fitzgerald, Carl McIntosh, Des Tillo

You will receive an email from Biostars with your class login information. Please download the PDFs for all of the following:

  1. The Biostar handbook (classroom edition)
  2. The Art of Bioinformatics Scripting
  3. RNA-Seq by Example
  4. Coronavirus Genome Analysis

NOTE: If you have a Windows PC you need to contact service.cancer.gov for help installing the class software. Do this as soon as possible.

If you have a Mac computer, please follow the instructions at Setting up your Mac Computer for the Biostar Class

There is a Question & Answer forum for the class, at Biostar Q&A

Things you should know about this course:

  1. We will start with “The Biostar Handbook”.
  2. We will use the ONLINE version of the class since the content gets updated frequently. The PDFs are for your reference and to keep a local copy on your computer.
  3. You have access to the Biostar Handbook resources online for 6 months. The PDFs can be downloaded and kept as long as you like.
  4. This web page will contain all instructions and assignments.
  5. You are expected to make progress every week, spending at least 2-4 hours/week reading the material, doing a couple of exercises, and contributing to class.
  6. There is a Question & Answer forum for the class, at Biostar Q&A
  7. Help sessions will be held every week. At each help session, we will go over the material for the week, and answer any questions you may have. You are expected to have read the material for the week before the help session.
  8. You can work at your own pace on this course. You have access to all materials and what we will be learning each week. Some folks have 2 hours/week to work on the course, while others have 8 hours/week. The help sessions will follow the schedule on this page (see below).
  9. You will install the software on your government-issued computer or an at-home computer.

 Week one: Basic bioinformatics (April 3)

  1. For today, if you have time, please download the class materials as directed in the email from Biostars. You can also read the introductory material in the Biostar Handbook “Preface”, which covers the following topics. Don’t worry if you can’t get this done today; there is plenty of time next week to read this and do the computer setup.
    • About the author
    • Why bioinformatics?
    • What is bioinformatics?
    • Biology for bioinformaticians (most of you know this already)
    • How is bioinformatics practiced?
    • How to solve it
    • How not to waste your time

 

Week two: Installation (April 6 – 10)

  1. Read through Part II of the Biostar Handbook, “Installation”.
    • How to set up your computer (see NOTE above for Windows PC users)
    • How to set up conda
    • How to install software
    • Choose and install a text editor (Notepad++ for Windows PC and Komodo Edit for Mac)

Week three: Unix command line (April 13 – 17)

  1. Read Part III of the Biostar Handbook, “Unix command line”. While you are reading, open your Terminal program on Mac, or Ubuntu on PC, and type in the commands as you learn them. This way you can see how each of the commands works. We will review the commands and answer any questions in the class sessions (see below).
    • Introduction to Unix
    • The Unix bootcamp
    • Data analysis with Unix
    • Using makefiles – skip this section for now; we can discuss makefiles later in the course
    • Data compression

Week four: Data sources (April 20 – 25)

  1. Read Part IV of the Biostar Handbook, “Data Sources”.
    • What is data?
    • Biological data sources
    • Common data types (GenBank, FASTA, FASTQ, GFF/GTF/BED, SAM/BAM, VCF)
    • Human and mouse genomes
    • Automating access to NCBI
    • Entrez direct by example

Week five: Data formats (April 27 – May 1)

  1. Read Part V of the Biostar Handbook, “Data formats”.
    • Introduction to data formats
    • The GenBank format
    • The FASTA format
    • The FASTQ format
    • Advanced FASTQ processing
  2. See the GitHub pages at https://github.com/AmyStonelake/BTEP/wiki
  3. Here are the GitHub pages in PDF format
    1. Retrieving data from NCBI with E Utilities · AmyStonelake:BTEP Wiki
    2. Decompressing files with the tar command · AmyStonelake:BTEP Wiki
    3. Bulk RNA Seq test data · AmyStonelake:BTEP Wiki
    4. Working with FASTQ and FASTA data · AmyStonelake:BTEP Wiki
  4. Chat transcript, April 29
  5. Class recording, April 29
  6. Chat transcript, April 30
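
As a small illustration of the FASTQ format covered this week, here is a minimal Python sketch (not taken from the Biostar Handbook or the class GitHub pages). It assumes the simple four-lines-per-record layout and Phred+33 quality encoding; the function name `read_fastq` and the example read are made up for illustration.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, scores) from a four-lines-per-record FASTQ stream.

    Assumptions: sequences are not wrapped across lines, and qualities
    use the Phred+33 encoding (score = ASCII code - 33).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"expected '@' at start of record, got {header!r}")
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


# A made-up record, held in memory instead of a file.
example = io.StringIO("@read1\nGATTACA\n+\nIIIII##\n")
records = list(read_fastq(example))
for name, seq, scores in records:
    print(name, seq, scores)
```

In the example, the character `I` decodes to a quality of 40 and `#` to a quality of 2, which is why the last two bases of the read would be considered low quality.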

Week six: Visualizing data

  1. Read Part VI of the Biostar Handbook, “Visualizing data”.

To do this week before class

  1. Please download and install the Integrative Genomics Viewer on your machine. There is a Mac version, a PC version, and a Unix/Linux version. For our class, the Mac or PC version will be fine.
  2. Watch the YouTube tutorials – they are just a few minutes long and will be extremely helpful to you. Try to watch all four.
    1. Data Navigation Basics
    2. Sequencing Data Basics
    3. RNA-Seq Data Basics (Splice Junction Track, Downsampling)
    4. RNA-Seq Data Advanced (Alternative Splicing, Sashimi Plots)
  3. Class GitHub pages

Week seven: Sequencing Data

  1. Read Part XI of the Biostar Handbook, “Sequencing data”.

Week eight: Quality Control

  1. Read Part XII of the Biostar Handbook, “Quality Control”.
      • Visualizing sequencing data quality
      • Sequencing quality control
      • Sequencing adapter trimming
      • Sequence duplication
      • Advanced quality control
  2. Check out an online tutorial at YouTube
  3. Class recording (May 21)
  4. Class chat transcript May 20 and chat transcript May 21
  5. Class GitHub pages

Week nine: Sequence Patterns (postponed to week of June 1 – June 5)

  1. Read Part XIV of the Biostar Handbook, “Sequence Patterns”.
    • Sequence patterns
    • Regular expressions
    • Sequence k-mers. If you’re having trouble running the “jellyfish” program, please see the GitHub page.
  2. Recording June 4
  3. Chat transcript June 4
  4. GitHub pages
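
To make the idea of sequence k-mers concrete, here is a minimal Python sketch of counting overlapping k-mers. It is only an illustration and is not how the “jellyfish” program works internally; for example, it does not merge a k-mer with its reverse complement. The function name `count_kmers` and the example sequence are made up.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    sequence = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ATGATGA", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

For the seven-base example with k = 3, there are five overlapping 3-mers: ATG and TGA each occur twice and GAT occurs once.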

 

Week ten: Sequencing instruments (Week of June 8 – 12)

  1. Read Part X of the Biostar Handbook, “Sequencing Instruments”.
      • Sequencing instruments
      • Illumina sequencers
      • PacBio sequencers
      • MinION sequencers
      • Sequencing data preparation
  2. Please watch this Illumina video before class.
  3. Chat transcript, June 10
  4. Sequencing Instruments slides
  5. Sequencing instruments GitHub page
  6. WebEx recording, June 11

Week eleven: Sequence Alignments (Week of June 15 – 19)

  1. Read Part XV of the Biostar Handbook, “Alignments”.
    • Introduction to alignments
    • Global and local alignments
    • Misleading alignments
  2. GitHub page (16. Sequence Alignments)
  3. Class Recording, June 18

Week twelve: Review session (Week of June 22-26)

We will review:

  1. Unix commands
  2. Decompressing files
  3. Data types (FASTA, FASTQ, SAM/BAM)
  4. Quality control of RNA-Seq data (FastQC, MultiQC, adapter trimming)
  5. Alignment of RNA-Seq data to genome with bowtie2
  6. Visualization of RNA-Seq data (Integrative Genomics Viewer, IGV)

Week thirteen (no class week of June 29 – July 2)

Week fourteen: BLAST (Week of July 6 – 10)

  1. Read Part XVI of the Biostar Handbook, “BLAST”.
    • BLAST: Basic Local Alignment Search Tool
    • BLAST use cases
    • BLAST databases
  2. GitHub page, BLAST
  3. Class recording, July 9

Week fifteen: Short Read Aligners (Week of July 13 – 17)

  1. Read Part XVII of the Biostar Handbook, “Short Read Aligners”.
    • Short read aligners
    • The bwa aligner
    • The bowtie aligner
    • How do I compare aligners?
    • Multiple sequence alignment

Week sixteen: SAM/BAM Format

  1. Read Part XVIII of the Biostar Handbook, “SAM/BAM Format”.
    • The SAM/BAM/CRAM format
    • How to make a BAM file
    • The SAM format explained
    • Working with BAM files
    • The SAM reference sheet
    • Sequence variation in SAM files
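
One part of the SAM format that is easy to practice on is the FLAG column, where each bit of the number records one property of the alignment. The Python sketch below decodes a FLAG value using the bit values defined in the SAM specification. The short labels such as `proper_pair` and the function name `decode_flag` are made up for illustration and are not official names.

```python
# Bit values come from the SAM specification; the labels are informal.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


for value in (99, 147, 4):
    print(value, decode_flag(value))
```

For example, 99 = 1 + 2 + 32 + 64, so a read with FLAG 99 is paired, part of a proper pair, has its mate on the reverse strand, and is the first read of the pair.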

 

Week seventeen: Genomic Variation and Variation Calling (Week of Aug 10-14)

  1. Read Part XIX of the Biostar Handbook, “Genomic Variation”.
    • An introduction to genomic variation
    • Online Mendelian Inheritance in Man (OMIM)
    • Why simulating reads is a good idea
    • Visualizing large scale genomic variations
  2. Read Part XX of the Biostar Handbook, “Variation Calling”.
    • Introduction to variant calling
    • Variant calling example
    • Multi-sample variant calling
    • Variant normalization
    • The Variant Call Format (VCF)
    • Filtering information in VCF files
    • Variant annotation and effect prediction
    • Why is variant calling challenging?

RNA-Seq session scheduled for Tuesday, Oct 27, 1 – 3 PM

Recording

See the following GitHub pages

  1. RNA Seq Example
  2. RNA Seq and Gene Enrichment Analysis
  3. Classification based RNA Seq of control samples
