This web page contains all the information you need to participate in the “Bioinformatics for Beginners using the Biostar Handbook” class.
As of October 2020, this class has ended. If you are interested in a future Bioinformatics for Beginners class, please send an email to ncibtep@nih.gov.
This page will remain on the BTEP web site as a reference for the topics that were covered.
Instructor: Amy Stonelake
Assistant Instructors: Peter Fitzgerald, Carl McIntosh, Des Tillo
You will receive an email from Biostars with your class login information. Please download the PDFs of the following:
- The Biostar handbook (classroom edition)
- The Art of Bioinformatics Scripting
- RNA-Seq by Example
- Coronavirus Genome Analysis
NOTE: If you have a Windows PC you need to contact service.cancer.gov for help installing the class software. Do this as soon as possible.
If you have a Mac computer, please follow the instructions at Setting up your Mac Computer for the Biostar Class
There is a Question & Answer forum for the class, at Biostar Q&A
Things you should know about this course:
- We will start with “The Biostar Handbook”.
- We will use the ONLINE version of the class materials, since the content is updated frequently. The PDFs are for reference and let you keep a local copy on your computer.
- You have access to the Biostar Handbook resources online for 6 months. The PDFs can be downloaded and kept as long as you like.
- This web page will contain all instructions and assignments.
- You are expected to make progress every week, spending at least 2-4 hours/week reading the material, doing a few exercises, and contributing to class.
- Help sessions will be held every week. At each help session, we will go over the material for the week, and answer any questions you may have. You are expected to have read the material for the week before the help session.
- You can work at your own pace on this course. You have access to all materials and what we will be learning each week. Some folks have 2 hours/week to work on the course, while others have 8 hours/week. The help sessions will follow the schedule on this page (see below).
- You will install the software on your government issued computer or an at-home computer.
Week one: Basic bioinformatics (April 3)
- For today, if you have time, please download the class materials as directed in the email from Biostars. You can also read the introductory material in the Biostar Handbook “Preface”, which covers the following topics. Don’t worry if you can’t get this done today; there is plenty of time next week to read it and to set up your computer.
- About the author
- Why bioinformatics?
- What is bioinformatics?
- Biology for bioinformaticians (most of you know this already)
- How is bioinformatics practiced?
- How to solve it
- How not to waste your time
Week two: Installation (April 6 – 10)
- Read through Part II of the Biostar Handbook, “Installation”.
- How to set up your computer (see NOTE above for Windows PC users)
- How to set up conda
- How to install software
- Choose and install a text editor (Notepad++ for Windows PC and Komodo Edit for Mac)
Week three: Unix command line (April 13 – 17)
- Read Part III of the Biostar Handbook, “Unix command line”. While you are reading, open your Terminal program on Mac, or Ubuntu on PC, and type in the commands as you learn them. This way you can see how each of the commands works. We will review the commands and answer any questions in the class sessions (see below).
- Introduction to Unix
- The Unix bootcamp
- Data analysis with Unix
- Using makefiles (skip this section for now; we will discuss makefiles later in the course)
- Data compression
- Session Two: Thursday, April 16, 1 PM, Chat Transcript
- Class WebEx recording: https://cbiit.webex.com/cbiit/ldr.php?RCID=cd589c4d6548574f452d408c5c12ebd7
- Additional Unix resources for more practice:
- The Carpentries, Data Carpentry, Genomics, Introduction to the Command Line for Genomics
- Unix Cheat Sheet (print out this handy guide for reference)
Week four: Data sources (April 20 – 25)
- Read Part IV of the Biostar Handbook, “Data Sources”.
- What is data?
- Biological data sources
- Common data types (GenBank, FASTA, FASTQ, GFF/GTF/BED, SAM/BAM, VCF)
- Human and mouse genomes
- Automating access to NCBI
- Entrez direct by example
- Session Two, Data Sources: April 23, Thursday, 1 PM, Chat Transcript
Week five: Data formats (April 27 – May 1)
- Read Part V of the Biostar Handbook, “Data formats”.
- Introduction to data formats
- The GenBank format
- The FASTA format
- The FASTQ format
- Advanced FASTQ processing
- See the GitHub pages at https://github.com/AmyStonelake/BTEP/wiki
- Here are the GitHub pages in PDF format
- Chat transcript, April 29
- April 29 Class Recording
- Chat transcript, April 30
Week six: Visualizing data
- Read Part VI of the Biostar Handbook, “Visualizing data”.
- Visualizing biological data
- Using the Integrative Genomics Viewer
To do this week, before class:
- Please download and install the Integrative Genomics Viewer (IGV) on your machine. There are Mac, PC, and Unix/Linux versions; for our class, the Mac or PC version will be fine.
- Watch the YouTube tutorials – they are just a few minutes long and will be extremely helpful to you. Try to watch all four.
- Data Navigation Basics
- Sequencing Data Basics
- RNA-Seq Data Basics (Splice Junction Track, Downsampling)
- RNA-Seq Data Advanced (Alternative Splicing, Sashimi Plots)
- Class GitHub pages
Week seven: Sequencing Data
- Read Part XI of the Biostar Handbook, “Sequencing data”.
- Sequencing coverage
- Accessing the Sequence Read Archive (SRA)
- Automating access to SRA
- How much data in the SRA?
- Class Recording May 13
- Class Recording May 14
- GitHub pages
Week eight: Quality Control
- Read Part XII of the Biostar Handbook, “Quality Control”.
- Visualizing sequencing data quality
- Sequencing quality control
- Sequencing adapter trimming
- Sequence duplication
- Advanced quality control
- Check out an online tutorial on YouTube
- Class recording (May 21)
- Class chat transcripts, May 20 and May 21
- Class GitHub pages
Week nine: Sequence Patterns (postponed to week of June 1 – June 5)
- Read Part XIV of the Biostar Handbook, “Sequence Patterns”
- Sequence patterns
- Regular expressions
- Sequence k-mers. If you’re having trouble running the “jellyfish” program, please see the GitHub page.
- Recording June 4
- Chat transcript June 4
- GitHub pages
Week ten: Sequencing instruments (Week of June 8 – 12)
- Read Part X of the Biostar Handbook, “Sequencing Instruments”.
- Sequencing instruments
- Illumina sequencers
- PacBio sequencers
- MinION sequencers
- Sequencing data preparation
- Please watch this Illumina video before class.
- Chat transcript (Biostar class), June 10
- Sequencing Instruments slides
- Sequencing instruments GitHub page
- WebEx recording, June 11
Week eleven: Sequence Alignments (Week of June 15 – 19)
- Read Part XV of the Biostar Handbook, “Alignments”.
- Introduction to alignments
- Global and local alignments
- Misleading alignments
- GitHub page (16. Sequence Alignments)
- Class Recording, June 18
Week twelve: Review session (Week of June 22-26)
We will review:
- Unix commands
- Decompressing files
- Data types (FASTA, FASTQ, SAM/BAM)
- Quality control of RNA-Seq data (FastQC, MultiQC, adapter trimming)
- Alignment of RNA-Seq data to genome with bowtie2
- Visualization of RNA-Seq data (Integrative Genomics Viewer, IGV)
- Class Github page: https://github.com/AmyStonelake/BTEP/wiki/17.-Review
- Class Recording, June 25
Week thirteen: No class (Week of June 29 – July 2)
Week fourteen: BLAST (Week of July 6 – 10)
- Read Part XVI of the Biostar Handbook, “BLAST”.
- BLAST: Basic Local Alignment Search Tool
- BLAST use cases
- BLAST databases
- GitHub page, BLAST
- Class recording, July 9
Week fifteen: Short Read Aligners (Week of July 13 – 17)
- Read Part XVII of the Biostar Handbook, “Short Read Aligners”
- Short read aligners
- The bwa aligner
- The bowtie aligner
- How do I compare aligners?
- Multiple sequence alignment
- GitHub page, Short Read Aligners
- GitHub page, Multiple Sequence Aligners
- Class Recording July 16
Week sixteen: SAM/BAM Format
- Read Part XVIII of the Biostar Handbook, “SAM/BAM Format”.
- The SAM/BAM/CRAM format
- How to make a BAM file
- The SAM format explained
- Working with BAM files
- The SAM reference sheet
- Sequence variation in SAM files
- Class slides, SAM/BAM/CRAM formats
- GitHub, SAM and BAM formats
- Recording, July 30
Week seventeen: Genomic Variation and Variation Calling (Week of Aug 10 – 14)
- Read Part XIX of the Biostar Handbook, “Genomic Variation”.
- An introduction to genomic variation
- Online Mendelian Inheritance in Man (OMIM)
- Why simulating reads is a good idea
- Visualizing large scale genomic variations
- Read Part XX of the Biostar Handbook, “Variation Calling”.
- Introduction to variant calling
- Variant calling example
- Multi-sample variant calling
- Variant normalization
- The Variant Call Format (VCF)
- Filtering information in VCF files
- Variant annotation and effect prediction
- Why is variant calling challenging?
- Thursday, August 13, 1 PM Recording
- GitHub page
- Chat transcript
RNA-Seq session scheduled for Tuesday, Oct 27, 1 – 3 PM
See the following GitHub pages:
- RNA Seq Example
- RNA Seq and Gene Enrichment Analysis
- Classification based RNA Seq of control samples