Since reads are mapped to unique regions of the genome, what happens with data from repeat regions such as long non-coding RNA, LINEs, etc? Will any reads be mapped to these regions?
stonelakeak@nih.gov Staff asked 5 months ago

1 Answer
stonelakeak@nih.gov Staff answered 5 months ago

It depends on the type of repeat region and on how you decide to handle blacklisted regions. Simple repetitive regions and transposable elements such as LINEs vary little between copies, so reads from them often align equally well to many locations (multimapping). Because you cannot accurately tell which copy in the genome such a read came from, multimapped reads tend to cause problems at several steps of ChIP-seq processing, and it is often better to remove them from the traditional analysis. Long non-coding RNAs, on the other hand, are much more degenerate, so their reads are more likely to map uniquely, and they do not tend to fall in blacklisted regions. If you are setting up a workflow yourself, you can remove multimapped reads during the mapping step or at any later stage of the pipeline. – answered by Tovah Markowitz, Paul Schaughency, Vishal Koparde.
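If you want to see what removing these reads at a later stage could look like, here is a minimal sketch in plain Python that works on SAM-formatted text lines. It is an illustration under stated assumptions, not the method used by any particular pipeline: the function names, the MAPQ cutoff of 30, and the example blacklist interval are all hypothetical. It relies on the fact that many aligners give multimapped reads a low MAPQ, and it approximates each read's span by its sequence length rather than parsing the CIGAR string.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Blacklist as {chromosome: [(start, end), ...]}, 0-based half-open (BED-style).
Blacklist = Dict[str, List[Tuple[int, int]]]


def overlaps_blacklist(chrom: str, start: int, end: int, blacklist: Blacklist) -> bool:
    """Return True if [start, end) overlaps any blacklisted interval on chrom."""
    return any(start < b_end and end > b_start
               for b_start, b_end in blacklist.get(chrom, []))


def filter_sam_lines(lines: Iterable[str], blacklist: Blacklist,
                     min_mapq: int = 30) -> Iterator[str]:
    """Yield header lines plus reads that pass the MAPQ and blacklist filters.

    min_mapq=30 is an assumed cutoff; choose one that suits your aligner.
    """
    for line in lines:
        if line.startswith("@"):          # keep SAM header lines untouched
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, mapq, seq = fields[2], int(fields[3]), int(fields[4]), fields[9]
        if chrom == "*" or mapq < min_mapq:
            continue                      # unmapped or likely multimapped read
        start = pos - 1                   # SAM POS is 1-based
        end = start + len(seq)            # approximation: ignores indels and clipping
        if overlaps_blacklist(chrom, start, end, blacklist):
            continue                      # read falls in a blacklisted region
        yield line


# Hypothetical example data: one unique read, one multimapper, one blacklisted read.
demo_lines = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t101\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t0\tchr1\t201\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read3\t0\tchr1\t501\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
demo_blacklist: Blacklist = {"chr1": [(500, 600)]}

kept = list(filter_sam_lines(demo_lines, demo_blacklist))
for kept_line in kept:
    print(kept_line.split("\t")[0])
```

In this toy input only the header and read1 survive: read2 is dropped for its MAPQ of 0 and read3 for overlapping the blacklisted interval. In a real workflow you would more likely apply the same two filters with established tools on BAM files rather than parsing SAM text by hand.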