Which large data sets like Tabula Muris are available for cell type identification? What are the tools I can use to identify different cell types in my data? How do these tools work?
A very widely used and versatile R package for cell type identification is SingleR. SingleR computes Spearman correlations between the expression profile of each cell in your data and the reference expression profile of each cell type. Reference data sets include ImmGen (for mouse) and, for human, the Human Primary Cell Atlas and the combined Blueprint+ENCODE data sets. For each cell in your data set, SingleR assigns the highest-scoring cell type as the predicted label. SingleR also offers options to check the robustness of these predictions and to remove (prune) low-quality labels, for example when several cell types score similarly for some cells in your data.
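To make the scoring idea concrete, here is a minimal Python sketch (standard library only) of correlation-based annotation: rank-correlate one cell against each reference profile and keep the best label. This is a simplified illustration, not SingleR's implementation. SingleR restricts the correlation to marker genes, aggregates scores across many reference samples per label, and runs a fine-tuning step. The `min_delta` cutoff is a hypothetical stand-in for its pruning, and the gene names and expression values are made up.

```python
def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # constant profile: correlation undefined, treat as 0
    return cov / (vx * vy) ** 0.5


def annotate(cell, reference, min_delta=0.05):
    """Score one cell against every reference profile and pick the best.

    cell: expression values over a fixed gene order.
    reference: dict of label -> profile over the same genes.
    min_delta (hypothetical): if the best and second-best scores differ by
    less than this, the label is treated as ambiguous and None is returned.
    """
    scores = {label: spearman(cell, prof) for label, prof in reference.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    if len(ranked) > 1 and scores[best] - scores[ranked[1]] < min_delta:
        return None, scores
    return best, scores


# Toy data; genes in order: CD3E, CD79A, LYZ, ACTB (values are made up)
reference = {
    "T cell": [9, 1, 0, 5],
    "B cell": [1, 8, 0, 4],
    "Monocyte": [0, 1, 9, 3],
}
label, scores = annotate([7, 0, 1, 6], reference)
print(label, round(scores["T cell"], 2))  # T cell 0.8
```

Using ranks rather than raw values is what makes the comparison robust to differences in scale between your data and the reference (e.g. different sequencing depth or platform).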
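For other reference data sets, such as Tabula Muris, you can use Seurat’s reference-based integration and label transfer approach described here: https://satijalab.org/seurat/v3.2/integration.html. In this approach, the annotated reference data set guides the integration. Seurat finds "anchors" (pairs of corresponding cells between the reference and your query data) and uses them to transfer the reference cell type labels to your cells; in Seurat v3 this is done with the FindTransferAnchors and TransferData functions.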
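The final "borrow labels from the most similar reference cells" step can be illustrated with a short Python sketch. Note that this swaps in a plain k-nearest-neighbour majority vote: it does not build anchors (mutual nearest neighbours in a shared CCA or PCA space) or weight them the way Seurat does. The sketch assumes query and reference cells are already embedded in a shared low-dimensional space. The coordinates, labels and `k` value are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two vectors (0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def transfer_labels(query, ref_cells, ref_labels, k=3):
    """Label each query cell by majority vote of its k most similar reference cells.

    Returns (label, fraction of votes) per query cell; the vote fraction is a
    rough confidence, loosely analogous to a prediction score.
    """
    predictions = []
    for q in query:
        neighbours = sorted(
            zip(ref_cells, ref_labels),
            key=lambda pair: cosine(q, pair[0]),
            reverse=True,
        )[:k]
        votes = Counter(label for _, label in neighbours)
        label, count = votes.most_common(1)[0]
        predictions.append((label, count / len(neighbours)))
    return predictions


# Hypothetical 2-D embedding of an annotated reference and two query cells
ref_cells = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]
ref_labels = ["T cell", "T cell", "T cell", "B cell", "B cell"]
preds = transfer_labels([[1.0, 0.02], [0.05, 1.0]], ref_cells, ref_labels, k=3)
print(preds)  # first cell -> T cell (3/3 votes), second -> B cell (2/3 votes)
```

Anchors exist precisely because raw nearest neighbours across data sets can be dominated by batch effects. The vote above is only meaningful once the two data sets share a batch-corrected space.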
Seurat also offers another route to cell type identification through its AddModuleScore function. You provide gene sets characteristic of different cell types, and Seurat computes a score per gene set for every cell. The score is the average expression of the set minus the average expression of randomly selected control genes with similar overall expression. You can then assign each cell the cell type with the highest score. Seurat uses the same function to assign cell cycle phases. Its CellCycleScoring function calls AddModuleScore with canonical S-phase and G2/M-phase markers, and cells that score below zero for both are labelled G1.
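A minimal Python sketch of the module-score idea follows. It simplifies control-gene selection. Seurat bins all genes by average expression (24 bins by default) and randomly samples control genes (100 per gene by default) from the same bins. Here, each signature gene instead gets the `n_ctrl` non-signature genes with the closest mean expression. The phase rule mirrors our understanding of CellCycleScoring, including an "Undecided" label for exact ties. Gene names, signatures and values are made up.

```python
def module_score(expr, gene_set, n_ctrl=1):
    """Per-cell average expression of gene_set minus average of control genes.

    expr: dict gene -> list of per-cell expression values.
    Controls (simplification): for each gene in the set, the n_ctrl genes
    outside the set whose mean expression is closest to it.
    """
    n_cells = len(next(iter(expr.values())))
    mean = {g: sum(v) / n_cells for g, v in expr.items()}
    others = [g for g in expr if g not in gene_set]
    ctrl = set()
    for g in gene_set:
        ctrl.update(sorted(others, key=lambda o: abs(mean[o] - mean[g]))[:n_ctrl])
    return [
        sum(expr[g][c] for g in gene_set) / len(gene_set)
        - sum(expr[g][c] for g in ctrl) / len(ctrl)
        for c in range(n_cells)
    ]


def assign_cell_types(expr, signatures, n_ctrl=1):
    """Score every signature and give each cell its highest-scoring type."""
    scores = {name: module_score(expr, genes, n_ctrl) for name, genes in signatures.items()}
    n_cells = len(next(iter(expr.values())))
    labels = [max(scores, key=lambda name: scores[name][c]) for c in range(n_cells)]
    return labels, scores


def cell_cycle_phase(s_score, g2m_score):
    """G1 if both scores are negative, otherwise the higher-scoring phase."""
    if s_score < 0 and g2m_score < 0:
        return "G1"
    if s_score == g2m_score:
        return "Undecided"
    return "S" if s_score > g2m_score else "G2M"


# Toy data: 4 cells; HK1/HK2 stand in for housekeeping genes
expr = {
    "CD3E": [8, 0, 0, 0],
    "CD3D": [7, 1, 0, 0],
    "MS4A1": [0, 20, 0, 0],
    "CD79A": [0, 16, 4, 0],
    "HK1": [2, 2, 2, 2],
    "HK2": [5, 5, 5, 5],
}
signatures = {"T cell": {"CD3E", "CD3D"}, "B cell": {"MS4A1", "CD79A"}}
labels, scores = assign_cell_types(expr, signatures)
print(labels)  # ['T cell', 'B cell', 'T cell', 'T cell']
```

The last cell expresses none of the markers, yet it is still labelled "T cell" because it merely scores least badly. In practice a minimum-score cutoff (analogous to calling G1 when both cell cycle scores are negative) helps avoid such forced assignments.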