Category: Single-Cell RNA-Seq
A few experimental design-related questions on single-cell RNA-seq
yinji@nih.gov asked 3 months ago
  1. We are planning to multiplex ~8 solid tissue samples (normal human lung) for scRNA-seq using natural barcodes, and we have a few experimental design-related questions.
    1. The main optimization we need is to keep the integrity of all the cell types in the tissue throughout the fresh-tissue dissociation, freezing, thawing, and multiplexing procedure. We are concerned that epithelial cell types (the type we are most interested in) are especially vulnerable to the freeze-thaw process. What would you recommend optimizing to achieve this goal? For example, we are looking into adjusting the enzyme content/concentration to make the dissociation step less harsh on epithelial populations, as well as testing different freezing media.
    2. To achieve cell viability of cryopreserved cells suitable for 10X 3’ sequencing, would it be worth comparing a commercial dead cell removal kit vs. FACS sorting directly into the 10X library prep for removing dead cells after thawing?
    3. What considerations should go into pilot testing for an efficient way to process 8 samples at the same time to minimize cell loss during the potentially long hands-on time?
    4. We are aiming to multiplex 8 samples based on the feasibility stated in the literature, but do you have experience with how feasible it is in reality? E.g. should we consider testing 4 samples to maximize the success of sample identity deconvolution?
    5. We are planning a pilot experiment of 10X 3’ RNA for a couple of 8-plexed reactions before we expand to larger sample numbers. In addition to the standard QC metrics (e.g. viability, number of detected cells and genes per cell, mitochondrial content, etc.), what would you recommend paying attention to in order to determine the quality of the data?
  2. In general, the validity of cell clusters based on gene expression might be hard to assess. What are the methods you recommend to validate the quality of our final results? (e.g. mapping to public clustering data of the same tissue type, matching to flow cytometry data for select markers, CITE-seq for a limited antibody profile)
6 Answers
kellymc@nih.gov answered 3 months ago

“The main optimization we need is to keep the integrity of all the cell types in the tissue throughout the fresh-tissue dissociation, freezing, thawing, and multiplexing procedure. We are concerned that epithelial cell types (the type we are most interested in) are especially vulnerable to the freeze-thaw process. What would you recommend optimizing to achieve this goal? For example, we are looking into adjusting the enzyme content/concentration to make the dissociation step less harsh on epithelial populations, as well as testing different freezing media.”
As your question rightly emphasizes – optimization of sample preparation (and handling) is crucially important for getting meaningful data from your single-cell sequencing assay. If you don’t have the cells going into the assay, you will not be able to profile them. Worse yet, if you don’t know you lost them, you risk drawing conclusions from a skewed cell composition. In this case, you know which cells are important in your samples and are concerned that you may lose them in the processing. To assess whether your protocol is optimized for retention of a particular cell type, we usually recommend that folks with access to flow cytometry and a reliable flow panel use it to identify the composition of cell types and their viability. This ends up being a strong indicator of what you will see in the single-cell sequencing data. If you have other ways to mark the cells of interest, simple cell counting or qualitative assessments on a standard microscope are also helpful.
Many dissociation methods / kits / avenues exist, with there often being a trade-off between robust dissociation of the sample, with complete separation of troublesome cell clumps (which would otherwise get removed by filtering), and the effect the dissociation process has on the viability and transcriptional state of the cells. When the dissociation of whole cells becomes incompatible with assaying the biological state of the cells you are targeting, you should also consider single-nuclei extraction – some would argue that this can allow a more comprehensive survey of cells. There are trade-offs here too, but it is something to at least consider if the ideal dissociation is a never-ending quest.
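As a rough illustration of the retention check described above, here is a small sketch that compares cell-type proportions from flow counts before and after freeze-thaw and flags populations that drop sharply (all sample names and numbers are invented for illustration):

```python
# Sketch: flag cell types depleted across freeze-thaw, using
# hypothetical flow cytometry counts (all numbers are invented).
fresh  = {"epithelial": 4200, "immune": 3100, "endothelial": 900, "fibroblast": 800}
thawed = {"epithelial": 1200, "immune": 2800, "endothelial": 700, "fibroblast": 650}

def proportions(counts):
    """Convert raw per-cell-type counts into fractions of the total."""
    total = sum(counts.values())
    return {ct: n / total for ct, n in counts.items()}

fresh_p, thawed_p = proportions(fresh), proportions(thawed)
for ct in fresh:
    fold = thawed_p[ct] / fresh_p[ct]
    flag = "  <-- check this population" if fold < 0.5 else ""
    print(f"{ct:12s} fresh {fresh_p[ct]:.2f} -> thawed {thawed_p[ct]:.2f} ({fold:.2f}x){flag}")
```

The 0.5-fold threshold is arbitrary; the point is simply to compare compositions, not absolute counts, since total recovery will always drop after thawing.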

kellymc@nih.gov answered 3 months ago

“To achieve cell viability of cryopreserved cells suitable for 10X 3’ sequencing, would it be worth comparing a commercial dead cell removal kit vs. FACS sorting directly into the 10X library prep for removing dead cells after thawing?”
Column-based dead cell removal (or any enrichment / depletion for that matter) is only an enrichment – don’t expect to go from a 10% viable sample to a 95%+ viable sample (unless maybe you did multiple rounds). The vendors may argue with us on that, but although we’ve seen some impressive improvements in the percentage of viable cells in the output sample, it’s not 100%.
Flow-based approaches, with stringent viability settings, will give you much higher purity… at the time the cell passes the detector. The viability you have in the tube following the sort can be quite different, as cells that were compromised during the sort can begin to die. This depends on how ‘fragile’ your cells are, how aggressive the sort is (nozzle size, pressure, etc.), and things like what is in the buffer you sort into. There is a chance that even if cell viability is maintained, the flow sorting process can ‘stress’ the cells and sometimes results in a decrease in the sensitivity of transcript detection for scRNA-seq (possibly because some transcripts are lost if the membrane is transiently compromised during sorting).
Flow sorting has the benefit of being able to ‘clean up’ the sample of some non-cellular debris as well. Of course, you (and your friendly flow core staff) don’t want to use a messy sample with large debris as input, but in general samples coming off flow can be a bit easier to work with in many ways (with the caveats stated above).
Bottom line – you can compare, but I would say it is more important to consider whether you want purity with potentially harsher handling versus an enrichment of viable cells with a potentially gentler process. It is also important to keep in mind that any selection or processing can influence the cell-type composition of the sample in ways you didn’t anticipate. If you go from a 30% viable sample to an 80% viable sample by passing through a dead cell removal column, are the cells you removed a particular cell type?

kellymc@nih.gov answered 3 months ago

“What considerations should go into pilot testing for an efficient way to process 8 samples at the same time to minimize cell loss during the potentially long hands-on time?”
This is definitely a challenge. If the samples need to go through extensive processing, it is an important consideration. We’ve had some groups use a divide-and-conquer model and others use a factory-line model. The divide-and-conquer model has the very real likelihood that differences in sample handling will show up in the resulting data as a ‘batch effect’.
We don’t have a perfect answer for this, and some of it depends on the details of what you need to do in the processing. Keep the samples on ice as much as possible, work efficiently, and try to reduce variation in handling that is correlated with the major comparisons you want to run during analysis of the data.
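One concrete way to keep handling differences from lining up with the comparisons you care about is to interleave samples across processing batches rather than processing one group at a time. A minimal sketch, with hypothetical sample names and groups:

```python
import random

# Hypothetical: 8 samples in two comparison groups, processed in two batches.
samples = {f"S{i}": ("groupA" if i <= 4 else "groupB") for i in range(1, 9)}

def balanced_batches(samples, n_batches=2, seed=0):
    """Split samples into processing batches so each comparison group is
    spread evenly, instead of handling one group entirely in one batch."""
    rng = random.Random(seed)
    by_group = {}
    for s, g in samples.items():
        by_group.setdefault(g, []).append(s)
    batches = [[] for _ in range(n_batches)]
    for members in by_group.values():
        rng.shuffle(members)          # randomize order within each group
        for i, s in enumerate(members):
            batches[i % n_batches].append(s)  # deal out round-robin
    return batches

for i, batch in enumerate(balanced_batches(samples)):
    print(f"batch {i + 1}: {sorted(batch)}")
```

With this layout, any batch effect that does appear is at least not confounded with the group comparison.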

kellymc@nih.gov answered 3 months ago

“We are aiming to multiplex 8 samples based on the feasibility stated in the literature, but do you have experience with how feasible it is in reality? E.g. should we consider testing 4 samples to maximize the success of sample identity deconvolution?”
Yes. I gather from your reference to ‘natural barcodes’ that you are looking to employ a variant-based deconvolution method (along the lines of demuxlet). Your ability to deconvolve will depend on how distinct the genotypes of the multiplexed individuals are (this generally does not work for inbred mouse strains) and the reliability of detecting those variants. Although there are some methods that do not require genotype data on all samples (such as a .vcf file), my impression is that deconvolution is much easier to implement and more robust with those data. Aspects like read depth and the read length you use when sequencing may be important to consider as well – I think some of that has been addressed in the literature, and other folks on campus who use this more routinely may be able to comment.
We (SCAF) don’t routinely suggest this, in part because of the nature of many of the projects we support. It makes the most sense when you are working with a large number of human samples at one time and have access to high quality genotyping data. With any multiplexing method, you should think about what the risk is if you are not able to reliably deconvolve the sample source.
Brief side note – the fewer multiplexed samples you have, the less you can ‘super-load’ the capture lane. Also, for super-loading to work, it assumes you have roughly equal proportions of each ‘multiplex-barcoded’ sample. Use this tool to figure out the balance: https://satijalab.org/costpercell
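On the 4-vs-8 question, a quick back-of-the-envelope sketch may help: with genotype-based demultiplexing and roughly equal proportions per sample, only doublets formed from two *different* individuals can be flagged, which is a fraction (n-1)/n of all doublets. The multiplet rate itself is an assumed input here (e.g. taken from a loading table for your chosen cell load); the 16% figure below is made up for illustration:

```python
def doublet_accounting(n_samples, multiplet_rate):
    """With equal proportions per sample, only doublets from two different
    individuals are genotype-detectable: a fraction (n-1)/n of all doublets."""
    detectable = (n_samples - 1) / n_samples
    return {
        "flagged":    multiplet_rate * detectable,   # removable by deconvolution
        "undetected": multiplet_rate * (1 - detectable),  # same-sample doublets
    }

# Assumed 16% multiplet rate from super-loading (illustrative number only)
for n in (4, 8):
    print(n, doublet_accounting(n, 0.16))
```

So going from 4 to 8 samples shrinks the undetectable (same-individual) doublet fraction, which is part of why more-plexed designs tolerate heavier loading.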

kellymc@nih.gov answered 3 months ago

“We are planning a pilot experiment of 10X 3’ RNA for a couple of 8-plexed reactions before we expand to larger sample numbers. In addition to the standard QC metrics (e.g. viability, number of detected cells and genes per cell, mitochondrial content, etc.), what would you recommend paying attention to in order to determine the quality of the data?”
Particularly based on your previous questions:
(1) make sure your data represents the cells you expect to be present
(2) If there are cells that you know are going to be fragile and in high abundance (and especially if they have a highly expressed gene), check the specificity of that gene’s expression to the cluster of cells it should be restricted to. An example would be erythrocyte lysis causing hemoglobin genes to show up in other cells in the sample. I can give you examples from pineal gland, retina, skin, and other tissues where you can see something along these lines. If it means you would benefit from additional washes prior to capture, it would be something you would want to know ahead of time.
(3) most importantly, if your pilot contains samples that represent some aspect of the biological question you are looking to answer with the data – do a preliminary analysis to make sure you can see that signal (and that it doesn’t seem to be overridden by some other technical variation). It will save you from spending a lot of money only to find out you can’t reliably detect the biological signal you are looking for (and you might have known that even from your pilot).
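The specificity check in point (2) can be sketched in a few lines, assuming you have a cell-by-gene count matrix, per-cell cluster labels, and the index of a marker such as HBB (the toy matrix and labels below are hypothetical):

```python
import numpy as np

# Hypothetical toy data: rows = cells, cols = genes; column 0 is HBB.
counts = np.array([
    [50, 1, 0],   # erythrocyte
    [48, 0, 2],   # erythrocyte
    [40, 2, 1],   # erythrocyte
    [6,  9, 8],   # epithelial (ambient HBB leaking in)
    [0, 10, 7],   # epithelial
    [5,  8, 9],   # immune
])
clusters = np.array(["ery", "ery", "ery", "epi", "epi", "imm"])

def fraction_outside(counts, clusters, gene_idx, home_cluster):
    """Fraction of a gene's total counts found outside its expected cluster;
    a high value suggests ambient contamination (e.g. erythrocyte lysis)."""
    total = counts[:, gene_idx].sum()
    outside = counts[clusters != home_cluster, gene_idx].sum()
    return outside / total

print(f"HBB counts outside erythrocytes: "
      f"{fraction_outside(counts, clusters, 0, 'ery'):.1%}")
```

What counts as an acceptable fraction is tissue- and gene-dependent; the value is in tracking it across pilot conditions (e.g. with vs. without extra washes).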

kellymc@nih.gov answered 3 months ago

“In general, the validity of cell clusters based on gene expression might be hard to assess. What are the methods you recommend to validate the quality of our final results? (e.g. mapping to public clustering data of the same tissue type, matching to flow cytometry data for select markers, CITE-seq for a limited antibody profile)”
Automated calling of cell types is constantly improving, but we strongly recommend that you don’t rely on it alone, as these tools usually only give a best-match answer when compared to a reference. Knowledge of the tissue and the cell types present is really still essential for efficient analysis of the dataset. Comparing to data from flow or histology is still the best. Comparing to marker genes identified from public datasets or databases (either manually or with an automated tool) can definitely be helpful as well. CITE-seq, in our experience, is more supportive of conclusions drawn from other sources – I wouldn’t set flow-like ‘gates’ for determining cell types using a limited panel of CITE-seq markers.
Also important to keep in mind – and something I don’t think folks appreciate until they start analyzing their data (or are in conversation with the bioinformatics analyst working with them) – the number of clusters you get depends on the parameters you set, and you will often go through many iterations of the analysis. Each time you change parameters, your cluster designations may have to be updated. An example would be an analysis of a PBMC sample: at one resolution, all the T cells will be one cluster, but at a higher resolution you will have to annotate several subsets of T cells. The higher the resolution of your analysis, the more clusters of cells you will have to figure out, but the greater granularity you will have on the biology (though keep an eye out for technical variation creeping in there too).