nCounter® Knowledge Base: Data Analysis
This Knowledge Base serves as a technical resource for answering common questions and troubleshooting nCounter® data analysis; NanoString University is the primary source for manuals, guides, and other documentation.
For additional assistance, email support.spatial@bruker.com
nCounter Data Analysis
General
There are two primary tools for analyzing data from your nCounter study: 1) the nCounter Data Analysis Pipeline, a free-to-download desktop software application designed and hosted by Bruker Spatial Biology that enables quick and easy quality control, normalization, and analysis of nCounter data; and 2) EuropaXp, whose cloud-based analysis suite is available upon registration.
A third option is to work with our Data Analysis Services team to leverage the experience and guidance of our in-house data scientists for your next nCounter project.
Yes. The fold change data obtained from an nCounter analysis correlates well with fold change results obtained from microarray analyses. The level of concordance between nCounter results and microarray results is similar to comparisons of different microarray platforms.
Yes, there is excellent correlation between nCounter and qPCR analyses, both in terms of relative expression levels and fold changes. Moreover, the multiplexing capabilities of nCounter analyses increase the efficiency with which data can be obtained at qPCR levels of sensitivity. We therefore recommend using nCounter analyses to extend your current set of qPCR data.
We do not recommend changing the RLF name as this can cause difficulties with data collection and analysis as well as lead to confusion if the data are analyzed in the future by someone unaware of the RLF modification. We strive to maintain the single correct version of each RLF file within our bioinformatics database. If you are seeing differences in content within a single RLF version, please contact support.spatial@bruker.com with the RLFs in question.
Parametric statistical tests operate on the assumption that the data conform to some expected distribution, such as a normal distribution when performing a t-test. Transforming linear count data into log2 values will generally satisfy this requirement.
The nCounter Data Analysis Pipeline automatically performs log2 transformations in the background before performing any statistical testing, and as such all the reported p-values are already based on log-transformed data.
If performing any data analysis outside of the nCounter Data Analysis Pipeline software, it is recommended to work from log-transformed nCounter data.
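For example, a minimal sketch of this log2-then-test workflow in Python (the counts below are made-up illustration values, not reference data):

```python
import numpy as np
from scipy import stats

# Hypothetical normalized nCounter counts for one gene across two groups
group_a = np.array([210.0, 185.0, 240.0, 198.0])   # e.g., control samples
group_b = np.array([420.0, 510.0, 390.0, 460.0])   # e.g., treated samples

# Add a small pseudocount so zero counts do not produce -inf after log2
log_a = np.log2(group_a + 1)
log_b = np.log2(group_b + 1)

# Two-sample t-test on the log2-transformed values, mirroring the
# transformation the Pipeline applies before its statistical testing
t_stat, p_value = stats.ttest_ind(log_a, log_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```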
nCounter Data Analysis Pipeline
No – cancel the study and create a new one with the correct features.
Any version starting with 4.0 (Arbor Day) should work.
Check which version of R you are using. If you have not installed R yourself (apart from the copy previously bundled with nSolver), please install the latest version (>4.0) and try again.
This is not a feature of the Data Analysis Pipeline. The Study database, while a useful way of tracking studies in nSolver, created some issues for users. We have decided that the Data Analysis pipeline will not have a study database – instead it simply has a location on the computer where all studies are stored. You can re-do any analysis from nSolver in Data Analysis Pipeline.
Yes, this is expected! We have updated the analysis tools to what we believe is in line with our latest and greatest understanding of how analysis should be done. However, that doesn’t mean you need to re-do all your studies, or that your former answers were “wrong”. They were simply performed with a different tool. As long as you mention in methods when publishing or presenting your work what tools you used, this is acceptable and happens in many studies.
Data QC and Normalization
The positive controls are spike-in oligos used for quality control. The positive control counts in each sample are influenced by a number of factors: pipetting accuracy, hybridization efficiency (e.g. inaccurate temperature or presence of contaminants from sample input that inhibit hybridization), as well as sample processing and binding efficiency.
Positive controls serve three general QC purposes:
Assess the overall assay efficiency. The nCounter Data Analysis Pipeline software raises a warning flag when the geometric mean of positive controls is >3-fold different from the mean of all samples.
Assess assay linearity. Decreasing linear counts are expected from POS_A to POS_F.
Assess limit of detection (LOD). It is expected that counts for POS_E will be higher than the mean of negative controls plus two standard deviations.
Some level of variability among positive control counts is expected. If you receive no positive/negative control QC flags in the nCounter Data Analysis Pipeline software, you may rest assured that the assay worked as expected. Even if you do receive warning flags, it does not necessarily mean the assay has failed. You may send your RCC files to support.spatial@bruker.com, and we will be happy to check for root cause of the flags for you.
FOV (field of view) registration is as close to 100% as possible, and at minimum 75%.
Binding density is in the linear dynamic range (0.05-2.25 for PRO/MAX/FLEX; 0.1-1.8 for SPRINT).
POS controls (POS_A to POS_E) have robust counts and are in a linear range (R^2 higher than 0.95).
NEG controls have low counts (average < 50 is expected).
At least three housekeeping genes have reasonable counts that are above background and cover the range of gene expression (counts in the thousands, counts in the hundreds, etc.); a code sketch of these checks follows below.
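For users scripting their own QC outside the Pipeline, the checklist above can be restated as a few simple tests. This is a minimal sketch, assuming the counts and metrics have already been parsed from the RCC files; the function name and thresholds simply restate the criteria listed above and are not the Pipeline's implementation:

```python
import numpy as np
from scipy import stats

def lane_qc(fov_counted, fov_count, binding_density, pos_counts,
            pos_concentrations, neg_counts, platform="MAX"):
    """Sketch of the lane-level QC checks described above.

    pos_counts / pos_concentrations: POS_A..POS_E counts and their known
    input concentrations (ordered A..E); neg_counts: NEG probe counts.
    Returns a dict of booleans; True means the check would raise a flag.
    """
    flags = {}

    # Imaging QC: fraction of attempted FOVs successfully counted (>= 0.75)
    flags["imaging"] = (fov_counted / fov_count) < 0.75

    # Binding density within the platform's linear dynamic range
    lo, hi = (0.1, 1.8) if platform == "SPRINT" else (0.05, 2.25)
    flags["binding_density"] = not (lo <= binding_density <= hi)

    # Positive control linearity: R^2 of log2(counts) vs log2(concentration)
    r, _ = stats.pearsonr(np.log2(pos_concentrations), np.log2(pos_counts))
    flags["pos_linearity"] = (r ** 2) < 0.95

    # Limit of detection: POS_E should exceed mean(NEG) + 2 * SD(NEG)
    lod = np.mean(neg_counts) + 2 * np.std(neg_counts)
    flags["lod"] = pos_counts[-1] < lod   # pos_counts[-1] is POS_E

    # Negative controls: low background expected (average < 50)
    flags["neg_background"] = np.mean(neg_counts) >= 50

    return flags
```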
The total surface area of each lane in a cartridge is scanned in multiple discrete units called fields of view (FOV). After scanning is complete, the FOV within each lane are aggregated together to generate total counts across the entire surface area within each lane. The “Imaging QC” metric quantifies the performance of this imaging process. Specifically, it is a fraction that is calculated by dividing the number of FOVs that have successfully been scanned by the number of FOVs that were attempted to be scanned. Significant discrepancy between the number of FOV for which imaging was attempted (“FOV Count”) and for which imaging was successful (“FOV Counted”) may indicate an issue with imaging performance.
Within the nCounter Data Analysis Pipeline software, a sample that has an Imaging QC value less than 0.75 (or 75%) will be flagged. The threshold of 0.75 was selected based on internal testing that evaluated performance over a range of FOV values. The scanner is more likely to encounter difficulties near the edge of the slide. Therefore, when the maximum scan setting is selected for PRO/MAX/FLEX systems (the SPRINT instrument has one scan setting), it is more likely that some FOV will be dropped. Reduction in number of FOV counted does not compromise data quality and is accounted for during data normalization. However, when a substantial percentage of FOVs are not successfully counted, there may be issues with the resulting data. Consistent large reductions in percentages can be indicative of an issue associated with the instrumentation.
If Imaging QC is less than 0.75, then clean the bottom of the cartridge with a lint-free wipe and re-scan the cartridge, being sure that the cartridge lies flat in the scanner. If Imaging QC is greater than 0.75, then a re-scan may be performed, if desired, in an attempt to increase the number of FOV counted, though as a routine practice this is not necessary or recommended. Please note that the re-scan option is currently available for PRO/MAX/FLEX systems only; it is not available for the SPRINT system. If the re-scan does not improve imaging performance in samples with Imaging QC less than 0.75, then email the raw data (RCC files) and instrument log files to support.spatial@bruker.com. The data and logs will be examined for hardware or assay problems.
Binding density refers to the number of barcodes per μm². The recommended range is from 0.1 to 2.25 for PRO/MAX/FLEX instruments and 0.1 to 1.8 for SPRINT. If the density is less than 0.1, the instrument may not be able to focus on the cartridge due to a lack of optical information. If the density is greater than the maximum for the platform, barcode overlap will result in a loss of data, as overlapping barcodes are excluded from the analysis.
A combination of several factors can affect binding density, including:
Assay input quantity: the higher the amount of input used for the assay, the higher the Binding Density will be. The relationship between input amount and Binding Density is linear until the point of assay saturation. Conversely, if the amount of sample input is too low, the Binding Density will likely be flagged for being less than the optimal range.
Expression level of genes: if the target genes have high expression levels, there will be more molecules on the lane surface which will increase the Binding Density value.
Size of the CodeSet: a large CodeSet with probes for many targets is more likely to have high Binding Density values than a CodeSet with probes for fewer targets. A small CodeSet with a limited number of targets is more likely to have low Binding Density values.
A QC flag does not necessarily mean that data from a flagged lane cannot be used. The thresholds for QC flags are set at a conservative level in order to both catch samples which may have failed, and also to identify samples with usable data which happened to experience a reduction in assay efficiency.
To determine whether a QC flag is indicating a critical problem, examine the raw and normalized data and check whether the flagged samples have a poorer limit of detection for low count transcripts when compared to non-flagged samples. For some genes, differences in expression level between samples will be caused by differences in treatment or pathology, so it may be more appropriate to determine if the expression of only the low count genes for any flagged lane falls within the range of expression values observed across a number of unflagged samples which come from different treatments or pathologies.
One can approach this potential limit of detection question in a number of ways. First, a simple visual scan of the data may suffice to detect problems in the flagged samples. This can be performed on raw data which have been background subtracted in the nCounter Data Analysis Pipeline software to identify targets that are below the background. Alternatively, outlier samples could be identified by generating a heat map of normalized data from all samples to see if the flagged samples in question are strongly divergent from other samples with similar pathology. Another option would be to examine the calculated QC metrics within the nCounter Data Analysis Pipeline software. If these QC metrics have only exceeded the threshold by a very small margin (e.g., the FOV registration is 74% instead of 75%), then the resultant data are generally going to be quite robust and usable.
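As a quick sketch of the heat-map check mentioned above (assuming `normalized_df` is a hypothetical genes-by-samples pandas DataFrame of normalized counts):

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# normalized_df: pandas DataFrame, rows = genes, columns = samples.
# Log-transform, then z-score each gene (row) so the clustering reflects
# expression patterns; look for flagged samples that cluster away from
# unflagged samples with similar pathology.
sns.clustermap(np.log2(normalized_df + 1), z_score=0, cmap="vlag")
plt.show()
```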
More details on QC flags can be found in the nCounter Data Analysis Pipeline software user manual. If QC flags become more than a rare anomaly, we encourage you to contact our support team (support.spatial@bruker.com and/or your local Field Application Scientist) in order to assist you in tracking down the root cause of these potential problems with the assay consistency.
Data normalization is designed to remove sources of technical variability from an experiment, so that the remaining variance can be attributed to the underlying biology of the system under study. The precision and accuracy of nCounter Gene Expression assays are dependent upon robust methods of normalization to allow direct comparison between samples. There are many sources of variability that can potentially be introduced into nCounter assays. The largest and most common categories of variability originate from either the platform or the sample. Both types of variability can be normalized using standard normalization procedures for Gene Expression assays.
Standard normalization uses a combination of Positive Control Normalization, which uses synthetic positive control targets, and CodeSet Content Normalization, which uses housekeeping genes, to apply a sample-specific correction factor to all the target probes within that sample lane. These correction factors will control for sources of variability such as pipetting errors, instrument scan resolution, and sample input variability that affect all probes equally.
Note that Positive Control Normalization will not correct for sample input variability, and thus should usually be used in combination with CodeSet Content (housekeeping gene) Normalization. Performing such a two-step normalization will usually not differ mathematically from Content Normalization alone, and thus is mathematically somewhat redundant. Nevertheless, normalizing to both target classes will provide a good indicator of how technical variability is partitioned between the two major sources of assay noise (platform and sample), and thus may provide a good tool for troubleshooting low assay performance. Normalization workflows are described below.
nCounter Reporter probes (or TagSet probes) are manufactured to contain six synthetic ssDNA control targets. The counts from these targets may be used to normalize all platform-associated sources of variation (e.g., automated purification, hybridization conditions, etc.).
The procedure is as follows (a code sketch appears after the steps):
1) Calculate the geometric mean of the positive controls (POS_A through POS_E) for each lane.
2) Calculate the arithmetic mean of these geometric means for all sample lanes.
3) Divide this arithmetic mean by the geometric mean of each lane to generate a lane-specific normalization factor.
4) Multiply the counts for every gene by the lane-specific normalization factor.
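For illustration, these four steps can be written as one small function. This is a sketch assuming a probes-by-lanes count matrix; the same function applies to CodeSet Content Normalization below by passing housekeeping gene rows instead of positive controls:

```python
import numpy as np

def normalize_to_reference(counts, reference_rows):
    """Normalize a probes-x-lanes count matrix to a set of reference probes.

    counts: 2D numpy array, rows = probes, columns = lanes.
    reference_rows: row indices of the reference probes
                    (POS_A..POS_E here; housekeeping genes below).
    """
    ref = counts[reference_rows, :]

    # 1) Geometric mean of the reference probes in each lane
    lane_geomeans = np.exp(np.mean(np.log(ref), axis=0))

    # 2) Arithmetic mean of these geometric means across all lanes
    grand_mean = lane_geomeans.mean()

    # 3) Lane-specific normalization factors
    factors = grand_mean / lane_geomeans

    # 4) Scale every count in each lane by that lane's factor
    return counts * factors[np.newaxis, :]
```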
It is expected that some noise will be introduced into the nCounter assay due to variability in sample input. For most experiments, normalization of sample input is most effectively done using so-called housekeeping genes. These are mRNA targets included in a CodeSet which are known to or are suspected to show little-to-no variability in expression across all treatment conditions in the experiment. Because of this, these targets will ideally vary only according to how much sample RNA was loaded.
Using the geometric mean of three housekeeping genes, at minimum, to calculate normalization factors is highly recommended. This is done in order to minimize the noise from individual genes and to ensure that the calculations are not weighted towards the highest expressing housekeeping targets. It is important to note that some previously-identified housekeeping genes may, in fact, behave poorly as normalizing targets in the current experiment, and may therefore need to be excluded from normalization.
The procedure is the same as that for Positive Control Normalization:
1) Calculate the geometric mean of the selected housekeeping genes for each lane.
2) Calculate the arithmetic mean of these geometric means for all sample lanes.
3) Divide this arithmetic mean by the geometric mean of each lane to generate a lane-specific normalization factor.
4) Multiply the counts for every gene by the lane-specific normalization factor.
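Because the math is identical, the `normalize_to_reference` sketch shown above covers this case as well; for example (the row indices here are hypothetical):

```python
# Hypothetical row indices of the selected housekeeping genes
hk_rows = [120, 121, 122]
content_normalized = normalize_to_reference(raw_counts, hk_rows)
```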
Samples with normalization flags have counts for the positive controls and/or housekeeping genes that are much lower or higher than in most of the samples included in the analysis.

A Positive Control Normalization flag may indicate a notable difference in hybridization/assay performance compared to most of the samples included in the analysis. In certain situations, samples with these flags may need to be re-run or excluded.

A CodeSet Content Normalization flag may indicate a notable difference in RNA quality and/or input amount compared to most samples included in the analysis. Samples are flagged if the CodeSet Content Normalization factor is < 0.1 or > 10, as anything beyond these values will result in inaccurate normalization. As such, flagged samples may need to be excluded or (if possible) re-run at higher or lower input amounts, depending on the normalization factor.

To determine whether a QC flag indicates a critical problem, examine the raw and normalized data and check whether the flagged samples have a poorer limit of detection for low count transcripts when compared to non-flagged samples. For some genes, differences in expression level between samples will be caused by differences in treatment or pathology, so it may be more appropriate to determine whether the expression of only the low count genes for any flagged lane falls within the range of expression values observed across a number of unflagged samples from different treatments or pathologies.
A positive control normalization flag indicates that the POS controls for the lane (sample) in question are more than three-fold different (greater or smaller) than the POS control counts from the other samples in the experiment. High POS control counts are rarely problematic, so a flag usually only indicates a problem when the POS controls are particularly low for a sample. Such low POS counts are indicative of relatively low assay efficiency at capturing and counting targets, which may lower sensitivity or introduce bias into the assay.
To determine whether a POS control normalization flag is indicating a critical problem, examine the raw and normalized data and check whether the flagged samples have a poorer limit of detection for low count transcripts when compared to non-flagged samples. For some genes one should anticipate differences in expression level between samples due to differences in treatment or pathology, so it may be more appropriate to see if the expression of the low count genes for any flagged lane falls in the range of expression values observed across a number of unflagged samples which come from different treatments or pathologies.
One can approach this potential limit of detection question in a number of ways. First, a simple visual scan of the data may suffice to detect problems in the flagged samples. This can be performed on raw data which have been background subtracted in the nCounter Data Analysis Pipeline software to identify targets that are below the background. Alternatively, outlier samples could be identified by generating a heat map of normalized data from all samples to see if the flagged samples in question are strongly divergent from other samples with similar pathology. Another option would be to examine the calculated POS control normalization factors within the nCounter Data Analysis Pipeline software. If these factors have only exceeded the threshold by a very small margin (e.g., the POS control normalization factor is 3.2), then one can usually assume that the resultant data are generally going to be quite robust and usable for the majority of data sets.
More details on POS control normalization flags can be found in the nCounter Data Analysis Pipeline software user manual. If POS control normalization flags become more than a rare anomaly, we encourage you to contact our support team (support.spatial@bruker.com and/or your local Applications Scientist) in order to assist you in tracking down the root cause of these potential problems with the assay consistency.
A QC flag for content normalization indicates that the flagged sample had a content (or housekeeping gene) normalization factor more than 10-fold different from the average sample in the same experiment. In other words, the flagged sample had significantly lower or higher counts in the Housekeeping genes which are used to normalize sample input. Although unusually high housekeeping gene counts would not typically be problematic, it is much more common to see samples with lower housekeeping gene counts, and these would be flagged if the content correction factor for that sample were greater than 10.
Content normalization flags can be caused by either a significant reduction in overall assay efficiency for that sample, or because of an effective reduction in quantity or quality (fragmentation) of the input RNA. The likelihood of a reduction in assay efficiency can be assessed by the presence of any other QC flags for that sample. If the lane failed the QC specifications by a large margin for any of the other QC metrics (including POS control normalization), then overall counts may be reduced enough to also cause a Content normalization flag. Essentially, in this scenario the assay is working so poorly that the counts for endogenous and housekeeping genes are dramatically reduced even if sufficient RNA targets are present. If, however, the sample had no other QC flags except that for Content normalization, this usually means that the assay is working well, but there were insufficient RNA targets to count. This can be caused either by low RNA concentrations or highly fragmented RNA, such as from an archival FFPE sample.
To determine whether a Content normalization flag is creating a critical problem, examine the raw and normalized data and check whether the flagged samples have a poorer limit of detection for low count transcripts when compared to non-flagged samples. For some genes one should anticipate differences in expression level between samples due to differences in treatment or pathology, so it may be more appropriate to see if the expression of the low count genes for any flagged lane falls in the range of expression values observed across a number of unflagged samples which come from different treatments or pathologies.
One can approach this potential limit of detection question in a number of ways. First, a simple visual scan of the data may suffice to detect problems in the flagged samples. This can be performed on raw data which have been background subtracted in the nCounter Data Analysis Pipeline software to identify targets that are below the background. Alternatively, outlier samples could be identified by generating a heat map of normalized data from all samples to see if the flagged samples in question are strongly divergent from other samples with similar pathology. Another option would be to examine the calculated QC metrics within the nCounter Data Analysis Pipeline software. If these QC metrics have only exceeded the threshold by a very small margin (e.g., the FOV registration is 74% instead of 75%), then the resultant data are generally going to be quite robust and usable.
More details on Content normalization flags can be found in the nCounter Data Analysis Pipeline software user manual. If QC flags become more than a rare anomaly, we encourage you to contact our support team (support.spatial@bruker.com and/or your local Applications Scientist) in order to assist you in tracking down the root cause of these potential problems with the assay consistency.
While many mRNAs demonstrate low variance across tissues, there simply is no single set of mRNAs that can be used across all experimental conditions and tissues.
It is recommended that every CodeSet design have at least 3 – 6 “reference” or “housekeeping” targets to use for technical variance normalization. Characteristics of effective reference targets are 1) minimal variance across samples, and 2) high correlation with each other (assuming technical variance is much lower than biological variance).
If you have generated data on the nCounter platform or other platforms previously that show certain targets do not vary across your treatment conditions, and that they fit the above criteria, these would be ideal targets to start with as reference mRNAs. However, if you haven’t characterized candidate reference targets yet, it is important to measure the expression of at least 6 – 8 candidate genes in a pilot experiment. Starting with this number of candidates should allow you to identify a set of 3 or more useful targets, as some may drop out due to higher-than-expected variance or biological effects across your samples and treatments.
To select candidate genes, potential reference targets can be gleaned from online reference gene tools (such as Refgenes or NormFinder), pre-existing data, or the literature in your field. Please note the reference gene tools are not affiliated with Bruker Spatial Biology; please see the linked websites for support.
There are several options to perform background subtraction using the nCounter Data Analysis Pipeline software. To estimate background, we provide several probes in each Codeset for which no target is present. These negative controls can be used to estimate background levels in your experiment. Background levels may be estimated using either the average of the negative controls for that lane or the average of the negative controls plus a multiple of the standard deviation of all the negative controls in a lane. Alternatively, background levels may also be estimated by running a blank lane in which nuclease-free water instead of RNA is added as input; this will generate a background measurement that will estimate probe-specific background levels instead of general background levels, as estimated from a set of negative controls. Once the appropriate background level has been determined, the background counts are subtracted from the raw counts to determine the true counts.
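As a sketch of the negative-control approach (the multiplier k and the floor at zero are common conventions of this sketch, not requirements of the software):

```python
import numpy as np

def subtract_background(counts, neg_rows, k=2):
    """Subtract a background estimate derived from the negative controls.

    counts: probes-x-lanes raw count matrix; neg_rows: row indices of the
    NEG probes; k: number of standard deviations above the NEG mean
    (k=0 reproduces the plain mean-of-negatives estimate).
    """
    neg = counts[neg_rows, :]
    background = neg.mean(axis=0) + k * neg.std(axis=0)   # per-lane estimate

    # Subtract the per-lane background and floor at zero so that no
    # background-subtracted count goes negative
    return np.clip(counts - background[np.newaxis, :], 0, None)
```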
For most nCounter gene expression experiments, except those from miRNA panels, reference gene normalization is the preferred normalization method. For panels without robust or stably expressed housekeeping targets, global normalization may provide better normalization as long as relatively few genes show expression level changes as a result of the experimental treatment.
The stability of putative housekeeping genes may be assessed using the %CV metric within the nCounter Data Analysis Pipeline software.
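If working outside the Pipeline, the same %CV screen can be approximated on normalized counts; a minimal sketch, where a lower %CV suggests a more stable housekeeping candidate:

```python
import numpy as np

def percent_cv(normalized_counts):
    """Per-gene %CV across lanes: 100 * SD / mean of normalized counts."""
    mean = normalized_counts.mean(axis=1)
    sd = normalized_counts.std(axis=1)
    return 100.0 * sd / mean

# Rank candidate housekeepers: the most stable genes have the lowest %CV
# cv = percent_cv(normalized); stable_order = np.argsort(cv)
```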
The best approach for normalizing miRNA data will depend mostly on the sample type they represent. For everything except biofluids (such as plasma or serum), using a “global” normalization method which normalizes to total counts of the 100 most highly expressed (on average) miRNA targets across all samples is recommended. This method does not use the Positive Control or Positive Ligation Control probes for any of these calculations.
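A minimal sketch of this "top 100" global method, assuming a miRNAs-by-lanes matrix of raw counts with the control probes already excluded (the use of the arithmetic mean of lane totals for the scaling factor is an assumption of this sketch):

```python
import numpy as np

def top100_global_normalize(counts, n_top=100):
    """Global normalization to the most highly expressed miRNAs.

    counts: miRNAs-x-lanes raw counts (endogenous targets only; Positive
    Control and Positive Ligation Control probes excluded).
    """
    # Identify the n_top miRNAs with the highest average counts
    top = np.argsort(counts.mean(axis=1))[-n_top:]

    # Per-lane totals over those targets, and factors to equalize them
    lane_totals = counts[top, :].sum(axis=0)
    factors = lane_totals.mean() / lane_totals

    return counts * factors[np.newaxis, :]
```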
However, it does get more complicated with biofluids or other samples where the number of expressed targets drops below ~150-200 targets. As a frame of reference, targets expressed above background are usually identified by comparison to the Negative control probes (using either the mean of the NEG probes, the mean + 2 standard deviations, the maximum NEG probe value, or a conservative fixed cutoff of 100).
When normalizing samples from biofluids, a judgement call can be made depending on how many targets are expressed above background. In the miRNA assay, background would usually be ~30 counts, but will vary from one experiment to the next. Therefore, sometimes a global approach (TOP 100 method) can still work with biofluids if samples express 100-150 miRNA targets above this cutoff.
However, if this is not the case, the identification of good “housekeeper” miRNAs will likely allow you to normalize and obtain robust results. There are not many well-characterized housekeeper miRNA targets from plasma or other biofluids, as they do seem to vary depending on extraction kits and pathologies being studied. Consequently, a literature search would not necessarily help you determine appropriate housekeepers and a more data-driven approach would be better suited. Using third party software or algorithms can identify the most stably expressed targets within the particular experiment. It is recommended that this method of identifying housekeeping genes be repeated as more data is generated to confirm these are appropriate for the entirety of the study and not just for the initial experiment.
Among published algorithms for stable housekeeper identification, NormFinder is the path of least resistance, because it is free and easy to use.
Claus Lindbjerg Andersen, Jens Ledet Jensen, and Torben Falck Ørntoft. Cancer Res 2004;64:5245-5250. http://cancerres.aacrjournals.org/content/64/15/5245
Supplemental Methods: http://cancerres.aacrjournals.org/content/suppl/2004/08/24/64.15.5245.DC1.html
Software download: http://moma.dk/normfinder-software
geNorm is another program that uses slightly different principles. Specifically, NormFinder chooses targets with the lowest within and between group variance, while geNorm also picks multiple targets that give the lowest estimates of variance when they are used together (NormFinder only picks them individually or gives the best two together). geNorm can be obtained with a license.
If Spike-In synthetic miRNAs are used to normalize variance introduced in purification of samples, it is assumed and highly recommended that equal volume inputs are used across samples. Synthetic oligos must be spiked in before sample extraction, and it is strongly recommended that Spike-Ins are used for all samples in that experiment.
Three Methods for Normalization
Normalize using only the Spike-In control probes.
Normalize using only the Housekeeping miRNA targets as identified by the user.
First normalize all the endogenous counts (including the putative miRNA housekeepers) to the Spike-In control probes. Then use the Spike-In-normalized miRNA housekeeper counts to normalize the endogenous miRNA targets. This option is not available in the nCounter Data Analysis Pipeline software, so it would need to be performed in Excel (or scripted, as sketched after the steps below). The basic workflow in Excel is:
1) For each lane, calculate the geometric mean of the Spike-In controls.
2) Calculate the arithmetic mean of these geometric means across all lanes.
3) Divide this arithmetic mean by the geometric mean in each lane (calculated in step 1) to get a lane-specific normalization factor.
4) Multiply all the endogenous counts in a lane by its lane-specific normalization factor.
5) Repeat steps 1 through 4 using the Spike-In-normalized housekeeper miRNA targets.
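The same workflow can be scripted instead of performed in Excel, reusing the geometric-mean pattern from the normalization procedures above. This sketch assumes `raw_counts` is a probes-by-lanes matrix and that `spike_in_rows` and `housekeeper_rows` are the (hypothetical) row indices of the Spike-In controls and your chosen housekeepers:

```python
import numpy as np

def geomean_factors(counts, ref_rows):
    """Lane-specific factors from the geometric mean of reference rows."""
    lane_geomeans = np.exp(np.mean(np.log(counts[ref_rows, :]), axis=0))
    return lane_geomeans.mean() / lane_geomeans

# Steps 1-4: normalize all endogenous counts (housekeepers included)
# to the Spike-In control probes
spike_factors = geomean_factors(raw_counts, spike_in_rows)
spike_normalized = raw_counts * spike_factors[np.newaxis, :]

# Step 5: repeat using the Spike-In-normalized housekeeper counts
hk_factors = geomean_factors(spike_normalized, housekeeper_rows)
fully_normalized = spike_normalized * hk_factors[np.newaxis, :]
```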
The three methods for normalization may yield similar results. Typically, the better normalization approaches will result in overall lower variance. For each of the three methods, the variance should be calculated, and the method yielding the lowest variance should be chosen. Theoretically, the third method provides the best reduction in technical and sample input variance.
Data Interpretation
Pathway scores are designed to summarize expression level changes of biologically related groups of genes. This score can help identify pathways that are being altered by the pathology or treatment under study, and thus can help contextualize differential expression changes observed for individual genes. Pathway scores are derived from each sample's score on the first principal component of a Principal Component Analysis (PCA) performed on the expression levels of all the measured genes within a specific pathway; the first eigenvector defines the gene weights. Although expression levels from multiple genes will generally comprise this first PC, some of these genes will have much higher weight applied to them if they capture a greater proportion of the variability in the data.
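To illustrate the idea (this is not the Pipeline's exact implementation), a pathway score can be computed as each sample's coordinate on the first principal component of the pathway's log2 expression matrix:

```python
import numpy as np
from sklearn.decomposition import PCA

def pathway_scores(log2_expr):
    """First-PC scores per sample for one pathway.

    log2_expr: samples-x-genes matrix of log2 expression values for the
    genes in the pathway. Returns one score per sample plus the gene
    weights; genes with larger weights contribute more to the score.
    """
    pca = PCA(n_components=1)
    scores = pca.fit_transform(log2_expr)      # centers the data internally
    return scores.ravel(), pca.components_[0]  # per-sample scores, weights
```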
Typically, Pathway Scores will be positive for pathways containing many up-regulated genes, and negative for those containing more down-regulated genes. One can generally make direct comparisons between scores of an individual pathway across samples within an experiment (that is, a comparison of one sample's cell cycle pathway score to another sample's cell cycle pathway score), and a higher score for the same pathway will generally mean greater levels of up-regulation. However, comparisons between different pathways within the same experiment or across different experiments are not recommended. Moreover, because of the complexity of the calculations for Pathway scores, interpretation should NOT be performed without correlating them to other analysis results to ensure that they are placed in the correct biological context. Thus, before concluding that a pathway has been upregulated in a group of samples, it is advised to correlate the pathway-level findings to the expression levels of individual genes within that pathway.
Both Pathway Analysis and Gene Set Analysis (GSA) are higher level assessments of expression changes that may be occurring within related sets of genes from the same pathway. Because both scores are generated from differences in expression between samples across many genes, the scores should be roughly concordant with each other. However, differences in the way that the calculations are performed may lead to some divergence between scores, as well as differences in the interpretation of these higher-level measurements.
One important difference is that Pathway scores are generated for individual samples, while GSA scores are ‘population’ or ‘group’ level statistics and thus measure patterns between sample groups. A subtler difference is that a Pathway score uses results from only the first PC of the PCA, meaning that it can explain only some proportion of the variance in the data, which may also cause some differences when making comparisons to the GSA scores.
Notably, Pathway scores are generated from weighted expression level data, while the genes from any pathway are given equal weight in the calculation of GSA scores. The differential gene weights in Pathway scores can allow them to detect changes that affect only a small portion of the genes in a pathway, which may be obscured in GSA if most genes in the pathway do not show significant changes in expression (that is, have a small t-statistic). Similarly, if many genes in a pathway show consistent trends in expression which are not individually significant, Pathway scores may have better sensitivity to detect these trends compared to the statistical summation approach of GSA.
In summary, comparing pathway scores directly to those from the GSA module should be performed with caution, and should always be correlated or cross-referenced with expression level changes in individual genes to ensure that biological interpretations are supported.
Cell type profiling scores are generated for immune cell types using expression levels of cell-type-specific mRNAs as described in the literature. For details of the selection and validation process for these markers, see Danaher et al. 2016 (Gene Expression Markers of Tumor Infiltrating Leukocytes, bioRxiv, August 11, 2016).
The cell type score itself is calculated as the mean of the log2 expression levels of all the probes included in the final calculation for that specific cell type. Because the scores are dependent on probe-specific counting and capture efficiencies, they should only be interpreted as relative cell abundance values compared to the same cell type within other samples or groups of samples. The scores should not be used as measures of the abundance of a cell type relative to other cell types within the same sample, nor should they be used to quantitate cell abundance within a single sample.
Cell type scores may be calculated as raw or relative scores. The raw cell scores will measure the overall cell abundance for that type of cell, whereas the relative cell scores measure the specific cell abundance relative to (essentially normalized to) the abundance of Tumor Infiltrating Leukocytes (TILs) in that sample. These are defined as the average of B-cell, T-cell, CD45, Macrophage, and Cytotoxic cell scores. This relative score can alternatively be customized to incorporate a baseline cell type or mixed population other than TILs.
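A minimal sketch of both calculations; `log2_expr` and all marker row lists here are placeholders, and the validated marker sets should be taken from Danaher et al.:

```python
import numpy as np

def cell_type_score(log2_expr, marker_rows):
    """Mean log2 expression of one cell type's marker probes, per sample.

    log2_expr: genes-x-samples matrix of log2 normalized counts;
    marker_rows: row indices of the validated markers for the cell type.
    Scores are comparable for the SAME cell type ACROSS samples only.
    """
    return log2_expr[marker_rows, :].mean(axis=0)

# Raw scores for the cell types that define the TIL baseline
# (the marker index lists below are placeholders, not the validated sets)
til_components = [b_cell_rows, t_cell_rows, cd45_rows,
                  macrophage_rows, cytotoxic_rows]
til_score = np.mean([cell_type_score(log2_expr, rows)
                     for rows in til_components], axis=0)

# Relative score: a cell type's raw score normalized to the TIL baseline
relative_cd8 = cell_type_score(log2_expr, cd8_rows) - til_score
```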