- Q: Do you offer analysis for data other than microarray and next-generation sequencing?
A: Yes. We frequently get requests for statistical analysis of additional types of data, for example, metabolomics data from LCMS or protein array data, or clinical/patient data. Please contact us for more information about your particular project, and we can discuss the services we can provide for you.
- Q: Do you only work with UIC researchers?
A: As we are physically located in the College of Medicine West building on UIC’s west campus, we are ideally located to work closely with UIC’s medical and biological researchers. However, RIC services are available to outside academic and commercial institutions as well, at a higher rate than for internal UIC users. Researchers from Rush University, Northwestern University, and University of Chicago may use RIC services at internal rates.
- Q: How much do your services cost?
A: We have standardized analysis methodologies for many services, such as RNA-seq, but due to the rapidly changing nature of the bioinformatics field and the analysis complexities that are specific to different projects, we are unable to provide static quotes for services. If you would like a quote for your project, please contact us to schedule a free consultation. After reviewing your project and analysis needs, we will be able to provide a project plan and budget. The internal rate for customized analysis services is $85/hour.
- Q: Where can I get technical support for creating RRCANs or with registration?
A: Please contact Scientific Computing Core at firstname.lastname@example.org for technical support.
- Q: Should I use next-generation sequencing (NGS) or microarray for my project?
A: This depends on the scope and budget of your project. As a general rule, microarrays are cheaper and faster to run and analyze than NGS, but your measurements are limited to the features on the array, so there is less scope for novel discovery.
For genotyping/resequencing projects, microarray platforms generally have hundreds of thousands to millions of SNPs, and aim to cover the most common variants in a population. They usually do not cover insertions and deletions, and are generally not suitable for detecting somatic mutations. If your goal is to genotype a large cohort of patients, microarray will likely be the most effective platform; the Sequenom platform used by the UIC Core Genomics Facility also offers the ability to cheaply genotype a customized set of SNPs (10s to 100s) on a large number of patients. If your goal is to study rare variations that may not be present on an array, more complex variants than SNVs, or somatic mutations (e.g., in cancer) in a smaller set of samples, NGS is most appropriate.
For gene expression, there are a number of array options that can cover the entire transcriptome for well-annotated species (human, and common model organisms like mouse and fly). Thus, if your goal is primarily differential expression on a gene level, microarray is probably a good bet, although 3′ RNA-seq protocols have also recently been developed as an economical sequencing option for standard gene expression measurements. If you want to measure changes in gene splicing/isoforms, or discover novel, unannotated transcripts (like lncRNAs), NGS is probably the best approach; for differential splicing, note that there are new microarray platforms that will provide detailed information about splicing through exon-exon junction probe sets.
For protein-DNA binding (ChIP-seq) and other epigenomics studies, we recommend NGS for all experiments. Although there are microarray options (e.g., ChIP-chip), in general it is difficult to appropriately cover an entire mammalian genome on an array, and NGS offers much higher precision and coverage.
- Q: What is the difference between biological and technical replicates? Why should I do biological replicates?
A: Technical replicates are essentially repeated measurements of the same sample, for example, preparing multiple libraries from the same cell population and sequencing the libraries separately. Biological replicates are measurements of different samples that are in the same biological condition, for example, cells obtained from two mice that are the same age, genotype, treatment, etc.
Collecting biological replicates is critical for accurately detecting changes between different biological conditions (such as WT vs KO). This analysis typically boils down to some version of comparing within-group variation to between-group variation, as you would do in a t-test, and so estimating within-group variation correctly is very important. There is always natural variation from measurement to measurement: some comes from the measurement itself, and some is inherent to the biological system, such as individual mouse-to-mouse differences. Technical replicates will only reflect the former, but biological replicates will reflect both. Since samples from different conditions must vary at least as much as different samples from the same condition, biological replicates are necessary to differentiate conditions.
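As a rough illustration of the point above, the following Python sketch compares the same between-group difference against two estimates of within-group variation. All expression values are hypothetical numbers chosen for the example, and Welch's t statistic stands in for whatever test an actual analysis would use.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic: difference in means scaled by within-group variation."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_b) - mean(group_a)) / standard_error


# Hypothetical expression values for one gene (arbitrary units).
# Technical replicates: one mouse per condition, measured three times,
# so the spread reflects measurement noise only.
wt_technical = [10.0, 10.1, 9.9]
ko_technical = [11.0, 11.1, 10.9]

# Biological replicates: three different mice per condition,
# so the spread also includes mouse-to-mouse variation.
wt_biological = [9.0, 10.0, 11.0]
ko_biological = [10.0, 11.0, 12.0]

t_technical = welch_t(wt_technical, ko_technical)
t_biological = welch_t(wt_biological, ko_biological)

print(f"t from technical replicates:  {t_technical:.2f}")
print(f"t from biological replicates: {t_biological:.2f}")
```

Both designs show the same one-unit difference between WT and KO, but the technical replicates yield a t statistic of roughly 12 while the biological replicates yield roughly 1.2. The technical-replicate result looks far more convincing only because it ignores the mouse-to-mouse variation.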
- Q: I want to do gene expression microarray. Which platform should I choose, and how many samples/replicates should I collect?
A: The appropriate choice of array depends on the budget and scope of your project: bigger expression arrays give more information about gene splicing and isoforms, but are more expensive. 3′ arrays consist of a single probe set per gene and are generally suitable for differential expression on the gene level. Exon arrays have probes for each exon of a gene, and provide indirect information about differential gene splicing through changes in exon expression. Newer arrays from Affymetrix, such as HTA 2.0, include probes for exon-exon splice junctions and so give direct evidence of differential splicing.
We recommend at least 3 biological replicates per condition, and ideally 4 or 5 in case one sample is of low quality, for both RNA-seq and gene expression microarray. If samples are collected from patient cohorts, rather than an animal model or cell line, you will need considerably more than 3 per group, but the exact number is difficult to predict, as it depends on the person-to-person variation within each group. In some circumstances, fewer than 3 replicates are sufficient. For instance, if you are following a time series, 1 or 2 samples per time point may be sufficient.
- Q: How many samples/replicates should I collect to do RNA-seq, and what sequencing depth should I aim for?
A: We recommend at least 3 biological replicates per condition, and ideally 4 or 5 in case one sample is of low quality, for both RNA-seq and gene expression microarray. If samples are collected from patient cohorts, rather than an animal model or cell line, you will need considerably more than 3 per group, but the exact number is difficult to predict, as it depends on the person-to-person variation within each group. In some circumstances, fewer than 3 replicates are sufficient. For instance, if you are following a time series, 1 or 2 samples per time point may be sufficient.
The required sequencing depth depends on the scope of the experiment and the type of RNA-seq performed. For differential expression from whole-transcript RNA-seq (the standard experiment), we recommend at least 20-30M reads per sample for a mammalian genome. Keep in mind that the deeper you sequence, the better you will be able to distinguish changes between low-expressed genes, where most of the noise is. If you are interested in discovering novel isoforms or non-coding transcripts, we recommend much deeper sequencing, 100M reads or more. We also strongly recommend paired-end sequencing for RNA-seq, especially if differential splicing/isoforms are of interest. These depths may change based on other factors as well, such as the quality of the RNA sample (how degraded transcripts are), and the strategy used to exclude ribosomal RNA from sequencing (rRNA depletion versus polyA capture).
On the other hand, if you are primarily interested in gene expression, and not differentiation of isoforms or discovery of novel transcripts, 3′ RNA-seq (where only the 3′ end of each transcript is sequenced) offers a more economical option: ribosomal RNAs are not a concern, a sequencing depth as low as 5M reads may be acceptable, and only single-end sequencing is needed.
For miRNA-seq, as few as 5M reads are sufficient for differential expression of annotated miRNAs. If you would like to discover unannotated miRNAs as well, we recommend closer to 25M reads. Single-end sequencing is sufficient for miRNA-seq.
For more information, we recommend reading the ENCODE guidelines for RNA-seq.
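To turn the per-sample depths above into a rough sequencing budget, one can multiply depth by the number of samples. The Python sketch below does this under the assumption that the quoted depths are per sample; the dictionary keys and function name are hypothetical labels made up for this example.

```python
# Per-sample depths quoted in the answer above, in millions of reads.
# The keys are hypothetical labels for this example only.
RECOMMENDED_DEPTH_M = {
    "whole_transcript_de": 30,  # upper end of the 20-30M recommendation
    "novel_isoforms": 100,      # "100M reads or more"
    "three_prime": 5,           # "as low as 5M reads may be acceptable"
    "mirna_annotated": 5,
    "mirna_discovery": 25,
}


def total_reads_millions(experiment, conditions, replicates_per_condition):
    """Total reads (in millions) for a design with equal replicates per condition."""
    samples = conditions * replicates_per_condition
    return samples * RECOMMENDED_DEPTH_M[experiment]


# Example: WT vs KO with 4 biological replicates each, standard RNA-seq.
budget = total_reads_millions("whole_transcript_de", conditions=2,
                              replicates_per_condition=4)
print(f"Approximate total: {budget}M reads")
```

This is only a planning aid: RNA quality and the rRNA removal strategy can shift the appropriate depth, as noted above.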
- Q: I want to do genotyping by microarray. Which platform should I choose, and how many samples/replicates should I collect?
A: The choice of platform depends on the scope of the project and the samples being genotyped. For instance, Affymetrix offers arrays tailored to different ethnic groups (such as the Axiom Pan-African arrays for people of African descent) that capture the bulk of genetic variation within that group. Additionally, you can design custom arrays for a set of 10s to 100s of individual SNVs to test, using Sequenom. We recommend that investigators contact the UIC Core Genomics Facility to get more information about the available platforms.
In general, one replicate is sufficient for genotyping arrays, as long as the quality of the DNA is sufficient.
- Q: I want to do DNA resequencing. What sequencing depth should I aim for, and how many samples/replicates should I collect?
A: The sequencing depth depends on the type of variation you are looking for, namely germline genetic variants or somatic mutations. For germline variation we recommend at least 50x coverage (average reads per base). For somatic mutations we recommend at least 150x coverage. The overall recommended sequencing depth then depends on the genomic domain being resequenced. For example, for detecting germline variation in whole-exome resequencing in humans (~30Mb of coding sequences) with 2×100 paired-end reads, we would recommend ~11M paired-end reads: 30Mb * (1 read pair / 200 bases) * (50 reads/base coverage) * 1.4. The “buffer” factor of 1.4 adds 40% depth to account for discrepancies from the ideal coverage, which include PCR duplication, variance in coverage, and low-quality reads.
A couple of extra notes about somatic mutations: the ability to detect these mutations depends strongly on the purity of the affected tissue (histology of the tumor sample). Samples from tumor tissue should be paired with a control sample from the same individual to differentiate somatic mutations from germline variants. Finally, the deeper sequencing recommended for these experiments typically creates more redundant reads (PCR duplicates), and thus we recommend considering a larger buffer factor, possibly as high as 2.0; with that factor, the recommended depth for somatic mutation detection in exome sequencing with 2×100 reads is closer to 45M paired-end reads.
In general, we recommend paired-end reads for resequencing projects, as longer fragments yield higher-confidence alignments, better differentiation of PCR duplicates, and more accurate SNP calls. One replicate is generally sufficient for DNA resequencing, as long as the quality of the data is sufficient.
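The depth arithmetic in this answer can be reproduced with a few lines of Python. This is a minimal sketch of the calculation as described; the function and parameter names are made up for illustration.

```python
def read_pairs_needed(target_bases, read_length, coverage, buffer_factor):
    """Estimate paired-end read pairs needed for a target average coverage.

    target_bases:  size of the region being resequenced, in bases
    read_length:   length of each read in the pair (e.g., 100 for 2x100)
    coverage:      desired average reads per base
    buffer_factor: multiplier covering PCR duplicates, uneven coverage,
                   and low-quality reads
    """
    bases_per_pair = 2 * read_length
    return target_bases / bases_per_pair * coverage * buffer_factor


exome_bases = 30e6  # ~30Mb of human coding sequence

# Germline variants: 50x coverage, buffer factor 1.4.
germline_pairs = read_pairs_needed(exome_bases, 100, 50, 1.4)

# Somatic mutations: 150x coverage, buffer factor 2.0.
somatic_pairs = read_pairs_needed(exome_bases, 100, 150, 2.0)

print(f"Germline: {germline_pairs / 1e6:.1f}M read pairs")
print(f"Somatic:  {somatic_pairs / 1e6:.1f}M read pairs")
```

The germline case works out to about 10.5M read pairs, which rounds to the ~11M quoted above, and the somatic case to about 45M. The same function can be reused for other target sizes or read lengths by changing the inputs.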
- Q: I want to do ChIP-seq. How many samples/replicates should I collect, and what sequencing depth should I aim for?
A: We recommend 2 replicates per condition, with a paired input (no-IP DNA sequencing) for ChIP-seq. Sequencing depth depends on the type of protein being studied, whether a narrow mark or broad mark. For narrow marks, where the protein binds in a site-specific manner (true of most transcription factors) or is highly localized (promoter- or enhancer-associated histone marks, like H3K9ac or H3K4me3), we recommend 20-30M reads. For broad marks, where the ChIP enrichment may span large (>50kb) domains of the genome (histone marks like H3K27me3, H3K9me2), we recommend at least 75-100M reads. Single-end reads are typically sufficient for ChIP-seq.
For more information, we recommend reading the ENCODE guidelines for ChIP-seq.
- Q: I want to study epigenomic marks. What methodology should I choose?
A: There is a large variety of experiments you can choose from to measure various epigenomic marks. Histone modifications, which are associated with a variety of chromatin states like promoters, enhancers, active transcription, and silenced transcription, can be measured by ChIP-seq.
Regions of open chromatin (i.e., absence of nucleosomes), which are associated with active transcription and protein-DNA binding, can be measured by DNase-seq or FAIRE-seq. Alternatively, nucleosome positioning can be measured by MNase-seq. However, a newer methodology, ATAC-seq, can be used to measure both open chromatin and nucleosome positioning (the latter only if paired-end sequencing is done), and is an easier protocol to follow.
DNA methylation can be measured by MeDIP-seq or bisulfite-seq (BS-seq). MeDIP uses an antibody to pull down methylated regions, and thus gives a broad measure of DNA methylation in a gene locus. BS-seq relies on chemical conversion of non-methylated cytosines, and thus gives single-nucleotide resolution of methylated DNA, at the risk of false positives due to incomplete conversion.
Finally, long-range looping interactions consistent with enhancer-promoter regulation or long-scale DNA structure can be measured by the chromosome conformation capture family of methodologies (3C, 4C, 5C, Hi-C). These protocols can also be linked to an immunoprecipitation step to measure looping in the context of a specific protein (ChIA-PET).
- Q: What sequencing method should I use? Length of read, single-end or paired-end, Illumina vs Ion Torrent, etc.?
A: This depends greatly on what you wish to study, and some information can be found in the answers above. As a general rule, longer reads are helpful when (A) high-confidence alignments are crucial and (B) you are looking for potential deviations from the normal genomic structure. (A) is typically true of genome resequencing projects (whole-genome resequencing, whole-exome resequencing, etc.), where alignment biases from short reads can cause SNP calling errors. (B) is often the case for RNA-seq projects, especially where differentiating between different gene isoforms is important: reads mapping across different exon-exon junctions are the key evidence, and longer reads give more resolution about where splicing is occurring.
Read length is currently limited to 100-150 bases on the Illumina HiSeq platform (and a bit longer on the NextSeq), so in cases where long reads are necessary, paired-end sequencing is a powerful approach. Illumina MiSeq can sequence up to 250 bases reliably, but is limited in overall library size to a few million reads. The Ion Torrent platform produces variable-length reads, with an average length much longer than Illumina's. Currently, the average base quality is lower for Ion Torrent than for Illumina. In cases where the priority is simply estimating enrichment of genomic loci (e.g., ChIP-seq), shorter, single-end reads are usually sufficient.
- Q: What sequencing and microarray resources are available at UIC? Do I need to collect my data at UIC?
A: The RIC works closely with the UIC Core Genomics Facility (CGF), which processes microarrays for both genotyping and gene expression as well as RNA-seq, and DNA Services (DNAS), which offers both Illumina (MiSeq, NextSeq, and HiSeq) and Ion Torrent next-generation sequencing services. However, we will work with data collected at any facility and any institution.