Genomics and transcriptomics
In recent years, dramatic advances in biology, medicine and technology have set the foundation for understanding the information contained in our DNA, the genome.
In particular, the new sequencing technologies known as Next Generation Sequencing (NGS) now enable us, for the first time, to access the sequence of our DNA simply and effectively, providing a personalized assessment of each individual's genetic information. While most differences in DNA sequence between people are harmless, some changes may cause disease or increase an individual's risk of developing it. Some diseases are caused directly by a genetic defect, while others, more complex, develop from the interaction between genetic predisposition and environmental factors (diet, lifestyle, exposure to particular substances, etc.).
The advent of these technologies has greatly increased the scope and speed of genome sequencing projects. With the vast amount of data on human DNA generated by the Human Genome Project and other genomic research, scientists and clinicians have more powerful tools to study how multiple genetic factors, acting together and with the environment, contribute to very complex diseases.
The advanced genomic technologies available at the Genomics Laboratory are centered on next generation sequencing (NGS) using Illumina (NextSeq 500) and Thermo Fisher (Ion Proton) platforms. We perform a variety of genetic analyses, and we optimize and develop protocols to support a range of applications.
INSTRUMENTATION
- Next Generation Sequencing – this latest technology allows complete sequencing of the genome, exome and transcriptome in just a few hours, providing information on the normal or disease-associated altered status of a sample. FPS has two NGS platforms available: an Illumina NextSeq 500 and a Life Technologies Ion Proton™ system.
- Genetic Analyzer – Sanger sequencing and fragment analysis are performed on a 3500 Genetic Analyzer (Thermo Fisher), an 8-capillary instrument designed to sequence DNA fragments shorter than 1,000 bp. The analyzer is used for a variety of applications: NGS validation, single-nucleotide polymorphism (SNP) genotyping, loss-of-heterozygosity analysis, conformational analysis (SSCP), microsatellite analysis and large fragment analysis.
- Droplet Digital PCR system – a QX100™ (Bio-Rad) is used for the absolute quantification of DNA or RNA. The main applications are copy number variation analysis, identification of rare sequences, gene expression analysis, miRNA analysis, single-cell analysis, pathogen detection and quantification of NGS libraries.
- Real-time and standard PCR instruments – a CFX96 Touch™ Real-Time PCR Detection System provides 5-plex quantitative PCR capabilities. Eppendorf and Bio-Rad thermal cyclers are also available for standard PCR amplification reactions.
- Nucleic acid extractor – a Maxwell® 16 MDx Research Instrument offers automated extraction of DNA and RNA from multiple sample types. Preprogrammed methods are available for both standard elution volumes (SEV) of 200–400 µl and low elution volumes (LEV) as small as 25 µl.
- DNA/RNA quantitation – a Qubit fluorometer (Thermo Fisher) and a TapeStation Bioanalyzer (Agilent) are used to measure DNA and RNA concentration and to check sample quality prior to NGS analysis.
- Laser capture microdissection – a laser capture microdissection system based on a semi-confocal microscope is used for the accurate isolation of cells or tissue regions. Such isolation is necessary to obtain cell-specific signatures from tumor tissue samples, which are often highly heterogeneous. DNA, RNA, proteins and lipids can then be extracted and analyzed. The system consists of an Apotome 2 Axio Observer Z1 microscope equipped with a PALM RoboMover laser capture microdissection system (both Zeiss).
- ChemiDoc™ Imaging System – used for imaging DNA/RNA gels and western blots. It is designed for chemiluminescence detection, for immediate visualization of proteins without gel staining and for instant verification of protein transfer to blots.
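The QX100 quantifies DNA or RNA by splitting each reaction into thousands of droplets and counting the droplets that are positive for the target. Because target molecules are assumed to be distributed among droplets according to a Poisson distribution, the mean number of copies per droplet can be recovered from the fraction of negative droplets as λ = −ln(negative/total). The following is a minimal sketch of that calculation, not the instrument software's own code; the function name is hypothetical, and the default droplet volume of 0.85 nl is an assumption that should be replaced with the value used by the analysis software for the chemistry in use.

```python
import math


def ddpcr_concentration(positive: int, total: int, droplet_volume_nl: float = 0.85) -> float:
    """Estimate target copies per microlitre of reaction from droplet counts.

    Assumes target molecules are Poisson-distributed among droplets, so the
    fraction of negative droplets equals exp(-lam), where lam is the mean
    number of copies per droplet. droplet_volume_nl is an assumed value.
    """
    if total <= 0 or not 0 <= positive <= total:
        raise ValueError("need total > 0 and 0 <= positive <= total")
    if positive == total:
        # Every droplet is positive: lam cannot be estimated, dilute and rerun.
        raise ValueError("all droplets positive; sample too concentrated")
    negative_fraction = (total - positive) / total
    lam = -math.log(negative_fraction)               # mean copies per droplet
    droplet_volume_ul = droplet_volume_nl / 1000.0   # nl -> µl
    return lam / droplet_volume_ul


if __name__ == "__main__":
    # Hypothetical counts: 4,000 positive droplets out of 15,000 accepted droplets.
    print(f"{ddpcr_concentration(4000, 15000):.1f} copies/µl")
```

The Poisson correction matters because a positive droplet may contain more than one copy; simply dividing the positive droplets by the total volume would underestimate concentrated samples.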
The Laboratory of Genomics and Transcriptomics includes a fully equipped CELL FACILITY for functional in vitro characterization of the results of NGS or proteomics analysis. Specific instrumentation includes:
- Zeiss Microscope Stage Incubator – Incubation system with temperature and CO2 control for live cell imaging with the semi-confocal microscope under physiological conditions.
- LUNA™ Automated Cell Counter
- Flow cytometry – a CyFlow Cube Sorter (Partec) enables rapid analysis of cell populations by measuring their volume, morphology, fluorescence, etc. The associated sorting function allows specific subgroups of cells (with defined morphological or fluorescent characteristics) to be isolated from heterogeneous cell populations.
- Metabolic flux analyzer – The role of metabolism in cellular and physiological processes is well established, and many diseases are now linked to metabolic dysfunction or reprogramming. A Seahorse system (Agilent) quantifies cellular bioenergetics by measuring mitochondrial respiration and glycolysis in live cells (as oxygen consumption rate and extracellular acidification rate). The Seahorse system allows compounds to be added to the cells during the assay, to investigate how the cells respond to specific stimuli.
- IVTech bioreactors – advanced cell culture systems used to build 3D dynamic in vitro models following standard protocols. The basic module, LB1, is a transparent, single-flow bioreactor that can be used to mimic metabolic tissues such as liver; the more advanced unit, LB2, works in dual flow and can be used to mimic physiological barriers such as lung, skin and intestinal epithelia. The systems are designed for developing physiologically relevant tissue models and for the reduction and refinement of animal testing. All our devices permit in situ live imaging and can be linked together to simulate multi-organ systems.
APPLICATIONS
HIGH and LOW amounts of starting material:
- DNA seq (whole genome, whole exome and copy number variation analysis) and RNA seq (whole transcriptome) analysis on fresh frozen tissues, FFPE tissues, biological fluids and cell culture samples.
- DNA seq (whole exome and copy number variation analysis) and RNA seq (whole transcriptome) analysis on microdissected fresh and formalin-fixed paraffin-embedded (FFPE) samples. We are able to process from dozens to a few hundred cells per experiment. FFPE tissues and other fixed samples have always been challenging, but modern technologies have made it increasingly possible to work with this material, which has great potential. Microdissected fresh and fixed samples are even more challenging. Laser microdissection enables the collection of homogeneous populations of specific cells, overcoming the problem of tissue heterogeneity. Many solid cancers in particular are characterized by heterogeneous cell populations: inter-tumor and intra-tumor heterogeneity is observed daily by pathologists and may have important consequences for personalized medicine approaches. In our laboratory we developed next generation sequencing methods that allow investigation of the whole exome and whole transcriptome of small amounts of degraded starting material, based on Ampli1/SMARTer amplification technology and on carefully modified Illumina/Ion Proton protocols.
- Gene panel, exome and transcriptome analysis by NGS and by real-time PCR array analysis
- Digital PCR molecular analysis
BIOINFORMATICS
Pipeline for RNA-Seq data analysis
Sequencing of transcribed RNA molecules (RNA-seq) is an invaluable tool for studying cell transcriptomes at high resolution and depth. RNA-seq datasets typically consist of tens to hundreds of millions of relatively short (30–200 nt) sequence fragments of the original RNA transcripts. Common goals of RNA-seq data analysis are the evaluation of gene expression and differential expression, analysis of alternative splicing, allele-specific expression, mutation discovery, fusion detection and RNA editing. Each type of RNA-seq analysis has distinct requirements and challenges, but they share common steps: obtaining the raw data and converting it to the required format; evaluating read quality; aligning and/or assembling the reads; processing the alignments; and post-processing, i.e. importing the results into downstream software for visualization and pathway analysis.
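Read quality evaluation typically starts from the Phred scores stored in the FASTQ files. The sketch below computes the mean quality at each read position, the quantity summarized by FastQC's per-base sequence quality plot. It is a minimal illustration, not FastQC's code (which also reports quartiles and many other modules); it assumes uncompressed, unwrapped four-line FASTQ records with Phred+33 encoding (Illumina 1.8 and later), and the function name is hypothetical.

```python
import io
from typing import List, TextIO


def per_base_mean_quality(handle: TextIO, offset: int = 33) -> List[float]:
    """Mean Phred quality at each read position across all reads of a FASTQ stream.

    Assumes unwrapped four-line records (header, sequence, '+', quality) and
    Phred+33 encoding; reads may have different lengths.
    """
    sums: List[int] = []
    counts: List[int] = []
    while True:
        header = handle.readline()
        if not header:            # end of file
            break
        handle.readline()         # sequence line (not needed here)
        handle.readline()         # '+' separator line
        quality = handle.readline().rstrip("\r\n")
        for pos, char in enumerate(quality):
            if pos == len(sums):  # first read reaching this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset
            counts[pos] += 1
    return [s / n for s, n in zip(sums, counts)]


if __name__ == "__main__":
    # Two hypothetical reads: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\n#II\n")
    print(per_base_mean_quality(demo))  # [21.0, 40.0, 40.0, 40.0]
```

Averaging per position rather than per read is what reveals the characteristic drop in quality toward the 3' end of Illumina reads.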
In our institute we use RNA-seq to compare different conditions (e.g. better vs. worse prognosis) or responses to different treatments.
- In our data analysis pipeline, quality control (QC) of the raw data is the first step of the routine RNA-seq workflow. We use tools such as FastQC to assess the overall and per-base quality of the reads in each sample. In addition, we check the raw data for contamination by other organisms (bacteria, fungi, viruses) with FastQ Screen, a tool that screens a library of sequences in FASTQ format against a set of contaminant sequence databases.
- The second step of the pipeline is alignment. We currently use STAR as the mapper. STAR aligns reads to a reference genome and is a spliced aligner that allows a wide range of gap sizes, which may increase the probability of identifying novel transcripts generated by alternative splicing.
- For transcriptome reconstruction (the identification of all transcripts expressed in a specimen) we use the reference-guided approach implemented in Cufflinks, our preferred method. This approach is advantageous when reference annotation is well established, as in human and mouse. Cufflinks can also quantify expression at the isoform level.
- Differential gene expression is analyzed with Cuffdiff, a program included in the Cufflinks package that uses transcript-based detection methods.
- Data exploration and visualization are performed with the R package CummeRbund. It reads the various output files of a Cuffdiff run, describes the relationships between genes, transcripts, transcription start sites and CDS regions, and provides numerous plotting functions for commonly used visualizations.
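Cufflinks reports expression as FPKM (fragments per kilobase of transcript per million mapped fragments), which corrects fragment counts for both transcript length and sequencing depth. The sketch below shows only that normalization formula. It assumes fragments have already been assigned to transcripts and uses raw transcript length, whereas Cufflinks estimates isoform assignments by maximum likelihood and uses effective lengths; the function name and example values are hypothetical.

```python
from typing import Dict


def fpkm(fragment_counts: Dict[str, float], lengths_bp: Dict[str, int],
         total_mapped: int) -> Dict[str, float]:
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = count / (length_kb * total_mapped_millions)
         = count * 1e9 / (length_bp * total_mapped)
    Uses raw transcript length rather than an effective length.
    """
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    return {
        tx: count * 1e9 / (lengths_bp[tx] * total_mapped)
        for tx, count in fragment_counts.items()
    }


if __name__ == "__main__":
    counts = {"txA": 500, "txB": 500}        # hypothetical fragment counts
    lengths = {"txA": 2000, "txB": 500}      # transcript lengths in bp
    print(fpkm(counts, lengths, 10_000_000))  # {'txA': 25.0, 'txB': 100.0}
```

The example shows why length correction matters: two transcripts with the same number of fragments differ fourfold in FPKM because one is four times longer.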
We recently introduced SeqMonk, a tool for visualizing and analyzing mapped sequence data. SeqMonk can calculate raw read counts for each sample and export them as a data matrix for downstream analysis. Differential expression is then analyzed with edgeR, an R package.
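A raw count matrix of the kind exported from SeqMonk is the input to edgeR. edgeR normalizes library composition with TMM and tests differential expression with negative binomial models, which this sketch does not reproduce; it only illustrates the two simplest quantities involved, library-size normalization to counts per million (CPM) and a log2 fold change between two groups of samples. The function names, the dictionary layout of the matrix and the pseudocount `prior` are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def cpm_matrix(counts: Dict[str, Sequence[int]]) -> Dict[str, List[float]]:
    """Convert a gene-by-sample raw count matrix to counts per million (CPM).

    counts maps each gene to its raw counts, one value per sample, with the
    samples in the same order for every gene.
    """
    n_samples = len(next(iter(counts.values())))
    # Library size of each sample = column sum of the count matrix.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [row[j] * 1e6 / lib_sizes[j] for j in range(n_samples)]
        for gene, row in counts.items()
    }


def log2_fold_change(cpm_row: Sequence[float], group_a: Sequence[int],
                     group_b: Sequence[int], prior: float = 0.5) -> float:
    """log2 of mean CPM in group B over group A; `prior` avoids division by zero."""
    mean_a = sum(cpm_row[j] for j in group_a) / len(group_a)
    mean_b = sum(cpm_row[j] for j in group_b) / len(group_b)
    return math.log2((mean_b + prior) / (mean_a + prior))


if __name__ == "__main__":
    # Hypothetical counts for two genes in four samples (columns 0-1 vs 2-3).
    raw = {"g1": [10, 10, 40, 40], "g2": [90, 90, 60, 60]}
    cpm = cpm_matrix(raw)
    for gene, row in cpm.items():
        print(gene, round(log2_fold_change(row, [0, 1], [2, 3]), 2))
```

A fold change alone says nothing about significance; that is the part edgeR adds by modelling biological variability between replicates.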
Pipeline implementation for DNA-seq data analysis
To analyze data from NGS genomic experiments (genome, exome and panel-based analysis) we use the SeqMule pipeline, a fully customizable set of scripts that integrates the most common open-source software for detecting mutations. We introduced this pipeline to analyze data from our Illumina sequencer (NextSeq 500).
Briefly, the first steps of the pipeline are quality control (QC) and sequence alignment. FastQC is used to evaluate read quality, and the reads are then aligned with BWA-MEM, one of the most common choices for mutation analysis. The other steps are:
- Marking duplicates. The Picard MarkDuplicates tool marks PCR duplicates, which may bias the estimation of variant allele frequencies.
- Local realignment and base quality recalibration. The Genome Analysis Toolkit (GATK) realignment and recalibration tools are used to perform local realignment and base quality score recalibration, which correct misalignments and systematic biases and reduce false positives in variant calling.
- Variant calling. Variants are called with SAMtools, GATK and FreeBayes. These programs detect both SNPs and indels, and each of the three callers reports its results in a Variant Call Format (VCF) file. In addition, a consensus VCF file reports the variants shared by all the callers.
- Variant annotation and filtering. This step is performed with different software, such as wANNOVAR, Enlis Genome Research and VariantStudio from Illumina BaseSpace.
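The consensus step amounts to a set intersection over the variants reported by each caller. The sketch below illustrates the idea under simplifying assumptions: plain uncompressed VCF text, and variants matched on exact (CHROM, POS, REF, ALT) keys. It is not SeqMule's own implementation, and the function names are hypothetical. In practice the three callers can represent the same indel differently, so VCFs are usually normalized first (left-aligned, with multiallelic sites split, for example with bcftools norm).

```python
from typing import Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def read_vcf_variants(lines: Iterable[str]) -> Set[Variant]:
    """Collect (CHROM, POS, REF, ALT) keys from VCF text, one key per ALT allele."""
    variants: Set[Variant] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            if alt != ".":  # '.' means no alternate allele
                variants.add((chrom, pos, ref, alt))
    return variants


def consensus(*call_sets: Set[Variant]) -> Set[Variant]:
    """Variants reported by every caller (intersection of all call sets)."""
    if not call_sets:
        return set()
    shared = set(call_sets[0])
    for calls in call_sets[1:]:
        shared &= calls
    return shared


if __name__ == "__main__":
    header = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    samtools_like = header + "chr1\t100\t.\tA\tG\t50\tPASS\t.\nchr1\t200\t.\tC\tT\t40\tPASS\t.\n"
    gatk_like = header + "chr1\t100\t.\tA\tG\t60\tPASS\t.\nchr2\t5\t.\tG\tA\t30\tPASS\t.\n"
    freebayes_like = header + "chr1\t100\t.\tA\tG,T\t55\tPASS\t.\n"
    calls = [read_vcf_variants(v.splitlines()) for v in (samtools_like, gatk_like, freebayes_like)]
    print(consensus(*calls))  # {('chr1', 100, 'A', 'G')}
```

Requiring agreement between all callers trades sensitivity for specificity: variants found by only one caller are dropped from the consensus but remain available in the individual VCF files.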
All pipelines and the required software are installed on three Dell PowerEdge workstations with 32 CPU cores, 128 GB of RAM and 16 TB of data storage.
The Laboratory of Genomics and Transcriptomics is directed by Chiara Maria Mazzanti.