Student Wiki on methodology

This Wiki is intended to collectively summarize the methodologies employed in the research papers we analyze during the course. "Writers" are students who wish to contribute to a specific subject. Before contributing, please add your name in the "Writers group choice". When initiating a contribution, please indicate your name in brackets.


PLEASE:  DO NOT change the INDEX page !!!
This page contains the links to the nine official subjects, which are the same as in the Choice.

To contribute, go to the correct page by clicking on the description here in the index, then click EDIT and contribute. At the end, please save.

 IMPORTANT !!!

Please do not make extensive cut-and-paste: it is useless, since anybody can go to the source you use and read it. Read the texts, digest them, and write a short résumé. If you wish, you can include link(s) to the source(s).

Other contributors can revise, add, erase, modify... Please also avoid repeating the same text.


Transcriptome: special techniques, RNA-Seq, GRO-Seq, CAGE, others.

Go back to the index

Note from Luca: I've added some placeholders for CAGE and Other Methods in order to preserve the original formatting. I will take care of deleting them if they are not filled in a timely manner.

RNA-seq

As understood by [Luca Visentin]

At its core, RNA-seq is the practice of extracting and purifying RNA molecules from a set of cells (or even a single cell) and feeding them to Next-Generation Sequencing (NGS). This technique gives us incredibly in-depth information about the transcriptome, the complete set of transcripts of a cell. These include mRNA, rRNA, miRNA (micro RNA), other small RNAs and other non-coding RNAs. Analyzing these data gives us insight into the complete transcriptome profile (in layman's terms, the set of genes which are transcribed at a certain time point, and their relative expression levels), the exon content of transcripts, exon splicing, and gene fusion (the process where two genes fuse together to form a single transcript, and thus protein, which occurs often in cancer).

(Note: all steps involving RNA manipulation have to be carried out in RNase-free conditions. This includes pipettes, pipette tips and containers, and it means wearing gloves.)

To perform RNA-seq, samples (which can range from tissues to single cells) are collected and RNA is extracted, typically using commercially available kits such as RNeasy (Qiagen, Hilden) or TRIzol (Life Technologies), among others. Pure RNA can be preserved for later use (with products such as RNAlater), enriched by size to isolate specific RNA types (such as miRNA, using mirVana chromatographic columns from Ambion), sequence-selected through probes, or enriched for specific features, such as a poly-A tail (for mRNA). Enriched RNA is then subjected to DNase digestion to remove the often significant DNA contamination. (Note: I've added the kits' names just because they are very "imaginative")

Afterwards, samples are analysed for purity, protein contamination and RNA quantity, in order to obtain libraries of sufficient quality. Library preparation follows: RNA is reverse-transcribed into cDNA (which is more stable, an important property for the sequencing step) and subjected to several NGS-platform-specific steps. The final result, however, is almost always PCR-enriched, adaptor-ligated, end-repaired, RNA-free dsDNA. At this point, the library can be stored, replicated or sent for sequencing to NGS-equipped laboratories.

After NGS, huge amounts of data are obtained, and they need to be sifted through in silico. For more information on the techniques used at this step, I would refer you to the Bioinformatics course. It is sufficient to note that, through both alignment algorithms and statistics, all the information discussed above can be extracted. RNA-seq data is usually (especially for large studies) available online for free, and can be "mined" for additional information even after the original study has concluded. I link to some databases in the sources.
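To give a feeling for how "relative expression levels" come out of read counts, here is a minimal Python sketch of one common normalization, transcripts per million (TPM). It is only an illustration: the gene names, counts and transcript lengths are made up, and TPM is not necessarily the measure used in the sources below.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts into transcripts per million (TPM).

    counts and lengths_bp are dicts keyed by gene name (hypothetical data).
    """
    # Step 1: reads per kilobase of transcript, to correct for the fact
    # that longer transcripts produce more reads.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rpk.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    # Step 2: rescale so that the values of one sample sum to one million.
    return {gene: value * 1_000_000 / total for gene, value in rpk.items()}


# Made-up counts and transcript lengths, for illustration only.
counts = {"geneA": 100, "geneB": 200, "geneC": 300}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 1000}

expression = tpm(counts, lengths_bp)
for gene, value in expression.items():
    print(gene, round(value))
```

Note that geneB has twice the reads of geneA but is also twice as long, so the two end up with the same normalized value.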

RNA-seq Sources

  • First of all, the book "RNA-seq Data Analysis" by Eija Korpelainen, Jarno Tuimala, Panu Somervuo, Mikael Huss, and Garry Wong; CRC Press (ISBN: 978-1-4665-9500-2)
  • Wikipedia has a really good article on RNA-seq.
  • Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10(1):57-63. (Link)
  • For some information on the formulae behind RNA quantification through RNA-seq, see the lecture slides by Colin Dewey, Mark Craven, and Anthony Gitter (link)
  • More information on RNA extraction, as provided by Labome (link)
  • The Gene Expression Atlas, curated by EBI, and the ENCODE project (for RNA-seq experiments), as examples of RNA-seq databases.

Section 1 Change-log

  • [20/03/2019 - 20:50] Luca Visentin: Created page.


GRO-seq

As understood by [Luca Visentin]

RNA-seq, although powerful, doesn't necessarily measure the transcriptional activity of a certain locus, only the "snapshot" concentration of RNA present in the cell at a specific time point; of course, this quantity is due to transcription, but it is also heavily influenced by RNA stability and turnover. GRO-seq, which stands for Global Run-On sequencing, is a well-established method to assess de novo synthesis of RNA molecules, giving information about the transcriptional activity of a cell, rather than a transcriptome "snapshot".

Nuclei are isolated from cells (through hypotonic solutions that burst the cells), washed to remove nucleotides, and kept in ice-cold conditions. The cold halts transcription, leaving polymerase complexes stalled at their current position: this is key for the whole GRO-seq workflow. After transcription has been halted, nuclei are warmed and incubated at 30 °C together with a brominated uridine nucleotide (Br-UTP, a UTP analogue carrying a bromine atom) and all other nucleotides. Elongation then resumes, incorporating the labelled nucleotides into the nascent RNA. Formation of new pre-initiation complexes (PICs) is prevented by the addition of the detergent sarkosyl (sodium lauroyl sarcosinate), so only transcripts that were being transcribed at the moment of nuclear harvesting are studied. This step is called "Nuclear Run-On".

After some time, the nuclei are lysed, and RNA is extracted and immunoprecipitated with anti-BrdU antibodies to enrich for Br-labelled RNA molecules. Once the nascent RNA is isolated, the RNA-seq steps for sample preparation and library creation are followed, and the resulting libraries are sequenced through NGS. The data thus created are analysed in silico, as for all other RNA-seq applications.

To quote Alessandro Gardini (see Sources): "GRO-seq has shown unprecedented accuracy to ascertain defects in RNAPII elongation and pause-release as well as termination. [...] Additionally, GRO-seq has revealed that RNAPII fires bi-directionally at most mammalian promoters, initiating noncoding RNAs that are transcribed antisense with respect to the messenger RNA. Owing to their instability, these transcripts do not accumulate in the nucleus and elude most RNA detection protocols." Needless to say, this technique can provide exciting new insights into RNA transcription and regulation.
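As an illustration of how pause-release can be quantified from GRO-seq reads, here is a minimal Python sketch of a "pausing index": the read density near the promoter divided by the read density over the rest of the gene. This is an assumption on my part rather than something taken from the quoted source; the 300 bp promoter window, the function names and the read positions are all hypothetical, and a single gene on the plus strand is assumed.

```python
def read_density(read_positions, start, end):
    """Reads per base in the half-open interval [start, end)."""
    n_reads = sum(1 for pos in read_positions if start <= pos < end)
    return n_reads / (end - start)


def pausing_index(read_positions, tss, gene_end, promoter_window=300):
    """Promoter-proximal read density divided by gene-body read density.

    promoter_window is a hypothetical choice; real analyses pick their own.
    Returns None when the gene body has no reads (ratio undefined).
    """
    promoter = read_density(read_positions, tss, tss + promoter_window)
    body = read_density(read_positions, tss + promoter_window, gene_end)
    if body == 0:
        return None
    return promoter / body


# Made-up read positions: 30 reads in the first 300 bp after the TSS,
# and 30 reads spread over the following 3000 bp of gene body.
reads = list(range(0, 300, 10)) + list(range(300, 3300, 100))
index = pausing_index(reads, tss=0, gene_end=3300)
print(index)
```

A high value suggests that polymerases accumulate near the promoter compared to the gene body; a value close to 1 suggests no such accumulation.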

GRO-seq Sources

  • Gardini, Alessandro. “Global Run-On Sequencing (GRO-Seq)” Methods in molecular biology (Clifton, N.J.) vol. 1468 (2017): 111-20. (PubMed Central link)
  • 5-Bromouracil (Wikipedia)
  • Sodium lauroyl sarcosinate (Wikipedia)

Section 2 Change-log

  • [20/03/2019 - 21:30] Luca Visentin: Created Page.


CAGE

As understood by [Fabio Grieco]

Cap analysis gene expression (CAGE) is a high-throughput method to analyse RNA expression. It takes advantage of the 7-methylguanosine cap at the 5' end of mRNAs, which makes it possible to map precise transcription start sites (TSSs). In addition, TSSs within promoters are characterized at single-nucleotide resolution. Thus, CAGE is primarily used to locate exact TSSs in the genome. This knowledge in turn allows a researcher to investigate the promoter structure necessary for gene expression. CAGE has been instrumental in globally mapping specific TSSs in eukaryotes, highlighting the existence of alternatively regulated TSSs and novel regulatory elements, and has allowed predictions of transcription factor binding sites and other motifs associated with transcription.

The protocol includes biotinylation of the cap structure, reverse transcription and treatment of the RNA/DNA heteroduplex with RNase, to ensure that only 5'-complete cDNAs stay associated with the biotin tag and are pulled down by streptavidin-coated beads. A linker sequence containing a recognition site for a type III restriction endonuclease is ligated to the 5' end of the captured cDNA, and the corresponding restriction enzyme is used to cleave off a short fragment (typically 27 bp) from the 5' end. The resulting fragments are then amplified and sequenced. The large number of short sequenced tags can be mapped back to the reference genome, making it possible to localize the exact position of TSSs. The number of CAGE tags gives information about the expression level from that specific TSS. Thus CAGE provides a single-base-pair resolution map of TSSs, together with the relative levels of the transcripts initiated at each TSS.
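The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tags are made up, they are assumed to be already mapped to the genome with 1-based inclusive coordinates, and the function names are hypothetical rather than taken from any CAGE software.

```python
from collections import Counter


def tss_position(start, end, strand):
    """Genomic position of the 5' end of a mapped tag.

    On the plus strand the 5' end is the leftmost base; on the minus
    strand it is the rightmost base.
    """
    return start if strand == "+" else end


def count_tss(tags):
    """Count how many tags start at each (chromosome, position, strand)."""
    counts = Counter()
    for chrom, start, end, strand in tags:
        counts[(chrom, tss_position(start, end, strand), strand)] += 1
    return counts


def tags_per_million(counts):
    """Scale raw tag counts by library size, to compare between samples."""
    total = sum(counts.values())
    return {tss: n * 1_000_000 / total for tss, n in counts.items()}


# Made-up 27 bp tags: (chromosome, start, end, strand).
tags = [
    ("chr1", 1000, 1026, "+"),
    ("chr1", 1000, 1026, "+"),
    ("chr1", 1003, 1029, "+"),
    ("chr1", 5000, 5026, "-"),
]

tss_counts = count_tss(tags)
dominant_tss = max(tss_counts, key=tss_counts.get)
print(dominant_tss, tss_counts[dominant_tss])
```

The position with the most tags is the dominant TSS of that promoter, and the (normalized) tag count is the measure of expression from it.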

Because CAGE selectively removes non-capped RNAs, it is not applicable to prokaryotes (whose RNAs lack the 7-methylguanosine cap), to other non-capped RNAs, or to RNAs shorter than ~100 nt, which are filtered out during the linker purification procedures.

Alternative methods to CAGE include the 5'-end serial analysis of gene expression (SAGE), which is based on selectively ligating an oligonucleotide to the de-capped 5' end of the RNA with T4 RNA ligase. SAGE produces a snapshot of the mRNA population in a sample of interest, in the form of small tags that correspond to fragments of those transcripts. Unlike the cap-based capture used in CAGE, this approach relies on RNA ligase, which may show sequence preferences.

CAGE has led to the discovery of distinct classes of promoters with respect to TSS distribution, which correlate with both underlying sequence features and gene function and imply distinct modes of regulation. The quantitative nature of CAGE has been used to model expression dynamics and to reconstruct the regulatory networks that drive differentiation and maintain the identity of numerous cell types and tissues. CAGE is thus also a powerful approach for studying various aspects of gene regulation, including during differentiation (the CAGE signal has also been shown to be enriched at enhancers).

CAGE Sources

  • Paper by Takahashi et al on CAGE-seq.
  • Review by Vanja Haberle and Boris Lenhard.
  • Wikipedia page on Cap analysis gene expression, which has a good description of the method.
  • Wikipedia page on Serial analysis of gene expression.

Section 3 Change-log

  • [20/03/2019 - 21:30] Luca Visentin: Created placeholder.
  • [29/03/2019 - 15:00] Luca Visentin: Found a paper by Takahashi


Other Techniques



DNA Microarray

As understood by [Daniele Giacosa]
Another technique used in transcriptomics is the DNA microarray, which consists of short nucleotide oligomers, known as "probes", typically arrayed in a grid on a glass slide. Transcript abundance is determined by hybridisation of fluorescently labelled transcripts to these probes: the fluorescence intensity at each probe location on the array indicates the transcript abundance for that probe sequence.

Microarrays require some genomic knowledge of the organism of interest, for example in the form of an annotated genome sequence or a library of ESTs (expressed sequence tags) that can be used to generate the probes for the array.
Microarrays for transcriptomics typically fall into one of two broad categories: low-density spotted arrays or high-density short probe arrays.
Spotted low-density arrays typically feature picolitre drops of a range of purified cDNAs arrayed on the surface of a glass slide. These probes are longer than those of high-density arrays and cannot identify alternative splicing events. Spotted arrays use two different fluorophores to label the test and control samples, and the ratio of fluorescence is used to calculate a relative measure of abundance. High-density arrays use a single fluorescent label, and each sample is hybridised and detected individually. High-density arrays were popularised by the Affymetrix GeneChip array, where each transcript is quantified by several short 25-mer probes that together assay one gene.
NimbleGen arrays were high-density arrays produced by a maskless-photochemistry method, which permitted flexible manufacture of arrays in small or large numbers. These arrays had hundreds of thousands of 45- to 85-mer probes and were hybridised with a one-colour labelled sample for expression analysis. Some designs incorporated up to 12 independent arrays per slide.
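The two-colour ratio used with spotted arrays can be illustrated with a minimal Python sketch. It assumes the intensities have already been read from the scanned slide; the probe names, the intensity values and the optional background parameter are made up for illustration, and real pipelines apply further normalization steps.

```python
import math


def log2_ratio(test_intensity, control_intensity, background=0.0):
    """Relative abundance of a transcript as log2(test / control).

    Returns None when a background-corrected intensity is not positive,
    because the ratio is then undefined.
    """
    test = test_intensity - background
    control = control_intensity - background
    if test <= 0 or control <= 0:
        return None
    return math.log2(test / control)


# Made-up (test, control) fluorescence intensities for three probes.
spots = {
    "probe_1": (2000.0, 500.0),
    "probe_2": (300.0, 1200.0),
    "probe_3": (800.0, 800.0),
}

ratios = {probe: log2_ratio(t, c) for probe, (t, c) in spots.items()}
for probe, ratio in ratios.items():
    print(probe, ratio)
```

On the log2 scale, a positive value means the transcript is more abundant in the test sample, a negative value means it is more abundant in the control, and zero means no difference.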

OT Sources

  • Wikipedia has a really good summary of microarrays.
  • Yukihide Maeda et al. published a really good article combining RNA-seq and DNA microarray.
  • An article by Tomasz Waller et al.

Section 4 Change-log

  • [20/03/2019 - 21:30] Luca Visentin: Created placeholder.