Advancing Epigenetics Towards Systems Biology

A pipeline for ChIP-seq data analysis (Prot 56)

Ruhi Ali1, Florence M.G. Cavalli1, Juan M. Vaquerizas1, Nicholas M. Luscombe2,3


Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is becoming the standard experimental procedure for investigating transcriptional regulation and epigenetic mechanisms on a genome-wide scale (reviewed in Park, 2009). The technique involves covalent cross-linking of proteins to DNA, followed by fragmentation of the chromatin and immunoprecipitation (IP) with an antibody against the protein or histone modification of interest. The result is a set of short DNA fragments, about 200 bp in length, that represent regions of the genome where the protein is bound or where the specific histone modification occurs. These fragments are then sequenced using one of the next-generation sequencing platforms now available. The resulting reads (usually 36 to 100 bp) are mapped back to the reference genome of interest in order to identify regions with significant binding.

Since the introduction of the technique, several bioinformatics approaches have been developed to analyse these data (reviewed in Laajala et al., 2009; Wilbanks and Facciotti, 2010). Most of these methods were initially developed to analyse a particular dataset and its associated experimental design, and they therefore rest on different assumptions. For example, some methods use window-based scans to establish read density profiles, while others use kernel density estimators; some assign peaks on a strand-specific basis, whereas others do so in a strand-insensitive fashion; some make use of control or background datasets; and, usually, each method is based on a different statistical model or test and uses its own approach to adjust for multiple testing or to normalise the data. Given this disparity, it is unsurprising that the results depend heavily on the method employed, with peak overlaps between algorithms ranging from 100% down to 13% (Wilbanks and Facciotti, 2010).
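To make the idea of a window-based read density profile concrete, the following is a minimal sketch in Python. It is an illustration only and does not correspond to any of the published tools reviewed above: the function name `window_counts`, the fixed non-overlapping windows, and the toy read positions are all assumptions made for this example.

```python
def window_counts(read_starts, window_size, genome_length):
    """Count reads whose start position falls in each fixed, non-overlapping window.

    read_starts   -- iterable of 0-based read start positions (hypothetical input)
    window_size   -- window width in bp
    genome_length -- length of the chromosome or contig in bp
    """
    # Ceiling division so that a final, shorter window is still counted.
    n_windows = (genome_length + window_size - 1) // window_size
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < genome_length:  # ignore reads that map outside the sequence
            counts[pos // window_size] += 1
    return counts


# Toy data: eight mapped reads on a 1,000 bp sequence, scanned in 200 bp windows.
reads = [5, 12, 180, 210, 215, 220, 640, 999]
profile = window_counts(reads, window_size=200, genome_length=1000)
print(profile)
```

A kernel density estimator would replace the hard window boundaries with a smooth weighting of nearby reads. This is one example of how two methods can start from the same mapped reads and still produce different profiles.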

Here, we present a step-by-step protocol for the analysis of ChIP-seq data using a new, robust procedure based on estimating the background signal from an input DNA control. Unlike many currently available methods, which fit the ChIP-seq signal to a given distribution, our approach is based on an unbiased evaluation of the noise in the sample, which is then used to calculate the statistical significance of the binding events. Hence, our procedure is well suited to profiles for which the mode of binding (e.g. sharp peaks or broad domains) is not known in advance. The method, implemented in the statistical environment R/Bioconductor (Gentleman et al., 2004), has been used successfully for small genomes such as that of D. melanogaster (Schwartz et al., 2006; Kind et al., 2008; Conrad et al., 2012), and can be applied to any dataset with sufficient coverage for both the input and the IP sample. In this protocol, we use a recent ChIP-seq dataset from Raja et al. (2010) to illustrate each step of the analysis. The code is available at
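The general idea of scoring IP signal against an empirically estimated background, rather than against a fitted distribution, can be sketched as follows. This is not the authors' R/Bioconductor implementation, whose details are not given in this abstract. It is a simplified Python illustration in which the function name `empirical_pvalues`, the library-size scaling, and the add-one empirical p-value are all assumptions chosen for the example.

```python
import bisect


def empirical_pvalues(ip_counts, input_counts):
    """Empirical p-value for each IP window, using the input control as background.

    Both arguments are per-window read counts over the same windows
    (hypothetical inputs). No parametric distribution is fitted.
    """
    # Assumption: scale the input to the IP library size before comparing.
    scale = sum(ip_counts) / sum(input_counts)
    background = sorted(c * scale for c in input_counts)
    n = len(background)

    pvals = []
    for c in ip_counts:
        # Number of background windows whose scaled count is >= the IP count.
        n_ge = n - bisect.bisect_left(background, c)
        # Add-one correction so that a p-value is never exactly zero.
        pvals.append((n_ge + 1) / (n + 1))
    return pvals


# Toy data: five windows; the third IP window carries a strong signal.
ip = [4, 6, 40, 5, 5]
inp = [20, 24, 22, 26, 28]
pvals = empirical_pvalues(ip, inp)
print(pvals)
```

With only five windows the smallest attainable p-value is 1/6. A real analysis would use many thousands of windows and would still need a multiple-testing adjustment, which this sketch omits.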


1 European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.
2 Okinawa Institute of Science & Technology, 1919-1 Tancha, Onna-son, Kunigami-gun, Okinawa 904-0495, Japan.
3 University College London Genetics Institute, Gower Street, London WC1E 6BT, UK.

Corresponding author: Nicholas M. Luscombe


