Advancing Epigenetics Towards Systems Biology

A pipeline for ChIP-seq data analysis (Prot 56)

Ruhi Ali¹, Florence M.G. Cavalli¹, Juan M. Vaquerizas¹, Nicholas M. Luscombe²,³


Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is becoming the standard experimental procedure for investigating transcriptional regulation and epigenetic mechanisms on a genome-wide scale (reviewed in Park, 2009). The technique involves covalent cross-linking of proteins to the DNA, followed by fragmentation of the chromatin and immunoprecipitation (IP) using an antibody against the protein or histone modification of interest. The result is a set of short DNA fragments, about 200 bp in length, that represent regions of the genome where the protein is bound or where specific histone modifications occur. These fragments are then sequenced using one of the next-generation sequencing procedures now available. The resulting reads (usually 36 to 100 bp) are mapped back to the reference genome of interest in order to identify regions with significant binding.

Since the introduction of the experimental technique, several bioinformatics approaches have been developed to analyse these data (reviewed in Laajala et al., 2009; Wilbanks and Facciotti, 2010). Most of these methods were initially developed to analyse a particular dataset and its associated experimental design, and they are therefore based on different assumptions. For example, some methods use window-based scans to establish read density profiles, while others use kernel density estimators; some perform peak assignment on a strand-specific basis, whereas others ignore strand; some make use of control or background datasets; and, usually, every method is based on a different statistical model or test and uses a different approach to adjust for multiple testing or to normalise the data. Given such disparity, it is not surprising that the results depend heavily on the method employed, with peak overlaps ranging from 100% down to 13% depending on the algorithm (Wilbanks and Facciotti, 2010).
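To make the contrast between window-based scans and kernel density estimators concrete, the following is a minimal Python sketch. It is not taken from any of the reviewed tools; the function names, the window, step and bandwidth values, and the toy read positions are all assumptions chosen for illustration.

```python
import math

def window_counts(read_starts, genome_length, window=200, step=50):
    """Count reads whose start falls in each sliding window [s, s + window)."""
    counts = []
    for s in range(0, max(genome_length - window, 0) + 1, step):
        n = sum(1 for r in read_starts if s <= r < s + window)
        counts.append((s, n))
    return counts

def gaussian_density(read_starts, positions, bandwidth=50.0):
    """Gaussian kernel density estimate of read starts at the given positions."""
    if not read_starts:
        return [0.0 for _ in positions]
    norm = 1.0 / (len(read_starts) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((p - r) / bandwidth) ** 2) for r in read_starts)
        for p in positions
    ]

# Toy data (hypothetical): a cluster of reads near position 500
# plus two isolated background reads.
reads = [100, 480, 490, 495, 500, 505, 510, 520, 900]

profile = window_counts(reads, genome_length=1000, window=200, step=100)
density = gaussian_density(reads, positions=[100, 500, 900])
```

Both profiles peak around the read cluster, but the window scan yields integer counts that depend on the window size and step, whereas the kernel estimate yields a smooth curve that depends on the bandwidth. Differences of this kind are one reason the resulting peak calls can diverge between methods.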

Here, we present a step-by-step protocol for the analysis of ChIP-seq data using a new, robust procedure based on the estimation of background signal from an input DNA control. Unlike many of the currently available methods, which are based on fitting the ChIP-seq signal to a given distribution, our approach is based on an unbiased evaluation of the noise in the sample, which is then used to calculate the statistical significance of the binding events. Hence, our procedure is ideal for profiles where no prior information about the mode of binding (e.g. sharp peaks or broad domains) is available. The method, implemented through the statistical package R/Bioconductor (Gentleman et al., 2004), has been successfully used for small genomes such as D. melanogaster (Schwartz et al., 2006; Kind et al., 2008; Conrad et al., 2012), and can be used for any dataset with sufficient coverage for both the input and the IP sample. In this protocol, we use a recent ChIP-seq dataset from Raja et al. (2010) to illustrate each step of the analysis. The code is available at
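As a rough illustration of the general idea of using an input control as an empirical background, and not the authors' R/Bioconductor implementation, the Python sketch below bins reads, scales the input to the IP sequencing depth, and assigns each IP bin an empirical p-value from the distribution of scaled input counts. The function names, the bin size, the depth-scaling rule, the add-one correction and the toy data are all assumptions made for this example.

```python
import bisect

def bin_counts(read_starts, genome_length, bin_size):
    """Count read starts in consecutive, non-overlapping bins."""
    n_bins = (genome_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for r in read_starts:
        if 0 <= r < genome_length:
            counts[r // bin_size] += 1
    return counts

def empirical_pvalues(ip_counts, input_counts):
    """Estimate P(background >= observed) from depth-scaled input bins."""
    total_ip, total_input = sum(ip_counts), sum(input_counts)
    # Assumption: simple scaling of the input to the IP library size.
    scale = total_ip / total_input if total_input else 1.0
    background = sorted(c * scale for c in input_counts)
    n = len(background)
    pvals = []
    for obs in ip_counts:
        # Number of background bins with a scaled count >= the observed count.
        ge = n - bisect.bisect_left(background, obs)
        # Add-one correction so that no p-value is exactly zero.
        pvals.append((ge + 1) / (n + 1))
    return pvals

# Toy data (hypothetical): the input is flat, the IP is enriched in bin 4.
input_reads = [50, 150, 250, 350, 450, 550, 650, 750, 850, 950]
ip_reads = [50, 410, 420, 430, 440, 450, 460, 470, 480, 950]

ip = bin_counts(ip_reads, genome_length=1000, bin_size=100)
ctrl = bin_counts(input_reads, genome_length=1000, bin_size=100)
pvals = empirical_pvalues(ip, ctrl)
```

No distribution is fitted to the IP signal here: significance comes only from how often the control reaches the observed count. A real analysis would also need a multiple-testing correction and a more careful normalisation between the two samples.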



1 European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.
2 Okinawa Institute of Science & Technology, 1919-1 Tancha, Onna-son, Kunigami-gun, Okinawa 904-0495, Japan.
3 University College London Genetics Institute, Gower Street, London WC1E 6BT, UK.

Corresponding author: Nicholas M. Luscombe
