Advancing Epigenetics Towards Systems Biology

Identification of Transcription Factor Binding Sites in ChIP-exo using R/Bioconductor (Prot 68)

Pedro Madrigal1,2


Precisely mapping protein-DNA binding sites across the genome is pivotal for understanding gene regulation. Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or sequencing (ChIP-seq) has been used extensively to map transcription factor binding sites (TFBSs), with ChIP-seq comparing favourably with ChIP-chip in terms of resolution and signal-to-noise ratio (Ho et al., 2011). While ChIP-seq remains the standard and most widely used methodology (Furey, 2012), ChIP followed by λ exonuclease digestion and high-throughput sequencing (ChIP-exo) has recently emerged as a powerful and promising technique that can substitute for ChIP-seq and circumvent its limitations (Rhee and Pugh, 2011; Mendenhall and Bernstein, 2012). In ChIP-exo data, the distribution of mapped reads is characterised by pairs of distinct peaks, one on each DNA strand, centred at the λ exonuclease borders and frequently separated by a fixed distance (Rhee and Pugh, 2011). Importantly, the improved resolution of ChIP-exo can provide novel insights into protein-DNA interactions (Rhee and Pugh, 2011; Serandour et al., 2013). Furthermore, ChIP-exo distinguishes weaker peaks more confidently, as well as closely located binding events that in ChIP-seq generally remain unresolved or must be deconvolved through computational approaches (e.g., Guo et al., 2012).
In this protocol, I first describe the differences between the ChIP-seq and ChIP-exo data analysis pipelines, and then concentrate on peak calling using the R/Bioconductor package CexoR. Unlike, for example, the popular ChIP-seq peak caller MACS (Feng et al., 2012), CexoR analyses multiple ChIP-exo replicates together, allowing better identification of narrow peaks and simpler downstream analysis.
CexoR locates reproducible protein-DNA interactions in ChIP-exo datasets without requiring genome sequence information, manual matching of peak-pairs, paired control data (inputs), or downstream assessment of replicate reproducibility. In addition, the R statistical environment allows integration with other pipelines and downstream analyses via other R and Bioconductor packages.
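The strand-specific peak-pair signature of ChIP-exo described above can be illustrated with a toy sketch. The Python code below is not CexoR and does not reproduce its statistical method or its handling of replicates; it only shows, under simplifying assumptions, how a forward-strand peak can be matched to a nearby downstream reverse-strand peak to estimate a binding site midpoint. The function names (`local_maxima`, `pair_peaks`), the thresholds (`min_height`, `max_gap`), and the per-base count vectors are all hypothetical and chosen for illustration only.

```python
from typing import List, Tuple


def local_maxima(counts: List[int], min_height: int) -> List[int]:
    """Return positions that are local maxima with at least min_height reads."""
    peaks = []
    for i in range(1, len(counts) - 1):
        is_peak = counts[i] > counts[i - 1] and counts[i] >= counts[i + 1]
        if counts[i] >= min_height and is_peak:
            peaks.append(i)
    return peaks


def pair_peaks(
    forward: List[int],
    reverse: List[int],
    min_height: int = 5,   # hypothetical read-count threshold
    max_gap: int = 50,     # hypothetical maximum border-to-border distance
) -> List[Tuple[int, int, int]]:
    """Pair each forward-strand peak with the nearest downstream reverse-strand peak.

    Returns tuples of (forward_peak, reverse_peak, midpoint).
    """
    fwd_peaks = local_maxima(forward, min_height)
    rev_peaks = local_maxima(reverse, min_height)
    pairs = []
    for f in fwd_peaks:
        # Reverse-strand 5' ends are expected downstream of forward-strand ones.
        downstream = [r for r in rev_peaks if 0 < r - f <= max_gap]
        if downstream:
            r = min(downstream)
            pairs.append((f, r, (f + r) // 2))
    return pairs


# Toy per-base counts of read 5' ends on each strand (made-up data).
forward = [0] * 60
forward[19], forward[20], forward[21] = 4, 12, 3
reverse = [0] * 60
reverse[39], reverse[40], reverse[41] = 2, 10, 5

pairs = pair_peaks(forward, reverse)
print(pairs)
```

In this toy example the forward-strand border at position 20 is matched to the reverse-strand border at position 40, giving a midpoint of 30. Real data need a statistical test rather than a fixed count threshold, which is the role of a dedicated peak caller.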


1 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
2 Wellcome Trust-MRC Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, University of Cambridge, Cambridge, CB2 0SZ, UK

Corresponding author: Pedro Madrigal
