Advancing Epigenetics Towards Systems Biology

A Guideline for ChIP-Chip Data Quality Control and Normalization (Prot 47)

Matthias Siebert, Michael Lidschreiber, Holger Hartmann, and Johannes Söding

Introduction

Chromatin immunoprecipitation coupled to tiling microarray analysis (ChIP-on-chip) is used to map the DNA binding sites of a protein of interest across the whole genome. In ChIP-on-chip, proteins are covalently cross-linked to the DNA by formaldehyde, the cells are lysed, the chromatin is immunoprecipitated with an antibody to the protein of interest, and the fragmented DNA that is directly or indirectly bound to the protein is analyzed with tiling arrays. For this purpose, the fragmented DNA is fluorescently labeled and hybridized to the tiling array, which consists of millions of short probes (25 to 60 nucleotides long) that cover the genome at a constant spacing (4 to 100s of nucleotides), like tiles covering a roof. The data generated by one experiment consist of one intensity value for each DNA probe. These values measure the relative quantity of DNA at the probe's genomic position in the immunoprecipitated material.

This guideline describes the first steps in the data analysis for ChIP-on-chip measurements in a bare-bones fashion. The steps comprise quality control of the obtained data and normalizations that render the data comparable between different arrays, correct for saturation effects, and yield enrichment and occupancy values. Peak calling and other downstream procedures are not covered here. For each step, we give detailed recommendations, warn about problematic but popular procedures, and occasionally suggest improved versions of standard procedures.
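To make the notions of "comparable between arrays" and "enrichment" concrete, the following is a minimal sketch and not the procedure recommended in this guideline. It assumes that each array is rescaled to a common median (a simple stand-in for a proper between-array normalization) and that enrichment is expressed as the per-probe log2 ratio of the immunoprecipitated signal over a reference signal. The function names and the toy intensities are hypothetical.

```python
import math
from statistics import median


def median_scale(intensities, target=1.0):
    """Rescale one array so that its median intensity equals `target`.

    Assumption: median scaling is used here only as a simple illustration
    of making two arrays comparable.
    """
    m = median(intensities)
    if m <= 0:
        raise ValueError("median intensity must be positive")
    return [x * target / m for x in intensities]


def log2_enrichment(ip, reference):
    """Per-probe log2 ratio of IP signal over reference signal."""
    if len(ip) != len(reference):
        raise ValueError("arrays must contain the same probes")
    return [math.log2(i / r) for i, r in zip(ip, reference)]


# Hypothetical raw intensities for five neighboring probes.
ip = [200.0, 220.0, 1600.0, 210.0, 190.0]
reference = [100.0, 110.0, 100.0, 105.0, 95.0]

ip_scaled = median_scale(ip)
ref_scaled = median_scale(reference)
enrichment = log2_enrichment(ip_scaled, ref_scaled)

for position, value in enumerate(enrichment):
    print(f"probe {position}: log2 enrichment = {value:.2f}")
```

In this toy example the IP array is about twice as bright overall as the reference array. After scaling, that global difference disappears and only the third probe stands out with a clearly positive log2 ratio.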

Although we have gained our experience in ChIP-chip data analysis mainly in yeast, this protocol aims to give a general guideline that should be applicable to other species and to both single- and two-color arrays. Most of the advocated procedures can be carried out within the R environment for statistical data analysis, using packages from the Bioconductor project (see, e.g., Toedling and Huber, 2008). A Bioconductor package, Starr, which supports the analysis steps described here for Affymetrix platforms, is available and will be described in an upcoming protocol (Zacher and Tresch, 2010).

Gene Center Munich - Ludwig-Maximilians-Universität - Feodor-Lynen-Str. 25 - 81377 Munich, Germany
