Advancing Epigenetics Towards Systems Biology

Basic Analysis of NimbleGen ChIP-on-chip Data using Bioconductor/R (Prot 43)

Tobias Straub

Introduction

Hybridization of chromatin immunoprecipitation (ChIP) material to tiling arrays at NimbleGen service facilities usually leaves the customer with a set of data files that are of limited use. Most information about the experiment is gained either by displaying immunoprecipitate (IP)/input ratio tracks (the GFF files provided) of individual hybridization experiments with NimbleGen’s SignalMap software or by scanning the list of peaks identified by the automated data analysis. Summary profiles from replicate experiments cannot be investigated; further calculations are left to the customer. Data quality cannot be evaluated directly, and the raw data are not corrected for systematic signal distortions when the ratio GFF files are generated. Furthermore, the robustness of the provided peak-finding procedure is questionable, as the algorithm frequently identifies many "significant" peaks in noise-only experiments.

Several independent software tools are available that provide a more robust and more detailed analysis. Unfortunately, these tools frequently underperform on datasets that differ substantially from the ones they were built for. Apart from simple problems, such as incompatibilities between organisms or chromosome naming conventions, many tools aim, for example, at identifying peaks or peak centers. In the realm of epigenetics, however, many features are distributed rather broadly, and peak centers are not the only features defining biological states. Overall, many tools do not provide sufficient flexibility and transparency for the user to know and control what is actually happening with the data. This ultimately leads to non-reproducible results and/or analysis failures once the tools are modified.

With increasing amounts of quantitative data, biologists are often left with endpoints of analyses that they simply have to trust if they are not given the possibility to look behind the procedures. This protocol mainly aims to help biologists get a rather unbiased look at the quantitative data of their NimbleGen tiling array experiments during the initial processing steps. Ultimately, summary tracks of the profiles will be generated that permit visual browsing of the data, which serves as an important inspiration for downstream analyses. The procedures introduced here can in principle be applied to any other two-color tiling array data (Agilent) and/or two-sample comparison data on single-color Affymetrix chips.


Tobias Straub

Adolf Butenandt Institute - Molecular Biology - Ludwig-Maximilians University, Germany
