Selecting ChIP-Seq Normalization Methods from the Perspective of their Technical Conditions

Author

Sara Colando, Jo Hardin

Published

May 1, 2022

Overview: Within the last two decades, high throughput sequencing has become one of the most popular methods for data generation within genomics, epigenomics, and transcriptomics. A popular method of high throughput sequencing is chromatin immunoprecipitation followed by high throughput sequencing, or ChIP-Seq. ChIP-Seq data provides vital insights into locations on the genome where there are differences in DNA occupancy between experimental states (i.e., differential DNA occupancy). However, since ChIP-Seq data is collected experimentally, it must be normalized to assess which genomic regions have differential DNA occupancy. While normalization is an essential step in identifying genomic regions with differential DNA occupancy, the primary technical conditions underlying ChIP-Seq normalization methods have yet to be specifically examined in the academic literature. In this thesis, we identify three primary technical conditions underlying ChIP-Seq between-sample normalization methods: (1) Symmetric Differential DNA Occupancy, (2) Equal Amount of Total DNA Occupancy across Experimental States, and (3) Equal Amount of Total Background Binding across Experimental States. We then categorize commonly-used ChIP-Seq normalization methods based on the technical conditions they use to normalize between experimental states. A major contribution of this thesis is our ChIP-Seq read count simulation results, which validate our categorization of the normalization methods by their technical conditions. We conclude by underscoring how our findings demonstrate that not all normalization methods are equally effective on all kinds of ChIP-Seq data. Rather, researchers should use their understanding of the ChIP-Seq experiment at hand to guide their choice of normalization method when possible or try mulitiple between-sample ChIP-Seq normalization methods to create a ‘high-confidence’ peakset which contains peaks that are less sensitive to one’s choice of normalization method.

Funded by Pomona College’s The Class of 1971 Summer Undergraduate Research Fund (Summer 2022) and the Kenneth Cooke Summer Research Fellowship (Summer 2024).