Basics

https://www.youtube.com/watch?v=JMT6oRYgkTk


More detail


Chromosome conformation technologies

Briefly, DNA is cross-linked and then digested with restriction enzymes. The free fragment ends are then re-ligated, producing hybrid DNA molecules made of two fragments that may be very far apart in linear distance. If two fragments are ligated in this way, it provides evidence that they were in close spatial proximity (i.e. interacting) in the genome.



1. Hi-C 3

“The classical Hi-C technique involves restriction digestion of a formaldehyde cross-linked genome with sequence specific restriction enzymes, followed by fill in and repair of digested ends with the incorporation of biotin-linked nucleotides. The repaired ends are then re-ligated. Finally, the cross-linking is reversed and associated proteins are degraded. This produces the ligation products which are then non specifically sheared, generally by sonication, and enriched for sheared fragments containing the ligation junction, using a biotin pull-down strategy, and finally sequenced using paired-end sequencing (Belton et al. 2012). The enrichment step aims to select sonicated fragments containing the ligation junction, increasing the proportion of informative non-same fragment read pairs (mate pairs originated from different restriction fragments).” 4. Note that all of these steps up until the ligation are in fact performed in cell nuclei.
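To make the idea of "non-same fragment read pairs" concrete, here is a minimal sketch rather than the pipeline from the cited review. It assumes a single toy chromosome, takes HindIII (recognition site AAGCTT, cut after the first A) as an example enzyme, and uses hypothetical helper names (cut_sites, fragment_index, is_informative) that do not come from the source.

```python
import bisect
import re


def cut_sites(sequence, motif="AAGCTT", cut_offset=1):
    """Return 0-based cut positions for every occurrence of the motif.

    Assumption: HindIII-like enzyme cutting `cut_offset` bases into the motif.
    """
    # A lookahead is used so that overlapping occurrences are also reported.
    return [m.start() + cut_offset for m in re.finditer(f"(?={motif})", sequence)]


def fragment_index(cuts, position):
    """Index of the restriction fragment that contains a 0-based position."""
    return bisect.bisect_right(cuts, position)


def is_informative(cuts, pos_a, pos_b):
    """True if the two mates fall in different restriction fragments."""
    return fragment_index(cuts, pos_a) != fragment_index(cuts, pos_b)


# Toy chromosome with two HindIII sites, giving three restriction fragments.
toy_chromosome = "GGGAAGCTTCCCCCAAGCTTGGG"
cuts = cut_sites(toy_chromosome)

# Hypothetical mapped positions of the two mates of each read pair.
read_pairs = [(1, 10), (5, 12), (10, 20)]
informative = [pair for pair in read_pairs if is_informative(cuts, *pair)]
print(cuts, informative)
```

Pairs whose mates land in the same fragment (here, positions 5 and 12) carry no contact information and are the kind of read pair the biotin enrichment step is meant to reduce.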


Data-generation:

  1. FASTQ files of paired-end reads (reads from either end of each DNA fragment) are obtained and aligned to the reference genome. Since the two reads of a pair are expected to map to different, unrelated regions of the genome, they are aligned separately. Note that problems may arise if a read spans the ligation junction, so that two portions of the same read match distinct genomic positions (chimeric reads).

  2. The reads are filtered to remove spurious signals due to experimental artifacts.

  3. The read counts are then aggregated into genomic bins. This gives more robust, less noisy estimates of contact frequencies, at the cost of reduced resolution. Strategies to find the optimal genomic bin size have been proposed.

  4. Read counts are normalised.
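Steps 3 and 4 above can be sketched as follows. This is an illustrative example under stated assumptions, not the method of any particular tool: it handles a single chromosome, uses made-up read-pair positions, and picks one common normalisation idea (iteratively rescaling the matrix so every bin has the same total coverage, often called iterative correction or matrix balancing). The notes do not say which normalisation is used, and the function names bin_contacts and iterative_correction are hypothetical.

```python
def bin_contacts(pairs, chrom_length, bin_size):
    """Count read pairs per pair of genomic bins (symmetric contact matrix)."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0.0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


def iterative_correction(matrix, n_iter=50):
    """Rescale rows and columns until all non-empty bins have equal coverage.

    Assumption: a plain balancing loop chosen for illustration only.
    """
    n = len(matrix)
    balanced = [row[:] for row in matrix]
    for _ in range(n_iter):
        sums = [sum(row) for row in balanced]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break
        mean = sum(nonzero) / len(nonzero)
        # Per-bin bias relative to the mean coverage; empty bins are left alone.
        bias = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= bias[i] * bias[j]
    return balanced


# Hypothetical mate positions (bp) on a 30 bp toy chromosome, 10 bp bins.
pairs = [(1, 2), (3, 15), (5, 25), (12, 18), (14, 27), (11, 16), (22, 28), (2, 13)]
raw = bin_contacts(pairs, chrom_length=30, bin_size=10)
balanced = iterative_correction(raw)
for row in balanced:
    print([round(value, 3) for value in row])
```

A larger bin_size pools more read pairs per cell of the matrix, which is the robustness versus resolution trade-off described in step 3.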


2. Capture Hi-C


An example study

  • Ref. 5 used promoter capture Hi-C (PCHi-C) to find cell-type-specific promoter interactomes, with the aim of linking non-coding GWAS variants to their target genes (by seeing which genes’ promoters they physically interact with). The authors aimed to provide a comprehensive catalog of promoter-interacting regions (PIRs). PCHi-C is a variant of Hi-C in which only interactions involving promoters are detected, using sequence capture to pull down the fragments of interest (i.e. those containing promoters).

    • Found a median of 4 interactions per promoter fragment per cell type.
    • 55% of preys interacted with a single promoter fragment and less than 10% interacted with 4 or more.
    • Median linear distance between promoters and their interacting region is 331 Kb.
    • Found fewer interactions crossing TAD boundaries than expected, and it didn’t seem to matter whether the promoters were close to the edge or in the centre of a TAD.
    • PIRs were significantly enriched for regulatory chromatin features: for example, 56% contained accessible regions detected by ATAC-seq in at least one blood cell type, and PIRs were significantly enriched for histone marks associated with active enhancers (H3K27ac and H3K4me1).
    • Most PIRs were annotated as enhancers, but some may also have structural or topological roles, or may be poised for activation.
    • Enhancers generally show additive effects on the expression of their target genes. This may explain why genes are often able to buffer the effects of mutations at individual enhancers.
    • PIRs seem to function in gene expression control.
    • The study makes a strong case for using 3D genome information to interpret non-coding disease-associated variants.
    • Enhancer-promoter contacts can be either “instructive” (triggering transcriptional activation) or “permissive” (poised for activation) 6.
    • They ultimately devise a statistical methodology to link GWAS SNPs to their putative target genes based on PCHi-C interaction data.
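The basic intuition behind linking variants to genes through PCHi-C data can be illustrated with a deliberately naive sketch: a SNP is assigned to a gene if it falls inside a PIR that contacts that gene's promoter. This is not the statistical methodology devised in the study, only a simple interval-overlap illustration. All SNP identifiers, gene names and coordinates below are made up, and link_snps_to_genes is a hypothetical helper.

```python
from collections import defaultdict


def link_snps_to_genes(snps, interactions):
    """Map each SNP to the genes whose promoters contact a PIR containing it.

    snps:         {snp_id: (chrom, position)}
    interactions: list of (gene, chrom, pir_start, pir_end), half-open intervals
    """
    links = defaultdict(set)
    for snp_id, (chrom, pos) in snps.items():
        for gene, pir_chrom, start, end in interactions:
            if chrom == pir_chrom and start <= pos < end:
                links[snp_id].add(gene)
    return {snp_id: sorted(genes) for snp_id, genes in links.items()}


# Hypothetical toy inputs, for illustration only.
toy_snps = {
    "snp_A": ("chr1", 1_500),
    "snp_B": ("chr1", 9_000),
    "snp_C": ("chr2", 1_500),
}
toy_interactions = [
    ("GENE1", "chr1", 1_000, 2_000),
    ("GENE2", "chr1", 1_200, 1_800),
    ("GENE3", "chr2", 5_000, 6_000),
]

print(link_snps_to_genes(toy_snps, toy_interactions))
```

A real analysis would also need to weigh the strength of each interaction and the uncertainty about which SNP at a GWAS locus is causal, which a plain overlap ignores.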

E.g. IDEAS

An integrative and discriminative epigenome annotation system, for jointly characterizing epigenetic landscapes in many cell types and detecting differential regulatory regions.

Motivation: There is a need to investigate how epigenomic variation, both across the genome and across different cell types, relates to gene expression changes and phenotypic diversity. Current methods rely on genomic segmentation, which was mostly developed for a single genome and has been extended to multiple cell lines through genomic concatenation and data stacking.



  1. http://www.roadmapepigenomics.org/

  2. https://science.sciencemag.org/content/364/6439/eaat8266/tab-pdf

  3. https://link.springer.com/content/pdf/10.1007%2Fs12551-018-0489-1.pdf

  4. https://link.springer.com/content/pdf/10.1007%2Fs12551-018-0489-1.pdf

  5. https://www.ncbi.nlm.nih.gov/pubmed/27863249

  6. https://www.nature.com/articles/nature12753