项目作者: skchronicles

项目描述 :
Benchmarking ChIP-seq peak callers
高级语言: Python
项目地址: git://github.com/skchronicles/PeakCalling.git
创建时间: 2018-03-08T22:16:37Z
项目社区:https://github.com/skchronicles/PeakCalling

开源协议:MIT License

下载


Benchmarking ChIP-Seq Peak Callers

Table of Contents

  1. PeakRanger
  2. MAC2
  3. SICER
  4. GEM
  5. MUSIC
  6. PePr
  7. DFilter

1. PeakRanger

Description:

PeakRanger is a multi-purporse software suite for analyzing next-generation sequencing (NGS) data.
It contains the following tools:

  1. nr: a noise ratio estimator useful for QC statistics. Estimates signal to noise ratio which is an indicator for ChIP
    enrichment.
  2. lc: library complexity calculator useful for QC statistics. Calculates the ratio of unique reads over total reads.
    Only accepts bam files.
  3. ranger: ChIP-Seq peak caller. Ranger servers better as a narrow-peak caller. It behaves in a conservative but
    sensitive way compared to similar algorithms. It is able to identify enriched genomic regions while at the same time
    discover summits within these regions.
    Ranger supports HTML-based annotation reports.
  4. bcp: ChIP-Seq peak caller. Tuned for the discovery of broad peaks. BCP supports HTML-based annotation reports.
  5. ccat: ChIP-Seq peak caller. Tuned for the discovery of broad peaks. CCAT supports HTML-based annotation reports.

Peakranger is installed on Biowulf.

Loading PeakRanger on Biowulf:
  1. module load peakranger
Running NR (Noise Ratio Estimator):
  1. peakranger nr \
  2. --format bam \
  3. --data {expt1.bam} \
  4. --control {control.bam} \
  5. --output bcp_results
Running LC (Library Complexity Calculator):
  1. peakranger lc \
  2. --format bam \
  3. {*.bam} \
  4. --output bcp_results
Running Ranger (Narrow Peak Caller):
  1. peakranger ranger \
  2. --format bam \
  3. --report \
  4. --plot_region 10000 \
  5. --data {expt1.bam} \
  6. --control {control.bam} \
  7. --output bcp_results
  8. -t 4
Running BCP (Broad Peak Caller):
  1. peakranger bcp \
  2. --format bam \
  3. --report \
  4. --plot_region 10000 \
  5. --data {expt1.bam} \
  6. --control {control.bam} \
  7. --output bcp_results
  8. -t 4
Running CCAT (Broad Peak Caller):
  1. peakranger ccat \
  2. --format bam \
  3. --report \
  4. --plot_region 10000 \
  5. --data {expt1.bam} \
  6. --control {control.bam} \
  7. --output bcp_results
  8. -t 4

2. MAC2

Description:

MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library
construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a
dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and
robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms and can be used for ChIP-Seq with
or without control samples.MAC2 is installed on Biowulf.

Loading MACs on Biowulf:
  1. module load macs
Running MAC2 (Narrow Peak Mode):
  1. module load macs/2.1.0.20150420 R
  2. macs2 callpeak -t {input[0]} \
  3. -c {input[1]} -f BAM -g {config[macs_g]} \
  4. --outdir peaks/mac2/narrow -n {wildcards.sample} \
  5. --nomodel --extsize {usePhantomPeaks.Rscript} -B -q 0.01 &> {log}
  6. cd peaks/mac2/narrow && Rscript {wildcards.sample}_model.r
Running MAC2 (Broad Peak Mode):
  1. module load macs/2.1.0.20150420
  2. macs2 callpeak -t {input[0]} \
  3. -c {input[1]} -f BAM -g {config[macs_g]} \
  4. --broad --broad-cutoff 0.1 --nomodel --extsize {usePhantomPeaks.Rscript} \
  5. --outdir peaks/mac2/broad -n {wildcards.sample} -q 0.001 &> {log}

3. SICER

Description:

Sicer is a clustering approach for identification of enriched domains from histone modification ChIP-Seq data.

Loading SICER on Biowulf:
  1. module load sicer
Running SICER with controls (Narrow Peaks):
  1. bash {params.SICERDIR}/SICER.sh ./ {wildcards.name}.bed {params.ctrl}.bed ./ hg18 1 100 {getfromPhantomPeaks} 0.79 200 0.01
Running SICER with controls (Broad Peaks):
  1. bash {params.SICERDIR}/SICER.sh ./ {wildcards.name}.bed {params.ctrl}.bed ./ hg18 1 200 {getfromPhantomPeaks} 0.79 400 0.01

Example: $sh DIR/SICER.sh [“InputDir”] [“bed file”] [“control file”] [“OutputDir”] [“Species”] [“redundancy
threshold”] [“window size (bp)”] [“fragment size”] [“effective genome fraction”] [“gap size (bp)”]
[“FDR”]

Running SICER without controls (Narrow Peaks):
  1. bash {params.SICERDIR}/SICER-rb.sh ./ {wildcards.name}.bed ./ hg18 1 100 {getfromPhantomPeaks} 0.79 200 100
Running SICER without controls (Broad Peaks):
  1. bash {params.SICERDIR}/SICER-rb.sh ./ {wildcards.name}.bed ./ hg18 1 200 {getfromPhantomPeaks} 0.79 400 100

Example: $sh DIR/SICER-rb.sh [“InputDir”] [“bed file”] [“OutputDir”] [“species”] [“redundancy threshold”]
[“window size (bp)”] [“fragment size”] [“effective genome fraction”] [“gap size (bp)”] [“E-value”]

Meanings of the parameters that are not self-explanatory:

  • Species: allowed species and genome versions are listed in GenomeData.py. You can add your own species and/or genome versions and relevant data there. Redundancy Threshold: The number of copies of identical reads allowed in a
    library.
  • Window size: resolution of SICER algorithm. For histone modifications, one can use
    200 bp
  • Fragment size: is for determination of the amount of shift from the beginning of a
    read to the center of the DNA fragment represented by the read.
    FRAGMENT_SIZE=150 means the shift is 75.
  • Effective genome fraction: Effective Genome as fraction of the genome size.
  • Gap size: needs to be multiples of window size. Namely if the window size is 200,
    the gap size should be 0, 200, 400, 600, ….

4. GEM

Description:

GEM is a high-resolution peak calling and motif discovery tool for ChIP-seq and ChIP-exo data. GEM only supports BED and SAM
alignment file formats.
GEM is installed on Biowulf.

Loading GEM on Biowulf:
  1. module load gem
Running GEM:
  1. java -Xmx10g -jar $GEMJAR --t 24 \
  2. --d ./Read_Distribution_default.txt \
  3. --g ./mm10.chrom.sizes
  4. --genome /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Chromosomes/ \
  5. --s 2000000000
  6. --expt SRX000540_mES_CTCF.bed \
  7. --ctrl SRX000543_mES_GFP.bed \
  8. --f BED \
  9. --out mouseCTCF --k_min 6 --k_max 13
  • File for --d ./Read_Distribution_default.txt can be found here
    and files for --g ./*.chrom.sizes can be found here

5. MUSIC

Description:

MUSIC is a tool for identification of enriched regions at multiple scales in the read depth signals from ChIP-Seq experiments.
MUSIC is installed on Biowulf.

Loading Music on Biowulf:
  1. module load samtools # needed to convert to sam format
  2. module load music
Running Music:
  1. mkdir chip; mkdir input
  2. samtools view chip.bam | MUSIC -preprocess SAM stdin chip/
  3. samtools view input.bam | MUSIC -preprocess SAM stdin input/
  4. samtools view /directory/to/chip.bam | MUSIC -preprocess SAM stdin chip/
  5. samtools view /directory/to/input.bam | MUSIC -preprocess SAM stdin input/
  6. mkdir chip/sorted;mkdir chip/dedup;mkdir input/sorted;mkdir input/dedup
  7. MUSIC -sort_reads chip chip/sorted
  8. MUSIC -sort_reads input input/sorted
  9. MUSIC -remove_duplicates chip/sorted 2 chip/dedup
  10. MUSIC -remove_duplicates input/sorted 2 input/dedup
  11. MUSIC -get_multiscale_broad_ERs \
  12. -chip chip/dedup \
  13. -control input/dedup \
  14. -mapp Mappability_36bp \
  15. -l_mapp 36 \
  16. -begin_l 1000 \
  17. -end_l 16000 \
  18. -step 1.5
  • This code tells MUSIC to identify the enriched regions starting from 1kb smoothing window length upto 16kb with
    multiplicative factor of 1.5 using the default parameters for the remaining parameters. The ERs for each scale are dumped.

6. PePr

Description:

PePr is a ChIP-Seq Peak-calling and Prioritization pipeline that uses a sliding window approach and models read counts
across replicates and between groups with a negative binomial distribution. PePr empirically estimates the optimal
shift/fragment size and sliding window width, and estimates dispersion from the local genomic area. Regions with less
variability across replicates are ranked more favorably than regions with greater variability. Optional post-processing
steps are also made available to filter out peaks not exhibiting the expected shift size and/or to narrow the width of peaks.
PePr is installed on Biowulf.

Loading PePr on Biowulf:
  1. module load PePr
Running Pepr:
  1. PePr -c chip_rep1.bam,chip_rep2.bam \
  2. -i input_rep1.bam,input_rep2.bam \
  3. -f bam \
  4. -n {expname}
  • —shiftsize Half the fragment size.. again comes for ppqt… this should be half ext size that we used for macs
    -f needs to be bampe for PE data…not to worry about this now —> this seems to get *

7. DFilter

Description:

DFilter has been made to detect regulatory regions and enriched sites using tag count data. It has been made using
a generalized approach so that data from multiple kinds of assays can be analyzed. The raw tags files can be in 6-column
bed file, bedgraph, bam or sam format. For more information, read through
DFilter’s documentation.

Location of DFilter:
  1. /data/CCBR_Pipeliner/db/PipeDB/bin/DFilter1.6
Running DFilter (Narrow Peaks):
  1. run_dfilter.sh -d=ChIP.bed -c=input-control.bed -o=peaks.bed -ks=15 -lpval=6 -nonzero -refine -bs=50
Running DFilter (Broad Peaks):
  1. run_dfilter.sh -d=ChIP.bed -c=input-control.bed -o=peaks.bed -ks=25 -lpval=3 -nonzero -bs=100

Running DFilter (Open-chromatin ~ATAC-seq):

  1. run_dfilter.sh -d=Dnase-seq.bed -o=peaks.bed -ks=50 -lpval=2 -bs=100


Back to Top