Benchmarking ChIP-seq peak callers
PeakRanger is a multi-purpose software suite for analyzing next-generation sequencing (NGS) data.
It contains the following tools:
nr
: a noise ratio estimator useful for QC statistics. Estimates signal to noise ratio which is an indicator for ChIP
lc
: library complexity calculator useful for QC statistics. Calculates the ratio of unique reads over total reads.
ranger
: ChIP-Seq peak caller. Ranger serves better as a narrow-peak caller. It behaves in a conservative but
bcp
: ChIP-Seq peak caller. Tuned for the discovery of broad peaks. BCP supports HTML-based annotation reports.
ccat
: ChIP-Seq peak caller. Tuned for the discovery of broad peaks. CCAT supports HTML-based annotation reports.
PeakRanger is installed on Biowulf.
module load peakranger
peakranger nr \
--format bam \
--data {expt1.bam} \
--control {control.bam} \
--output bcp_results
peakranger lc \
--format bam \
{*.bam} \
--output bcp_results
peakranger ranger \
--format bam \
--report \
--plot_region 10000 \
--data {expt1.bam} \
--control {control.bam} \
--output bcp_results \
-t 4
peakranger bcp \
--format bam \
--report \
--plot_region 10000 \
--data {expt1.bam} \
--control {control.bam} \
--output bcp_results \
-t 4
peakranger ccat \
--format bam \
--report \
--plot_region 10000 \
--data {expt1.bam} \
--control {control.bam} \
--output bcp_results \
-t 4
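The ratio that lc reports (unique reads over total reads) can be approximated by hand as a sanity check. The sketch below is an illustration under stated assumptions, not PeakRanger's implementation: it reads 6-column BED on stdin and treats two reads as duplicates when chromosome, start, and strand all match. The function name library_complexity is hypothetical.

```shell
# Hypothetical helper: estimate library complexity from 6-column BED on stdin.
# Assumption: reads sharing chromosome, start, and strand count as duplicates.
library_complexity() {
    awk 'BEGIN { FS = "\t" }
         {
             total++
             key = $1 FS $2 FS $6
             if (!(key in seen)) { seen[key] = 1; unique++ }
         }
         END {
             if (total == 0) { print "NA" }
             else { printf "%.3f\n", unique / total }
         }'
}

# Four made-up reads, one of which duplicates another: prints 0.750
printf 'chr1\t100\t136\tr1\t0\t+\nchr1\t100\t136\tr2\t0\t+\nchr1\t100\t136\tr3\t0\t-\nchr2\t500\t536\tr4\t0\t+\n' \
    | library_complexity
```

On real data the input could come from a BAM-to-BED conversion such as bedtools bamtobed, assuming bedtools is available on the system.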
MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library
construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a
dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and
robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms and can be used for ChIP-Seq with
or without control samples.
MACS2 is installed on Biowulf.
module load macs
module load macs/2.1.0.20150420 R
macs2 callpeak -t {input[0]} \
-c {input[1]} -f BAM -g {config[macs_g]} \
--outdir peaks/mac2/narrow -n {wildcards.sample} \
--nomodel --extsize {usePhantomPeaks.Rscript} -B -q 0.01 &> {log}
cd peaks/mac2/narrow && Rscript {wildcards.sample}_model.r
module load macs/2.1.0.20150420
macs2 callpeak -t {input[0]} \
-c {input[1]} -f BAM -g {config[macs_g]} \
--broad --broad-cutoff 0.1 --nomodel --extsize {usePhantomPeaks.Rscript} \
--outdir peaks/mac2/broad -n {wildcards.sample} -q 0.001 &> {log}
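The two calls above are written with workflow placeholders in braces. As a rough sketch of how they could be driven from plain shell variables, the hypothetical helper below only prints the command line so it can be inspected before running. It reuses the flags shown above; the file names, the genome size shortcut hs, and the extension size 150 are placeholder values, not recommendations.

```shell
# Hypothetical helper: print (not run) a macs2 callpeak command line.
# Arguments: mode (narrow|broad), treatment BAM, control BAM, genome size,
# extension size (for example a fragment-length estimate), sample name.
macs2_cmd() {
    mode=$1; treat=$2; ctrl=$3; gsize=$4; extsize=$5; name=$6
    common="macs2 callpeak -t $treat -c $ctrl -f BAM -g $gsize --nomodel --extsize $extsize -n $name"
    case "$mode" in
        narrow) echo "$common --outdir peaks/mac2/narrow -B -q 0.01" ;;
        broad)  echo "$common --outdir peaks/mac2/broad --broad --broad-cutoff 0.1 -q 0.001" ;;
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

macs2_cmd narrow chip.bam input.bam hs 150 sample1
macs2_cmd broad  chip.bam input.bam hs 150 sample1
```

Because the command is assembled as a single string, paths containing spaces would need extra quoting before the output is executed.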
SICER is a clustering approach for identification of enriched domains from histone modification ChIP-Seq data.
module load sicer
bash {params.SICERDIR}/SICER.sh ./ {wildcards.name}.bed {params.ctrl}.bed ./ hg18 1 100 {getfromPhantomPeaks} 0.79 200 0.01
bash {params.SICERDIR}/SICER.sh ./ {wildcards.name}.bed {params.ctrl}.bed ./ hg18 1 200 {getfromPhantomPeaks} 0.79 400 0.01
Example: $ sh DIR/SICER.sh ["InputDir"] ["bed file"] ["control file"] ["OutputDir"] ["Species"] ["redundancy threshold"] ["window size (bp)"] ["fragment size"] ["effective genome fraction"] ["gap size (bp)"] ["FDR"]
bash {params.SICERDIR}/SICER-rb.sh ./ {wildcards.name}.bed ./ hg18 1 100 {getfromPhantomPeaks} 0.79 200 100
bash {params.SICERDIR}/SICER-rb.sh ./ {wildcards.name}.bed ./ hg18 1 200 {getfromPhantomPeaks} 0.79 400 100
Example: $ sh DIR/SICER-rb.sh ["InputDir"] ["bed file"] ["OutputDir"] ["species"] ["redundancy threshold"] ["window size (bp)"] ["fragment size"] ["effective genome fraction"] ["gap size (bp)"] ["E-value"]
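In the SICER commands above, the gap size is always a whole multiple of the window size (200 with 100, 400 with 200). Assuming that relationship is a requirement rather than a coincidence, a small pre-flight check along these lines could catch mistyped values before a long run. The function name check_sicer_gap is hypothetical.

```shell
# Hypothetical pre-flight check for SICER parameters.
# Assumption: gap size must be a non-negative multiple of window size.
check_sicer_gap() {
    window=$1; gap=$2
    if [ "$window" -le 0 ] || [ "$gap" -lt 0 ]; then
        echo "invalid"; return 1
    fi
    if [ $((gap % window)) -eq 0 ]; then
        echo "ok"
    else
        echo "invalid"; return 1
    fi
}

check_sicer_gap 100 200           # prints "ok"
check_sicer_gap 200 400           # prints "ok"
check_sicer_gap 200 300 || true   # prints "invalid"
```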
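Meanings of the parameters that are not self-explanatory: the redundancy threshold is the number of copies of identical reads allowed in a library; the window size sets the resolution of the analysis; the fragment size determines how far each read is shifted toward the center of the DNA fragment it represents; the effective genome fraction is the mappable portion of the genome, expressed as a fraction of the genome size; and the gap size is given in bp as a multiple of the window size.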
GEM is a high-resolution peak calling and motif discovery tool for ChIP-seq and ChIP-exo data. GEM only supports BED and SAM
alignment file formats.
GEM is installed on Biowulf.
module load gem
java -Xmx10g -jar $GEMJAR --t 24 \
--d ./Read_Distribution_default.txt \
--g ./mm10.chrom.sizes \
--genome /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Chromosomes/ \
--s 2000000000 \
--expt SRX000540_mES_CTCF.bed \
--ctrl SRX000543_mES_GFP.bed \
--f BED \
--out mouseCTCF --k_min 6 --k_max 13
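The file passed to --g lists chromosome names and lengths. When no ready-made file is at hand, one common way to derive such a table is from a FASTA index (.fai, as written by samtools faidx), whose first two columns are sequence name and length. The sketch below assumes GEM accepts a plain tab-separated name/length file; the helper name and the demo data are made up.

```shell
# Hypothetical helper: print a two-column chrom.sizes table from a FASTA index.
# Assumption: the .fai file was produced by `samtools faidx genome.fa`.
fai_to_chrom_sizes() {
    cut -f1,2 "$1"
}

# Self-contained demo with a made-up two-sequence index:
tmpdir=$(mktemp -d)
printf 'chrA\t1000\t6\t60\t61\nchrB\t250\t1030\t60\t61\n' > "$tmpdir/demo.fa.fai"
fai_to_chrom_sizes "$tmpdir/demo.fa.fai" > "$tmpdir/demo.chrom.sizes"
cat "$tmpdir/demo.chrom.sizes"
rm -rf "$tmpdir"
```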
--d ./Read_Distribution_default.txt
: can be found here
--g ./*.chrom.sizes
: can be found here
MUSIC is a tool for identification of enriched regions at multiple scales in the read depth signals from ChIP-Seq experiments.
MUSIC is installed on Biowulf.
module load samtools # needed to convert to sam format
module load music
mkdir chip; mkdir input
samtools view chip.bam | MUSIC -preprocess SAM stdin chip/
samtools view input.bam | MUSIC -preprocess SAM stdin input/
samtools view /directory/to/chip.bam | MUSIC -preprocess SAM stdin chip/
samtools view /directory/to/input.bam | MUSIC -preprocess SAM stdin input/
mkdir chip/sorted;mkdir chip/dedup;mkdir input/sorted;mkdir input/dedup
MUSIC -sort_reads chip chip/sorted
MUSIC -sort_reads input input/sorted
MUSIC -remove_duplicates chip/sorted 2 chip/dedup
MUSIC -remove_duplicates input/sorted 2 input/dedup
MUSIC -get_multiscale_broad_ERs \
-chip chip/dedup \
-control input/dedup \
-mapp Mappability_36bp \
-l_mapp 36 \
-begin_l 1000 \
-end_l 16000 \
-step 1.5
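To get a feel for how many scales the last command covers, the sketch below lists them under the assumption that each scale length is the previous one multiplied by -step, starting at -begin_l and stopping once -end_l is exceeded. That reading of the options is an assumption, and music_scales is a hypothetical helper, not part of MUSIC.

```shell
# Hypothetical helper: list scale lengths between begin and end.
# Assumption: each scale is the previous one multiplied by step.
music_scales() {
    awk -v b="$1" -v e="$2" -v s="$3" 'BEGIN {
        if (s <= 1 || b <= 0) exit 1      # guard against an endless loop
        for (l = b; l <= e; l *= s) printf "%g\n", l
    }'
}

# Values from the command above; the list starts 1000, 1500, 2250, ...
music_scales 1000 16000 1.5
```

With these values the loop yields seven scale lengths, the last one just under 11400.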
PePr is a ChIP-Seq Peak-calling and Prioritization pipeline that uses a sliding window approach and models read counts
across replicates and between groups with a negative binomial distribution. PePr empirically estimates the optimal
shift/fragment size and sliding window width, and estimates dispersion from the local genomic area. Regions with less
variability across replicates are ranked more favorably than regions with greater variability. Optional post-processing
steps are also made available to filter out peaks not exhibiting the expected shift size and/or to narrow the width of peaks.
PePr is installed on Biowulf.
module load PePr
PePr -c chip_rep1.bam,chip_rep2.bam \
-i input_rep1.bam,input_rep2.bam \
-f bam \
-n {expname}
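PePr takes replicates as comma-separated lists, as in the command above. When the replicate files are collected by a script, a small joining helper avoids hand-editing the lists. This is only a sketch: join_commas is a hypothetical helper, myexp is a placeholder experiment name, and the final line prints the command rather than running it.

```shell
# Hypothetical helper: join all arguments with commas.
join_commas() {
    ( IFS=,; echo "$*" )   # subshell keeps the IFS change local
}

chip=$(join_commas chip_rep1.bam chip_rep2.bam)
input=$(join_commas input_rep1.bam input_rep2.bam)

# Print the command for inspection instead of executing it.
echo "PePr -c $chip -i $input -f bam -n myexp"
```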
DFilter detects regulatory regions and enriched sites from tag count data. It takes a generalized approach,
so data from multiple kinds of assays can be analyzed. The raw tag files can be in 6-column
BED, bedGraph, BAM, or SAM format. For more information, see DFilter’s documentation.
/data/CCBR_Pipeliner/db/PipeDB/bin/DFilter1.6
run_dfilter.sh -d=ChIP.bed -c=input-control.bed -o=peaks.bed -ks=15 -lpval=6 -nonzero -refine -bs=50
run_dfilter.sh -d=ChIP.bed -c=input-control.bed -o=peaks.bed -ks=25 -lpval=3 -nonzero -bs=100
run_dfilter.sh -d=Dnase-seq.bed -o=peaks.bed -ks=50 -lpval=2 -bs=100