Prophage Tracer
Prophage Tracer: Precisely tracing prophages in prokaryotic genomes using overlapping split-read alignment.
Based on our analysis, a sequencing depth of 100–1000× per genome is recommended for detecting prophages with low excision rates. Within this range, Prophage Tracer can detect hidden prophages with excision rates (attB/gyrB) > $10^{-3}$ and/or replication (attP/gyrB) > $10^{-3}$ in host genomes.
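As a rough back-of-envelope check on these numbers, the sketch below estimates mean sequencing depth from the number of read pairs, the read length, and the genome size, and then the expected number of reads covering an excision junction at a given attB/gyrB or attP/gyrB ratio. The function names are hypothetical, and the second estimate rests on a simplification made here for illustration (junction coverage is roughly depth × ratio); it is not a statement of how Prophage Tracer itself counts evidence.

```bash
# Hypothetical helpers for illustration; not part of Prophage Tracer.

# estimate_depth <n_read_pairs> <read_length_bp> <genome_size_bp>
# Mean depth = total sequenced bases / genome size (two reads per pair).
estimate_depth() {
    awk -v n="$1" -v len="$2" -v g="$3" \
        'BEGIN { printf "%.0f\n", (2 * n * len) / g }'
}

# expected_junction_reads <depth> <ratio>
# Assumption: reads covering an attB or attP junction scale as depth * ratio.
expected_junction_reads() {
    awk -v d="$1" -v r="$2" 'BEGIN { printf "%.2f\n", d * r }'
}
```

For example, `estimate_depth 10000000 150 5000000` gives 600 (10 million 150-bp read pairs on a 5-Mb genome), and under the stated simplification a ratio of $10^{-3}$ at 1000× corresponds to about one junction-covering read.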
wget -c https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
bash Miniconda2-latest-Linux-x86_64.sh
export PATH=~/miniconda2/bin:$PATH
conda install -c bioconda bwa
conda install -c bioconda sambamba
conda install -c bioconda samtools
Download `prophage_tracer.sh` and put it in your working path.

Assume that you have a sequenced bacterial genome (strain1) in FASTA format, `reference_genome_strain1.fasta`, and paired reads in FASTQ format, `1.fastq.gz` and `2.fastq.gz`.
bwa index reference_genome_strain1.fasta -p strain1
bwa mem strain1 1.fastq.gz 2.fastq.gz >strain1.sam
samtools view -S -b strain1.sam -o strain1.bam
sambamba markdup -r strain1.bam strain1.rmdup.bam
samtools view strain1.rmdup.bam -o strain1.rmdup.sam
bash prophage_tracer.sh -m strain1.rmdup.sam -r reference_genome_strain1.fasta -p strain1
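The mapping and duplicate-removal commands above can be wrapped in a single function so that a failed step stops the run before `prophage_tracer.sh` is called on an incomplete SAM file. This is a minimal sketch: the function names, argument order, and input checks are assumptions added for illustration, while the `bwa`, `samtools`, and `sambamba` command lines are the ones shown above.

```bash
# Hypothetical wrapper for illustration; not part of Prophage Tracer.

# Return non-zero if any listed file is missing or empty.
check_inputs() {
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            return 1
        fi
    done
}

# Usage: prepare_sam <reference.fasta> <reads_1.fastq.gz> <reads_2.fastq.gz> <prefix>
prepare_sam() {
    local ref="$1" r1="$2" r2="$3" prefix="$4"
    check_inputs "$ref" "$r1" "$r2" || return 1
    # Same commands as above; each step runs only if the previous one succeeded.
    bwa index "$ref" -p "$prefix" &&
    bwa mem "$prefix" "$r1" "$r2" > "$prefix.sam" &&
    samtools view -S -b "$prefix.sam" -o "$prefix.bam" &&
    sambamba markdup -r "$prefix.bam" "$prefix.rmdup.bam" &&
    samtools view "$prefix.rmdup.bam" -o "$prefix.rmdup.sam"
}
```

A call such as `prepare_sam reference_genome_strain1.fasta 1.fastq.gz 2.fastq.gz strain1` would then leave `strain1.rmdup.sam` ready for the `-m` option.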
usage: prophage_tracer [options] -m <in.sam> -r <in.fasta> -p <prefix>
options:
-m FILE a full SAM file (required)
-r FILE a reference genome sequence (required)
-p STRING prefix of output files (required; usually a strain name or a sample name)
-x INT maximal size of a prophage (default: 150000)
-n INT minimal size of a prophage (default: 5000)
-a INT minimal length of attachment site (default: > 2)
-t INT number of threads used for BlastN (default: 1)
-s INT minimal event of split reads required for supporting a prophage candidate (default: 1)
-d INT minimal event of discordant read pairs required for supporting a prophage candidate (default: 1)
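As an illustration of combining these options, the sketch below assembles a stricter run that requires two split-read and two discordant-read-pair events per candidate. The numeric values are hypothetical choices, not recommendations from the authors; only the option letters come from the usage text above.

```bash
# Hypothetical settings for illustration; adjust to your own data.
args=(
    -m strain1.rmdup.sam
    -r reference_genome_strain1.fasta
    -p strain1
    -x 100000   # maximal prophage size (bp)
    -n 10000    # minimal prophage size (bp)
    -s 2        # require at least 2 split-read events
    -d 2        # require at least 2 discordant-read-pair events
    -t 4        # threads for BlastN
)

# Print the command for review; drop the leading "echo" to run it.
echo bash prophage_tracer.sh "${args[@]}"
```

Keeping the options in an array makes it easy to comment each setting and to reuse the same settings across several strains.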
Output file: `strain1.prophage.out`
prophage_candidate | contig | attL_start | attL_end | attR_start | attR_end | prophage_size | SR_evidence_attB | SR_evidence_attP | DRP_evidence_attB | DRP_evidence_attP |
---|---|---|---|---|---|---|---|---|---|---|
candidate_1 | contig00007=::=contig00014 | 209162 | 209236 | 2365 | 2439 | 16770 | 0 | 4 | 1 | 2 |
candidate_2 | contig00001 | 1064123 | 1064145 | 1100156 | 1100178 | 36033 | 0 | 1 | 0 | 0 |
candidate_3 | =contig00003::=contig00004 | 1700 | 1764 | 46895 | 46959 | 48658 | 2 | 28 | 2 | 24 |
If only one contig is listed in the `contig` column, it means an intact predicted prophage is in this contig.

`prophage_tracer.sh` is used for chromosome-level genomes. `prophage_tracer_WGS.sh` can be used for both chromosome-level and contig-level genomes. However, using `prophage_tracer_WGS.sh` to analyze chromosome-level genomes is slow.

On Windows 10, the scripts can be run with Git Bash and blastn (`ncbi-blast-2.6.0+-win64.exe`) installed. For installing Git Bash and blastn on Windows 10, please refer to https://git-scm.com/downloads and https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.6.0/.

genome file: GCF_009846525.1_ASM984652v1_genomic.fna
reads files: test_R1.fq.gz and test_R2.fq.gz
1. Mapping reads to the genome
bwa index GCF_009846525.1_ASM984652v1_genomic.fna -p test-strain
bwa mem test-strain test_small_R1.fq.gz test_small_R2.fq.gz > test-strain.sam
2. Run prophage_tracer
bash prophage_tracer.sh -m test-strain.sam -r GCF_009846525.1_ASM984652v1_genomic.fna -p test-strain
3. Output of test

prophage_candidate | contig | attL_start | attL_end | attR_start | attR_end | prophage_size | SR_evidence_attB | SR_evidence_attP | DRP_evidence_attB | DRP_evidence_attP |
---|---|---|---|---|---|---|---|---|---|---|
candidate_1 | NZ_CP024621.1 | 2090511 | 2090576 | 2139945 | 2140010 | 49434 | 0 | 1 | 0 | 1 |
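The output table can be post-filtered with standard tools. The sketch below keeps candidates whose combined attP support (split reads plus discordant read pairs) reaches a chosen threshold. It assumes that the `.prophage.out` file is whitespace- or tab-delimited with a single header line and the eleven columns shown above; the function name and the default threshold are hypothetical.

```bash
# Hypothetical helper for illustration; not part of Prophage Tracer.
# Usage: filter_candidates <prefix.prophage.out> [min_attP_support]
# Prints: candidate, contig, prophage_size, attB support, attP support.
filter_candidates() {
    local out_file="$1" min_support="${2:-2}"
    awk -v min="$min_support" '
        NR == 1 { next }            # skip the header line
        {
            attB = $8 + $10         # SR_evidence_attB + DRP_evidence_attB
            attP = $9 + $11         # SR_evidence_attP + DRP_evidence_attP
            if (attP >= min) print $1, $2, $7, attB, attP
        }
    ' "$out_file"
}
```

With the test output above, `filter_candidates test-strain.prophage.out 2` would keep `candidate_1`, which has one split read and one discordant read pair supporting attP.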
#### Using `generate_DNA.sh` to generate simulated genomes resulting from prophage excision
------
Install `seqkit` first
```Bash
conda install -c bioconda seqkit
```
Download `generate_DNA.sh` and `random_DNA.py`

Run the script (default: simulating 20 genomes; one prophage in each genome)
```Bash
bash generate_DNA.sh
```
`blastn` to set whether mismatches are allowed in the att sites (`-penalty`).

Version 1.0.3: ignore the error message caused by makeblastdb and continue with the following command
Kaihao Tang, khtang@scsio.ac.cn;
Xiaoxue Wang, xxwang@scsio.ac.cn;
Marine Biofilm Lab;
SCSIO, Chinese Academy of Sciences