Project author: nriddiford

Project description:
Extract and explore SNV data
Language: R
Project URL: git://github.com/nriddiford/mutationProfiles.git
Created: 2017-07-25T12:20:07Z
Project community: https://github.com/nriddiford/mutationProfiles


mutationProfiles

This is a tool to extract and analyse SNV information from VCF files produced by Mutect2 or Varscan2.
This tool is under constant development. Please feel free to contact me, or raise an issue if you encounter any problems.

Installation

Install from GitHub:

  git clone https://github.com/nriddiford/mutationProfiles.git
  cd mutationProfiles

Dependencies

trinucs.pl requires BioPerl, which can be installed using cpanm:

  brew install cpanm
  sudo cpanm Bio::Perl

It also requires vcfParse, which can be installed from GitHub:

  git clone https://github.com/nriddiford/vcfParse.git
  cd vcfParse
  perl Makefile.PL
  make
  make test
  make install

Extracting SNV calls from Mutect2 or Freebayes VCF files, or from Varscan2 native format

Move all .vcf files into data/ and run:

  bash run_trinucs.sh -g <path to genome.fasta>

For Varscan2 native data, run:

  bash run_trinucs.sh -v -g <path to genome.fasta>

This will run script/trinucs.pl on each .vcf file in data/, and write data from all samples to data/combined_snvs.txt in the following format:

  [sample] [chrom] [pos] [ref] [alt] [tri context] [ref>alt] [decomposed trinuc context] [decomposed ref>alt] [type]
  A512R17 2L 229832 A C CAG A>C CTG T>G somatic
  A512R17 2L 1819239 T A TTC T>A TTC T>A somatic
  A512R17 2L 2439881 C T GCC C>T GCC C>T somatic
  A512R17 2L 3154318 C G GCC C>G GCC C>G somatic
  A512R17 2L 3511198 G A CGA G>A TCG C>T somatic
  A512R17 2L 4565784 C G CCT C>G CCT C>G somatic
  A512R17 2L 5233457 T G TTA T>G TTA T>G somatic
  A512R17 2L 6478473 G C GGT G>C ACC C>G somatic
  A512R17 2L 9792284 C T GCC C>T GCC C>T somatic
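
In the rows above, the "decomposed" columns appear to express each change relative to the pyrimidine (C or T) of the base pair: when the reference base is a purine (A or G), the trinucleotide is reverse-complemented and both alleles are complemented, and otherwise the values are left unchanged. The following is a minimal R sketch of that convention, inferred from the example rows rather than taken from trinucs.pl. The function names `revComp` and `decompose` are hypothetical.

```r
# Hypothetical helpers illustrating the pyrimidine-centred convention
# inferred from the example rows; this is not the code used by trinucs.pl.

# Reverse-complement a DNA string (upper-case A, C, G, T only).
revComp <- function(seq) {
  comp <- chartr("ACGT", "TGCA", seq)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Express a substitution relative to the pyrimidine of the base pair.
decompose <- function(tri, ref, alt) {
  if (ref %in% c("A", "G")) {
    tri <- revComp(tri)
    ref <- chartr("ACGT", "TGCA", ref)
    alt <- chartr("ACGT", "TGCA", alt)
  }
  c(context = tri, change = paste0(ref, ">", alt))
}

decompose("CAG", "A", "C")  # purine reference: context and alleles are flipped
decompose("GCC", "C", "T")  # pyrimidine reference: returned unchanged
```

Applied to the first example row (CAG, A>C), this gives CTG and T>G, which matches the decomposed columns shown.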

Annotate each SNV with the gene and feature that contains it

To annotate the gene and feature hit by each SNV, run:

  perl script/snv2gene.pl -i data/combined_snvs.txt

Example output:

  [sample] [chrom] [pos] [ref] [alt] [tri context] [ref>alt] [decomposed trinuc context] [decomposed ref>alt] [type] [feature] [gene]
  A512R17 2L 229832 A C CAG A>C CTG T>G somatic intron kis
  A512R17 2L 1819239 T A TTC T>A TTC T>A somatic intergenic intergenic
  A512R17 2L 2439881 C T GCC C>T GCC C>T somatic intron dpp
  A512R17 2L 3154318 C G GCC C>G GCC C>G somatic intron Mad
  A512R17 2L 3511198 G A CGA G>A TCG C>T somatic exon_5 LeuRS
  A512R17 2L 4565784 C G CCT C>G CCT C>G somatic intron dpy
  A512R17 2L 5233457 T G TTA T>G TTA T>G somatic intron tkv
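
The annotated output has twelve whitespace-separated fields per SNV, so it can be loaded into a data frame for ad hoc checks outside the package. The sketch below uses base R only and parses three rows copied from the example above. The column labels are illustrative names chosen here, not names defined by the package, and the example assumes the real file has no header line.

```r
# Illustrative labels for the twelve fields shown in the example output.
cols <- c("sample", "chrom", "pos", "ref", "alt", "tri", "change",
          "decomp_tri", "decomp_change", "type", "feature", "gene")

# Three rows copied from the example; for a real file, replace
# `text = rows` with `file = "<path to annotated file>"`.
rows <- "A512R17 2L 229832 A C CAG A>C CTG T>G somatic intron kis
A512R17 2L 1819239 T A TTC T>A TTC T>A somatic intergenic intergenic
A512R17 2L 3511198 G A CGA G>A TCG C>T somatic exon_5 LeuRS"

# Read everything as character except the position; this stops R from
# interpreting a column of "T" bases as logical TRUE values.
snvs <- read.table(text = rows, col.names = cols, quote = "",
                   colClasses = c("character", "character", "integer",
                                  rep("character", 9)))

str(snvs)
table(snvs$feature)
```

The `getData()` default listed in this README points at data/annotated_snvs.txt, which may be the file to pass here, but the README does not say so explicitly.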

Explore SNV data

Start an R session and install the package:

  library(devtools)
  install_github("nriddiford/mutationProfiles")
  library(mutationProfiles)
  setwd('mutationProfiles')

mutationProfiles

The following functions are included:

  chromDist : function (object = NA, notch = 0)
  cleanTheme : function (base_size = 12)
  featuresHit : function ()
  geneHit : function (n = 10)
  genomeSnvs : function ()
  genTris : function ()
  getData : function (infile = "data/annotated_snvs.txt")
  mutSigs : function (samples = NA, pie = NA)
  mutSpectrum : function ()
  notchSnvs : function ()
  samplesPlot : function (count = NA)
  setCols : function (df, col)
  snvStats : function ()
  triFreq : function (genome = NA, count = NA)

See some stats

  snvStats()

Plot mutations per sample

  samplesPlot()

Plot mutation spectrum for all samples combined

  mutSpectrum()
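
The README does not describe how mutSpectrum() computes its plot. As a rough illustration of what a mutation spectrum summarises, the sketch below tallies the six pyrimidine-centred substitution classes from the decomposed ref>alt values of the nine example rows of combined_snvs.txt. It uses base R only, and the variable names are hypothetical; the package's own plot may be computed and styled differently.

```r
# Decomposed ref>alt values from the nine example rows of combined_snvs.txt.
changes <- c("T>G", "T>A", "C>T", "C>G", "C>T", "C>G", "T>G", "C>G", "C>T")

# The six pyrimidine-centred substitution classes; fixing the factor levels
# keeps classes with zero counts in the table.
classes <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
counts <- table(factor(changes, levels = classes))

# Relative contribution of each class.
spectrum <- counts / sum(counts)
print(round(spectrum, 2))

# Base-graphics bar chart of the relative contributions.
barplot(spectrum, ylab = "Relative contribution")
```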

Plot mutational signatures in data

This plots the output of the deconstructSigs package

  mutSigs()

Plot distribution of SNVs across chromosomes

  chromDist()

Plot the number of times each feature type has been hit

  featuresHit()

Show the 20 most hit genes

  geneHit(n=20)
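
For readers who want the same kind of ranking without the package, the tally behind a "most hit genes" list can be sketched in base R as below. The helper name `topGenes` is hypothetical, and dropping "intergenic" entries is an assumption made here, not documented behaviour of geneHit().

```r
# Hypothetical helper: tally SNVs per gene and return the n most hit.
# Excluding "intergenic" is an assumption, not documented package behaviour.
topGenes <- function(genes, n = 10) {
  genes <- genes[genes != "intergenic"]
  hits <- sort(table(genes), decreasing = TRUE)
  ranked <- data.frame(gene = names(hits), count = as.integer(hits),
                       stringsAsFactors = FALSE)
  head(ranked, n)
}

# Toy input: the gene column of the annotated example, with dpp repeated.
genes <- c("kis", "intergenic", "dpp", "Mad", "LeuRS", "dpy", "tkv", "dpp")
top <- topGenes(genes, n = 3)
print(top)
```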