Project author: RichardAGoldstein

Project description:
Haplotype RecOnstruction using Longitudinal sequencing Data
Language: Java
Repository: git://github.com/RichardAGoldstein/HaROLD.git
Created: 2018-10-12T13:00:44Z
Project page: https://github.com/RichardAGoldstein/HaROLD

License: GNU General Public License v3.0

HaROLD: Haplotype RecOnstruction using Longitudinal Data

This program performs haplotype reconstruction on longitudinal deep sequencing samples, by analysing co-varying variants in a probabilistic framework.

The program is described and documented in:

R. A. Goldstein, A. U. Tamuri, S. Roy, J. Breuer, 2018, Haplotype assignment of virus NGS data using co-variation of variant frequencies, bioRxiv doi: 10.1101/444877.

Installation

HaROLD requires Java 8 (or newer). Pre-built binaries can be downloaded from the releases page of the GitHub repository.

Alternatively, you can clone the source and build using the following commands (requires Maven):

    git clone https://github.com/RichardAGoldstein/HaROLD.git
    cd HaROLD
    mvn compile assembly:single

This will create a Java JAR file in the ‘target’ directory.

Usage

View program options:

    $ java -jar harold-1.0.jar -h
    Usage:

    HaROLD haplotype reconstruction program

    java -jar harold-1.0.jar [-hvV] [--alpha-frac=<alpha_frac>]
                             [--error-opt-iter=<errorOptimiseIterations>]
                             [--threads=<threads>] [--tol=<tol>] [-g=<gammaCache>]
                             [-s=<randomSeed>] [-a=<initialAlphaParams>
                             <initialAlphaParams>]... -c=<countFile>...
                             [-c=<countFile>...]... -n=<haplotypes>...
                             [-n=<haplotypes>...]...

    Description:

    HaROLD (HAplotype Reconstruction Of Longitudinal Deep sequencing Data) performs
    haplotype reconstruction on longitudinal deep sequencing samples, by analysing
    co-varying variants in a probabilistic framework.

    HaROLD reads in a set of files, one for each timepoint. These files should be in the
    output format of bam-readcounts.

    Run using: java -jar harold-1.0.jar -c <count file> -n <no. of haplotypes>

    Options:
      -c, --count-file=<countFile>...
                                 File containing list of count files
      -n, --haplotypes=<haplotypes>...
                                 Number of haplotypes
      -g, --gamma-cache=<gammaCache>
                                 Number of Gamma function calculations to cache
      -s, --seed=<randomSeed>    Seed for random number generator
          --threads=<threads>    Number of processors for multi-threaded operation
          --alpha-frac=<alpha_frac>
                                 Fraction of sites to use to optimise error parameters
      -a, --initial-alpha=<initialAlphaParams> <initialAlphaParams>
                                 Initial parameter values for error model
          --error-opt-iter=<errorOptimiseIterations>
                                 Limit error parameter optimisation to n rounds (0 means
                                   no limit)
          --tol=<tol>            Optimisation tolerance
      -h, -?, --help             Show this help
      -v, --verbose
      -V, --version              Show version

    Copyright (c) 2018 Richard A Goldstein

Example

The example directory contains a simple example: synthetically created data for three timepoints, each a different mixture of the two CMV sequences KP745665.1 and KP745692.1. See the Supplementary Material of the bioRxiv preprint for details.

For example, from within the example directory:

    java -jar harold-1.0.jar -c filelist -n 2
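
To run HaROLD on your own data, the -c option needs a file that lists the count files, one for each timepoint. The sketch below shows one way such a list might be generated. It rests on assumptions that are not confirmed here: that the list holds one path per line, and that the bam-readcount outputs are stored as .txt files in a single directory. The function name make_filelist and the directory name counts are hypothetical.

```shell
#!/bin/sh
# Sketch only: write the paths of all *.txt files in a directory to a list
# file, one path per line (assumed format for the file passed to -c).
# Usage: make_filelist <directory> <output list file>
make_filelist() {
    dir=$1
    out=$2
    # The shell expands the glob in lexical order, so the list is reproducible.
    for f in "$dir"/*.txt; do
        # Skip the unexpanded pattern when the directory has no .txt files.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done > "$out"
}
```

With the count files in a directory named counts, calling make_filelist counts filelist would produce a list that can be passed to the program in the same way as above, using -c filelist -n 2.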