Project author: hewillk

Project description:
Highly optimized Burrows-Wheeler Aligner specifically for Illumina ~150 bp short-read alignment.
Language: C++
Repository: git://github.com/hewillk/aligner.git
Created: 2021-03-02T16:21:40Z
Project community: https://github.com/hewillk/aligner

License:

Hewill Aligner

Hewill Aligner is a highly optimized Burrows-Wheeler Aligner (implemented in modern C++20) designed specifically for paired-end short-read alignment with a read length of ~150 bp.
Mapping sequencing data with 50× coverage (2×180 GB) takes less than 2 hours on 80 cores with ~18 GB of memory, and the performance is comparable to bwa-mem2.
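
For readers unfamiliar with the technique the name refers to, the following is a minimal textbook sketch of a Burrows-Wheeler transform and the backward search that counts exact matches of a pattern. It is not Hewill Aligner's implementation: the function names are hypothetical, the suffix array is built by naive comparison sorting, and the occurrence counts are rescanned on every step, which is only acceptable for toy inputs. The text is assumed to end with a sentinel `$` that sorts before every other character.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>
#include <string>
#include <string_view>
#include <vector>

// Naive suffix array: sort suffix start positions by comparing whole suffixes.
// (Illustrative only; real indexes use far faster constructions.)
std::vector<std::size_t> build_suffix_array(std::string_view text) {
  std::vector<std::size_t> sa(text.size());
  std::iota(sa.begin(), sa.end(), std::size_t{0});
  std::sort(sa.begin(), sa.end(), [text](std::size_t a, std::size_t b) {
    return text.substr(a) < text.substr(b);
  });
  return sa;
}

// BWT: for each sorted suffix, take the character that precedes it (cyclically).
std::string bwt_from_suffix_array(std::string_view text,
                                  const std::vector<std::size_t>& sa) {
  std::string bwt(text.size(), '\0');
  for (std::size_t i = 0; i < sa.size(); ++i)
    bwt[i] = text[(sa[i] + text.size() - 1) % text.size()];
  return bwt;
}

// Backward search: count exact occurrences of `pattern` using only the BWT.
std::size_t count_occurrences(std::string_view bwt, std::string_view pattern) {
  // first[c] = number of characters in the text strictly smaller than c.
  std::array<std::size_t, 256> counts{};
  for (unsigned char c : bwt) ++counts[c];
  std::array<std::size_t, 256> first{};
  std::size_t sum = 0;
  for (std::size_t c = 0; c < 256; ++c) {
    first[c] = sum;
    sum += counts[c];
  }

  // occ(c, i) = number of occurrences of c in bwt[0, i). Linear scan for clarity.
  auto occ = [bwt](unsigned char c, std::size_t i) {
    return static_cast<std::size_t>(
        std::count(bwt.begin(), bwt.begin() + i, static_cast<char>(c)));
  };

  // Maintain the half-open suffix array interval [lo, hi) of matching suffixes,
  // extending the match one character at a time from the pattern's last character.
  std::size_t lo = 0, hi = bwt.size();
  for (auto it = pattern.rbegin(); it != pattern.rend(); ++it) {
    const auto c = static_cast<unsigned char>(*it);
    lo = first[c] + occ(c, lo);
    hi = first[c] + occ(c, hi);
    if (lo >= hi) return 0;  // no suffix starts with the current pattern suffix
  }
  return hi - lo;
}
```

A production aligner would replace the linear `occ` scan with sampled occurrence tables so that each step runs in constant time.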

Compiler

  • GCC >= 10.2
  • Intel Threading Building Blocks (sudo apt install libtbb-dev)

Run

Build the executable:

  $ g++-10 main.cpp -o hewill -pthread -ltbb -std=c++20 -O3 biomodern/ssw.cpp -Wno-ignored-attributes

Command

  Command:
    ./hewill index <hs37d5.fa>
    ./hewill align <hs37d5.fa> <in1.fq> <in2.fq> <sam_prefix> <sample_name> <read_group_id> [insert_mean] [insert_var] [thread_num]
  Required Arguments:
    <hs37d5.fa>      Path to the reference sequence file hs37d5.fa
    <in1.fq>         Read1 FASTQ path
    <in2.fq>         Read2 FASTQ path
    <sam_prefix>     SAM file output prefix
    <sample_name>    SAM file SM tag value
    <read_group_id>  SAM file RGID tag value
  Optional Arguments:
    [insert_mean]    Mean of the insert size of the paired-end data, default value is 550
    [insert_var]     Variance of the insert size of the paired-end data, default value is 150
    [thread_num]     Number of threads, default value is the return value of std::thread::hardware_concurrency()
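
As an illustration of how the positional arguments and their defaults listed above fit together, here is a minimal sketch of a parser for the arguments that follow `align`. The `AlignOptions` struct and `parse_align_args` function are hypothetical names invented for this example and are not taken from the project's source; only the argument order and default values come from the usage text.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical container for the arguments of `./hewill align`.
struct AlignOptions {
  std::string reference, read1, read2, sam_prefix, sample_name, read_group_id;
  int insert_mean = 550;  // documented default
  int insert_var = 150;   // documented default
  unsigned thread_num = std::thread::hardware_concurrency();  // documented default
};

// `args` holds everything after the `align` subcommand, in positional order.
AlignOptions parse_align_args(const std::vector<std::string>& args) {
  if (args.size() < 6 || args.size() > 9)
    throw std::invalid_argument("expected 6 required and up to 3 optional arguments");

  AlignOptions opt;
  opt.reference = args[0];
  opt.read1 = args[1];
  opt.read2 = args[2];
  opt.sam_prefix = args[3];
  opt.sample_name = args[4];
  opt.read_group_id = args[5];

  // Optional arguments are positional, so each one requires the ones before it.
  if (args.size() > 6) opt.insert_mean = std::stoi(args[6]);
  if (args.size() > 7) opt.insert_var = std::stoi(args[7]);
  if (args.size() > 8) opt.thread_num = static_cast<unsigned>(std::stoul(args[8]));

  // hardware_concurrency() is allowed to return 0 when the value is unknown;
  // falling back to one thread is this sketch's choice, not documented behavior.
  if (opt.thread_num == 0) opt.thread_num = 1;
  return opt;
}
```

Because the optional values are positional, passing `thread_num` means also passing `insert_mean` and `insert_var` explicitly.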

index

  • Only the uncompressed hs37d5.fa is supported.
  • Sorting the suffix array takes less than 3 minutes on 80 cores with ~20 GB of memory.
    $ ./hewill index /mnt/fa/hs37d5.fa
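
To make the indexing step more concrete, the sketch below builds a suffix array with the textbook prefix-doubling method, which sorts suffixes by their first 1, 2, 4, … characters until all ranks are distinct. This is one well-known construction chosen for illustration; the README does not say which algorithm Hewill Aligner uses, and the function name is hypothetical. The sketch is single-threaded and uses `int` indices, so it is suited only to small inputs.

```cpp
#include <algorithm>
#include <numeric>
#include <string_view>
#include <utility>
#include <vector>

// Prefix-doubling suffix array construction, O(n log^2 n) with std::sort.
std::vector<int> suffix_array_prefix_doubling(std::string_view s) {
  const int n = static_cast<int>(s.size());
  std::vector<int> sa(n), rank(n), tmp(n);
  if (n == 0) return sa;

  std::iota(sa.begin(), sa.end(), 0);
  // Initial rank of each suffix is simply its first character.
  for (int i = 0; i < n; ++i) rank[i] = static_cast<unsigned char>(s[i]);

  for (int k = 1;; k *= 2) {
    // Sort key for suffix i: (rank of first k chars, rank of the next k chars).
    // A suffix shorter than k characters gets -1 so it sorts first.
    auto key = [&](int i) {
      return std::pair{rank[i], i + k < n ? rank[i + k] : -1};
    };
    std::sort(sa.begin(), sa.end(),
              [&](int a, int b) { return key(a) < key(b); });

    // Re-rank: equal keys share a rank, a larger key gets the next rank.
    tmp[sa[0]] = 0;
    for (int i = 1; i < n; ++i)
      tmp[sa[i]] = tmp[sa[i - 1]] + (key(sa[i - 1]) < key(sa[i]) ? 1 : 0);
    rank = tmp;

    // All ranks distinct means the order is final.
    if (rank[sa[n - 1]] == n - 1) break;
  }
  return sa;
}
```

Entry `i` of the result is the start position of the `i`-th smallest suffix, which is the ordering a Burrows-Wheeler index is derived from.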

align

  • Only uncompressed FASTQ is supported.
  • The following command generates HG001.1.sam, HG001.2.sam, … HG001.X.sam and HG001.Y.sam in the /mnt/sam/HG001 folder.
    // <hs37d5.fa> <in1.fq> <in2.fq> <sam_prefix> <sample_name> <read_group_id> [insert_mean] [insert_var] [thread_num]
    $ ./hewill align /mnt/fa/hs37d5.fa /mnt/fq/HG001.1.fq /mnt/fq/HG001.2.fq /mnt/sam/HG001 HG001 1 550 150 80
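
In the command above, `HG001` is the sample name and `1` is the read group ID. In the SAM format, a read group is declared by an `@RG` header line whose `ID` and `SM` fields carry those two values, and each alignment record refers to it through an `RG:Z:` optional field. The sketch below formats both strings; the helper names are hypothetical, and whether Hewill Aligner emits exactly these fields and no others is an assumption.

```cpp
#include <string>
#include <string_view>

// Builds a SAM @RG header line with tab-separated ID and SM fields.
std::string read_group_header(std::string_view read_group_id,
                              std::string_view sample_name) {
  std::string line = "@RG\tID:";
  line += read_group_id;
  line += "\tSM:";
  line += sample_name;
  return line;
}

// Builds the optional field that ties an alignment record to its read group.
std::string read_group_tag(std::string_view read_group_id) {
  return "RG:Z:" + std::string(read_group_id);
}
```

With the example arguments, `read_group_header("1", "HG001")` yields `@RG`, `ID:1` and `SM:HG001` separated by tabs.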

Performance

PrecisionFDA Truth Challenge benchmark versus bwa-mem2:

Important

This aligner is highly optimized for the following sequencing characteristic (other datasets are not recommended):

  • read length: 148~150 bp

If you know the mean and variance of the insert size of your sequencing data, we highly recommend passing them to the aligner.
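
If these values are not known in advance, they can be estimated from a sample of observed insert sizes. The sketch below computes the mean and the sample variance in one pass using Welford's method; the `InsertStats` struct and `insert_size_stats` function are hypothetical names, and how the insert sizes are collected (for example from a previous alignment of a subset of reads) is left open. The standard deviation is reported as well so that either quantity can be checked against what the aligner expects.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical summary of a sample of insert sizes.
struct InsertStats {
  double mean;
  double variance;  // sample variance (divides by n - 1)
  double stddev;    // square root of the sample variance
};

// One-pass mean and variance (Welford's method), numerically stable for long inputs.
InsertStats insert_size_stats(const std::vector<int>& sizes) {
  if (sizes.empty()) throw std::invalid_argument("no insert sizes given");

  double mean = 0.0;
  double m2 = 0.0;  // running sum of squared deviations from the current mean
  std::size_t n = 0;
  for (int x : sizes) {
    ++n;
    const double delta = x - mean;
    mean += delta / static_cast<double>(n);
    m2 += delta * (x - mean);
  }

  // With a single observation the variance is undefined; this sketch reports 0.
  const double variance = n > 1 ? m2 / static_cast<double>(n - 1) : 0.0;
  return {mean, variance, std::sqrt(variance)};
}
```

For the sample {500, 550, 600} this gives a mean of 550, a sample variance of 2500 and a standard deviation of 50.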