Project author: Picani

Project description:
Make phylogenetic trees and lineages from the NCBI Taxonomy database
Primary language: Rust
Repository: git://github.com/Picani/fastax.git
Created: 2019-06-08T23:29:31Z
Project page: https://github.com/Picani/fastax

License: MIT License

Fastax

Fastax is a command-line tool that makes phylogenetic trees and lineages
from the NCBI Taxonomy database. It uses a local copy of the database,
which makes it really fast.

By default, all results are pretty-printed. In addition, it can output trees
as Newick and lineages as CSV.

It can also be used to get information about taxa, such as their alternative
scientific names or the genetic code they use.

Installation

Fastax is written in Rust, which makes it safe, fast and portable. The
code is managed using Cargo and published on crates.io. If Cargo
is already installed, just open a terminal and type:

  $ cargo install fastax

Et voilà !

Alternatively, you can compile it from source:

  $ git clone https://github.com/Picani/fastax.git
  $ cd fastax
  $ cargo build --release

The executable file is target/release/fastax. Just move it somewhere on
your PATH.

Populate the local database

First, you need to get the local copy of the NCBI Taxonomy database.

  $ fastax populate -ve plop@example.com

populate downloads the latest database dumps, extracts them, and loads them
into a local SQLite database. -v makes fastax report what it is doing. -e
gives the email address to use when connecting to the NCBI. Giving your
email is optional but preferred.

The database is located in a fastax folder inside your local data folder,
which should be $HOME/.local/share.
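To illustrate where that folder usually ends up, here is a minimal Rust sketch that resolves it. It assumes the common XDG convention, in which a non-empty $XDG_DATA_HOME takes the place of $HOME/.local/share; the helper name database_dir is hypothetical, and this is not necessarily how fastax itself locates the folder.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper: returns `<data folder>/fastax`, where the data folder
/// is `$XDG_DATA_HOME` when it is set and non-empty (an assumption based on
/// the XDG convention) and `$HOME/.local/share` otherwise.
fn database_dir(xdg_data_home: Option<String>, home: Option<String>) -> Option<PathBuf> {
    let base = match xdg_data_home.filter(|dir| !dir.is_empty()) {
        Some(dir) => PathBuf::from(dir),
        // No usable $XDG_DATA_HOME: fall back to $HOME/.local/share.
        None => PathBuf::from(home?).join(".local").join("share"),
    };
    Some(base.join("fastax"))
}

fn main() {
    match database_dir(env::var("XDG_DATA_HOME").ok(), env::var("HOME").ok()) {
        Some(dir) => println!("{}", dir.display()),
        None => eprintln!("cannot determine the local data folder"),
    }
}
```

Passing the environment values in as arguments, rather than reading them inside the function, keeps the lookup rule easy to test on its own.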

Usage

For each command, you need to query at least one node. A node can be
designated by its unique NCBI Taxonomy ID (the so-called taxid), by its
binomial scientific name, or by its binomial scientific name with the two
parts separated by an underscore (the character _). This last option is
useful for scripting, since the name then needs no quoting.
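The three ways of naming a node can be told apart with a simple rule. The sketch below, with a hypothetical Query type and parse_query function, treats any term that parses as an unsigned integer as a taxid and anything else as a scientific name, turning underscores back into spaces. It illustrates the rule described above; it is not fastax's actual parser.

```rust
/// How a query term is interpreted (hypothetical type).
#[derive(Debug, PartialEq)]
enum Query {
    TaxId(u64),
    Name(String),
}

/// Treats a term that parses as an unsigned integer as a taxid; any other
/// term is a scientific name, with underscores turned back into spaces.
fn parse_query(term: &str) -> Query {
    match term.parse::<u64>() {
        Ok(taxid) => Query::TaxId(taxid),
        Err(_) => Query::Name(term.replace('_', " ")),
    }
}

fn main() {
    for term in ["4932", "Homo sapiens", "Tyrannosaurus_rex"] {
        // Prints TaxId(4932), then Name("Homo sapiens"), then Name("Tyrannosaurus rex").
        println!("{:?}", parse_query(term));
    }
}
```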

Note also that for some species, multiple binomial scientific names are in
use. Fastax looks for each of them.
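One way to look for every name in use is an index from each known name, synonyms included, to the taxids that carry it. NameIndex is a hypothetical in-memory structure, not part of fastax; the example synonyms of Saccharomyces cerevisiae come from the show output in the next section.

```rust
use std::collections::HashMap;

/// Hypothetical index from every known name (scientific name or synonym)
/// to the taxids that carry it.
struct NameIndex {
    by_name: HashMap<String, Vec<u64>>,
}

impl NameIndex {
    fn new() -> Self {
        NameIndex { by_name: HashMap::new() }
    }

    /// Registers `name` as one of the names of `taxid` (no duplicates).
    fn add(&mut self, name: &str, taxid: u64) {
        let ids = self.by_name.entry(name.to_string()).or_default();
        if !ids.contains(&taxid) {
            ids.push(taxid);
        }
    }

    /// Returns all taxids known under `name`, or an empty slice.
    fn lookup(&self, name: &str) -> &[u64] {
        self.by_name.get(name).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    let mut index = NameIndex::new();
    // Scientific name and two synonyms of taxid 4932.
    index.add("Saccharomyces cerevisiae", 4932);
    index.add("Saccharomyces capensis", 4932);
    index.add("Saccharomyces italicus", 4932);
    println!("{:?}", index.lookup("Saccharomyces italicus")); // [4932]
}
```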

The show command

You can get general information about a node:

  $ fastax show 4932
  Saccharomyces cerevisiae - species
  ----------------------------------
  NCBI Taxonomy ID: 4932
  Same as:
  * Saccharomyces capensis
  * Saccharomyces italicus
  * Saccharomyces oviformis
  * Saccharomyces uvarum var. melibiosus
  Commonly named baker's yeast.
  Also known as:
  * S. cerevisiae
  * brewer's yeast
  Part of the Plants and Fungi.
  Uses the Standard genetic code.
  Its mitochondria use the Yeast Mitochondrial genetic code.

or:

  $ fastax show "Homo sapiens"
  Homo sapiens - species
  ----------------------
  NCBI Taxonomy ID: 9606
  Commonly named human.
  Also known as:
  * man
  First description:
  * Homo sapiens Linnaeus, 1758
  Part of the Primates.
  Uses the Standard genetic code.
  Its mitochondria use the Vertebrate Mitochondrial genetic code.

or also:

  $ fastax show Tyrannosaurus_rex
  Tyrannosaurus rex - species
  ---------------------------
  NCBI Taxonomy ID: 436495
  Part of the Vertebrates.
  Uses the Standard genetic code.
  Its mitochondria use the Vertebrate Mitochondrial genetic code.

The lineage command

You can get the lineage of a node:

  $ fastax lineage 4932
  root
  └┬─ no rank: cellular organisms (taxid: 131567)
   └┬─ superkingdom: Eukaryota (taxid: 2759)
    └┬─ no rank: Opisthokonta (taxid: 33154)
     └┬─ kingdom: Fungi (taxid: 4751)
      └┬─ subkingdom: Dikarya (taxid: 451864)
       └┬─ phylum: Ascomycota (taxid: 4890)
        └┬─ no rank: saccharomyceta (taxid: 716545)
         └┬─ subphylum: Saccharomycotina (taxid: 147537)
          └┬─ class: Saccharomycetes (taxid: 4891)
           └┬─ order: Saccharomycetales (taxid: 4892)
            └┬─ family: Saccharomycetaceae (taxid: 4893)
             └┬─ genus: Saccharomyces (taxid: 4930)
              └── species: Saccharomyces cerevisiae (taxid: 4932)
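In the NCBI Taxonomy dump, each node records its parent, and the root (taxid 1) is its own parent. Under that assumption, a lineage is obtained by following parent links up to the root and reversing the path. The sketch below uses a plain HashMap in place of the SQLite database; lineage is a hypothetical helper, not fastax's code.

```rust
use std::collections::HashMap;

/// Hypothetical helper: follows parent links from `taxid` up to the root
/// (the node that is its own parent) and returns the path from the root down
/// to `taxid`. Returns None for an unknown taxid or a malformed (cyclic) map.
fn lineage(parents: &HashMap<u64, u64>, taxid: u64) -> Option<Vec<u64>> {
    let mut path = vec![taxid];
    let mut current = taxid;
    loop {
        let parent = *parents.get(&current)?;
        if parent == current {
            break; // reached the root
        }
        if path.len() > parents.len() {
            return None; // longer than the number of nodes: there is a cycle
        }
        path.push(parent);
        current = parent;
    }
    path.reverse();
    Some(path)
}

fn main() {
    // (child, parent) pairs read off the lineage of taxid 4932 shown above.
    let pairs = [
        (1, 1), (131567, 1), (2759, 131567), (33154, 2759), (4751, 33154),
        (451864, 4751), (4890, 451864), (716545, 4890), (147537, 716545),
        (4891, 147537), (4892, 4891), (4893, 4892), (4930, 4893), (4932, 4930),
    ];
    let parents: HashMap<u64, u64> = pairs.iter().copied().collect();
    println!("{:?}", lineage(&parents, 4932));
}
```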

The same lineage in CSV:

  $ fastax lineage Saccharomyces_cerevisiae
  no rank:root:1,no rank:cellular organisms:131567,superkingdom:Eukaryota:2759,no rank:Opisthokonta:33154,kingdom:Fungi:4751,subkingdom:Dikarya:451864,phylum:Ascomycota:4890,no rank:saccharomyceta:716545,subphylum:Saccharomycotina:147537,class:Saccharomycetes:4891,order:Saccharomycetales:4892,family:Saccharomycetaceae:4893,genus:Saccharomyces:4930,species:Saccharomyces cerevisiae:4932
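Each comma-separated field has the form rank:name:taxid. Here is a minimal Rust sketch for reading such a line back, assuming names never contain a comma; the Entry type and parse_lineage function are hypothetical, and the example line is an excerpt of the output above.

```rust
/// One `rank:name:taxid` field of a lineage line (hypothetical type).
#[derive(Debug, PartialEq)]
struct Entry {
    rank: String,
    name: String,
    taxid: u64,
}

/// Parses a lineage line. The rank ends at the first ':' and the taxid starts
/// after the last one, so a ':' inside a name is kept; a ',' inside a name is
/// not supported by this sketch.
fn parse_lineage(line: &str) -> Result<Vec<Entry>, String> {
    line.trim()
        .split(',')
        .map(|field| -> Result<Entry, String> {
            let (rank, rest) = field
                .split_once(':')
                .ok_or_else(|| format!("no rank in {:?}", field))?;
            let (name, taxid) = rest
                .rsplit_once(':')
                .ok_or_else(|| format!("no taxid in {:?}", field))?;
            let taxid = taxid
                .parse::<u64>()
                .map_err(|err| format!("bad taxid in {:?}: {}", field, err))?;
            Ok(Entry { rank: rank.to_string(), name: name.to_string(), taxid })
        })
        .collect()
}

fn main() {
    let line = "no rank:root:1,genus:Saccharomyces:4930,species:Saccharomyces cerevisiae:4932";
    for entry in parse_lineage(line).expect("well-formed line") {
        println!("{} | {} | {}", entry.taxid, entry.rank, entry.name);
    }
}
```

Collecting an iterator of Result values into a single Result stops at the first malformed field, so one bad entry rejects the whole line.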

The tree command

You can get a phylogenetic tree:

  $ fastax tree "Escherichia coli" 4932 Drosophila_melanogaster 9606 "Mus musculus"
  ─┬─ no rank: root
   └─┬─ no rank: cellular organisms
     ├─┬─ no rank: Opisthokonta
     │ ├─┬─ no rank: Bilateria
     │ │ ├─┬─ superorder: Euarchontoglires
     │ │ │ ├── species: Mus musculus
     │ │ │ └── species: Homo sapiens
     │ │ └── species: Drosophila melanogaster
     │ └── species: Saccharomyces cerevisiae
     └── species: Escherichia coli
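Conceptually, such a tree is the union of the lineages of the queried nodes: shared ancestors such as root appear only once, and the tree branches where the lineages diverge. The sketch below merges lineages, given as lists of names, into a parent-to-children map. merge_lineages is a hypothetical helper, the example lineages are abbreviated from the -i output further down (intermediate nodes left out), and this is not necessarily how fastax builds its trees.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: merges root-to-node lineages into one tree, stored as
/// a map from each node to its set of children. Shared prefixes collapse, so
/// every common ancestor appears exactly once.
fn merge_lineages<'a>(lineages: &[Vec<&'a str>]) -> BTreeMap<&'a str, BTreeSet<&'a str>> {
    let mut children: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for lineage in lineages {
        // Each consecutive pair is a parent -> child edge.
        for pair in lineage.windows(2) {
            children.entry(pair[0]).or_default().insert(pair[1]);
        }
        // Make sure the queried node itself is present, even as a leaf.
        if let Some(&last) = lineage.last() {
            children.entry(last).or_default();
        }
    }
    children
}

fn main() {
    // Abbreviated lineages: intermediate nodes between root and Murinae omitted.
    let tree = merge_lineages(&[
        vec!["root", "Murinae", "Rattus", "Rattus norvegicus"],
        vec!["root", "Murinae", "Mus", "Mus musculus"],
    ]);
    for (node, kids) in &tree {
        println!("{} -> {:?}", node, kids);
    }
}
```

Names are used as keys only for readability. A real implementation would key nodes by taxid, since the same name can belong to several nodes: in the -i output, Mus is both a genus and a subgenus.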

The same tree in Newick:

  $ fastax tree -n 562 4932 7227 9606 10090
  (root,(cellular organisms,(Escherichia coli,Opisthokonta,(Saccharomyces cerevisiae,Bilateria,(Drosophila melanogaster,Euarchontoglires,(Homo sapiens,Mus musculus))))));
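For comparison, conventional Newick places an internal node's label after its parenthesized children, as in (A,B)C;, and writes blanks in unquoted labels as underscores. The sketch below is a generic serializer following that convention; to_newick and label are hypothetical names, it is not fastax's own writer, and its layout differs from the line above.

```rust
use std::collections::BTreeMap;

/// Writes a label for unquoted Newick: blanks become underscores. Labels with
/// other Newick punctuation would need quoting, which this sketch skips.
fn label(name: &str) -> String {
    name.replace(' ', "_")
}

/// Hypothetical generic serializer: `(child1,child2,...)label` for internal
/// nodes, just `label` for leaves. Append ';' to terminate the tree.
fn to_newick(children: &BTreeMap<&str, Vec<&str>>, node: &str) -> String {
    match children.get(node) {
        Some(kids) if !kids.is_empty() => {
            let parts: Vec<String> = kids.iter().map(|kid| to_newick(children, kid)).collect();
            format!("({}){}", parts.join(","), label(node))
        }
        _ => label(node),
    }
}

fn main() {
    // The Bilateria part of the tree above, as drawn with internal nodes hidden.
    let mut children = BTreeMap::new();
    children.insert("Bilateria", vec!["Euarchontoglires", "Drosophila melanogaster"]);
    children.insert("Euarchontoglires", vec!["Homo sapiens", "Mus musculus"]);
    // Prints ((Homo_sapiens,Mus_musculus)Euarchontoglires,Drosophila_melanogaster)Bilateria;
    println!("{};", to_newick(&children, "Bilateria"));
}
```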

With -f/--format, you can also change the default node formatting:

  $ fastax tree -f "%taxid (%name)" "Escherichia coli" 4932 Drosophila_melanogaster 9606 "Mus musculus"
  ─┬─ 1 (root)
   └─┬─ 131567 (cellular organisms)
     ├─┬─ 33154 (Opisthokonta)
     │ ├─┬─ 33213 (Bilateria)
     │ │ ├─┬─ 314146 (Euarchontoglires)
     │ │ │ ├── 10090 (Mus musculus)
     │ │ │ └── 9606 (Homo sapiens)
     │ │ └── 7227 (Drosophila melanogaster)
     │ └── 4932 (Saccharomyces cerevisiae)
     └── 562 (Escherichia coli)

The available tags are:

  • %name which is replaced by the scientific name,
  • %rank which is replaced by the rank,
  • %taxid which is replaced by the NCBI Taxonomy ID.
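Substituting the tags amounts to plain string replacement. Here is a minimal sketch, with a hypothetical Node struct holding only the three fields the tags refer to; it is not fastax's formatter.

```rust
/// Hypothetical node record holding only what the tags need.
struct Node<'a> {
    taxid: u64,
    rank: &'a str,
    name: &'a str,
}

/// Substitutes %taxid, %rank and %name in `template`, one tag after another.
fn format_node(template: &str, node: &Node<'_>) -> String {
    template
        .replace("%taxid", &node.taxid.to_string())
        .replace("%rank", node.rank)
        .replace("%name", node.name)
}

fn main() {
    let node = Node { taxid: 9606, rank: "species", name: "Homo sapiens" };
    println!("{}", format_node("%taxid (%name)", &node)); // 9606 (Homo sapiens)
}
```

Because the replacements run one after the other, a rank or name that itself contained a tag such as %name would be substituted again; taxon names are unlikely to contain %, so the sketch ignores that case.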

By default, the nodes with only one child are hidden. You can show them with
the -i/--internal option:

  $ fastax tree -i Mus_musculus Rattus_norvegicus
  ─┬─ no rank: root
   └─┬─ no rank: cellular organisms
     └─┬─ superkingdom: Eukaryota
       └─┬─ no rank: Opisthokonta
         └─┬─ kingdom: Metazoa
           └─┬─ no rank: Eumetazoa
             └─┬─ no rank: Bilateria
               └─┬─ no rank: Deuterostomia
                 └─┬─ phylum: Chordata
                   └─┬─ subphylum: Craniata
                     └─┬─ no rank: Vertebrata
                       └─┬─ no rank: Gnathostomata
                         └─┬─ no rank: Teleostomi
                           └─┬─ no rank: Euteleostomi
                             └─┬─ superclass: Sarcopterygii
                               └─┬─ no rank: Dipnotetrapodomorpha
                                 └─┬─ no rank: Tetrapoda
                                   └─┬─ no rank: Amniota
                                     └─┬─ class: Mammalia
                                       └─┬─ no rank: Theria
                                         └─┬─ no rank: Eutheria
                                           └─┬─ no rank: Boreoeutheria
                                             └─┬─ superorder: Euarchontoglires
                                               └─┬─ no rank: Glires
                                                 └─┬─ order: Rodentia
                                                   └─┬─ suborder: Myomorpha
                                                     └─┬─ no rank: Muroidea
                                                       └─┬─ family: Muridae
                                                         └─┬─ subfamily: Murinae
                                                           ├─┬─ genus: Rattus
                                                           │ └── species: Rattus norvegicus
                                                           └─┬─ genus: Mus
                                                             └─┬─ subgenus: Mus
                                                               └── species: Mus musculus
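Comparing this output with the earlier ones suggests the rule for hiding nodes: starting from the root, a chain of single-child nodes is skipped until the walk reaches a queried node, a leaf, or a node with two or more children. The sketch below applies that inferred rule to the children of one node; kids and visible_children are hypothetical helpers, and the example tree is abbreviated from the outputs above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Children of `node`, or an empty slice for a leaf.
fn kids<'m, 'a>(children: &'m BTreeMap<&'a str, Vec<&'a str>>, node: &str) -> &'m [&'a str] {
    children.get(node).map(Vec::as_slice).unwrap_or(&[])
}

/// For each child of `node`, skips down through single-child nodes until it
/// reaches a queried node, a leaf, or a node with two or more children, and
/// returns where each walk stopped.
fn visible_children<'a>(
    children: &BTreeMap<&'a str, Vec<&'a str>>,
    queried: &BTreeSet<&'a str>,
    node: &str,
) -> Vec<&'a str> {
    kids(children, node)
        .iter()
        .map(|&kid| {
            let mut current = kid;
            // Follow single-child links unless the node was explicitly queried.
            while !queried.contains(current) {
                match kids(children, current) {
                    [only] => current = *only,
                    _ => break,
                }
            }
            current
        })
        .collect()
}

fn main() {
    // Abbreviated from the outputs above; names stand in for taxids.
    let mut children = BTreeMap::new();
    children.insert("Opisthokonta", vec!["Fungi", "Metazoa"]);
    children.insert("Fungi", vec!["Dikarya"]);
    children.insert("Dikarya", vec!["Saccharomyces cerevisiae"]);
    children.insert("Metazoa", vec!["Eumetazoa"]);
    children.insert("Eumetazoa", vec!["Bilateria"]);
    children.insert("Bilateria", vec!["Drosophila melanogaster", "Homo sapiens"]);
    let queried: BTreeSet<&str> = ["Saccharomyces cerevisiae", "Drosophila melanogaster", "Homo sapiens"]
        .iter()
        .copied()
        .collect();
    // Prints ["Saccharomyces cerevisiae", "Bilateria"]
    println!("{:?}", visible_children(&children, &queried, "Opisthokonta"));
}
```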

The subtree command

You can get the phylogenetic tree of all the descendants of a node:

  $ fastax subtree Homininae
  ─┬─ subfamily: Homininae
   ├─┬─ genus: Homo
   │ ├── species: Homo heidelbergensis
   │ └─┬─ species: Homo sapiens
   │   ├── subspecies: Homo sapiens subsp. 'Denisova'
   │   └── subspecies: Homo sapiens neanderthalensis
   ├─┬─ genus: Pan
   │ ├─┬─ species: Pan troglodytes
   │ │ ├── subspecies: Pan troglodytes verus x troglodytes
   │ │ ├── subspecies: Pan troglodytes ellioti
   │ │ ├── subspecies: Pan troglodytes vellerosus
   │ │ ├── subspecies: Pan troglodytes verus
   │ │ ├── subspecies: Pan troglodytes troglodytes
   │ │ └── subspecies: Pan troglodytes schweinfurthii
   │ └── species: Pan paniscus
   └─┬─ genus: Gorilla
     ├─┬─ species: Gorilla beringei
     │ ├── subspecies: Gorilla beringei beringei
     │ └── subspecies: Gorilla beringei graueri
     └─┬─ species: Gorilla gorilla
       ├── subspecies: Gorilla gorilla diehli
       ├── subspecies: Gorilla gorilla uellensis
       └── subspecies: Gorilla gorilla gorilla
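Listing a subtree amounts to a depth-first walk from the requested node. This sketch assumes a map from each taxon to its direct children (for instance built by inverting the parent links of the database) and returns every descendant with its depth, in the order a tree printer would draw them. descendants is a hypothetical name, not part of fastax.

```rust
use std::collections::HashMap;

/// Hypothetical helper: iterative depth-first walk returning `node` and all
/// its descendants as (depth, name) pairs, in pre-order.
fn descendants<'a>(children: &HashMap<&'a str, Vec<&'a str>>, node: &'a str) -> Vec<(usize, &'a str)> {
    let mut out = Vec::new();
    let mut stack = vec![(0, node)];
    while let Some((depth, current)) = stack.pop() {
        out.push((depth, current));
        if let Some(kids) = children.get(current) {
            // Push in reverse so that the first child is visited first.
            for &kid in kids.iter().rev() {
                stack.push((depth + 1, kid));
            }
        }
    }
    out
}

fn main() {
    let mut children = HashMap::new();
    children.insert("Homininae", vec!["Homo", "Pan", "Gorilla"]);
    children.insert("Homo", vec!["Homo heidelbergensis", "Homo sapiens"]);
    children.insert("Pan", vec!["Pan troglodytes", "Pan paniscus"]);
    children.insert("Gorilla", vec!["Gorilla beringei", "Gorilla gorilla"]);
    for (depth, name) in descendants(&children, "Homininae") {
        println!("{}{}", "  ".repeat(depth), name);
    }
}
```

An explicit stack avoids deep recursion on large clades; the depth returned with each node is what a printer needs to indent it.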

If you only want the species:

  $ fastax subtree -s Homininae
  ─┬─ subfamily: Homininae
   ├─┬─ genus: Homo
   │ ├── species: Homo heidelbergensis
   │ └── species: Homo sapiens
   ├─┬─ genus: Pan
   │ ├── species: Pan troglodytes
   │ └── species: Pan paniscus
   └─┬─ genus: Gorilla
     ├── species: Gorilla beringei
     └── species: Gorilla gorilla
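In this output, the genera are kept while the subspecies disappear. One way to obtain that, sketched below under the same children-map assumption plus a hypothetical map from taxon to rank, is to stop descending as soon as a node of rank species is reached. species_subtree is an illustration, not necessarily how the -s option is implemented.

```rust
use std::collections::HashMap;

/// Hypothetical helper: depth-first walk that stops descending once a node of
/// rank "species" is reached, so subspecies and lower ranks are left out.
fn species_subtree<'a>(
    children: &HashMap<&'a str, Vec<&'a str>>,
    rank: &HashMap<&'a str, &'a str>,
    node: &'a str,
) -> Vec<&'a str> {
    let mut out = vec![node];
    if rank.get(node).copied() == Some("species") {
        return out; // do not list anything below a species
    }
    if let Some(kids) = children.get(node) {
        for &kid in kids {
            out.extend(species_subtree(children, rank, kid));
        }
    }
    out
}

fn main() {
    let mut children = HashMap::new();
    children.insert("Homininae", vec!["Homo"]);
    children.insert("Homo", vec!["Homo sapiens"]);
    children.insert("Homo sapiens", vec!["Homo sapiens neanderthalensis"]);
    let mut rank = HashMap::new();
    rank.insert("Homininae", "subfamily");
    rank.insert("Homo", "genus");
    rank.insert("Homo sapiens", "species");
    rank.insert("Homo sapiens neanderthalensis", "subspecies");
    // Prints ["Homininae", "Homo", "Homo sapiens"]
    println!("{:?}", species_subtree(&children, &rank, "Homininae"));
}
```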

The same tree in Newick:

  $ fastax subtree -sn Homininae
  (Homininae,(Homo,(Homo sapiens,Homo heidelbergensis),Gorilla,(Gorilla beringei,Gorilla gorilla),Pan,(Pan paniscus,Pan troglodytes)));

As with the tree command, you can format the nodes with the -f/--format
option and show the internal nodes with the -i/--internal option. See
above for more information.

License

Copyright © 2019 Sylvain PULICANI picani@laposte.net

This work is free. You can redistribute it and/or modify it under the terms
of the MIT license. See the LICENSE file for more details.