Project author: cobilab

Project description:
A lossless compression tool for Amino Acid sequences
Language: C
Project URL: git://github.com/cobilab/ac.git
Created: 2017-11-21T11:41:57Z
Project community: https://github.com/cobilab/ac

License: GNU General Public License v3.0


AC

AC: a lossless compression tool for amino acid sequences.



AC is a new lossless compressor designed to compress amino acid sequences (proteins) efficiently. It combines multiple context models with substitution-tolerant context models. The contribution of each model is balanced by weights that favor the better-performing models, according to a forgetting function specific to each model.
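To make the weighting idea concrete, here is a minimal sketch in C. It is not AC's code, and this README does not state AC's actual update rule, so the sketch swaps in a simple stand-in: each model keeps an exponentially decayed average of the probability it assigned to the symbols that actually occurred, and the weights are those scores normalized to sum to 1. The names mix_probability and update_weights, and the rule itself, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Mix the per-model probabilities of one symbol: sum_i weights[i] * probs[i].
   Assumes the weights are non-negative and sum to 1. */
double mix_probability(const double *weights, const double *probs, size_t n)
{
    double mixed = 0.0;
    for (size_t i = 0; i < n; i++)
        mixed += weights[i] * probs[i];
    return mixed;
}

/* After a symbol has been coded, refresh every model's score and weight.
   seen_probs[i] is the probability model i gave to the symbol that occurred.
   Assumed rule (NOT taken from AC): score = gamma*score + (1-gamma)*prob,
   so a gamma near 1 forgets slowly and a gamma near 0 forgets quickly. */
void update_weights(double *scores, double *weights, const double *seen_probs,
                    const double *gammas, size_t n)
{
    assert(n > 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(gammas[i] >= 0.0 && gammas[i] < 1.0);
        scores[i] = gammas[i] * scores[i] + (1.0 - gammas[i]) * seen_probs[i];
        total += scores[i];
    }
    /* Normalize; fall back to uniform weights if every score is zero. */
    for (size_t i = 0; i < n; i++)
        weights[i] = (total > 0.0) ? scores[i] / total : 1.0 / (double)n;
}
```

In this sketch a model that keeps predicting well gains weight over time, while a per-model gamma controls how quickly past performance is forgotten.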

1. INSTALLATION

Downloading and installing AC:

  git clone https://github.com/pratas/ac.git
  cd ac/src/
  cmake .
  make

CMake is needed for the installation (http://www.cmake.org/). You can download it directly from http://www.cmake.org/cmake/resources/software.html or use an appropriate package manager, such as:

  sudo apt-get install cmake

An alternative to CMake, limited to Linux, is to build with the provided Makefile:

  cp Makefile.linux Makefile
  make

2. USAGE

To see the available options of AC, type:

  ./AC

or

  ./AC -h

Either command prints the following options:

  Usage: AC [OPTION]... -r [FILE] [FILE]:[...]
  Compression of amino acid sequences.

  Non-mandatory arguments:

    -h                   give this help,
    -s                   show AC compression levels,
    -v                   verbose mode (more information),
    -V                   display version number,
    -f                   force overwrite of output,
    -l <level>           level of compression [1;7] (lazy -tm setup),
    -t <threshold>       threshold frequency to discard from alphabet,
    -e                   it creates a file with the extension ".iae"
                         with the respective information content.

    -rm <c>:<d>:<g>/<m>:<e>:<a>  reference model (-rm 1:10:0.9/0:0:0),
    -rm <c>:<d>:<g>/<m>:<e>:<a>  reference model (-rm 5:90:0.9/1:50:0.8),
    ...
    -tm <c>:<d>:<g>/<m>:<e>:<a>  target model (-tm 1:1:0.8/0:0:0),
    -tm <c>:<d>:<g>/<m>:<e>:<a>  target model (-tm 7:100:0.9/2:10:0.85),
    ...
                         target and reference templates use <c> for
                         context-order size, <d> for alpha (1/<d>),
                         <g> for gamma (decayment forgetting factor) [0;1),
                         <m> to the maximum sets the allowed mutations,
                         on the context without being discarded (for
                         deep contexts), under the estimator <e>, using
                         <a> for gamma (decayment forgetting factor)
                         [0;1) (tolerant model),

    -r <FILE>            reference file ("-rm" are loaded here),

  Mandatory arguments:

    <FILE>:<...>:<...>   file to compress (last argument). For more
                         files use splitting ":" characters.

  Example:

    [Compress]   ./AC -v -tm 1:1:0.8/0:0:0 -tm 5:20:0.9/3:20:0.9 seq.txt
    [Decompress] ./AD -v seq.txt.co

  Report bugs to <{pratas,seyedmorteza,ap}@ua.pt>.
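
A model argument such as 5:20:0.9/3:20:0.9 packs six values: three colon-separated fields, a slash, then three more. Going by the help text, they are, in order, the context-order size, the denominator of alpha, gamma, the maximum number of allowed mutations, the estimator value of the tolerant model, and the tolerant model's gamma. The sketch below shows one way to read such a string in C. It is a hypothetical parser, not the one AC uses; the ModelSpec type, its field names, and the range checks are assumptions based only on the help text.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one -tm / -rm template (names are illustrative). */
typedef struct {
    unsigned order;          /* context-order size */
    unsigned alpha_den;      /* alpha is given as 1/alpha_den */
    double   gamma;          /* forgetting factor, expected in [0;1) */
    unsigned max_mutations;  /* allowed mutations in the tolerant model */
    unsigned tol_estimator;  /* estimator value of the tolerant model */
    double   tol_gamma;      /* tolerant model forgetting factor, in [0;1) */
} ModelSpec;

/* Parse "c:d:g/m:e:a". Returns 1 on success and fills *out;
   returns 0 if the string is malformed or a gamma is outside [0;1). */
int parse_model_spec(const char *text, ModelSpec *out)
{
    ModelSpec m;
    char trailing;  /* catches unexpected characters after the sixth field */

    assert(text != NULL && out != NULL);
    int fields = sscanf(text, "%u:%u:%lf/%u:%u:%lf%c",
                        &m.order, &m.alpha_den, &m.gamma,
                        &m.max_mutations, &m.tol_estimator, &m.tol_gamma,
                        &trailing);
    if (fields != 6)
        return 0;
    if (m.gamma < 0.0 || m.gamma >= 1.0)
        return 0;
    if (m.tol_gamma < 0.0 || m.tol_gamma >= 1.0)
        return 0;
    *out = m;
    return 1;
}
```

Calling parse_model_spec("5:20:0.9/3:20:0.9", &spec) would fill spec with order 5, alpha denominator 20, and up to 3 allowed mutations, while a string with missing fields is rejected.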

3. EXAMPLE

After installing AC, run the following:

  wget http://sweet.ua.pt/pratas/datasets/AminoAcidsCorpus.zip
  unzip AminoAcidsCorpus.zip
  cp AminoAcidsCorpus/HI .
  ./AC -v -l 2 HI
  ./AD -v HI.co
  cmp HI HI.de
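
The last step, cmp HI HI.de, verifies that decompression reproduced the input byte for byte. If the same check is wanted from C, for example inside a test harness, it could look like the sketch below. The helper files_identical is hypothetical and not part of AC.

```c
#include <stdio.h>

/* Returns 1 if both files contain exactly the same bytes, 0 if they differ
   (in content or in length), and -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = 1;

    if (a == NULL || b == NULL) {
        if (a != NULL) fclose(a);
        if (b != NULL) fclose(b);
        return -1;
    }
    for (;;) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb) {      /* differing byte, or one file ended first */
            result = 0;
            break;
        }
        if (ca == EOF)       /* both files ended at the same position */
            break;
    }
    fclose(a);
    fclose(b);
    return result;
}
```

With the files from this example, files_identical("HI", "HI.de") returning 1 would correspond to cmp printing nothing.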

The commands above download nine amino acid sequences, then compress and decompress one of the smallest (HI). Finally, cmp checks that the decompressed sequence is identical to the original.

4. CITATION

If you use this tool or method, please cite:

  - Hosseini, M., Pratas, D. & Pinho, A.J., 2019, Feb. AC: A Compression Tool for Amino Acid Sequences. Interdiscip Sci Comput Life Sci (2019). https://doi.org/10.1007/s12539-019-00322-1
  - Pratas, D., Hosseini, M. and Pinho, A.J., 2018, May. Compression of Amino Acid Sequences. In International Conference on Practical Applications of Computational Biology & Bioinformatics (pp. 105-113). Springer, Cham.

5. ISSUES

To report an issue, use the [issues link](https://github.com/pratas/ac/issues).

6. LICENSE

GPL v3. For more information:

  http://www.gnu.org/licenses/gpl-3.0.html