AC: a lossless compression tool for amino acid sequences.
AC is a new lossless compressor for efficiently compressing amino acid sequences (proteins). It combines multiple context models and substitution-tolerant context models. The contribution of each model is balanced by a weight that favors the models with better recent performance, as measured by a forgetting function specific to each model.
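To make the weighting idea concrete, here is a minimal sketch of one mixing step. It assumes a multiplicative update, w_i ← w_i^γ · p_i(x), followed by renormalization, where p_i(x) is the probability that model i gave to the symbol that actually occurred and γ is the forgetting factor. This rule, the shared γ for all models, and the name `update_weights` are illustrative assumptions, not taken from AC's source, whose implementation may differ.

```shell
#!/bin/sh
# Hypothetical sketch of one weight-update step for a mixture of models.
# Input (stdin): one line per model, "current_weight probability_of_observed_symbol".
# Argument $1: forgetting factor gamma (assumed shared by all models here).
# Output: the updated, renormalized weights, one per line, 4 decimals.
update_weights() {
    awk -v g="$1" '
        {
            # Raise the old weight to gamma (forgetting), then reward the model
            # in proportion to the probability it gave the observed symbol.
            w[NR] = ($1 ^ g) * $2
            sum += w[NR]
        }
        END {
            # Renormalize so the weights sum to 1.
            for (i = 1; i <= NR; i++) printf "%.4f\n", w[i] / sum
        }'
}

# Two models with equal weights; model 1 predicted the symbol better.
printf '0.5 0.8\n0.5 0.2\n' | update_weights 0.9   # prints 0.8000 then 0.2000
```

With γ < 1 the old weights are flattened before the update, so a model favored in the past loses its lead faster: weights 0.64 and 0.36 with equal predictions (0.5, 0.5) and γ = 0.5 become 0.5714 and 0.4286.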
Downloading and installing AC:
- git clone https://github.com/pratas/ac.git
- cd ac/src/
- cmake .
- make
CMake is needed for the installation (http://www.cmake.org/). You can download it directly from http://www.cmake.org/cmake/resources/software.html or use an appropriate package manager, for example:
- sudo apt-get install cmake
An alternative to CMake, limited to Linux, uses the following instructions:
- cp Makefile.linux Makefile
- make
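The two build routes above can be wrapped in one helper that prefers CMake and falls back to `Makefile.linux` on Linux. This is a hypothetical convenience function (`build_ac` is not part of AC); it assumes it is called with the path to the `src/` directory of a clone and that both `CMakeLists.txt` and `Makefile.linux` sit in that directory.

```shell
#!/bin/sh
# Hypothetical helper: build AC from its src/ directory.
# Prefers CMake; otherwise, on Linux only, falls back to Makefile.linux.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_ac() (
    cd "$1" || exit 1
    if command -v cmake >/dev/null 2>&1 && [ -f CMakeLists.txt ]; then
        cmake . && make
    elif [ "$(uname -s)" = Linux ] && [ -f Makefile.linux ]; then
        cp Makefile.linux Makefile && make
    else
        echo "build_ac: no usable build route in $1" >&2
        exit 1
    fi
)

# Example, after cloning the repository:
#   build_ac ac/src
```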
To see the possible options of AC, type
- ./AC
or
- ./AC -h
These will print the following options:
Usage: AC [OPTION]... -r [FILE] [FILE]:[...]
Compression of amino acid sequences.
Non-mandatory arguments:
-h give this help,
-s show AC compression levels,
-v verbose mode (more information),
-V display version number,
-f force overwrite of output,
-l level of compression [1;7] (lazy -tm setup),
-t threshold frequency to discard from alphabet,
-e it creates a file with the extension ".iae" with the respective information content.
-rm : : / : : reference model (-rm 1:10:0.9/0:0:0),
[Compress] ./AC -v -tm 1:1:0.8/0:0:0 -tm 5:20:0.9/3:20:0.9 seq.txt
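A compress/decompress/compare cycle, as in the demo below, can be scripted together with a size report. This sketch assumes the `AC` and `AD` binaries live in the directory given as the first argument, that `AC` writes `FILE.co`, and that `AD` writes `FILE.de` when given `FILE.co` (the naming used in the demo). `ac_roundtrip` and `bits_per_symbol` are hypothetical helper names, and bits per symbol is computed simply as 8 × compressed bytes / input bytes, which counts every input byte (including any newlines) as a symbol.

```shell
#!/bin/sh
# Hypothetical helpers around the AC/AD binaries.

# ac_roundtrip BIN_DIR FILE LEVEL
# Compresses FILE with AC at LEVEL (writes FILE.co), decompresses it with AD
# (assumed to write FILE.de), and returns 0 only if FILE.de equals FILE.
ac_roundtrip() {
    bin=$1 file=$2 level=$3
    "$bin/AC" -f -l "$level" "$file" || return 1
    "$bin/AD" "$file.co" || return 1
    cmp -s "$file" "$file.de"
}

# bits_per_symbol ORIGINAL COMPRESSED
# Prints 8 * size(COMPRESSED) / size(ORIGINAL), treating each input byte as one symbol.
bits_per_symbol() {
    orig=$(wc -c < "$1") comp=$(wc -c < "$2")
    awk -v o="$orig" -v c="$comp" 'BEGIN { printf "%.4f\n", 8 * c / o }'
}

# Example, from the directory holding HI, AC and AD:
#   ac_roundtrip . HI 2 && bits_per_symbol HI HI.co
```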
After AC installation, run the following:
- wget http://sweet.ua.pt/pratas/datasets/AminoAcidsCorpus.zip
- unzip AminoAcidsCorpus.zip
- cp AminoAcidsCorpus/HI .
- ./AC -v -l 2 HI
- ./AD -v HI.co
- cmp HI HI.de
This downloads nine amino acid sequences, compresses and decompresses one of the smallest (HI), and finally checks whether the decompressed sequence (HI.de) is identical to the original.
## 4. CITATION ##
If you use this tool/method, please cite:
- Hosseini, M., Pratas, D. and Pinho, A.J., 2019, Feb. AC: A Compression Tool for Amino Acid Sequences. Interdiscip Sci Comput Life Sci (2019). https://doi.org/10.1007/s12539-019-00322-1
- Pratas, D., Hosseini, M. and Pinho, A.J., 2018, May. Compression of Amino Acid Sequences. In International Conference on Practical Applications of Computational Biology & Bioinformatics (pp. 105-113). Springer, Cham.
## 5. ISSUES ##
For any issue, let us know at [issues link](https://github.com/pratas/ac/issues).
## 6. LICENSE ##
GPL v3. For more information:
- http://www.gnu.org/licenses/gpl-3.0.html