Project author: HosnaJabbari

Project description:
RNA Pseudoknotted Secondary Structure Prediction Using Strict Hierarchical Folding
Language: C++
Repository: git://github.com/HosnaJabbari/HFold.git
Created: 2017-06-23T21:15:11Z
Project page: https://github.com/HosnaJabbari/HFold

HFold

Description:

Software implementation of HFold.
HFold is an algorithm for predicting the pseudoknotted secondary structures of RNA using strict Hierarchical Folding.

Cite:

Jabbari, H., Condon, A., Pop, A., Pop, C., Zhao, Y. (2007). HFold: RNA Pseudoknotted Secondary Structure Prediction Using Hierarchical Folding. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg.
https://doi.org/10.1007/978-3-540-74126-8_30

Jabbari, H., Condon, A., Zhao, Y. (2008). Novel and Efficient RNA Secondary Structure Prediction Using Hierarchical Folding. Journal of Computational Biology, Mar 2008, 139-163.
http://doi.org/10.1089/cmb.2007.0198

Supported OS:

Linux
macOS

Installation:

Requirements: A compiler that supports C++11 standard (tested with g++ version 4.7.2 or higher) and CMake version 3.1 or greater.

CMake version 3.1 or greater must be installed in a way that HFold can find it.
To test if your Mac or Linux system already has CMake, you can type into a terminal:

    cmake --version

If it does not print a version of 3.1 or greater, you will have to install CMake using the instructions for your operating system.
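The version check can also be scripted. The following is a minimal sketch, not part of HFold: it assumes the usual `cmake version X.Y.Z` first line printed by `cmake --version`, and the function names are made up for illustration.

```python
import re

# Minimum CMake version stated in the requirements of this README.
MIN_CMAKE = (3, 1)


def parse_cmake_version(output):
    """Extract (major, minor, patch) from the text printed by `cmake --version`.

    Returns None when no version line is found, for example when the
    shell reported that cmake is not installed.
    """
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def cmake_is_new_enough(output, minimum=MIN_CMAKE):
    """Return True when the reported version is at least `minimum`."""
    version = parse_cmake_version(output)
    return version is not None and version[:2] >= minimum
```

The comparison is done on integer tuples rather than on strings, so that 3.10 is correctly treated as newer than 3.9. The text to pass in can be captured with, for example, `subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout`.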

Mac:

The easiest way is to install Homebrew and use it to install CMake.
To install Homebrew, run the following from a terminal:

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

When that finishes, run the following from a terminal to install CMake:

    brew install cmake

Linux:

Run the following from a terminal:

    wget http://www.cmake.org/files/v3.8/cmake-3.8.2.tar.gz
    tar xzf cmake-3.8.2.tar.gz
    cd cmake-3.8.2
    ./configure
    make
    make install

Linux instructions source

Steps for installation

  1. Download the repository and extract the files onto your system.
  2. From a command line in the root directory (where this README.md is), run

         cmake -H. -Bbuild
         cmake --build build

     If you need to specify a particular compiler, such as g++, you can instead run something like

         cmake -H. -Bbuild -DCMAKE_CXX_COMPILER=g++
         cmake --build build

     This can be useful if you are getting errors about your compiler not having C++11 features.

Help

    Usage: HFold [options] [input sequence]

Read input file from cmdline; predict minimum free energy and optimum structure using the RNA folding algorithm.

    -h, --help              Print help and exit
    -V, --version           Print version and exit
    -r, --input-structure   Give a restricted structure as an input structure
    -i, --input-file        Give a path to an input file containing the sequence (and input structure if known)
    -o, --output-file       Give a path to an output file which will contain the sequence, and its structure and energy
    -n, --opt               Specify the number of suboptimal structures to output (default is 1)
    -p, --pk-free           Specify whether you only want the pseudoknot-free structure to be calculated
    -k, --pk-only           Only add base pairs which cross the constraint structure. The constraint structure is returned if there are no energetically favorable crossing base pairs
    -d, --dangles           Specify the dangle model to be used (default is 2)
    -P, --paramFile         Read energy parameters from paramfile, instead of using the default parameter set
        --noConv            Do not convert DNA into RNA. This will use the Mathews 2004 parameters for DNA
How to use:

Remarks:

    Make sure the <arguments> are enclosed in "", for example -r "..().." instead of -r ..()..
    The input file for -i must be a .txt file.
    If -i is given just a file name without a path, the file is assumed to be in the directory where the executable is called.
    If -o is given just a file name without a path, the output file is generated in the directory where the executable is called.
    If -o is given just a file name without a path and -i is also provided, the output file is generated in the directory where the input file is located.
    If suboptimal structures are requested, repeated structures are skipped. That is, if different input structures lead to the same result, only the distinct structures are shown.
    If no input structure is given, or the number of suboptimal structures requested is greater than the number of input structures given, CParty generates hotspots (energetically favorable stems) to be used as input structures.
    The default parameter file is DP09. To use a different one, pass its path with -P.

Sequence requirements:

    containing only characters GCAU

Structure requirements:

    - pseudoknot free
    - containing only characters .x()

    Remarks:
        Restricted structure symbols:
            ()  restricted base pair
            .   no restriction
            x   restricted to unpaired

Input file requirements:

    Line1: FASTA name (optional)
    Line2: Sequence
    Line3: Structure

    sample:
        >Sequence1 (optional)
        GCAACGAUGACAUACAUCGCUAGUCGACGC
        (............................)
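
The sequence, structure and input file requirements above can be checked before HFold is called. The following sketch is not taken from HFold's source, and its function names are hypothetical. Two of its checks are assumptions rather than stated requirements: that the structure has the same length as the sequence (as in the sample) and that its parentheses are balanced.

```python
ALLOWED_BASES = set("GCAU")      # sequence requirement from this README
ALLOWED_SYMBOLS = set(".x()")    # structure requirement from this README


def parse_input_text(text):
    """Split input-file text into (name, sequence, structure).

    The FASTA name line is optional; the structure line is treated as
    optional too, since the help text says "if known".
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    name = None
    if lines and lines[0].startswith(">"):
        name = lines.pop(0)[1:].strip()
    if not lines:
        raise ValueError("no sequence line found")
    sequence = lines[0]
    structure = lines[1] if len(lines) > 1 else None
    return name, sequence, structure


def validate(sequence, structure=None):
    """Return a list of problems; an empty list means the input looks fine."""
    problems = []
    if not sequence:
        problems.append("sequence is empty")
    elif not set(sequence) <= ALLOWED_BASES:
        problems.append("sequence contains characters other than GCAU")
    if structure is None:
        return problems
    if not set(structure) <= ALLOWED_SYMBOLS:
        problems.append("structure contains characters other than .x()")
    if len(structure) != len(sequence):
        # Assumption: one structure symbol per base, as in the sample.
        problems.append("structure and sequence differ in length")
    depth = 0
    for symbol in structure:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                break  # a ")" appeared before its matching "("
    if depth != 0:
        # Assumption: every "(" must be closed by a later ")".
        problems.append("parentheses are not balanced")
    return problems
```

With a single bracket type, a balanced structure cannot contain crossing pairs, so the balance check also covers the pseudoknot-free requirement.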

Example:

Assuming you are in the directory that contains build/HFold (the repository root, if you followed the installation steps):

    ./build/HFold -i "/home/username/Desktop/myinputfile.txt"
    ./build/HFold -i "/home/username/Desktop/myinputfile.txt" -o "outputfile.txt"
    ./build/HFold -r "(............................)" GCAACGAUGACAUACAUCGCUAGUCGACGC
    ./build/HFold -r "(((((.........................)))))................" -d1 GGGGGAAAAAAAGGGGGGGGGGAAAAAAAACCCCCAAAAAACCCCCCCCCC
    ./build/HFold -p -r "(............................)" -o "/home/username/Desktop/some_folder/outputfile.txt" GCAACGAUGACAUACAUCGCUAGUCGACGC
    ./build/HFold -n 3 -r "(............................)" -o "/home/username/Desktop/some_folder/outputfile.txt" GCAACGAUGACAUACAUCGCUAGUCGACGC
    ./build/HFold -k -r "(............................)" GCAACGAUGACAUACAUCGCUAGUCGACGC
    ./build/HFold -P "params/rna_Turner04.par" -r "(............................)" GCAACGAUGACAUACAUCGCUAGUCGACGC
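
Where the output file ends up in these examples follows from the remarks on -i and -o under "How to use". The sketch below restates those rules as a small function. It is an interpretation of the README text, not HFold's own code; `resolve_output_path` is a hypothetical name and `cwd` stands for the directory from which the executable is called.

```python
from pathlib import PurePosixPath


def resolve_output_path(output, input_file=None, cwd="."):
    """Predict where the -o file is written, following the README remarks.

    Hypothetical helper:
    - an -o value with a directory part is used as given;
    - a bare file name goes next to the -i file when -i has a path;
    - otherwise it goes into the directory HFold was called from.
    """
    out = PurePosixPath(output)
    if out.parent != PurePosixPath("."):
        return out
    if input_file is not None:
        input_dir = PurePosixPath(input_file).parent
        if input_dir != PurePosixPath("."):
            return input_dir / out
    return PurePosixPath(cwd) / out
```

For the second example, `resolve_output_path("outputfile.txt", "/home/username/Desktop/myinputfile.txt")` gives `/home/username/Desktop/outputfile.txt`, which matches the remark that the output lands beside the input file.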

Changes

    (Mateo 03/11/24) HFold has been given a full rework and has been changed from the simfold style of code to the ViennaRNA style.
    Many of the original files have been condensed or removed as a result.
    The rework also uses a partial library from ViennaRNA, which makes prediction roughly 60-70x faster.
    Users can also use pk-only prediction (see Iterative HFold) and pseudoknot-free prediction if desired.

Bug fixes

    (Mateo 03/11/24) VP cases 1-3 did not allow for pseudoknots within multiloops; this has been fixed.
    VM did not previously update with the pseudoknots predicted. This was fixed in the rework as the prediction was combined.
    WMBP case 1 did not allow for kissing hairpins where the middle band was in G. The bounds for l have been changed to bp(i,l) to Bp(l,j) to fix this.

Questions

For questions, you can email mateo2@ualberta.ca