Project author: jithin8mathew

Project description:
Python code to extract features from protein sequences for Machine Learning/Deep Learning
Language: Python
Repository URL: git://github.com/jithin8mathew/Protein-feature-extraction.git
Created: 2018-11-16T16:36:29Z
Project community: https://github.com/jithin8mathew/Protein-feature-extraction

License: MIT License


Protein Feature Extraction for Machine Learning


Python code to extract features from Protein sequences for Machine Learning/Deep Learning

Protein feature extraction is carried out using the Biopython package.

Figure: Radar plot

Features (27 features):

  1. AA-count (20x features)
  2. aromaticity (1x)
  3. secondary_structure_fraction (3x)
  4. isoelectric_point (1x)
  5. molecular_weight (1x)
  6. instability_index (1x)
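As an illustration of the first two feature groups, here is a minimal standard-library sketch. It is not the project's code: the project relies on Biopython, and the helper names `aa_counts` and `aromaticity`, the residue ordering, and the example sequence are assumptions made for this example. Aromaticity is taken here as the fraction of residues that are Phe, Trp or Tyr.

```python
from collections import Counter

# Assumed ordering of the 20 standard residues (alphabetical by one-letter code).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AROMATIC = "FWY"  # Phe, Trp, Tyr


def aa_counts(sequence):
    """Return the count of each standard amino acid, in AMINO_ACIDS order."""
    counts = Counter(sequence.upper())
    return [counts[aa] for aa in AMINO_ACIDS]


def aromaticity(sequence):
    """Return the fraction of residues that are aromatic (F, W or Y)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return sum(seq.count(aa) for aa in AROMATIC) / len(seq)


# Hypothetical example sequence, used only to show the vector layout.
example = "MKWVTFISLLFLFSSAYS"
features = aa_counts(example) + [aromaticity(example)]
print(len(features))  # 21 of the 27 features: 20 counts + aromaticity
```

The remaining six values (three secondary-structure fractions, isoelectric point, molecular weight and instability index) depend on lookup tables that Biopython provides, so they are left out of this sketch.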

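Packages required to run the code:
-Pandas
-Biopython
-pickle (part of the Python standard library, no separate installation needed)
-subprocess (part of the Python standard library, no separate installation needed)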

Top N features for identifying insulin protein sequences

Figure: Best N features for insulin

Installation

For Windows
Windows users must specify the paths to the FASTA files and the output folder in Linux style, using forward slashes (/) rather than backslashes (\),
e.g. C:/folder_name/file_name.fasta
This issue will be fixed in a future update.
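One way to produce forward-slash paths on Windows is to let the standard library do the conversion. The sketch below uses `pathlib.PureWindowsPath`; the helper name `to_forward_slashes` is an assumption for this example and is not part of discere.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path):
    """Return the given Windows path written with forward slashes."""
    return PureWindowsPath(windows_path).as_posix()


print(to_forward_slashes(r"C:\folder_name\file_name.fasta"))
# C:/folder_name/file_name.fasta
```

The converted string can then be passed wherever a FASTA file path or the output folder is expected.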

  pip install discere

For Linux

  pip3 install discere

Usage

  import discere.discere as di
  di.extract_feature('./Documents/positive_training.fasta',
                     './Documents/negative_training.fasta',
                     './Documents')

di.extract_feature(input_file1, input_file2, output_directory)
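Both inputs are FASTA files; judging by the example file names, they appear to hold positive and negative training sequences. Before running the extraction it can help to check that each file parses as expected. The following is a minimal standard-library sketch of a FASTA reader; the function `read_fasta` and the sample records are hypothetical and do not come from discere.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical sample data, inlined so the sketch runs without any files.
sample = """>seq1 hypothetical record
MALWMRLLPL
LALLALWGPD
>seq2 hypothetical record
GIVEQCCTSI
""".splitlines()

records = list(read_fasta(sample))
print(len(records))  # 2
```

To check a real file, pass an open file object instead of `sample`, for example `read_fasta(open(path))`.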

Output

Outputs are stored in user_specified_path/output in .txt, .arff and .csv formats.