Python code to extract features from Protein sequences for Machine Learning/Deep Learning
Protein feature extraction is carried out using the Biopython package.
Packages used by the code (pickle and subprocess are part of the Python standard library; Pandas and Biopython are third-party packages):
-Pandas
-pickle
-Biopython
-subprocess
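To see whether the packages listed above can be imported in the current environment, a quick check along these lines may help. This is a minimal sketch: the helper name missing_modules is hypothetical and not part of discere, and Biopython is looked up under its import name, Bio.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


# Biopython is imported as "Bio"; pickle and subprocess ship with Python.
required = ["pandas", "Bio", "pickle", "subprocess"]
print("Missing modules:", missing_modules(required))
```

An empty list means every module was found; any name that is printed still needs to be installed.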
For Windows
Windows users must specify the paths to the FASTA files and the output folder in Linux style, using forward slashes (/) rather than backslashes (\),
e.g. C:/folder_name/file_name.fasta
This issue will be fixed in a future update.
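If a path is already available in backslash form, it can be converted with the standard library instead of being retyped. The following is a minimal sketch using pathlib; the helper name to_forward_slashes is hypothetical and not part of discere.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path):
    """Convert a Windows-style path string to its forward-slash form."""
    # PureWindowsPath only manipulates the string, so this works on any OS.
    return PureWindowsPath(path).as_posix()


print(to_forward_slashes(r"C:\folder_name\file_name.fasta"))
# prints C:/folder_name/file_name.fasta
```

The returned string can then be passed wherever a file or folder path is expected.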
pip install discere
For Linux
pip3 install discere
import discere.discere as di

# Arguments: first FASTA file, second FASTA file, output directory
di.extract_feature('./Documents/positive_training.fasta',
                   './Documents/negative_training.fasta',
                   './Documents')
The general form of the call is:
di.extract_feature(input_file1, input_file2, output_directory)
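Before running the extraction, it can be useful to confirm that both input files actually contain FASTA records. The sketch below is an illustration only: count_fasta_records and check_inputs are hypothetical helpers, not part of discere, and they simply count header lines (lines starting with ">") without calling discere or Biopython.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA header lines (lines starting with '>') in a file."""
    with Path(path).open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_inputs(input_file1, input_file2):
    """Raise ValueError if either input file contains no FASTA records."""
    for path in (input_file1, input_file2):
        if count_fasta_records(path) == 0:
            raise ValueError(f"No FASTA records found in {path}")


# Example use (paths taken from the call shown above):
# check_inputs('./Documents/positive_training.fasta',
#              './Documents/negative_training.fasta')
```

A missing file raises FileNotFoundError from open(), so path mistakes surface before the extraction starts.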
Outputs are stored in user_specified_path/output in .txt, .arff and .csv formats
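After a run, the generated files can be located by extension. The following is a minimal sketch that assumes only what is stated above, namely that results land in an "output" folder under the chosen directory; the output file names themselves are not documented here, so the hypothetical helper list_outputs just groups whatever it finds.

```python
from pathlib import Path


def list_outputs(output_directory):
    """Group the files in <output_directory>/output by extension."""
    output_dir = Path(output_directory) / "output"
    found = {".txt": [], ".arff": [], ".csv": []}
    for path in sorted(output_dir.iterdir()):
        # Ignore anything that is not one of the three stated formats.
        if path.suffix in found:
            found[path.suffix].append(path.name)
    return found


# Example use (directory taken from the call shown earlier):
# print(list_outputs('./Documents'))
```

The .csv file found this way can then be loaded with Pandas, and the .arff file opened in a tool that reads that format.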