Project author: mrzResearchArena

Project description:
Anticancer Peptide Identification employing Multi-headed Deep-CNN
Primary language: Jupyter Notebook
Repository: git://github.com/mrzResearchArena/Anticancer-Peptides-CNN.git
Created: 2020-03-14T17:38:31Z
Project page: https://github.com/mrzResearchArena/Anticancer-Peptides-CNN

License: GNU General Public License v3.0



ACP-MHCNN: An Accurate Multi-Headed Deep-Convolutional Neural Network to Predict Anticancer Peptides

Abstract:

Although the development of therapeutic alternatives for treating deadly cancers has gained much attention
globally, primary methods such as chemotherapy still have significant downsides and low specificity.
Most recently, anticancer peptides (ACPs) have emerged as a potential alternative to conventional
therapies, with far fewer negative side effects. However, the identification of ACPs through wet-lab
experiments is expensive and time-consuming. Hence, computational methods have emerged as viable
alternatives. During the past few years, several computational ACP identification techniques using
hand-engineered features have been proposed to solve this problem. In this study, we propose a new
multi-headed deep convolutional neural network model, called ACP-MHCNN, for extracting and combining
discriminative features from different information sources in an interactive way. Our model extracts
sequence-, physicochemical-, and evolutionary-based features for ACP identification using different
numerical peptide representations while restraining parameter overhead. Rigorous experiments using
cross-validation and independent-dataset testing show that ACP-MHCNN outperforms other models
for anticancer peptide identification by a substantial margin. ACP-MHCNN outperforms the state-of-the-art
model by 6.3%, 8.6%, 3.7%, 4.0%, and 0.20 in terms of accuracy, sensitivity, specificity, precision, and
MCC, respectively.
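
The five metrics named in the abstract can all be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the example counts are illustrative assumptions and are not taken from the repository or from the paper's results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity, precision and MCC
    from confusion-matrix counts (hypothetical helper, standard formulas)."""

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Made-up counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that accuracy, sensitivity, specificity, and precision lie in [0, 1], whereas MCC lies in [-1, 1], which is why the abstract reports the MCC improvement as an absolute value rather than a percentage.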

A. Model Architecture:

B. Datasets:

We used three anticancer peptide datasets for our experiments: ACP-740 [1], ACP-240 [1], and ACP-500/164 [2].
The raw datasets are peptide sequences and are available in the repository in FASTA format. From these, we extract features using (A) Binary Profile Feature (BPF) encoding, (B) physicochemical-property-based encoding, and (C) evolutionary-information-based encoding. The extracted feature datasets are also available in the repository in NumPy format.
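
As a rough illustration of encoding (A), the sketch below parses FASTA text and turns each peptide into a binary profile, that is, one one-hot row per residue. It is a minimal sketch rather than the repository's own preprocessing: the helper names `parse_fasta` and `bpf_encode`, the alphabetical ordering of the 20 standard amino acids, the `max_len` value, and the zero-padding and truncation policy are all assumptions made for the example.

```python
# Assumed ordering of the 20 standard amino acids (one column each).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def bpf_encode(sequence, max_len=10):
    """One-hot encode a peptide into a max_len x 20 binary matrix.

    Longer sequences are truncated and shorter ones are zero-padded
    (assumed policy). Non-standard residues become all-zero rows.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = []
    for residue in sequence[:max_len]:
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        matrix.append(row)
    while len(matrix) < max_len:
        matrix.append([0] * len(AMINO_ACIDS))
    return matrix


# Made-up example records, not entries from the actual datasets.
example = ">pep1\nACD\n>pep2\nWY\n"
for name, seq in parse_fasta(example):
    encoded = bpf_encode(seq, max_len=5)
    print(name, seq, len(encoded), "x", len(encoded[0]))
```

A fixed `max_len` gives every peptide the same input shape, which a convolutional network requires; the value actually used for each dataset should be taken from the notebooks.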

C. Implementation:

  1. Implementation for the ACP-740 dataset.
  2. Implementation for the ACP-240 dataset.
  3. Implementation for the ACP-500/164 dataset.

D. Model Architecture:

The model architectures and their parameters are available at the given link.

E. ROC Curve:

The ROC curves are available at the given link.
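
For readers who want to reproduce such a curve from predicted scores, the following is a minimal sketch of how ROC points and the area under the curve can be computed. It is a generic threshold sweep, not the plotting code used in the notebooks; the function names `roc_points` and `auc` and the example labels and scores are hypothetical.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists from a threshold sweep over the scores.

    labels are 0/1 ground-truth values and scores are predicted
    probabilities. Both classes must be present in labels.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Rank predictions from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all predictions that share this score so ties form one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / negatives)
        tpr.append(tp / positives)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Made-up labels and scores, for illustration only.
fpr, tpr = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(list(zip(fpr, tpr)))
print("AUC:", auc(fpr, tpr))
```

The resulting `fpr` and `tpr` lists can be passed directly to any plotting library to draw the curve.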

References:

[1]. https://doi.org/10.1016/j.omtn.2019.04.025

[2]. https://doi.org/10.1093/bioinformatics/bty451