Author: montilab

Description:
Contextualization of protein-protein interaction databases by cell line
Language: Python
Repository: git://github.com/montilab/ppi-context.git
Created: 2020-06-18T18:55:51Z
Project community: https://github.com/montilab/ppi-context

License: GNU General Public License v3.0


PPI-Context

Contextualization of protein-protein interaction databases by cell line

Clone repository

    $ git clone https://github.com/montilab/ppi-context

Install requirements

    $ cd ppi-context
    $ pip install -r requirements.txt

The data

If you just want the data, it's easy to load into R…

    $ R

    # Packages used below: magrittr (%>%, set_colnames), ggplot2, ggpubr (ggbarplot)
    library(magrittr)
    library(ggplot2)
    library(ggpubr)

    # Load the tab-delimited PPI table
    ppi <- read.delim("data/v_1_00/PPI-Context.txt", header=TRUE, sep="\t", stringsAsFactors=FALSE)

    # Bar plot of the 30 cell lines with the most annotated PPIs
    data.frame(sort(table(ppi$cell_name), decreasing=TRUE)) %>%
        set_colnames(c("var", "freq")) %>%
        head(30) %>%
        ggbarplot(x="var", y="freq", fill="freq") +
        labs(title="", x="Cell Line Name", y="PPI") +
        scale_fill_viridis_c(option="inferno", begin=0, end=0.8) +
        theme(legend.position="none",
              axis.text.x=element_text(angle=45, hjust=1, size=12, face="bold"))
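
For readers working in Python rather than R, the same per-cell-line tally can be sketched with the standard library. This is a minimal sketch and not part of the repository: it assumes only that the file is tab-separated with a header row containing the `cell_name` column used in the R example above, and `top_cell_lines` is a hypothetical helper name.

```python
import csv
from collections import Counter


def top_cell_lines(lines, n=30):
    """Return the n most frequent cell lines as (cell_name, count) pairs.

    `lines` is any iterable of tab-separated text lines with a header row,
    for example an open file handle.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    counts = Counter(row["cell_name"] for row in reader)
    return counts.most_common(n)


# Usage (path taken from the R example above):
# with open("data/v_1_00/PPI-Context.txt", newline="") as handle:
#     for name, count in top_cell_lines(handle):
#         print(name, count)
```

The result is ordered from most to least frequent, matching the `sort(..., decreasing=TRUE)` step in the R code.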

Pre-processing the data

    | PPI - Context (v1.0)
    usage: ppictx.py [-h] [-r] [-d]
                     [-fh PATH_HIPPIE]
                     [-fp PATH_PUBTATOR]
                     [-fc PATH_CELLOSAURUS]

    optional arguments:
      -h, --help            show this help message and exit
      -r, --run             run pipeline
      -d, --download        download raw data first
      -fh PATH_HIPPIE       path to downloaded Hippie data (optional)
      -fp PATH_PUBTATOR     path to downloaded Pubtator data (optional)
      -fc PATH_CELLOSAURUS  path to downloaded Cellosaurus data (optional)

In most cases you will need to download the latest bulk data first and
then process it…

    $ python ppictx.py --download --run

    | PPI - Context (v1.0)
    | Downloading raw data...
    | Processing raw data
    ~ [PPI]
    ~ [PID -> CLA]
    ~ [CLA -> CID]
    ~ [PPI -> PID -> CLA -> CID]

In other cases, you might already have previously downloaded versions
of the data to process…

    $ python ppictx.py --run \
        -fh path/to/HIPPIE.mitab \
        -fp path/to/PUBTATOR.gz \
        -fc path/to/CELLOSAURUS.txt
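
If the pipeline is launched from another Python script, the command line can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_command` is a hypothetical helper, and it uses only the flags listed in the usage message (`--download`, `--run`, `-fh`, `-fp`, `-fc`). It assumes the script is run from the repository root, where `ppictx.py` lives.

```python
import subprocess
import sys


def build_command(download=False, run=True, hippie=None, pubtator=None, cellosaurus=None):
    """Assemble the ppictx.py argument list from the documented flags."""
    cmd = [sys.executable, "ppictx.py"]
    if download:
        cmd.append("--download")
    if run:
        cmd.append("--run")
    # Each path flag is optional and only added when a path is given.
    for flag, path in (("-fh", hippie), ("-fp", pubtator), ("-fc", cellosaurus)):
        if path is not None:
            cmd.extend([flag, str(path)])
    return cmd


# Usage, run from the repository root:
# subprocess.run(build_command(hippie="path/to/HIPPIE.mitab"), check=True)
```

Passing the arguments as a list avoids shell quoting problems when a data path contains spaces.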

Special considerations

  • Cell lines that are used in research primarily because they are
    efficient expression systems (e.g. HeLa, HEK, CHO, Sf9) may not be
    useful representations of cell-specific protein dynamics, so it
    may be useful to filter out PPIs annotated with these cell lines.

  • Cellosaurus contains synonymous cell lines, so some annotations,
    such as HEK (CVCL_M624) and HEK293 (CVCL_0045), refer to the same
    cell line. Users should be aware of synonymous cell lines relevant
    to their interests and filter accordingly.
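
Both considerations above can be handled with two small filters over the loaded rows. This is a minimal sketch under stated assumptions, not the authors' method: it assumes each row is a dict with the `cell_name` column, that cell lines are matched by exact spelling, and that the synonym map is supplied by the user. `EXPRESSION_HOSTS`, `SYNONYMS`, `normalize`, and `drop_hosts` are hypothetical names.

```python
# Expression-system cell lines named in the text above. Matching on these
# exact spellings is an assumption; check how they appear in your data.
EXPRESSION_HOSTS = {"HeLa", "HEK", "CHO", "Sf9"}

# Hypothetical synonym map (alias -> preferred name), built by the user.
SYNONYMS = {"HEK": "HEK293"}


def normalize(rows, synonyms=SYNONYMS):
    """Yield rows with cell_name replaced by its preferred name, if any."""
    for row in rows:
        name = row["cell_name"]
        yield {**row, "cell_name": synonyms.get(name, name)}


def drop_hosts(rows, hosts=EXPRESSION_HOSTS):
    """Yield only rows whose cell_name is not in the excluded set."""
    for row in rows:
        if row["cell_name"] not in hosts:
            yield row
```

Rows can come from `csv.DictReader(handle, delimiter="\t")`. The order of the two steps matters: if synonyms are normalized first, the exclusion set must list the preferred names rather than the aliases.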

Cite

Federico A, Monti S (2021) Contextualized Protein-Protein Interactions.
Patterns. https://doi.org/10.1016/j.patter.2020.100153.