项目作者: xjtu-omics

项目描述 :
MSIsensor-RNA: Microsatellite instability detection using RNA sequencing data
高级语言: Python
项目地址: git://github.com/xjtu-omics/msisensor-rna.git
创建时间: 2021-03-11T01:07:34Z
项目社区:https://github.com/xjtu-omics/msisensor-rna

开源协议:Other

下载


GitHub last commit
GitHub Release Date
Docker Cloud Build Status

Docker Pulls
GitHub all releases
GitHub Repo stars
GitHub release (latest by date)
PyPI
Docker Image Version (latest by date)

MSIsensor-RNA

MSIsensor-RNA is a member of MSIsensor family for microsatellite instability (MSI) detection using RNA expression data including Microarray, RNA-seq, and single RNA-seq (scRNA-seq). MSIsensor-RNA compute MSI by the expression of MSI associated genes. MSIsensor-RNA shows efficient performance in AUC, sensitivity, specificity and robustness.


Citation

Peng Jia, Xuanhao Yang, Xiaofei Yang, Tingjie Wang, Yu Xu, Kai Ye, MSIsensor-RNA: Microsatellite Instability Detection for Bulk and Single-cell Gene Expression Data, Genomics, Proteomics & Bioinformatics, 2024;, qzae004, https://doi.org/10.1093/gpbjnl/qzae004


Authors

MSIsensor-RNA is free for non-commercial use
by academic, government, and non-profit/not-for-profit institutions. A
commercial version of the software is available and licensed through
Xi’an Jiaotong University. For more information, please contact with
Peng Jia (pengjia@stu.xjtu.edu.cn) or Kai Ye (kaiye@xjtu.edu.cn).


Scopes of MSIsensor-RNA

Microsatellite Instability is an indispensable biomarker in cancer therapies and prognosis,
particularly in immunotherapy. Our previous work for MSI detection based on
next-generation-sequencing data, MSIsensor and
MSIsensor-pro, are widely used in clinical
research projects. In particular, MSIsensor is the chosen MSI scoring method in the first
FDA-approved pan cancer panel, MSK-IMPACT. However, most of those DNA-based methods,
including MSIsensor and MSIsensor-pro, quantify MSI evaluation of genome mutations as
consequence of MSI status rather than the direct cause of MSI, the deficiency of mismatch
repair (MMR) system. In addition, selection of detected microsatellite sites and thresholds
for different populations, sequencing panels and cancer types impedes the standardized
detection of MSI in clinical. To solve these problems, we launched a new member for
MSIsensor family, MSIsensor-RNA, a standalone software for MSI detection with MMR
associated genes from tumor RNA sequencing data. MSIsensor-RNA shows efficient
performance in AUC, sensitivity, specificity and robustness. MSIsensor-RNA also costs
less in aspect of sequencing and computation, and does not need selection of
microsatellite sites and threshold for different populations compared to the NGS-based
methods, including MSIsensor and MSIsensor-pro.


How to install MSIsensor-RNA?

Install with pip3

```shell script
conda create -n myenv python>=3.6
conda activate myenv
git clone https://github.com/xjtu-omics/msisensor-rna.git
pip3 install .

  1. ### Install with docker
  2. ```shell script
  3. docker pull pengjia1110/msisensor-rna:latest
  4. docker run -v /local/path:/docker/path pengjia1110/msisensor-rna:latest msisensor-rna

How to use MSI ?

Usage:

```shell script
msisensor-rna [options]

  1. ### Key Commands:
  2. #### **genes**
  3. * **Function**. Select informative genes for microsatellite instability detection.
  4. * **Parameters**
  1. -h, --help show this help message and exit
  2. -i INPUT, --input INPUT
  3. The path of input file. e.g. xxx.csv [required]
  4. -o OUTPUT, --output OUTPUT
  5. The output file of gene information. e.g. xxx.csv [required]
  6. -thresh_t THREADS, --threads THREADS
  7. The threads used to run this program. [default=4]
  8. -thresh_cov THRESH_COV, --thresh_cov THRESH_COV
  9. Threshold for coefficient of variation of gene expression value of all samples (Mean/Std). [default=0.5]
  10. -thresh_p THRESH_P_RANKSUM, --thresh_p_ranksum THRESH_P_RANKSUM
  11. Threshold for Pvalue of rank sum test between MSI-H and MSS samples. [default=0.01]
  12. -thresh_auc THRESH_AUCSCORE, --thresh_AUCscore THRESH_AUCSCORE
  13. Threshold for AUC score: AUC score was calculating by the sklearn package. [default=0.65]
  14. -p POSITIVE_NUM, --positive_num POSITIVE_NUM
  15. The minimum positive sample of MSI for training. [default = 10]
  16. ```

train

  • Function. Train custom model for microsatellite instability detection.
  • Parameters

    1. -h, --help show this help message and exit
    2. -i INPUT, --input INPUT
    3. The path of input file. [required]
    4. -m MODEL, --model MODEL
    5. The trained model of the input file. [required]
    6. -t CANCER_TYPE, --cancer_type CANCER_TYPE
    7. The cancer type for this training. e.g. CRC, STAD,
    8. PanCancer etc.
    9. -c {SVM,RandomForest,LogisticRegression,MLPClassifier,GaussianNB,AdaBoostClassifier}, --classifier {SVM,RandomForest,LogisticRegression,MLPClassifier,GaussianNB,AdaBoostClassifier}
    10. The machine learning classifier for MSI detection.
    11. [default = RandomForest]
    12. -di INPUT_DESCRIPTION, --input_description INPUT_DESCRIPTION
    13. The description of the input file. [default = None]
    14. -dm MODEL_DESCRIPTION, --model_description MODEL_DESCRIPTION
    15. Description for this trained model.
    16. -p POSITIVE_NUM, --positive_num POSITIVE_NUM
    17. The minimum positive sample of MSI for training.
    18. [default = 10]
    19. -a AUTHOR, --author AUTHOR
    20. The author who trained the model. [default = None]
    21. -e EMAIL, --email EMAIL
    22. The email of the author. [default = None]

show

  • Function. Show the information of the model and add more details.

  • Parameters

    1. -h, --help show this help message and exit
    2. -m MODEL, --model MODEL
    3. The trained model path. [required]
    4. -t CANCER_TYPE, --cancer_type CANCER_TYPE
    5. Rename the cancer type. e.g. CRC, STAD, PanCancer etc.
    6. [default = None]
    7. -di INPUT_DESCRIPTION, --input_description INPUT_DESCRIPTION
    8. Add description for the input file. [default = None]
    9. -dm MODEL_DESCRIPTION, --model_description MODEL_DESCRIPTION
    10. Add description for this trained model. [default = None]
    11. -g GENE_LIST, --gene_list GENE_LIST
    12. The path for the genes must be included for this
    13. model. [default = None]

detection

  • Function. Microsatellite instability detection.

  • Parameters

    1. -h, --help show this help message and exit
    2. -i INPUT, --input INPUT
    3. The path of input file. [required]
    4. -o OUTPUT, --output OUTPUT
    5. The path of output file prefix. [required]
    6. -m MODEL, --model MODEL
    7. The path of the microsatellite regions. [required]
    8. -d RUN_DIRECTLY, --run_directly RUN_DIRECTLY
    9. Run the program directly without any Confirm. [default = False]

Input and output

  • The input file for informative genes selection and model training. (-i option in train command)

    You need to prepare your training file with a comma separated format (csv).
    The first columns should be sample id, the second columns should be msi status,
    and the third and other columns should be gene expression values. We recommend
    you provide a normalized expression values. (like z-score normalization with log2(FPKM+1) )

    The following is an example:

    | SampleID | msi | MLH1|LINC01006| …| NHLRC1|
    | —— | —— | —— | ——| —— | ——|
    | NA0001 | MSI-H | 0.209|1.209|…|0.393|
    | CA0002 | MSS |5.690|0.620|…|4.902|
    | … | … |…|…|…|…|
    | CA10 0 | MSS |9.960|0.920|…|5.002|

  • The trained model (-m option in train, show and detection command)

    The trained model is saved as pickle file. In train command, we recommend you add more
    description by -di,-dm,-a,-e, so that others who used this model are able to get more information.
    In show command, you can get the information of your model , and changed some descriptions by -di and -dm.
    you can also use -g option to output the genes list this model needed to a file.
    In detection command, you must check the model and input Yes or No to continue the predict step use -d True
    to ignore this reminder.

  • The input file for the detection command (-i option in detection command)

    You need to prepare your input file for MSI prediction with a comma separated format (csv).
    The first columns should be sample id, the second and other columns should be gene expression values.
    The genes name must contain the genes in the model (use -g option of show command to see the genes
    list of the model).
    The following is an example:

    | SampleID | MLH1|LINC01006| …| NHLRC1|
    | —— | —— | ——| —— | ——|
    | NA0001| 0.209|1.209|…|0.393|
    | CA0002 |5.690|0.620|…|4.902|
    | … |…|…|…|…|
    | CA100|9.960|0.920|…|5.002|


Attention

  • If you train your own model, the number of MSI-H samples must be greater than 6.

Contact

If you have any questions, please contact with Peng Jia (pengjia@stu.xjtu.edu.cn) or Kai Ye (kaiye@xjtu.edu.cn).