Project author: lasersonlab

Project description:
Index and search single cell data.
Language: Java
Project URL: git://github.com/lasersonlab/scsearch.git
Created: 2018-09-19T10:42:24Z
Project community: https://github.com/lasersonlab/scsearch

License:


scsearch

Index and search single cell data.

Installation

    mvn install
    alias scsearch='java -jar target/scsearch-0.0.1-SNAPSHOT-jar-with-dependencies.jar'

Install Elasticsearch locally.

You’ll also need some 10x Genomics data, such as the 1M neurons HDF5 file used in the examples below.

Run

First create an index. This takes 15 minutes or so (for around 1 million cells).
(Note that we only create 256 out of 320 shards to avoid an as-yet-unresolved int overflow error in netcdf.)

    scsearch -o index \
      --index 10x \
      --file files/1M_neurons_filtered_gene_bc_matrices_h5.h5 \
      --total-shards 320 \
      --num-shards 256
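To make the effect of indexing 256 of 320 shards concrete, here is a minimal Java sketch of the arithmetic. It assumes the cells are split into equally sized, contiguous shards and that the first 256 are indexed; the README does not say how scsearch partitions the data, so `ShardPlan`, `shardRange`, and `rowsCovered` are hypothetical names, and the row count of 1,000,000 is a round illustrative number rather than the real size of the dataset.

```java
public class ShardPlan {

    /**
     * Half-open row range [start, end) covered by one shard, assuming rows
     * are divided as evenly as possible into contiguous shards.
     * This partitioning scheme is an assumption, not scsearch's documented behavior.
     */
    public static long[] shardRange(long totalRows, int totalShards, int shard) {
        if (totalShards <= 0 || shard < 0 || shard >= totalShards) {
            throw new IllegalArgumentException("shard index out of range");
        }
        // long arithmetic keeps the intermediate product from overflowing an int
        long start = totalRows * shard / totalShards;
        long end = totalRows * (shard + 1) / totalShards;
        return new long[] {start, end};
    }

    /** Number of rows covered when only the first numShards shards are indexed. */
    public static long rowsCovered(long totalRows, int totalShards, int numShards) {
        if (numShards < 0 || numShards > totalShards) {
            throw new IllegalArgumentException("numShards out of range");
        }
        if (numShards == 0) {
            return 0;
        }
        return shardRange(totalRows, totalShards, numShards - 1)[1];
    }

    public static void main(String[] args) {
        long totalRows = 1_000_000L; // illustrative round number
        long covered = rowsCovered(totalRows, 320, 256);
        System.out.println(covered + " of " + totalRows + " rows indexed");
    }
}
```

Under these assumptions, 256 of 320 shards corresponds to 80% of the rows, so searches against such an index would only see that portion of the cells.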

Now we can search for cells that have non-zero expression levels for all genes in
a query set.
(Note that we need to supply the 10x file, since it is used as the source of
all the gene names.)

    scsearch --file files/1M_neurons_filtered_gene_bc_matrices_h5.h5 ENSMUSG00000050708
    Search index 10x for genes [ENSMUSG00000050708]
    Matching cells: 1046148
    barcode=GGAATAACACCTCGTT-3
    barcode=GGAATAAGTTTGACTG-3
    barcode=GGAATAATCTTCATGT-3
    barcode=GGACAAGAGATATACG-3
    barcode=GGACAAGAGTCGAGTG-3
    barcode=GGACAAGCAACACCCG-3
    barcode=GGACAAGCAAGCTGTT-3
    barcode=GGACAAGCAGCTGTTA-3
    barcode=GGACAAGTCAACTCTT-3
    barcode=GGACAGACACTATCTT-3
    ...
    scsearch --file files/1M_neurons_filtered_gene_bc_matrices_h5.h5 ENSMUSG00000095742
    Search index 10x for genes [ENSMUSG00000095742]
    Matching cells: 592
    scsearch --file files/1M_neurons_filtered_gene_bc_matrices_h5.h5 ENSMUSG00000050708 ENSMUSG00000095742
    Search index 10x for genes [ENSMUSG00000050708, ENSMUSG00000095742]
    Matching cells: 591
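
The two-gene query returns 591 cells, one fewer than the 592 matching the second gene alone, which is what AND semantics over the query set would produce. The sketch below illustrates that semantics in plain Java as a set intersection. It is not how scsearch is implemented (the tool queries Elasticsearch); the class `GeneSetSearch`, the method `cellsExpressingAll`, and the toy gene and cell names are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class GeneSetSearch {

    /**
     * Returns the cells that express every gene in the query.
     * cellsByGene maps a gene ID to the set of cell barcodes with
     * non-zero expression for that gene (a toy in-memory stand-in for an index).
     */
    public static Set<String> cellsExpressingAll(Map<String, Set<String>> cellsByGene,
                                                 List<String> genes) {
        Set<String> result = null;
        for (String gene : genes) {
            Set<String> cells = cellsByGene.getOrDefault(gene, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(cells); // copy so the index is not mutated
            } else {
                result.retainAll(cells);       // intersect with the next gene's cells
            }
        }
        // An empty query matches nothing in this sketch.
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        // Toy data: hypothetical gene and cell names, not taken from the 10x file.
        Map<String, Set<String>> index = new HashMap<>();
        index.put("geneA", Set.of("cell1", "cell2", "cell3"));
        index.put("geneB", Set.of("cell2", "cell4"));

        System.out.println(cellsExpressingAll(index, List.of("geneA")));          // [cell1, cell2, cell3]
        System.out.println(cellsExpressingAll(index, List.of("geneA", "geneB"))); // [cell2]
    }
}
```

Adding a gene to the query can only shrink or preserve the result, which matches the pattern in the counts above.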

Delete an index with the following command:

    scsearch -o delete --index 10x