项目作者: stemangiola

项目描述 :
Seurat meets tidyverse. The best of both worlds.
高级语言: R
项目地址: git://github.com/stemangiola/tidyseurat.git
创建时间: 2020-08-23T06:25:33Z
项目社区:https://github.com/stemangiola/tidyseurat

开源协议:

下载


tidyseurat - part of tidytranscriptomics

Lifecycle:maturing
R build
status


Watch the video

Brings Seurat to the tidyverse!

website:
stemangiola.github.io/tidyseurat/

Please also have a look at


visual cue

Introduction

tidyseurat provides a bridge between the Seurat single-cell package
[@butler2018integrating; @stuart2019comprehensive] and the tidyverse
[@wickham2019welcome]. It creates an invisible layer that enables
viewing the Seurat object as a tidyverse tibble, and provides
Seurat-compatible dplyr, tidyr, ggplot and plotly functions.

Functions/utilities available

Seurat-compatible Functions Description
all
tidyverse Packages Description
dplyr All dplyr APIs like for any tibble
tidyr All tidyr APIs like for any tibble
ggplot2 ggplot like for any tibble
plotly plot_ly like for any tibble
Utilities Description
tidy Add tidyseurat invisible layer over a Seurat object
as_tibble Convert cell-wise information to a tbl_df
join_features Add feature-wise information, returns a tbl_df
aggregate_cells Aggregate cell gene-transcription abundance as pseudobulk tissue

Installation

From CRAN

  1. install.packages("tidyseurat")

From Github (development)

  1. devtools::install_github("stemangiola/tidyseurat")
  1. library(dplyr)
  2. library(tidyr)
  3. library(purrr)
  4. library(magrittr)
  5. library(ggplot2)
  6. library(Seurat)
  7. library(tidyseurat)

Create tidyseurat, the best of both worlds!

This is a seurat object but it is evaluated as tibble. So it is fully
compatible both with Seurat and tidyverse APIs.

  1. pbmc_small = SeuratObject::pbmc_small

It looks like a tibble

  1. pbmc_small
  1. ## # A Seurat-tibble abstraction: 80 × 15
  2. ## # [90mFeatures=230 | Cells=80 | Active assay=RNA | Assays=RNA[0m
  3. ## .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
  4. ## <chr> <fct> <dbl> <int> <fct> <fct> <chr>
  5. ## 1 ATGC… SeuratPro… 70 47 0 A g2
  6. ## 2 CATG… SeuratPro… 85 52 0 A g1
  7. ## 3 GAAC… SeuratPro… 87 50 1 B g2
  8. ## 4 TGAC… SeuratPro… 127 56 0 A g2
  9. ## 5 AGTC… SeuratPro… 173 53 0 A g2
  10. ## 6 TCTG… SeuratPro… 70 48 0 A g1
  11. ## 7 TGGT… SeuratPro… 64 36 0 A g1
  12. ## 8 GCAG… SeuratPro… 72 45 0 A g1
  13. ## 9 GATA… SeuratPro… 52 36 0 A g1
  14. ## 10 AATG… SeuratPro… 100 41 0 A g1
  15. ## # ℹ 70 more rows
  16. ## # ℹ 8 more variables: RNA_snn_res.1 <fct>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>,
  17. ## # PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>

But it is a Seurat object after all

  1. pbmc_small@assays
  1. ## $RNA
  2. ## Assay data with 230 features for 80 cells
  3. ## Top 10 variable features:
  4. ## PPBP, IGLL5, VDAC3, CD1C, AKR1C3, PF4, MYL9, GNLY, TREML1, CA2

Preliminary plots

Set colours and theme for plots.

  1. # Use colourblind-friendly colours
  2. friendly_cols <- c("#88CCEE", "#CC6677", "#DDCC77", "#117733", "#332288", "#AA4499", "#44AA99", "#999933", "#882255", "#661100", "#6699CC")
  3. # Set theme
  4. my_theme <-
  5. list(
  6. scale_fill_manual(values = friendly_cols),
  7. scale_color_manual(values = friendly_cols),
  8. theme_bw() +
  9. theme(
  10. panel.border = element_blank(),
  11. axis.line = element_line(),
  12. panel.grid.major = element_line(size = 0.2),
  13. panel.grid.minor = element_line(size = 0.1),
  14. text = element_text(size = 12),
  15. legend.position = "bottom",
  16. aspect.ratio = 1,
  17. strip.background = element_blank(),
  18. axis.title.x = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10)),
  19. axis.title.y = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10))
  20. )
  21. )

We can treat pbmc_small effectively as a normal tibble for plotting.

Here we plot number of features per cell.

  1. pbmc_small %>%
  2. ggplot(aes(nFeature_RNA, fill = groups)) +
  3. geom_histogram() +
  4. my_theme

Here we plot total features per cell.

  1. pbmc_small %>%
  2. ggplot(aes(groups, nCount_RNA, fill = groups)) +
  3. geom_boxplot(outlier.shape = NA) +
  4. geom_jitter(width = 0.1) +
  5. my_theme

Here we plot abundance of two features for each group.

  1. pbmc_small %>%
  2. join_features(features = c("HLA-DRA", "LYZ")) %>%
  3. ggplot(aes(groups, .abundance_RNA + 1, fill = groups)) +
  4. geom_boxplot(outlier.shape = NA) +
  5. geom_jitter(aes(size = nCount_RNA), alpha = 0.5, width = 0.2) +
  6. scale_y_log10() +
  7. my_theme

Preprocess the dataset

Also you can treat the object as Seurat object and proceed with data
processing.

  1. pbmc_small_pca <-
  2. pbmc_small %>%
  3. SCTransform(verbose = FALSE) %>%
  4. FindVariableFeatures(verbose = FALSE) %>%
  5. RunPCA(verbose = FALSE)
  6. pbmc_small_pca
  1. ## # A Seurat-tibble abstraction: 80 × 17
  2. ## # [90mFeatures=220 | Cells=80 | Active assay=SCT | Assays=RNA, SCT[0m
  3. ## .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
  4. ## <chr> <fct> <dbl> <int> <fct> <fct> <chr>
  5. ## 1 ATGC… SeuratPro… 70 47 0 A g2
  6. ## 2 CATG… SeuratPro… 85 52 0 A g1
  7. ## 3 GAAC… SeuratPro… 87 50 1 B g2
  8. ## 4 TGAC… SeuratPro… 127 56 0 A g2
  9. ## 5 AGTC… SeuratPro… 173 53 0 A g2
  10. ## 6 TCTG… SeuratPro… 70 48 0 A g1
  11. ## 7 TGGT… SeuratPro… 64 36 0 A g1
  12. ## 8 GCAG… SeuratPro… 72 45 0 A g1
  13. ## 9 GATA… SeuratPro… 52 36 0 A g1
  14. ## 10 AATG… SeuratPro… 100 41 0 A g1
  15. ## # ℹ 70 more rows
  16. ## # ℹ 10 more variables: RNA_snn_res.1 <fct>, nCount_SCT <dbl>,
  17. ## # nFeature_SCT <int>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>,
  18. ## # PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>

If a tool is not included in the tidyseurat collection, we can use
as_tibble to permanently convert tidyseurat into tibble.

  1. pbmc_small_pca %>%
  2. as_tibble() %>%
  3. select(contains("PC"), everything()) %>%
  4. GGally::ggpairs(columns = 1:5, ggplot2::aes(colour = groups)) +
  5. my_theme

Identify clusters

We proceed with cluster identification with Seurat.

  1. pbmc_small_cluster <-
  2. pbmc_small_pca %>%
  3. FindNeighbors(verbose = FALSE) %>%
  4. FindClusters(method = "igraph", verbose = FALSE)
  5. pbmc_small_cluster
  1. ## # A Seurat-tibble abstraction: 80 × 19
  2. ## # [90mFeatures=220 | Cells=80 | Active assay=SCT | Assays=RNA, SCT[0m
  3. ## .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
  4. ## <chr> <fct> <dbl> <int> <fct> <fct> <chr>
  5. ## 1 ATGC… SeuratPro… 70 47 0 A g2
  6. ## 2 CATG… SeuratPro… 85 52 0 A g1
  7. ## 3 GAAC… SeuratPro… 87 50 1 B g2
  8. ## 4 TGAC… SeuratPro… 127 56 0 A g2
  9. ## 5 AGTC… SeuratPro… 173 53 0 A g2
  10. ## 6 TCTG… SeuratPro… 70 48 0 A g1
  11. ## 7 TGGT… SeuratPro… 64 36 0 A g1
  12. ## 8 GCAG… SeuratPro… 72 45 0 A g1
  13. ## 9 GATA… SeuratPro… 52 36 0 A g1
  14. ## 10 AATG… SeuratPro… 100 41 0 A g1
  15. ## # ℹ 70 more rows
  16. ## # ℹ 12 more variables: RNA_snn_res.1 <fct>, nCount_SCT <dbl>,
  17. ## # nFeature_SCT <int>, SCT_snn_res.0.8 <fct>, seurat_clusters <fct>,
  18. ## # PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>,
  19. ## # tSNE_2 <dbl>

Now we can interrogate the object as if it was a regular tibble data
frame.

  1. pbmc_small_cluster %>%
  2. count(groups, seurat_clusters)
  1. ## # A tibble: 6 × 3
  2. ## groups seurat_clusters n
  3. ## <chr> <fct> <int>
  4. ## 1 g1 0 23
  5. ## 2 g1 1 17
  6. ## 3 g1 2 4
  7. ## 4 g2 0 17
  8. ## 5 g2 1 13
  9. ## 6 g2 2 6

We can identify cluster markers using Seurat.

  1. # Identify top 10 markers per cluster
  2. markers <-
  3. pbmc_small_cluster %>%
  4. FindAllMarkers(only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25) %>%
  5. group_by(cluster) %>%
  6. top_n(10, avg_log2FC)
  7. # Plot heatmap
  8. pbmc_small_cluster %>%
  9. DoHeatmap(
  10. features = markers$gene,
  11. group.colors = friendly_cols
  12. )

Reduce dimensions

We can calculate the first 3 UMAP dimensions using the Seurat framework.

  1. pbmc_small_UMAP <-
  2. pbmc_small_cluster %>%
  3. RunUMAP(reduction = "pca", dims = 1:15, n.components = 3L)

And we can plot them using 3D plot using plotly.

  1. pbmc_small_UMAP %>%
  2. plot_ly(
  3. x = ~`UMAP_1`,
  4. y = ~`UMAP_2`,
  5. z = ~`UMAP_3`,
  6. color = ~seurat_clusters,
  7. colors = friendly_cols[1:4]
  8. )

screenshot plotly

Cell type prediction

We can infer cell type identities using SingleR [@aran2019reference]
and manipulate the output using tidyverse.

  1. # Get cell type reference data
  2. blueprint <- celldex::BlueprintEncodeData()
  3. # Infer cell identities
  4. cell_type_df <-
  5. GetAssayData(pbmc_small_UMAP, slot = 'counts', assay = "SCT") %>%
  6. log1p() %>%
  7. Matrix::Matrix(sparse = TRUE) %>%
  8. SingleR::SingleR(
  9. ref = blueprint,
  10. labels = blueprint$label.main,
  11. method = "single"
  12. ) %>%
  13. as.data.frame() %>%
  14. as_tibble(rownames = "cell") %>%
  15. select(cell, first.labels)
  1. # Join UMAP and cell type info
  2. pbmc_small_cell_type <-
  3. pbmc_small_UMAP %>%
  4. left_join(cell_type_df, by = "cell")
  5. # Reorder columns
  6. pbmc_small_cell_type %>%
  7. select(cell, first.labels, everything())

We can easily summarise the results. For example, we can see how cell
type classification overlaps with cluster classification.

  1. pbmc_small_cell_type %>%
  2. count(seurat_clusters, first.labels)

We can easily reshape the data for building information-rich faceted
plots.

  1. pbmc_small_cell_type %>%
  2. # Reshape and add classifier column
  3. pivot_longer(
  4. cols = c(seurat_clusters, first.labels),
  5. names_to = "classifier", values_to = "label"
  6. ) %>%
  7. # UMAP plots for cell type and cluster
  8. ggplot(aes(UMAP_1, UMAP_2, color = label)) +
  9. geom_point() +
  10. facet_wrap(~classifier) +
  11. my_theme

We can easily plot gene correlation per cell category, adding
multi-layer annotations.

  1. pbmc_small_cell_type %>%
  2. # Add some mitochondrial abundance values
  3. mutate(mitochondrial = rnorm(n())) %>%
  4. # Plot correlation
  5. join_features(features = c("CST3", "LYZ"), shape = "wide") %>%
  6. ggplot(aes(CST3 + 1, LYZ + 1, color = groups, size = mitochondrial)) +
  7. geom_point() +
  8. facet_wrap(~first.labels, scales = "free") +
  9. scale_x_log10() +
  10. scale_y_log10() +
  11. my_theme

Nested analyses

A powerful tool we can use with tidyseurat is nest. We can easily
perform independent analyses on subsets of the dataset. First we
classify cell types in lymphoid and myeloid; then, nest based on the new
classification

  1. pbmc_small_nested <-
  2. pbmc_small_cell_type %>%
  3. filter(first.labels != "Erythrocytes") %>%
  4. mutate(cell_class = if_else(`first.labels` %in% c("Macrophages", "Monocytes"), "myeloid", "lymphoid")) %>%
  5. nest(data = -cell_class)
  6. pbmc_small_nested

Now we can independently for the lymphoid and myeloid subsets (i) find
variable features, (ii) reduce dimensions, and (iii) cluster using both
tidyverse and Seurat seamlessly.

  1. pbmc_small_nested_reanalysed <-
  2. pbmc_small_nested %>%
  3. mutate(data = map(
  4. data, ~ .x %>%
  5. FindVariableFeatures(verbose = FALSE) %>%
  6. RunPCA(npcs = 10, verbose = FALSE) %>%
  7. FindNeighbors(verbose = FALSE) %>%
  8. FindClusters(method = "igraph", verbose = FALSE) %>%
  9. RunUMAP(reduction = "pca", dims = 1:10, n.components = 3L, verbose = FALSE)
  10. ))
  11. pbmc_small_nested_reanalysed

Now we can unnest and plot the new classification.

  1. pbmc_small_nested_reanalysed %>%
  2. # Convert to tibble otherwise Seurat drops reduced dimensions when unifying data sets.
  3. mutate(data = map(data, ~ .x %>% as_tibble())) %>%
  4. unnest(data) %>%
  5. # Define unique clusters
  6. unite("cluster", c(cell_class, seurat_clusters), remove = FALSE) %>%
  7. # Plotting
  8. ggplot(aes(UMAP_1, UMAP_2, color = cluster)) +
  9. geom_point() +
  10. facet_wrap(~cell_class) +
  11. my_theme

Aggregating cells

Sometimes, it is necessary to aggregate the gene-transcript abundance
from a group of cells into a single value. For example, when comparing
groups of cells across different samples with fixed-effect models.

In tidyseurat, cell aggregation can be achieved using the
aggregate_cells function.

  1. pbmc_small %>%
  2. aggregate_cells(groups, assays = "RNA")