deconverse: Deconvolution using scRNA-seq references

Clarice S Groeneveld

2023-10-26

What is deconvolution?

The goal of deconvolution is to estimate the makeup of a mixture in terms of its components and their fractions. That is, if:

  • Mixture: a bulk-like expression profile (a bulk RNA-seq sample, or a spot on a 10x Visium slide)

  • Components: cell type profiles (C)

then \(mixture = \sum_{i=1}^{n} C_iw_i\) subject to: \(\sum_{i=1}^n w_i = 1\) and \(w_i \geq 0\)
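A minimal base-R sketch of this model, under stated assumptions: the profile matrix `C`, the cell type names and the weights are simulated, and the constraints are enforced by a simple clip-and-renormalise step after least squares. This is an illustration only, not how deconverse or any published method solves the problem.

```r
set.seed(1)
n_genes <- 50
cell_types <- c("T_cell", "B_cell", "Monocyte")  # hypothetical cell types

# Simulated cell type profiles C (genes x cell types)
C <- matrix(rexp(n_genes * length(cell_types)), nrow = n_genes,
            dimnames = list(paste0("gene", seq_len(n_genes)), cell_types))

# True fractions w: non-negative and summing to 1
w_true <- c(T_cell = 0.6, B_cell = 0.3, Monocyte = 0.1)

# Noise-free mixture = sum_i C_i * w_i
mixture <- as.vector(C %*% w_true)

# Least-squares estimate, then clip negatives and renormalise
w_hat <- qr.solve(C, mixture)
w_hat <- pmax(w_hat, 0)
w_hat <- w_hat / sum(w_hat)
round(w_hat, 3)
```

With no noise and a full-rank `C`, the estimate matches `w_true`; real data adds noise, missing cell types and platform effects.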

Deconvolution graphical summary

Why is it useful?

Goal: define the cell types and their proportions present in a sample

  • Identify and understand tissue heterogeneity

  • Associate cell types with clinical variables (e.g. survival, response to therapy)

  • Apply to downstream analysis: e.g. cell-to-cell communication

Challenges of deconvolution

If \(mixture = \sum_{i=1}^{n} C_iw_i\), subject to: \(\sum_{i=1}^n w_i = 1\) and \(w_i \geq 0\)

  • For the user:
    • What are the cell types present and how many are there?
    • What is a good reference for my mixture?
  • For the method:
    • Assumption: all cell types that could be present are represented in the reference
    • How do I identify the cell type profiles (C)? In what space?
    • How do I measure if I have a good fit?

TME deconvolution: first generation

First-generation methods for tumor microenvironment (TME) deconvolution generally used expression profiles of FACS-sorted cells as cell type profiles, often from PBMCs. References are almost always pre-computed.

TME deconvolution: examples of first generation methods

  • CIBERSORT (support vector regression)

  • MCPcounter (marker mean expression)

  • xCell (corrected ssGSEA)

  • EPIC and quanTIseq (constrained least squares)

Some methods don’t perform deconvolution per se: they return scores rather than proportions, so only inter-sample comparisons are valid

These methods don’t assume a complete reference: they only deconvolute cell types of the TME
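To illustrate why a marker-based score is not a proportion, here is a sketch of a marker mean score in base R. The expression values are simulated and the marker sets are small illustrative examples, not the signatures used by MCPcounter or any other method.

```r
set.seed(2)
genes <- c("CD3E", "CD2", "CD19", "MS4A1", "CD14", "LYZ")

# Simulated log-expression matrix (genes x samples)
expr <- matrix(rnorm(length(genes) * 4, mean = 5), nrow = length(genes),
               dimnames = list(genes, paste0("sample", 1:4)))

# Illustrative marker sets (assumption, not a published signature)
markers <- list(T_cell   = c("CD3E", "CD2"),
                B_cell   = c("CD19", "MS4A1"),
                Monocyte = c("CD14", "LYZ"))

# Score = mean log-expression of the markers, per sample
scores <- t(sapply(markers, function(g) colMeans(expr[g, , drop = FALSE])))
round(scores, 2)
```

Each row can be compared across samples, but the columns do not sum to one, so the scores cannot be read as fractions within a sample.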

Cell type deconvolution: second-generation

Second-generation methods use a (user-provided) single-cell reference from the same biological context as the samples to be deconvoluted

The single cell reference: atlases

  • Tabula Muris

  • Tabula Sapiens

  • Human Lung Atlas

  • Multiple published single-cell atlases of different tissues or pathologies

Major problem: cell-type annotation

Example: Colon Atlas (Pelka et al., 2021)

Coarse-grained annotation

Fine-grained annotation of compartments

Cell type resolution: can we separate them?

  • Deconvolution methods are often robust when using coarse-grained annotation

  • Deconvolution often fails at separating cell types defined by ‘state’ (e.g. CD4+ from CD8+ T cells, naive from mature B cells)

  • What is the appropriate “level” of annotation that allows for deconvolution?
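One way to see why closely related populations are hard to separate: their reference profiles are nearly collinear, which makes the regression problem ill-conditioned. The sketch below uses simulated profiles (all names and noise levels are assumptions) and compares the condition number of a coarse and a fine reference matrix.

```r
set.seed(3)
n_genes <- 200

# Simulated lineage profiles
t_cell <- rexp(n_genes)
b_cell <- rexp(n_genes)
mono   <- rexp(n_genes)

# Two "states" of the same lineage: small perturbations of one profile
cd4 <- t_cell * exp(rnorm(n_genes, sd = 0.05))
cd8 <- t_cell * exp(rnorm(n_genes, sd = 0.05))

coarse <- cbind(T_cell = t_cell, B_cell = b_cell, Monocyte = mono)
fine   <- cbind(CD4_T = cd4, CD8_T = cd8, B_cell = b_cell, Monocyte = mono)

# Condition number = largest / smallest singular value
cond <- function(m) { d <- svd(m)$d; max(d) / min(d) }

cond_coarse <- cond(coarse)
cond_fine   <- cond(fine)
round(cor(fine), 2)
c(coarse = cond_coarse, fine = cond_fine)
```

A larger condition number means that small amounts of noise in the mixture can move weight freely between the near-identical columns.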

deconverse: a meta-method package with benchmarking built-in

Deconvolution methods in deconverse

deconvolution_methods()
         OLS         DWLS          SVR   CIBERSORTx        MuSiC   BayesPrism 
       "ols"       "dwls"        "svr" "cibersortx"      "music" "bayesprism" 
      Bisque    AutoGeneS       scaden         CARD         RCTD    SPOTlight 
    "bisque"  "autogenes"     "scaden"       "card"       "rctd"  "spotlight" 
spatial_only_methods()
       CARD        RCTD   SPOTlight 
     "card"      "rctd" "spotlight" 

deconverse ideas

Scientific

  • Support for multiple levels of annotation at the same time

  • Correction of finer-grained annotation by the coarser-grained one

  • Aid users in detecting what level of annotation is appropriate through benchmarking

Technical

  • Any method: same syntax

  • Run multiple methods with one command

  • A general framework: adding new methods is easy

deconverse syntax: screference

Single-cell (hierarchical) reference

pbmc_ref <- new_hscreference(pbmc_train,
                annot_ids = c("Cell_major_identities", "Cell_minor_identities"),
                project_name = "pbmc_example",
                batch_id = "orig.ident")
pbmc_ref <- pbmc_ref |>
    compute_reference("dwls") |>
    compute_reference("autogenes")

deconverse syntax: deconvolute

deconv_res <- deconvolute_all(gexp, pbmc_ref,
                              methods = c("dwls", "ols", "svr"))

deconverse syntax: scbench

pbmc_bench <- new_scbench(pbmc_test, 
                         annot_ids = c("Cell_major_identities",
                                       "Cell_minor_identities"),
                         project_name = "pbmc_example",
                         batch_id = "orig.ident")

Generate “mixtures” for each benchmarking test (bounds can be given)

pbmc_bench <- pbmc_bench |>
    mixtures_population(nsamp = 500) |>
    mixtures_lod() |>
    mixtures_spillover()

deconverse syntax: scbench

Create pseudobulk samples from the single-cell profiles in pbmc_test, then deconvolute them

pbmc_bench <- pseudobulks(pbmc_bench, ncells = 1000)
pbmc_bench <- deconvolute_all(pbmc_bench, pbmc_ref,
                              methods = c("dwls", "svr", "ols", 
                                          "autogenes", "bisque"))
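For intuition, a pseudobulk with known composition can be built by sampling cells according to target proportions and summing their counts. The base-R sketch below shows the idea on simulated counts; `make_pseudobulk` is a hypothetical helper written for this example and is not the deconverse implementation.

```r
set.seed(4)
n_genes <- 100
cell_type <- rep(c("T_cell", "B_cell", "Monocyte"), each = 200)

# Simulated single-cell count matrix (genes x cells)
counts <- matrix(rpois(n_genes * length(cell_type), lambda = 2), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("cell", seq_along(cell_type))))

# Hypothetical helper: sample cells per type (with replacement) and sum counts
make_pseudobulk <- function(counts, cell_type, props, ncells = 1000) {
  n_per_type <- round(props * ncells)
  picked <- unlist(lapply(names(props), function(ct) {
    pool <- which(cell_type == ct)
    pool[sample.int(length(pool), n_per_type[[ct]], replace = TRUE)]
  }))
  list(bulk  = rowSums(counts[, picked, drop = FALSE]),
       props = n_per_type / sum(n_per_type))  # realised proportions
}

pb <- make_pseudobulk(counts, cell_type,
                      props = c(T_cell = 0.5, B_cell = 0.3, Monocyte = 0.2))
head(pb$bulk)
pb$props
```

Because the composition of each pseudobulk is known, any method's estimates can be scored against `pb$props`.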

deconverse benchmarking results: population

plt_cors_scatter(pbmc_bench, method = "dwls")
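A sketch of the kind of summary behind such a plot, assuming the benchmark compares true and estimated proportions per cell type. Both matrices are simulated here (the "estimates" are the truth plus noise), and correlation plus RMSE are an illustrative choice of metrics, not necessarily what deconverse reports.

```r
set.seed(5)
n_samp <- 100
cell_types <- c("T_cell", "B_cell", "Monocyte")

# Simulated true proportions: each row sums to 1
raw <- matrix(rexp(n_samp * length(cell_types)), nrow = n_samp,
              dimnames = list(NULL, cell_types))
truth <- raw / rowSums(raw)

# Stand-in estimates: truth plus noise, clipped at 0 and renormalised
est <- pmax(truth + rnorm(length(truth), sd = 0.05), 0)
est <- est / rowSums(est)

# Per-cell-type agreement between truth and estimate
cor_by_type  <- sapply(cell_types, function(ct) cor(truth[, ct], est[, ct]))
rmse_by_type <- sqrt(colMeans((truth - est)^2))
round(rbind(cor = cor_by_type, rmse = rmse_by_type), 3)
```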

deconverse benchmarking results: compare between populations

plt_cor_heatmap(pbmc_bench, level = "l2")$heatmap

deconverse benchmarking results: spillover

plt_spillover_heatmap(pbmc_bench)$heatmap
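The idea of a spillover test can be sketched as follows: deconvolute "pure" samples of a single cell type and record how much weight leaks into the other types. Everything below is simulated, and the estimator is a simple clipped least squares used for illustration, not one of the methods in deconverse.

```r
set.seed(6)
n_genes <- 200
cell_types <- c("CD4_T", "CD8_T", "B_cell")

# Simulated reference: two related T cell profiles and one B cell profile
base_T <- rexp(n_genes)
C <- cbind(CD4_T  = base_T * exp(rnorm(n_genes, sd = 0.2)),
           CD8_T  = base_T * exp(rnorm(n_genes, sd = 0.2)),
           B_cell = rexp(n_genes))

# Illustrative estimator: least squares, clip negatives, renormalise
estimate_w <- function(C, mixture) {
  w <- pmax(qr.solve(C, mixture), 0)
  w / sum(w)
}

# Rows: the pure cell type that was deconvoluted; columns: estimated fractions
spill <- t(sapply(cell_types, function(ct) {
  pure <- C[, ct] * exp(rnorm(n_genes, sd = 0.2))  # noisy pure sample
  estimate_w(C, pure)
}))
colnames(spill) <- cell_types
round(spill, 2)
```

Off-diagonal values are spillover: weight assigned to a cell type that is absent from the sample.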

deconverse benchmarking results: limit of detection

plt_lod_heatmap(pbmc_bench)$heatmap

Some details: Deconvolution methods in deconverse

  • Ordinary Least Squares (OLS), Support Vector Regression (SVR) and Dampened Weighted Least Squares (DWLS) use the same reference cell-marker matrix, derived from Seurat::FindMarkers

  • CIBERSORTx runs in a Docker container

  • MuSiC and DWLS were reimplemented for performance

  • Python methods run through reticulate: AutoGeneS and scaden

Spatial deconvolution methods to be added to deconverse 0.3

Same syntax, any method:

scref <- new_screference(kidney_so,
                         annot_id = c("compartment"),
                         project_name = "kidney",
                         batch_id = "donor")
scref <- compute_reference(scref, method = "card")
spatial_obj <- deconvolute(spatial_obj, scref, method = "rctd")

Example of spatial deconvolution results

SpatialFeaturePlot(spatial_obj,
                   features = deconverse_results(spatial_obj, method = "rctd")[[1]],
                   pt.size.factor = 1.3)

Example of spatial deconvolution results

SpatialDimPlot(spatial_obj, 
               group.by = deconverse_results(spatial_obj, method = "rctd", major_population = TRUE)[1], 
               pt.size.factor = 1.3)

New methodology for spatial deconvolution?

Not all current “spatial-specific” methods use spatial information for deconvolution:

  • Use spatial information: CARD, cell2location

  • Don’t use: SPOTlight, RCTD, DestVI

Graph-based? e.g. graph-NMF followed by non-negative least squares (NNLS)

Thanks!

Email 📧 clarice.groeneveld@inserm.fr

Github 😺 csgroen

BlueSky (bye-bye X) 🟦 csgroen

Try Deconverse: github.com/csgroen/deconverse

Presentation available at: csgroen.github.io/posts/deconverse_bioinfoclub