In contrast to point pattern based methods, we can view the location of cells or spots as a fixed lattice and measure the corresponding marker expression at each location. HTS-based spatial transcriptomic technologies often produce data on a regular lattice (i.e., approximately evenly spaced spots or beads of uniform size and shape), whereas imaging-based technologies yield irregular lattice structures (i.e., with variable cell sizes and shapes, and non-uniform spacing).
import numpy as npimport scanpy as scimport squidpy as sqfrom esda.moran import Moran, Moran_Local, Moran_Local_BVfrom esda.join_counts import Join_Countsfrom libpysal.weights import Wimport pandas as pdimport matplotlib.pyplot as pltfrom matplotlib.colors import ListedColormap, LinearSegmentedColormapimport warningsfrom scipy.stats import false_discovery_controlwarnings.filterwarnings("ignore")df_cmap_continuous = pd.read_csv("../misc/roma_colors.csv", index_col=0)cmap_continuous = LinearSegmentedColormap.from_list("roma", list(df_cmap_continuous["roma_colors"])).reversed()
Setup and Preprocessing
We will load a dataset by (McKellar et al. 2021) using the Visium technology (Ståhl et al. 2016). The data shows a sample taken from the tibialis anterior muscle of a mouse.
# Load the datasetsfe <- SFEData::McKellarMuscleData(dataset ="full")# Take spots that are covered with tissuesfe_tissue <- sfe[, colData(sfe)$in_tissue]# Filter out genes with no countssfe_tissue <- sfe_tissue[rowSums(counts(sfe_tissue)) >0, ]# Convert counts log-transformed normalized expression valuessfe_tissue <- scater::logNormCounts(sfe_tissue)ann <- zellkonverter::SCE2AnnData(sfe_tissue)# Define the directory and file pathsdir_path <-"../data"# Check if the directory exists, and create it if it doesn'tif (!dir.exists(dir_path)) {dir.create(dir_path)}anndata::write_h5ad(ann, "../data/McKellarMuscleData_processed.h5ad")features <-c("Myh2", "Calr")
Show the code
adata = sc.read_h5ad("../data/McKellarMuscleData_processed.h5ad")adata.obsm['spatial'] = np.array(list(zip((adata.obs['x'] *-1), adata.obs['y'])))adata.var_names = adata.var["symbol"]adata.raw = adata.copy()adata.X = adata.layers["logcounts"].astype(np.float64) # necessary for Local Moran to work
Lattice data refers to spatial data collected at locations arranged in a regular or irregular grid (lattice). Each location has a defined spatial unit, and the sampling locations are fixed rather than random. This approach contrasts with point pattern analysis, where we assume that the locations were generated by a stochastic process (Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023; Baddeley, Rubak, and Turner 2015).
As can be seen below, the Visium dataset is a regular lattice. The color shows the number of counts detected in each Visium spot. In contrast, the outlines of individual cells after segmentation (e.g., from a higher resolution spatial omics assay) could be seen as an irregular lattice. As this dataset also contains (manual) segmentations of myofibers (muscle cells stored as myofiber_simplified), we will illustrate calculations based on irregular lattices of the myofiber segmentations.
# A plot using the plotSpatialFeature from Voyagerp <-plotSpatialFeature(sfe_tissue,"nCounts",annotGeometryName ="myofiber_simplified")# This extracts the segmented cellscells <-annotGeometry(sfe_tissue, "myofiber_simplified") |>st_geometry()# We can also use ggplot and geom_sf to plot sf objectsq <-ggplot() +geom_sf(data = cells, fill =NA) +theme_void()# Using `patchwork` to combine the plotsp | q
In lattice data analysis, a key concept is the spatial weight matrix, which models the spatial relationships between units in the lattice (i.e., spots or cells). Various methods exist for constructing this matrix, such as contiguity-based (direct neighbors), graph-based, or distance-based methods. (Getis 2009; Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023). The documentation of the package spdep gives an overview of the different methods.
For Visium, the most straightforward way is to take the direct neighbours of each spot. This is done using the function findVisiumGraph.
In an irregular lattice, the task of finding a spatial weight matrix is more complex, as different options exist. One option is to base the neighbourhood graph on neighbours that are in direct contact with each other (contiguous), as implemented in the poly2nb method.
Show the code
annotGraph(sfe_tissue, "myofiber_poly2nb") <-findSpatialNeighbors(sfe_tissue,type ="myofiber_simplified",MARGIN =3,method ="poly2nb", # wraps spdep function with same namezero.policy =TRUE )
Alternatively, we could take the five nearest neighbours of each cell.
Show the code
annotGraph(sfe_tissue, "knn5") <-findSpatialNeighbors(sfe_tissue,type ="myofiber_simplified",MARGIN =3, # to use the annotation geometrymethod ="knearneigh", # wraps the spdep function with the same namek =5,zero.policy =TRUE )
As we can see below, the graphs look quite distinct. On the left side, in the contiguous neigbhbour graph (neighbours in direct contact), we notice the formation of patches, while in the \(k\)NN graph isolated patches are interconnected.
Show the code
p1 + p2
Spatial autocorrelation
Spatial autocorrelation measures the association between spatial units while considering that these spatial units are not independent measurements. These measures can be global (summarizing the entire field) or local (providing statistics for individual locations) (Pebesma and Bivand 2023).
Global Measures
Global methods give us an overview over the entire field-of-view and summarize the spatial autocorrelation metric to a single value. The metrics are a function of the weight matrix and the variables of interest. The variables of interest can be e.g. gene expression, intensity of a marker or the area of a cell. The global measures can be seen as a weighted average of the local metric, as explained below.
In general, a global spatial autocorrelation measure has the form of a double sum over all locations \(i,j\)
\[\sum_i \sum_j f(x_i,x_j) w_{ij}\]
where \(f(x_i,x_j)\) is the measure of association between features of interest and \(w_{ij}\) scales the relationship by a spatial weight as defined in the weight matrix \(W\)(Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023).
Moran’s \(I\)
Moran’s \(I\) is the most prominent measure of spatial autocorrelation. The values are bounded by \(-1\) and \(1\). The expected value is close to \(0\) for large \(n\), the exact value is given by \(\mathbb{E}(I) = -1/(n-1)\). A value higher than the expected value indicates spatial autocorrelation. Negative values indicate negative autocorrelation. Spatial autocorrelation means that similar values tend to be found together in the tissue. In fact, Moran’s \(I\) can be interpreted as the Pearson correlation between the value at location \(i\) and the averages value of the neigbours of \(i\), (neighbours as defined in the weight matrix \(W\)) (Moran 1950; Cliff and Ord 1981, 21).
In the first example we will calculate Moran’s \(I\) for the number of counts and genes measured in the Visium dataset. First, we have a look at the distribution by eye.
As we can see, the Moran’s \(I\) values both indicate significant positive spatial autocorrelation. Myh2 shows a stronger correlation according to Moran’s \(I\) than Calr.
Local Measures for Univariate Data
Often, a global measure is not enough. One number determining e.g. the spatial autocorrelation over an entire tissue slice might not be reflective of tissue heterogeneity. Therefore, local indicators of spatial associations have been developed (Pebesma and Bivand 2023; Anselin 1995).
Local Moran’s \(I\)
Local Moran’s \(I\) provides a measure of spatial autocorrelation at each location, highlighting local clusters of similarity or dissimilarity (Anselin 1995; Pebesma and Bivand 2023). It is defined as:
where the index \(i\) refers to the location for which the measure is calculated. The interpretation is analogous to global Moran’s \(I\) where a value of \(I_i\) higher than \(\mathbb{E}(I_i) = -w_i/(n-1)\) indicates spatial autocorrelation; smaller values indicate negative autocorrelation (Anselin 1995). It is important to note that, as for the global counterpart, the value of local Moran’s \(I\) could be a result from both the high or low end of the values. Here we will calculate the local Moran’s \(I\) value for the measurement of muscle fiber marker gene Myh2:
In the local version of Moran’s \(I\), the interpretation is the same as the global version. When interpreting local autocorrelation measures, it is important to consider both the effect size estimates and the significance level. Since the significance level is calculated for each spot separately, it is recommended to adjust for multiple testing. By default, runUnivariate uses the Benjamini & Hochberg correction (Benjamini and Hochberg 1995). The local Moran’s \(I\) statistics reveal locations in the tissue that have similar values to their neighbours (c.f., the lower part of the tissue).
Multivariate measures – Bivariate Moran’s \(I\)
There exists a bivariate version of Moran’s \(I\) as well.
For two continuous observations the global bivariate Moran’s \(I\) is defined as (Wartenberg 1985; Bivand 2022)
where \(a_i\) and \(b_i\) are the two variables of interest and \(w_{ij}\) is the value of the spatial weights matrix for positions \(i\) and \(j\). It is a measure of the correlation of the variables \(a\) with the average of the neighboring values for variable \(b\) (also called the spatial lag of \(b\)).
There exists a local version that we will explore here. Note that the results are not symmetric, but very similar to each other.
sfe_tissue <-runBivariate(sfe_tissue, "localmoran_bv", # wraps the method from spdepc("Myh1", "Myh2"),swap_rownames ="symbol", nsim =100)# this command gives all results of localmoran_bvlocalResultFeatures(sfe_tissue, "localmoran_bv")
Please note that the result might overestimate the spatial autocorrelation of the variables due to the inherent (non-spatial) correlation of \(x\) and \(y\)(Bivand 2022).
The package spatialDM(Li et al. 2023) uses an adapted version of bivariate Moran’s \(I\) to identify ligand-receptor pairs.
Impact of neighbourhood on autocorrelation measures
Let’s compare the impact of different spatial weight matrices on local autocorrelation analysis. In this Visium dataset we do not have gene expression information for each cell because of the resolution of the spots. We will instead calculate the autocorrelation measures on the area of the segmented cells. This could be helpful when looking at local cell densities in a tissue.
First, we base the spatial weight matrix on contiguous neighbours.
As an alternative, we also base the spatial weight matrix on the 5 and 10 nearest neighbours.
Show the code
sfe_tissue <-annotGeometryUnivariate(sfe_tissue, "localmoran", "area",annotGeometryName ="myofiber_simplified",annotGraphName ="knn5",include_self =FALSE, zero.policy =TRUE,name ="knn5")pKnn5 <-plotLocalResult(sfe_tissue, "knn5", "area",annotGeometryName ="myofiber_simplified",annotGraphName ="knn5",divergent =TRUE, diverge_center =0) +labs(title ="Local Moran's I (li)\n5 Nearest Neighbours", fill ="li(area)")annotGraph(sfe_tissue, "knn10") <-findSpatialNeighbors(sfe_tissue,type ="myofiber_simplified",MARGIN =3, # to use the annotation geometrymethod ="knearneigh", # wraps the spdep function with the same namek =10,zero.policy =TRUE )sfe_tissue <-annotGeometryUnivariate(sfe_tissue, "localmoran", "area",annotGeometryName ="myofiber_simplified",annotGraphName ="knn10",include_self =FALSE, zero.policy =TRUE,name ="knn10")pKnn10 <-plotLocalResult(sfe_tissue, "knn10", "area",annotGeometryName ="myofiber_simplified",annotGraphName ="knn10",divergent =TRUE, diverge_center =0) +labs(title ="Local Moran's I (li)\n10 Nearest Neighbours", fill ="li(area)")pPoly + pKnn5 + pKnn10
The overall pattern of spatial autocorrelation is similar across the different weight matrices. However, locally there are some differences, as well as in the scale of the autocorrelation (note that the same color in the different plots does not correspond to the same value between the plots). There is a trend of smoothing of the value, the larger the neighbourhood in consideration.
Measures for categorical data
There also exist methods to detect spatial autocorrelation of categorical variables. As we have no categorical measurements in this dataset, we perform non-spatial clustering using Leiden clustering on the PCA space (Traag, Waltman, and Van Eck 2019).
library(BiocNeighbors)library(BiocSingular)set.seed(123)# Run PCA on the samplesfe <-runPCA(sfe_tissue, exprs_values ="logcounts", ncomponents =50, BSPARAM =IrlbaParam())# Cluster based on first 20 PC's and using leidencolData(sfe_tissue)$cluster <-clusterRows(reducedDim(sfe, "PCA")[,1:10],BLUSPARAM =KNNGraphParam(k =20,BNPARAM=AnnoyParam(ntrees=50),cluster.fun ="leiden",cluster.args =list(resolution =0.3,objective_function ="modularity")))
Show the code
np.random.seed(123)#compute a PCA on the sc.pp.pca(adata, n_comps =50, zero_center =True, svd_solver ="arpack")#compute the neighbourssc.pp.neighbors(adata, use_rep ="X_pca", knn =True, n_pcs =10)#compute leiden clusteringsc.tl.leiden(adata, resolution =0.3, flavor ="igraph", objective_function ="modularity")
We can visually inspect the spatial arrangement of the clusters. We see that they are spatially “clustered”.
Metrics such as the join count statistics can be use to analysis spatial arrangements of categorical data. In essence, the join count statistic calculates the frequency of categories among neighbours and compares this value with a theoretical distribution or permutations of the labels to get a significance score (Cliff and Ord 1981).
Here we will use an extension to multiple categories joincount.multi. The test gives information if the different clusters are in contact with itself and each other. The output shows the calculated number of contacts vs. the expected number, i.e., if the clusters were randomly assigned to the locations. The function does not report \(p\)-values, but they can be calculated at a specified significance level using the \(z\)-values.
For example, we can see here that the calculated number of contacts of clusters \(1\) and \(3\) is far below the expected value. On the other hand all clusters are in contact with itself more often than expected at random.
Therefore, in Python we add the lower triangular matrix to the upper triangle (without the diagonal) and divide the resulting interaction matrix by 2 as the interactions in R are undirected.
Summary and Considerations
Lattice data refers to spatial data collected at fixed locations arranged in regular or irregular grids, contrasting with stochastic point pattern analysis.
Regular lattices have uniform, evenly spaced spots, while irregular lattices have variable sizes and shapes with non-uniform spacing.
The spatial weight matrix models spatial relationships between lattice units. Different methods exist to define the spatial weight matrix.
Global spatial autocorrelation measures, like Moran’s \(I\), summarize spatial correlation over the entire field, while local measures identify local clusters of similarity or dissimilarity.
There exist measures of spatial autocorrelation for continuous and categorical data.
Baddeley, Adrian, Ege Rubak, and Rolf Turner. 2015. Spatial Point Patterns. 1st ed. CRC Interdisciplinary Statistics Series. CRC Press, Taylor & Francis Group.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.”Journal of the Royal Statistical Society: Series B (Methodological) 57 (1): 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
Bivand, Roger. 2022. “R Packages for Analyzing Spatial Data: A Comparative Case Study with Areal Data.”Geographical Analysis 54 (3): 488–518. https://doi.org/10.1111/gean.12319.
Cliff, Andrew David, and J Keith Ord. 1981. Spatial Processes: Models & Applications. Pion, London.
Li, Zhuoxuan, Tianjie Wang, Pentao Liu, and Yuanhua Huang. 2023. “SpatialDM for Rapid Identification of Spatially Co-Expressed Ligand–Receptor and Revealing Cell–Cell Communication Patterns.”Nature Communications 14 (1): 3995. https://doi.org/10.1038/s41467-023-39608-w.
McKellar, David W., Lauren D. Walter, Leo T. Song, Madhav Mantri, Michael F. Z. Wang, Iwijn De Vlaminck, and Benjamin D. Cosgrove. 2021. “Large-Scale Integration of Single-Cell Transcriptomic Data Captures Transitional Progenitor States in Mouse Skeletal Muscle Regeneration.”Communications Biology 4 (1): 1–12. https://doi.org/10.1038/s42003-021-02810-x.
Moran, P. A. P. 1950. “Notes on Continuous Stochastic Phenomena.”Biometrika 37 (1/2): 17–23. https://doi.org/10.2307/2332142.
Moses, Lambda, Pétur Helgi Einarsson, Kayla C. Jackson, Laura Luebbert, Ali Sina Booeshaghi, Sindri Emmanúel Antonsson, Nicolas Bray, Páll Melsted, and Lior Pachter. 2023. “Voyager: Exploratory Single-Cell Genomics Data Analysis with Geospatial Statistics.” bioRxiv. https://doi.org/10.1101/2023.07.20.549945.
Palla, Giovanni, Hannah Spitzer, Michal Klein, David Fischer, Anna Christina Schaar, Louis Benedikt Kuemmerle, Sergei Rybakov, et al. 2022. “Squidpy: A Scalable Framework for Spatial Omics Analysis.”Nature Methods 19 (2): 171–78. https://doi.org/10.1038/s41592-021-01358-2.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in R. 1st ed. New York: Chapman and Hall/CRC. https://doi.org/10.1201/9780429459016.
Rey, Sergio J., and Luc Anselin. 2010. “PySAL: A Python Library of Spatial Analytical Methods.” In Handbook of Applied Spatial Analysis, 175–93. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03647-7_11.
Righelli, Dario, Lukas M Weber, Helena L Crowell, Brenda Pardo, Leonardo Collado-Torres, Shila Ghazanfar, Aaron T L Lun, Stephanie C Hicks, and Davide Risso. 2022. “SpatialExperiment: Infrastructure for Spatially-Resolved Transcriptomics Data in R Using Bioconductor.”Bioinformatics 38 (11): 3128–31. https://doi.org/10.1093/bioinformatics/btac299.
Ståhl, Patrik L., Fredrik Salmén, Sanja Vickovic, Anna Lundmark, José Fernández Navarro, Jens Magnusson, Stefania Giacomello, et al. 2016. “Visualization and Analysis of Gene Expression in Tissue Sections by Spatial Transcriptomics.”Science 353 (6294): 78–82. https://doi.org/10.1126/science.aaf2403.
Traag, Vincent A, Ludo Waltman, and Nees Jan Van Eck. 2019. “From Louvain to Leiden: Guaranteeing Well-Connected Communities.”Scientific Reports 9 (1): 1–12. https://doi.org/10.1038/s41598-019-41695-z.
Virshup, Isaac, Sergei Rybakov, Fabian J Theis, Philipp Angerer, and F Alexander Wolf. 2024. “Anndata: Access and Store Annotated Data Matrices.”Journal of Open Source Software 9 (101): 4371. https://doi.org/10.21105/joss.04371.
Zuur, Alain F., Elena N. Ieno, and Graham M. Smith. 2007. Analysing Ecological Data. Statistics for Biology and Health. New York: Springer.
Source Code
# Lattice data analysis -- SummaryIn this vignette we will show:- An overview of lattice data analysis methods for HTS-based data.- This includes global metrics on the entire field of view and local variants thereof. - The use case is a 10x Visium data set from @mckellarLargescaleIntegrationSinglecell2021.- This overview is in parts based on the corresponding [Voyager 10x Visium vignette](https://pachterlab.github.io/voyager/articles/vig2_visium.html).# IntroductionIn contrast to point pattern based methods, we can view the location of cells or spots as a fixed lattice and measure the corresponding marker expression at each location. HTS-based spatial transcriptomic technologies often produce data on a regular lattice (i.e., approximately evenly spaced spots or beads of uniform size and shape), whereas imaging-based technologies yield irregular lattice structures (i.e., with variable cell sizes and shapes, and non-uniform spacing).For the representation of the cells, we will rely in `R` on the `r BiocStyle::Biocpkg('SpatialFeatureExperiment')` package (see below for more details). For preprocessing of the dataset and code examples, we refer the reader to this [10x Visium vignette](https://pachterlab.github.io/voyager/articles/vig2_visium.html) of the `r BiocStyle::Biocpkg('Voyager')` package [@mosesVoyagerExploratorySinglecell2023]. The `r BiocStyle::Biocpkg('Voyager')` package also provides wrapper functions around the package `r BiocStyle::CRANpkg('spdep')`[@pebesmaSpatialDataScience2023] that work directly on the `r BiocStyle::Biocpkg('SpatialFeatureExperiment')` object. The package `r BiocStyle::CRANpkg('spdep')` is designed for the analysis of spatial data with lattice structure.For the implementation in `Python` we will rely on the the packages [esda](https://pysal.org/esda/), [pysal](https://pysal.org/) and [squidpy](https://squidpy.readthedocs.io/en/stable/)[@reyPySALPythonLibrary2010; @pallaSquidpyScalableFramework2022]. For the data representation we rely on the [anndata](https://anndata.readthedocs.io/en/stable/) structure [@virshup2024anndata].::: {.panel-tabset}## `R````{r}suppressPackageStartupMessages({library(dplyr)library(ggplot2)library(patchwork)library(Voyager)library(SpatialFeatureExperiment)library(SFEData)library(spdep)library(sf)library(stringr)library(tidyr)library(magrittr)library(scater)library(bluster)library(scico)})theme_set(theme_light())roma_colors <-data.frame(roma_colors = scico::scico(256, palette ='roma'))write.csv(roma_colors , "../misc/roma_colors.csv")source('utils.R')```## `Python````{python}import numpy as npimport scanpy as scimport squidpy as sqfrom esda.moran import Moran, Moran_Local, Moran_Local_BVfrom esda.join_counts import Join_Countsfrom libpysal.weights import Wimport pandas as pdimport matplotlib.pyplot as pltfrom matplotlib.colors import ListedColormap, LinearSegmentedColormapimport warningsfrom scipy.stats import false_discovery_controlwarnings.filterwarnings("ignore")df_cmap_continuous = pd.read_csv("../misc/roma_colors.csv", index_col=0)cmap_continuous = LinearSegmentedColormap.from_list("roma", list(df_cmap_continuous["roma_colors"])).reversed()```::: ## Setup and PreprocessingWe will load a dataset by [@mckellarLargescaleIntegrationSinglecell2021] using the Visium technology [@stahlVisualizationAnalysisGene2016]. The data shows a sample taken from the tibialis anterior muscle of a mouse.::: {.panel-tabset}## `R````{r}# Load the datasetsfe <- SFEData::McKellarMuscleData(dataset ="full")# Take spots that are covered with tissuesfe_tissue <- sfe[, colData(sfe)$in_tissue]# Filter out genes with no countssfe_tissue <- sfe_tissue[rowSums(counts(sfe_tissue)) >0, ]# Convert counts log-transformed normalized expression valuessfe_tissue <- scater::logNormCounts(sfe_tissue)ann <- zellkonverter::SCE2AnnData(sfe_tissue)# Define the directory and file pathsdir_path <-"../data"# Check if the directory exists, and create it if it doesn'tif (!dir.exists(dir_path)) {dir.create(dir_path)}anndata::write_h5ad(ann, "../data/McKellarMuscleData_processed.h5ad")features <-c("Myh2", "Calr")```## `Python````{python}adata = sc.read_h5ad("../data/McKellarMuscleData_processed.h5ad")adata.obsm['spatial'] = np.array(list(zip((adata.obs['x'] *-1), adata.obs['y'])))adata.var_names = adata.var["symbol"]adata.raw = adata.copy()adata.X = adata.layers["logcounts"].astype(np.float64) # necessary for Local Moran to work```::: `SpatialFeatureExperiment`[@mosesVoyagerExploratorySinglecell2023] objects are an extension of the `r BiocStyle::Biocpkg('SpatialExperiment')` object [@righelliSpatialExperimentInfrastructureSpatiallyresolved2022]. It additionally contains geometric annotations that are encoded as simple features of the `r BiocStyle::CRANpkg('sf')` library [@pebesmaSpatialDataScience2023].For `Python` we use the `anndata` object [@virshup2024anndata].::: {.panel-tabset}## `R` ```{r}sfe_tissue```## `Python````{python}adata```::: For example, the spots of the Visium dataset are stored as a simple feature collection.```{r}colGeometry(sfe_tissue, "spotPoly") |>head()```# Lattice Data## DefinitionLattice data refers to spatial data collected at locations arranged in a regular or irregular grid (lattice). Each location has a defined spatial unit, and the sampling locations are fixed rather than random. This approach contrasts with point pattern analysis, where we assume that the locations were generated by a stochastic process [@zuurAnalysingEcologicalData2007; @pebesmaSpatialDataScience2023; @baddeleySpatialPointPatterns2015].As can be seen below, the Visium dataset is a regular lattice. The color shows the number of counts detected in each Visium spot. In contrast, the outlines of individual cells after segmentation (e.g., from a higher resolution spatial omics assay) could be seen as an irregular lattice. As this dataset also contains (manual) segmentations of myofibers (muscle cells stored as `myofiber_simplified`), we will illustrate calculations based on irregular lattices of the myofiber segmentations.::: {.panel-tabset}## `R` ```{r}# A plot using the plotSpatialFeature from Voyagerp <-plotSpatialFeature(sfe_tissue,"nCounts",annotGeometryName ="myofiber_simplified")# This extracts the segmented cellscells <-annotGeometry(sfe_tissue, "myofiber_simplified") |>st_geometry()# We can also use ggplot and geom_sf to plot sf objectsq <-ggplot() +geom_sf(data = cells, fill =NA) +theme_void()# Using `patchwork` to combine the plotsp | q```## `Python````{python}sq.pl.spatial_scatter(adata, color="nCounts", cmap="Blues", shape=None, library_id="spatial")plt.gca().set_axis_off()```:::## Spatial weight matrixIn lattice data analysis, a key concept is the *spatial weight matrix*, which models the spatial relationships between units in the lattice (i.e., spots or cells). Various methods exist for constructing this matrix, such as contiguity-based (direct neighbors), graph-based, or distance-based methods. [@getisSpatialWeightsMatrices2009; @zuurAnalysingEcologicalData2007; @pebesmaSpatialDataScience2023]. The documentation of the package `spdep` gives an [overview of the different methods](https://r-spatial.github.io/spdep/articles/nb.html).For Visium, the most straightforward way is to take the direct neighbours of each spot. This is done using the function `findVisiumGraph`.::: {.panel-tabset}## `R````{r}colGraph(sfe_tissue, "visium") <-findVisiumGraph(sfe_tissue)colGraph(sfe_tissue, "binary") <-findVisiumGraph(sfe_tissue, style ="B")``````{r}plotColGraph(sfe_tissue,colGraphName ="visium",colGeometryName ="spotPoly") +theme_void()```## `Python````{python}sq.gr.spatial_neighbors(adata, n_neighs=6, coord_type="grid")spatial_weights = W.from_sparse(adata.obsp['spatial_connectivities'])fig, ax = plt.subplots(1, 1, figsize=(4, 8))sq.pl.spatial_scatter( adata[adata.obsp["spatial_connectivities"].nonzero()[0], :], connectivity_key="spatial_connectivities", size=0.25, na_color="black", edges_color="black", edges_width=0.25, shape=None, library_id="spatial", ax=ax, fig=fig,)ax.set_axis_off()fig.suptitle("Connectivity grid", size=10, y=0.8)fig.set_dpi(200)```:::In an irregular lattice, the task of finding a spatial weight matrix is more complex, as different options exist. One option is to base the neighbourhood graph on neighbours that are in direct contact with each other (contiguous), as implemented in the `poly2nb` method.```{r}annotGraph(sfe_tissue, "myofiber_poly2nb") <-findSpatialNeighbors(sfe_tissue,type ="myofiber_simplified",MARGIN =3,method ="poly2nb", # wraps spdep function with same namezero.policy =TRUE )``````{r}p1 <-plotAnnotGraph(sfe_tissue,annotGraphName ="myofiber_poly2nb",annotGeometryName ="myofiber_simplified") +theme_void()```Alternatively, we could take the five nearest neighbours of each cell.```{r}annotGraph(sfe_tissue, "knn5") <-findSpatialNeighbors(sfe_tissue,type ="myofiber_simplified",MARGIN =3, # to use the annotation geometrymethod ="knearneigh", # wraps the spdep function with the same namek =5,zero.policy =TRUE )``````{r}p2 <-plotAnnotGraph(sfe_tissue,annotGraphName ="knn5",annotGeometryName ="myofiber_simplified") +theme_void()```As we can see below, the graphs look quite distinct. On the left side, in the contiguous neigbhbour graph (neighbours in direct contact), we notice the formation of patches, while in the $k$NN graph isolated patches are interconnected.```{r}p1 + p2```# Spatial autocorrelationSpatial autocorrelation measures the association between spatial units while considering that these spatial units are not independent measurements. These measures can be global (summarizing the entire field) or local (providing statistics for individual locations) [@pebesmaSpatialDataScience2023].## Global MeasuresGlobal methods give us an overview over the entire field-of-view and summarize the spatial autocorrelation metric to a single value. The metrics are a function of the weight matrix and the variables of interest. The variables of interest can be e.g. gene expression, intensity of a marker or the area of a cell. The global measures can be seen as a weighted average of the local metric, as explained below.In general, a global spatial autocorrelation measure has the form of a double sum over all locations $i,j$$$\sum_i \sum_j f(x_i,x_j) w_{ij}$$where $f(x_i,x_j)$ is the measure of association between features of interest and $w_{ij}$ scales the relationship by a spatial weight as defined in the weight matrix $W$ [@zuurAnalysingEcologicalData2007; @pebesmaSpatialDataScience2023].### Moran's $I$Moran's $I$ is the most prominent measure of spatial autocorrelation. The values are bounded by $-1$ and $1$. The expected value is close to $0$ for large $n$, the exact value is given by $\mathbb{E}(I) = -1/(n-1)$. A value higher than the expected value indicates spatial autocorrelation. Negative values indicate negative autocorrelation. Spatial autocorrelation means that similar values tend to be found together in the tissue. In fact, Moran's $I$ can be interpreted as the Pearson correlation between the value at location $i$ and the averages value of the neigbours of $i$, (neighbours as defined in the weight matrix $W$) [@moranNotesContinuousStochastic1950; @cliff1981spatial, p. 21].In the first example we will calculate Moran's $I$ for the number of counts and genes measured in the Visium dataset. First, we have a look at the distribution by eye.::: {.panel-tabset}## `R` ```{r}plotSpatialFeature(sfe_tissue,features = features,colGeometryName ="spotPoly",swap_rownames ="symbol")```## `Python````{python}fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 5))sq.pl.spatial_scatter(adata, color=["Myh2", "Calr"], cmap="Blues", shape=None, library_id="spatial", ax=[ax1, ax2], fig=fig, size=40, use_raw=False)ax1.set_axis_off()ax2.set_axis_off()```:::Based on these values we can calculate Moran's $I$::: {.panel-tabset}## `R````{r}sfe_tissue <-runUnivariate(sfe_tissue,type ="moran",features = features,colGraphName ="visium",swap_rownames ="symbol")rowData(sfe_tissue)[rowData(sfe_tissue)$symbol %in% features,]```## `Python````{python}moran_myh2 = Moran(adata[:, "Myh2"].X.toarray(), spatial_weights, permutations=100)moran_calr = Moran(adata[:, "Calr"].X.toarray(), spatial_weights, permutations=100)print("Calr:", moran_calr.I, "Myh2:", moran_myh2.I)```::: We can further use permutation testing to get a significance of our estimates.::: {.panel-tabset}## `R````{r}sfe_tissue <-runUnivariate(sfe_tissue,type ="moran.mc",features = features,colGraphName ="visium",swap_rownames ="symbol",nsim =100)rowData(sfe_tissue)[rowData(sfe_tissue)$symbol %in% features,]```## `Python````{python}moran_calr.p_sim, moran_myh2.p_simprint("Calr:", "I:", moran_calr.I, "p-value:", moran_calr.p_sim)print("Myh2:", "I:", moran_myh2.I, "p-value:", moran_myh2.p_sim)```::: As we can see, the Moran's $I$ values both indicate significant positive spatial autocorrelation. *Myh2* shows a stronger correlation according to Moran's $I$ than *Calr*. ## Local Measures for Univariate DataOften, a global measure is not enough. One number determining e.g. the spatial autocorrelation over an entire tissue slice might not be reflective of tissue heterogeneity. Therefore, local indicators of spatial associations have been developed [@pebesmaSpatialDataScience2023; @anselinLocalIndicatorsSpatial1995].### Local Moran's $I$Local Moran's $I$ provides a measure of spatial autocorrelation at each location, highlighting local clusters of similarity or dissimilarity [@anselinLocalIndicatorsSpatial1995; @pebesmaSpatialDataScience2023]. It is defined as:$$I_i = \frac{x_i - \bar{x}}{\sum_{k=1}^n(x_k-\bar{x})^2/(n-1)} \sum_{j=1}^n w_{ij}(x_j - \bar{x})$$where the index $i$ refers to the location for which the measure is calculated. The interpretation is analogous to global Moran's $I$ where a value of $I_i$ higher than $\mathbb{E}(I_i) = -w_i/(n-1)$ indicates spatial autocorrelation; smaller values indicate negative autocorrelation [@anselinLocalIndicatorsSpatial1995]. It is important to note that, as for the global counterpart, the value of local Moran's $I$ could be a result from both the high or low end of the values. Here we will calculate the local Moran's $I$ value for the measurement of muscle fiber marker gene *Myh2*:::: {.panel-tabset}## `R````{r}sfe_tissue <-runUnivariate(sfe_tissue,type ="localmoran",features ="Myh2",colGraphName ="visium",swap_rownames ="symbol")# plot the expression valuespExp <-plotSpatialFeature(sfe_tissue,features =c("Myh2"), colGeometryName ="spotPoly",swap_rownames ="symbol")# plot the local Moran's I valuespLi <-plotLocalResult(sfe_tissue, "localmoran",features =c("Myh2"),colGeometryName ="spotPoly", swap_rownames ="symbol",divergent =TRUE, diverge_center =0) +labs(fill ="li(Myh2)") # specify legend# plot the local Moran's I p-valuespPval <-plotLocalResult(sfe_tissue, "localmoran",features =c("Myh2"), "-log10p_adj",colGeometryName ="spotPoly", swap_rownames ="symbol",divergent =TRUE, diverge_center =-log10(0.05)) +labs(fill ="-log10(p.adj)") # specify legend# please note that the colour scales for plots 2 & 3 are differentpExp + pLi + pPval```## `Python````{python}local_moran_myh2 = Moran_Local(adata[:, "Myh2"].X.toarray(), spatial_weights, permutations=100, seed=3407)adata.obs["local_moran_myh2"] = local_moran_myh2.Isadata.obs["local_moran_myh2_p"] =-np.log10(false_discovery_control(local_moran_myh2.p_z_sim*2))fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(14, 5))sq.pl.spatial_scatter(adata, color="Myh2", cmap="Blues", shape=None, library_id="spatial", title=r"Myh2 $ln(\:\frac{y}{s}+1\:)$ counts", ax=ax1, fig=fig, size=40, use_raw=False)sq.pl.spatial_scatter(adata, color="local_moran_myh2", cmap=cmap_continuous, vmin=-adata.obs["local_moran_myh2"].max(), vcenter=0, shape=None, library_id="spatial", title="Local Moran's Is Myh2", ax=ax2, fig=fig, size=40)sq.pl.spatial_scatter(adata, color="local_moran_myh2_p", cmap=cmap_continuous, vmin=-2.5, shape=None, library_id="spatial", title="Simulated $-log_{10}(\:p_{adj}\:)$", ax=ax3, fig=fig, size=40)ax1.set_axis_off()ax2.set_axis_off()ax3.set_axis_off()```:::In the local version of Moran's $I$, the interpretation is the same as the global version. When interpreting local autocorrelation measures, it is important to consider both the effect size estimates and the significance level. Since the significance level is calculated for each spot separately, it is recommended to adjust for multiple testing. By default, `runUnivariate` uses the Benjamini & Hochberg correction [@benjamini_hochberg1995fdr]. The local Moran's $I$ statistics reveal locations in the tissue that have similar values to their neighbours (c.f., the lower part of the tissue).```{r}#| include: false#| eval: falsesfe_tissue <-runUnivariate(sfe_tissue,type ="localmoran", features ="Myh1",colGraphName ="visium", swap_rownames ="symbol")# plot the expression valuesplotSpatialFeature(sfe_tissue,features =c("Myh1"), colGeometryName ="spotPoly",swap_rownames ="symbol")# plot the local Moran's I valuesplotLocalResult(sfe_tissue, "localmoran",features =c("Myh1"),colGeometryName ="spotPoly", swap_rownames ="symbol",divergent =TRUE, diverge_center =0)# plot the local Moran's I p-valuesplotLocalResult(sfe_tissue, "localmoran",features =c("Myh1"), "-log10p_adj",colGeometryName ="spotPoly", swap_rownames ="symbol",divergent =TRUE, diverge_center =-log10(0.05))```## Multivariate measures -- Bivariate Moran's $I$There exists a bivariate version of Moran's $I$ as well.For two continuous observations the global bivariate Moran's $I$ is defined as [@wartenbergMultivariateSpatialCorrelation1985 ; @bivandPackagesAnalyzingSpatial2022]$$I_B = \frac{\sum_i(\sum_j{w_{ij}b_j\times a_i})}{\sum_i{b_i^2}}$$where $a_i$ and $b_i$ are the two variables of interest and $w_{ij}$ is the value of the spatial weights matrix for positions $i$ and $j$. It is a measure of the correlation of the variables $a$ with the average of the neighboring values for variable $b$ (also called the spatial lag of $b$).There exists a local version that we will explore here. Note that the results are not symmetric, but very similar to each other.::: {.panel-tabset}## `R````{r}sfe_tissue <-runBivariate(sfe_tissue, "localmoran_bv", # wraps the method from spdepc("Myh1", "Myh2"),swap_rownames ="symbol", nsim =100)# this command gives all results of localmoran_bvlocalResultFeatures(sfe_tissue, "localmoran_bv")plotLocalResult(sfe_tissue, "localmoran_bv", c("Myh1__Myh2", "Myh2__Myh1"),colGeometryName ="spotPoly", divergent =TRUE, diverge_center =0)```## `Python````{python}local_moran_bv_myh1_myh2 = Moran_Local_BV(adata[:, "Myh1"].X.toarray(), adata[:, "Myh2"].X.toarray(), spatial_weights, permutations=100, seed=3407)local_moran_bv_myh2_myh1 = Moran_Local_BV(adata[:, "Myh2"].X.toarray(), adata[:, "Myh1"].X.toarray(), spatial_weights, permutations=100, seed=3407)adata.obs["local_moran_bv_myh1_myh2"] = local_moran_bv_myh1_myh2.Isadata.obs["local_moran_bv_myh2_myh1"] = local_moran_bv_myh2_myh1.Isfig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 5))sq.pl.spatial_scatter(adata, color="local_moran_bv_myh1_myh2", cmap=cmap_continuous, vmin=-adata.obs["local_moran_bv_myh1_myh2"].max(), shape=None, vcenter=0, library_id="spatial", title=r'Myh1 $\rightarrow$ Myh2', ax=ax1, fig=fig, size=40)sq.pl.spatial_scatter(adata, color="local_moran_bv_myh2_myh1", cmap=cmap_continuous, vmin=-adata.obs["local_moran_bv_myh2_myh1"].max(), shape=None, vcenter=0, library_id="spatial", title=r'Myh2 $\rightarrow$ Myh1', ax=ax2, fig=fig, size=40)fig.suptitle("Moran's local bivariate I", weight="bold", size=18, y=0.98)ax1.set_axis_off()ax2.set_axis_off()fig.tight_layout()```::: Please note that the result might overestimate the spatial autocorrelation of the variables due to the inherent (non-spatial) correlation of $x$ and $y$ [@bivandPackagesAnalyzingSpatial2022].The package [`spatialDM`](https://github.com/StatBiomed/SpatialDM)[@liSpatialDMRapidIdentification2023] uses an adapted version of bivariate Moran's $I$ to identify ligand-receptor pairs.## Impact of neighbourhood on autocorrelation measuresLet's compare the impact of different spatial weight matrices on local autocorrelation analysis. In this Visium dataset we do not have gene expression information for each cell because of the resolution of the spots. We will instead calculate the autocorrelation measures on the area of the segmented cells. This could be helpful when looking at local cell densities in a tissue.First, we base the spatial weight matrix on contiguous neighbours.```{r}sfe_tissue <-annotGeometryUnivariate(sfe_tissue, "localmoran", "area",annotGeometryName ="myofiber_simplified",annotGraphName ="myofiber_poly2nb",include_self =FALSE, zero.policy =TRUE,name ="myofiber_poly2nb")pPoly <-plotLocalResult(sfe_tissue, "myofiber_poly2nb", "area",annotGeometryName ="myofiber_simplified",annotGraphName ="myofiber_poly2nb",divergent =TRUE, diverge_center =0) +labs(title ="Local Moran's I (li)\nContiguous Neighbours", fill ="li(area)")```As an alternative, we also base the spatial weight matrix on the 5 and 10 nearest neighbours.```{r}sfe_tissue <-annotGeometryUnivariate(sfe_tissue, "localmoran", "area",annotGeometryName ="myofiber_simplified",annotGraphName ="knn5",include_self =FALSE, zero.policy =TRUE,name ="knn5")pKnn5 <-plotLocalResult(sfe_tissue, "knn5", "area",annotGeometryName ="myofiber_simplified",annotGraphName ="knn5",divergent =TRUE, diverge_center =0) +labs(title ="Local Moran's I (li)\n5 Nearest Neighbours", fill ="li(area)")annotGraph(sfe_tissue, "knn10") <-findSpatialNeighbors(sfe_tissue,type ="myofiber_simplified",MARGIN =3, # to use the annotation geometrymethod ="knearneigh", # wraps the spdep function with the same namek =10,zero.policy =TRUE )sfe_tissue <-annotGeometryUnivariate(sfe_tissue, "localmoran", "area",annotGeometryName ="myofiber_simplified",annotGraphName ="knn10",include_self =FALSE, zero.policy =TRUE,name ="knn10")pKnn10 <-plotLocalResult(sfe_tissue, "knn10", "area",annotGeometryName ="myofiber_simplified",annotGraphName ="knn10",divergent =TRUE, diverge_center =0) +labs(title ="Local Moran's I (li)\n10 Nearest Neighbours", fill ="li(area)")pPoly + pKnn5 + pKnn10```The overall pattern of spatial autocorrelation is similar across the different weight matrices. However, locally there are some differences, as well as in the scale of the autocorrelation (note that the same color in the different plots does not correspond to the same value between the plots). There is a trend of smoothing of the value, the larger the neighbourhood in consideration.## Measures for categorical dataThere also exist methods to detect spatial autocorrelation of categorical variables. As we have no categorical measurements in this dataset, we perform non-spatial clustering using Leiden clustering on the PCA space [@traag2019louvain].::: {.panel-tabset}## `R````{r, eval = TRUE}library(BiocNeighbors)library(BiocSingular)set.seed(123)# Run PCA on the samplesfe <-runPCA(sfe_tissue, exprs_values ="logcounts", ncomponents =50, BSPARAM =IrlbaParam())# Cluster based on first 20 PC's and using leidencolData(sfe_tissue)$cluster <-clusterRows(reducedDim(sfe, "PCA")[,1:10],BLUSPARAM =KNNGraphParam(k =20,BNPARAM=AnnoyParam(ntrees=50),cluster.fun ="leiden",cluster.args =list(resolution =0.3,objective_function ="modularity")))```## `Python````{python}np.random.seed(123)#compute a PCA on the sc.pp.pca(adata, n_comps =50, zero_center =True, svd_solver ="arpack")#compute the neighbourssc.pp.neighbors(adata, use_rep ="X_pca", knn =True, n_pcs =10)#compute leiden clusteringsc.tl.leiden(adata, resolution =0.3, flavor ="igraph", objective_function ="modularity")```:::We can visually inspect the spatial arrangement of the clusters. We see that they are spatially "clustered".::: {.panel-tabset}## `R````{r}plotSpatialFeature(sfe_tissue,"cluster",colGeometryName ="spotPoly")```## `Python````{python}sq.pl.spatial_scatter(adata, color="leiden", library_id="spatial", title="Clusters", shape=None, size=55)plt.gca().set_axis_off()plt.show()```:::Metrics such as the join count statistics can be use to analysis spatial arrangements of categorical data. In essence, the join count statistic calculates the frequency of categories among neighbours and compares this value with a theoretical distribution or permutations of the labels to get a significance score [@cliff1981spatial].Here we will use an extension to multiple categories `joincount.multi`. The test gives information if the different clusters are in contact with itself and each other. The output shows the calculated number of contacts vs. the expected number, i.e., if the clusters were randomly assigned to the locations. The function does not report $p$-values, but they can be calculated at a specified significance level using the $z$-values.::: {.panel-tabset}## `R````{r}joincount.multi(colData(sfe_tissue)$cluster,colGraph(sfe_tissue, "binary"))```## `Python````{python}sq.gr.interaction_matrix(adata, "leiden", normalized =False, connectivity_key="spatial", weights =False)df_interactions = pd.DataFrame(adata.uns["leiden_interactions"], columns=np.unique(adata.obs["leiden"]), index=np.unique(adata.obs["leiden"]))# add lower triangular matrix (w/o diagonal) to the dataframe and divide by 2array_join_counts = (df_interactions + np.tril(df_interactions, k =-1).T)/2#only print the upper triangular matrixnp.triu(array_join_counts)```::: For example, we can see here that the calculated number of contacts of clusters $1$ and $3$ is far below the expected value. On the other hand all clusters are in contact with itself more often than expected at random.Therefore, in `Python` we add the lower triangular matrix to the upper triangle (without the diagonal) and divide the resulting interaction matrix by 2 as the interactions in `R` are undirected.# Summary and Considerations- Lattice data refers to spatial data collected at fixed locations arranged in regular or irregular grids, contrasting with stochastic point pattern analysis.- Regular lattices have uniform, evenly spaced spots, while irregular lattices have variable sizes and shapes with non-uniform spacing.- The spatial weight matrix models spatial relationships between lattice units. Different methods exist to define the spatial weight matrix.- Global spatial autocorrelation measures, like Moran's $I$, summarize spatial correlation over the entire field, while local measures identify local clusters of similarity or dissimilarity.- There exist measures of spatial autocorrelation for continuous and categorical data.# Appendix## Session info```{r}sessionInfo()``````{r, include=FALSE}# Define the directory and file pathsdir_path <-"../tests/"dir_path_out <-"../tests/out"# Check if the directory exists, and create it if it doesn'tif (!dir.exists(dir_path)) {dir.create(dir_path)dir.create(dir_path_out)}``````{python, include=FALSE}# export some Python results such that we can later compare between runsmoran_calr_res = moran_calr.Imoran_myh2_res = moran_myh2.I# save for testing the runs lateradata.obs.to_csv("../tests/out/test-00-overview-lat-Python.csv")``````{r, include=FALSE}# export some R results such that we can later compare between runslibrary(reticulate)localResults(sfe)[[1]] |>as.data.frame() |>write.csv("../tests/out/test-00-overview-lat-R.csv")pyRes <-c(py$moran_calr_res, py$moran_myh2_res)rRes <-rowData(sfe_tissue)[rowData(sfe_tissue)$symbol %in% features, "moran_Vis5A"]test <-round(pyRes,6) ==round(rRes,6)data.frame(test =c("Morans_I_calr", "Morans_I_myh2"), identical = test) |>write.csv("../tests/out/test-00-overview-lat-R-Python-compare.csv")```