Published

February 4, 2025

Lattice data analysis – Summary

In this vignette we will show:

  • An overview of lattice data analysis methods for HTS-based data.

  • This includes global metrics on the entire field of view and local variants thereof.

  • The use case is a 10x Visium data set from McKellar et al. (2021).

  • This overview is in parts based on the corresponding Voyager 10X Visium vignette.

Introduction

In contrast to point pattern based methods, we can view the location of cells or spots as a fixed lattice and measure the corresponding marker expression at each location. HTS-based spatial transcriptomic technologies often produce data on a regular lattice (i.e., approximately evenly spaced spots or beads of uniform size and shape), whereas imaging-based technologies yield irregular lattice structures (i.e., with variable cell sizes and shapes, and non-uniform spacing).

For this representation of the cells, we will rely in R on the SpatialFeatureExperiment package (see below for more details). For preprocessing of the dataset and code examples, we refer the reader to this 10X Visium vignette of the Voyager package (Moses et al. 2023). The Voyager package also provides wrapper functions around the package spdep (Pebesma and Bivand 2023) that work directly on the SpatialFeatureExperiment object. The package spdep is designed for the analysis of spatial data with lattice structure.

For the implementation in Python we will rely on the the packages esda, pysal and squidpy (Rey and Anselin 2010; Palla et al. 2022). For the data representation we rely on the anndata structure (Virshup et al. 2024).

Show the code
suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
  library(patchwork)
  library(Voyager)
  library(SpatialFeatureExperiment)
  library(SFEData)
  library(spdep)
  library(sf)
  library(stringr)
  library(tidyr)
  library(magrittr)
  library(scater)
  library(bluster)
})

theme_set(theme_light())

source('utils.R')
Show the code
import numpy as np
import scanpy as sc
import squidpy as sq
from esda.moran import Moran, Moran_Local, Moran_Local_BV
from esda.join_counts import Join_Counts
from libpysal.weights import W
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
import warnings
from scipy.stats import false_discovery_control

warnings.filterwarnings("ignore")

plt.rcParams['font.family'] = 'Futura'
df_cmap_discrete = pd.read_csv("../misc/ditto_colors.csv", index_col=0)
df_cmap_continuous = pd.read_csv("../misc/roma_colors.csv", index_col=0)
cmap_discrete = ListedColormap(df_cmap_discrete["ditto_colors"].values, "ditto")
cmap_continuous = LinearSegmentedColormap.from_list("roma", df_cmap_continuous["roma_colors"]).reversed()

Setup and Preprocessing

We will load a dataset by (McKellar et al. 2021) using the Visium technology (Ståhl et al. 2016). The data shows a sample taken from the tibialis anterior muscle of a mouse.

Show the code
# Load the dataset
sfe <- SFEData::McKellarMuscleData(dataset = "full")
# Take spots that are covered with tissue
sfe_tissue <- sfe[, colData(sfe)$in_tissue]
# Filter out genes with no counts
sfe_tissue <- sfe_tissue[rowSums(counts(sfe_tissue)) > 0, ]
# Convert counts log-transformed normalized expression values
sfe_tissue <- scater::logNormCounts(sfe_tissue)

ann <-  zellkonverter::SCE2AnnData(sfe_tissue)
anndata::write_h5ad(ann, "../data/McKellarMuscleData_processed.h5ad")

features <- c("Myh2", "Calr")
Show the code
adata = sc.read_h5ad("../data/McKellarMuscleData_processed.h5ad")
adata.obsm['spatial'] = np.array(list(zip((adata.obs['x'] * -1), adata.obs['y'])))
adata.var_names = adata.var["symbol"]
adata.raw = adata.copy()
adata.X = adata.layers["logcounts"].astype(np.float64) # necessary for Local Moran to work

SpatialFeatureExperiment (Moses et al. 2023) objects are an extension of the SpatialExperiment object (Righelli et al. 2022). It additionally contains geometric annotations that are encoded as simple features of the sf library (Pebesma and Bivand 2023).

For Python we use the anndata object (Virshup et al. 2024).

Show the code
sfe_tissue
class: SpatialFeatureExperiment 
dim: 15043 932 
metadata(0):
assays(2): counts logcounts
rownames(15043): ENSMUSG00000025902 ENSMUSG00000096126 ...
  ENSMUSG00000064368 ENSMUSG00000064370
rowData names(6): Ensembl symbol ... vars cv2
colnames(932): AAACATTTCCCGGATT AAACCTAAGCAGCCGG ... TTGTGTTTCCCGAAAG
  TTGTTGTGTGTCAAGA
colData names(13): barcode col ... in_tissue sizeFactor
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : imageX imageY
imgData names(1): sample_id

unit: full_res_image_pixels
Geometries:
colGeometries: spotPoly (POLYGON) 
annotGeometries: tissueBoundary (POLYGON), myofiber_full (POLYGON), myofiber_simplified (POLYGON), nuclei (POLYGON), nuclei_centroid (POINT) 

Graphs:
Vis5A: 
Show the code
adata
AnnData object with n_obs × n_vars = 932 × 15043
    obs: 'barcode', 'col', 'row', 'x', 'y', 'dia', 'tissue', 'sample_id', 'nCounts', 'nGenes', 'prop_mito', 'in_tissue', 'sizeFactor'
    var: 'Ensembl', 'symbol', 'type', 'means', 'vars', 'cv2'
    uns: 'X_name'
    obsm: 'spatial'
    layers: 'logcounts'

For example, the spots of the Visium dataset are stored as a simple feature collection.

Show the code
colGeometry(sfe_tissue, "spotPoly") |> head()
Simple feature collection with 6 features and 0 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 5857.695 ymin: 4126.151 xmax: 10206.53 ymax: 11789.24
CRS:           NA
                        geometry
1 POLYGON ((6037.973 5430.826...
2 POLYGON ((7977.344 4461.467...
3 POLYGON ((8547.633 11699.1,...
4 POLYGON ((6867.944 4463.856...
5 POLYGON ((9779.598 4216.29,...
6 POLYGON ((10206.53 9282.577...

Lattice Data

Definition

Lattice data refers to spatial data collected at locations arranged in a regular or irregular grid (lattice). Each location has a defined spatial unit, and the sampling locations are fixed rather than random. This approach contrasts with point pattern analysis, where we assume that the locations were generated by a stochastic process (Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023; Baddeley, Rubak, and Turner 2015).

As can be seen below, the Visium dataset is a regular lattice. The color shows the number of counts detected in each Visium spot. In contrast, the outlines of individual cells after segmentation (e.g., from a higher resolution spatial omics assay) could be seen as an irregular lattice. As this dataset also contains (manual) segmentations of myofibers (muscle cells stored as myofiber_simplified), we will illustrate calculations based on irregular lattices of the myofiber segmentations.

Show the code
# A plot using the plotSpatialFeature from Voyager
p <- plotSpatialFeature(sfe_tissue,
  "nCounts",
  annotGeometryName = "myofiber_simplified"
)

# This extracts the segmented cells
cells <- annotGeometry(sfe_tissue, "myofiber_simplified") |>
  st_geometry()

# We can also use ggplot and geom_sf to plot sf objects
q <- ggplot() +
  geom_sf(data = cells, fill = NA) +
  theme_void()

# Using `patchwork` to combine the plots
p | q

Show the code
sq.pl.spatial_scatter(adata, color="nCounts", cmap="Blues", shape=None, library_id="spatial")
plt.gca().set_axis_off()

Spatial weight matrix

In lattice data analysis, a key concept is the spatial weight matrix, which models the spatial relationships between units in the lattice (i.e., spots or cells). Various methods exist for constructing this matrix, such as contiguity-based (direct neighbors), graph-based, or distance-based methods. (Getis 2009; Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023). The documentation of the package spdep gives an overview of the different methods.

For Visium, the most straightforward way is to take the direct neighbours of each spot. This is done using the function findVisiumGraph.

Show the code
colGraph(sfe_tissue, "visium") <- findVisiumGraph(sfe_tissue)
colGraph(sfe_tissue, "binary") <- findVisiumGraph(sfe_tissue, style = "B")
Show the code
plotColGraph(sfe_tissue,
  colGraphName = "visium",
  colGeometryName = "spotPoly"
) + theme_void()

Show the code
sq.gr.spatial_neighbors(adata, n_neighs=6, coord_type="grid")
spatial_weights = W.from_sparse(adata.obsp['spatial_connectivities'])

fig, ax = plt.subplots(1, 1, figsize=(4, 8))
sq.pl.spatial_scatter(
    adata[adata.obsp["spatial_connectivities"].nonzero()[0], :],
    connectivity_key="spatial_connectivities",
    size=0.25,
    na_color="black",
    edges_color="black",
    edges_width=0.25,
    shape=None,
    library_id="spatial",
    ax=ax,
    fig=fig,
)
ax.set_axis_off()
fig.suptitle("Connectivity grid", size=10, y=0.8)
fig.set_dpi(200)

In an irregular lattice, the task of finding a spatial weight matrix is more complex, as different options exist. One option is to base the neighbourhood graph on neighbours that are in direct contact with each other (contiguous), as implemented in the poly2nb method.

Show the code
annotGraph(sfe_tissue, "myofiber_poly2nb") <-
  findSpatialNeighbors(sfe_tissue,
    type = "myofiber_simplified",
    MARGIN = 3,
    method = "poly2nb", # wraps spdep function with same name
    zero.policy = TRUE
  )
Show the code
p1 <- plotAnnotGraph(sfe_tissue,
  annotGraphName = "myofiber_poly2nb",
  annotGeometryName = "myofiber_simplified"
) + theme_void()

Alternatively, we could take the five nearest neighbours of each cell.

Show the code
annotGraph(sfe_tissue, "knn5") <-
  findSpatialNeighbors(sfe_tissue,
    type = "myofiber_simplified",
    MARGIN = 3, # to use the annotation geometry
    method = "knearneigh", # wraps the spdep function with the same name
    k = 5,
    zero.policy = TRUE
  )
Show the code
p2 <- plotAnnotGraph(sfe_tissue,
  annotGraphName = "knn5",
  annotGeometryName = "myofiber_simplified"
) + theme_void()

As we can see below, the graphs look quite distinct. On the left side, in the contiguous neigbhbour graph (neighbours in direct contact), we notice the formation of patches, while in the knn graph isolated patches are interconnected.

Show the code
p1 + p2

Spatial autocorrelation

Spatial autocorrelation measures the association between spatial units while considering that these spatial units are not independent measurements. These measures can be global (summarizing the entire field) or local (providing statistics for individual locations) (Pebesma and Bivand 2023).

Global Measures

Global methods give us an overview over the entire field-of-view and summarize the spatial autocorrelation metric to a single value. The metrics are a function of the weight matrix and the variables of interest. The variables of interest can be e.g. gene expression, intensity of a marker or the area of a cell. The global measures can be seen as a weighted average of the local metric, as explained below.

In general, a global spatial autocorrelation measure has the form of a double sum over all locations \(i,j\)

\[\sum_i \sum_j f(x_i,x_j) w_{ij}\]

where \(f(x_i,x_j)\) is the measure of association between features of interest and \(w_{ij}\) scales the relationship by a spatial weight as defined in the weight matrix \(W\) (Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023).

Moran’s \(I\)

Moran’s \(I\) is the most prominent measure of spatial autocorrelation. The values are bounded by \(-1\) and \(1\). The expected value is close to \(0\) for large \(n\), the exact value is given by \(\mathbb{E}(I) = -1/(n-1)\). A value higher than the expected value indicates spatial autocorrelation. Negative values indicate negative autocorrelation. Spatial autocorrelation means that similar values tend to be found together in the tissue. In fact, Moran’s \(I\) can be interpreted as the Pearson correlation between the value at location \(i\) and the averages value of the neigbours of \(i\), (neighbours as defined in the weight matrix \(W\)) (Moran 1950; Cliff and Ord 1981, 21).

In the first example we will calculate Moran’s \(I\) for the number of counts and genes measured in the Visium dataset. First, we have a look at the distribution by eye.

Show the code
plotSpatialFeature(sfe_tissue,
  features = features,
  colGeometryName = "spotPoly",
  swap_rownames = "symbol"
)

Show the code
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 5))
sq.pl.spatial_scatter(adata, color=["Myh2", "Calr"], cmap="Blues", shape=None, library_id="spatial", ax=[ax1, ax2], fig=fig, size=40, use_raw=False)
ax1.set_axis_off()
ax2.set_axis_off()

Based on these values we can calculate Moran’s \(I\)

Show the code
sfe_tissue <- runUnivariate(sfe_tissue,
  type = "moran",
  features = features,
  colGraphName = "visium",
  swap_rownames = "symbol"
)

rowData(sfe_tissue)[rowData(sfe_tissue)$symbol %in% features,]
DataFrame with 2 rows and 8 columns
                              Ensembl      symbol            type     means
                          <character> <character>     <character> <numeric>
ENSMUSG00000003814 ENSMUSG00000003814        Calr Gene Expression  0.335537
ENSMUSG00000033196 ENSMUSG00000033196        Myh2 Gene Expression  0.974760
                        vars       cv2 moran_Vis5A   K_Vis5A
                   <numeric> <numeric>   <numeric> <numeric>
ENSMUSG00000003814   1.37427   12.2065    0.233636   2.44922
ENSMUSG00000033196  24.03743   25.2984    0.625500   2.21641
Show the code
moran_myh2 = Moran(adata[:, "Myh2"].X.toarray(), spatial_weights, permutations=100)
moran_calr = Moran(adata[:, "Calr"].X.toarray(), spatial_weights, permutations=100)
print("Calr:", moran_calr.I, "Myh2:", moran_myh2.I)
Calr: 0.2336358785017223 Myh2: 0.6255000477049334

We can further use permutation testing to get a significance of our estimates.

Show the code
sfe_tissue <- runUnivariate(sfe_tissue,
  type = "moran.mc",
  features = features,
  colGraphName = "visium",
  swap_rownames = "symbol",
  nsim = 100
)

rowData(sfe_tissue)[rowData(sfe_tissue)$symbol %in% features,]
DataFrame with 2 rows and 14 columns
                              Ensembl      symbol            type     means
                          <character> <character>     <character> <numeric>
ENSMUSG00000003814 ENSMUSG00000003814        Calr Gene Expression  0.335537
ENSMUSG00000033196 ENSMUSG00000033196        Myh2 Gene Expression  0.974760
                        vars       cv2 moran_Vis5A   K_Vis5A
                   <numeric> <numeric>   <numeric> <numeric>
ENSMUSG00000003814   1.37427   12.2065    0.233636   2.44922
ENSMUSG00000033196  24.03743   25.2984    0.625500   2.21641
                   moran.mc_statistic_Vis5A moran.mc_parameter_Vis5A
                                  <numeric>                <numeric>
ENSMUSG00000003814                 0.233636                      101
ENSMUSG00000033196                 0.625500                      101
                   moran.mc_p.value_Vis5A moran.mc_alternative_Vis5A
                                <numeric>                <character>
ENSMUSG00000003814             0.00990099                    greater
ENSMUSG00000033196             0.00990099                    greater
                    moran.mc_method_Vis5A
                              <character>
ENSMUSG00000003814 Monte-Carlo simulati..
ENSMUSG00000033196 Monte-Carlo simulati..
                                        moran.mc_res_Vis5A
                                                    <list>
ENSMUSG00000003814 -0.00325035,-0.01511768, 0.00461261,...
ENSMUSG00000033196  0.04594527,-0.00278334, 0.00390283,...
Show the code
moran_calr.p_sim, moran_myh2.p_sim
(np.float64(0.009900990099009901), np.float64(0.009900990099009901))
Show the code
print("Calr:", "I:", moran_calr.I, "p-value:", moran_calr.p_sim)
Calr: I: 0.2336358785017223 p-value: 0.009900990099009901
Show the code
print("Myh2:", "I:", moran_myh2.I, "p-value:", moran_myh2.p_sim)
Myh2: I: 0.6255000477049334 p-value: 0.009900990099009901

As we can see, the Moran’s \(I\) values both indicate significant positive spatial autocorrelation. Myh2 shows a stronger correlation according to Moran’s \(I\) than Calr.

Local Measures for Univariate Data

Often, a global measure is not enough. One number determining e.g. the spatial autocorrelation over an entire tissue slice might not be reflective of tissue heterogeneity. Therefore, local indicators of spatial associations have been developed (Pebesma and Bivand 2023; Anselin 1995).

Local Moran’s \(I\)

Local Moran’s \(I\) provides a measure of spatial autocorrelation at each location, highlighting local clusters of similarity or dissimilarity (Anselin 1995; Pebesma and Bivand 2023). It is defined as:

\[I_i = \frac{x_i - \bar{x}}{\sum_{k=1}^n(x_k-\bar{x})^2/(n-1)} \sum_{j=1}^n w_{ij}(x_j - \bar{x})\]

where the index \(i\) refers to the location for which the measure is calculated. The interpretation is analogous to global Moran’s \(I\) where a value of \(I_i\) higher than \(\mathbb{E}(I_i) = -w_i/(n-1)\) indicates spatial autocorrelation; smaller values indicate negative autocorrelation (Anselin 1995). It is important to note that, as for the global counterpart, the value of local Moran’s \(I\) could be a result from both the high or low end of the values. Here we will calculate the local Moran’s \(I\) value for the measurement of muscle fiber marker gene Myh2:

Show the code
sfe_tissue <- runUnivariate(sfe_tissue,
  type = "localmoran",
  features = "Myh2",
  colGraphName = "visium",
  swap_rownames = "symbol"
)

# plot the expression values
pExp <- plotSpatialFeature(sfe_tissue,
  features = c("Myh2"), colGeometryName = "spotPoly",
  swap_rownames = "symbol"
)

# plot the local Moran's I values
pLi <- plotLocalResult(sfe_tissue, "localmoran",
  features = c("Myh2"),
  colGeometryName = "spotPoly", swap_rownames = "symbol",
  divergent = TRUE, diverge_center = 0
) + labs(fill = "li(Myh2)") # specify legend


# plot the local Moran's I p-values
pPval <- plotLocalResult(sfe_tissue, "localmoran",
  features = c("Myh2"), "-log10p_adj",
  colGeometryName = "spotPoly", swap_rownames = "symbol",
  divergent = TRUE, diverge_center = -log10(0.05)
) + labs(fill = "-log10(p.adj)") # specify legend

# please note that the colour scales for plots 2 & 3 are different
pExp + pLi + pPval

Show the code
local_moran_myh2 = Moran_Local(adata[:, "Myh2"].X.toarray(), spatial_weights, permutations=100, seed=3407)
adata.obs["local_moran_myh2"] = local_moran_myh2.Is
adata.obs["local_moran_myh2_p"] = -np.log10(false_discovery_control(local_moran_myh2.p_z_sim*2))
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(14, 5))
sq.pl.spatial_scatter(adata, color="Myh2", cmap="Blues", shape=None, library_id="spatial", title=r"Myh2 $ln(\:\frac{y}{s}+1\:)$ counts", ax=ax1, fig=fig, size=40, use_raw=False)
sq.pl.spatial_scatter(adata, color="local_moran_myh2", cmap=cmap_continuous, vmin=-adata.obs["local_moran_myh2"].max(), vcenter=0, shape=None, library_id="spatial", title="Local Moran's Is Myh2", ax=ax2, fig=fig, size=40)
sq.pl.spatial_scatter(adata, color="local_moran_myh2_p", cmap=cmap_continuous, vmin=-2.5, shape=None, library_id="spatial", title="Simulated $-log_{10}(\:p_{adj}\:)$", ax=ax3, fig=fig, size=40)
ax1.set_axis_off()
ax2.set_axis_off()
ax3.set_axis_off()

In the local version of Moran’s \(I\), the interpretation is the same as the global version. When interpreting local autocorrelation measures, it is important to consider both the effect size estimates and the significance level. Since the significance level is calculated for each spot separately, it is recommended to adjust for multiple testing. By default, runUnivariate uses the Benjamini & Hochberg correction (Benjamini and Hochberg 1995). The local Moran’s \(I\) statistics reveal locations in the tissue that have similar values to their neighbours (c.f., the lower part of the tissue).

Multivariate measures – Bivariate Moran’s \(I\)

There exists a bivariate version of Moran’s \(I\) as well.

For two continuous observations the global bivariate Moran’s \(I\) is defined as (Wartenberg 1985; Bivand 2022)

\[I_B = \frac{\Sigma_i(\Sigma_j{w_{ij}b_j\times a_i})}{\Sigma_i{b_i^2}}\]

where \(a_i\) and \(b_i\) are the two variables of interest and \(w_{ij}\) is the value of the spatial weights matrix for positions \(i\) and \(j\). It is a measure of the correlation of the variables \(a\) with the average of the neighboring values for variable \(b\) (also called the spatial lag of \(b\)).

There exists a local version that we will explore here. Note that the results are not symmetric, but very similar to each other.

Show the code
sfe_tissue <- runBivariate(sfe_tissue, "localmoran_bv", # wraps the method from spdep
  c("Myh1", "Myh2"),
  swap_rownames = "symbol", nsim = 100
)

# this command gives all results of localmoran_bv
localResultFeatures(sfe_tissue, "localmoran_bv")
[1] "Myh1__Myh1" "Myh2__Myh1" "Myh1__Myh2" "Myh2__Myh2"
Show the code
plotLocalResult(sfe_tissue, "localmoran_bv", c("Myh1__Myh2", "Myh2__Myh1"),
  colGeometryName = "spotPoly", divergent = TRUE, diverge_center = 0
)

Show the code
local_moran_bv_myh1_myh2 = Moran_Local_BV(adata[:, "Myh1"].X.toarray(), adata[:, "Myh2"].X.toarray(), spatial_weights, permutations=100, seed=3407)
local_moran_bv_myh2_myh1 = Moran_Local_BV(adata[:, "Myh2"].X.toarray(), adata[:, "Myh1"].X.toarray(), spatial_weights, permutations=100, seed=3407)
adata.obs["local_moran_bv_myh1_myh2"] = local_moran_bv_myh1_myh2.Is
adata.obs["local_moran_bv_myh2_myh1"] = local_moran_bv_myh2_myh1.Is

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 5))
sq.pl.spatial_scatter(adata, color="local_moran_bv_myh1_myh2", cmap=cmap_continuous, vmin=-adata.obs["local_moran_bv_myh1_myh2"].max(), shape=None, vcenter=0, library_id="spatial", title=r'Myh1 $\rightarrow$ Myh2', ax=ax1, fig=fig, size=40)
sq.pl.spatial_scatter(adata, color="local_moran_bv_myh2_myh1", cmap=cmap_continuous, vmin=-adata.obs["local_moran_bv_myh2_myh1"].max(), shape=None, vcenter=0, library_id="spatial", title=r'Myh2 $\rightarrow$ Myh1', ax=ax2, fig=fig, size=40)
fig.suptitle("Moran's local bivariate I", weight="bold", size=18, y=0.98)
ax1.set_axis_off()
ax2.set_axis_off()
fig.tight_layout()

Please note that the result might overestimate the spatial autocorrelation of the variables due to the inherent (non-spatial) correlation of \(x\) and \(y\) (Bivand 2022).

The package spatialDM (Li et al. 2023) uses an adapted version of bivariate Moran’s \(I\) to identify ligand-receptor pairs.

Impact of neighbourhood on autocorrelation measures

Let’s compare the impact of different spatial weight matrices on local autocorrelation analysis. In this Visium dataset we do not have gene expression information for each cell because of the resolution of the spots. We will instead calculate the autocorrelation measures on the area of the segmented cells. This could be helpful when looking at local cell densities in a tissue.

First, we base the spatial weight matrix on contiguous neighbours.

Show the code
sfe_tissue <- annotGeometryUnivariate(sfe_tissue, "localmoran", "area",
  annotGeometryName = "myofiber_simplified",
  annotGraphName = "myofiber_poly2nb",
  include_self = FALSE, zero.policy = TRUE,
  name = "myofiber_poly2nb"
)

pPoly <- plotLocalResult(sfe_tissue, "myofiber_poly2nb", "area",
  annotGeometryName = "myofiber_simplified",
  annotGraphName = "myofiber_poly2nb",
  divergent = TRUE, diverge_center = 0
) + labs(title = "Local Moran's I (li)\nContiguous Neighbours", 
         fill = "li(area)")

As an alternative, we also base the spatial weight matrix on the 5 and 10 nearest neighbours.

Show the code
sfe_tissue <- annotGeometryUnivariate(sfe_tissue, "localmoran", "area",
  annotGeometryName = "myofiber_simplified",
  annotGraphName = "knn5",
  include_self = FALSE, zero.policy = TRUE,
  name = "knn5"
)

pKnn5 <- plotLocalResult(sfe_tissue, "knn5", "area",
  annotGeometryName = "myofiber_simplified",
  annotGraphName = "knn5",
  divergent = TRUE, diverge_center = 0
) + labs(title = "Local Moran's I (li)\n5 Nearest Neighbours", 
         fill = "li(area)")

annotGraph(sfe_tissue, "knn10") <-
  findSpatialNeighbors(sfe_tissue,
    type = "myofiber_simplified",
    MARGIN = 3, # to use the annotation geometry
    method = "knearneigh", # wraps the spdep function with the same name
    k = 10,
    zero.policy = TRUE
  )

sfe_tissue <- annotGeometryUnivariate(sfe_tissue, "localmoran", "area",
  annotGeometryName = "myofiber_simplified",
  annotGraphName = "knn10",
  include_self = FALSE, zero.policy = TRUE,
  name = "knn10"
)

pKnn10 <- plotLocalResult(sfe_tissue, "knn10", "area",
  annotGeometryName = "myofiber_simplified",
  annotGraphName = "knn10",
  divergent = TRUE, diverge_center = 0
) + labs(title = "Local Moran's I (li)\n10 Nearest Neighbours", 
         fill = "li(area)")

pPoly + pKnn5 + pKnn10

The overall pattern of spatial autocorrelation is similar across the different weight matrices. However, locally there are some differences, as well as in the scale of the autocorrelation (note that the same color in the different plots does not correspond to the same value between the plots). There is a trend of smoothing of the value, the larger the neighbourhood in consideration.

Measures for categorical data

There also exist methods to detect spatial autocorrelation of categorical variables. As we have no categorical measurements in this dataset, we perform non-spatial clustering using Leiden clustering on the PCA space (Traag, Waltman, and Van Eck 2019).

Show the code
library(BiocNeighbors)
library(BiocSingular)

set.seed(123)
# Run PCA on the sample
sfe <- runPCA(sfe_tissue, exprs_values = "logcounts", ncomponents = 50, BSPARAM = IrlbaParam())
# Cluster based on first 20 PC's and using leiden
colData(sfe_tissue)$cluster <- clusterRows(reducedDim(sfe, "PCA")[,1:10],
                                    BLUSPARAM = KNNGraphParam(
                                      k = 20,
                                      BNPARAM=AnnoyParam(ntrees=50),
                                      cluster.fun = "leiden",
                                      cluster.args = list(
                                          resolution = 0.3,
                                          objective_function = "modularity")))

We can visually inspect the spatial arrangement of the clusters. We see that they are spatially “clustered”.

Show the code
from sklearn_ann.kneighbors.annoy import AnnoyTransformer

np.random.seed(123)
#compute a PCA on the 
sc.pp.pca(adata, n_comps = 50, zero_center = True, svd_solver = "arpack")
#compute the neighbours
sc.pp.neighbors(adata, use_rep = "X_pca", knn = True, n_pcs = 10, transformer=AnnoyTransformer(n_neighbors=20, n_trees=50))
#compute leiden clustering
sc.tl.leiden(adata, resolution = 0.3, flavor = "igraph", objective_function = "modularity")
Show the code
plotSpatialFeature(sfe_tissue,
  "cluster",
  colGeometryName = "spotPoly"
)

Show the code
sq.pl.spatial_scatter(adata, color="leiden", palette=ListedColormap(cmap_discrete.colors[:len(np.unique(adata.obs["leiden"]))]), library_id="spatial", title="Clusters", shape=None, size=55)
plt.gca().set_axis_off()
plt.show()

Metrics such as the join count statistics can be use to analysis spatial arrangements of categorical data. In essence, the join count statistic calculates the frequency of categories among neighbours and compares this value with a theoretical distribution or permutations of the labels to get a significance score (Cliff and Ord 1981).

Here we will use an extension to multiple categories joincount.multi. The test gives information if the different clusters are in contact with itself and each other. The output shows the calculated number of contacts vs. the expected number, i.e., if the clusters were randomly assigned to the locations. The function does not report p-values, but they can be calculated at a specified significance level using the z-values.

Show the code
joincount.multi(colData(sfe_tissue)$cluster,
             colGraph(sfe_tissue, "binary"))
     Joincount Expected Variance z-value
1:1    557.000  186.832  107.930  35.631
2:2    928.000  386.413  165.885  42.050
3:3    340.000  106.948   71.591  27.544
4:4    343.000   62.437   46.257  41.252
2:1     58.000  539.232  311.201 -27.279
3:1    133.000  284.046  188.219 -11.010
3:2    188.000  408.244  243.208 -14.123
4:1     30.000  217.212  148.797 -15.347
4:2      2.000  312.187  190.246 -22.489
4:3     89.000  164.448  119.562  -6.900
Jtot   500.000 1925.369  496.929 -63.941
Show the code
sq.gr.interaction_matrix(adata, "leiden", normalized = False, connectivity_key="spatial", weights = False)
df_interactions = pd.DataFrame(adata.uns["leiden_interactions"], columns=np.unique(adata.obs["leiden"]), index=np.unique(adata.obs["leiden"]))
# add lower triangular matrix (w/o diagonal) to the dataframe and divide by 2
array_join_counts = (df_interactions + np.tril(df_interactions, k = -1).T)/2
#only print the upper triangular matrix
np.triu(array_join_counts)
array([[529.,  11.,  93.,  38.,  13.],
       [  0., 743., 167., 159.,   3.],
       [  0.,   0., 343.,  68.,  72.],
       [  0.,   0.,   0.,  91.,  11.],
       [  0.,   0.,   0.,   0., 327.]])

For example, we can see here that the calculated number of contacts of clusters 1 and 3 is far below the expected value. On the other hand all clusters are in contact with itself more often than expected at random.

Therefore, in Python we add the lower triangular matrix to the upper triangle (without the diagonal) and divide the resulting interaction matrix by 2 as the interactions in R are undirected.

Summary and Considerations

  • Lattice data refers to spatial data collected at fixed locations arranged in regular or irregular grids, contrasting with stochastic point pattern analysis.
  • Regular lattices have uniform, evenly spaced spots, while irregular lattices have variable sizes and shapes with non-uniform spacing.
  • The spatial weight matrix models spatial relationships between lattice units. Different methods exist to define the spatial weight matrix.
  • Global spatial autocorrelation measures, like Moran’s \(I\), summarize spatial correlation over the entire field, while local measures identify local clusters of similarity or dissimilarity.
  • There exist measures of spatial autocorrelation for continous and categorical data.

Appendix

Session info

Show the code
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.7.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] BiocSingular_1.16.0            BiocNeighbors_1.18.0          
 [3] dixon_0.0-9                    splancs_2.01-45               
 [5] sp_2.1-4                       tmap_3.3-4                    
 [7] scran_1.28.2                   rgeoda_0.0.11-1               
 [9] digest_0.6.37                  reshape2_1.4.4                
[11] STexampleData_1.8.0            ExperimentHub_2.8.1           
[13] AnnotationHub_3.8.0            BiocFileCache_2.8.0           
[15] dbplyr_2.3.4                   rlang_1.1.4                   
[17] spatstat_3.3-0                 spatstat.linnet_3.2-3         
[19] spatstat.model_3.3-3           rpart_4.1.24                  
[21] spatstat.explore_3.3-3         nlme_3.1-166                  
[23] spatstat.random_3.3-2          spatstat.geom_3.3-4           
[25] spatstat.univar_3.1-1          spatstat.data_3.1-4           
[27] SpatialExperiment_1.10.0       bluster_1.10.0                
[29] scater_1.28.0                  scuttle_1.10.3                
[31] SingleCellExperiment_1.22.0    SummarizedExperiment_1.30.2   
[33] Biobase_2.60.0                 GenomicRanges_1.52.1          
[35] GenomeInfoDb_1.36.4            IRanges_2.34.1                
[37] S4Vectors_0.38.2               BiocGenerics_0.46.0           
[39] MatrixGenerics_1.12.3          matrixStats_1.5.0             
[41] magrittr_2.0.3                 tidyr_1.3.1                   
[43] stringr_1.5.1                  spdep_1.3-8                   
[45] sf_1.0-19                      spData_2.3.3                  
[47] SFEData_1.2.0                  SpatialFeatureExperiment_1.2.3
[49] Voyager_1.2.7                  patchwork_1.3.0               
[51] ggplot2_3.5.1                  dplyr_1.1.4                   

loaded via a namespace (and not attached):
  [1] spatstat.sparse_3.1-0         bitops_1.0-9                 
  [3] httr_1.4.7                    RColorBrewer_1.1-3           
  [5] tools_4.3.1                   R6_2.5.1                     
  [7] HDF5Array_1.28.1              mgcv_1.9-1                   
  [9] anndata_0.7.5.6               rhdf5filters_1.12.1          
 [11] withr_3.0.2                   gridExtra_2.3                
 [13] leaflet_2.2.2                 leafem_0.2.3                 
 [15] cli_3.6.3                     labeling_0.4.3               
 [17] proxy_0.4-27                  dbscan_1.2-0                 
 [19] R.utils_2.12.3                dichromat_2.0-0.1            
 [21] scico_1.5.0                   limma_3.56.2                 
 [23] rstudioapi_0.17.1             RSQLite_2.3.9                
 [25] generics_0.1.3                crosstalk_1.2.1              
 [27] Matrix_1.5-4.1                ggbeeswarm_0.7.2             
 [29] abind_1.4-8                   R.methodsS3_1.8.2            
 [31] terra_1.8-5                   lifecycle_1.0.4              
 [33] yaml_2.3.10                   edgeR_3.42.4                 
 [35] rhdf5_2.44.0                  tmaptools_3.1-1              
 [37] grid_4.3.1                    blob_1.2.4                   
 [39] promises_1.3.2                dqrng_0.4.1                  
 [41] crayon_1.5.3                  dir.expiry_1.8.0             
 [43] lattice_0.22-6                beachmat_2.16.0              
 [45] KEGGREST_1.40.1               magick_2.8.5                 
 [47] pillar_1.10.1                 knitr_1.49                   
 [49] metapod_1.7.0                 rjson_0.2.23                 
 [51] boot_1.3-28.1                 codetools_0.2-19             
 [53] wk_0.9.4                      glue_1.8.0                   
 [55] vctrs_0.6.5                   png_0.1-8                    
 [57] gtable_0.3.6                  assertthat_0.2.1             
 [59] cachem_1.1.0                  xfun_0.50                    
 [61] S4Arrays_1.0.6                mime_0.12                    
 [63] DropletUtils_1.20.0           units_0.8-5                  
 [65] statmod_1.5.0                 interactiveDisplayBase_1.38.0
 [67] bit64_4.5.2                   filelock_1.0.3               
 [69] irlba_2.3.5.1                 vipor_0.4.7                  
 [71] KernSmooth_2.23-26            colorspace_2.1-1             
 [73] DBI_1.2.3                     zellkonverter_1.9.0          
 [75] raster_3.6-30                 tidyselect_1.2.1             
 [77] bit_4.5.0.1                   compiler_4.3.1               
 [79] curl_6.1.0                    basilisk.utils_1.12.1        
 [81] DelayedArray_0.26.7           scales_1.3.0                 
 [83] classInt_0.4-10               rappdirs_0.3.3               
 [85] goftest_1.2-3                 spatstat.utils_3.1-2         
 [87] rmarkdown_2.29                basilisk_1.11.2              
 [89] XVector_0.40.0                htmltools_0.5.8.1            
 [91] pkgconfig_2.0.3               base64enc_0.1-3              
 [93] sparseMatrixStats_1.12.2      fastmap_1.2.0                
 [95] htmlwidgets_1.6.4             shiny_1.10.0                 
 [97] DelayedMatrixStats_1.22.6     farver_2.1.2                 
 [99] jsonlite_1.8.9                BiocParallel_1.34.2          
[101] R.oo_1.27.0                   RCurl_1.98-1.16              
[103] GenomeInfoDbData_1.2.10       s2_1.1.7                     
[105] Rhdf5lib_1.22.1               munsell_0.5.1                
[107] Rcpp_1.0.13-1                 ggnewscale_0.5.0             
[109] viridis_0.6.5                 reticulate_1.40.0            
[111] stringi_1.8.4                 leafsync_0.1.0               
[113] zlibbioc_1.46.0               plyr_1.8.9                   
[115] parallel_4.3.1                ggrepel_0.9.6                
[117] deldir_2.0-4                  Biostrings_2.68.1            
[119] stars_0.6-7                   splines_4.3.1                
[121] tensor_1.5                    locfit_1.5-9.10              
[123] igraph_2.1.2                  ScaledMatrix_1.8.1           
[125] BiocVersion_3.17.1            XML_3.99-0.18                
[127] evaluate_1.0.1                BiocManager_1.30.25          
[129] httpuv_1.6.15                 purrr_1.0.2                  
[131] polyclip_1.10-7               rsvd_1.0.5                   
[133] lwgeom_0.2-14                 xtable_1.8-4                 
[135] e1071_1.7-16                  RSpectra_0.16-2              
[137] later_1.4.1                   viridisLite_0.4.2            
[139] class_7.3-22                  tibble_3.2.1                 
[141] memoise_2.0.1                 beeswarm_0.4.0               
[143] AnnotationDbi_1.62.2          cluster_2.1.4                
[145] BiocStyle_2.28.1             

©2024 The pasta authors. Content is published under Creative Commons CC-BY-4.0 License for the text and GPL-3 License for any code.

References

Anselin, Luc. 1995. “Local Indicators of Spatial AssociationLISA.” Geographical Analysis 27 (2): 93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x.
Baddeley, Adrian, Ege Rubak, and Rolf Turner. 2015. Spatial Point Patterns. 1st ed. CRC Interdisciplinary Statistics Series. CRC Press, Taylor & Francis Group.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society: Series B (Methodological) 57 (1): 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
Bivand, Roger. 2022. “R Packages for Analyzing Spatial Data: A Comparative Case Study with Areal Data.” Geographical Analysis 54 (3): 488–518. https://doi.org/10.1111/gean.12319.
Cliff, Andrew David, and J Keith Ord. 1981. Spatial Processes: Models & Applications. Pion, London.
Getis, Arthur. 2009. “Spatial Weights Matrices.” Geographical Analysis 41 (4): 404–10. https://doi.org/10.1111/j.1538-4632.2009.00768.x.
Li, Zhuoxuan, Tianjie Wang, Pentao Liu, and Yuanhua Huang. 2023. SpatialDM for Rapid Identification of Spatially Co-Expressed Ligand–Receptor and Revealing Cell–Cell Communication Patterns.” Nature Communications 14 (1): 3995. https://doi.org/10.1038/s41467-023-39608-w.
McKellar, David W., Lauren D. Walter, Leo T. Song, Madhav Mantri, Michael F. Z. Wang, Iwijn De Vlaminck, and Benjamin D. Cosgrove. 2021. “Large-Scale Integration of Single-Cell Transcriptomic Data Captures Transitional Progenitor States in Mouse Skeletal Muscle Regeneration.” Communications Biology 4 (1): 1–12. https://doi.org/10.1038/s42003-021-02810-x.
Moran, P. A. P. 1950. “Notes on Continuous Stochastic Phenomena.” Biometrika 37 (1/2): 17–23. https://doi.org/10.2307/2332142.
Moses, Lambda, Pétur Helgi Einarsson, Kayla C. Jackson, Laura Luebbert, Ali Sina Booeshaghi, Sindri Emmanúel Antonsson, Nicolas Bray, Páll Melsted, and Lior Pachter. 2023. “Voyager: Exploratory Single-Cell Genomics Data Analysis with Geospatial Statistics.” bioRxiv. https://doi.org/10.1101/2023.07.20.549945.
Palla, Giovanni, Hannah Spitzer, Michal Klein, David Fischer, Anna Christina Schaar, Louis Benedikt Kuemmerle, Sergei Rybakov, et al. 2022. “Squidpy: A Scalable Framework for Spatial Omics Analysis.” Nature Methods 19 (2): 171–78. https://doi.org/10.1038/s41592-021-01358-2.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in R. 1st ed. New York: Chapman and Hall/CRC. https://doi.org/10.1201/9780429459016.
Rey, Sergio J., and Luc Anselin. 2010. PySAL: A Python Library of Spatial Analytical Methods.” In Handbook of Applied Spatial Analysis, 175–93. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03647-7_11.
Righelli, Dario, Lukas M Weber, Helena L Crowell, Brenda Pardo, Leonardo Collado-Torres, Shila Ghazanfar, Aaron T L Lun, Stephanie C Hicks, and Davide Risso. 2022. SpatialExperiment: Infrastructure for Spatially-Resolved Transcriptomics Data in R Using Bioconductor.” Bioinformatics 38 (11): 3128–31. https://doi.org/10.1093/bioinformatics/btac299.
Ståhl, Patrik L., Fredrik Salmén, Sanja Vickovic, Anna Lundmark, José Fernández Navarro, Jens Magnusson, Stefania Giacomello, et al. 2016. “Visualization and Analysis of Gene Expression in Tissue Sections by Spatial Transcriptomics.” Science 353 (6294): 78–82. https://doi.org/10.1126/science.aaf2403.
Traag, Vincent A, Ludo Waltman, and Nees Jan Van Eck. 2019. “From Louvain to Leiden: Guaranteeing Well-Connected Communities.” Scientific Reports 9 (1): 1–12. https://doi.org/10.1038/s41598-019-41695-z.
Virshup, Isaac, Sergei Rybakov, Fabian J Theis, Philipp Angerer, and F Alexander Wolf. 2024. “Anndata: Access and Store Annotated Data Matrices.” Journal of Open Source Software 9 (101): 4371. https://doi.org/10.21105/joss.04371.
Wartenberg, Daniel. 1985. “Multivariate Spatial Correlation: A Method for Exploratory Geographical Analysis.” Geographical Analysis 17 (4): 263–83. https://doi.org/10.1111/j.1538-4632.1985.tb00849.x.
Zuur, Alain F., Elena N. Ieno, and Graham M. Smith. 2007. Analysing Ecological Data. Statistics for Biology and Health. New York: Springer.