Published

January 30, 2025

Lattice data analysis – univariate methods for HTS-based data

In this vignette we will show:

Dependencies

Show the code
source("utils.R")
theme_set(theme_light())
Show the code
import numpy as np
import scanpy as sc
import squidpy as sq
from esda.moran import Moran, Moran_Local
from esda.geary import Geary
from esda import Geary_Local, G_Local, LOSH
from esda.getisord import G
from scipy.stats import false_discovery_control
from libpysal.weights import W
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
import warnings

warnings.filterwarnings("ignore")

plt.rcParams['font.family'] = 'Futura'
df_cmap_discrete = pd.read_csv("../misc/ditto_colors.csv", index_col=0)
df_cmap_continuous = pd.read_csv("../misc/roma_colors.csv", index_col=0)
cmap_discrete = ListedColormap(df_cmap_discrete["ditto_colors"].values, "ditto")
cmap_continuous = LinearSegmentedColormap.from_list("roma", df_cmap_continuous["roma_colors"]).reversed()

#load the dataset and store as anndata object
adata = sq.datasets.visium("Visium_FFPE_Mouse_Kidney")
# normalise counts already here that both R and python have same values
adata.raw = adata.copy()
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
# save file as h5ad
adata.write("../data/adata_neighbors.h5ad")

Setup and Preprocessing

Show the code
library(SpatialExperiment)

sce <- zellkonverter::readH5AD("../data/adata_neighbors.h5ad")
colnames(reducedDim(sce, "spatial")) <- c("x", "y")
spe <- toSpatialExperiment(sce)

spatialCoords(spe) <- reducedDim(sce, "spatial")

sfe <- SpatialFeatureExperiment::toSpatialFeatureExperiment(spe)

# the counts are already normalised
assayNames(sfe) <- "logcounts"

sfe
class: SpatialFeatureExperiment 
dim: 19465 3124 
metadata(2): log1p spatial
assays(1): logcounts
rownames(19465): Xkr4 Rp1 ... Gm3376 Gm20830
rowData names(3): gene_ids feature_types genome
colnames(3124): AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 ...
  TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
colData names(4): in_tissue array_row array_col sample_id
reducedDimNames(1): spatial
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : x y
imgData names(0):

unit:
Geometries:
colGeometries: centroids (POINT) 

Graphs:
sample01: 
Show the code
# invert the spatial coordinates
adata.obsm["spatial"][:, 1] = adata.obsm["spatial"][:, 1].max() - adata.obsm["spatial"][:, 1]
adata
AnnData object with n_obs × n_vars = 3124 × 19465
    obs: 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial', 'log1p'
    obsm: 'spatial'

Given this data from 10x we choose two genes to analyse henceforth, named Slc5a12 (solute carrier family 5, member 12) and Calr (calreticulin) (10x Genomics 2021). For further information on the preprocessing of visium datasets, please refer to the Voyager 10X Visium vignette.

Here we set the arguments for the examples below.

Show the code
features <- c("Slc5a12", "Calr")
colGraphName <- "visium"
colGeometryName <- "centroids"
pointsize <- 3
Show the code
# predefine genes
features = ["Slc5a12", "Calr"]
figsize = (len(features)*5, 5)
pointsize = 25

Regular lattice and spatial weight matrix

Spot based data is collected along a regular spaced grid where all sample areas have the same size. Such a grid is also called a regular lattice. In more rigorous terms the data \(Y\) is a realisation of a random variable fixed along a lattice \(D\). The lattice \(D\) does not have to be regular but in the scope of spot based data it is. Spot-based data is generated by a defined sampling strategy, whereas point pattern data is the result of a stochastic process (Zuur, Ieno, and Smith 2007).

The lattice is composed of individual spatial units

\[D = \{A_1, A_2,...,A_n\}\]

where these units do not overlap

\[A_i \cap A_j = \emptyset \forall i \neq j\]

The data is then a random variable of the spatial unit along the lattice

\[Y_i = Y(A_i)\]

Most lattice data analysis technique build on the concept of neighbours. Therefore, the spatial relationship can be modelled with e.g. a spatial weight matrix \(W\). There are a lot of ways to define a spatial weight matrix \(W\). For example, the spatial weight matrix can be defined as a binary contiguity matrix where adjacent units are indicated by one and non-adjacent units as zero.

\[w_{ij} = \begin{cases} 1 \text{ if } A_i \text{ and } A_j \text{ are adjacent}\\ 0 \text{ otw} \end{cases}\]

other options to specify the weight matrix \(W\) are mentioned in Zuur, Ieno, and Smith (2007).

Voyager has a special function for the construction of the weight matrix in Visium data findVisiumGraph. This function defines the neigbours to be the surrounding spots.

Show the code
# there is an empty spot we have to exclude as it has no neighbours
sfe <- sfe[, colnames(sfe) != "GAGATCTGCTTGGCAT-1"]

# construct the weight matrix using the Voyager function
colGraph(sfe, "visium") <- findVisiumGraph(sfe)
colGraph(sfe, "binary") <- findVisiumGraph(sfe, style = "B")

plotColGraph(sfe,
  colGraphName = colGraphName
) + theme_void()

Show the code
sq.gr.spatial_neighbors(adata, n_neighs=6, coord_type="grid")
# remove unconnected spots
adata = adata[adata.obsp["spatial_connectivities"].sum(1).A1 > 0, :]
# calculate spatial weights
spatial_weights = W.from_sparse(adata.obsp["spatial_connectivities"])

# define a wider neighbourhood graph - result is more smoothed
sq.gr.spatial_neighbors(adata, n_neighs=50, coord_type="grid", key_added="wide_spatial")
spatial_weights_wide = W.from_sparse(adata.obsp["wide_spatial_connectivities"])

fig, ax = plt.subplots(1, 1, figsize=(7, 7), layout = "tight")
sq.pl.spatial_scatter(
    adata[adata.obsp["spatial_connectivities"].nonzero()[0], :],
    connectivity_key="spatial_connectivities",
    size=0.1,
    na_color="black",
    edges_color="black",
    edges_width=0.1,
    shape=None,
    library_id="spatial",
    ax=ax,
    fig=fig,
)

ax.set_axis_off()
fig.suptitle("Connectivity grid", size=10, y=0.9)
fig.set_dpi(200)

Global Measures

Global methods give us an overview over the entire field-of-view and summarize spatial autocorrelation metrics to a single value. The metrics are a function of the weight matrix and the variables of interest. The variables of interest can be gene expression, intensity of a marker or the area of the cell. The global measures can be seen as a weighted average of the local metric, as explained below.

In general, a global spatial autocorrelation measure has the form of a double sum over all locations \(i,j\)

\[\sum_i \sum_j f(x_i,x_j) w_{ij}\]

where \(f(x_i,x_j)\) is the measure of association between features of interest and \(w_{ij}\) scales the relationship by a spatial weight as defined in the weight matrix \(W\). If \(i\) and \(j\) are not neighbours, i.e. we assume they do not have any spatial association, the corresponding element of the weight matrix is \(0\) (i.e., \(w_{ij} = 0\)). Below we will see that the function \(f\) varies between the different spatial autocorrelation measures (Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023; Anselin 2024).

Global Moran’s \(I\) coefficient

The global Moran’s \(I\) (Moran 1950) coefficient is a measure of spatial autocorrelation, defined as:

\[I = \frac{n}{\sum_i\sum_j w_{ij}} \frac{\sum_i\sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}.\]

where \(x_i\) and \(x_j\) represent the values of the variable of interest at locations \(i\) and \(j\), \(\bar{x}\) is the mean of all \(x\) and \(w_{ij}\) is the spatial weight between the locations of \(i\) and \(j\). The expected value is close to \(0\) for large \(n\) (\(\mathbb{E}(I) = -1/(n-1)\)), whereas a value higher than this expectation indicates spatial autocorrelation. Negative values indicate negative autocorrelation. Moran’s \(I\) can be interpreted as the Pearson correlation between the value at location \(i\) and the average value of the neigbours of \(i\), (neighbours as defined in the weight matrix \(W\)) (Moran 1950; Bivand, Pebesma, and Gómez-Rubio 2013, 258–61; Cliff and Ord 1981, 21).

Show the code
resMoran <- calculateMoransI(
  sfe,
  features = features,
  colGraphName = colGraphName,
  exprs_values = "logcounts"
)

We can also use the moran.mc function to calculate the Moran’s \(I\) coefficient. This function uses a Monte Carlo simulation to calculate the p-value.

Show the code
sfe <- runUnivariate(sfe,
                     features = features,
                     colGraphName = colGraphName,
                     exprs_values = "logcounts",
                     type = "moran.mc",
                     nsim = 200)

res <- rowData(sfe)[features,]
res
DataFrame with 2 rows and 9 columns
                  gene_ids   feature_types   genome moran.mc_statistic_sample01
               <character>        <factor> <factor>                   <numeric>
Slc5a12 ENSMUSG00000041644 Gene Expression     mm10                    0.551439
Calr    ENSMUSG00000003814 Gene Expression     mm10                    0.265931
        moran.mc_parameter_sample01 moran.mc_p.value_sample01
                          <numeric>                 <numeric>
Slc5a12                         201                0.00497512
Calr                            201                0.00497512
        moran.mc_alternative_sample01 moran.mc_method_sample01
                          <character>              <character>
Slc5a12                       greater   Monte-Carlo simulati..
Calr                          greater   Monte-Carlo simulati..
                          moran.mc_res_sample01
                                         <list>
Slc5a12 -0.01113331,-0.01143513,-0.00125527,...
Calr    -0.01351057, 0.00701399,-0.00633325,...
Show the code
for feature in features:
    moran = Moran(adata[:, feature].X.toarray().astype(np.float64), spatial_weights, transformation="r", permutations=100)
    adata.uns[f"moran_{feature}"] = moran.I
    adata.uns[f"moran_{feature}_p_sim"] = moran.p_sim

for key in filter(lambda x: x.startswith("moran_"), adata.uns.keys()):
    print(f"{key}: {adata.uns[key].round(4)}")
moran_Slc5a12: 0.5514
moran_Slc5a12_p_sim: 0.0099
moran_Calr: 0.2659
moran_Calr_p_sim: 0.0099

The result of global Moran’s \(I\) for the gene Slc5a12 is 0.551, the associated p-values is 0.005. This means that gene Slc5a12 shows positive autocorrelation

The result of global Moran’s \(I\) for the gene Calr is 0.266, the associated p-values is 0.005. This means that gene Calr shows positive autocorrelation

One should note that the result is dependent on the weight matrix. Different weight matrices will give different results. To compare Moran’s \(I\) coefficients of different features, we need to use the same weight matrix.

Global’s Geary’s \(C\) coefficient

Geary’s \(C\) (Geary 1954) is a different measure of global autocorrelation and is very closely related to Moran’s \(I\). However, it focuses on spatial dissimilarity rather than similarity. Geary’s \(C\) is defined by

\[C = \frac{(n-1) \sum_i \sum_j w_{ij}(x_i-x_j)^2}{2\sum_i \sum_j w_{ij}\sum_i(x_i-\bar{x})^2}\]

where \(x_i\) and \(x_j\) represent the values of the variable of interest at locations \(i\) and \(j\), \(\bar{x}\) is the mean of all \(x\), \(w_{ij}\) is the spatial weight between the locations of \(i\) and \(j\) and \(n\) the total number of locations. The interpretation is opposite to Moran’s \(I\): a value smaller than \(1\) indicates positive autocorrelation whereas a value greater than \(1\) represents negative autocorrelation (Cliff and Ord 1981, 17; Pebesma and Bivand 2023; Fischer, Getis, et al. 2010, 255–65).

Show the code
sfe <- runUnivariate(sfe,
                     features = features,
                     colGraphName = colGraphName,
                     nsim = 200,
                     type = "geary.mc")

res <- rowData(sfe)[features,]
res
DataFrame with 2 rows and 15 columns
                  gene_ids   feature_types   genome moran.mc_statistic_sample01
               <character>        <factor> <factor>                   <numeric>
Slc5a12 ENSMUSG00000041644 Gene Expression     mm10                    0.551439
Calr    ENSMUSG00000003814 Gene Expression     mm10                    0.265931
        moran.mc_parameter_sample01 moran.mc_p.value_sample01
                          <numeric>                 <numeric>
Slc5a12                         201                0.00497512
Calr                            201                0.00497512
        moran.mc_alternative_sample01 moran.mc_method_sample01
                          <character>              <character>
Slc5a12                       greater   Monte-Carlo simulati..
Calr                          greater   Monte-Carlo simulati..
                          moran.mc_res_sample01 geary.mc_statistic_sample01
                                         <list>                   <numeric>
Slc5a12 -0.01113331,-0.01143513,-0.00125527,...                    0.445674
Calr    -0.01351057, 0.00701399,-0.00633325,...                    0.737407
        geary.mc_parameter_sample01 geary.mc_p.value_sample01
                          <numeric>                 <numeric>
Slc5a12                           1                0.00497512
Calr                              1                0.00497512
        geary.mc_alternative_sample01 geary.mc_method_sample01
                          <character>              <character>
Slc5a12                       greater   Monte-Carlo simulati..
Calr                          greater   Monte-Carlo simulati..
                 geary.mc_res_sample01
                                <list>
Slc5a12    0.97882,1.01143,1.00015,...
Calr    1.017143,0.995712,1.003311,...
Show the code
for feature in features:
    geary = Geary(adata[:, feature].X.toarray().astype(np.float64), spatial_weights, transformation="r", permutations=100)
    adata.uns[f"geary_{feature}"] = geary.C
    adata.uns[f"geary_{feature}_p_sim"] = geary.p_sim

for key in filter(lambda x: x.startswith("geary_"), adata.uns.keys()):
    print(f"{key}: {adata.uns[key].round(4)}")
geary_Slc5a12: 0.4457
geary_Slc5a12_p_sim: 0.0099
geary_Calr: 0.7374
geary_Calr_p_sim: 0.0099

The result of global Geary’s \(C\) for the gene Slc5a12 is 0.446, the associated p-values is 0.005. This means that gene Slc5a12 shows positive autocorrelation

The result of global Geary’s \(C\) for the gene Calr is 0.737, the associated p-values is 0.005. This means that gene Calr shows positive autocorrelation

Global Getis-Ord \(G\) statistic

The global \(G\) (Getis and Ord 1992) statistic is a generalisation of the local version (see below) and summarises the contributions of all pairs of values \((x_i, x_j)\) in the dataset. Formally that is

\[G(d) = \frac{\sum_{i = 1}^n \sum_{j=1}^n w_{ij}(d)x_ix_j}{\sum_{i = 1}^n \sum_{j=1}^n x_i x_j} \text{s.t } j \neq i.\]

The interpretation works by calculating a z-statistic to compare the observed with the expected \(G(d)\) statistic. If this value is positive, we refer to this as a hot spot and if it is negative as a cold spot (Getis and Ord 1992; Anselin 2024).

The global \(G(d)\) statistic is very similar to global Moran’s \(I\). The global \(G(d)\) statistic is based on the sum of the products of the data points whereas global Moran’s \(I\) is based on the sum of the covariances. Since these two approaches capture different aspects of a structure, their values will differ as well. A good approach would be to not use one statistic in isolation but rather consider both (Getis and Ord 1992).

This framework was established for binary weights but can be extended to nonbinary weights as well (J. K. Ord and Getis 1995). We will use the spdep package directly to calculate the global \(G\) statistic.

Show the code
# Get the weight matrix from sfe object
weights_neighbourhoods_binary <- colGraph(sfe, 'binary')
# Change it to binary weights
weights_neighbourhoods_binary$style <- "B" 
# Calculate the global G statistic
res <- spdep::globalG.test(x = logcounts(sfe)[features[1],], 
                    listw = weights_neighbourhoods_binary)
res 

    Getis-Ord global G statistic

data:  logcounts(sfe)[features[1], ] 
weights: weights_neighbourhoods_binary   

standard deviate = 44.785, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      2.803149e-03       1.847999e-03       4.548509e-10 
Show the code
for feature in features:
    getisord = G(adata[:, feature].X.toarray(), spatial_weights, permutations=0)
    print("Global G statistic", feature, getisord.z_norm)
    print("Global G statistic (raw)", feature, getisord.G)
    print("Global G statistic (expected)", feature, getisord.EG)
    print("p_norm", feature, getisord.p_norm)
Global G statistic Slc5a12 44.80918977439834
Global G statistic (raw) Slc5a12 0.002803149147715426
Global G statistic (expected) Slc5a12 0.0018479988627699306
p_norm Slc5a12 0.0
Global G statistic Calr 17.704129082439636
Global G statistic (raw) Calr 0.0018533756524448373
Global G statistic (expected) Calr 0.0018479988627699306
p_norm Calr 0.0

The result of global Getis-Ord \(G\) for the gene Slc5a12 is 0.0028, the z-statistic is 44.78545 the associated p-values is 0. This means that gene Slc5a12 shows a hot spot

Local measures

Unlike global measures that give an overview over the entire field of view, local measures report information about the statistic at each location (cell). There exist local analogs of Moran’s \(I\) and Geary’s \(C\) for which the global statistic can be represented as a weighted sum of the local statistics. As above, the local coefficients are based on both the spatial weights matrix and the values of the measurement of interest (Anselin 1995).

Local Moran’s \(I\) coefficient

The local Moran’s \(I\) coefficient (Anselin 1995) is a measure of spatial autocorrelation at each location of interest. It is defined as:

\[I_i = \frac{x_i - \bar{x}}{\sum_{k=1}^n(x_k-\bar{x})^2/(n-1)} \sum_{j=1}^n w_{ij}(x_j - \bar{x})\]

where the index \(i\) refers to the location for which the measure is calculated. The interpretation is analogous to the global Moran’s \(I\) where a value of \(I_i\) higher than \(\mathbb{E}(I_i) = -w_i/(n-1)\) indicates spatial autocorrelation; smaller values indicate negative autocorrelation (Anselin 1995). It is important to note that, as for the global counterpart, the value of local Moran’s \(I\) could be a result from both the high or low end of the values. Since we measure and test a large number of locations simultaneously, we need to correct for multiple testing (e.g., using the Benjamini-Hochberg procedure (Benjamini and Hochberg 1995)) (Pebesma and Bivand 2023).

Show the code
sfe <- runUnivariate(sfe,
                     features = features,
                     colGraphName = colGraphName,
                     type = "localmoran")

plotLocalResult(sfe, "localmoran",
                features = features, ncol = 2,
                colGeometryName = colGeometryName,
                divergent = TRUE, diverge_center = 0, size = pointsize)

Show the code
for feature in features:
    local_moran = Moran_Local(adata[:, feature].X.toarray().astype(np.float64), spatial_weights, transformation="r", permutations=100, seed=3407)
    adata.obs[f"local_moran_{feature}"] = local_moran.Is
    adata.obs[f"local_moran_{feature}_p"] = -np.log10(false_discovery_control(local_moran.p_z_sim*2))
    
fig, axes = plt.subplots(1, len(features), figsize=figsize, layout = "tight")
for feature, ax in zip(features, axes):
    sq.pl.spatial_scatter(
        adata,
        color=f"local_moran_{feature}",
        cmap=cmap_continuous,
        vmin=-adata.obs[f"local_moran_{feature}"].abs().max(),
        vmax=adata.obs[f"local_moran_{feature}"].abs().max(),
        vcenter=0,
        shape=None,
        library_id="spatial",
        title=f"Local Moran's Is: {feature}",
        ax=ax,
        fig=fig,
        size=pointsize,
        scalebar_kwargs={"height_fraction":0.5}
     )
    ax.set_axis_off()

Local Geary’s \(C\) coefficient

Similar to local Moran’s \(I\), there is a local Geary’s \(C\) (Anselin 1995) coefficient. It is defined as

\[C_i = \sum_{j=1}^n w_{ij}(x_i-x_j)^2\]

The interpretation is analogous to the global Geary’s \(C\) (value less than \(1\) indicates positive autocorrelation, a value greater than \(1\) highlights negative autocorrelation) (Anselin 1995).

Show the code
sfe <- runUnivariate(sfe,
                     features = features,
                     colGraphName = colGraphName,
                     type = "localC")

plotLocalResult(sfe, "localC",
                features = features, ncol = 2,
                colGeometryName = colGeometryName,
                divergent = TRUE, diverge_center = 0, size = pointsize)

Show the code
for feature in features:
    local_geary = Geary_Local(connectivity=spatial_weights, permutations=100, seed=3407)
    local_geary_result = local_geary.fit(adata[:, feature].X.toarray().astype(np.float64))
    adata.obs[f"local_geary_{feature}"] = local_geary_result.localG
    adata.obs[f"local_geary_{feature}_p"] = -np.log10(false_discovery_control(local_geary_result.p_sim))
    
fig, axes = plt.subplots(1, len(features), figsize=figsize, layout = "tight")
for feature, ax in zip(features, axes):
    sq.pl.spatial_scatter(
        adata,
        color=f"local_geary_{feature}",
        cmap=cmap_continuous,
        vmin=-adata.obs[f"local_geary_{feature}"].abs().max(),
        vmax=adata.obs[f"local_geary_{feature}"].abs().max(),
        vcenter=0,
        shape=None,
        library_id="spatial",
        title=f"Local Geary's Cs: {feature}",
        ax=ax,
        fig=fig,
        size=pointsize
    )
    ax.set_axis_off()

Local Getis-Ord statistic

The local Getis-Ord \(G_i\) (J. K. Ord and Getis 1995; Getis and Ord 1992) statistic quantifies the weighted concentration of points within a radius \(d\) and in a local region \(i\), according to:

\[G_i(d) = \frac{\sum_{j \neq i } w_{ij}(d)x_j}{\sum_{j \neq i} x_j}\]

There is a variant of this statistic, \(G_i^*(d)\), which is the same as \(G_i(d)\) except that the contribution when \(j=i\) is included in the term (Getis and Ord 1996).

The interpretation of the local Getis-Ord statistic \(G_i(d)\) is the same as for the global variant. The easiest is to calculate a z-statistic. If this value is positive, we call the region a hotspot, if it is negative, we call it a cold spot (Anselin 2024).

Show the code
sfe <- runUnivariate(sfe,
                     features = features,
                     include_self = FALSE, # this would specify G_i^*(d),
                     colGraphName = "binary",
                     type = "localG")

plotLocalResult(sfe, "localG",
                features = features,
                ncol = 2,
                colGeometryName = colGeometryName,
                divergent = TRUE, diverge_center = 0, size = pointsize)

Show the code
for feature in features:
    local_getis_ord = G_Local(adata[:, feature].X.toarray().astype(np.float64), spatial_weights, transform="B", star=False, permutations=1000, seed=3407)
    adata.obs[f"local_getis_ord_{feature}"] = local_getis_ord.Zs
    adata.obs[f"local_getis_ord_{feature}_p"] = -np.log10(false_discovery_control(local_getis_ord.p_norm))
    
fig, axes = plt.subplots(1, len(features), figsize=figsize, layout = "tight")
for feature, ax in zip(features, axes):
    sq.pl.spatial_scatter(
        adata,
        color=f"local_getis_ord_{feature}",
        cmap=cmap_continuous,
        vmin=-adata.obs[f"local_getis_ord_{feature}"].abs().max(),
        vmax=adata.obs[f"local_getis_ord_{feature}"].abs().max(),
        vcenter=0,
        shape=None,
        library_id="spatial",
        title=f"Local Getis-Ord $G_i$: {feature}",
        ax=ax,
        fig=fig,
        size=pointsize
    )
    ax.set_axis_off()

The results above give an estimate of the local Getis-Ord statistic for each cell, but no significance value. This can be done by using a permutation approach using the localG_perm argument.

Positive values indicate clustering of high values, i.e., hot spots, and negative values indicate clustering of low values, i.e., cold spots. The method does not detect outlier values because, unlike in local Moran’s \(I\), there is no cross-product between \(i\) and \(j\). But unlike local Moran’s \(I\), we know the type of interaction (high-high or low-low) between \(i\) and \(j\) (Anselin, Syabri, and Kho 2009; Anselin 2024).

Local spatial heterosceadiscity (LOSH)

The local spatial heteroscedasticity (LOSH) (J. Keith Ord and Getis 2012) is a measure of spatial autocorrelation that is based on the variance of the local neighbourhood. Unlike the other measures, this method does not assume homoscedastic variance over the whole tissue region. LOSH is defined as:

\[H_i(d) = \frac{\sum_j w_{ij}(d)|e_j(d)|^a}{\sum_j w_{ij}(d)}\]

where \(e_j(d) = x_j - \bar{x}_i(d), j \in N(i,d)\) are the local residuals that are subtracted from the local mean. The power \(a\) modulates the interpretation of the residuals (\(a=1\): residuals are interpreted as absolute deviations from the local mean; \(a=2\): residuals are interpreted as deviations from the local variance) (J. Keith Ord and Getis 2012).

The LOSH should be interpreted in combination with the local Getis-Ord \(G_i^*\) statistic. The \(G_i^*\) quantifies the local mean of the variable of interest, while \(H_i\) quantifies the local variance. This table provided by Ord and Getis (J. Keith Ord and Getis 2012) summarizes the interpretation of the combination of \(G_i^*\) and \(H_i\).

high \(H_i\) low \(H_i\)
large \(\|G_i^*\|\) A hot spot with heterogeneous local conditions A hot spot with similar surrounding areas; the map would indicate whether the affected region is larger than the single “cell”
small \(\|G_i^*\|\) Heterogeneous local conditions but at a low average level (an unlikely event) Homogeneous local conditions and a low average level
Show the code
# run localG with permutation test
sfe <- runUnivariate(sfe,
                     features = features,
                     colGraphName = colGraphName,
                     type = "LOSH")


plotLocalResult(sfe, "LOSH",
                features = features,
                colGeometryName = colGeometryName,
                size = pointsize)

Show the code
for feature in features:
    losh = LOSH(connectivity=spatial_weights,  inference="chi-square")
    losh_result = losh.fit(adata[:, feature].X.toarray().astype(np.float64))
    adata.obs[f"losh_{feature}"] = losh_result.Hi
    
fig, axes = plt.subplots(1, len(features), figsize=figsize, layout = "tight")
for feature, ax in zip(features, axes):
    sq.pl.spatial_scatter(
        adata,
        color=f"losh_{feature}",
        cmap="Blues",
        vmin=0,
        vmax=adata.obs[f"losh_{feature}"].abs().max(),
        shape=None,
        library_id="spatial",
        title=f"Local spatial heteroscedasticity: {feature}",
        ax=ax,
        fig=fig,
        size=pointsize
    )
    ax.set_axis_off()

A note of caution

The local methods presented above should always be interpreted with care, since we face the problem of multiple testing when calculating them for each cell. Moreover, the presented methods should mainly serve as exploratory measures to identify interesting regions in the data. Multiple processes can lead to the same pattern, thus from identifying the pattern we cannot infer the underlying process. Indication of clustering does not explain why this occurs. On the one hand, clustering can be the result of spatial interaction between the variables of interest. We have an accumulation of a gene of interest in one region of the tissue. On the other hand clustering can be the result spatial heterogeneity, when local similarity is created by structural heterogeneity in the tissue, e.g., that cells with uniform expression of a gene of interest are grouped together which then creates the apparent clustering of the gene expression measurement (Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023; Anselin 2024).

Appendix

Session info

Show the code
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.7.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods  
[8] base     

other attached packages:
 [1] dixon_0.0-9                    splancs_2.01-45               
 [3] sp_2.1-4                       bluster_1.10.0                
 [5] magrittr_2.0.3                 stringr_1.5.1                 
 [7] spdep_1.3-8                    spData_2.3.3                  
 [9] tmap_3.3-4                     scater_1.28.0                 
[11] scran_1.28.2                   scuttle_1.10.3                
[13] SFEData_1.2.0                  SpatialFeatureExperiment_1.2.3
[15] Voyager_1.2.7                  rgeoda_0.0.11-1               
[17] digest_0.6.37                  sf_1.0-19                     
[19] reshape2_1.4.4                 patchwork_1.3.0               
[21] STexampleData_1.8.0            ExperimentHub_2.8.1           
[23] AnnotationHub_3.8.0            BiocFileCache_2.8.0           
[25] dbplyr_2.3.4                   rlang_1.1.4                   
[27] ggplot2_3.5.1                  dplyr_1.1.4                   
[29] spatstat_3.3-0                 spatstat.linnet_3.2-3         
[31] spatstat.model_3.3-3           rpart_4.1.24                  
[33] spatstat.explore_3.3-3         nlme_3.1-166                  
[35] spatstat.random_3.3-2          spatstat.geom_3.3-4           
[37] spatstat.univar_3.1-1          spatstat.data_3.1-4           
[39] SpatialExperiment_1.10.0       SingleCellExperiment_1.22.0   
[41] SummarizedExperiment_1.30.2    Biobase_2.60.0                
[43] GenomicRanges_1.52.1           GenomeInfoDb_1.36.4           
[45] IRanges_2.34.1                 S4Vectors_0.38.2              
[47] BiocGenerics_0.46.0            MatrixGenerics_1.12.3         
[49] matrixStats_1.5.0             

loaded via a namespace (and not attached):
  [1] spatstat.sparse_3.1-0         bitops_1.0-9                 
  [3] httr_1.4.7                    RColorBrewer_1.1-3           
  [5] tools_4.3.1                   R6_2.5.1                     
  [7] HDF5Array_1.28.1              mgcv_1.9-1                   
  [9] rhdf5filters_1.12.1           withr_3.0.2                  
 [11] gridExtra_2.3                 leaflet_2.2.2                
 [13] leafem_0.2.3                  cli_3.6.3                    
 [15] labeling_0.4.3                proxy_0.4-27                 
 [17] dbscan_1.1-11                 R.utils_2.12.3               
 [19] dichromat_2.0-0.1             scico_1.5.0                  
 [21] limma_3.56.2                  rstudioapi_0.15.0            
 [23] RSQLite_2.3.9                 generics_0.1.3               
 [25] crosstalk_1.2.1               Matrix_1.5-4.1               
 [27] ggbeeswarm_0.7.2              abind_1.4-8                  
 [29] R.methodsS3_1.8.2             terra_1.8-5                  
 [31] lifecycle_1.0.4               yaml_2.3.10                  
 [33] edgeR_3.42.4                  rhdf5_2.44.0                 
 [35] tmaptools_3.1-1               grid_4.3.1                   
 [37] blob_1.2.4                    promises_1.3.2               
 [39] dqrng_0.4.1                   crayon_1.5.3                 
 [41] dir.expiry_1.8.0              lattice_0.22-6               
 [43] beachmat_2.16.0               KEGGREST_1.40.1              
 [45] magick_2.8.5                  pillar_1.10.1                
 [47] knitr_1.49                    metapod_1.7.0                
 [49] rjson_0.2.23                  boot_1.3-28.1                
 [51] codetools_0.2-19              wk_0.9.4                     
 [53] glue_1.8.0                    vctrs_0.6.5                  
 [55] png_0.1-8                     gtable_0.3.6                 
 [57] cachem_1.1.0                  xfun_0.50                    
 [59] S4Arrays_1.0.6                mime_0.12                    
 [61] DropletUtils_1.20.0           units_0.8-5                  
 [63] statmod_1.5.0                 interactiveDisplayBase_1.38.0
 [65] bit64_4.5.2                   filelock_1.0.3               
 [67] irlba_2.3.5.1                 vipor_0.4.7                  
 [69] KernSmooth_2.23-26            colorspace_2.1-1             
 [71] DBI_1.2.3                     zellkonverter_1.9.0          
 [73] raster_3.6-30                 tidyselect_1.2.1             
 [75] bit_4.5.0.1                   compiler_4.3.1               
 [77] curl_6.1.0                    BiocNeighbors_1.18.0         
 [79] basilisk.utils_1.12.1         DelayedArray_0.26.7          
 [81] scales_1.3.0                  classInt_0.4-10              
 [83] rappdirs_0.3.3                goftest_1.2-3                
 [85] spatstat.utils_3.1-2          rmarkdown_2.29               
 [87] basilisk_1.11.2               XVector_0.40.0               
 [89] htmltools_0.5.8.1             pkgconfig_2.0.3              
 [91] base64enc_0.1-3               sparseMatrixStats_1.12.2     
 [93] fastmap_1.2.0                 htmlwidgets_1.6.4            
 [95] shiny_1.10.0                  DelayedMatrixStats_1.22.6    
 [97] farver_2.1.2                  jsonlite_1.8.9               
 [99] BiocParallel_1.34.2           R.oo_1.27.0                  
[101] BiocSingular_1.16.0           RCurl_1.98-1.16              
[103] GenomeInfoDbData_1.2.10       s2_1.1.7                     
[105] Rhdf5lib_1.22.1               munsell_0.5.1                
[107] Rcpp_1.0.13-1                 ggnewscale_0.5.0             
[109] viridis_0.6.5                 reticulate_1.40.0            
[111] stringi_1.8.4                 leafsync_0.1.0               
[113] zlibbioc_1.46.0               plyr_1.8.9                   
[115] parallel_4.3.1                ggrepel_0.9.6                
[117] deldir_2.0-4                  Biostrings_2.68.1            
[119] stars_0.6-7                   splines_4.3.1                
[121] tensor_1.5                    locfit_1.5-9.10              
[123] igraph_2.1.2                  ScaledMatrix_1.8.1           
[125] BiocVersion_3.17.1            XML_3.99-0.18                
[127] evaluate_1.0.1                renv_1.0.11                  
[129] BiocManager_1.30.25           httpuv_1.6.15                
[131] purrr_1.0.2                   polyclip_1.10-7              
[133] rsvd_1.0.5                    lwgeom_0.2-14                
[135] xtable_1.8-4                  e1071_1.7-16                 
[137] RSpectra_0.16-2               later_1.4.1                  
[139] viridisLite_0.4.2             class_7.3-22                 
[141] tibble_3.2.1                  memoise_2.0.1                
[143] beeswarm_0.4.0                AnnotationDbi_1.62.2         
[145] cluster_2.1.4                 BiocStyle_2.28.1             

©2024 The pasta authors. Content is published under Creative Commons CC-BY-4.0 License for the text and GPL-3 License for any code.

References

10x Genomics. 2021. “Adult Mouse Kidney (FFPE), Spatial Gene Expression Dataset Analyzed Using Space Ranger 1.3.0.”
Anselin, Luc. 1995. “Local Indicators of Spatial AssociationLISA.” Geographical Analysis 27 (2): 93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x.
———. 2024. An Introduction to Spatial Data Science with GeoDa: Volume 1: Exploring Spatial Data. CRC Press.
Anselin, Luc, Ibnu Syabri, and Youngihn Kho. 2009. GeoDa: An Introduction to Spatial Data Analysis.” In Handbook of Applied Spatial Analysis: Software Tools, Methods and Applications, 73–89. Springer.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society: Series B (Methodological) 57 (1): 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
Bivand, Roger S., Edzer Pebesma, and Virgilio Gómez-Rubio. 2013. Applied Spatial Data Analysis with R, Second Edition. Springer, NY.
Cliff, Andrew David, and J Keith Ord. 1981. Spatial Processes: Models & Applications. Pion, London.
Fischer, Manfred M, Arthur Getis, et al. 2010. Handbook of Applied Spatial Analysis: Software Tools, Methods and Applications. Springer.
Geary, R. C. 1954. “The Contiguity Ratio and Statistical Mapping.” The Incorporated Statistician 5 (3): 115–46. https://doi.org/10.2307/2986645.
Getis, Arthur, and J. K. Ord. 1992. “The Analysis of Spatial Association by Use of Distance Statistics.” Geographical Analysis 24 (3): 189–206. https://doi.org/10.1111/j.1538-4632.1992.tb00261.x.
Getis, Arthur, and JK Ord. 1996. “Local Spatial Statistics: An Overview. Spatial Analysis: Modeling in a GIS Environment. Longley, P., and M. Batty.”
Moran, P. A. P. 1950. “Notes on Continuous Stochastic Phenomena.” Biometrika 37 (1/2): 17–23. https://doi.org/10.2307/2332142.
Moses, Lambda, Pétur Helgi Einarsson, Kayla C. Jackson, Laura Luebbert, Ali Sina Booeshaghi, Sindri Emmanúel Antonsson, Nicolas Bray, Páll Melsted, and Lior Pachter. 2023. “Voyager: Exploratory Single-Cell Genomics Data Analysis with Geospatial Statistics.” bioRxiv. https://doi.org/10.1101/2023.07.20.549945.
Ord, J. Keith, and Arthur Getis. 2012. “Local Spatial Heteroscedasticity (LOSH).” The Annals of Regional Science 48 (2): 529–39. https://doi.org/10.1007/s00168-011-0492-y.
Ord, J. K., and Arthur Getis. 1995. “Local Spatial Autocorrelation Statistics: Distributional Issues and an Application.” Geographical Analysis 27 (4): 286–306. https://doi.org/10.1111/j.1538-4632.1995.tb00912.x.
Palla, Giovanni, Hannah Spitzer, Michal Klein, David Fischer, Anna Christina Schaar, Louis Benedikt Kuemmerle, Sergei Rybakov, et al. 2022. “Squidpy: A Scalable Framework for Spatial Omics Analysis.” Nature Methods 19 (2): 171–78. https://doi.org/10.1038/s41592-021-01358-2.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in R. 1st ed. New York: Chapman and Hall/CRC. https://doi.org/10.1201/9780429459016.
Rey, Sergio J., and Luc Anselin. 2010. PySAL: A Python Library of Spatial Analytical Methods.” In Handbook of Applied Spatial Analysis, 175–93. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03647-7_11.
Virshup, Isaac, Sergei Rybakov, Fabian J Theis, Philipp Angerer, and F Alexander Wolf. 2024. “Anndata: Access and Store Annotated Data Matrices.” Journal of Open Source Software 9 (101): 4371. https://doi.org/10.21105/joss.04371.
Zuur, Alain F., Elena N. Ieno, and Graham M. Smith. 2007. Analysing Ecological Data. Statistics for Biology and Health. New York: Springer.