Published: January 30, 2025

Lattice data analysis – univariate methods for imaging-based data

In this vignette we will show:

- how to represent segmented cells as an irregular lattice and construct spatial weight matrices
- global univariate measures of spatial autocorrelation: Moran’s \(I\), Geary’s \(C\) and the Getis-Ord \(G\) statistic
- local univariate measures: local Moran’s \(I\), local Geary’s \(C\), the local Getis-Ord statistic and LOSH

Dependencies

Show the code
source("utils.R")
theme_set(theme_minimal())
Show the code
import numpy as np
import scanpy as sc
import squidpy as sq
from esda.moran import Moran, Moran_Local
from esda.geary import Geary
from esda import Geary_Local, G_Local, LOSH
from esda.getisord import G
from scipy.stats import false_discovery_control
from libpysal.weights import W, KNN
from libpysal.cg import KDTree
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
import warnings

warnings.filterwarnings("ignore")

plt.rcParams['font.family'] = 'Futura'
df_cmap_discrete = pd.read_csv("../misc/ditto_colors.csv", index_col=0)
df_cmap_continuous = pd.read_csv("../misc/roma_colors.csv", index_col=0)
cmap_discrete = ListedColormap(df_cmap_discrete["ditto_colors"].values, "ditto")
cmap_continuous = LinearSegmentedColormap.from_list("roma", df_cmap_continuous["roma_colors"]).reversed()

Until now, we have considered the cells as being represented by a point pattern. However, since cells have a shape and an area, this may be an oversimplification in some cases. Alternatively, we can rely on the segmentations of individual cells that are available for various datasets. The outline of each cell is then represented by a polygon, and the collection of all cells can be seen as an irregular lattice. Unlike a regular lattice (e.g., spot-based spatial transcriptomics data), the sample areas in an irregular lattice can have different sizes and are not necessarily regularly distributed over the sample space.

For this representation of the cells we will rely on the SpatialFeatureExperiment package. For preprocessing of the dataset and code examples we refer the reader to the vignette of the Voyager package (Moses et al. 2023). The Voyager package provides wrapper functions around the spdep package (Pebesma and Bivand 2023) that work directly on the SpatialFeatureExperiment object.

Setup and Preprocessing

Show the code
#taken from https://pachterlab.github.io/voyager/articles/vig4_cosmx.html
(sfe <- HeNSCLCData())
class: SpatialFeatureExperiment 
dim: 980 100290 
metadata(0):
assays(1): counts
rownames(980): AATK ABL1 ... NegPrb22 NegPrb23
rowData names(3): means vars cv2
colnames(100290): 1_1 1_2 ... 30_4759 30_4760
colData names(17): Area AspectRatio ... nCounts nGenes
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : CenterX_global_px CenterY_global_px
imgData names(1): sample_id

unit: full_res_image_pixels
Geometries:
colGeometries: centroids (POINT), cellSeg (POLYGON) 

Graphs:
sample01: 
Show the code
# Empty cells
colData(sfe)$is_empty <- colData(sfe)$nCounts < 1
# Select, sum negative control probes
(neg_inds <- str_detect(rownames(sfe), "^NegPrb")) %>% sum
[1] 20
Show the code
colData(sfe)$prop_neg <- colSums(counts(sfe)[neg_inds,])/colData(sfe)$nCounts
# Remove low quality cells
sfe <- sfe[,!sfe$is_empty & sfe$prop_neg < 0.1]
# Mark negative control probes
rowData(sfe)$is_neg <- neg_inds
# Log-normalise counts
sfe <- logNormCounts(sfe)

# save for python
colData(sfe)$CenterX_global_px <- spatialCoords(sfe)[,1]
colData(sfe)$CenterY_global_px <- spatialCoords(sfe)[,2]

For some examples we will show a subset of the tissue.

Show the code
bbox_use <- st_as_sfc(st_bbox(c(xmin = 3200, xmax = 16800, ymin = 155200, ymax = 166200)))
Show the code
sfe <- sfe[,st_intersects(colGeometries(sfe)$centroids, bbox_use, sparse = FALSE)]
ann <-  zellkonverter::SCE2AnnData(sfe, X_name = "counts")
anndata::write_h5ad(ann, "../data/imaging_sfe.h5ad")
Show the code
adata = sc.read_h5ad("../data/imaging_sfe.h5ad")
adata.obsm["spatial"] = np.column_stack([adata.obs["CenterX_global_px"], adata.obs["CenterY_global_px"]])
# normalise counts
adata.raw = adata.copy()
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
# invert the spatial coordinates
adata.obsm["spatial"][:, 1] = adata.obsm["spatial"][:, 1].max() - adata.obsm["spatial"][:, 1]
adata
AnnData object with n_obs × n_vars = 27205 × 980
    obs: 'Area', 'AspectRatio', 'Width', 'Height', 'Mean.MembraneStain', 'Max.MembraneStain', 'Mean.PanCK', 'Max.PanCK', 'Mean.CD45', 'Max.CD45', 'Mean.CD3', 'Max.CD3', 'Mean.DAPI', 'Max.DAPI', 'sample_id', 'nCounts', 'nGenes', 'is_empty', 'prop_neg', 'sizeFactor', 'CenterX_global_px', 'CenterY_global_px'
    var: 'means', 'vars', 'cv2', 'is_neg'
    uns: 'X_name', 'log1p'
    obsm: 'spatial'
    layers: 'logcounts'

Here we set the arguments for the examples below. We will continue with the two features KRT17 (basal cells) and TAGLN (smooth muscle cells) and specify a \(k\)-nearest neighbour weight matrix with \(k=5\).

Show the code
features <- c("KRT17", "TAGLN")
colGraphName <- "knn5"
colGeometryName <- "centroids"
pointsize <- 1
Show the code
# predefine genes
features = ["KRT17", "TAGLN"]
figsize = (len(features)*10, 10)
pointsize = 30
Show the code
plotSpatialFeature(sfe, features,
                   colGeometryName = "centroids", 
                   nrow = 2, size = 2, scattermore = FALSE) +
  theme_void()

Show the code
fig, axes = plt.subplots(len(features), 1, figsize=(10, 15), layout = "tight")
for feature, ax in zip(features, axes):
    sq.pl.spatial_scatter(adata, color=feature, shape=None, size=10, cmap="Blues", ax=ax, fig=fig, use_raw=False)
    ax.set_axis_off()

Lattice data

A lattice consists of individual spatial units \(D = \{A_1, A_2, \dots, A_n\}\) where the units do not overlap. The data is then a realisation of a random variable along the lattice, \(Y_i = Y(A_i)\) (Zuur, Ieno, and Smith 2007). A lattice is irregular if the units have variable size and are not spaced regularly, as is the case for cells in a tissue.

More details about lattices can be found here.

Spatial weight matrix

One of the challenges when working with (irregular) lattice data is the construction of a neighbourhood graph (Pebesma and Bivand 2023). The main question is what to consider as neighbours, as this choice affects downstream analyses. Various methods exist to define neighbours, such as contiguity-based neighbours (neighbours in direct contact), graph-based neighbours (e.g., \(k\)-nearest neighbours), distance-based neighbours or higher-order neighbours (Getis 2009; Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023). The documentation of the spdep package provides an overview of the different methods (R. Bivand 2022).

We first consider contiguity-based neighbours. As cell segmentation is notoriously imperfect, we add a snap value: all cells that are at most 20 pixels apart are considered contiguous (Pebesma and Bivand 2023; Wang 2019).

Show the code
colGraph(sfe, "poly2nb") <-
  findSpatialNeighbors(sfe,
    type = "cellSeg",
    method = "poly2nb", # wraps the spdep function with the same name
    style = "W",
    snap = 20 # all cells less than this distance apart are considered contiguous
  )

p1 <- plotColGraph(sfe,
  colGraphName = "poly2nb",
  colGeometryName = "cellSeg",
  bbox =  c(xmin = 3500, xmax = 10000, ymin = 157200, ymax = 162200)
) + theme_void()

Alternatively, we can use a \(k\)-nearest neighbours approach. The choice of the number \(k\) is somewhat arbitrary.

Show the code
colGraph(sfe, "knn5") <-
  findSpatialNeighbors(sfe,
    method = "knearneigh", # wraps the spdep function with the same name
    k = 5,
    zero.policy = TRUE
  )

p2 <- plotColGraph(sfe,
  colGraphName = "knn5",
  colGeometryName = "cellSeg",
  bbox = c(xmin = 3500, xmax = 10000, ymin = 157200, ymax = 162200)
) + theme_void()

# calculate a binary nearest-neighbour weight matrix too
colGraph(sfe, "binary") <-
  findSpatialNeighbors(sfe,
    method = "knearneigh", # wraps the spdep function with the same name
    k = 5,
    zero.policy = TRUE,
    style = "B"
  )
Show the code
spatial_weights = KNN(KDTree(adata.obsm['spatial']), 5)
adata.obsp['spatial_connectivities'] = spatial_weights.sparse

The graphs below show noticeable differences. In the contiguous neighbour graph on the left (neighbours in direct contact), we can see the formation of distinct patches that are not connected to the rest of the tissue. In addition, some cells do not have any direct neighbours. In contrast, the \(k\)-nearest neighbours (kNN) graph on the right reveals that these patches tend to be connected to the rest of the structure.

Show the code
p1 + p2

Show the code
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
sq.pl.spatial_scatter(
    adata[adata.obsp["spatial_connectivities"].nonzero()[0], :],
    connectivity_key="spatial_connectivities",
    size=0.1,
    na_color="black",
    edges_color="black",
    edges_width=0.1,
    shape=None,
    library_id="spatial",
    ax=ax,
    fig=fig,
    crop_coord = (3500, 3999, 10000, 8999)
)

Global Measures

Global methods give us an overview of the entire field of view and summarise spatial autocorrelation in a single value. The metrics are a function of the weight matrix and the variable of interest, which can be the expression of a gene, the intensity of a marker or the area of a cell. The global measures can be seen as weighted averages of the corresponding local metrics, as explained below.

In general, a global spatial autocorrelation measure has the form of a double sum over all locations \(i,j\)

\[\sum_i \sum_j f(x_i,x_j) w_{ij}\]

where \(f(x_i,x_j)\) is the measure of association between features of interest and \(w_{ij}\) scales the relationship by a spatial weight as defined in the weight matrix \(W\). If \(i\) and \(j\) are not neighbours, i.e. we assume they do not have any spatial association, the corresponding element of the weight matrix is \(0\) (i.e., \(w_{ij} = 0\)). Below we will see that the function \(f\) varies between the different spatial autocorrelation measures (Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023; Anselin 2024).
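
To make this structure concrete, the double sum can be written as a quadratic form in the weight matrix. The following minimal numpy sketch (made-up values and a binary adjacency matrix, purely for illustration) uses \(f(x_i, x_j) = x_i x_j\); the measures below differ only in their choice of \(f\) and in how the sum is normalised.

```python
import numpy as np

# toy example: 4 locations on a line with binary "adjacent" weights (made up for illustration)
x = np.array([1.0, 2.0, 4.0, 3.0])
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# generic double sum: sum_i sum_j f(x_i, x_j) * w_ij, here with f(x_i, x_j) = x_i * x_j
double_sum = sum(x[i] * x[j] * W[i, j] for i in range(len(x)) for j in range(len(x)))

# the same quantity expressed as a quadratic form
assert np.isclose(double_sum, x @ W @ x)
print(double_sum)
```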

Global Moran’s \(I\) coefficient

The global Moran’s \(I\) (Moran 1950) coefficient is a measure of spatial autocorrelation, defined as:

\[I = \frac{n}{\sum_i\sum_j w_{ij}} \frac{\sum_i\sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}.\]

where \(x_i\) and \(x_j\) represent the values of the variable of interest at locations \(i\) and \(j\), \(\bar{x}\) is the mean of all \(x\) and \(w_{ij}\) is the spatial weight between locations \(i\) and \(j\). The expected value is close to \(0\) for large \(n\) (\(\mathbb{E}(I) = -1/(n-1)\)); values above this expectation indicate positive spatial autocorrelation, values below it indicate negative autocorrelation. Moran’s \(I\) can be interpreted as the Pearson correlation between the value at location \(i\) and the average value of the neighbours of \(i\) (neighbours as defined in the weight matrix \(W\)) (Moran 1950; R. S. Bivand, Pebesma, and Gómez-Rubio 2013, 258–61; Cliff and Ord 1981, 21).
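
To make the definition concrete, the coefficient can be transcribed directly into numpy. This is a minimal sketch on made-up data (a smooth gradient along a line with binary adjacency weights), not the implementation used by calculateMoransI or esda.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I for values x and a (dense) spatial weight matrix W."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()                      # deviations from the mean
    num = np.sum(W * np.outer(z, z))      # sum_i sum_j w_ij (x_i - xbar)(x_j - xbar)
    return len(x) / W.sum() * num / np.sum(z ** 2)

# toy data: a smooth gradient on a line, binary adjacency weights (made up)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
W = (np.abs(np.subtract.outer(np.arange(5), np.arange(5))) == 1).astype(float)
print(morans_i(x, W))  # 0.5 > E(I): neighbouring values are similar
```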

Show the code
resMoran <- calculateMoransI(
  sfe,
  features = features,
  colGraphName = colGraphName,
  exprs_values = "logcounts"
)

We can also use the moran.mc function to calculate Moran’s \(I\); it uses a Monte Carlo permutation approach to obtain a p-value.
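
The idea behind the Monte Carlo test can be sketched in a few lines of numpy: permute the values across locations, recompute the statistic for each permutation and compare the observed value against this null distribution. The helper function and toy data below are illustrative assumptions, not the moran.mc implementation.

```python
import numpy as np

def morans_i(x, W):
    z = x - x.mean()
    return len(x) / W.sum() * np.sum(W * np.outer(z, z)) / np.sum(z ** 2)

# made-up toy data: a gradient on a line with binary adjacency weights
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
W = (np.abs(np.subtract.outer(np.arange(5), np.arange(5))) == 1).astype(float)

rng = np.random.default_rng(0)
observed = morans_i(x, W)
# Monte Carlo: recompute the statistic on random permutations of the values
sims = np.array([morans_i(rng.permutation(x), W) for _ in range(200)])
# one-sided p-value ("greater"), counting the observed value among the simulations
p_value = (np.sum(sims >= observed) + 1) / (len(sims) + 1)
print(observed, p_value)
```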

Show the code
sfe <- runUnivariate(sfe,
                     features = features,
                     colGraphName = colGraphName,
                     exprs_values = "logcounts",
                     type = "moran.mc",
                     nsim = 200)

res <- rowData(sfe)[features,]
res
DataFrame with 2 rows and 10 columns
          means      vars       cv2    is_neg moran.mc_statistic_sample01
      <numeric> <numeric> <numeric> <logical>                   <numeric>
KRT17  1.378333  14.67222   7.72303     FALSE                    0.704628
TAGLN  0.714079   3.76205   7.37788     FALSE                    0.244683
      moran.mc_parameter_sample01 moran.mc_p.value_sample01
                        <numeric>                 <numeric>
KRT17                         201                0.00497512
TAGLN                         201                0.00497512
      moran.mc_alternative_sample01 moran.mc_method_sample01
                        <character>              <character>
KRT17                       greater   Monte-Carlo simulati..
TAGLN                       greater   Monte-Carlo simulati..
                        moran.mc_res_sample01
                                       <list>
KRT17 -0.00438618, 0.00266151,-0.00462597,...
TAGLN  0.00457890, 0.00114938,-0.00181819,...
Show the code
for feature in features:
    moran = Moran(adata[:, feature].X.toarray().astype(np.float64), spatial_weights, transformation="r", permutations=100)
    adata.uns[f"moran_{feature}"] = moran.I
    adata.uns[f"moran_{feature}_p_sim"] = moran.p_sim

for key in filter(lambda x: x.startswith("moran_"), adata.uns.keys()):
    print(f"{key}: {adata.uns[key].round(4)}")
moran_KRT17: 0.7045
moran_KRT17_p_sim: 0.0099
moran_TAGLN: 0.2447
moran_TAGLN_p_sim: 0.0099

The global Moran’s \(I\) for the gene KRT17 is 0.705 with an associated p-value of 0.005, i.e., KRT17 shows positive spatial autocorrelation.

The global Moran’s \(I\) for the gene TAGLN is 0.245 with an associated p-value of 0.005, i.e., TAGLN also shows positive spatial autocorrelation, albeit weaker.

One should note that the result is dependent on the weight matrix. Different weight matrices will give different results. To compare Moran’s \(I\) coefficients of different features, we need to use the same weight matrix.
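
As an illustration of this dependence, one could recompute Moran’s \(I\) for KRT17 on two different kNN graphs. The sketch below reuses the adata object from above and adds a second graph with \(k=15\); this value is an arbitrary choice purely for comparison.

```python
import numpy as np
from esda.moran import Moran
from libpysal.weights import KNN
from libpysal.cg import KDTree

# two weight matrices with a different number of neighbours
w_k5 = KNN(KDTree(adata.obsm["spatial"]), 5)
w_k15 = KNN(KDTree(adata.obsm["spatial"]), 15)

x = adata[:, "KRT17"].X.toarray().astype(np.float64)
for label, w in [("k=5", w_k5), ("k=15", w_k15)]:
    # the coefficient changes with the weight matrix
    print(label, Moran(x, w, transformation="r", permutations=100).I)
```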

Global Geary’s \(C\) coefficient

Geary’s \(C\) (Geary 1954) is a different measure of global autocorrelation and is very closely related to Moran’s \(I\). However, it focuses on spatial dissimilarity rather than similarity. Geary’s \(C\) is defined by

\[C = \frac{(n-1) \sum_i \sum_j w_{ij}(x_i-x_j)^2}{2\sum_i \sum_j w_{ij}\sum_i(x_i-\bar{x})^2}\]

where \(x_i\) and \(x_j\) represent the values of the variable of interest at locations \(i\) and \(j\), \(\bar{x}\) is the mean of all \(x\), \(w_{ij}\) is the spatial weight between locations \(i\) and \(j\) and \(n\) is the total number of locations. The interpretation is opposite to Moran’s \(I\): a value smaller than \(1\) indicates positive autocorrelation whereas a value greater than \(1\) represents negative autocorrelation (Cliff and Ord 1981, 17; Pebesma and Bivand 2023; Fischer, Getis, et al. 2010, 255–65).
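
As for Moran’s \(I\), the definition translates directly into numpy. The following is a minimal sketch on the same made-up toy data (a smooth gradient, so we expect \(C < 1\)), not the implementation used by geary.mc or esda.

```python
import numpy as np

def gearys_c(x, W):
    """Global Geary's C for values x and a (dense) spatial weight matrix W."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    num = (n - 1) * np.sum(W * np.subtract.outer(x, x) ** 2)   # squared differences of neighbours
    den = 2 * W.sum() * np.sum((x - x.mean()) ** 2)
    return num / den

# made-up toy data: a smooth gradient on a line, binary adjacency weights
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
W = (np.abs(np.subtract.outer(np.arange(5), np.arange(5))) == 1).astype(float)
print(gearys_c(x, W))  # < 1: positive spatial autocorrelation
```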

Show the code
sfe <- runUnivariate(sfe,
                     features = features,
                     colGraphName = colGraphName,
                     nsim = 200,
                     type = "geary.mc")

res <- rowData(sfe)[features,]
res
DataFrame with 2 rows and 16 columns
          means      vars       cv2    is_neg moran.mc_statistic_sample01
      <numeric> <numeric> <numeric> <logical>                   <numeric>
KRT17  1.378333  14.67222   7.72303     FALSE                    0.704628
TAGLN  0.714079   3.76205   7.37788     FALSE                    0.244683
      moran.mc_parameter_sample01 moran.mc_p.value_sample01
                        <numeric>                 <numeric>
KRT17                         201                0.00497512
TAGLN                         201                0.00497512
      moran.mc_alternative_sample01 moran.mc_method_sample01
                        <character>              <character>
KRT17                       greater   Monte-Carlo simulati..
TAGLN                       greater   Monte-Carlo simulati..
                        moran.mc_res_sample01 geary.mc_statistic_sample01
                                       <list>                   <numeric>
KRT17 -0.00438618, 0.00266151,-0.00462597,...                    0.295724
TAGLN  0.00457890, 0.00114938,-0.00181819,...                    0.748246
      geary.mc_parameter_sample01 geary.mc_p.value_sample01
                        <numeric>                 <numeric>
KRT17                           1                0.00497512
TAGLN                           1                0.00497512
      geary.mc_alternative_sample01 geary.mc_method_sample01
                        <character>              <character>
KRT17                       greater   Monte-Carlo simulati..
TAGLN                       greater   Monte-Carlo simulati..
               geary.mc_res_sample01
                              <list>
KRT17 1.003556,0.997671,1.002076,...
TAGLN 1.003166,0.994311,0.999439,...
Show the code
for feature in features:
    geary = Geary(adata[:, feature].X.toarray().astype(np.float64), spatial_weights, transformation="r", permutations=100)
    adata.uns[f"geary_{feature}"] = geary.C
    adata.uns[f"geary_{feature}_p_sim"] = geary.p_sim

for key in filter(lambda x: x.startswith("geary_"), adata.uns.keys()):
    print(f"{key}: {adata.uns[key].round(4)}")
geary_KRT17: 0.2959
geary_KRT17_p_sim: 0.0099
geary_TAGLN: 0.7482
geary_TAGLN_p_sim: 0.0099

The global Geary’s \(C\) for the gene KRT17 is 0.296 with an associated p-value of 0.005; since the value is below \(1\), KRT17 shows positive spatial autocorrelation.

The global Geary’s \(C\) for the gene TAGLN is 0.748 with an associated p-value of 0.005; TAGLN likewise shows positive spatial autocorrelation.

Global Getis-Ord \(G\) statistic

The global \(G\) (Getis and Ord 1992) statistic is a generalisation of the local version (see below) and summarises the contributions of all pairs of values \((x_i, x_j)\) in the dataset. Formally that is

\[G(d) = \frac{\sum_{i = 1}^n \sum_{j=1}^n w_{ij}(d)\,x_ix_j}{\sum_{i = 1}^n \sum_{j=1}^n x_i x_j}, \quad j \neq i.\]

The interpretation works by calculating a z-statistic that compares the observed \(G(d)\) with its expected value. A positive z-statistic indicates that high values cluster together (hot spots), a negative one that low values cluster together (cold spots) (Getis and Ord 1992; Anselin 2024).
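
The definition, together with its expectation \(\mathbb{E}(G) = \sum_i \sum_j w_{ij} / (n(n-1))\) under spatial randomness (compare the "Expectation" reported in the output below), can be transcribed into a short numpy sketch; the toy data are made up, with high values clustered at one end of a line.

```python
import numpy as np

def global_g(x, W):
    """Global Getis-Ord G for non-negative values x and a binary weight matrix W (zero diagonal)."""
    x = np.asarray(x, dtype=float)
    num = np.sum(W * np.outer(x, x))                  # sum over i != j of w_ij * x_i * x_j
    den = np.sum(np.outer(x, x)) - np.sum(x ** 2)     # sum over i != j of x_i * x_j
    return num / den

# made-up toy data: high values clustered at one end of a line, binary adjacency weights
x = np.array([5.0, 4.0, 1.0, 1.0, 1.0])
W = (np.abs(np.subtract.outer(np.arange(5), np.arange(5))) == 1).astype(float)

n = len(x)
expected = W.sum() / (n * (n - 1))          # E(G) under spatial randomness
print(global_g(x, W), expected)             # observed > expected: high values cluster together
```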

The global \(G(d)\) statistic is very similar to global Moran’s \(I\). The global \(G(d)\) statistic is based on the sum of the products of the data points whereas global Moran’s \(I\) is based on the sum of the covariances. Since these two approaches capture different aspects of a structure, their values will differ as well. A good approach would be to not use one statistic in isolation but rather consider both (Getis and Ord 1992).

This framework was established for binary weights but can be extended to nonbinary weights as well (J. K. Ord and Getis 1995). We will use the spdep package directly to calculate the global \(G\) statistic.

Show the code
# Get the weight matrix from sfe object
weights_neighbourhoods_binary <- colGraph(sfe, 'binary')
# Change it to binary weights
weights_neighbourhoods_binary$style <- "B" 
# Calculate the global G statistic
res <- spdep::globalG.test(x = logcounts(sfe)[features[1],], 
                    listw = weights_neighbourhoods_binary)
res 

    Getis-Ord global G statistic

data:  logcounts(sfe)[features[1], ] 
weights: weights_neighbourhoods_binary   

standard deviate = 183.94, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Global G statistic        Expectation           Variance 
      4.940037e-04       1.837965e-04       2.844296e-12 
Show the code
for feature in features:
    getisord = G(adata[:, feature].X.toarray(), spatial_weights, permutations=0)
    print("Global G statistic", feature, getisord.z_norm)
    print("Global G statistic (raw)", feature, getisord.G)
    print("Global G statistic (expected)", feature, getisord.EG)
    print("p_norm", feature, getisord.p_norm)
Global G statistic KRT17 183.9060551202554
Global G statistic (raw) KRT17 0.0004943689285951892
Global G statistic (expected) KRT17 0.0001837965005146302
p_norm KRT17 0.0
Global G statistic TAGLN 63.77135359684779
Global G statistic (raw) TAGLN 0.00037284757367978574
Global G statistic (expected) TAGLN 0.0001837965005146302
p_norm TAGLN 0.0

The global Getis-Ord \(G\) for the gene KRT17 is \(4.9 \times 10^{-4}\), the z-statistic is 183.9 and the associated p-value is essentially 0; high values of KRT17 therefore cluster together (hot spots). For TAGLN the z-statistic is 63.8, which likewise indicates clustering of high values.

Local measures

Unlike global measures that give an overview of the entire field of view, local measures report the statistic at each location (cell). There exist local analogues of Moran’s \(I\) and Geary’s \(C\) for which the global statistic can be represented as a weighted sum of the local statistics. As above, the local coefficients are based on both the spatial weights matrix and the values of the measurement of interest (Anselin 1995).

Local Moran’s \(I\) coefficient

The local Moran’s \(I\) coefficient (Anselin 1995) is a measure of spatial autocorrelation at each location of interest. It is defined as:

\[I_i = \frac{x_i - \bar{x}}{\sum_{k=1}^n(x_k-\bar{x})^2/(n-1)} \sum_{j=1}^n w_{ij}(x_j - \bar{x})\]

where the index \(i\) refers to the location for which the measure is calculated. The interpretation is analogous to the global Moran’s \(I\): a value of \(I_i\) higher than \(\mathbb{E}(I_i) = -w_i/(n-1)\) indicates positive spatial autocorrelation; smaller values indicate negative autocorrelation (Anselin 1995). It is important to note that, as for the global counterpart, a large value of local Moran’s \(I_i\) can arise from clustering of either high or low values. Since we measure and test a large number of locations simultaneously, we need to correct for multiple testing (e.g., using the Benjamini-Hochberg procedure (Benjamini and Hochberg 1995)) (Pebesma and Bivand 2023).
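
To see how the local coefficients relate to the global one, here is a minimal numpy transcription of the formula on the same made-up toy data as above; summing the \(I_i\) and rescaling recovers the global Moran’s \(I\). This is an illustrative sketch, not the localmoran implementation.

```python
import numpy as np

def local_morans_i(x, W):
    """Local Moran's I_i with the (n - 1) denominator used in the formula above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = x - x.mean()
    m2 = np.sum(z ** 2) / (n - 1)
    return z / m2 * (W @ z)                # one value per location

# made-up toy data: a gradient on a line, binary adjacency weights
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
W = (np.abs(np.subtract.outer(np.arange(5), np.arange(5))) == 1).astype(float)

I_i = local_morans_i(x, W)
# the global coefficient is a rescaled sum of the local ones
n, S0 = len(x), W.sum()
print(I_i, n / ((n - 1) * S0) * I_i.sum())   # the second value equals the global Moran's I (0.5)
```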

Show the code
sfe <- runUnivariate(sfe,
                     features = features,
                     colGraphName = colGraphName,
                     type = "localmoran")

plotLocalResult(sfe, "localmoran",
                features = features, ncol = 2,
                colGeometryName = colGeometryName,
                divergent = TRUE, diverge_center = 0, size = pointsize)

Show the code
for feature in features:
    local_moran = Moran_Local(adata[:, feature].X.toarray().astype(np.float64), spatial_weights, transformation="r", permutations=100, seed=3407)
    adata.obs[f"local_moran_{feature}"] = local_moran.Is
    adata.obs[f"local_moran_{feature}_p"] = -np.log10(false_discovery_control(local_moran.p_z_sim*2))
    
fig, axes = plt.subplots(1, len(features), figsize=figsize, layout = "tight")
for feature, ax in zip(features, axes):
    sq.pl.spatial_scatter(
        adata,
        color=f"local_moran_{feature}",
        cmap=cmap_continuous,
        vmin=-adata.obs[f"local_moran_{feature}"].abs().max(),
        vmax=adata.obs[f"local_moran_{feature}"].abs().max(),
        vcenter=0,
        shape=None,
        library_id="spatial",
        title=f"Local Moran's Is: {feature}",
        ax=ax,
        fig=fig,
        size=pointsize,
        scalebar_kwargs={"height_fraction":0.5}
     )
    ax.set_axis_off()

Local Geary’s \(C\) coefficient

Similar to local Moran’s \(I\), there is a local Geary’s \(C\) (Anselin 1995) coefficient. It is defined as

\[C_i = \sum_{j=1}^n w_{ij}(x_i-x_j)^2\]

The interpretation is analogous to the global Geary’s \(C\) (value less than \(1\) indicates positive autocorrelation, a value greater than \(1\) highlights negative autocorrelation) (Anselin 1995).
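
A direct transcription of the formula on made-up toy data is shown below; note that library implementations (e.g., esda's Geary_Local) may scale the input values differently, so this sketch only illustrates the structure of the statistic.

```python
import numpy as np

def local_gearys_c(x, W):
    """Local Geary's C_i = sum_j w_ij (x_i - x_j)^2, one value per location."""
    x = np.asarray(x, dtype=float)
    return np.sum(W * np.subtract.outer(x, x) ** 2, axis=1)

# made-up toy data: a gradient on a line, binary adjacency weights
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
W = (np.abs(np.subtract.outer(np.arange(5), np.arange(5))) == 1).astype(float)
print(local_gearys_c(x, W))  # small values: each location is similar to its neighbours
```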

Show the code
sfe <- runUnivariate(sfe,
                     features = features,
                     colGraphName = colGraphName,
                     type = "localC")

plotLocalResult(sfe, "localC",
                features = features, ncol = 2,
                colGeometryName = colGeometryName,
                divergent = TRUE, diverge_center = 0, size = pointsize)

Show the code
for feature in features:
    local_geary = Geary_Local(connectivity=spatial_weights, permutations=100, seed=3407)
    local_geary_result = local_geary.fit(adata[:, feature].X.toarray().astype(np.float64))
    adata.obs[f"local_geary_{feature}"] = local_geary_result.localG
    adata.obs[f"local_geary_{feature}_p"] = -np.log10(false_discovery_control(local_geary_result.p_sim))
    
fig, axes = plt.subplots(1, len(features), figsize=figsize, layout = "tight")
for feature, ax in zip(features, axes):
    sq.pl.spatial_scatter(
        adata,
        color=f"local_geary_{feature}",
        cmap=cmap_continuous,
        vmin=-adata.obs[f"local_geary_{feature}"].abs().max(),
        vmax=adata.obs[f"local_geary_{feature}"].abs().max(),
        vcenter=0,
        shape=None,
        library_id="spatial",
        title=f"Local Geary's Cs: {feature}",
        ax=ax,
        fig=fig,
        size=pointsize
    )
    ax.set_axis_off()

Local Getis-Ord statistic

The local Getis-Ord \(G_i\) (J. K. Ord and Getis 1995; Getis and Ord 1992) statistic quantifies the weighted concentration of values within a radius \(d\) around a location \(i\), according to:

\[G_i(d) = \frac{\sum_{j \neq i } w_{ij}(d)x_j}{\sum_{j \neq i} x_j}\]

There is a variant of this statistic, \(G_i^*(d)\), which is the same as \(G_i(d)\) except that the contribution when \(j=i\) is included in the term (Getis and Ord 1996).

The interpretation of the local Getis-Ord statistic \(G_i(d)\) is the same as for the global variant. The easiest approach is to calculate a z-statistic. If this value is positive, we call the region a hot spot; if it is negative, we call it a cold spot (Anselin 2024).
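
A minimal numpy sketch of \(G_i(d)\) on made-up toy data with high values clustered at one end of a line: the statistic is the weighted share of the total (excluding \(x_i\) itself) that falls within the neighbourhood of \(i\).

```python
import numpy as np

def local_g(x, W):
    """Local Getis-Ord G_i(d) with the value at i itself excluded."""
    x = np.asarray(x, dtype=float)
    W = W.copy()
    np.fill_diagonal(W, 0)        # G_i excludes the contribution of location i
    return (W @ x) / (x.sum() - x)

# made-up toy data: high values clustered at one end of a line, binary adjacency weights
x = np.array([5.0, 4.0, 1.0, 1.0, 1.0])
W = (np.abs(np.subtract.outer(np.arange(5), np.arange(5))) == 1).astype(float)
print(local_g(x, W))  # largest where a location is surrounded by high values
```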

Show the code
sfe <- runUnivariate(sfe,
                     features = features,
                     include_self = FALSE, # setting this to TRUE would give G_i^*(d)
                     colGraphName = "binary",
                     type = "localG")

plotLocalResult(sfe, "localG",
                features = features,
                ncol = 2,
                colGeometryName = colGeometryName,
                divergent = TRUE, diverge_center = 0, size = pointsize)

Show the code
for feature in features:
    local_getis_ord = G_Local(adata[:, feature].X.toarray().astype(np.float64), spatial_weights, transform="B", star=False, permutations=1000, seed=3407)
    adata.obs[f"local_getis_ord_{feature}"] = local_getis_ord.Zs
    adata.obs[f"local_getis_ord_{feature}_p"] = -np.log10(false_discovery_control(local_getis_ord.p_norm))
    
fig, axes = plt.subplots(1, len(features), figsize=figsize, layout = "tight")
for feature, ax in zip(features, axes):
    sq.pl.spatial_scatter(
        adata,
        color=f"local_getis_ord_{feature}",
        cmap=cmap_continuous,
        vmin=-adata.obs[f"local_getis_ord_{feature}"].abs().max(),
        vmax=adata.obs[f"local_getis_ord_{feature}"].abs().max(),
        vcenter=0,
        shape=None,
        library_id="spatial",
        title=f"Local Getis-Ord $G_i$: {feature}",
        ax=ax,
        fig=fig,
        size=pointsize
    )
    ax.set_axis_off()

The results above give an estimate of the local Getis-Ord statistic for each cell, but no significance value. Significance can be assessed with a permutation approach by using the localG_perm argument instead.

Positive values indicate clustering of high values, i.e., hot spots, and negative values indicate clustering of low values, i.e., cold spots. The method does not detect outlier values because, unlike in local Moran’s \(I\), there is no cross-product between \(i\) and \(j\). But unlike local Moran’s \(I\), we know the type of interaction (high-high or low-low) between \(i\) and \(j\) (Anselin, Syabri, and Kho 2009; Anselin 2024).

Local spatial heteroscedasticity (LOSH)

The local spatial heteroscedasticity (LOSH) (J. Keith Ord and Getis 2012) is a measure of spatial autocorrelation that is based on the variance of the local neighbourhood. Unlike the other measures, this method does not assume homoscedastic variance over the whole tissue region. LOSH is defined as:

\[H_i(d) = \frac{\sum_j w_{ij}(d)|e_j(d)|^a}{\sum_j w_{ij}(d)}\]

where \(e_j(d) = x_j - \bar{x}_i(d),\ j \in N(i,d)\), are the local residuals, i.e., the deviations of the neighbouring values from the local mean \(\bar{x}_i(d)\). The power \(a\) modulates the interpretation of the residuals (\(a=1\): \(H_i\) is the mean absolute deviation from the local mean; \(a=2\): \(H_i\) behaves like a local variance) (J. Keith Ord and Getis 2012).
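
A minimal numpy sketch of \(H_i\) with row-standardised weights, mirroring the formula above (not the esda LOSH implementation); the toy data are made up, with a heterogeneous left part and a constant right part.

```python
import numpy as np

def losh(x, W, a=2):
    """Local spatial heteroscedasticity H_i(d) for power a."""
    x = np.asarray(x, dtype=float)
    Wr = W / W.sum(axis=1, keepdims=True)                    # row-standardised weights
    local_mean = Wr @ x                                      # x_bar_i(d) for every location i
    resid = np.abs(x[None, :] - local_mean[:, None]) ** a    # |e_j(d)|^a for every pair (i, j)
    return np.sum(Wr * resid, axis=1)                        # weighted average per location

# made-up toy data: noisy values on the left, constant values on the right; adjacency weights
x = np.array([0.0, 6.0, 1.0, 7.0, 3.0, 3.0, 3.0, 3.0])
W = (np.abs(np.subtract.outer(np.arange(8), np.arange(8))) == 1).astype(float)
print(losh(x, W))  # H_i is large where neighbouring values vary strongly, ~0 in the constant part
```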

The LOSH should be interpreted in combination with the local Getis-Ord \(G_i^*\) statistic. \(G_i^*\) quantifies the local mean of the variable of interest, while \(H_i\) quantifies the local variance. The following overview, adapted from Ord and Getis (J. Keith Ord and Getis 2012), summarises the interpretation of the combinations of \(G_i^*\) and \(H_i\):

- Large \(|G_i^*|\), high \(H_i\): a hot spot with heterogeneous local conditions.
- Large \(|G_i^*|\), low \(H_i\): a hot spot with similar surrounding areas; the map would indicate whether the affected region is larger than the single "cell".
- Small \(|G_i^*|\), high \(H_i\): heterogeneous local conditions but at a low average level (an unlikely event).
- Small \(|G_i^*|\), low \(H_i\): homogeneous local conditions and a low average level.
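
As a toy illustration of this scheme, the two quantities can be combined into a simple classification. The z-scores, \(H_i\) values and thresholds below are made-up assumptions purely for illustration; in practice one would use the permutation-based results computed above and below.

```python
import numpy as np

# made-up z-scores of G_i* and LOSH values H_i for a handful of cells
g_star = np.array([3.1, 2.8, 0.2, -0.1])
h_i = np.array([2.0, 0.1, 1.8, 0.2])

# illustrative thresholds (assumptions of this sketch, not fixed rules)
hot = np.abs(g_star) > 2
heterogeneous = h_i > 1

for g, h in zip(hot, heterogeneous):
    if g and h:
        print("hot spot with heterogeneous local conditions")
    elif g:
        print("hot spot with similar surrounding areas")
    elif h:
        print("heterogeneous local conditions at a low average level (unlikely)")
    else:
        print("homogeneous local conditions at a low average level")
```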
Show the code
# run LOSH
sfe <- runUnivariate(sfe,
                     features = features,
                     colGraphName = colGraphName,
                     type = "LOSH")


plotLocalResult(sfe, "LOSH",
                features = features,
                colGeometryName = colGeometryName,
                size = pointsize)

Show the code
for feature in features:
    losh = LOSH(connectivity=spatial_weights,  inference="chi-square")
    losh_result = losh.fit(adata[:, feature].X.toarray().astype(np.float64))
    adata.obs[f"losh_{feature}"] = losh_result.Hi
    
fig, axes = plt.subplots(1, len(features), figsize=figsize, layout = "tight")
for feature, ax in zip(features, axes):
    sq.pl.spatial_scatter(
        adata,
        color=f"losh_{feature}",
        cmap="Blues",
        vmin=0,
        vmax=adata.obs[f"losh_{feature}"].abs().max(),
        shape=None,
        library_id="spatial",
        title=f"Local spatial heteroscedasticity: {feature}",
        ax=ax,
        fig=fig,
        size=pointsize
    )
    ax.set_axis_off()

A note of caution

The local methods presented above should always be interpreted with care, since we face the problem of multiple testing when calculating them for each cell. Moreover, the presented methods should mainly serve as exploratory measures to identify interesting regions in the data. Multiple processes can lead to the same pattern, so identifying a pattern does not tell us which underlying process produced it; an indication of clustering does not explain why it occurs. On the one hand, clustering can be the result of spatial interaction between the variables of interest, e.g., an accumulation of a gene of interest in one region of the tissue. On the other hand, clustering can be the result of spatial heterogeneity, where local similarity is created by structural heterogeneity in the tissue, e.g., cells with uniform expression of a gene of interest are grouped together, which then creates the apparent clustering of the gene expression measurement (Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023; Anselin 2024).

Appendix

Session info

Show the code
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.7.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods  
[8] base     

other attached packages:
 [1] dixon_0.0-9                    splancs_2.01-45               
 [3] sp_2.1-4                       bluster_1.10.0                
 [5] magrittr_2.0.3                 stringr_1.5.1                 
 [7] spdep_1.3-8                    spData_2.3.3                  
 [9] tmap_3.3-4                     scater_1.28.0                 
[11] scran_1.28.2                   scuttle_1.10.3                
[13] SFEData_1.2.0                  SpatialFeatureExperiment_1.2.3
[15] Voyager_1.2.7                  rgeoda_0.0.11-1               
[17] digest_0.6.37                  sf_1.0-19                     
[19] reshape2_1.4.4                 patchwork_1.3.0               
[21] STexampleData_1.8.0            ExperimentHub_2.8.1           
[23] AnnotationHub_3.8.0            BiocFileCache_2.8.0           
[25] dbplyr_2.3.4                   rlang_1.1.4                   
[27] ggplot2_3.5.1                  dplyr_1.1.4                   
[29] spatstat_3.3-0                 spatstat.linnet_3.2-3         
[31] spatstat.model_3.3-3           rpart_4.1.24                  
[33] spatstat.explore_3.3-3         nlme_3.1-166                  
[35] spatstat.random_3.3-2          spatstat.geom_3.3-4           
[37] spatstat.univar_3.1-1          spatstat.data_3.1-4           
[39] SpatialExperiment_1.10.0       SingleCellExperiment_1.22.0   
[41] SummarizedExperiment_1.30.2    Biobase_2.60.0                
[43] GenomicRanges_1.52.1           GenomeInfoDb_1.36.4           
[45] IRanges_2.34.1                 S4Vectors_0.38.2              
[47] BiocGenerics_0.46.0            MatrixGenerics_1.12.3         
[49] matrixStats_1.5.0             

loaded via a namespace (and not attached):
  [1] spatstat.sparse_3.1-0         bitops_1.0-9                 
  [3] httr_1.4.7                    RColorBrewer_1.1-3           
  [5] tools_4.3.1                   R6_2.5.1                     
  [7] HDF5Array_1.28.1              mgcv_1.9-1                   
  [9] anndata_0.7.5.6               rhdf5filters_1.12.1          
 [11] withr_3.0.2                   gridExtra_2.3                
 [13] leaflet_2.2.2                 leafem_0.2.3                 
 [15] cli_3.6.3                     labeling_0.4.3               
 [17] proxy_0.4-27                  R.utils_2.12.3               
 [19] dichromat_2.0-0.1             scico_1.5.0                  
 [21] limma_3.56.2                  rstudioapi_0.15.0            
 [23] RSQLite_2.3.9                 generics_0.1.3               
 [25] crosstalk_1.2.1               Matrix_1.5-4.1               
 [27] ggbeeswarm_0.7.2              abind_1.4-8                  
 [29] R.methodsS3_1.8.2             terra_1.8-5                  
 [31] lifecycle_1.0.4               yaml_2.3.10                  
 [33] edgeR_3.42.4                  rhdf5_2.44.0                 
 [35] tmaptools_3.1-1               grid_4.3.1                   
 [37] blob_1.2.4                    promises_1.3.2               
 [39] dqrng_0.4.1                   crayon_1.5.3                 
 [41] dir.expiry_1.8.0              lattice_0.22-6               
 [43] beachmat_2.16.0               KEGGREST_1.40.1              
 [45] magick_2.8.5                  pillar_1.10.1                
 [47] knitr_1.49                    metapod_1.7.0                
 [49] rjson_0.2.23                  boot_1.3-28.1                
 [51] codetools_0.2-19              wk_0.9.4                     
 [53] glue_1.8.0                    vctrs_0.6.5                  
 [55] png_0.1-8                     gtable_0.3.6                 
 [57] assertthat_0.2.1              cachem_1.1.0                 
 [59] xfun_0.50                     S4Arrays_1.0.6               
 [61] mime_0.12                     DropletUtils_1.20.0          
 [63] units_0.8-5                   statmod_1.5.0                
 [65] interactiveDisplayBase_1.38.0 bit64_4.5.2                  
 [67] filelock_1.0.3                irlba_2.3.5.1                
 [69] vipor_0.4.7                   KernSmooth_2.23-26           
 [71] colorspace_2.1-1              DBI_1.2.3                    
 [73] zellkonverter_1.9.0           raster_3.6-30                
 [75] tidyselect_1.2.1              bit_4.5.0.1                  
 [77] compiler_4.3.1                curl_6.1.0                   
 [79] BiocNeighbors_1.18.0          basilisk.utils_1.12.1        
 [81] DelayedArray_0.26.7           scales_1.3.0                 
 [83] classInt_0.4-10               rappdirs_0.3.3               
 [85] goftest_1.2-3                 spatstat.utils_3.1-2         
 [87] rmarkdown_2.29                basilisk_1.11.2              
 [89] XVector_0.40.0                htmltools_0.5.8.1            
 [91] pkgconfig_2.0.3               base64enc_0.1-3              
 [93] sparseMatrixStats_1.12.2      fastmap_1.2.0                
 [95] htmlwidgets_1.6.4             shiny_1.10.0                 
 [97] DelayedMatrixStats_1.22.6     farver_2.1.2                 
 [99] jsonlite_1.8.9                BiocParallel_1.34.2          
[101] R.oo_1.27.0                   BiocSingular_1.16.0          
[103] RCurl_1.98-1.16               GenomeInfoDbData_1.2.10      
[105] s2_1.1.7                      Rhdf5lib_1.22.1              
[107] munsell_0.5.1                 Rcpp_1.0.13-1                
[109] ggnewscale_0.5.0              viridis_0.6.5                
[111] reticulate_1.40.0             stringi_1.8.4                
[113] leafsync_0.1.0                zlibbioc_1.46.0              
[115] plyr_1.8.9                    parallel_4.3.1               
[117] ggrepel_0.9.6                 deldir_2.0-4                 
[119] Biostrings_2.68.1             stars_0.6-7                  
[121] splines_4.3.1                 tensor_1.5                   
[123] locfit_1.5-9.10               igraph_2.1.2                 
[125] ScaledMatrix_1.8.1            BiocVersion_3.17.1           
[127] XML_3.99-0.18                 evaluate_1.0.1               
[129] renv_1.0.11                   BiocManager_1.30.25          
[131] httpuv_1.6.15                 purrr_1.0.2                  
[133] polyclip_1.10-7               rsvd_1.0.5                   
[135] lwgeom_0.2-14                 xtable_1.8-4                 
[137] e1071_1.7-16                  RSpectra_0.16-2              
[139] later_1.4.1                   viridisLite_0.4.2            
[141] class_7.3-22                  tibble_3.2.1                 
[143] memoise_2.0.1                 beeswarm_0.4.0               
[145] AnnotationDbi_1.62.2          cluster_2.1.4                
[147] BiocStyle_2.28.1             

©2024 The pasta authors. Content is published under Creative Commons CC-BY-4.0 License for the text and GPL-3 License for any code.

References

Anselin, Luc. 1995. “Local Indicators of Spatial Association – LISA.” Geographical Analysis 27 (2): 93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x.
———. 2024. An Introduction to Spatial Data Science with GeoDa: Volume 1: Exploring Spatial Data. CRC Press.
Anselin, Luc, Ibnu Syabri, and Youngihn Kho. 2009. “GeoDa: An Introduction to Spatial Data Analysis.” In Handbook of Applied Spatial Analysis: Software Tools, Methods and Applications, 73–89. Springer.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society: Series B (Methodological) 57 (1): 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
Bivand, Roger. 2022. “R Packages for Analyzing Spatial Data: A Comparative Case Study with Areal Data.” Geographical Analysis 54 (3): 488–518. https://doi.org/10.1111/gean.12319.
Bivand, Roger S., Edzer Pebesma, and Virgilio Gómez-Rubio. 2013. Applied Spatial Data Analysis with R, Second Edition. Springer, NY.
Cliff, Andrew David, and J Keith Ord. 1981. Spatial Processes: Models & Applications. Pion, London.
Fischer, Manfred M, Arthur Getis, et al. 2010. Handbook of Applied Spatial Analysis: Software Tools, Methods and Applications. Springer.
Geary, R. C. 1954. “The Contiguity Ratio and Statistical Mapping.” The Incorporated Statistician 5 (3): 115–46. https://doi.org/10.2307/2986645.
Getis, Arthur. 2009. “Spatial Weights Matrices.” Geographical Analysis 41 (4): 404–10. https://doi.org/10.1111/j.1538-4632.2009.00768.x.
Getis, Arthur, and J. K. Ord. 1992. “The Analysis of Spatial Association by Use of Distance Statistics.” Geographical Analysis 24 (3): 189–206. https://doi.org/10.1111/j.1538-4632.1992.tb00261.x.
Getis, Arthur, and J. K. Ord. 1996. “Local Spatial Statistics: An Overview.” In Spatial Analysis: Modelling in a GIS Environment, edited by P. Longley and M. Batty.
He, Shanshan, Ruchir Bhatt, Carl Brown, Emily A. Brown, Derek L. Buhr, Kan Chantranuvatana, Patrick Danaher, et al. 2022. “High-Plex Imaging of RNA and Proteins at Subcellular Resolution in Fixed Tissue by Spatial Molecular Imaging.” Nature Biotechnology 40 (12): 1794–1806. https://doi.org/10.1038/s41587-022-01483-z.
Moran, P. A. P. 1950. “Notes on Continuous Stochastic Phenomena.” Biometrika 37 (1/2): 17–23. https://doi.org/10.2307/2332142.
Moses, Lambda, Pétur Helgi Einarsson, Kayla C. Jackson, Laura Luebbert, Ali Sina Booeshaghi, Sindri Emmanúel Antonsson, Nicolas Bray, Páll Melsted, and Lior Pachter. 2023. “Voyager: Exploratory Single-Cell Genomics Data Analysis with Geospatial Statistics.” bioRxiv. https://doi.org/10.1101/2023.07.20.549945.
Ord, J. Keith, and Arthur Getis. 2012. “Local Spatial Heteroscedasticity (LOSH).” The Annals of Regional Science 48 (2): 529–39. https://doi.org/10.1007/s00168-011-0492-y.
Ord, J. K., and Arthur Getis. 1995. “Local Spatial Autocorrelation Statistics: Distributional Issues and an Application.” Geographical Analysis 27 (4): 286–306. https://doi.org/10.1111/j.1538-4632.1995.tb00912.x.
Palla, Giovanni, Hannah Spitzer, Michal Klein, David Fischer, Anna Christina Schaar, Louis Benedikt Kuemmerle, Sergei Rybakov, et al. 2022. “Squidpy: A Scalable Framework for Spatial Omics Analysis.” Nature Methods 19 (2): 171–78. https://doi.org/10.1038/s41592-021-01358-2.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in R. 1st ed. New York: Chapman and Hall/CRC. https://doi.org/10.1201/9780429459016.
Rey, Sergio J., and Luc Anselin. 2010. “PySAL: A Python Library of Spatial Analytical Methods.” In Handbook of Applied Spatial Analysis, 175–93. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03647-7_11.
Virshup, Isaac, Sergei Rybakov, Fabian J Theis, Philipp Angerer, and F Alexander Wolf. 2024. “Anndata: Access and Store Annotated Data Matrices.” Journal of Open Source Software 9 (101): 4371. https://doi.org/10.21105/joss.04371.
Wang, Zhenzhou. 2019. “Cell Segmentation for Image Cytometry: Advances, Insufficiencies, and Challenges.” Cytometry Part A 95 (7): 708–11. https://doi.org/10.1002/cyto.a.23686.
Zuur, Alain F., Elena N. Ieno, and Graham M. Smith. 2007. Analysing Ecological Data. Statistics for Biology and Health. New York: Springer.