source("utils.R")
theme_set(theme_minimal())
January 30, 2025
In this vignette we will show:
Univariate lattice data analysis methods for imaging-based approaches.
This includes global metrics on the entire field of view and local variants thereof.
The use case is a CosMx data set from He et al. (2022).
The R implementations rely on the Voyager package. The data is represented as a SpatialFeatureExperiment (Moses et al. 2023). Complementary resources using this data and these methods can be found in the Voyager CosMx vignette.
The Python implementations rely on the packages esda, pysal and squidpy (Rey and Anselin 2010; Palla et al. 2022). The data is represented using the anndata structure (Virshup et al. 2024).
import numpy as np
import scanpy as sc
import squidpy as sq
from esda.moran import Moran, Moran_Local
from esda.geary import Geary
from esda import Geary_Local, G_Local, LOSH
from esda.getisord import G
from scipy.stats import false_discovery_control
from libpysal.weights import W, KNN
from libpysal.cg import KDTree
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
import warnings
warnings.filterwarnings("ignore")
plt.rcParams['font.family'] = 'Futura'
df_cmap_discrete = pd.read_csv("../misc/ditto_colors.csv", index_col=0)
df_cmap_continuous = pd.read_csv("../misc/roma_colors.csv", index_col=0)
cmap_discrete = ListedColormap(df_cmap_discrete["ditto_colors"].values, "ditto")
cmap_continuous = LinearSegmentedColormap.from_list("roma", df_cmap_continuous["roma_colors"]).reversed()
Until now, we have considered the cells to be represented as a point pattern. However, as cells have a shape and an area, this can be an oversimplification in some cases. Alternatively, we can rely on the segmentations of individual cells that are available for various datasets. The outline of each cell is represented by a polygon, and the collection of all cells can be seen as an irregular lattice. Unlike a regular lattice (e.g., spot-based spatial transcriptomics data), the sample areas in an irregular lattice can have different sizes and are not necessarily regularly distributed over the sample space.
For this representation of the cells we will rely on the SpatialFeatureExperiment package. For preprocessing of the dataset and code examples we refer the reader to the vignette of the Voyager package (Moses et al. 2023). The Voyager package provides wrapper functions around the package spdep (Pebesma and Bivand 2023) that work directly on the SpatialFeatureExperiment object.
class: SpatialFeatureExperiment
dim: 980 100290
metadata(0):
assays(1): counts
rownames(980): AATK ABL1 ... NegPrb22 NegPrb23
rowData names(3): means vars cv2
colnames(100290): 1_1 1_2 ... 30_4759 30_4760
colData names(17): Area AspectRatio ... nCounts nGenes
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : CenterX_global_px CenterY_global_px
imgData names(1): sample_id
unit: full_res_image_pixels
Geometries:
colGeometries: centroids (POINT), cellSeg (POLYGON)
Graphs:
sample01:
[1] 20
colData(sfe)$prop_neg <- colSums(counts(sfe)[neg_inds,])/colData(sfe)$nCounts
# Remove low quality cells
sfe <- sfe[,!sfe$is_empty & sfe$prop_neg < 0.1]
# Re-calculate stats
rowData(sfe)$is_neg <- neg_inds
# log Counts
sfe <- logNormCounts(sfe)
# save for python
colData(sfe)$CenterX_global_px <- spatialCoords(sfe)[,1]
colData(sfe)$CenterY_global_px <- spatialCoords(sfe)[,2]
For some examples we will show a subset of the tissue.
adata = sc.read_h5ad("../data/imaging_sfe.h5ad")
adata.obsm["spatial"] = np.column_stack([adata.obs["CenterX_global_px"], adata.obs["CenterY_global_px"]])
# normalise counts
adata.raw = adata.copy()
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
# invert the spatial coordinates
adata.obsm["spatial"][:, 1] = adata.obsm["spatial"][:, 1].max() - adata.obsm["spatial"][:, 1]
adata
AnnData object with n_obs × n_vars = 27205 × 980
obs: 'Area', 'AspectRatio', 'Width', 'Height', 'Mean.MembraneStain', 'Max.MembraneStain', 'Mean.PanCK', 'Max.PanCK', 'Mean.CD45', 'Max.CD45', 'Mean.CD3', 'Max.CD3', 'Mean.DAPI', 'Max.DAPI', 'sample_id', 'nCounts', 'nGenes', 'is_empty', 'prop_neg', 'sizeFactor', 'CenterX_global_px', 'CenterY_global_px'
var: 'means', 'vars', 'cv2', 'is_neg'
uns: 'X_name', 'log1p'
obsm: 'spatial'
layers: 'logcounts'
Here we set the arguments for the examples below. We will continue with the two features KRT17 (basal cells) and TAGLN (smooth muscle cells) and specify a \(k\)-nearest neighbour weight matrix with \(k=5\).
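In Python, a minimal sketch of this setup could look as follows. The use of libpysal's KNN and squidpy's spatial_neighbors, as well as the figsize and pointsize plotting defaults used further below, are assumptions rather than code taken from the original.
# features and neighbourhood parameters used in the examples below
features = ["KRT17", "TAGLN"]  # basal cells / smooth muscle cells
k = 5                          # number of nearest neighbours
# k-nearest-neighbour weights with libpysal (binary by default; the esda calls
# below row-standardise on the fly via transformation="r")
kdtree = KDTree(adata.obsm["spatial"])
spatial_weights = KNN(kdtree, k=k)
# the same kNN graph in squidpy form; fills adata.obsp["spatial_connectivities"]
sq.gr.spatial_neighbors(adata, n_neighs=k, coord_type="generic")
# plotting defaults (assumed values)
figsize = (20, 10)
pointsize = 1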
A lattice consists of individual spatial units \(D = \{A_1, A_2,...,A_n\}\), where the units do not overlap. The data is then a realisation of a random variable along the lattice, \(Y_i = Y(A_i)\) (Zuur, Ieno, and Smith 2007). A lattice is irregular if the units have variable size and are not spaced regularly, as is the case with cells in tissue.
More details about lattices can be found here.
One of the challenges when working with (irregular) lattice data is the construction of a neighbourhood graph (Pebesma and Bivand 2023). The main question is what to consider as neighbours, as this will affect downstream analyses. Various methods exist to define neighbours, such as contiguity-based neighbours (neighbours in direct contact), graph-based neighbours (e.g., \(k\)-nearest neighbours), distance-based neighbours or higher-order neighbours (Getis 2009; Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023). The documentation of the package spdep provides an overview of the different methods (R. Bivand 2022).
We first consider contiguity-based neighbours. As cell segmentation is notoriously imperfect, we add a snap value, meaning that all cells within a distance of 20 pixels of each other are considered contiguous (Pebesma and Bivand 2023; Wang 2019).
colGraph(sfe, "poly2nb") <-
findSpatialNeighbors(sfe,
type = "cellSeg",
method = "poly2nb", # wraps the spdep function with the same name
style = "W",
snap = 20 # all cells with less distance apart are considered contiguous
)
p1 <- plotColGraph(sfe,
colGraphName = "poly2nb",
colGeometryName = "cellSeg",
bbox = c(xmin = 3500, xmax = 10000, ymin = 157200, ymax = 162200)
) + theme_void()
Alternatively, we can use a \(k\)-nearest neighbours approach. The choice of the number \(k\) is somewhat arbitrary.
colGraph(sfe, "knn5") <-
findSpatialNeighbors(sfe,
method = "knearneigh", # wraps the spdep function with the same name
k = 5,
zero.policy = TRUE
)
p2 <- plotColGraph(sfe,
colGraphName = "knn5",
colGeometryName = "cellSeg",
bbox = c(xmin = 3500, xmax = 10000, ymin = 157200, ymax = 162200)
) + theme_void()
# calculate a binary nearest neighbour weight matrix too
colGraph(sfe, "binary") <-
findSpatialNeighbors(sfe,
method = "knearneigh", # wraps the spdep function with the same name
k = 5,
zero.policy = TRUE,
style = "B"
)
The graphs below show noticeable differences. In the contiguous neighbour graph on the left (neighbours in direct contact), we can see the formation of distinct patches that are not connected to the rest of the tissue. In addition, some cells do not have any direct neighbours. In contrast, the \(k\)-nearest neighbours (kNN) graph on the right reveals that these patches tend to be connected to the rest of the structure.
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
sq.pl.spatial_scatter(
adata[adata.obsp["spatial_connectivities"].nonzero()[0], :],
connectivity_key="spatial_connectivities",
size=0.1,
na_color="black",
edges_color="black",
edges_width=0.1,
shape=None,
library_id="spatial",
ax=ax,
fig=fig,
crop_coord = (3500, 3999, 10000, 8999)
)
Global methods give us an overview over the entire field-of-view and summarize spatial autocorrelation metrics to a single value. The metrics are a function of the weight matrix and the variables of interest. The variables of interest can be gene expression, intensity of a marker or the area of the cell. The global measures can be seen as a weighted average of the local metric, as explained below.
In general, a global spatial autocorrelation measure has the form of a double sum over all locations \(i,j\)
\[\sum_i \sum_j f(x_i,x_j) w_{ij}\]
where \(f(x_i,x_j)\) is the measure of association between features of interest and \(w_{ij}\) scales the relationship by a spatial weight as defined in the weight matrix \(W\). If \(i\) and \(j\) are not neighbours, i.e. we assume they do not have any spatial association, the corresponding element of the weight matrix is \(0\) (i.e., \(w_{ij} = 0\)). Below we will see that the function \(f\) varies between the different spatial autocorrelation measures (Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023; Anselin 2024).
The global Moran’s \(I\) (Moran 1950) coefficient is a measure of spatial autocorrelation, defined as:
\[I = \frac{n}{\sum_i\sum_j w_{ij}} \frac{\sum_i\sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}.\]
where \(x_i\) and \(x_j\) represent the values of the variable of interest at locations \(i\) and \(j\), \(\bar{x}\) is the mean of all \(x\) and \(w_{ij}\) is the spatial weight between the locations \(i\) and \(j\). The expected value is close to \(0\) for large \(n\) (\(\mathbb{E}(I) = -1/(n-1)\)), whereas a value higher than this expectation indicates spatial autocorrelation; negative values indicate negative autocorrelation. Moran’s \(I\) can be interpreted as the Pearson correlation between the value at location \(i\) and the average value of the neighbours of \(i\) (neighbours as defined in the weight matrix \(W\)) (Moran 1950; R. S. Bivand, Pebesma, and Gómez-Rubio 2013, 258–61; Cliff and Ord 1981, 21).
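To make the formula concrete, Moran's \(I\) can be computed directly from this definition with numpy. The following is a minimal sketch, assuming the adata object from above; the row-standardised weights w_r are built here only for illustration, and the result should agree with the esda value shown below.
# Moran's I from the definition, using the sparse row-standardised weight matrix
w_r = KNN(KDTree(adata.obsm["spatial"]), k=5)
w_r.transform = "r"                                  # row-standardise the weights
Wsp = w_r.sparse                                     # sparse n x n weight matrix
x = adata[:, "KRT17"].X.toarray().astype(np.float64).ravel()
z = x - x.mean()                                     # deviations from the mean
n = x.size
I = (n / Wsp.sum()) * (z @ (Wsp @ z)) / (z @ z)      # double sum over neighbours
print(I)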
We can also use the moran.mc function to calculate the Moran’s \(I\) coefficient. This function uses a Monte Carlo simulation to calculate the p-value.
DataFrame with 2 rows and 10 columns
means vars cv2 is_neg moran.mc_statistic_sample01
<numeric> <numeric> <numeric> <logical> <numeric>
KRT17 1.378333 14.67222 7.72303 FALSE 0.704628
TAGLN 0.714079 3.76205 7.37788 FALSE 0.244683
moran.mc_parameter_sample01 moran.mc_p.value_sample01
<numeric> <numeric>
KRT17 201 0.00497512
TAGLN 201 0.00497512
moran.mc_alternative_sample01 moran.mc_method_sample01
<character> <character>
KRT17 greater Monte-Carlo simulati..
TAGLN greater Monte-Carlo simulati..
moran.mc_res_sample01
<list>
KRT17 -0.00438618, 0.00266151,-0.00462597,...
TAGLN 0.00457890, 0.00114938,-0.00181819,...
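The p-value reported above is the smallest value attainable with the chosen number of simulations: a Monte-Carlo p-value is computed as the number of permuted statistics at least as extreme as the observed one plus one, divided by the number of simulations plus one. A minimal illustration, assuming 200 simulations as used above:
# with 200 simulations and no permuted value reaching the observed statistic,
# the Monte-Carlo p-value is bounded below by 1/201
nsim = 200
n_as_extreme = 0
print((n_as_extreme + 1) / (nsim + 1))   # 0.004975..., the value reported above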
for feature in features:
moran = Moran(adata[:, feature].X.toarray().astype(np.float64), spatial_weights, transformation="r", permutations=100)
adata.uns[f"moran_{feature}"] = moran.I
adata.uns[f"moran_{feature}_p_sim"] = moran.p_sim
for key in filter(lambda x: x.startswith("moran_"), adata.uns.keys()):
print(f"{key}: {adata.uns[key].round(4)}")
moran_KRT17: 0.7045
moran_KRT17_p_sim: 0.0099
moran_TAGLN: 0.2447
moran_TAGLN_p_sim: 0.0099
The global Moran’s \(I\) for the gene KRT17 is 0.705 and the associated p-value is 0.005. This means that the gene KRT17 shows positive spatial autocorrelation.
The global Moran’s \(I\) for the gene TAGLN is 0.245 and the associated p-value is 0.005. This means that the gene TAGLN shows positive spatial autocorrelation.
One should note that the result is dependent on the weight matrix. Different weight matrices will give different results. To compare Moran’s \(I\) coefficients of different features, we need to use the same weight matrix.
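To illustrate this dependence, the sketch below compares the coefficient for KRT17 under the \(k=5\) graph used throughout with a second graph built with \(k=10\); the second graph is a hypothetical choice made purely for illustration.
# the same feature gives different Moran's I values under different weight matrices
spatial_weights_k10 = KNN(KDTree(adata.obsm["spatial"]), k=10)
x = adata[:, "KRT17"].X.toarray().astype(np.float64)
print(Moran(x, spatial_weights, permutations=0).I)      # k = 5 graph from above
print(Moran(x, spatial_weights_k10, permutations=0).I)  # hypothetical k = 10 graph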
Geary’s \(C\) (Geary 1954) is a different measure of global autocorrelation and is very closely related to Moran’s \(I\). However, it focuses on spatial dissimilarity rather than similarity. Geary’s \(C\) is defined by
\[C = \frac{(n-1) \sum_i \sum_j w_{ij}(x_i-x_j)^2}{2\sum_i \sum_j w_{ij}\sum_i(x_i-\bar{x})^2}\]
where \(x_i\) and \(x_j\) represent the values of the variable of interest at locations \(i\) and \(j\), \(\bar{x}\) is the mean of all \(x\), \(w_{ij}\) is the spatial weight between the locations of \(i\) and \(j\) and \(n\) the total number of locations. The interpretation is opposite to Moran’s \(I\): a value smaller than \(1\) indicates positive autocorrelation whereas a value greater than \(1\) represents negative autocorrelation (Cliff and Ord 1981, 17; Pebesma and Bivand 2023; Fischer, Getis, et al. 2010, 255–65).
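As with Moran's \(I\), the definition translates directly into a few lines of numpy. A minimal sketch, assuming the adata object from above and row-standardised kNN weights built as in the Moran sketch:
# Geary's C from the definition, evaluating the double sum over neighbouring pairs
w_r = KNN(KDTree(adata.obsm["spatial"]), k=5)
w_r.transform = "r"
Wcoo = w_r.sparse.tocoo()                             # (i, j, w_ij) triplets
x = adata[:, "KRT17"].X.toarray().astype(np.float64).ravel()
n = x.size
num = (n - 1) * (Wcoo.data * (x[Wcoo.row] - x[Wcoo.col]) ** 2).sum()
den = 2 * Wcoo.data.sum() * ((x - x.mean()) ** 2).sum()
print(num / den)                                      # should agree with esda below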
DataFrame with 2 rows and 16 columns
means vars cv2 is_neg moran.mc_statistic_sample01
<numeric> <numeric> <numeric> <logical> <numeric>
KRT17 1.378333 14.67222 7.72303 FALSE 0.704628
TAGLN 0.714079 3.76205 7.37788 FALSE 0.244683
moran.mc_parameter_sample01 moran.mc_p.value_sample01
<numeric> <numeric>
KRT17 201 0.00497512
TAGLN 201 0.00497512
moran.mc_alternative_sample01 moran.mc_method_sample01
<character> <character>
KRT17 greater Monte-Carlo simulati..
TAGLN greater Monte-Carlo simulati..
moran.mc_res_sample01 geary.mc_statistic_sample01
<list> <numeric>
KRT17 -0.00438618, 0.00266151,-0.00462597,... 0.295724
TAGLN 0.00457890, 0.00114938,-0.00181819,... 0.748246
geary.mc_parameter_sample01 geary.mc_p.value_sample01
<numeric> <numeric>
KRT17 1 0.00497512
TAGLN 1 0.00497512
geary.mc_alternative_sample01 geary.mc_method_sample01
<character> <character>
KRT17 greater Monte-Carlo simulati..
TAGLN greater Monte-Carlo simulati..
geary.mc_res_sample01
<list>
KRT17 1.003556,0.997671,1.002076,...
TAGLN 1.003166,0.994311,0.999439,...
for feature in features:
geary = Geary(adata[:, feature].X.toarray().astype(np.float64), spatial_weights, transformation="r", permutations=100)
adata.uns[f"geary_{feature}"] = geary.C
adata.uns[f"geary_{feature}_p_sim"] = geary.p_sim
for key in filter(lambda x: x.startswith("geary_"), adata.uns.keys()):
print(f"{key}: {adata.uns[key].round(4)}")
geary_KRT17: 0.2959
geary_KRT17_p_sim: 0.0099
geary_TAGLN: 0.7482
geary_TAGLN_p_sim: 0.0099
The global Geary’s \(C\) for the gene KRT17 is 0.296 and the associated p-value is 0.005. This means that the gene KRT17 shows positive spatial autocorrelation.
The global Geary’s \(C\) for the gene TAGLN is 0.748 and the associated p-value is 0.005. This means that the gene TAGLN shows positive spatial autocorrelation.
The global \(G\) (Getis and Ord 1992) statistic is a generalisation of the local version (see below) and summarises the contributions of all pairs of values \((x_i, x_j)\) in the dataset. Formally, it is
\[G(d) = \frac{\sum_{i = 1}^n \sum_{j=1}^n w_{ij}(d)\,x_i x_j}{\sum_{i = 1}^n \sum_{j=1}^n x_i x_j}, \quad j \neq i.\]
The interpretation works by calculating a z-statistic to compare the observed with the expected \(G(d)\) statistic. If this value is positive, we refer to this as a hot spot and if it is negative as a cold spot (Getis and Ord 1992; Anselin 2024).
The global \(G(d)\) statistic is very similar to global Moran’s \(I\). The global \(G(d)\) statistic is based on the sum of the products of the data points whereas global Moran’s \(I\) is based on the sum of the covariances. Since these two approaches capture different aspects of a structure, their values will differ as well. A good approach would be to not use one statistic in isolation but rather consider both (Getis and Ord 1992).
This framework was established for binary weights but can be extended to non-binary weights as well (J. K. Ord and Getis 1995). We will use the spdep package directly to calculate the global \(G\) statistic.
# Get the weight matrix from sfe object
weights_neighbourhoods_binary <- colGraph(sfe, 'binary')
# Change it to binary weights
weights_neighbourhoods_binary$style <- "B"
# Calculate the global G statistic
res <- spdep::globalG.test(x = logcounts(sfe)[features[1],],
listw = weights_neighbourhoods_binary)
res
Getis-Ord global G statistic
data: logcounts(sfe)[features[1], ]
weights: weights_neighbourhoods_binary
standard deviate = 183.94, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Global G statistic Expectation Variance
4.940037e-04 1.837965e-04 2.844296e-12
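The standard deviate reported by the test is the observed statistic standardised by its expectation and standard deviation; it can be recomputed from the printed estimates:
# z = (G - E[G]) / sqrt(Var[G]), using the estimates printed above
G_obs, G_exp, G_var = 4.940037e-04, 1.837965e-04, 2.844296e-12
print((G_obs - G_exp) / np.sqrt(G_var))   # ~183.94, the reported standard deviate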
for feature in features:
getisord = G(adata[:, feature].X.toarray(), spatial_weights, permutations=0)
print("Global G statistic", feature, getisord.z_norm)
print("Global G statistic (raw)", feature, getisord.G)
print("Global G statistic (expected)", feature, getisord.EG)
print("p_norm", feature, getisord.p_norm)
Global G statistic KRT17 183.9060551202554
Global G statistic (raw) KRT17 0.0004943689285951892
Global G statistic (expected) KRT17 0.0001837965005146302
p_norm KRT17 0.0
Global G statistic TAGLN 63.77135359684779
Global G statistic (raw) TAGLN 0.00037284757367978574
Global G statistic (expected) TAGLN 0.0001837965005146302
p_norm TAGLN 0.0
The global Getis-Ord \(G\) for the gene KRT17 is \(4.9 \times 10^{-4}\), the z-statistic is 183.94 and the associated p-value is < 2.2e-16. This means that the gene KRT17 shows a hot spot.
Unlike global measures that give an overview over the entire field of view, local measures report information about the statistic at each location (cell). There exist local analogs of Moran’s \(I\) and Geary’s \(C\) for which the global statistic can be represented as a weighted sum of the local statistics. As above, the local coefficients are based on both the spatial weights matrix and the values of the measurement of interest (Anselin 1995).
The local Moran’s \(I\) coefficient (Anselin 1995) is a measure of spatial autocorrelation at each location of interest. It is defined as:
\[I_i = \frac{x_i - \bar{x}}{\sum_{k=1}^n(x_k-\bar{x})^2/(n-1)} \sum_{j=1}^n w_{ij}(x_j - \bar{x})\]
where the index \(i\) refers to the location for which the measure is calculated. The interpretation is analogous to the global Moran’s \(I\): a value of \(I_i\) higher than \(\mathbb{E}(I_i) = -w_i/(n-1)\) indicates spatial autocorrelation, and smaller values indicate negative autocorrelation (Anselin 1995). It is important to note that, as for the global counterpart, a given value of local Moran’s \(I\) can arise from values at either the high or the low end of the distribution. Since we measure and test a large number of locations simultaneously, we need to correct for multiple testing (e.g., using the Benjamini-Hochberg procedure (Benjamini and Hochberg 1995)) (Pebesma and Bivand 2023).
for feature in features:
local_moran = Moran_Local(adata[:, feature].X.toarray().astype(np.float64), spatial_weights, transformation="r", permutations=100, seed=3407)
adata.obs[f"local_moran_{feature}"] = local_moran.Is
adata.obs[f"local_moran_{feature}_p"] = -np.log10(false_discovery_control(local_moran.p_z_sim*2))
fig, axes = plt.subplots(1, len(features), figsize=figsize, layout = "tight")
for feature, ax in zip(features, axes):
sq.pl.spatial_scatter(
adata,
color=f"local_moran_{feature}",
cmap=cmap_continuous,
vmin=-adata.obs[f"local_moran_{feature}"].abs().max(),
vmax=adata.obs[f"local_moran_{feature}"].abs().max(),
vcenter=0,
shape=None,
library_id="spatial",
title=f"Local Moran's Is: {feature}",
ax=ax,
fig=fig,
size=pointsize,
scalebar_kwargs={"height_fraction":0.5}
)
ax.set_axis_off()
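As noted above, the global coefficient can be written as a weighted sum of the local ones: with row-standardised weights, the average of the \(I_i\) should be close to the global Moran's \(I\), up to a factor of roughly \((n-1)/n\) that depends on the variance-denominator convention. A quick check using the objects computed above:
# compare the mean of the local statistics with the global coefficient
print(adata.obs["local_moran_KRT17"].mean())   # average of the local I_i
print(adata.uns["moran_KRT17"])                # global Moran's I from above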
Similar to local Moran’s \(I\), there is a local Geary’s \(C\) (Anselin 1995) coefficient. It is defined as
\[C_i = \sum_{j=1}^n w_{ij}(x_i-x_j)^2\]
The interpretation is analogous to the global Geary’s \(C\) (value less than \(1\) indicates positive autocorrelation, a value greater than \(1\) highlights negative autocorrelation) (Anselin 1995).
for feature in features:
local_geary = Geary_Local(connectivity=spatial_weights, permutations=100, seed=3407)
local_geary_result = local_geary.fit(adata[:, feature].X.toarray().astype(np.float64))
adata.obs[f"local_geary_{feature}"] = local_geary_result.localG
adata.obs[f"local_geary_{feature}_p"] = -np.log10(false_discovery_control(local_geary_result.p_sim))
fig, axes = plt.subplots(1, len(features), figsize=figsize, layout = "tight")
for feature, ax in zip(features, axes):
sq.pl.spatial_scatter(
adata,
color=f"local_geary_{feature}",
cmap=cmap_continuous,
vmin=-adata.obs[f"local_geary_{feature}"].abs().max(),
vmax=adata.obs[f"local_geary_{feature}"].abs().max(),
vcenter=0,
shape=None,
library_id="spatial",
title=f"Local Geary's Cs: {feature}",
ax=ax,
fig=fig,
size=pointsize
)
ax.set_axis_off()
The local Getis-Ord \(G_i\) (J. K. Ord and Getis 1995; Getis and Ord 1992) statistic quantifies the weighted concentration of points within a radius \(d\) and in a local region \(i\), according to:
\[G_i(d) = \frac{\sum_{j \neq i } w_{ij}(d)x_j}{\sum_{j \neq i} x_j}\]
There is a variant of this statistic, \(G_i^*(d)\), which is the same as \(G_i(d)\) except that the term with \(j = i\) is included in the sums (Getis and Ord 1996).
The interpretation of the local Getis-Ord statistic \(G_i(d)\) is the same as for the global variant. The easiest approach is to calculate a z-statistic: if this value is positive, we call the region a hot spot; if it is negative, a cold spot (Anselin 2024).
for feature in features:
local_getis_ord = G_Local(adata[:, feature].X.toarray().astype(np.float64), spatial_weights, transform="B", star=False, permutations=1000, seed=3407)
adata.obs[f"local_getis_ord_{feature}"] = local_getis_ord.Zs
adata.obs[f"local_getis_ord_{feature}_p"] = -np.log10(false_discovery_control(local_getis_ord.p_norm))
fig, axes = plt.subplots(1, len(features), figsize=figsize, layout = "tight")
for feature, ax in zip(features, axes):
sq.pl.spatial_scatter(
adata,
color=f"local_getis_ord_{feature}",
cmap=cmap_continuous,
vmin=-adata.obs[f"local_getis_ord_{feature}"].abs().max(),
vmax=adata.obs[f"local_getis_ord_{feature}"].abs().max(),
vcenter=0,
shape=None,
library_id="spatial",
title=f"Local Getis-Ord $G_i$: {feature}",
ax=ax,
fig=fig,
size=pointsize
)
ax.set_axis_off()
The results above give an estimate of the local Getis-Ord statistic for each cell, but no significance value. Significance can be assessed with a permutation approach via the localG_perm argument.
Positive values indicate clustering of high values, i.e., hot spots, and negative values indicate clustering of low values, i.e., cold spots. The method does not detect outlier values because, unlike in local Moran’s \(I\), there is no cross-product between \(i\) and \(j\). But unlike local Moran’s \(I\), we know the type of interaction (high-high or low-low) between \(i\) and \(j\) (Anselin, Syabri, and Kho 2009; Anselin 2024).
The local spatial heteroscedasticity (LOSH) (J. Keith Ord and Getis 2012) is a measure of spatial autocorrelation that is based on the variance of the local neighbourhood. Unlike the other measures, this method does not assume homoscedastic variance over the whole tissue region. LOSH is defined as:
\[H_i(d) = \frac{\sum_j w_{ij}(d)|e_j(d)|^a}{\sum_j w_{ij}(d)}\]
where \(e_j(d) = x_j - \bar{x}_i(d), j \in N(i,d)\) are the local residuals that are subtracted from the local mean. The power \(a\) modulates the interpretation of the residuals (\(a=1\): residuals are interpreted as absolute deviations from the local mean; \(a=2\): residuals are interpreted as deviations from the local variance) (J. Keith Ord and Getis 2012).
The LOSH should be interpreted in combination with the local Getis-Ord \(G_i^*\) statistic. The \(G_i^*\) quantifies the local mean of the variable of interest, while \(H_i\) quantifies the local variance. This table provided by Ord and Getis (J. Keith Ord and Getis 2012) summarizes the interpretation of the combination of \(G_i^*\) and \(H_i\).
| | high \(H_i\) | low \(H_i\) |
|---|---|---|
| large \(\|G_i^*\|\) | A hot spot with heterogeneous local conditions | A hot spot with similar surrounding areas; the map would indicate whether the affected region is larger than the single “cell” |
| small \(\|G_i^*\|\) | Heterogeneous local conditions but at a low average level (an unlikely event) | Homogeneous local conditions and a low average level |
for feature in features:
losh = LOSH(connectivity=spatial_weights, inference="chi-square")
losh_result = losh.fit(adata[:, feature].X.toarray().astype(np.float64))
adata.obs[f"losh_{feature}"] = losh_result.Hi
fig, axes = plt.subplots(1, len(features), figsize=figsize, layout = "tight")
for feature, ax in zip(features, axes):
sq.pl.spatial_scatter(
adata,
color=f"losh_{feature}",
cmap="Blues",
vmin=0,
vmax=adata.obs[f"losh_{feature}"].abs().max(),
shape=None,
library_id="spatial",
title=f"Local spatial heteroscedasticity: {feature}",
ax=ax,
fig=fig,
size=pointsize
)
ax.set_axis_off()
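Following the table above, the two maps can also be read jointly by classifying each cell according to whether its local \(G_i\) z-score is large in magnitude and whether its \(H_i\) is high. The quantile cut-offs below are arbitrary choices made only for illustration.
# hypothetical joint classification of cells following the Ord & Getis table
feature = "KRT17"
g_abs = adata.obs[f"local_getis_ord_{feature}"].abs()   # |z| of local G_i
h_i = adata.obs[f"losh_{feature}"]                      # local heteroscedasticity
large_g = g_abs > g_abs.quantile(0.75)                  # arbitrary cut-off
high_h = h_i > h_i.median()                             # arbitrary cut-off
adata.obs[f"losh_class_{feature}"] = np.select(
    [large_g & high_h, large_g & ~high_h, ~large_g & high_h],
    ["hot spot, heterogeneous", "hot spot, homogeneous", "low level, heterogeneous"],
    default="low level, homogeneous",
)
print(adata.obs[f"losh_class_{feature}"].value_counts())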
The local methods presented above should always be interpreted with care, since we face a multiple-testing problem when calculating them for each cell. Moreover, the presented methods should mainly serve as exploratory measures to identify interesting regions in the data. Multiple processes can lead to the same pattern, so identifying a pattern does not tell us which process generated it; an indication of clustering does not explain why it occurs. On the one hand, clustering can be the result of spatial interaction between the variables of interest, e.g., an accumulation of a gene of interest in one region of the tissue. On the other hand, clustering can be the result of spatial heterogeneity, where local similarity is created by structural heterogeneity in the tissue, e.g., when cells with uniform expression of a gene of interest are grouped together, which then creates the apparent clustering of the gene expression measurement (Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023; Anselin 2024).
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.7.2
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Zurich
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices datasets utils methods
[8] base
other attached packages:
[1] dixon_0.0-9 splancs_2.01-45
[3] sp_2.1-4 bluster_1.10.0
[5] magrittr_2.0.3 stringr_1.5.1
[7] spdep_1.3-8 spData_2.3.3
[9] tmap_3.3-4 scater_1.28.0
[11] scran_1.28.2 scuttle_1.10.3
[13] SFEData_1.2.0 SpatialFeatureExperiment_1.2.3
[15] Voyager_1.2.7 rgeoda_0.0.11-1
[17] digest_0.6.37 sf_1.0-19
[19] reshape2_1.4.4 patchwork_1.3.0
[21] STexampleData_1.8.0 ExperimentHub_2.8.1
[23] AnnotationHub_3.8.0 BiocFileCache_2.8.0
[25] dbplyr_2.3.4 rlang_1.1.4
[27] ggplot2_3.5.1 dplyr_1.1.4
[29] spatstat_3.3-0 spatstat.linnet_3.2-3
[31] spatstat.model_3.3-3 rpart_4.1.24
[33] spatstat.explore_3.3-3 nlme_3.1-166
[35] spatstat.random_3.3-2 spatstat.geom_3.3-4
[37] spatstat.univar_3.1-1 spatstat.data_3.1-4
[39] SpatialExperiment_1.10.0 SingleCellExperiment_1.22.0
[41] SummarizedExperiment_1.30.2 Biobase_2.60.0
[43] GenomicRanges_1.52.1 GenomeInfoDb_1.36.4
[45] IRanges_2.34.1 S4Vectors_0.38.2
[47] BiocGenerics_0.46.0 MatrixGenerics_1.12.3
[49] matrixStats_1.5.0
loaded via a namespace (and not attached):
[1] spatstat.sparse_3.1-0 bitops_1.0-9
[3] httr_1.4.7 RColorBrewer_1.1-3
[5] tools_4.3.1 R6_2.5.1
[7] HDF5Array_1.28.1 mgcv_1.9-1
[9] anndata_0.7.5.6 rhdf5filters_1.12.1
[11] withr_3.0.2 gridExtra_2.3
[13] leaflet_2.2.2 leafem_0.2.3
[15] cli_3.6.3 labeling_0.4.3
[17] proxy_0.4-27 R.utils_2.12.3
[19] dichromat_2.0-0.1 scico_1.5.0
[21] limma_3.56.2 rstudioapi_0.15.0
[23] RSQLite_2.3.9 generics_0.1.3
[25] crosstalk_1.2.1 Matrix_1.5-4.1
[27] ggbeeswarm_0.7.2 abind_1.4-8
[29] R.methodsS3_1.8.2 terra_1.8-5
[31] lifecycle_1.0.4 yaml_2.3.10
[33] edgeR_3.42.4 rhdf5_2.44.0
[35] tmaptools_3.1-1 grid_4.3.1
[37] blob_1.2.4 promises_1.3.2
[39] dqrng_0.4.1 crayon_1.5.3
[41] dir.expiry_1.8.0 lattice_0.22-6
[43] beachmat_2.16.0 KEGGREST_1.40.1
[45] magick_2.8.5 pillar_1.10.1
[47] knitr_1.49 metapod_1.7.0
[49] rjson_0.2.23 boot_1.3-28.1
[51] codetools_0.2-19 wk_0.9.4
[53] glue_1.8.0 vctrs_0.6.5
[55] png_0.1-8 gtable_0.3.6
[57] assertthat_0.2.1 cachem_1.1.0
[59] xfun_0.50 S4Arrays_1.0.6
[61] mime_0.12 DropletUtils_1.20.0
[63] units_0.8-5 statmod_1.5.0
[65] interactiveDisplayBase_1.38.0 bit64_4.5.2
[67] filelock_1.0.3 irlba_2.3.5.1
[69] vipor_0.4.7 KernSmooth_2.23-26
[71] colorspace_2.1-1 DBI_1.2.3
[73] zellkonverter_1.9.0 raster_3.6-30
[75] tidyselect_1.2.1 bit_4.5.0.1
[77] compiler_4.3.1 curl_6.1.0
[79] BiocNeighbors_1.18.0 basilisk.utils_1.12.1
[81] DelayedArray_0.26.7 scales_1.3.0
[83] classInt_0.4-10 rappdirs_0.3.3
[85] goftest_1.2-3 spatstat.utils_3.1-2
[87] rmarkdown_2.29 basilisk_1.11.2
[89] XVector_0.40.0 htmltools_0.5.8.1
[91] pkgconfig_2.0.3 base64enc_0.1-3
[93] sparseMatrixStats_1.12.2 fastmap_1.2.0
[95] htmlwidgets_1.6.4 shiny_1.10.0
[97] DelayedMatrixStats_1.22.6 farver_2.1.2
[99] jsonlite_1.8.9 BiocParallel_1.34.2
[101] R.oo_1.27.0 BiocSingular_1.16.0
[103] RCurl_1.98-1.16 GenomeInfoDbData_1.2.10
[105] s2_1.1.7 Rhdf5lib_1.22.1
[107] munsell_0.5.1 Rcpp_1.0.13-1
[109] ggnewscale_0.5.0 viridis_0.6.5
[111] reticulate_1.40.0 stringi_1.8.4
[113] leafsync_0.1.0 zlibbioc_1.46.0
[115] plyr_1.8.9 parallel_4.3.1
[117] ggrepel_0.9.6 deldir_2.0-4
[119] Biostrings_2.68.1 stars_0.6-7
[121] splines_4.3.1 tensor_1.5
[123] locfit_1.5-9.10 igraph_2.1.2
[125] ScaledMatrix_1.8.1 BiocVersion_3.17.1
[127] XML_3.99-0.18 evaluate_1.0.1
[129] renv_1.0.11 BiocManager_1.30.25
[131] httpuv_1.6.15 purrr_1.0.2
[133] polyclip_1.10-7 rsvd_1.0.5
[135] lwgeom_0.2-14 xtable_1.8-4
[137] e1071_1.7-16 RSpectra_0.16-2
[139] later_1.4.1 viridisLite_0.4.2
[141] class_7.3-22 tibble_3.2.1
[143] memoise_2.0.1 beeswarm_0.4.0
[145] AnnotationDbi_1.62.2 cluster_2.1.4
[147] BiocStyle_2.28.1