Show the code
source("utils.R")
theme_set(theme_light())
This website is still under active development - all content subject to change
February 4, 2025
In this vignette we will show:
Multivariate lattice data analysis methods for HTS-based approaches.
This includes global metrics on the entire field of view and local variants thereof.
The use case is a 10x Visium data set from 10x Genomics (2021) downloaded from the squidpy visium datasets (Palla et al. 2022).
The R
implementations rely on the Voyager package. The data is represented as SpatialFeatureExperiment (Moses et al. 2023). Complementary resources using these methods are found in the Voyager bivariate vignette and Voyager multivariate vignette.
Python
implementations rely on the the packages esda, pysal and squidpy (Rey and Anselin 2010; Palla et al. 2022). Data representation rely on the anndata structure (Virshup et al. 2024).
import numpy as np
import scanpy as sc
import squidpy as sq
from esda.moran import Moran_BV, Moran_Local_BV
from esda.lee import Spatial_Pearson, Spatial_Pearson_Local
from esda.geary_local_mv import Geary_Local_MV
from scipy.stats import false_discovery_control
from libpysal.cg import KDTree
from libpysal.weights import W, KNN
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
import warnings
warnings.filterwarnings("ignore")
plt.rcParams['font.family'] = 'Futura'
df_cmap_discrete = pd.read_csv("../misc/ditto_colors.csv", index_col=0)
df_cmap_continuous = pd.read_csv("../misc/roma_colors.csv", index_col=0)
cmap_discrete = ListedColormap(df_cmap_discrete["ditto_colors"].values, "ditto")
cmap_continuous = LinearSegmentedColormap.from_list("roma", df_cmap_continuous["roma_colors"]).reversed()
#load the dataset
adata = sq.datasets.visium("Visium_FFPE_Mouse_Kidney")
# normalise counts already here that both R and python have same values
adata.raw = adata.copy()
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
# store data as h5ad object
adata.write("../data/adata_neighbors.h5ad")
library(SpatialExperiment)
sce <- zellkonverter::readH5AD("../data/adata_neighbors.h5ad")
colnames(reducedDim(sce, "spatial")) <- c("x", "y")
spe <- toSpatialExperiment(sce)
spatialCoords(spe) <- reducedDim(sce, "spatial")
sfe <- SpatialFeatureExperiment::toSpatialFeatureExperiment(spe)
# the counts are already normalised
assayNames(sfe) <- "logcounts"
sfe
class: SpatialFeatureExperiment
dim: 19465 3124
metadata(2): log1p spatial
assays(1): logcounts
rownames(19465): Xkr4 Rp1 ... Gm3376 Gm20830
rowData names(3): gene_ids feature_types genome
colnames(3124): AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 ...
TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
colData names(4): in_tissue array_row array_col sample_id
reducedDimNames(1): spatial
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : x y
imgData names(0):
unit:
Geometries:
colGeometries: centroids (POINT)
Graphs:
sample01:
AnnData object with n_obs × n_vars = 3124 × 19465
obs: 'in_tissue', 'array_row', 'array_col'
var: 'gene_ids', 'feature_types', 'genome'
uns: 'spatial', 'log1p'
obsm: 'spatial'
Given this data from 10x we choose two genes to analyse henceforth, named Slc5a12 (solute carrier family 5, member 12) and Calr (calreticulin) (10x Genomics 2021). For further information on the preprocessing of visium datasets, please refer to the Voyager 10X Visium vignette.
Here we set the arguments for the examples below.
Spot based data is collected along a regular spaced grid where all sample areas have the same size. Such a grid is also called a regular lattice. In more rigorous terms the data \(Y\) is a realisation of a random variable fixed along a lattice \(D\). The lattice \(D\) does not have to be regular but in the scope of spot based data it is. Spot-based data is generated by a defined sampling strategy, whereas point pattern data is the result of a stochastic process (Zuur, Ieno, and Smith 2007).
The lattice is composed of individual spatial units
\[D = \{A_1, A_2,...,A_n\}\]
where these units do not overlap
\[A_i \cap A_j = \emptyset \forall i \neq j\]
The data is then a random variable of the spatial unit along the lattice
\[Y_i = Y(A_i)\]
Most lattice data analysis technique build on the concept of neighbours. Therefore, the spatial relationship can be modelled with e.g. a spatial weight matrix \(W\). There are a lot of ways to define a spatial weight matrix \(W\). For example, the spatial weight matrix can be defined as a binary contiguity matrix where adjacent units are indicated by one and non-adjacent units as zero.
\[w_{ij} = \begin{cases} 1 \text{ if } A_i \text{ and } A_j \text{ are adjacent}\\ 0 \text{ otw} \end{cases}\]
other options to specify the weight matrix \(W\) are mentioned in Zuur, Ieno, and Smith (2007).
Voyager
has a special function for the construction of the weight matrix in Visium data findVisiumGraph
. This function defines the neigbours to be the surrounding spots.
# there is an empty spot we have to exclude as it has no neighbours
sfe <- sfe[, colnames(sfe) != "GAGATCTGCTTGGCAT-1"]
# construct the weight matrix using the Voyager function
colGraph(sfe, "visium") <- findVisiumGraph(sfe)
colGraph(sfe, "binary") <- findVisiumGraph(sfe, style = "B")
plotColGraph(sfe,
colGraphName = colGraphName
) + theme_void()
sq.gr.spatial_neighbors(adata, n_neighs=6, coord_type="grid")
# remove unconnected spots
adata = adata[adata.obsp["spatial_connectivities"].sum(1).A1 > 0, :]
# calculate spatial weights
spatial_weights = W.from_sparse(adata.obsp["spatial_connectivities"])
# define a wider neighbourhood graph - result is more smoothed
sq.gr.spatial_neighbors(adata, n_neighs=50, coord_type="grid", key_added="wide_spatial")
spatial_weights_wide = W.from_sparse(adata.obsp["wide_spatial_connectivities"])
fig, ax = plt.subplots(1, 1, figsize=(7, 7), layout = "tight")
sq.pl.spatial_scatter(
adata[adata.obsp["spatial_connectivities"].nonzero()[0], :],
connectivity_key="spatial_connectivities",
size=0.1,
na_color="black",
edges_color="black",
edges_width=0.1,
shape=None,
library_id="spatial",
ax=ax,
fig=fig,
)
ax.set_axis_off()
fig.suptitle("Connectivity grid", size=10, y=0.9)
fig.set_dpi(200)
For two continous variables the global bivariate Moran’s \(I\) is defined as (Wartenberg 1985; Bivand 2022)
\[I_B = \frac{\Sigma_i(\Sigma_j{w_{ij}y_j\times x_i})}{\Sigma_i{x_i^2}}\]
where \(x_i\) and \(y_j\) are the two variables of interest and \(w_{ij}\) is the value of the spatial weights matrix for positions \(i\) and \(j\).
The global bivariate Moran’s \(I\) is a measure of correlation between the variables \(x\) and \(y\) where \(y\) has a spatial lag. The result might overestimate the spatial autocorrelation of the variables due to the non-spatial correlation of \(x\) and \(y\) (Bivand 2022).
DATA PERMUTATION
Call:
boot(data = xx, statistic = bvm_boot, R = nsim, sim = "permutation",
listw = listw, parallel = parallel, ncpus = ncpus, cl = cl)
Bootstrap Statistics :
original bias std. error
t1* 0.2663111 -0.2657355 0.007882321
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 499 bootstrap replicates
CALL :
boot::boot.ci(boot.out = res, conf = c(0.99, 0.95, 0.9), type = "basic")
Intervals :
Level Basic
99% ( 0.5116, 0.5538 )
95% ( 0.5152, 0.5473 )
90% ( 0.5188, 0.5452 )
Calculations and Intervals on Original Scale
Some basic intervals may be unstable
np.random.seed(3407)
moran_bv = Moran_BV(
adata[:, features[0]].X.toarray(),
adata[:, features[1]].X.toarray(),
spatial_weights,
transformation="r",
permutations=499,
)
adata.uns[f"moran_bv_{features[0]}_{features[1]}"] = moran_bv.I
adata.uns[f"moran_bv_{features[0]}_{features[1]}_p_sim"] = moran_bv.p_sim
for key in filter(lambda x: x.startswith("moran_"), adata.uns.keys()):
print(f"{key}: {adata.uns[key].round(4)}")
moran_bv_Slc5a12_Calr: 0.2663
moran_bv_Slc5a12_Calr_p_sim: 0.002
plt.figure(figsize=(5, 4), dpi=100)
hist = plt.hist(moran_bv.sim, bins=30, color="lightgrey", edgecolor="black")
plt.axvline(moran_bv.I, color="red", linestyle="--")
plt.xlabel("Simulated bivariate Moran's I")
plt.ylabel("Simulated frequency")
plt.title(f"Bivariate Moran's I: {features[0]} vs {features[1]} (I: {moran_bv.I.round(4)}, p: {moran_bv.p_sim.round(4)})")
sns.despine()
The value t0
indicates the test statistic of global bivariate Moran’s \(I\). The global bivariate Moran’s \(I\) value for the genes Slc5a12, Calr is 0.2663111. Significance can be assessed by comparing the permuted confidence interval with the test statistic.
Lee’s \(L\) is a bivariate measure that combines non-spatial pearson correlation with spatial autocorrelation via Moran’s \(I\) (Lee 2001). This enables us to asses the spatial dependence of two continuous variables in a single measure. The measure is defined as
\[L(x,y) = \frac{n}{\sum_{i=1}^n(\sum_{j=1}^nw_{ij})^2}\frac{\sum_{i=1}^n[\sum_{j=1}^nw_{ij}(x_j-\bar{x})](\sum_{j=1}^nw_{ij}(y_j-\bar{y}))}{\sqrt{\sum_{i=1}^n(x_i-\bar{x})^2}\sqrt{\sum_{i=1}^n(y_i-\bar{y})^2}}\]
where \(w_{ij}\) is the value of the spatial weights matrix for positions \(i\) and \(j\), \(x\) and \(y\), the two variables of interest and \(\bar{x}\) and \(\bar{y}\) their means (Lee 2001; Bivand 2022).
statistic
0.26808
[1] 0.002
The effect size of bivariate Lee’s \(L\) for the genes Slc5a12, Calr is 0.26808 and the associated p-value is 0.002
Spatial_Pearson(connectivity=<COOrdinate sparse array of dtype 'float64' with 18018 stored elements and shape (3123, 3123)>, permutations=499)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
Spatial_Pearson(connectivity=<COOrdinate sparse array of dtype 'float64' with 18018 stored elements and shape (3123, 3123)>, permutations=499)
adata.uns[f"lees_l_{features[0]}_{features[1]}"] = lees_l_estimator.association_
adata.uns[f"lees_l_{features[0]}_{features[1]}_p_sim"] = lees_l_estimator.significance_
fig, ax = plt.subplots(figsize=(5, 5))
sns.heatmap(
lees_l_estimator.association_,
cmap=cmap_continuous,
annot=True,
cbar=False,
ax=ax,
mask=np.triu(np.ones_like(lees_l_estimator.association_)) - np.eye(2),
)
ax.set_title("Lee's L")
ax.set_xticklabels(features[:2])
ax.set_yticklabels(features[:2], rotation=0)
for i, row in enumerate(lees_l_estimator.significance_):
for j, p in enumerate(row):
ax.text(j + 0.5, i + 0.75, f"p = {p:.3f}", ha="center", va="center", color="white", fontsize=8)
Similar to the global bivariate Moran’s \(I\) statistic, there is a local analogue. The formula is given by:
\[ I_i^B = x_i\sum_jw_{ij}y_j \]
This can be interesting in the context of detection of coexpressed ligand-receptor pairs. A method that is based on local bivariate Moran’s \(I\) and tries to detect such pairs is SpatialDM
(Li et al. 2023).
sfe_tissue <- runBivariate(sfe, type = "localmoran_bv",
feature1 = features[1], feature2 = features[2],
colGraphName = colGraphName,
nsim = 499)
plotLocalResult(sfe_tissue, "localmoran_bv",
features = localResultFeatures(sfe_tissue, "localmoran_bv"),
ncol = 2, divergent = TRUE, diverge_center = 0,
colGeometryName = colGeometryName, size = plotsize)
local_moran_bv = Moran_Local_BV(
adata[:, features[0]].X.toarray().astype(np.float64),
adata[:, features[1]].X.toarray().astype(np.float64),
spatial_weights,
transformation="r",
permutations=499,
seed=3407,
)
adata.obs[f"local_moran_bv_{features[0]}_{features[1]}"] = local_moran_bv.Is
adata.obs[f"local_moran_bv_{features[0]}_{features[1]}_p_sim"] = local_moran_bv.p_sim
fig, ax = plt.subplots(1, 1, figsize=figsize, layout = "tight")
sq.pl.spatial_scatter(
adata,
color=f"local_moran_bv_{features[0]}_{features[1]}",
cmap=cmap_continuous,
vmin=-adata.obs[f"local_moran_bv_{features[0]}_{features[1]}"].abs().max(),
vmax=adata.obs[f"local_moran_bv_{features[0]}_{features[1]}"].abs().max(),
vcenter=0,
shape=None,
library_id="spatial",
title=f"Local bivariate Moran's I: {features[0]}"+ r"$\rightarrow$" + f"{features[1]}",
ax=ax,
fig=fig,
size=pointsize,
)
ax.set_axis_off()
Similar to the global variant of Lee’s \(L\) the local variant (Lee 2001; Bivand 2022) is defined as
\[L_i(x,y) = \frac{n(\sum_{j=1}^nw_{ij}(x_j-\bar{x}))(\sum_{j=1}^nw_{ij}(y_j-\bar{y}))}{\sqrt{\sum_{i=1}^n(x_i-\bar{x})^2}\sqrt{\sum_{i=1}^n(y_i-\bar{y})^2}}\] Local Lee’s \(L\) is a measure of spatial co-expression, when the variables of interest are gene expression measurements. Unlike the gobal version, the variables are not averaged and show the local contribution to the metric. Positive values indicate colocalization, negative values indicate segregation (Lee 2001; Bivand 2022).
sfe_tissue <- runBivariate(sfe, type = "locallee",
feature1 = features[1], feature2 = features[2],
colGraphName = colGraphName)
plotLocalResult(sfe_tissue, "locallee",
features = localResultFeatures(sfe_tissue, "locallee"),
ncol = 2, divergent = TRUE, diverge_center = 0,
colGeometryName = colGeometryName, size = plotsize)
Spatial_Pearson_Local(connectivity=<COOrdinate sparse array of dtype 'float64' with 18018 stored elements and shape (3123, 3123)>, permutations=499)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
Spatial_Pearson_Local(connectivity=<COOrdinate sparse array of dtype 'float64' with 18018 stored elements and shape (3123, 3123)>, permutations=499)
adata.obs[f"local_lees_l_{features[0]}_{features[1]}"] = local_lees_l_estimator.associations_
fig, ax = plt.subplots(1, 1, figsize=figsize, layout = "tight")
sq.pl.spatial_scatter(
adata,
color=f"local_lees_l_{features[0]}_{features[1]}",
cmap=cmap_continuous,
vmin=-adata.obs[f"local_lees_l_{features[0]}_{features[1]}"].abs().max(),
vmax=adata.obs[f"local_lees_l_{features[0]}_{features[1]}"].abs().max(),
vcenter=0,
shape=None,
library_id="spatial",
title=f"Local Lees's Ls: {features[0]}"+ r"$\leftrightarrow$" + f"{features[1]}",
ax=ax,
fig=fig,
size=pointsize,
)
ax.set_axis_off()
Geary’s \(C\) is a measure of spatial autocorrelation that is based on the difference between a variable and its neighbours. (Anselin 2019, 1995) defines it as
\[c_i = \sum_{j=1}^n w_{ij}(x_i-y_j)^2\]
and can be generalized to \(k\) features (in our case genes) by expanding
\[c_{k,i} = \sum_{v=1}^k c_{v,i}\]
where \(c_{v,i}\) is the local Geary’s \(C\) for the \(v\)th variable at location \(i\). The number of variables that can be used is not fixed, which makes the interpretation a bit more difficult. In general, the metric summarizes similarity in the “multivariate attribute space” (i.e. the gene expression) to its geographic neighbours. The common difficulty in these analyses is the interpretation of the mixture of similarity in the physical space and similarity in the attribute space (Anselin 2019, 1995).
To speed up computation we will use highly variable genes.
hvgs <- getTopHVGs(sfe, n = 100)
# Subset of the tissue
sfe_tissue <- runMultivariate(sfe, type = "localC_multi",
subset_row = hvgs,
colGraphName = colGraphName)
# # Local C mutli is stored in colData so this is a workaround to plot it
# plotSpatialFeature(sfe_tissue, "localC_multi", size = plotsize, scattermore = FALSE)
sc.pp.highly_variable_genes(adata, n_top_genes=100, flavor="seurat_v3", inplace=True)
local_geary_mv_estimator = Geary_Local_MV(connectivity=spatial_weights, permutations=100)
local_geary_mv_estimator.fit(
[
adata[:, highly_variable_gene].X.toarray()[:, 0]
for highly_variable_gene in adata.var_names[adata.var["highly_variable"]].to_list()
]
)
Geary_Local_MV(connectivity=<libpysal.weights.weights.W object at 0x34781e860>, permutations=100)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
Geary_Local_MV(connectivity=<libpysal.weights.weights.W object at 0x34781e860>, permutations=100)
We can further plot the results of the permutation test. Significant values indicate interesting regions, but should be interpreted with care for various reasons. For example, we are looking for similarity in a combination of multiple features but the exact combination is not known. Anselin (2019) write “Overall, however, the statistic indicates a combination of the notion of distance in multi-attribute space with that of geographic neighbors. This is the essence of any spatial autocorrelation statistic. It is also the trade-off encountered in spatially constrained multivariate clustering methods (for a recent discussion, see, e.g., Grubesic, Wei, and Murray 2014).”. Multi-attribute space refers here to the highly variable genes. The problem can be summarised to where the similarity comes from, the gene expression or the physical space (Anselin 2019). The same problem is common in spatial domain detection methods.
fig, ax = plt.subplots(1, 2, figsize=(len(features)*7, 7), layout = "tight")
sq.pl.spatial_scatter(
adata,
color=["local_geary_mv", "local_geary_mv_p"],
cmap="Blues",
vmin=0,
shape=None,
library_id="spatial",
title=["Geary's local multivariate C", "Simulated $-log_{10}(p_{adjusted})$"],
ax=ax,
fig=fig,
size=pointsize
)
for ax in ax:
ax.set_axis_off()
plotted are the effect size and the adjusted p-values in space.
This test is useful to assess the overlap of the \(k\)-nearest neighbours from physical distances (tissue space) with the \(k\)-nearest neighbours from the gene expression measurements (attribute space). \(k\)-nearest neighbour matrices are computed for both physical and attribute space. In a second step the probability of overlap between the two matrices is computed (Anselin and Li 2020).
sf <- colGeometries(sfe)[[colGeometryName]]
sf <- cbind(sf, t(as.matrix(logcounts(sfe)[hvgs,])))
# "-" gets replaced by "." so harmonise here
hvgs <- gsub("-", ".", hvgs)
nbr_test <- neighbor_match_test(sf[c(hvgs)], k = 6)
sf$Probability <- nbr_test$Probability
sf$Cardinality <- nbr_test$Cardinality
p <- tm_shape(sf) + tm_dots(col = 'Cardinality', size = 0.06 * plotsize)
q <- tm_shape(sf) + tm_dots(col = 'Probability', size = 0.06 * plotsize)
tmap_arrange(p,q)
k = 20
# Spatial grid neighbors
df_neighbors_spatial = spatial_weights.to_adjlist()
df_neighbors_features = KNN(KDTree(adata[:, adata.var["highly_variable"]].X.toarray(), distance_metric="euclidean"), k=k).to_adjlist()
focal_points = sorted(set(df_neighbors_spatial.focal).intersection(df_neighbors_features.focal))
focal_points_names = adata.obs_names[focal_points]
df_neighborhood_match_test = pd.DataFrame(columns=["neighbors_match_count", "neighbors_match_fraction"], index=focal_points_names)
for focal_point, focal_name in zip(focal_points, focal_points_names):
neighbors_spatial = set(df_neighbors_spatial[df_neighbors_spatial.focal == focal_point].neighbor)
neighbors_features = set(df_neighbors_features[df_neighbors_features.focal == focal_point].neighbor)
neighbors_match_count = len(neighbors_spatial.intersection(neighbors_features))
neighbors_match_fraction = neighbors_match_count / len(neighbors_spatial)
df_neighborhood_match_test.loc[focal_name] = [neighbors_match_count, neighbors_match_fraction]
adata.obs["neighbors_match_count"] = df_neighborhood_match_test.loc[adata.obs_names, "neighbors_match_count"]
adata.obs["neighbors_match_fraction"] = df_neighborhood_match_test.loc[adata.obs_names, "neighbors_match_fraction"]
# plot the results
neighborhood_match_test_features = ["neighbors_match_count", "neighbors_match_fraction"]
fig, axes = plt.subplots(1, len(neighborhood_match_test_features), figsize=(len(neighborhood_match_test_features)*5, 5), layout = "tight")
for feature, ax in zip(neighborhood_match_test_features, axes):
title = feature.replace("_", " ").capitalize()
sq.pl.spatial_scatter(adata,
color=feature,
shape=None,
size=pointsize,
cmap="YlOrBr",
title=title,
ax=ax,
fig=fig,
use_raw=False)
ax.set_axis_off()
Cardinality is a measure of how many neighbours of the two matrices are common. Some regions show high cardinality with low probability and therefore share similarity on both attribute and physical space. In contrast to multivariate local Geary’s \(C\) this metric focuses directly on the distances and not on a weighted average. A problem of this approach is called the empty space problem which states that as the number of dimensions of the feature sets increase, the empty space between observations also increases (Anselin and Li 2020).
In addition to measures of spatial autocorrelation of continuous data as seen above, the join count statistic method applies the same concept to binary and categorical data. In essence, the joint count statistic compares the distribution of categorical marks in a lattice with frequencies that would occur randomly. These random occurrences can be computed using a theoretical approximation or random permutations. The same concept was also extended in a multivariate setting with more than two categories. The corresponding spdep
functions are called joincount.test
and joincount.multi
(Dale and Fortin 2014; Bivand 2022; Cliff and Ord 1981).
First, we need to get categorical marks for each data point. We do so by running (non-spatial) PCA on the data followed by Leiden clustering (Traag, Waltman, and Van Eck 2019).
library(BiocNeighbors)
library(BiocSingular)
set.seed(123)
# Run PCA on the sample
sfe <- runPCA(sfe, exprs_values = "logcounts", ncomponents = 50, BSPARAM = IrlbaParam())
# Cluster based on first 20 PC's and using leiden
colData(sfe)$cluster <- clusterRows(reducedDim(sfe, "PCA")[,1:10],
BLUSPARAM = KNNGraphParam(
k = 20,
BNPARAM=AnnoyParam(ntrees=50),
cluster.fun = "leiden",
cluster.args = list(
resolution = 0.3,
objective_function = "modularity")))
plotSpatialFeature(sfe,
"cluster",
colGeometryName = colGeometryName, size = plotsize
)
from sklearn_ann.kneighbors.annoy import AnnoyTransformer
np.random.seed(123)
#compute a PCA on the
sc.pp.pca(adata, n_comps = 50, zero_center = True, svd_solver = "arpack")
#compute the neighbours
sc.pp.neighbors(adata, use_rep = "X_pca", knn = True, n_pcs = 10, transformer=AnnoyTransformer(n_neighbors=20, n_trees=50))
#compute leiden clustering
sc.tl.leiden(adata, resolution = 0.3, flavor = "igraph", objective_function = "modularity")
fig, ax = plt.subplots(1, 1, figsize=figsize, layout = "tight")
sq.pl.spatial_scatter(
adata,
color="leiden",
palette=ListedColormap(cmap_discrete.colors[:len(np.unique(adata.obs["leiden"]))]),
shape=None,
library_id="spatial",
title="Clusters",
ax=ax,
fig=fig,
size=pointsize
)
ax.set_axis_off()
The join count statistics of this example are:
Joincount Expected Variance z-value
1:1 1595.000 342.131 230.086 82.596
2:2 2964.000 1210.328 529.827 76.187
3:3 1555.000 390.990 255.053 72.885
4:4 666.000 62.703 53.387 82.568
5:5 1215.000 192.554 144.048 85.189
2:1 3.000 1288.619 776.179 -46.146
3:1 283.000 732.656 509.910 -19.913
3:2 463.000 1377.489 824.087 -31.856
4:1 132.000 293.738 226.149 -10.755
4:2 0.000 552.265 351.323 -29.464
4:3 0.000 313.995 238.454 -20.334
5:1 24.000 514.322 376.868 -25.257
5:2 51.000 966.993 596.735 -37.497
5:3 4.000 549.793 397.877 -27.362
5:4 54.000 220.424 178.065 -12.472
Jtot 1014.000 6810.293 1497.669 -149.776
sq.gr.interaction_matrix(adata, "leiden", normalized = False, connectivity_key="spatial", weights = False)
df_interactions = pd.DataFrame(adata.uns["leiden_interactions"], columns=np.unique(adata.obs["leiden"]), index=np.unique(adata.obs["leiden"]))
# add lower triangular matrix (w/o diagonal) to the dataframe and divide by 2
array_join_counts = (df_interactions + np.tril(df_interactions, k = -1).T)/2
#only print the upper triangular matrix
np.triu(array_join_counts)
array([[1.254e+03, 2.000e+00, 1.310e+02, 3.100e+02, 0.000e+00, 6.000e+00],
[0.000e+00, 2.938e+03, 0.000e+00, 4.530e+02, 1.900e+01, 2.500e+01],
[0.000e+00, 0.000e+00, 9.920e+02, 6.000e+00, 0.000e+00, 7.300e+01],
[0.000e+00, 0.000e+00, 0.000e+00, 1.552e+03, 0.000e+00, 1.600e+01],
[0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 3.240e+02, 5.300e+01],
[0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 8.550e+02]])
The Python
function sq.gr.interaction_matrix
counts the interaction for each pair twice, while the R
function joincount.multi
counts each interaction only once. Therefore, in Python
we add the lower triangular matrix to the upper triangle (without the diagonal) and divide the resulting interaction matrix by 2. Since there are differences in the implementation of the principal component calculcation (namely in the SVD decomposition of the sparse logcounts matrix), the results are not perfectly corresponding, c.f. Rich et al. (2024).
The rows show different combinations of clusters that are in physical contact. E.g. \(1:1\) means the cluster \(1\) with itself. The column Joincount
is the observed statistic whereas the column Expected
is the expected value of the statistic for this combination. Like this, we can compare whether contacts among cluster combinations occur more frequently than expected at random (Cliff and Ord 1981).
The local methods presented above should always be interpreted with care, since we face the problem of multiple testing when calculating them for each cell. Moreover, the presented methods should mainly serve as exploratory measures to identify interesting regions in the data. Multiple processes can lead to the same pattern, thus the underlying process cannot be inferred from characterising the pattern. Indication of clustering does not explain why this occurs. On one hand, clustering can be the result of spatial interaction between the variables of interest. This is the case if a gene of interest is highly expressed in a tissue region. On the other hand, clustering can be the result of spatial heterogeneity, when local similarity is created by structural heterogeneity in the tissue, e.g., when cells with uniform expression of a gene of interest are grouped together which then creates the apparent clustering of the gene expression measurement.
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.7.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Zurich
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] BiocSingular_1.16.0 BiocNeighbors_1.18.0
[3] dixon_0.0-9 splancs_2.01-45
[5] sp_2.1-4 bluster_1.10.0
[7] magrittr_2.0.3 stringr_1.5.1
[9] spdep_1.3-8 spData_2.3.3
[11] tmap_3.3-4 scater_1.28.0
[13] scran_1.28.2 scuttle_1.10.3
[15] SFEData_1.2.0 SpatialFeatureExperiment_1.2.3
[17] Voyager_1.2.7 rgeoda_0.0.11-1
[19] digest_0.6.37 sf_1.0-19
[21] reshape2_1.4.4 patchwork_1.3.0
[23] STexampleData_1.8.0 ExperimentHub_2.8.1
[25] AnnotationHub_3.8.0 BiocFileCache_2.8.0
[27] dbplyr_2.3.4 rlang_1.1.4
[29] ggplot2_3.5.1 dplyr_1.1.4
[31] spatstat_3.3-0 spatstat.linnet_3.2-3
[33] spatstat.model_3.3-3 rpart_4.1.24
[35] spatstat.explore_3.3-3 nlme_3.1-166
[37] spatstat.random_3.3-2 spatstat.geom_3.3-4
[39] spatstat.univar_3.1-1 spatstat.data_3.1-4
[41] SpatialExperiment_1.10.0 SingleCellExperiment_1.22.0
[43] SummarizedExperiment_1.30.2 Biobase_2.60.0
[45] GenomicRanges_1.52.1 GenomeInfoDb_1.36.4
[47] IRanges_2.34.1 S4Vectors_0.38.2
[49] BiocGenerics_0.46.0 MatrixGenerics_1.12.3
[51] matrixStats_1.5.0
loaded via a namespace (and not attached):
[1] spatstat.sparse_3.1-0 bitops_1.0-9
[3] httr_1.4.7 RColorBrewer_1.1-3
[5] tools_4.3.1 R6_2.5.1
[7] HDF5Array_1.28.1 mgcv_1.9-1
[9] rhdf5filters_1.12.1 withr_3.0.2
[11] gridExtra_2.3 leaflet_2.2.2
[13] leafem_0.2.3 cli_3.6.3
[15] labeling_0.4.3 proxy_0.4-27
[17] dbscan_1.2-0 R.utils_2.12.3
[19] dichromat_2.0-0.1 scico_1.5.0
[21] limma_3.56.2 rstudioapi_0.17.1
[23] RSQLite_2.3.9 generics_0.1.3
[25] crosstalk_1.2.1 Matrix_1.5-4.1
[27] ggbeeswarm_0.7.2 abind_1.4-8
[29] R.methodsS3_1.8.2 terra_1.8-5
[31] lifecycle_1.0.4 yaml_2.3.10
[33] edgeR_3.42.4 rhdf5_2.44.0
[35] tmaptools_3.1-1 grid_4.3.1
[37] blob_1.2.4 promises_1.3.2
[39] dqrng_0.4.1 crayon_1.5.3
[41] dir.expiry_1.8.0 lattice_0.22-6
[43] beachmat_2.16.0 KEGGREST_1.40.1
[45] magick_2.8.5 pillar_1.10.1
[47] knitr_1.49 metapod_1.7.0
[49] rjson_0.2.23 boot_1.3-28.1
[51] codetools_0.2-19 wk_0.9.4
[53] glue_1.8.0 vctrs_0.6.5
[55] png_0.1-8 gtable_0.3.6
[57] cachem_1.1.0 xfun_0.50
[59] S4Arrays_1.0.6 mime_0.12
[61] DropletUtils_1.20.0 units_0.8-5
[63] statmod_1.5.0 interactiveDisplayBase_1.38.0
[65] bit64_4.5.2 filelock_1.0.3
[67] irlba_2.3.5.1 vipor_0.4.7
[69] KernSmooth_2.23-26 colorspace_2.1-1
[71] DBI_1.2.3 zellkonverter_1.9.0
[73] raster_3.6-30 tidyselect_1.2.1
[75] bit_4.5.0.1 compiler_4.3.1
[77] curl_6.1.0 basilisk.utils_1.12.1
[79] DelayedArray_0.26.7 scales_1.3.0
[81] classInt_0.4-10 rappdirs_0.3.3
[83] goftest_1.2-3 spatstat.utils_3.1-2
[85] rmarkdown_2.29 basilisk_1.11.2
[87] XVector_0.40.0 htmltools_0.5.8.1
[89] pkgconfig_2.0.3 base64enc_0.1-3
[91] sparseMatrixStats_1.12.2 fastmap_1.2.0
[93] htmlwidgets_1.6.4 shiny_1.10.0
[95] DelayedMatrixStats_1.22.6 farver_2.1.2
[97] jsonlite_1.8.9 BiocParallel_1.34.2
[99] R.oo_1.27.0 RCurl_1.98-1.16
[101] GenomeInfoDbData_1.2.10 s2_1.1.7
[103] Rhdf5lib_1.22.1 munsell_0.5.1
[105] Rcpp_1.0.13-1 ggnewscale_0.5.0
[107] viridis_0.6.5 reticulate_1.40.0
[109] stringi_1.8.4 leafsync_0.1.0
[111] zlibbioc_1.46.0 plyr_1.8.9
[113] parallel_4.3.1 ggrepel_0.9.6
[115] deldir_2.0-4 Biostrings_2.68.1
[117] stars_0.6-7 splines_4.3.1
[119] tensor_1.5 locfit_1.5-9.10
[121] igraph_2.1.2 ScaledMatrix_1.8.1
[123] BiocVersion_3.17.1 XML_3.99-0.18
[125] evaluate_1.0.1 BiocManager_1.30.25
[127] httpuv_1.6.15 purrr_1.0.2
[129] polyclip_1.10-7 rsvd_1.0.5
[131] lwgeom_0.2-14 xtable_1.8-4
[133] e1071_1.7-16 RSpectra_0.16-2
[135] later_1.4.1 viridisLite_0.4.2
[137] class_7.3-22 tibble_3.2.1
[139] memoise_2.0.1 beeswarm_0.4.0
[141] AnnotationDbi_1.62.2 cluster_2.1.4
[143] BiocStyle_2.28.1
©2024 The pasta
authors. Content is published under Creative Commons CC-BY-4.0 License for the text and GPL-3 License for any code.