Published

January 29, 2025

Point pattern analysis – multivariate methods for imaging-based data

In this vignette we will show:

  • Multivariate point pattern data analysis methods for imaging-based approaches.

  • This includes global metrics on the entire field of view and local variants thereof.

  • The use case is a MERFISH data set from Moffitt et al. (2018).

  • For more background on point pattern analysis we refer the reader to Baddeley, Rubak, and Turner (2015).

  • For the implementations we use R. Data is represented as SpatialExperiment (Righelli et al. 2022).

Dependencies

Show the code
source("utils.R")

Setup

Show the code
spe <- readRDS("../data/spe.rds")

#define the Z-stacks that you want to compare
zstack_list <- list("-0.09", "0.01", "0.21")

#define the celltype that you want to compare across the stacks - hereby we assume independence across the z-stacks which is an assumption that can be challenged
celltype_ls <- "OD Mature"

selectZstacks <- function(zstack, spe){
  sub <- spe[, spe$sample_id == zstack]
  pp <- .ppp(sub, marks = "cluster_id")
  return(pp)
}
pp_ls <- lapply(zstack_list, selectZstacks, spe)
names(pp_ls) <- zstack_list

The theory of spatial point patterns is discussed in great detail in (Baddeley, Rubak, and Turner 2015). The book has an accompanying package called spatstat which offers great functionality to the theoretical concepts described in the book (Baddeley and Turner 2005). This chapter relies heavily on both publications.

Concepts and Definitions of Point Processes

Point Process

In point pattern analysis we assume that the patterns we observe are a realisation of a stochastic process called a point process. The inferences we make about the point pattern are based on the point process. For example, if a pattern is created by a Poisson point process it will be evenly distributed in the observation window (Baddeley, Rubak, and Turner 2015, 127).

When considering a pattern with \(m\) multiple types, as we do in the (Moffitt et al. 2018) dataset, there are two distinct ways looking at the data. One can view the pattern as a multitype point pattern, where all the points are sampled from the same point process. The other option is to consider the pattern as a multivariate point pattern, where the points come from \(m\) distinct point processes. The distinction between these two views lies in the assumption that in the multitype framework all points stem from a single point process, while in the multivariate framework, types stem from distinct point processes, allowing us to consider dependencies of individual types. Whether or not the patterns stem from the same point process depends on the biological question. If we analyse two cell types within one slice of a tissue, we should consider them as being sampled from one point process. However, if we consider the distribution of a cell type in two slices of the same tissue we have grounds to consider them as distinct processes (Baddeley, Rubak, and Turner 2015, 564–65).

Observation Windows

Most often, one does not observe the entire point process, but only a subset of the process. This is called window sampling. Instead of observing the entire pattern we examine a subset of this pattern in the so called window. For example, small microscopy windows provide a view of portions of a larger tissue slice (Baddeley, Rubak, and Turner 2015, 143–45).

There is another concept called the small world model. It assumes that points can only be observed in a finite small world and not beyond these boundaries. When thinking of an entire tissue, this is a very common scenario. Cells can only be observed within the tissue and not beyond. In this case, it would be correct to not assume a rectangular observation window but to use methods to estimate an unknown sampling window such as the Ripley-Rasson estimate of a spatial region (Baddeley, Rubak, and Turner 2015, 144–45; Ripley and Rasson 1977).

The Ripley-Rasson method estimates convex observation windows. In our sample an ideal approximation would be non-convex, due to the tissue fold at the bottom. However, as non-convex estimation requires more complicated methods we will estimate the observation window using a Ripley-Rasson estimator.

Show the code
setRiprasWindows <- function(pp){
  Window(pp) <- ripras(pp)
  return(pp)
}
#the entire point patterns with the ripras windows
pp <- lapply(pp_ls, setRiprasWindows)

separateMarks <- function(pp){
  #split the multitype point process into several single type processes
  ppls <- split(pp)
  return (ppls)
}
#the point patterns separated by their marks
pp_ls <- lapply(pp, separateMarks)

Complete Spatial Randomness

Complete spatial randomness (CSR) is often used as the null model for various point patterns. It is the result of a Poisson process. A completely spatial random process is characterised by two properties, homogeneity and independence, as discussed below (Baddeley, Rubak, and Turner 2015, 132).

Homogeneity

“Homogeneity […] means that the expected number of points falling in a region \(B\) should be proportional to its area \(|B|\)(Baddeley, Rubak, and Turner 2015, 132) given a proportionality constant \(\lambda\). The constant \(\lambda\) represents the intensity of the process, i.e., the average number of points in a unit area (Baddeley, Rubak, and Turner 2015, 132–33)

\[ \mathbb{E}[X\cap B] = \lambda |B|. \label{eq:expected_number_points} \]

Independence

Independence implies that in two (non-overlapping) regions \(A\) and \(B\), the number of points \(n(X\cap A)\) and \(n(X\cap B)\) are independent random variables. In other words, the number of points in region \(A\) does not affect the number of points in region \(B\). (Baddeley, Rubak, and Turner 2015, 133).

Inhomogeneous Poisson Process

A Poisson process that is spatially varying in its average density of points is called inhomogeneous. Here, the average density, \(\lambda(u)\), sometimes known as the intensity function (see below), is a function of the spatial location \(u\). In this case, the expected number of points within a region \(B\), denoted by \(\mu = n(X\cap B)\), is given by the integral of the intensity function over that region (Baddeley, Rubak, and Turner 2015, 138)

\[ \mu = \int_{B} \lambda(u) du. \label{eq:expected_number_inhomogeneous} \]

Confounding between clustering and inhomogeneity

Often, it is hard to distinguish inhomogeneous intensity from genuine clustering of points. Patterns can arise from an inhomogeneous process or a clustered process and it is not straightforward to distinguish them: one is governed by different (inhomogeoenous) intensities whereas the other is due to interactions (clustering) between points (Baddeley, Rubak, and Turner 2015, 151 ff.). One way this can occur is if cells of different sizes are treated as points instead of considering their entire volume (Baddeley, Rubak, and Turner 2015, 210 ff.).

Isotropy

A point process is called isotropic, if its statistical properties are invariant to rotations; a CSR process is both stationary and isotropic (Baddeley, Rubak, and Turner 2015, 147).

Stationarity

“A point process is called stationary if, when we view the process through a window \(W\) , its statistical properties do not depend on the location of the window in two-dimensional space.” (Baddeley, Rubak, and Turner 2015, 146). This is the case for any homogeneous point process, where the statistical properties of the pattern are unchanged given shifting of the observation window. This means it is stationary in all statistical properties; first-order properties (e.g. intensity) and second-order properties (e.g. correlation) (Baddeley, Rubak, and Turner 2015, 218). Not all metrics assume stationarity in its full sense. Inhomogeneous metrics only assume second-order / correlation stationarity. This means that while the intensity function can vary spatially (first-order stationarity is not given), the estimates of correlation functions (e.g. the inhomogeneous K-function) are expected to be consistent across different parts of the window (Baddeley, Rubak, and Turner 2015, 689 ff.).

Local scaling

If a process is not correlation stationary, so the estimates of the inhomogeneous metric vary between locations, locally-scaled versions of the metric are applicable. This means that in subregions, the process is still stationary and isotropic, but there is a rescaling factor that can vary across the total process (Baddeley, Rubak, and Turner 2015, 246–47).

To test the assumption of inhomogeneity, we can use a permutation test. In this scenario, we split the patterns into quadrats and compare the estimated functions between the quadrats. It should be noted that this test depends on the arbitrary definition of the quadrats. The chosen patterns are not independent but result as marks from an overall point-pattern. Therefore, the permutation approach is questionable (Baddeley, Rubak, and Turner 2015, 689–93).

As the interpretation of the permutation test is highly dependent on the quadrats, the results should be interpreted with care. Both inhomogeneous and locally scaled versions of the summary functions have support and both offer interesting insights into the spatial pattern. Therefore, we will compare all versions and show what the choice of metrics means for their interpretation.

Intensity

Intensity is the expected density of points per unit area. It can be interpreted as the rate of occurrence or the abundance of events. The intensity represents a first order property because it is related to the expected number of points. More formally the average intensity of a point process is defined as:

\[ \bar{\lambda} = \frac{n(x)}{|W|} \label{eq:average_intensity} \]

As this is an average over the entire window, it is a good estimate for a homogeneous point process (Baddeley, Rubak, and Turner 2015, 157–60).

Estimating Intensity

For a homogeneous point process, the intensity can be estimated in a simplistic way: summing the individual intensities of the marks (Baddeley, Rubak, and Turner 2015, 161).

Show the code
intensityPointProcess <- function(pp,mark) if(mark) intensity(pp) else sum(intensity(pp))

intensityPointProcess(pp_ls[['0.01']], mark = FALSE) %>% round(6)
[1] 0.001909

Otherwise, we can compute the intensity for each mark individually.

Show the code
intensityPointProcess(pp_ls[['0.01']], mark = TRUE) %>% round(8)
  Ambiguous   Astrocyte Endothelial   Ependymal  Excitatory  Inhibitory 
 0.00024151  0.00020183  0.00014653  0.00008373  0.00036867  0.00061393 
  Microglia OD Immature   OD Mature   Pericytes 
 0.00003031  0.00006249  0.00014278  0.00001750 

Kernel Estimation

In kernel estimation, we try to estimate the intensity function \(\lambda(u)\) of the point process. There is a wide variety of kernel estimators, see (Baddeley, Rubak, and Turner 2015, 168), but a popular choice is the isotropic Gaussian kernel where the standard deviation corresponds to the smoothing bandwidth (Baddeley, Rubak, and Turner 2015, 168).

Show the code
pp_sel <-  subset(pp_ls[['0.01']]$`OD Mature`, drop = TRUE)
Dens <- density.ppp(pp_sel, sigma = 50)
plot(Dens, main = 'Kernel Density (OD Mature cells)')

Quadrat Counting

In quadrat counting, all points falling into a given quadrat are counted. This gives an overview on the characteristics of the point pattern, such as correlation stationarity (Baddeley, Rubak, and Turner 2015, 163).

Show the code
Q5 <- quadratcount(pp_ls[['0.01']], nx=8, ny=8)
plot(unmark(pp[['0.01']]), main='Unmarked Point Pattern Quadrats')
plot(Q5, col='black', add=TRUE)

Under independence assumptions, the quadrat counts can be used for testing homogeneity, i.e., whether the points are distributed evenly across the quadrats (Baddeley, Rubak, and Turner 2015, 164–65).

Show the code
val <- quadrat.test(pp_ls[['0.01']]$`OD Mature`, 5, alternative="regular", method="MonteCarlo")
val

    Conditional Monte Carlo test of CSR using quadrat counts
    Test statistic: Pearson X2 statistic

data:  pp_ls[["0.01"]]$`OD Mature`
X2 = 635.09, p-value = 1
alternative hypothesis: regular

Quadrats: 25 tiles (irregular windows)

A p-value of 1 indicates that the null hypothesis of irregularity can’t be rejected at \(\alpha = 0.05\). Thus, the point pattern of oligodendrocytes is strongly irregular.

Alternatively, we can inspect departures from the hypothesis that points were generated by a Poisson process. We can identify hotspots and coldspots by comparing the standard error of the relrisk function, which computes nonparamatric estimates of the relative risk by kernel smoothing, to the theoretical null distribution of points. The relative risk is the ratio of spatially varying probablilities of different types (Buller 2020).

Show the code
# select marks
selection <- c('OD Mature', 'Ependymal', 'Microglia')
pp_sel <-  subset(pp[['0.01']], marks %in% selection, drop = TRUE)

f1 <- pValuesHotspotMarks(pp_sel)

# Plot significant p-values
plot(f1$p, main = "Significant difference\n to Poisson process alpha = 0.05")

Testing for CSR

Whether or not a point process is completely spatially random (CSR) depends on two characteristics: points need to be distributed homogeneously and they have to be independent of each other (see definitions above). There are various ways to test for CSR, here we show the use-case of the clark-evans test (Baddeley, Rubak, and Turner 2015, 165–66).

Show the code
clarkevans.test(pp_ls[['0.01']]$`OD Mature`)

    Clark-Evans test
    CDF correction
    Z-test

data:  pp_ls[["0.01"]]$`OD Mature`
R = 0.73962, p-value < 2.2e-16
alternative hypothesis: two-sided

Multitype point process

Show the code
sub <- spe[, spe$sample_id == "0.01"]
(pp <- .ppp(sub, marks = "cluster_id"))
Marked planar point pattern: 6111 points
Multitype, with levels = 
   Ambiguous Astrocyte Endothelial Ependymal Excitatory Inhibitory Microglia OD 
Immature OD Mature Pericytes
window: rectangle = [1222.5635, 3012.4248] x [-3993.535, -2202.755] units

Multitype and Multivariate viewpoint

A pattern with multiple type of points, e.g. points representing different cell types, can be seen in different ways. One the one hand, the multitype approach assumes that the points \(x\) were recorded together with with their labels \(m\) and that they were generated at the same time. The locations and labels therefore have a joint distribution \(P(X,M)\). Alternatively, in the multivariate approach, the pattern with multiple types of points is assumed to be a combination of several distinct point patterns; one for each type of point. Here, separate point patterns \(A\) and \(B\) form a joint distribution \(P(A,B)\). To test if the labels depend on the location one can assume the following null hypotheses (Baddeley, Rubak, and Turner 2015, 565–67);

  • Complete spatial randomness and independence (CSRI): points are distributed at random; the type of each points is randomly allocated; independence between points of different types; allocation of the types independently of the other points and of its location.
  • Random labeling: each point is assigned a type at random independently of its location.
  • Independence of components: points of different types are independent of each other.

Apart from CSRI an important aspect for analysis if we can assume stationarity, i.e. the statistical properties of the point pattern do not change in the window.

For simplicity, we will focus on three cell types in our point pattern: Ependymal, OD Mature and Microglia.

Show the code
marks(pp) <- factor(marks(pp))
selection <- c('OD Mature', 'Ependymal', 'Microglia')
fov_sel <- c('0.01')

pp_sel <-  subset(pp, marks %in% selection, drop = TRUE)
spe_sel <- spe[, spe$sample_id == "0.01" &  spe$cluster_id %in% selection]

We select one fov, which corresponds to one cut in the frontal plane.

Show the code
pp_sel |> as.data.frame() |> 
  ggplot(aes(x = x, y = y, color = marks)) +
  geom_point() +
  theme_minimal() +
  coord_fixed() +
  scale_color_brewer(palette = "Set1")

The summary of pp (point pattern) object returns general properties, plus intensities, combined and per mark type.

Show the code
summary(pp)
Marked planar point pattern:  6111 points
Average intensity 0.001906561 points per square unit

Coordinates are given to 6 decimal places

Multitype:
            frequency  proportion    intensity
Ambiguous         773 0.126493200 2.411670e-04
Astrocyte         646 0.105711000 2.015445e-04
Endothelial       469 0.076746850 1.463225e-04
Ependymal         268 0.043855340 8.361288e-05
Excitatory       1180 0.193094400 3.681463e-04
Inhibitory       1965 0.321551300 6.130571e-04
Microglia          97 0.015873020 3.026287e-05
OD Immature       200 0.032727870 6.239767e-05
OD Mature         457 0.074783180 1.425787e-04
Pericytes          56 0.009163803 1.747135e-05

Window: rectangle = [1222.5635, 3012.4248] x [-3993.535, -2202.755] units
                    (1790 x 1791 units)
Window area = 3205250 square units

To get the overall intensity the individual intensities can be summed up. Assuming that the multitype process is first order stationary (i.e. each sub-process is stationary) the individual intensities sum up to the intensity of the unmarked point process (Baddeley, Rubak, and Turner 2015, 574ff.).

Show the code
sum(intensity(pp)) == intensity(unmark(pp))
[1] TRUE

The stationarity assumption is not appropriate in all cases. To assess first-order stationarity visually, we can plot the kernel density estimates per type.

Show the code
ppls <- split(pp_sel) # split by mark
plot(density(ppls, sigma = bw.diggle))

Ependymal and OD Mature cells are cleary inhomogeneous, while for Microglia cells it is not so clear and we could assume homogeneity, especially since the window extends beyond the tissue boundary, with a visible border at the bottom center.

To further inverstiagte the spatial arrangement of the different cell types we can calculate the relative risk, i.e., the probability of observing a given celltype at a given location. It is calculated using the function relrisk. The bandwidth for smoothing is calculated with bw.relrisk and might need to be adjusted (Baddeley, Rubak, and Turner 2015, 577–83).

The relrisk function also gives us the dominant mark for different regions of the tissue of interest. This could be interesting in the annotation or comparison of spatial domains. It indicates the most likely cell type to occur at each location.

Show the code
rpd <- relrisk(pp_sel, diggle = TRUE)
dom <- im.apply(rpd, which.max)
dom <- eval.im(factor(dom, levels = seq_along(levels(unique(marks(pp_sel)))),
                      labels = levels(unique(marks(pp_sel)))))
plot(dom,las=2,main="Dominant mark")

Correlation and spacing

Nearest neighbourhood contingency

To further investigate the spatial distribution of the marks we can investigate the nearest neighbourhood of each cell type. One possibility is to work with nearest neighborhood contingency tables developed by Dixon (2002). The statistical tests are implemented in the R package dixon (de la Cruz 2008).

The measure of segregation \(S\) is defined in Dixon (2002) as

\[S_{i,j}= \frac{\log[(N_{i,j}/(N_i−N_{i,j})]}{[(N_i−1)/(N−N_i)]}\] where \(N_i\) is the number of individuals \(i\), \(N_{i,j}\) is the number of individuals of type \(i\) with a nearest neighbor of type \(j\), and \(N\) is the total number of individuals.

A value of \(S=0\) is consistent with random labeling. A value larger than 0 indicates that the two types are more segregated than expected by chance, the larger the value the more segregated. Note that segregated means that it is more likely to expect a neigbour of type \(j\) than by chance. In the case that the neigbour is of the same type this is equivalent to “attraction” of the types. On the other hand if \(S<0\) it indicates that type \(j\) is less likely to be a neigbour than by chance. The P-values are calculated using expected numbers of nearest neighbors under the null hypothesis of random labeling using a Monte-Carlo simulation and assumes an asymptotic \(\chi^2\) distribution.

Show the code
out$tablaZ %>% 
  arrange(desc(abs(`Z `))) %>%
  select(-`  p-val.Nobs`)
       From        To     Obs.Count     Exp. Count    S      Z    p-val.Z
1 Ependymal Ependymal           262          87.16  1.96  20.04    0.0000
2 Ependymal OD Mature             3         149.18 -2.04 -16.87    0.0000
3 OD Mature Ependymal             8         149.18 -1.43 -14.26    0.0000
4 OD Mature OD Mature           380         253.83  0.60  11.43    0.0000
5 Ependymal Microglia             3          31.66 -1.07  -5.66    0.0000
6 Microglia Ependymal             9          31.66 -0.68  -4.92    0.0000
7 Microglia OD Mature            67          53.99  0.25   2.60    0.0094
8 Microglia Microglia            21          11.34  0.32   2.50    0.0124
9 OD Mature Microglia            69          53.99  0.12   2.35    0.0190

In this table we see that most Ependymal cells are very clustered (\(S=1.96\)), while Microglia are more evenly distributed. Further we see that it is less likely to find a Ependymal cells next to a OD mature cells than by chance.

OD Mature cells show this interesting characteristic that they are clustered in some parts of the tissue and more evenly distributed in other parts of the tissue. This characteristic is not visible in the table. The statistic also considers only the nearest neighbour and ignores neighbours that are further away, c.f., section about local scaling.

Summary functions for pairs of types

Similar to the simple case without marks, it is possible to estimate summary functions. In particular, summary functions between different marks can be calculated. Note that the canonical functions assume that the multi-type process is stationary.

Cross \(K\)-function

The cross \(K\)-function is a summary function that measures the average number of points of type \(j\) within a distance \(r\) of a point of type \(i\). The formula is given by:

\[ K(r) = \frac{1}{\lambda_j} \mathbb{E} [t(u,r,X^{j}) \mid u \in X^{i}], \]

where \(X^{i}\) is the point pattern of type \(i\) and \(t(u,r,X^{j})\) is the number of points of type \(j\) in a circle of radius \(r\) around \(u\) (Baddeley, Rubak, and Turner 2015, 594–95). It is important to remember that the homogeneous cross \(K\)-function assumes that the multitype process is stationary. If this is not the case, there is a risk in misinterpreting the results. The problem is the confounding between section about local scaling (Baddeley, Rubak, and Turner 2015, 151–52).

First, we plot an overview of the cross \(K\)-function for the different types. As we have seen before the assumption of stationarity might be not valid. We will therefore use the inhomogeneous version of the cross \(K\)-function.

Show the code
plotCrossMetricPerFov(
  resCross,
  theo = TRUE,
  correction = "iso",
  x = "r",
  imageId = 'sample_id'
)
[[1]]

The diagonal of the inhomogeneous cross \(K\)-function plot shows the \(K\)-function for the different marks (indication of Poisson or non-Poisson point processes). Off-diagonal panels give indication of independence of points when the number of points follows the expected \(K\)-function but does not imply that the individual marks follow a Poisson process. If the processes of the types are independent, we assume that they are also uncorrelated.

In the example above, assuming that the process is inhomogeneous, the Ependymals cells appear to be regularly spaced, which seems counter intuitive. However, this is the result of the pattern being inhomogeneous with spatially varying intensity. When accounting for this, the pattern is more regular than expected under an inhomogeneous point process. The estimation of the inhomogeneous cross functions is not straightforward and results change based on the estimation of the local intensity and the edge correction, c.f. (Baddeley, Rubak, and Turner 2015, 605).

Let’s focus a bit more on the relationship between Ependymal and the other two cell types. We will also calculate confidence intervals for the different cross \(K\)-functions. We have already seen that our dataset most likely does not satisfy the assumption of stationarity. For this reason, we will calculate the inhomogeneous cross \(K\)-function. Note that the option to calculate confidence intervals is not yet implemented in spatialFDA and we will therefore use a custom method below.

Show the code
plotCrossMetric <- function(ppp, fun, from, to, edgecorr){
  lce <- lohboot(ppp, fun, from = from, to = to)
  p <- ggplot(lce, aes(x = r, y = .data[[edgecorr]])) +
    geom_line(size = 1) +
    geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.25)+
    geom_line(aes(x = r, y = theo), linetype = "dotted", size = 1) +
    geom_line() +
    labs(title = attributes(lce)$yexp) +
    theme_minimal()
  return(p)
}

p_epen_od <- plotCrossMetric(pp_sel, "Kcross.inhom", 
                             "Ependymal", "OD Mature", "iso")
p_epend_micro <- plotCrossMetric(pp_sel, "Kcross.inhom", 
                                 "Ependymal", "Microglia", "iso")
Show the code
p_epen_od + p_epend_micro

Remember that the dashed line represents the assumption of a multitype Poisson process. If the line lies above the dotted line, there is indication of attraction while if the line is below the dotted line there is indication of repulsion. In the plot above we can see that there is indication of attraction between Ependymal and OD Mature cells (above Poisson line) while there is indication of repulsion between Ependymal and Microglia cells (below Poisson line).

Cross \(L\)-function

Alternatively the \(L\) cross function with similar interpretation can be calculated using the Lcross function (Baddeley, Rubak, and Turner 2015, 596ff).

Show the code
resCross <- calcCrossMetricPerFov(
  spe,
  selection = c("OD Mature", "Ependymal", "Microglia"),
  subsetby = 'sample_id',
  fun = 'Lcross.inhom',
  marks = 'cluster_id',
  rSeq = seq(0, 500, length.out = 100),
  by = c('Animal_ID', 'sample_id')
)
Show the code
resCross <- subset(resCross, sample_id %in% fov_sel)
Show the code
plotCrossMetricPerFov(
  resCross,
  theo = TRUE,
  correction = "iso",
  x = "r",
  imageId = 'sample_id'
)
[[1]]

Mark connection function

The mark connection function is the cross pair-correlation function, i.e. the generalization of the pair correlation function to a multitype point processes, divided by the unmarked pair-correlation function. It can be interpreted as the conditional probability that two points a distance \(r\) apart have labels of type \(i\) and of type \(j\), given the presence of those points (Baddeley, Rubak, and Turner 2015, 596–97).

Show the code
plotCrossAll(pp_sel, "markconnect", "iso") + 
  scale_y_continuous(limits = c(0, 1))
Show the code
resCross <- calcCrossMetricPerFov(
  spe = spe_sel,
  selection = c("OD Mature", "Ependymal", "Microglia"),
  subsetby = 'sample_id',
  fun = 'markconnect',
  marks = 'cluster_id',
  rSeq = seq(0, 500, length.out = 100),
  by = c('Animal_ID', 'sample_id')
)
Show the code
plotCrossMetricPerFov(
  resCross,
  theo = TRUE,
  correction = "iso",
  x = "r",
  imageId = 'sample_id'
)
[[1]]

The dashed lines indicate expected values under random labeling. The values measure dependence (or association) between the different labelled points. Positive values indicate that nearby points are more likely to have different types than expected by chance. This positive association between different cell types does not necessarily imply dependence, as it could be influenced by a negative association between cells of the same type, as it it could be the case for the Microglia cells. Furthermore, as the calculation is based on the \(K\)-function, the mark connection function assumes homogenity.

Cross \(F\)-function (empty space function), cross \(G\)-function (Nearest-neighbor function) and cross \(J\)-function

The cross \(F\)-function is the cumulative distribution function of the distance from a location to the nearest point of the same type. For each type \(i\), it is defined as:

\[F_i(r) = \mathbb{P}\{d(u,X^{i}\leq r)\}\]

The cross \(G\)-function is the cumulative distribution function of the distance from a location to the nearest point of another type and is defined as:

\[G_{ij}(r) = \mathbb{P}\{d(x,X^{(j)} \setminus u \leq r \mid X^{(i)} \ \text{has a point at u})\}.\]

If the points are independent of each other, the \(G\)- and \(F\)-function are identical. Both assume that the process is stationary. There are inhomogenous alternatives, in case the intensity is varying, for which we only assume correlation stationarity.

There exists a difference in the interpretation of the theoretical values of the \(K\)-cross and the \(G\)-cross function. For the \(K\)-cross, the theoretical value indicates independence between marks while for the \(G\)-cross the theoretical value is consistent with the assumption that the points of type j are Poisson in addition to being independent of the points of type \(i\) (Baddeley, Rubak, and Turner 2015, 597 ff).

The cross \(J\)-function is defined as:

\[J_{ij}(r) = \frac{1-G_{ij}(r)}{1-F_{j}(r)}\]

and summarizes the inter point dependence between type \(i\) and \(j\). Under the hypothesis of independent components, i.e., that the point processes of each type are independent, the \(G\)-function is equivalent to the \(F\)-function and the \(J\)-function is equal to \(1\) (Baddeley, Rubak, and Turner 2015, 597 ff).

Dot functions

For each \(K\)-, \(G\)- and \(J\)-function, there also exist dot functions, which measure distances from points of one type to points of any type. These functions allow us to measure the dependence of one mark with all other marks at once. For example, the \(K\)-dot function represents the expected number of an other point within distance \(r\) of a typical point of type \(i\) (Baddeley, Rubak, and Turner 2015, 600 ff).

Show the code
plotCrossAll(pp_sel, "Kdot.inhom", "iso")
Show the code
resCross <- calcCrossMetricPerFov(
  spe,
  selection = c("OD Mature", "Ependymal", "Microglia"),
  subsetby = 'sample_id',
  fun = 'Kdot',
  marks = 'cluster_id',
  rSeq = seq(0, 500, length.out = 100),
  by = c('Animal_ID', 'sample_id')
)
Show the code
resCross <- subset(resCross, sample_id %in% fov_sel)
#create a unique plotting ID
resCross$ID <- paste0(resCross$Animal_ID, '|', resCross$sample_id)
#plot the result of the Dot function
plotMetricPerFov(resCross, theo = TRUE, correction = "iso", x = "r", imageId = 'sample_id', ID = "ID")

The dot functions are useful summary statistics to analyse the dependence of one mark with all other marks.

Summary function within and between types

In our original dataset, we have a large number of different marks. We picked three: OD mature, Ependymal and Microglia for illustrative purposes. An alternative to looking at all cross summary function combinations, it is possible to compare between and within types (Baddeley, Rubak, and Turner 2015).

Mark equality function

The Mark or Type Equality function for a stationary multitype point process measures the correlation between types of two points separated by distance r. It is the sum of the mark connection function of all pairs of points of the same type.

If \(k < 1\), points at distance \(r\) are less likely than expected to be of the same type. If \(k > 1\), they are more likely to be of the same type. The value \(1\) indicates a lack of correlation (Baddeley, Rubak, and Turner 2015, 603 ff).

Show the code
resCross <- calcMetricPerFov(
  spe,
  selection = c("OD Mature", "Ependymal", "Microglia"),
  subsetby = 'sample_id',
  fun = 'markcorr',
  marks = 'cluster_id',
  rSeq = seq(0, 500, length.out = 100),
  by = c('Animal_ID', 'sample_id')
)
Show the code
resCross <- subset(resCross, sample_id %in% fov_sel)
plotMetricPerFov(
  resCross,
  theo = TRUE,
  correction = "iso",
  x = "r",
  imageId = 'sample_id'
)

We can see that in our dataset that it the more likely it is to find points of the same type at shorter distances. At around \(r=265\) this behavior changes and it becomes slightly less likely to find the same type compared to by chance.

Tests of randomness and independence

In a multitype point process, there are usually two interesting hypotheses:

  • random-labeling hypothesis: the allocation of labels to the points is random
  • independent component hypothesis: there is independence between different type of points

If both statments are correct, the point pattern is considered to be complete spatially random and independent (CSRI), the marked analog to complete spatial randomness (CSR) (Baddeley, Rubak, and Turner 2015, 605 ff).

Testing random labelling

The random labeling test is most logical when the marks represents its status, which is not the most appropriate assumption when considering cell types. Testing for random labeling can be done using permutation tests, in which the labels are randomly permuted. Random labeling can be assumed if the permuted datasets are statistically equivalent to the original dataset (Baddeley, Rubak, and Turner 2015, 609 ff).

Testing the indepenence of components assumption

The i-to-j functions are useful to test the independence of different subprocesses. If the processes of type i and j are independent, then \(K_{ij} = \pi r^2, G_{ij}(r) = F_{j}(r), J_{ij}(r) \equiv 1\). Alternatively, randomization tests can be used in which simulated patterns from the dataset are generated and randomly split into subpatterns. These are then compared to the null hypothesis in which all subpatterns should be statistically equivalent to the original. However, this approach assumes stationarity and there is a need to handle edge effects (Baddeley, Rubak, and Turner 2015, 606 ff).

Assuming stationarity of the total pattern

As outlined above, the homogeneous cross correlation and spacing functions assume stationarity, whereas the inhomogeneous functions only assume correlation stationarity. First-order stationarity is not given in our dataset, when we look at the different patterns individually. However, using the total (unmarked) pattern, we could assume first-order stationarity, since the intensity is the same across the pattern.

Show the code
plot(density(unmark(pp), sigma = bw.diggle))

Let’s look at the homogeneous cross \(K\)-function.

Show the code
resCross <- calcCrossMetricPerFov(
  spe,
  selection = c("OD Mature", "Ependymal", "Microglia"),
  subsetby = 'sample_id',
  fun = 'Kcross',
  marks = 'cluster_id',
  rSeq = seq(0, 500, length.out = 100),
  by = c('Animal_ID', 'sample_id')
)
Show the code
resCross <- subset(resCross, sample_id %in% fov_sel)
plotCrossMetricPerFov(
  resCross,
  theo = TRUE,
  correction = "iso",
  x = "r",
  imageId = 'sample_id'
)
[[1]]

The result is different from the previous analysis. The Ependymal cells now appear to be clustered (above the Poisson line). This is because stationarity assumes that Ependymal cells could theoretically be present in the total observation window. If this assumption is justified, depends on the context and research question. The interpretation of the results should always be done with the assumption of stationarity or inhomogeneity in mind and should be reported in an analysis.

Appendix

Session info

Show the code
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.7.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods  
[8] base     

other attached packages:
 [1] dixon_0.0-9                    splancs_2.01-45               
 [3] sp_2.1-4                       bluster_1.10.0                
 [5] magrittr_2.0.3                 stringr_1.5.1                 
 [7] spdep_1.3-8                    spData_2.3.3                  
 [9] tmap_3.3-4                     scater_1.28.0                 
[11] scran_1.28.2                   scuttle_1.10.3                
[13] SFEData_1.2.0                  SpatialFeatureExperiment_1.2.3
[15] Voyager_1.2.7                  rgeoda_0.0.11-1               
[17] digest_0.6.37                  sf_1.0-19                     
[19] reshape2_1.4.4                 patchwork_1.3.0               
[21] STexampleData_1.8.0            ExperimentHub_2.8.1           
[23] AnnotationHub_3.8.0            BiocFileCache_2.8.0           
[25] dbplyr_2.3.4                   rlang_1.1.4                   
[27] ggplot2_3.5.1                  dplyr_1.1.4                   
[29] spatstat_3.3-0                 spatstat.linnet_3.2-3         
[31] spatstat.model_3.3-3           rpart_4.1.24                  
[33] spatstat.explore_3.3-3         nlme_3.1-166                  
[35] spatstat.random_3.3-2          spatstat.geom_3.3-4           
[37] spatstat.univar_3.1-1          spatstat.data_3.1-4           
[39] SpatialExperiment_1.10.0       SingleCellExperiment_1.22.0   
[41] SummarizedExperiment_1.30.2    Biobase_2.60.0                
[43] GenomicRanges_1.52.1           GenomeInfoDb_1.36.4           
[45] IRanges_2.34.1                 S4Vectors_0.38.2              
[47] BiocGenerics_0.46.0            MatrixGenerics_1.12.3         
[49] matrixStats_1.5.0              spatialFDA_0.99.5             

loaded via a namespace (and not attached):
  [1] spatstat.sparse_3.1-0         bitops_1.0-9                 
  [3] httr_1.4.7                    RColorBrewer_1.1-3           
  [5] tools_4.3.1                   R6_2.5.1                     
  [7] HDF5Array_1.28.1              mgcv_1.9-1                   
  [9] rhdf5filters_1.12.1           withr_3.0.2                  
 [11] gridExtra_2.3                 leaflet_2.2.2                
 [13] leafem_0.2.3                  cli_3.6.3                    
 [15] labeling_0.4.3                mvtnorm_1.3-2                
 [17] proxy_0.4-27                  R.utils_2.12.3               
 [19] dichromat_2.0-0.1             scico_1.5.0                  
 [21] limma_3.56.2                  rstudioapi_0.15.0            
 [23] RSQLite_2.3.9                 generics_0.1.3               
 [25] crosstalk_1.2.1               Matrix_1.5-4.1               
 [27] ggbeeswarm_0.7.2              abind_1.4-8                  
 [29] R.methodsS3_1.8.2             terra_1.8-5                  
 [31] lifecycle_1.0.4               yaml_2.3.10                  
 [33] edgeR_3.42.4                  rhdf5_2.44.0                 
 [35] tmaptools_3.1-1               grid_4.3.1                   
 [37] blob_1.2.4                    promises_1.3.2               
 [39] dqrng_0.4.1                   crayon_1.5.3                 
 [41] lattice_0.22-6                beachmat_2.16.0              
 [43] KEGGREST_1.40.1               magick_2.8.5                 
 [45] pillar_1.10.1                 knitr_1.49                   
 [47] metapod_1.7.0                 rjson_0.2.23                 
 [49] boot_1.3-28.1                 fda_6.2.0                    
 [51] codetools_0.2-19              wk_0.9.4                     
 [53] glue_1.8.0                    vctrs_0.6.5                  
 [55] png_0.1-8                     gtable_0.3.6                 
 [57] cachem_1.1.0                  ks_1.14.3                    
 [59] xfun_0.50                     S4Arrays_1.0.6               
 [61] mime_0.12                     DropletUtils_1.20.0          
 [63] fds_1.8                       pracma_2.4.4                 
 [65] pcaPP_2.0-5                   units_0.8-5                  
 [67] statmod_1.5.0                 interactiveDisplayBase_1.38.0
 [69] bit64_4.5.2                   filelock_1.0.3               
 [71] irlba_2.3.5.1                 vipor_0.4.7                  
 [73] KernSmooth_2.23-26            colorspace_2.1-1             
 [75] DBI_1.2.3                     raster_3.6-30                
 [77] tidyselect_1.2.1              bit_4.5.0.1                  
 [79] compiler_4.3.1                curl_6.1.0                   
 [81] BiocNeighbors_1.18.0          DelayedArray_0.26.7          
 [83] scales_1.3.0                  classInt_0.4-10              
 [85] rappdirs_0.3.3                goftest_1.2-3                
 [87] fftwtools_0.9-11              rainbow_3.8                  
 [89] spatstat.utils_3.1-2          rmarkdown_2.29               
 [91] XVector_0.40.0                base64enc_0.1-3              
 [93] htmltools_0.5.8.1             pkgconfig_2.0.3              
 [95] sparseMatrixStats_1.12.2      fastmap_1.2.0                
 [97] htmlwidgets_1.6.4             shiny_1.10.0                 
 [99] DelayedMatrixStats_1.22.6     farver_2.1.2                 
[101] jsonlite_1.8.9                BiocParallel_1.34.2          
[103] mclust_6.1.1                  R.oo_1.27.0                  
[105] BiocSingular_1.16.0           RCurl_1.98-1.16              
[107] GenomeInfoDbData_1.2.10       s2_1.1.7                     
[109] Rhdf5lib_1.22.1               munsell_0.5.1                
[111] Rcpp_1.0.13-1                 ggnewscale_0.5.0             
[113] viridis_0.6.5                 stringi_1.8.4                
[115] leafsync_0.1.0                zlibbioc_1.46.0              
[117] MASS_7.3-60                   plyr_1.8.9                   
[119] parallel_4.3.1                ggrepel_0.9.6                
[121] deldir_2.0-4                  Biostrings_2.68.1            
[123] stars_0.6-7                   splines_4.3.1                
[125] tensor_1.5                    locfit_1.5-9.10              
[127] igraph_2.1.2                  ScaledMatrix_1.8.1           
[129] XML_3.99-0.18                 BiocVersion_3.17.1           
[131] evaluate_1.0.1                renv_1.0.11                  
[133] BiocManager_1.30.25           deSolve_1.40                 
[135] httpuv_1.6.15                 tidyr_1.3.1                  
[137] purrr_1.0.2                   polyclip_1.10-7              
[139] rsvd_1.0.5                    lwgeom_0.2-14                
[141] xtable_1.8-4                  e1071_1.7-16                 
[143] RSpectra_0.16-2               later_1.4.1                  
[145] viridisLite_0.4.2             class_7.3-22                 
[147] tibble_3.2.1                  memoise_2.0.1                
[149] beeswarm_0.4.0                AnnotationDbi_1.62.2         
[151] cluster_2.1.4                 hdrcde_3.4                   
[153] BiocStyle_2.28.1             

©2024 The pasta authors. Content is published under Creative Commons CC-BY-4.0 License for the text and GPL-3 License for any code.

References

Baddeley, Adrian, Ege Rubak, and Rolf Turner. 2015. Spatial Point Patterns. 1st ed. CRC Interdisciplinary Statistics Series. CRC Press, Taylor & Francis Group.
Baddeley, Adrian, and Rolf Turner. 2005. “Spatstat: An R Package for Analyzing Spatial Point Patterns.” Journal of Statistical Software 12 (January): 1–42. https://doi.org/10.18637/jss.v012.i06.
Buller. 2020. “Areas of a Spatial Segregation Model Significantly Different from Null Expectations.” Ian Buller, Ph.D., M.A. https://idblr.rbind.io/post/pvalues-spatial-segregation/.
de la Cruz, Marcelino. 2008. “Métodos Para Analizar Datos Puntuales.” In Introduccion Al Analisis Espacial de Datos En Ecologia y Ciencias Ambientales: Metodos y Aplicaciones, 76–127.
Dixon, Philip M. 2002. “Nearest-Neighbor Contingency Table Analysis of Spatial Segregation for Several Species.” Écoscience 9 (2): 142–51. https://doi.org/10.1080/11956860.2002.11682700.
Moffitt, Jeffrey R., Dhananjay Bambah-Mukku, Stephen W. Eichhorn, Eric Vaughn, Karthik Shekhar, Julio D. Perez, Nimrod D. Rubinstein, et al. 2018. “Molecular, Spatial, and Functional Single-Cell Profiling of the Hypothalamic Preoptic Region.” Science 362 (6416): eaau5324. https://doi.org/10.1126/science.aau5324.
Righelli, Dario, Lukas M Weber, Helena L Crowell, Brenda Pardo, Leonardo Collado-Torres, Shila Ghazanfar, Aaron T L Lun, Stephanie C Hicks, and Davide Risso. 2022. SpatialExperiment: Infrastructure for Spatially-Resolved Transcriptomics Data in R Using Bioconductor.” Bioinformatics 38 (11): 3128–31. https://doi.org/10.1093/bioinformatics/btac299.
Ripley, Brian D., and Jean-Paul Rasson. 1977. “Finding the Edge of a Poisson Forest.” Journal of Applied Probability 14 (3): 483–91. https://doi.org/10.2307/3213451.