Alignment of Spatial Genomics Data Using Deep Gaussian Processes

Final Notes - While interesting, I personally am sceptical of the method presented in this paper. The main problem is that it uses a probabilistic approach and will therefore not produce consistent results across different runs. Ideally, we find a computational approach that limits the probabilistic component.

The paper uses some datasets that could be helpful for a small-scale benchmark (breast cancer ST data) - ❌ Not MERFISH

Additional Material -

The code is available on GitHub.

Abstract - The main idea is to use a probabilistic model that aligns spatially resolved samples onto a known or unknown common coordinate system (CCS) with respect to phenotypic readouts in a two-step (spatial → expression) process.

Introduction

The authors give three reasons why reusing fMRI alignment methods is undesirable:

Curated anatomical CCSs are not available for the diversity of tissue types, developmental stages and species that are studied using spatial genomics.
The readout at each location for fMRI data is a single number representing blood flow, whereas the readout in spatial genomics often has high-dimensional, sparse features.
The spatial resolution of fMRI scans tends to be one of a few standard resolutions, whereas there is a wide diversity of spatial technologies, each with its own resolution and field of view.

Good for literature review purposes - the authors mention several other frameworks such as PASTE, Splotch, Eggplant and STIM.

The method probabilistically projects slices onto a generated common coordinate system (CCS) using two stacked GPs.
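To make the two-layer idea concrete, here is a minimal generative sketch in NumPy. It is not the authors' implementation: the RBF kernel, the identity mean for the warp, the independent treatment of genes, and all names and hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of points."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def sample_gp(x, mean, rng, lengthscale, variance, jitter=1e-6):
    """Draw one GP sample per output column at inputs x."""
    k = rbf_kernel(x, x, lengthscale, variance) + jitter * np.eye(len(x))
    chol = np.linalg.cholesky(k)
    return mean + chol @ rng.standard_normal(mean.shape)

rng = np.random.default_rng(0)

# Hypothetical observed spot coordinates for a single slice.
observed = rng.uniform(0.0, 10.0, size=(50, 2))

# Layer 1 (spatial): warp observed coordinates into the CCS.
# Assumption: the GP mean is the identity, so the warp is a smooth
# perturbation of the original coordinates.
ccs = sample_gp(observed, observed, rng, lengthscale=5.0, variance=0.1)

# Layer 2 (expression): each gene is a smooth function of CCS location.
# Assumption: genes are modelled independently with a zero mean.
n_genes = 3
expression = sample_gp(
    ccs, np.zeros((len(ccs), n_genes)), rng, lengthscale=2.0, variance=1.0
)
```

Fitting the model runs this in reverse: given observed coordinates and expression, infer the warp that makes expression from different slices agree in the CCS.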

Results

Template-based CCS

To avoid extreme warping when aligning slices, the authors propose choosing one slice as the CCS (and fixing its warping function to the identity). I personally see several potentials/pitfalls here:

Use one of the middle slices as the CCS

We can potentially improve upon this by using a simple AABB algorithm as follows:

Compute the AABB of each slice
Compute the centroid of each AABB
Align the centroids against an origin-centred CCS (i.e. warp each centroid to a position determined by the index of its slice).
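The three steps above can be sketched as follows. This is a rough illustration, assuming 2D spot coordinates per slice and a common coordinate system centred at the origin of the slice plane; the function names are hypothetical. It only translates slices and does not handle rotation or nonlinear deformation.

```python
import numpy as np

def aabb_centroid(coords):
    """Return the centre of the axis-aligned bounding box of (n, 2) coords."""
    lower = coords.min(axis=0)
    upper = coords.max(axis=0)
    return (lower + upper) / 2.0

def align_slices_by_aabb(slices):
    """Translate each slice so that its AABB centre lies at the origin.

    The position of a slice in the returned list serves as its slice index.
    """
    aligned = []
    for coords in slices:
        coords = np.asarray(coords, dtype=float)
        aligned.append(coords - aabb_centroid(coords))
    return aligned

# Two toy slices with the same shape but different offsets.
slices = [
    np.array([[0.0, 0.0], [4.0, 2.0], [2.0, 1.0]]),
    np.array([[10.0, 10.0], [14.0, 12.0], [11.0, 11.0]]),
]
aligned = align_slices_by_aabb(slices)
```

Using the AABB centre rather than the mean of the spots makes the result independent of spot density, but it is sensitive to outlying spots that stretch the bounding box.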

The problem with a “template”-based alignment is what I will call “alignment bias”. Aligning on a single slice (1) or by some median method (2) will introduce some inherent bias. Perhaps we should look for a method that does not produce any bias, or at least generates as little bias as possible (e.g. iterative alignment).

Recovery of true latent common coordinates

The authors mention that both “de novo” and “template-based” alignments are viable options, but I (again) do not see any PIs. This is important since GPs are especially notorious for being slow in high-dimensional settings.

Robustness of GPSA to observation noise

The authors compare against PASTE but not PASTE2 (simply because it was not yet available at the time). It would be interesting to benchmark GPSA against PASTE2 in terms of raw performance and robustness! This is especially relevant since the authors mention that PASTE applies a linear transformation, which is no longer the case in PASTE2 (which should be able to reliably handle nonlinearities).

Aligning samples in 3D space to create an atlas

They use a GP to fit 3D coordinates and create a 3D atlas. My main question here is why it is necessary to allow the y-coordinate to change? This, at least for us, creates several issues:

Memory representation - We have to store an additional dimension per spot instead of simply a pointer to a slice index. This is not a problem for small-scale data, but for millions of spots it becomes expensive, since we have to store millions of floating-point values instead of a single integer.
Visualization - Inference becomes more difficult since they “warp” the slice across an additional dimension, removing “slice-ability” across the 3D sample.
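A back-of-the-envelope comparison of the memory point, assuming spots are stored grouped by slice (so one integer per slice is enough in the planar case) and that a warped coordinate is stored as a 64-bit float. The slice and spot counts are made up for illustration.

```python
import numpy as np

n_slices = 4
spots_per_slice = 250_000
n_spots = n_slices * spots_per_slice

# Planar slices: one integer index per slice is enough.
slice_index = np.arange(n_slices, dtype=np.int32)
planar_bytes = slice_index.nbytes

# Warped slices: every spot carries its own third coordinate.
third_coordinate = np.zeros(n_spots, dtype=np.float64)
warped_bytes = third_coordinate.nbytes

print(f"planar: {planar_bytes} bytes, warped: {warped_bytes} bytes")
```

The overhead grows linearly with the number of spots, while the planar representation only grows with the number of slices.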

Aligning Visium profiles of the mouse cortex

While the alignment works, the only decreased by in comparison to a naive alignment. This, at least to me, indicates that GPSA is not as good on larger datasets as the authors claim it to be. If there is one upside, the IQR corresponds better to the median, indicating more consistent predictions.