This page contains a plan of action for the master thesis: papers that are likely to be helpful, potential ideas, reviews of useful techniques and results, and more, in time-ordered entries.
29/10/2025
- Make before/after plots of alignment in benchmark
- Look at SCVI
- human embryo notebook
28/10/2025
Worked mainly on getting a LaTeX manuscript running and found a good style → the Legrand Orange Book template. While this is officially a “book” template, I think it provides some nice additional features, such as header styling, a clean TOC, a very readable font (that isn’t as boring as TNR) with accompanying math typesetting, and options for specific informative “boxes” (e.g. insights, definitions, theorems, etc).
Furthermore, I tried to streamline the actual process of setting up the benchmark. This requires the following steps:
- Create a Python module that contains tools to load datasets, wrap alignment functions, execute benchmarks, store benchmark results and manage method-specific Conda environments.
- This module acts as a mediator between the alignment methods, Jupyter and the benchmark results themselves (a minimal interface sketch follows below).
- For each method
- Create an implementation of the method following the core module’s semantics
- Have a file that fully determines the (Conda) environment required for the method to run properly. The Conda environment should have:
- A name
- A list of requirements that it should load (excluding ipykernel)
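To make the mediator idea concrete, here is a minimal sketch of what the core module’s semantics could look like. All names here (EnvSpec, AlignmentMethod, PasteWrapper) and the paste-bio requirement are hypothetical placeholders, not a finished design:
```python
# Sketch of the core module's semantics (all names hypothetical).
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class EnvSpec:
    """Fully determines the Conda environment a method needs."""
    name: str                                               # Conda env name
    requirements: list[str] = field(default_factory=list)   # excluding ipykernel

class AlignmentMethod(ABC):
    """Every method wrapper implements the same align() semantics."""
    env: EnvSpec

    @abstractmethod
    def align(self, slices):
        """Take a list of per-slice spot tables (coordinates + expression)
        and return the coordinates mapped into a shared coordinate frame."""

class PasteWrapper(AlignmentMethod):
    # The exact package names in the requirements are an assumption.
    env = EnvSpec(name="paste-env", requirements=["paste-bio", "scanpy"])

    def align(self, slices):
        ...  # delegate to the actual PASTE implementation
```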
- A curated set of datasets to test the methods on. Currently, the idea is to have:
- STOMICS ARTISTA data, axolotl telencephalon (small-scale, fast 🟢)
- MERFISH Mouse Brain data (semi-scale, medium 🟠)
- Synthetic data (large-scale, slow 🔴) ⇒ Is a pre-made dataset necessary here? NO! It is really easy to generate:
- Choose an arbitrary 3D shape (e.g. a sphere) and slice it with a 2D plane at 1,000–10,000 positions
- For each section, Poisson sampling can produce a fixed number of points. If testing reconstruction is the only concern, simply use the points. If we want to include expression vectors, define expression spatially (e.g. volumetric Perlin noise); see the sketch below.
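A minimal sketch of such a generator, assuming a unit sphere, with uniform-in-disk sampling standing in for Poisson(-disk) sampling and a smooth sinusoid standing in for volumetric Perlin noise:
```python
# Sketch: slice a sphere into z-planes and sample a fixed number of
# spots per section, with a smooth spatially defined expression field.
import numpy as np

rng = np.random.default_rng(0)

def sample_section(z, radius=1.0, n_spots=500):
    """Sample n_spots uniformly inside the circular cross-section at height z."""
    disk_r = np.sqrt(max(radius**2 - z**2, 0.0))
    r = disk_r * np.sqrt(rng.uniform(size=n_spots))   # sqrt => uniform density
    theta = rng.uniform(0, 2 * np.pi, size=n_spots)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

def expression(xy, z, n_genes=50):
    """Smooth, spatially defined expression vectors (Perlin-noise stand-in)."""
    freqs = np.arange(1, n_genes + 1)[None, :]
    return np.sin(freqs * xy[:, :1]) + np.cos(freqs * (xy[:, 1:] + z))

# 1000 sections through the unit sphere
sections = [sample_section(z) for z in np.linspace(-0.99, 0.99, 1000)]
```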
- Proper metrics to benchmark. I’m thinking of including:
- Performance Metrics - runtime (wall time), memory usage, GPU usage, scalability (i.e. how performance scales with the number of spots, tissue slices, or resolution!)
- Quality Metrics - We want some sort of quality measure on which we can compare the different methods. Thus, this needs to be a universal method. Some suggestions:
- Hausdorff distance (⇒ max deviation between point sets)
- Jaccard index
- Local neighbourhood preservation (⇒ for each spot, how many of its neighbours are preserved after alignment; see the sketch at the end of this entry. ⚠️ might be very expensive to run!)
- Per-cell nearest neighbour (or a small nearest neighbourhood)
- Expression Metrics - Metrics concerning the expression profiles of each spot. Some suggestions:
- Correlation of matched gene expression vectors (i.e. correlation between matched spots)
- Cluster preservation metrics (Adjusted Rand Index or Normalized Mutual Information)
- Robustness Metrics - How robust is the method? For example, we can introduce shifts, jitter or small amounts of noise in the data. We also need to track the failure rate and parameter sensitivity (because we want an automated solution in the end).
- We should also take into account the ease of installation and compatibility with respect to potential future packages we are going to use.
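For reference, a minimal sketch of two of the quality metrics above, using scipy and scikit-learn; the symmetric-Hausdorff convention and k = 15 are my own assumptions:
```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff
from sklearn.neighbors import NearestNeighbors

def hausdorff(a, b):
    """Symmetric Hausdorff distance (max deviation) between (n, d) point sets."""
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

def neighbourhood_preservation(before, after, k=15):
    """Mean fraction of each spot's k nearest neighbours kept after alignment."""
    def knn(x):
        # k + 1 neighbours because every point is its own nearest neighbour
        nn = NearestNeighbors(n_neighbors=k + 1).fit(x)
        return nn.kneighbors(x, return_distance=False)[:, 1:]
    kept = [len(set(p) & set(q)) / k for p, q in zip(knn(before), knn(after))]
    return float(np.mean(kept))
```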
24/10/2025
Progress Update
- The ultimate drawback of using existing packages → versioning. Currently, the main issue is that, to test different solutions, they all require different input/output formats, none of which is properly streamlined. This makes it quite difficult to properly test the different solutions against each other.
- Tested the following approaches (partially, due to issues):
- GPSA
- PASTE
- CAST
- (Spateo)
- Benchmark difficulty
- It is going to be difficult to properly benchmark these methods in a unified way without creating a separate Python kernel for each (see the sketch below).
- The implementations of the models are open source; what I personally suggest is to copy the method implementations into our own modules.
- The differences to correct for are often very small and thus negligible (e.g. np.gtg() instead of .gtg())
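One possible way around the separate-kernel problem, sketched under the assumption that each method gets its own named Conda environment and a small driver script (both names below are hypothetical): run each method out-of-process via conda run and exchange results as JSON.
```python
# Sketch: execute a method's benchmark driver inside its own Conda
# environment; the env name and driver script path are hypothetical.
import json
import subprocess

def run_in_env(env_name: str, script: str, payload: dict) -> dict:
    proc = subprocess.run(
        ["conda", "run", "-n", env_name, "python", script, json.dumps(payload)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)   # the driver prints a JSON result to stdout

# e.g. run_in_env("paste-env", "drivers/run_paste.py", {"dataset": "artista"})
```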
14/10/2025
Lots of research, planning and being busy with other projects.
30/09/2025 (2)
After some more consideration, a general thought struck me: a modular pipeline. The field of computational biology often seems to lean on different tools that aren’t able to work together properly. Since spatiotemporal transcriptomics is a rapidly evolving field, it can be beneficial to provide a baseline pipeline that facilitates specific features (made during the thesis) such as alignment, 3D/4D reconstruction, and recovery and inference.
The problem with providing a fixed approach, however, is that such a pipeline is relatively streamlined but not able to accommodate different input modalities, alignment methods or reconstruction methods. The general pipeline should thus be modular and extensible, allowing researchers to create their own modules that they can use to load/analyse data.
Some examples of modules:
- Input Parser - The data we are working with is (very similar to) MERFISH data. However, there are a multitude of different ST modalities out there, and some downstream methods even support multimodal inputs (e.g. MERFISH + histology).
- Alignment - As has become clear in the first week, there exists a wide variety of alignment tools (with their own trade-offs in terms of performance and accuracy). As a lab, we can propose our own alignment method but also allow for the use of other alignment methods.
- …
An important consideration here is data-layout validation after each stage in the pipeline, to ensure that the data format returned by one stage can be accepted by the next (see the sketch after the example below). Furthermore, ideally, the pipeline provides numerous modules so that researchers can easily create, automate and parallelize their pipelines.
An example of how the package could be used:
```python
# The pipeline can be created as an object, to which various
# "modules" can be assigned for (pre)processing.
pipeline = (
    Spatial4D()
    .assignInputParser(MERFISHParser)
    .assignSpatialAlignment(PASTE2)
    .assignReconstruction(CCF)
    .assignTemporalAlignment(GaussianProcess)
    .benchmark(Expression(your_favourite_gene))
    .create()
)

# The data can be returned in a tensor/dataframe.
result = pipeline.analyze(mouse_brain_data)
```
Ideally, this integrates the GPU (CUDA), visualizations (seaborn/plotly) and efficient data wrangling (Polars).
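The data-layout validation mentioned above could look something like this sketch; StageContract and the column names are hypothetical:
```python
# Hypothetical sketch of per-stage layout validation.
import pandas as pd

class StageContract:
    """Declares the columns a stage must produce; checked between stages."""
    def __init__(self, required_columns: set[str]):
        self.required_columns = set(required_columns)

    def validate(self, df: pd.DataFrame, stage: str) -> pd.DataFrame:
        missing = self.required_columns - set(df.columns)
        if missing:
            raise ValueError(f"{stage} output is missing columns: {sorted(missing)}")
        return df

# e.g. every spatial-alignment module must return per-spot coordinates
alignment_contract = StageContract({"x", "y", "slice_id"})
```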
Potential names (had some fun with ChatGPT):
- ASTRA - Advanced SpatioTemporal RNA Analysis
- CORTEX - Computational Omics for Reconstructing Temporal EXpression
Personally a fan of “ASTRA”, since this would make it easier to design a cool logo 😎.
30/09/2025 (1)
After reading multiple papers related to alignment techniques, I will keep here a list of techniques I propose to benchmark. The reason for creating such a benchmark in the first place is twofold:
- Scalability - Most of the papers describing the methods below do not provide information on raw performance and parallelizability. Since this is arguably the main bottleneck for alignment, we need to take it into account.
- Data Efficiency - Some methods are multimodal, others use neural networks requiring substantial datasets and training time. Ideally, we use (at least for alignment) a method that requires neither, but rather an online method for one single modality (i.e. MERFISH-type data).
The proposed techniques to include in the benchmark are as follows:
23/09/2025
- Partial alignment of multislice spatially resolved transcriptomics data (PASTE2)
- Perhaps we should consider creating a small benchmark for aligning sliced ST data, to see which method is the most accurate and most scalable. Some methods might only be preferable for single-instance experiments, which we will need to carefully rule out.
22/09/2025
This being the first full day of the internship, it immediately becomes clear that for the task at hand we need to stay aware of the following key points at all times:
- Performance - For one experiment, we have multiple terabytes of data. Because of this, we ideally have a pipeline that is performant for one single experiment. This means we will have to take into account potential performance increases like GPU use or multithreading.
- Scalability - Since we’re not interested in one single experiment, but rather a temporal alignment of multiple experiments, we have to take into account the scalability of the method. This differs from performance in that we need additional optimizations on top of the pipeline for one single experiment (e.g. can we perform multithreaded batch-processing of certain tasks, like segmentation or OT).
- Complexity/Interpretability - Finally, arguably the most important part in terms of usability down the line is the interpretability of the results. In some ways we care less about the raw quality of the alignments than about the interpretability of the alignment and the potential to integrate other scRNA-seq or ATAC-seq data.