Spatial Datasets
Spatial transcriptomic datasets can be visualized and analyzed in rakaia. The preprocessing steps for spatial datasets are slightly different from antibody-based imaging datasets, and will also vary slightly across different technologies.
Required format
All spatial datasets that contain marker expression need to be pre-processed and imported into rakaia as an Anndata object; crucially, the output file must have the file extension .h5ad
to be read into rakaia. More specifically, the Anndata object must have a spatial array in the obsm slot that contains both the x and y coordinates for each spatial measurement, whether it corresponds to a cell, spot, etc. scanpy and squidpy are Python libraries that provide readers for converting raw spatial data into this format. These libraries are referenced throughout the article below.
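The format requirement above can be checked before a file is exported. The following is a minimal sketch, assuming the coordinates live under the spatial key of the obsm mapping (where the scanpy and squidpy readers place them). The helper name check_spatial and its messages are hypothetical and are not part of rakaia.

```python
import numpy as np


def check_spatial(obsm, n_obs, key="spatial"):
    """Return a list of problems; an empty list means the layout looks usable.

    `obsm` is any mapping of array-likes (for example `adata.obsm`) and
    `n_obs` is the number of observations (for example `adata.n_obs`).
    """
    if key not in obsm:
        return [f"missing obsm[{key!r}]"]

    coords = np.asarray(obsm[key])
    if coords.ndim != 2 or coords.shape[1] < 2:
        return ["coordinates must be a 2D array with x and y columns"]

    problems = []
    if coords.shape[0] != n_obs:
        problems.append("expected one coordinate row per observation")
    if not np.isfinite(coords[:, :2].astype(float)).all():
        problems.append("x and y coordinates must be finite numbers")
    return problems
```

With a real Anndata object, this could be called as check_spatial(adata.obsm, adata.n_obs) just before adata.write_h5ad(...).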
Spot-based assays: 10X Visium V1, V2
Spot-based spatial technologies such as 10X Visium Spatial Gene Expression profile transcript counts summarized at the spot level. The Visium technology is capable of profiling tens of thousands of markers per spot, providing comprehensive spatial context for the transcriptome.
Raw Visium data (from the Space Ranger standard directory output as described here) should be read using either the read_visium function from scanpy or the equivalent read.visium function from squidpy. Below is a minimal example showing how the data can be preprocessed and exported into a compatible file format:
from scanpy import read_visium

# specify the input directory: the Space Ranger "outs" subdirectory
input_dir = "/path_to_visium_raw/outs/"
adata = read_visium(input_dir)
adata.var_names_make_unique()
# specify the output file as an anndata object
out_anndata = "/output_dir/visium.h5ad"
adata.write_h5ad(out_anndata)
When the Anndata object is read into rakaia, the spot size and region dimensions are computed automatically from the spatial coordinates.
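This article does not describe how rakaia derives these values internally. Purely as an illustration of what can be recovered from coordinates alone, the sketch below estimates the region extent and the typical centre-to-centre spacing between spots. The function name region_summary is hypothetical, and the brute-force distance matrix is only suitable for small examples.

```python
import numpy as np


def region_summary(coords):
    """Estimate region extent and typical spot spacing from (x, y) rows.

    Illustrative only: not rakaia's implementation. Needs at least two points.
    """
    xy = np.asarray(coords, dtype=float)[:, :2]
    width = float(xy[:, 0].max() - xy[:, 0].min())
    height = float(xy[:, 1].max() - xy[:, 1].min())

    # Pairwise distances (O(n^2) memory; a KD-tree would be preferable
    # for a full slide with thousands of spots).
    deltas = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((deltas ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself

    # Median nearest-neighbour distance approximates the spot spacing.
    spacing = float(np.median(dist.min(axis=1)))
    return {"width": width, "height": height, "spacing": spacing}
```

For an Anndata object produced by a Visium reader, the input would be the spatial coordinate array described in the required format section.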
Non-spot-based assays
Non-spot-based assays behave differently in rakaia because the user has more flexibility over the visualization parameters. Specifically, the user may specify a custom marker size for the data in the image, which isn't supported for spot-based assays because the spot scaling factors are computed automatically from the input data.
Binned expression assays: 10X Visium HD
The HD version of 10X Visium differs slightly from the spot-based technology used in V1 and V2. Instead of circular spots separated by gaps, HD offers tiled, barcoded squares without gaps, as described here. This results in data that can be binned and summarized at three different resolutions: 2, 8, and 16 microns.
Currently, the only supported reader in Python for HD datasets is the spatialdata HD reader. The reader generates aggregate expression profiles for each micron resolution, and each of these profiles can be exported as an Anndata file for visualization in rakaia. The example notebook here shows how to export these binned profiles. Each binned profile can then be imported into rakaia as a separate ROI with the same set of marker genes, but with different dimensions and resolution.
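To make the relationship between the three resolutions concrete, the sketch below sums a toy grid of 2 micron counts for a single gene into coarser bins: a factor of 4 gives 8 micron bins and a factor of 8 gives 16 micron bins. This is an illustration of the binning idea only, not the spatialdata reader's implementation, and the function name aggregate_bins is hypothetical.

```python
import numpy as np


def aggregate_bins(counts, factor):
    """Sum a 2D grid of counts into square blocks of `factor` x `factor` bins."""
    counts = np.asarray(counts)
    rows, cols = counts.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by the factor")
    # Split each axis into (block index, position within block), then sum
    # over the within-block axes.
    blocks = counts.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.sum(axis=(1, 3))
```

The total count is preserved at every resolution while the grid dimensions shrink, which is why each resolution appears in rakaia as an ROI with the same markers but different dimensions.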
10X Xenium In-situ Expression
The 10X Xenium platform differs slightly from the assays above as it measures transcripts in situ, and it also supports segmentation and the overlay of object masks. The notebook here provides a complete example of reading the data into a spatialdata object and exporting both the expression Anndata object and the separate cell segmentation mask as a tiff.
Setting marker sizes for visualization (non-spot-based)
The non-spot-based technologies above support custom marker sizes in rakaia. This means that the marker can be visually enlarged or shrunk down to the default size of 1 pixel in the viewer. This allows the expression to appear more granular/minimal or more extensive/uniform throughout the ROI, as the user desires.
Below are some recommended marker sizes for the different technologies listed above. The marker size can be changed under Additional application settings -> Appearance -> Custom spatial marker radius.
10X Visium HD: 2-3. A marker size of 2 will generally reveal pixel gaps between areas of expression, while 3 removes those gaps and provides a more uniform, albeit slightly blurred, expression visualization. Generally, marker sizes of 4 or greater will cause expression to overlap erroneously.
10X Xenium: 2-4. The marker size should be set based on whether a segmentation mask is overlaid. Without a segmentation mask, larger marker sizes up to 4 make the expression more visually appealing at a global resolution, but can produce markers that "spill" out of the cell boundaries once a mask is overlaid. When using a mask, values of 2-3 allow the marker to appear at the centroid of the cell mask while still letting the expression remain visible at the global resolution.
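The trade-off between gaps and overlap can be seen in a toy rasterizer. This is a sketch under stated assumptions, not rakaia's rendering code: the function name paint_markers is hypothetical, the radius is treated as an integer pixel count, and a radius of 1 is taken to mean a single pixel.

```python
import numpy as np


def paint_markers(coords, values, shape, radius=1):
    """Paint each (x, y) point onto a canvas as a disc of the given radius.

    Overlapping markers keep the larger value. Illustrative only.
    """
    canvas = np.zeros(shape, dtype=float)
    height, width = shape
    reach = radius - 1  # radius 1 covers only the centre pixel
    offsets = [
        (dy, dx)
        for dy in range(-reach, reach + 1)
        for dx in range(-reach, reach + 1)
        if dy * dy + dx * dx < radius * radius
    ]
    for (x, y), value in zip(coords, values):
        for dy, dx in offsets:
            row, col = int(y) + dy, int(x) + dx
            if 0 <= row < height and 0 <= col < width:
                canvas[row, col] = max(canvas[row, col], value)
    return canvas
```

In this toy model, two measurements three pixels apart leave empty pixels between them at radius 1 and touch at radius 2. Larger radii make neighbouring markers overlap, which is the effect the recommendations above aim to avoid.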
Other spatial datasets
Additional spatial technologies may be supported in rakaia provided that they follow the input data format described above (minimally, that spatial coordinates for pixel locations are provided in the spatial array in the obsm slot).
Additional examples of spatial datasets that can be rendered in rakaia include the following from the squidpy spatial technology tutorials:
Vizgen: https://squidpy.readthedocs.io/en/stable/notebooks/tutorials/tutorial_vizgen.html
4i: https://squidpy.readthedocs.io/en/stable/notebooks/tutorials/tutorial_fouri.html
Slide-seq: https://squidpy.readthedocs.io/en/stable/notebooks/tutorials/tutorial_slideseqv2.html
For all tutorial datasets above, the data are imported and analyzed in Anndata format, consistent with the required format for rakaia. The adata object referenced in the tutorials can be exported as a file with the .h5ad extension.
rakaia features not available to spatial datasets
Spatial datasets have orders of magnitude more variables and markers than IMC datasets (tens of thousands of markers as opposed to 40-50 antibodies), so the lazy loading features behave differently for these technologies. This means that certain features that can be applied to an entire ROI for IMC cannot be used for spatial analysis due to memory and time constraints in processing all dataset markers:
- The channel/marker tile gallery is not supported for spatial datasets, as generating a thumbnail preview for thousands of markers would be prohibitively time-consuming
- Both marker correlation and in-app marker quantification can be performed only on markers that are in the current canvas, as these are the only variables that have been loaded into memory at a given point in analysis
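The second restriction follows directly from lazy loading: only markers that have been requested are in memory, so only those can be compared. The sketch below illustrates the idea with a hypothetical LazyMarkerStore class and loader callable. It is not rakaia's implementation.

```python
import numpy as np


class LazyMarkerStore:
    """Load marker arrays on first request and keep them in memory."""

    def __init__(self, loader):
        self._loader = loader  # callable: marker name -> array-like
        self._loaded = {}

    def get(self, marker):
        # Read from the backing source only once per marker.
        if marker not in self._loaded:
            self._loaded[marker] = np.asarray(self._loader(marker), dtype=float)
        return self._loaded[marker]

    def loaded_markers(self):
        return list(self._loaded)

    def correlation(self):
        """Pearson correlation restricted to markers already loaded."""
        names = self.loaded_markers()
        if len(names) < 2:
            raise ValueError("load at least two markers first")
        stacked = np.vstack([self._loaded[name].ravel() for name in names])
        return names, np.corrcoef(stacked)
```

Markers that were never added to the canvas are never read, so they cost no memory but are also absent from the correlation result.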