AnnData and MuData
To put it briefly, AnnData objects represent annotated datasets with the main data as a matrix and with rich annotations that might include tables and arrays. MuData objects represent collections of AnnData objects focusing on, but not limited to, scenarios with different AnnData objects representing different sets of features profiled for the same samples.
Both AnnData and MuData objects were originally implemented in Python.
AnnData
The AnnData implementation in Muon.jl mainly follows the reference implementation, although there are some differences in how these objects are implemented and behave because of how the two languages are designed and operate.
AnnData objects can be stored in and read from .h5ad files.
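As a minimal sketch of such a round trip, assuming the `writeh5ad` and `readh5ad` functions exported by Muon.jl and an arbitrary random matrix, an object can be written to a temporary file and read back. The file name and variable names here are illustrative only.

```julia
using Muon

# Small example object; the matrix values are arbitrary.
ad_demo = AnnData(X=rand(10, 5))

# Write to a temporary .h5ad file and read it back.
# writeh5ad / readh5ad are assumed to be the Muon.jl I/O functions.
path = joinpath(mktempdir(), "demo.h5ad")
writeh5ad(path, ad_demo)
ad_back = readh5ad(path)

# The round trip should preserve the shape of the main matrix.
same_shape = size(ad_back.X) == size(ad_demo.X)
```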
Creating AnnData objects
A simple 2D array is already enough to initialize an annotated data object:
using Muon
x = rand(10, 2) * rand(2, 5);
ad = AnnData(X=x)
AnnData object 10 ✕ 5
Observations correspond to the rows of the matrix and have unique names:
ad.obs_names .= "obs_" .* ad.obs_names
10-element Muon.Index{String, UInt8}:
 "obs_1"
 "obs_2"
 "obs_3"
 "obs_4"
 "obs_5"
 "obs_6"
 "obs_7"
 "obs_8"
 "obs_9"
 "obs_10"
Corresponding arrays for the observations are stored in the .obsm slot:
using LinearAlgebra  # provides svd and Diagonal
f = svd(x);
ad.obsm["X_svd"] = f.U * Diagonal(f.S);
10×5 Matrix{Float64}:
 -0.918583  -0.246326    1.6901e-16   -4.14421e-17   3.25028e-18
 -0.887995  -0.107357    2.46205e-17   1.08497e-17   3.64905e-17
 -0.838145  -0.117194    3.3536e-17    3.69332e-17  -1.34083e-17
 -0.948302  -0.514805   -4.25235e-17   1.97992e-17  -4.10308e-17
 -0.844609   0.0736088   2.01825e-17   2.30467e-18  -3.1411e-18
 -0.471753  -0.0273332   7.18613e-18  -8.27236e-18  -1.21768e-17
 -1.28503    0.342398   -4.4402e-17   -5.09137e-17  -2.62271e-17
 -1.05353    0.523235    6.79878e-17   4.57247e-17  -1.29572e-17
 -0.60874   -0.057996   -1.74195e-18   3.30919e-17   1.59686e-17
 -2.06022   -0.0471728  -9.65143e-17  -1.07912e-17   2.9506e-17
Before data is assigned, its dimensions are checked against those of the object:
ad.obsm["X_Vt"] = f.Vt # won't work
# => DimensionMismatch
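The check can be observed without aborting a script by catching the error. The following is a sketch under the assumption that the rejected assignment throws a `DimensionMismatch`, as the comment above indicates. The final line additionally assumes that arrays aligned with variables go into a `.varm` slot, which this page does not otherwise describe.

```julia
using Muon
using LinearAlgebra

x = rand(10, 2) * rand(2, 5)
ad = AnnData(X=x)
f = svd(x)

# f.Vt is 5 × 5, but ad has 10 observations, so the assignment is rejected.
rejected = false
try
    ad.obsm["X_Vt"] = f.Vt
catch err
    # Assumption: the failure surfaces as a DimensionMismatch.
    rejected = err isa DimensionMismatch
end

# Assumption: arrays whose first dimension equals the number of variables (5)
# can be stored in the .varm slot instead.
ad.varm["X_V"] = f.V
```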
Slicing AnnData objects
Just like plain arrays, AnnData objects can be subset with slicing operations, with the first dimension corresponding to observations and the second dimension corresponding to variables:
obs_sub = "obs_" .* string.(collect(1:3))
ad_sub = ad[obs_sub,:]
AnnData object 3 ✕ 5
Since the dimensions are labelled, using names is a natural way to subset these objects, but boolean and integer arrays can be used as well:
# both return the same subset
ad_sub[[true,false,true],:]
ad_sub[[1,3],:]
AnnData object 2 ✕ 5
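To make the equivalence of the three indexing styles concrete, the sketch below selects the same two observations by name, by integer position, and by boolean mask, then compares the resulting matrices. The object and variable names are illustrative, and the matrix is random.

```julia
using Muon

# Fresh example object with named observations.
ad_demo = AnnData(X=rand(10, 5),
                  obs_names="obs_" .* string.(collect(1:10)))

# Select observations 1 and 3 in three different ways.
by_name = ad_demo[["obs_1", "obs_3"], :]
by_int  = ad_demo[[1, 3], :]
mask    = [i in (1, 3) for i in 1:10]   # Vector{Bool} of length 10
by_bool = ad_demo[mask, :]

# All three subsets should hold the same rows of X.
all_equal = by_name.X == by_int.X == by_bool.X
```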
MuData
The basic idea behind a multimodal object is a key $\rightarrow$ value relationship where keys represent the unique names of individual modalities and values are AnnData objects that contain the corresponding data. Similarly to AnnData objects, MuData objects can also contain rich multimodal annotations.
using Distributions  # provides Binomial
ad2 = AnnData(X=rand(Binomial(1, 0.3), (10, 7)),
              obs_names="obs_" .* string.(collect(1:10)))
md = MuData(mod=Dict("view_rand" => ad, "view_binom" => ad2))
MuData object 10 ✕ 12
└ view_rand
  AnnData object 10 ✕ 5
└ view_binom
  AnnData object 10 ✕ 7
Features are considered unique to each modality.
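This is why the object above reports 12 variables: the 5 features of one modality and the 7 of the other are counted separately rather than merged. A small sketch of that arithmetic follows, assuming that the `var_names` property is available on both AnnData and MuData objects. Random matrices stand in for real data.

```julia
using Muon

obs = "obs_" .* string.(collect(1:10))
ad_a = AnnData(X=rand(10, 5), obs_names=obs)
ad_b = AnnData(X=rand(10, 7), obs_names=obs)
md_demo = MuData(mod=Dict("view_rand" => ad_a, "view_binom" => ad_b))

# Assumption: var_names holds one entry per feature at each level.
n_total = length(md_demo.var_names)
n_parts = length(ad_a.var_names) + length(ad_b.var_names)

# Features are not shared across modalities, so the counts add up.
counts_add_up = n_total == n_parts
```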
Slicing MuData objects
Slicing now works across all modalities:
md[["obs_1", "obs_9"],:]
MuData object 2 ✕ 12
└ view_rand
  AnnData object 2 ✕ 5
└ view_binom
  AnnData object 2 ✕ 7
Multimodal annotation
Annotations can be stored at the multimodal level, including multidimensional arrays:
md.obsm["X_svd"] = f.U * Diagonal(f.S);
md.obsm
Muon.AlignedMapping{Tuple{1 => 1}, String, MuData} with 3 entries:
  "X_svd" => [-0.918583 -0.246326 … -4.14421e-17 3.25028e-18; -0.887995 -0…
  "view_rand" => Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
  "view_binom" => Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]