AnnData and MuData

In brief, AnnData objects represent annotated datasets: the main data is stored as a matrix, accompanied by rich annotations that may include tables and arrays. MuData objects represent collections of AnnData objects. They focus on, but are not limited to, the case where different AnnData objects hold different sets of features profiled for the same samples.

Both AnnData and MuData objects were originally implemented in Python.

AnnData

The AnnData implementation in Muon.jl mostly follows the reference implementation, although the objects differ somewhat in how they are implemented and how they behave, because the two languages are designed and operate differently.

AnnData objects can be stored in and read from .h5ad files.
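As a minimal sketch of such a round trip, the snippet below writes an object to a temporary file and reads it back. It assumes that Muon.jl exports `writeh5ad(filename, adata)` and `readh5ad(filename; backed)`; the file name is hypothetical, and the exact signatures should be checked against the installed version.

```julia
using Muon

ad = AnnData(X=rand(10, 5))

# Write to a temporary .h5ad file (hypothetical file name) and read it back.
path = joinpath(mktempdir(), "example.h5ad")
writeh5ad(path, ad)                      # assumed signature: (filename, adata)
ad_back = readh5ad(path, backed=false)   # assumed keyword: load everything into memory

println(length(ad_back.obs_names), " observations, ",
        length(ad_back.var_names), " variables")
```

Passing `backed=false` is meant to keep the example independent of the file once it has been read.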

Creating AnnData objects

A simple 2D array is already enough to initialize an annotated data object:

using Muon
x = rand(10, 2) * rand(2, 5);
ad = AnnData(X=x)
AnnData object 10 ✕ 5

Observations correspond to the rows of the matrix and have unique names:

ad.obs_names .= "obs_" .* ad.obs_names
10-element Muon.Index{String, UInt8}:
 "obs_1"
 "obs_2"
 "obs_3"
 "obs_4"
 "obs_5"
 "obs_6"
 "obs_7"
 "obs_8"
 "obs_9"
 "obs_10"

Arrays aligned with the observations are stored in the .obsm slot:

using LinearAlgebra  # provides svd and Diagonal
f = svd(x);
ad.obsm["X_svd"] = f.U * Diagonal(f.S);
10×5 Matrix{Float64}:
 -0.918583  -0.246326    1.6901e-16   -4.14421e-17   3.25028e-18
 -0.887995  -0.107357    2.46205e-17   1.08497e-17   3.64905e-17
 -0.838145  -0.117194    3.3536e-17    3.69332e-17  -1.34083e-17
 -0.948302  -0.514805   -4.25235e-17   1.97992e-17  -4.10308e-17
 -0.844609   0.0736088   2.01825e-17   2.30467e-18  -3.1411e-18
 -0.471753  -0.0273332   7.18613e-18  -8.27236e-18  -1.21768e-17
 -1.28503    0.342398   -4.4402e-17   -5.09137e-17  -2.62271e-17
 -1.05353    0.523235    6.79878e-17   4.57247e-17  -1.29572e-17
 -0.60874   -0.057996   -1.74195e-18   3.30919e-17   1.59686e-17
 -2.06022   -0.0471728  -9.65143e-17  -1.07912e-17   2.9506e-17
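
The last three columns above are numerically zero because x is the product of a 10×2 and a 2×5 matrix, so its rank is at most 2. The following sketch uses only the LinearAlgebra standard library to check this and to confirm that the scores reconstruct x. The relative threshold of 1e-10 is an arbitrary choice made for illustration.

```julia
using LinearAlgebra

x = rand(10, 2) * rand(2, 5)      # product of 10×2 and 2×5 factors, so rank(x) ≤ 2
f = svd(x)
scores = f.U * Diagonal(f.S)      # the same quantity that is stored in obsm["X_svd"]

# Count singular values that are non-negligible relative to the largest one.
n_signif = count(s -> s > 1e-10 * f.S[1], f.S)
println("significant singular values: ", n_signif)

# Multiplying the scores by Vt reconstructs the original matrix.
println("reconstruction matches: ", scores * f.Vt ≈ x)
```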

When data is assigned, the dimensions are checked first. Here f.Vt is a 5×5 matrix, while ad has 10 observations:

ad.obsm["X_Vt"] = f.Vt  # won't work
# => DimensionMismatch
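
A script that needs to continue after such a failure could catch the error. The sketch below assumes the exception type is `DimensionMismatch`, as the comment above indicates, and that `.obsm` supports `haskey` like an ordinary dictionary.

```julia
using Muon, LinearAlgebra

x = rand(10, 2) * rand(2, 5)
ad = AnnData(X=x)
f = svd(x)

# f.Vt has 5 rows but ad has 10 observations, so the assignment should be rejected.
rejected = try
    ad.obsm["X_Vt"] = f.Vt
    false
catch err
    err isa DimensionMismatch || rethrow()   # only handle the expected error
    true
end
println("assignment rejected: ", rejected)

# An array with one row per observation is accepted.
ad.obsm["X_svd"] = f.U * Diagonal(f.S)
println("X_svd stored: ", haskey(ad.obsm, "X_svd"))
```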

Slicing AnnData objects

Just like plain arrays, AnnData objects can be subset with slicing operations; the first dimension corresponds to observations and the second to variables:

obs_sub = "obs_" .* string.(collect(1:3))
ad_sub = ad[obs_sub,:]
AnnData object 3 ✕ 5

Since the dimensions are labelled, names are a natural way to subset these objects, but boolean and integer arrays can be used as well:

# both return the same subset
ad_sub[[true,false,true],:]
ad_sub[[1,3],:]
AnnData object 2 ✕ 5
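
To check that the three indexing styles agree, the subsets can be compared directly. This is a small sketch that builds its own object and compares the `X` matrices of the results. The mask here has one entry per observation, and access to the data matrix through the `X` field is assumed.

```julia
using Muon

ad = AnnData(X=rand(10, 5), obs_names="obs_" .* string.(1:10))

by_name  = ad[["obs_1", "obs_3"], :]      # select by observation names
by_index = ad[[1, 3], :]                  # select by integer positions
mask     = [i in (1, 3) for i in 1:10]    # Vector{Bool} with one entry per observation
by_mask  = ad[mask, :]                    # select by boolean mask

println("all three subsets agree: ", by_name.X == by_index.X == by_mask.X)
```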

MuData

The basic idea behind a multimodal object is a key → value relationship, where the keys are the unique names of the individual modalities and the values are the AnnData objects that contain the corresponding data. Like AnnData objects, MuData objects can also contain rich multimodal annotations.

using Distributions  # provides Binomial
ad2 = AnnData(X=rand(Binomial(1, 0.3), (10, 7)),
              obs_names="obs_" .* string.(collect(1:10)))

md = MuData(mod=Dict("view_rand" => ad, "view_binom" => ad2))
MuData object 10 ✕ 12
└ view_rand
  AnnData object 10 ✕ 5
└ view_binom
  AnnData object 10 ✕ 7

Features are considered unique to each modality.
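
Because features are unique to each modality, the 12 variables reported for the MuData object are the 5 variables of one modality plus the 7 of the other. The sketch below rebuilds a comparable object and sums the per-modality counts. It assumes the modalities are accessible through the `mod` field, matching the constructor keyword, and it replaces the `Binomial` draw with a thresholded uniform draw to avoid the extra dependency.

```julia
using Muon

obs = "obs_" .* string.(1:10)
ad  = AnnData(X=rand(10, 2) * rand(2, 5), obs_names=obs)
# Stand-in for rand(Binomial(1, 0.3), (10, 7)): each entry is 1 with probability 0.3.
ad2 = AnnData(X=Int.(rand(10, 7) .< 0.3), obs_names=obs)

md = MuData(mod=Dict("view_rand" => ad, "view_binom" => ad2))

# Each modality keeps its own set of variables.
for (name, a) in md.mod
    println(name, ": ", length(a.var_names), " variables")
end

n_vars = sum(length(a.var_names) for a in values(md.mod))
println("total: ", n_vars)
```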

Slicing MuData objects

Slicing a MuData object applies to all modalities at once:

md[["obs_1", "obs_9"],:]
MuData object 2 ✕ 12
└ view_rand
  AnnData object 2 ✕ 5
└ view_binom
  AnnData object 2 ✕ 7
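
The effect on the individual modalities can be inspected by iterating over them. As before, this sketch assumes the modalities are reachable through the `mod` field and uses a thresholded uniform draw in place of `Binomial`.

```julia
using Muon

obs = "obs_" .* string.(1:10)
md = MuData(mod=Dict(
    "view_rand"  => AnnData(X=rand(10, 5), obs_names=obs),
    "view_binom" => AnnData(X=Int.(rand(10, 7) .< 0.3), obs_names=obs),
))

sub = md[["obs_1", "obs_9"], :]

# Every modality of the subset keeps only the two selected observations.
for (name, a) in sub.mod
    println(name, ": ", length(a.obs_names), " observations, ",
            length(a.var_names), " variables")
end
```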

Multimodal annotation

Annotations can also be stored at the multimodal level, including multidimensional arrays:

md.obsm["X_svd"] = f.U * Diagonal(f.S);
md.obsm
Muon.AlignedMapping{Tuple{1 => 1}, String, MuData} with 3 entries:
  "X_svd"      => [-0.918583 -0.246326 … -4.14421e-17 3.25028e-18; -0.887995 -0…
  "view_rand"  => Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
  "view_binom" => Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]