AUCell.aucell_kernel (Method)
aucell_kernel(mat, features, gene_set)

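Calculate the AUC of gene_set for each profile (column) of mat. gene_set may be a single gene set (a vector of genes) or a vector of gene sets; the result has one row per gene set and one column per profile. The row names of mat are given in features, whose elements must be of the same type as the genes in gene_set.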

Examples

julia> mat = [1 2 3;4 5 6;0 1 2;7 8 0]
4×3 Matrix{Int64}:
 1  2  3
 4  5  6
 0  1  2
 7  8  0

julia> fea = ["a", "b", "c", "d"]

julia> gene_set = ["b","c", "e"]

julia> aucell_kernel(mat, fea, gene_set)
1×3 Matrix{Float64}:
 0.125  0.125  0.5

julia> gene_sets = [["a", "b", "e"], ["b", "d", "e"]]

julia> aucell_kernel(mat, fea, gene_sets)
2×3 Matrix{Float64}:
 0.25  0.25  0.75
 0.75  0.75  0.375
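For readers unfamiliar with the score, here is a minimal Julia sketch of the recovery-curve AUC used by the original AUCell method: the genes of each profile are ranked by decreasing expression, a recovery curve counts how many gene-set genes appear among the top k ranks, and the area under that curve over the first max_rank ranks is scaled into [0, 1]. The name `recovery_auc` and the `max_rank` keyword are hypothetical, and aucell_kernel's own ranking, tie handling and normalization may differ; this sketch does not reproduce the numbers in the example above (on that input it gives 0.5, 0.5 and 0.75).

```julia
# Hypothetical sketch of an AUCell-style recovery-curve AUC; not AUCell.jl's code.
# mat: features in rows, profiles (cells) in columns.
function recovery_auc(mat::AbstractMatrix, features::AbstractVector,
                      gene_set::AbstractVector; max_rank::Int = size(mat, 1))
    gs = Set(gene_set)
    in_set = [f in gs for f in features]   # genes of gene_set absent from features never match
    n_set = count(in_set)
    max_rank = min(max_rank, size(mat, 1))
    auc = zeros(1, size(mat, 2))
    n_set == 0 && return auc
    for j in axes(mat, 2)
        order = sortperm(view(mat, :, j); rev = true)  # highest expression first
        hits = 0   # gene-set genes seen among the top k ranks
        area = 0   # running sum of the recovery curve
        for k in 1:max_rank
            hits += in_set[order[k]]
            area += hits
        end
        auc[1, j] = area / (max_rank * n_set)  # area can never exceed max_rank * n_set
    end
    return auc
end
```

For a vector of gene sets, the per-set rows can be stacked, e.g. `vcat((recovery_auc(mat, fea, s) for s in gene_sets)...)`.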
AUCell.cell_marker_score (Method)
cell_marker_score(mat, features, barcodes, gene_set, group)

Given a single-cell RNA expression matrix mat whose row names are features and whose column names are barcodes, calculate the relative cell type marker scores (between 0 and 1) of gene_set for each group. The grouping is specified by group, a vector of vectors in which each inner vector holds the cell barcodes of one group.

Examples


julia> mat = rand(0:32, 12, 8)

julia> features = 1:12

julia> gene_set = [1,5,6,8]

julia> barcodes = ["a", "b", "c", "d", "e", "f", "g", "h"]

julia> group = [["a", "b", "g", "h"], ["c", "d", "e", "f"]]
2-element Vector{Vector{String}}:
 ["a", "b", "g", "h"]
 ["c", "d", "e", "f"]

julia> cell_marker_score(mat, features, barcodes, gene_set, group)
4 genes are found among 4 genes.
1×2 Matrix{Float64}:
 0.476227  0.523773
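The two scores in the example output sum to 1, which suggests they are normalized across groups. One simple way to obtain such relative scores is sketched below: average the expression of the gene-set genes within each group, then divide by the total over all groups. This is an illustrative assumption rather than the package's method, and `relative_group_scores` is a hypothetical name.

```julia
using Statistics  # for mean

# Hypothetical sketch of relative per-group marker scores; not AUCell.jl's code.
function relative_group_scores(mat::AbstractMatrix, features, barcodes, gene_set, group)
    gs = Set(gene_set)
    rows = findall(f -> f in gs, features)             # gene-set genes present in mat
    col_of = Dict(b => j for (j, b) in enumerate(barcodes))
    # mean expression of the gene-set genes in each group (NaN if no gene is found)
    raw = [mean(mat[rows, [col_of[b] for b in g]]) for g in group]
    return reshape(raw ./ sum(raw), 1, :)              # 1×(number of groups), sums to 1
end
```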
AUCell.filter_expr_matrix (Function)
filter_expr_matrix(mat, feature_threshold, cell_threshold)

Filter an expression matrix mat, keeping only the features (genes) expressed in more than feature_threshold cells and the cells expressing more than cell_threshold features. Returns the filtered matrix and two Boolean vectors marking which features and which cells are kept.

Examples


julia> @time mat, fea, bar = read_mtx("matrix.mtx", "features.tsv", "barcodes.tsv")

julia> size(mat)
(36601, 5744)

julia> @time mat2, kf, kb = filter_expr_matrix(mat)
 26.438175 seconds (978.08 k allocations: 1.320 GiB, 0.52% gc time)
(sparse([2, 12, 15, 25, 26, 27, 29, 32, 34, 37  …  21104, 21105, 21106, 21107, 21108, 21109, 21110, 21111, 21113, 21116], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  5728, 5728, 5728, 5728, 5728, 5728, 5728, 5728, 5728, 5728], Int32[1, 1, 5, 1, 4, 1, 1, 1, 1, 1  …  287, 8, 239, 124, 32, 8, 145, 41, 99, 2], 21121, 5728), Bool[0, 0, 0, 1, 0, 0, 1, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 1, 0], Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

julia> size(mat2)
(21121, 5728)

julia> fea2 = fea[kf]; bar2 = bar[kb];

julia> length(fea2)
21121

julia> length(bar2)
5728

Arguments

  • mat::AbstractMatrix: expression matrix (dense or sparse).
  • feature_threshold::Int: threshold on the number of cells in which a feature must be expressed for the feature to be kept. Default: 30.
  • cell_threshold::Int: threshold on the number of genes a cell must express for the cell to be kept. Default: 200.
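A minimal sketch of this kind of filter follows. It assumes that "expressed" means a nonzero value and that both masks are computed on the original matrix, as the example's `fea[kf]` and `bar[kb]` indexing implies; it uses the strict "more than" comparison from the description. `filter_counts` is a hypothetical name.

```julia
using SparseArrays

# Hypothetical sketch; "expressed" is taken to mean a nonzero value.
function filter_counts(mat::AbstractMatrix; feature_threshold::Int = 30, cell_threshold::Int = 200)
    expressed = mat .!= 0                                  # detected entries (stays sparse for sparse input)
    cells_per_feature = vec(collect(sum(expressed; dims = 2)))
    features_per_cell = vec(collect(sum(expressed; dims = 1)))
    keep_features = cells_per_feature .> feature_threshold
    keep_cells = features_per_cell .> cell_threshold
    return mat[keep_features, keep_cells], keep_features, keep_cells
end
```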
AUCell.generate_pseudobulk (Function)
generate_pseudobulk(mat, np)

Generate a matrix of pseudobulk profiles from mat, a matrix of single-cell RNA profiles in which each column is one cell's profile. Each pseudobulk profile is generated from np (default: 10) single-cell profiles, so the result keeps the rows of mat and has fewer columns.

Examples

julia> generate_pseudobulk(rand(0:32, 10, 6), 3)
10×2 Matrix{Int64}:
 59  30
 66  34
 37  26
 58  70
 83  86
 15  11
 58  62
 38  62
 62  35
 15  51
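The example output keeps the 10 rows of the input and has one column per 3 cells, with values consistent with summed counts. The sketch below sums consecutive blocks of np columns. Whether generate_pseudobulk sums or averages, groups cells consecutively or at random, and how it handles leftover columns are assumptions here, and `pseudobulk_sketch` is a hypothetical name.

```julia
# Hypothetical sketch: sums consecutive blocks of np columns (cells).
function pseudobulk_sketch(mat::AbstractMatrix, np::Int = 10)
    nblocks = div(size(mat, 2), np)        # leftover columns are dropped (assumption)
    out = similar(mat, size(mat, 1), nblocks)
    for b in 1:nblocks
        cols = ((b - 1) * np + 1):(b * np)
        out[:, b] = vec(sum(view(mat, :, cols); dims = 2))  # pseudobulk = summed counts
    end
    return out
end
```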
AUCell.mode_pathway_cluster (Method)
Run the pathway_cluster mode, which identifies subgroups of samples based on pathway activation.

Examples

julia> mode_pathway_cluster([[1,5,7] [6,4,3] [8,5,2]],[[1,5,7] [6,4,3] [8,5,2]],BitVector([0,1,1]),reshape([1,2,3],:,1),["sample1","sample2","sample3"])
  0.005570 seconds (2.62 k allocations: 150.455 KiB, 97.78% compilation time)
1×5 Matrix{Any}:
 ["sample1"]  ["sample2", "sample3"]  NaN  NaN  [0.5 0.0 0.0]
AUCell.pathway_AUC_main (Function)

Examples

Default mode (reAUCluster)

julia> pathway_AUC_main(use_testdata = "yes")
1.632442 seconds (8.44 M allocations: 279.642 MiB, 7.95% gc time, 73.87% compilation time)
[ Info: INFO: The size of expression profile was (36602, 8).
1.779532 seconds (4.95 M allocations: 260.557 MiB, 4.14% gc time, 97.91% compilation time)
[ Info: INFO: The filtered of expression profile size was (7549, 8).
0.000320 seconds (27 allocations: 34.672 KiB)
[ Info: INFO: There are 1 pathways to be analyzed.
0.768511 seconds (1.50 M allocations: 99.943 MiB, 2.46% gc time, 95.16% compilation time)
2×5 Matrix{Any}:
"pathways_name"                     ["cluster1"]                                                                                                       …  ["t"]      ["pvalue"]
"HALLMARK_TNFA_SIGNALING_VIA_NFKB"  Any["AAACCCAAGGGTTAAT-1", "AAACCCAAGAAACCAT-1", "AAACCCAAGCAACAAT-1", "AAACCCAAGCCAGAGT-1", "AAACCCACAGCAGATG-1"]     [4.92654]  [0.00263937]

aucell mode

julia> pathway_AUC_main(use_testdata = "yes", mode = "aucell")
  1.557316 seconds (8.44 M allocations: 279.659 MiB, 3.27% gc time, 78.85% compilation time)
[ Info: INFO: The size of expression profile was (36602, 8).
  1.771720 seconds (4.95 M allocations: 260.557 MiB, 3.69% gc time, 97.39% compilation time)
[ Info: INFO: The filtered of expression profile size was (7549, 8).
  0.000329 seconds (27 allocations: 34.672 KiB)
[ Info: INFO: There are 1 pathways to be analyzed.
  0.667055 seconds (1.75 M allocations: 87.598 MiB, 3.82% gc time, 99.79% compilation time)
[ Info: INFO: According to the meta information, there are 8 groups of data and each group will be analyzed with the rest of the sample.
  3.153389 seconds (6.62 M allocations: 421.960 MiB, 3.39% gc time, 80.62% compilation time)
2×65 Matrix{Any}:
 "GeneSet"                            "AAACCCAAGAAACCAT-1"   "AAACCCAAGAAACCAT-1"   "AAACCCAAGAAACCAT-1"  …   "AAACCCAGTACGGGAT-1"   "AAACCCAGTACGGGAT-1"   "AAACCCAGTACGGGAT-1"
 "HALLMARK_TNFA_SIGNALING_VIA_NFKB"  0.506962               0.500821               0.515332                  0.512858               0.482078               0.440029
AUCell.read_expr_matrix (Method)
read_expr_matrix(fn, rn, cn)

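Read in an expression matrix stored in the file fn, whose row names are stored in the file rn and column names in the file cn. Returns a tuple (matrix, vector of row names, vector of column names). The field delimiter of the matrix file can be set with the matrix_delim keyword, as in the examples below.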

Examples

julia> mat, fea, bar = read_expr_matrix("matrix.csv", "features.tsv", "barcodes.tsv", matrix_delim = ',')

julia> mat, fea, bar = read_expr_matrix("matrix.txt", "features.tsv", "barcodes.tsv", matrix_delim = '\t')

julia> mat, fea, bar = read_expr_matrix("matrix.tsv", "features.tsv", "barcodes.tsv", matrix_delim = '\t')

julia> mat, fea, bar = read_expr_matrix("matrix.tsv", "features.tsv", "barcodes.tsv")
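A rough equivalent using the DelimitedFiles standard library is sketched below. It assumes a purely numeric matrix file without a header and one name per line in rn and cn; `read_expr_sketch` is a hypothetical name, and its tab default for matrix_delim is an assumption.

```julia
using DelimitedFiles

# Hypothetical sketch; not AUCell.jl's read_expr_matrix.
function read_expr_sketch(fn, rn, cn; matrix_delim::AbstractChar = '\t')
    mat = readdlm(fn, matrix_delim, Float64)   # numeric body only, no header row
    rows = readlines(rn)                        # one row name per line
    cols = readlines(cn)                        # one column name per line
    size(mat) == (length(rows), length(cols)) ||
        error("matrix size does not match the number of row/column names")
    return mat, rows, cols
end
```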
AUCell.read_gmt (Method)
read_gmt(fn)

Read in a GMT file (MSigDB gene set format), where fn is the file path.

Examples

julia> res = read_gmt("h.all.v7.5.1.symbols.gmt")

julia> gn, gs = read_gmt("h.all.v7.5.1.symbols.gmt")
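In the GMT format each line holds one gene set as tab-separated fields: the set name, a description (MSigDB puts a URL there), and then the genes. Assuming, as the `gn, gs = read_gmt(...)` example suggests, that the result is a pair of set names and gene lists, a minimal parser could look like this; `read_gmt_sketch` is a hypothetical name.

```julia
# Hypothetical sketch of a GMT parser; not AUCell.jl's read_gmt.
function read_gmt_sketch(fn::AbstractString)
    set_names = String[]
    sets = Vector{Vector{String}}()
    for line in eachline(fn)
        isempty(strip(line)) && continue
        fields = split(line, '\t')
        push!(set_names, fields[1])                            # field 1: set name
        push!(sets, String.(filter(!isempty, fields[3:end])))  # field 2 (description) is skipped
    end
    return set_names, sets
end
```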
AUCell.read_gsf (Method)
read_gsf(fn [, delim])

Read in a general gene set file, where fn is the file path and fields are separated by the delim character (default: white space). Each row represents one gene set: the first column is the set name and the remaining columns are the genes in the set.

Examples

julia> gn, gs = read_gsf("my_gene_set.csv", delim = ',')

julia> gn, gs = read_gsf("my_gene_set.tsv", delim = '\t')

julia> gn, gs = read_gsf("my_gene_set.tsv")
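A general gene set file has a one-set-per-row layout with no description column. In the minimal sketch below, `delim = nothing` stands for the white-space default, and the (names, gene lists) return value is assumed by analogy with the `gn, gs = read_gsf(...)` examples; `read_gsf_sketch` is a hypothetical name.

```julia
# Hypothetical sketch; not AUCell.jl's read_gsf.
function read_gsf_sketch(fn::AbstractString; delim::Union{Nothing,AbstractChar} = nothing)
    set_names = String[]
    sets = Vector{Vector{String}}()
    for line in eachline(fn)
        # split on any white space by default, otherwise on the given character
        fields = delim === nothing ? split(line) : split(line, delim)
        fields = filter(!isempty, strip.(fields))
        isempty(fields) && continue
        push!(set_names, fields[1])         # first column: set name
        push!(sets, String.(fields[2:end])) # remaining columns: genes
    end
    return set_names, sets
end
```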
AUCell.read_meta (Function)
read_meta(fn, group)

Read in a metadata file whose first row is the header and whose row names are the profile names (cell barcodes). The grouping is taken from the column whose header is group; if no such column is found, the second column is used. Returns the grouped profile names (a vector of vectors) and the group names.

Examples


julia> grp, nam = read_meta("meta.tsv", "Cluster")

julia> length(grp)
12

julia> length.(grp)
12-element Vector{Int64}:
   65
  512
 1057
  647
  654
  326
  680
  369
 1191
   46
  101
   80

julia> length(nam)
12
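The grouping logic can be sketched as follows, assuming a delimited file whose header row has an entry for every column (including the barcode column) and whose first column holds the barcodes. Groups here are ordered by first appearance in the file, which may not match read_meta; `read_meta_sketch` is a hypothetical name.

```julia
# Hypothetical sketch; not AUCell.jl's read_meta.
function read_meta_sketch(fn::AbstractString, group::AbstractString; delim::AbstractChar = '\t')
    lines = filter(!isempty, readlines(fn))
    header = split(lines[1], delim)
    rows = [split(l, delim) for l in lines[2:end]]
    col = something(findfirst(==(group), header), 2)   # fall back to the second column
    group_names = unique(r[col] for r in rows)          # order of first appearance
    grouped = [String[r[1] for r in rows if r[col] == g] for g in group_names]
    return grouped, String.(group_names)
end
```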
AUCell.read_mtx (Method)
read_mtx(fn, rn, cn)

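Read in a 10x Genomics single-cell RNA expression matrix stored in the MTX (Matrix Market) format, uncompressed, together with its features file rn and barcodes file cn. Returns (sparse matrix, feature names, barcodes).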

Examples

julia> @time mat, fea, bar = read_mtx("matrix.mtx", "features.tsv", "barcodes.tsv")
 62.946154 seconds (481.84 M allocations: 13.082 GiB, 3.50% gc time)
(sparse([7, 27, 31, 44, 45, 46, 49, 52, 54, 58  …  36563, 36564, 36565, 36566, 36567, 36568, 36569, 36570, 36572, 36576], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  5744, 5744, 5744, 5744, 5744, 5744, 5744, 5744, 5744, 5744], Int32[1, 1, 5, 1, 4, 1, 1, 1, 1, 1  …  287, 8, 239, 124, 32, 8, 145, 41, 99, 2], 36601, 5744), Any["ENSG00000243485", "ENSG00000237613", "ENSG00000186092", "ENSG00000238009", "ENSG00000239945", "ENSG00000239906", "ENSG00000241860", "ENSG00000241599", "ENSG00000286448", "ENSG00000236601"  …  "ENSG00000274175", "ENSG00000275869", "ENSG00000273554", "ENSG00000278782", "ENSG00000277761", "ENSG00000277836", "ENSG00000278633", "ENSG00000276017", "ENSG00000278817", "ENSG00000277196"], Any["AAACCCAAGAACAAGG-1", "AAACCCAAGCCTGAAG-1", "AAACCCAAGCTGAGTG-1", "AAACCCAAGTATTGCC-1", "AAACCCAGTCATGACT-1", "AAACCCATCGGAATTC-1", "AAACCCATCTGTCTCG-1", "AAACGAAAGCGGGTAT-1", "AAACGAAAGGTAGCCA-1", "AAACGAAAGTGGTGAC-1"  …  "TTTGGTTTCCACAGCG-1", "TTTGTTGCACCTCGTT-1", "TTTGTTGCAGCTGTTA-1", "TTTGTTGCATACCGTA-1", "TTTGTTGGTAGGACCA-1", "TTTGTTGGTGACAGGT-1", "TTTGTTGTCCACTTTA-1", "TTTGTTGTCCTATTGT-1", "TTTGTTGTCGCTCTAC-1", "TTTGTTGTCTCCAAGA-1"])

Arguments

  • fn::AbstractString: MTX file path.
  • rn::AbstractString: features file path.
  • cn::AbstractString: barcodes file path.
  • T::Type: data type of the values in the MTX file. Default: Int32.
  • feature_col::Int: which column is used as feature names. Default: 1 (first).
  • barcode_col::Int: which column is used as barcode names. Default: 1 (first).
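The MTX coordinate format consists of a banner and comment lines starting with `%`, a size line (rows, columns, number of nonzeros), and then one `row column value` triplet per nonzero entry, with 1-based indices. A minimal reader that builds a SparseMatrixCSC and honors the T, feature_col and barcode_col options listed above might look like this; `read_mtx_sketch` is a hypothetical name, and only the plain uncompressed coordinate layout is handled.

```julia
using SparseArrays

# Hypothetical sketch of a 10x MTX reader; not AUCell.jl's read_mtx.
function read_mtx_sketch(fn, rn, cn; T::Type = Int32, feature_col::Int = 1, barcode_col::Int = 1)
    I = Int[]; J = Int[]; V = T[]
    nrow = ncol = 0
    size_seen = false
    for line in eachline(fn)
        startswith(line, "%") && continue          # banner and comment lines
        f = split(line)
        isempty(f) && continue
        if !size_seen
            nrow, ncol = parse(Int, f[1]), parse(Int, f[2])  # third field is the nonzero count
            size_seen = true
        else
            push!(I, parse(Int, f[1]))             # 1-based row (feature) index
            push!(J, parse(Int, f[2]))             # 1-based column (barcode) index
            push!(V, parse(T, f[3]))
        end
    end
    mat = sparse(I, J, V, nrow, ncol)
    # features/barcodes files: tab-separated, one entry per line
    features = [String(split(l, '\t')[feature_col]) for l in eachline(rn) if !isempty(l)]
    barcodes = [String(split(l, '\t')[barcode_col]) for l in eachline(cn) if !isempty(l)]
    return mat, features, barcodes
end
```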
AUCell.splitby (Method)
splitby(A::AbstractVector, by)

Split a vector into subsets using a function by that takes two consecutive elements of A. The vector is split wherever by returns false.

Returns a vector of subsets, given as indices into A.
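A minimal sketch of this splitting rule, assuming each subset is returned as a contiguous `UnitRange` of indices (`splitby_sketch` is a hypothetical name):

```julia
# Hypothetical sketch; not AUCell.jl's splitby.
function splitby_sketch(A::AbstractVector, by)
    runs = UnitRange{Int}[]
    isempty(A) && return runs
    start = firstindex(A)
    for i in (firstindex(A) + 1):lastindex(A)
        if !by(A[i - 1], A[i])      # boundary: by returned false for this pair
            push!(runs, start:(i - 1))
            start = i
        end
    end
    push!(runs, start:lastindex(A))  # the final run always ends at the last index
    return runs
end
```

For example, splitting on gaps in an integer sequence: `splitby_sketch([1, 2, 3, 7, 8, 10], (a, b) -> b == a + 1)` returns the ranges `1:3`, `4:5` and `6:6`.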