AUCell.aucell_kernel - Method
aucell_kernel(mat, features, gene_set)
Calculate the AUC for each gene_set in each profile (column) of mat. The row names of mat are stored in features, whose elements should be of the same type as those in gene_set. As the examples show, gene_set may be a single gene set or a vector of gene sets.
Examples
julia> mat = [1 2 3;4 5 6;0 1 2;7 8 0]
4×3 Matrix{Int64}:
1 2 3
4 5 6
0 1 2
7 8 0
julia> fea = ["a", "b", "c", "d"]
julia> gene_set = ["b","c", "e"]
julia> aucell_kernel(mat, fea, gene_set)
1×3 Matrix{Float64}:
0.125 0.125 0.5
julia> gene_sets = [["a", "b", "e"], ["b", "d", "e"]]
julia> aucell_kernel(mat, fea, gene_sets)
2×3 Matrix{Float64}:
0.25 0.25 0.75
0.75 0.75 0.375
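The requirement that features and gene_set hold the same element type can be illustrated with plain Base Julia. This is a minimal sketch, not the package's internal code, and the names member, found and mismatch are hypothetical. It shows which rows of mat a gene set would select and why a type mismatch silently matches nothing.

```julia
fea = ["a", "b", "c", "d"]
gene_set = ["b", "c", "e"]

# Membership mask over the rows of mat: true where the feature is in the gene set.
member = in.(fea, Ref(Set(gene_set)))
found = fea[member]          # "e" is not a feature, so it cannot match any row

# Mismatched element types (Int features vs. String genes) never compare equal.
mismatch = in.(1:4, Ref(Set(["1", "2"])))
```

Here only "b" and "c" are found among the features, and every entry of mismatch is false.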
AUCell.cell_marker_score - Method
cell_marker_score(mat, features, barcodes, gene_set, group)
Given a single-cell RNA expression matrix mat whose row names are features and whose column names are barcodes, calculate the relative cell type marker scores (0-1) for gene_set. The grouping information is given by group, a vector of vectors that stores the cell barcodes in each group.
Examples
julia> mat = rand(0:32, 12, 8)
julia> features = 1:12
julia> gene_set = [1,5,6,8]
julia> barcodes = ["a", "b", "c", "d", "e", "f", "g", "h"]
julia> group = [["a", "b", "g", "h"], ["c", "d", "e", "f"]]
2-element Vector{Vector{String}}:
["a", "b", "g", "h"]
["c", "d", "e", "f"]
julia> cell_marker_score(mat, features, barcodes, gene_set, group)
4 genes are found among 4 genes.
1×2 Matrix{Float64}:
0.476227 0.523773
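The group argument is a vector of barcode vectors. As an illustration only (how the package processes groups internally is not documented here), the following sketch maps each group to the matching column indices of mat using Base functions. The name group_cols is hypothetical.

```julia
barcodes = ["a", "b", "c", "d", "e", "f", "g", "h"]
group = [["a", "b", "g", "h"], ["c", "d", "e", "f"]]

# For every group, find the positions of its barcodes among the column names.
group_cols = [findall(in(g), barcodes) for g in group]
```

With these inputs, the first group selects columns 1, 2, 7 and 8, and the second selects columns 3 to 6.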
AUCell.classify_cell_cluster - Method
Assign each cell to a cluster.
AUCell.filter_expr_matrix - Function
filter_expr_matrix(mat, feature_threshold, cell_threshold)
Filter an expression matrix mat, keeping only the genes expressed in more than feature_threshold cells and the cells expressing more than cell_threshold features. Return the filtered matrix and the bit vectors marking the kept features and cells.
Examples
julia> @time mat, fea, bar = read_mtx("matrix.mtx", "features.tsv", "barcodes.tsv")
julia> size(mat)
(36601, 5744)
julia> @time mat2, kf, kb = filter_expr_matrix(mat)
26.438175 seconds (978.08 k allocations: 1.320 GiB, 0.52% gc time)
(sparse([2, 12, 15, 25, 26, 27, 29, 32, 34, 37 … 21104, 21105, 21106, 21107, 21108, 21109, 21110, 21111, 21113, 21116], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1 … 5728, 5728, 5728, 5728, 5728, 5728, 5728, 5728, 5728, 5728], Int32[1, 1, 5, 1, 4, 1, 1, 1, 1, 1 … 287, 8, 239, 124, 32, 8, 145, 41, 99, 2], 21121, 5728), Bool[0, 0, 0, 1, 0, 0, 1, 0, 0, 0 … 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1 … 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
julia> size(mat2)
(21121, 5728)
julia> fea2 = fea[kf]; bar2 = bar[kb];
julia> length(fea2)
21121
julia> length(bar2)
5728
Arguments
mat::AbstractMatrix: expression matrix (either dense or sparse).
feature_threshold::Int: the least number of cells in which a feature must be expressed in order to be kept. Default: 30.
cell_threshold::Int: the least number of genes that a cell must express in order to be kept. Default: 200.
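The filtering rule can be sketched on a small dense matrix. This is a hedged illustration rather than the package's implementation: filter_sketch is a hypothetical helper, "expressed" is assumed to mean a count greater than zero, the comparison is assumed to be strict, and both counts are assumed to be taken on the unfiltered matrix.

```julia
# Hypothetical helper, not part of AUCell.
function filter_sketch(mat::AbstractMatrix, feature_threshold::Int, cell_threshold::Int)
    expressed = mat .> 0    # assumption: a gene is expressed when its count is positive
    # Number of cells expressing each gene (row) and genes expressed per cell (column).
    keep_features = vec(sum(expressed, dims = 2)) .> feature_threshold
    keep_cells = vec(sum(expressed, dims = 1)) .> cell_threshold
    return mat[keep_features, keep_cells], keep_features, keep_cells
end

mat = [1 0 2 3;
       0 0 0 1;
       4 5 0 6]
sub, kf, kc = filter_sketch(mat, 1, 1)
```

As in the documented function, the two returned bit vectors can be used to subset the feature and barcode names, for example fea[kf] and bar[kc].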
AUCell.generate_pseudobulk - Function
generate_pseudobulk(mat, np)
Generate a matrix of pseudobulk profiles from mat, which stores single-cell RNA profiles with each column representing one cell's profile. Each pseudobulk profile is generated from np (default: 10) single-cell profiles.
Examples
julia> generate_pseudobulk(rand(0:32, 10, 6), 3)
10×2 Matrix{Int64}:
59 30
66 34
37 26
58 70
83 86
15 11
58 62
38 62
62 35
15 51
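One way to picture pseudobulk generation is to sum the counts of np cells per output column. The sketch below is an assumption-laden illustration, not the package's code: pseudobulk_sketch is a hypothetical name, cells are taken in consecutive blocks (the package may choose cells differently), and leftover columns are dropped.

```julia
# Hypothetical helper, not part of AUCell.
function pseudobulk_sketch(mat::AbstractMatrix, np::Int = 10)
    nb = size(mat, 2) ÷ np                    # number of complete blocks of np cells
    out = zeros(eltype(mat), size(mat, 1), nb)
    for j in 1:nb
        cols = ((j - 1) * np + 1):(j * np)    # assumption: consecutive cells form a block
        out[:, j] = vec(sum(mat[:, cols], dims = 2))
    end
    return out
end

mat = [1 2 3 4 5 6;
       1 1 1 1 1 1]
pb = pseudobulk_sketch(mat, 3)
```

A 2×6 input with np = 3 gives a 2×2 result, in the same way that the 10×6 input in the example above yields a 10×2 matrix.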
AUCell.mode_pathway_cluster - Method
pathway_cluster mode. `pathway_cluster` forms subgroups based on pathway activation.
Examples
julia> mode_pathway_cluster([[1,5,7] [6,4,3] [8,5,2]],[[1,5,7] [6,4,3] [8,5,2]],BitVector([0,1,1]),reshape([1,2,3],:,1),["sample1","sample2","sample3"])
0.005570 seconds (2.62 k allocations: 150.455 KiB, 97.78% compilation time)
1×5 Matrix{Any}:
["sample1"] ["sample2", "sample3"] NaN NaN [0.5 0.0 0.0]
AUCell.pathway_AUC_main - Function
Examples
default: reAUCluster mode
julia> pathway_AUC_main(use_testdata = "yes")
1.632442 seconds (8.44 M allocations: 279.642 MiB, 7.95% gc time, 73.87% compilation time)
[ Info: INFO: The size of expression profile was (36602, 8).
1.779532 seconds (4.95 M allocations: 260.557 MiB, 4.14% gc time, 97.91% compilation time)
[ Info: INFO: The filtered of expression profile size was (7549, 8).
0.000320 seconds (27 allocations: 34.672 KiB)
[ Info: INFO: There are 1 pathways to be analyzed.
0.768511 seconds (1.50 M allocations: 99.943 MiB, 2.46% gc time, 95.16% compilation time)
2×5 Matrix{Any}:
"pathways_name" ["cluster1"] … ["t"] ["pvalue"]
"HALLMARK_TNFA_SIGNALING_VIA_NFKB" Any["AAACCCAAGGGTTAAT-1", "AAACCCAAGAAACCAT-1", "AAACCCAAGCAACAAT-1", "AAACCCAAGCCAGAGT-1", "AAACCCACAGCAGATG-1"] [4.92654] [0.00263937]
aucell mode
julia> pathway_AUC_main(use_testdata = "yes", mode = "aucell")
1.557316 seconds (8.44 M allocations: 279.659 MiB, 3.27% gc time, 78.85% compilation time)
[ Info: INFO: The size of expression profile was (36602, 8).
1.771720 seconds (4.95 M allocations: 260.557 MiB, 3.69% gc time, 97.39% compilation time)
[ Info: INFO: The filtered of expression profile size was (7549, 8).
0.000329 seconds (27 allocations: 34.672 KiB)
[ Info: INFO: There are 1 pathways to be analyzed.
0.667055 seconds (1.75 M allocations: 87.598 MiB, 3.82% gc time, 99.79% compilation time)
[ Info: INFO: According to the meta information, there are 8 groups of data and each group will be analyzed with the rest of the sample.
3.153389 seconds (6.62 M allocations: 421.960 MiB, 3.39% gc time, 80.62% compilation time)
2×65 Matrix{Any}:
"GeneSet" "AAACCCAAGAAACCAT-1" "AAACCCAAGAAACCAT-1" "AAACCCAAGAAACCAT-1" … "AAACCCAGTACGGGAT-1" "AAACCCAGTACGGGAT-1" "AAACCCAGTACGGGAT-1"
"HALLMARK_TNFA_SIGNALING_VIA_NFKB" 0.506962 0.500821 0.515332 0.512858 0.482078 0.440029
AUCell.read_expr_matrix - Method
read_expr_matrix(fn, rn, cn)
Read in an expression matrix stored in fn, with its row names stored in rn and its column names stored in cn. It returns (matrix, vector of row names, vector of column names).
Examples
julia> mat, fea, bar = read_expr_matrix("matrix.csv", "features.tsv", "barcodes.tsv", matrix_delim = ',')
julia> mat, fea, bar = read_expr_matrix("matrix.txt", "features.tsv", "barcodes.tsv", matrix_delim = ' ')
julia> mat, fea, bar = read_expr_matrix("matrix.tsv", "features.tsv", "barcodes.tsv", matrix_delim = '\t')
julia> mat, fea, bar = read_expr_matrix("matrix.tsv", "features.tsv", "barcodes.tsv")
AUCell.read_gmt - Method
read_gmt(fn)
Read in a GMT file (MSigDB gene set format), where fn is the file path.
Examples
julia> res = read_gmt("h.all.v7.5.1.symbols.gmt")
julia> gn, gs = read_gmt("h.all.v7.5.1.symbols.gmt")
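In the GMT format each line holds one gene set: a name, a description, and then the genes, all separated by tabs. The following sketch parses such a file with Base Julia only. It is an illustration under that format assumption, not the package's reader, and read_gmt_sketch is a hypothetical name.

```julia
# Hypothetical helper, not part of AUCell.
function read_gmt_sketch(fn::AbstractString)
    gn = String[]                # gene set names
    gs = Vector{String}[]        # genes of each set
    for line in eachline(fn)
        fields = split(line, '\t')
        length(fields) < 3 && continue    # skip blank or malformed lines
        push!(gn, fields[1])              # field 2 is the description and is ignored
        push!(gs, String.(fields[3:end]))
    end
    return gn, gs
end

# Write a tiny two-set GMT file to a temporary path and read it back.
path, io = mktemp()
write(io, "SET_A\tdesc\tTP53\tMYC\nSET_B\tdesc\tEGFR\n")
close(io)
gn, gs = read_gmt_sketch(path)
```

The two return values mirror the gn, gs destructuring used in the example above.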
AUCell.read_gsf - Method
read_gsf(fn [, delim = ','])
Read in a general gene set file, where fn is the file path and the fields are separated by the delim character (default: white space). Each row represents a gene set: the first column is the name of the set and the remaining columns are the genes in the set.
Examples
julia> gn, gs = read_gsf("my_gene_set.csv", delim = ',')
julia> gn, gs = read_gsf("my_gene_set.tsv", delim = '\t')
julia> gn, gs = read_gsf("my_gene_set.tsv")
AUCell.read_meta - Function
read_meta(fn, group)
Read in a metadata file whose first row is assumed to be the header and whose row names are assumed to be the profile names (cell barcodes). Grouping information is taken from the column whose header is group. If group is not found, the second column is used. It returns the grouped profile names (vector of vectors) and the group names.
Examples
julia> grp, nam = read_meta("meta.tsv", "Cluster")
julia> length(grp)
12
julia> length.(grp)
12-element Vector{Int64}:
65
512
1057
647
654
326
680
369
1191
46
101
80
julia> length(nam)
12
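The shape of the return value (a vector of barcode vectors plus the matching group names) can be sketched from in-memory data. This is only an illustration: group_profiles is a hypothetical helper, and ordering the groups by first appearance is an assumption, since the order used by read_meta is not stated.

```julia
# Hypothetical helper, not part of AUCell.
function group_profiles(barcodes::Vector{String}, labels::Vector{String})
    nam = unique(labels)                          # assumption: order of first appearance
    grp = [barcodes[labels .== g] for g in nam]   # barcodes belonging to each group
    return grp, nam
end

barcodes = ["a", "b", "c", "d"]
labels = ["T", "B", "T", "B"]
grp, nam = group_profiles(barcodes, labels)
```

As in the example above, length(grp) equals length(nam), and length.(grp) gives the number of cells per group.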
AUCell.read_mtx - Method
read_mtx(fn, rn, cn)
Read in the common 10X single-cell RNA expression file in the MTX format (unzipped).
Examples
julia> @time mat, fea, bar = read_mtx("matrix.mtx", "features.tsv", "barcodes.tsv")
62.946154 seconds (481.84 M allocations: 13.082 GiB, 3.50% gc time)
(sparse([7, 27, 31, 44, 45, 46, 49, 52, 54, 58 … 36563, 36564, 36565, 36566, 36567, 36568, 36569, 36570, 36572, 36576], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1 … 5744, 5744, 5744, 5744, 5744, 5744, 5744, 5744, 5744, 5744], Int32[1, 1, 5, 1, 4, 1, 1, 1, 1, 1 … 287, 8, 239, 124, 32, 8, 145, 41, 99, 2], 36601, 5744), Any["ENSG00000243485", "ENSG00000237613", "ENSG00000186092", "ENSG00000238009", "ENSG00000239945", "ENSG00000239906", "ENSG00000241860", "ENSG00000241599", "ENSG00000286448", "ENSG00000236601" … "ENSG00000274175", "ENSG00000275869", "ENSG00000273554", "ENSG00000278782", "ENSG00000277761", "ENSG00000277836", "ENSG00000278633", "ENSG00000276017", "ENSG00000278817", "ENSG00000277196"], Any["AAACCCAAGAACAAGG-1", "AAACCCAAGCCTGAAG-1", "AAACCCAAGCTGAGTG-1", "AAACCCAAGTATTGCC-1", "AAACCCAGTCATGACT-1", "AAACCCATCGGAATTC-1", "AAACCCATCTGTCTCG-1", "AAACGAAAGCGGGTAT-1", "AAACGAAAGGTAGCCA-1", "AAACGAAAGTGGTGAC-1" … "TTTGGTTTCCACAGCG-1", "TTTGTTGCACCTCGTT-1", "TTTGTTGCAGCTGTTA-1", "TTTGTTGCATACCGTA-1", "TTTGTTGGTAGGACCA-1", "TTTGTTGGTGACAGGT-1", "TTTGTTGTCCACTTTA-1", "TTTGTTGTCCTATTGT-1", "TTTGTTGTCGCTCTAC-1", "TTTGTTGTCTCCAAGA-1"])
Arguments
fn::AbstractString: MTX file path.
rn::AbstractString: features file path.
cn::AbstractString: barcodes file path.
T::Type: Datatype in the MTX file. Default: Int32.
feature_col::Int: which column is used as feature names. Default: 1 (first).
barcode_col::Int: which column is used as barcode names. Default: 1 (first).
AUCell.splitby - Method
splitby(A::AbstractVector, by)
Split a vector into subsets using a function by, which takes two consecutive elements of A. The vector is split wherever by returns false.
Returns a vector of subsets (iteration indices into A).
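The splitting rule can be sketched as follows. This is a minimal illustration of the behaviour described above, not the package's source: splitby_sketch is a hypothetical name, and returning index ranges is an assumed reading of "iteration indices".

```julia
# Hypothetical helper, not part of AUCell.
function splitby_sketch(A::AbstractVector, by)
    groups = UnitRange{Int}[]
    isempty(A) && return groups
    start = firstindex(A)
    for i in firstindex(A):(lastindex(A) - 1)
        # Close the current subset when two consecutive elements fail the test.
        if !by(A[i], A[i + 1])
            push!(groups, start:i)
            start = i + 1
        end
    end
    push!(groups, start:lastindex(A))    # the final subset runs to the end of A
    return groups
end

runs = splitby_sketch([1, 1, 2, 2, 2, 3], ==)
```

With == as the test, runs of equal values become the subsets, so runs holds the ranges 1:2, 3:5 and 6:6.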