Topic Modeling
Another application of natural language processing is topic modeling, which aims to extract the topics discussed in a given document. In this section, we apply it to Chapter 18 (The Cave) of the Qur'an using the TextAnalysis.jl library, with Latent Dirichlet Allocation (LDA) as the model. To start, load the data as follows:
julia> using QuranTree
julia> using TextAnalysis
julia> using Yunir
julia> crps, tnzl = QuranData() |> load;
julia> crpsdata = table(crps)
Quranic Arabic Corpus (morphology)
(C) 2011 Kais Dukes

128219×7 DataFrame
    Row │ chapter  verse  word   part   form        tag     features
        │ Int64    Int64  Int64  Int64  String      String  String             ⋯
────────┼───────────────────────────────────────────────────────────────────────
      1 │       1      1      1      1  bi          P       PREFIX|bi+         ⋯
      2 │       1      1      1      2  somi        N       STEM|POS:N|LEM:{so
      3 │       1      1      2      1  {ll~ahi     PN      STEM|POS:PN|LEM:{l
      4 │       1      1      3      1  {l          DET     PREFIX|Al+
      5 │       1      1      3      2  r~aHoma`ni  ADJ     STEM|POS:ADJ|LEM:r ⋯
      6 │       1      1      4      1  {l          DET     PREFIX|Al+
      7 │       1      1      4      2  r~aHiymi    ADJ     STEM|POS:ADJ|LEM:r
      8 │       1      2      1      1  {lo         DET     PREFIX|Al+
      ⋮ │    ⋮       ⋮      ⋮      ⋮        ⋮         ⋮            ⋮           ⋱
 128213 │     114      5      5      2  n~aAsi      N       STEM|POS:N|LEM:n~a ⋯
 128214 │     114      6      1      1  mina        P       STEM|POS:P|LEM:min
 128215 │     114      6      2      1  {lo         DET     PREFIX|Al+
 128216 │     114      6      2      2  jin~api     N       STEM|POS:N|LEM:jin
 128217 │     114      6      3      1  wa          CONJ    PREFIX|w:CONJ+     ⋯
 128218 │     114      6      3      2  {l          DET     PREFIX|Al+
 128219 │     114      6      3      3  n~aAsi      N       STEM|POS:N|LEM:n~a
                                                 1 column and 128204 rows omitted
You need to install Yunir.jl and TextAnalysis.jl to run the code successfully:
using Pkg
Pkg.add("Yunir")
Pkg.add("TextAnalysis")
Data Preprocessing
The first preprocessing step is the removal of all disconnected letters (like الٓمٓ and الٓمٓصٓ, among others), prepositions, particles, conjunctions, pronouns, and adverbs. This is done as follows:
julia> function preprocess(s::String)
           feat = parse(QuranFeatures, s)
           disletters = isfeat(feat, AbstractDisLetters)
           prepositions = isfeat(feat, AbstractPreposition)
           particles = isfeat(feat, AbstractParticle)
           conjunctions = isfeat(feat, AbstractConjunction)
           pronouns = isfeat(feat, AbstractPronoun)
           adverbs = isfeat(feat, AbstractAdverb)
           return !disletters && !prepositions && !particles &&
                  !conjunctions && !pronouns && !adverbs
       end
preprocess (generic function with 1 method)
julia> crpstbl = filter(t -> preprocess(t.features), crpsdata[18].data)
827×7 DataFrame
 Row │ chapter  verse  word   part   form       tag     features
     │ Int64    Int64  Int64  Int64  String     String  String                 ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │      18      1      1      2  Hamodu     N       STEM|POS:N|LEM:Hamod|R ⋯
   2 │      18      1      2      2  l~ahi      PN      STEM|POS:PN|LEM:{ll~ah
   3 │      18      1      4      1  >anzala    V       STEM|POS:V|PERF|(IV)|L
   4 │      18      1      6      1  Eabodi     N       STEM|POS:N|LEM:Eabod|R
   5 │      18      1      9      1  yajoEal    V       STEM|POS:V|IMPF|LEM:ja ⋯
   6 │      18      1     11      1  EiwajaA    N       STEM|POS:N|LEM:Eiwaj|R
   7 │      18      2      2      1  l~i        PRP     PREFIX|l:PRP+
   8 │      18      2      2      2  yun*ira    V       STEM|POS:V|IMPF|(IV)|L
   ⋮ │    ⋮       ⋮      ⋮      ⋮        ⋮        ⋮               ⋮            ⋱
 821 │      18    110     14      1  yarojuwA@  V       STEM|POS:V|IMPF|LEM:ya ⋯
 822 │      18    110     16      1  rab~i      N       STEM|POS:N|LEM:rab~|RO
 823 │      18    110     17      2  lo         IMPV    PREFIX|l:IMPV+
 824 │      18    110     17      3  yaEomalo   V       STEM|POS:V|IMPF|LEM:Ea
 825 │      18    110     21      1  yu$oriko   V       STEM|POS:V|IMPF|(IV)|L ⋯
 826 │      18    110     22      2  EibaAdapi  N       STEM|POS:N|LEM:EibaAda
 827 │      18    110     23      1  rab~i      N       STEM|POS:N|LEM:rab~|RO
                                                1 column and 812 rows omitted
Next, we create a copy of the above data so that the original state is preserved, and use the copy for further processing.
julia> crpsnew = deepcopy(crpstbl)
827×7 DataFrame
 Row │ chapter  verse  word   part   form       tag     features
     │ Int64    Int64  Int64  Int64  String     String  String                 ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │      18      1      1      2  Hamodu     N       STEM|POS:N|LEM:Hamod|R ⋯
   2 │      18      1      2      2  l~ahi      PN      STEM|POS:PN|LEM:{ll~ah
   3 │      18      1      4      1  >anzala    V       STEM|POS:V|PERF|(IV)|L
   4 │      18      1      6      1  Eabodi     N       STEM|POS:N|LEM:Eabod|R
   5 │      18      1      9      1  yajoEal    V       STEM|POS:V|IMPF|LEM:ja ⋯
   6 │      18      1     11      1  EiwajaA    N       STEM|POS:N|LEM:Eiwaj|R
   7 │      18      2      2      1  l~i        PRP     PREFIX|l:PRP+
   8 │      18      2      2      2  yun*ira    V       STEM|POS:V|IMPF|(IV)|L
   ⋮ │    ⋮       ⋮      ⋮      ⋮        ⋮        ⋮               ⋮            ⋱
 821 │      18    110     14      1  yarojuwA@  V       STEM|POS:V|IMPF|LEM:ya ⋯
 822 │      18    110     16      1  rab~i      N       STEM|POS:N|LEM:rab~|RO
 823 │      18    110     17      2  lo         IMPV    PREFIX|l:IMPV+
 824 │      18    110     17      3  yaEomalo   V       STEM|POS:V|IMPF|LEM:Ea
 825 │      18    110     21      1  yu$oriko   V       STEM|POS:V|IMPF|(IV)|L ⋯
 826 │      18    110     22      2  EibaAdapi  N       STEM|POS:N|LEM:EibaAda
 827 │      18    110     23      1  rab~i      N       STEM|POS:N|LEM:rab~|RO
                                                1 column and 812 rows omitted
julia> feats = crpsnew[!, :features]
827-element Vector{String}:
 "STEM|POS:N|LEM:Hamod|ROOT:Hmd|M|NOM"
 "STEM|POS:PN|LEM:{ll~ah|ROOT:Alh|GEN"
 "STEM|POS:V|PERF|(IV)|LEM:>anzala|ROOT:nzl|3MS"
 "STEM|POS:N|LEM:Eabod|ROOT:Ebd|M|GEN"
 "STEM|POS:V|IMPF|LEM:jaEala|ROOT:jEl|3MS|MOOD:JUS"
 "STEM|POS:N|LEM:Eiwaj|ROOT:Ewj|M|NOM"
 "PREFIX|l:PRP+"
 "STEM|POS:V|IMPF|(IV)|LEM:>an*ara|ROOT:n*r|3MS|MOOD:SUBJ"
 "STEM|POS:N|LEM:l~adun|ROOT:ldn|GEN"
 "STEM|POS:V|IMPF|(II)|LEM:bu\$~ira|ROOT:b\$r|3MS|MOOD:SUBJ"
 ⋮
 "STEM|POS:ADJ|LEM:wa`Hid|ROOT:wHd|MS|INDEF|NOM"
 "STEM|POS:V|PERF|LEM:kaAna|ROOT:kwn|SP:kaAn|3MS"
 "STEM|POS:V|IMPF|LEM:yarojuwA@|ROOT:rjw|3MS"
 "STEM|POS:N|LEM:rab~|ROOT:rbb|M|GEN"
 "PREFIX|l:IMPV+"
 "STEM|POS:V|IMPF|LEM:Eamila|ROOT:Eml|3MS|MOOD:JUS"
 "STEM|POS:V|IMPF|(IV)|LEM:>a\$oraka|ROOT:\$rk|3MS|MOOD:JUS"
 "STEM|POS:N|LEM:EibaAdat|ROOT:Ebd|F|GEN"
 "STEM|POS:N|LEM:rab~|ROOT:rbb|M|GEN"
julia> feats = parse.(QuranFeatures, feats)
827-element Vector{AbstractQuranFeature}:
 Stem(:N, N, AbstractQuranFeature[Lemma("Hamod"), Root("Hmd"), M, NOM])
 Stem(:PN, PN, AbstractQuranFeature[Lemma("{ll~ah"), Root("Alh"), GEN])
 Stem(:V, V, AbstractQuranFeature[Lemma(">anzala"), Root("nzl"), PERF, IV, 3, M, S, IND, ACT])
 Stem(:N, N, AbstractQuranFeature[Lemma("Eabod"), Root("Ebd"), M, GEN])
 Stem(:V, V, AbstractQuranFeature[Lemma("jaEala"), Root("jEl"), JUS, IMPF, 3, M, S, ACT, I])
 Stem(:N, N, AbstractQuranFeature[Lemma("Eiwaj"), Root("Ewj"), M, NOM])
 Prefix(Symbol("l:PRP+"), PRP)
 Stem(:V, V, AbstractQuranFeature[Lemma(">an*ara"), Root("n*r"), SUBJ, IMPF, IV, 3, M, S, ACT])
 Stem(:N, N, AbstractQuranFeature[Lemma("l~adun"), Root("ldn"), GEN])
 Stem(:V, V, AbstractQuranFeature[Lemma("bu\$~ira"), Root("b\$r"), SUBJ, IMPF, II, 3, M, S, ACT])
 ⋮
 Stem(:ADJ, ADJ, AbstractQuranFeature[Lemma("wa`Hid"), Root("wHd"), M, S, INDEF, NOM])
 Stem(:V, V, AbstractQuranFeature[Lemma("kaAna"), Root("kwn"), Special("kaAn"), PERF, 3, M, S, IND, ACT, I])
 Stem(:V, V, AbstractQuranFeature[Lemma("yarojuwA@"), Root("rjw"), IMPF, 3, M, S, IND, ACT, I])
 Stem(:N, N, AbstractQuranFeature[Lemma("rab~"), Root("rbb"), M, GEN])
 Prefix(Symbol("l:IMPV+"), IMPV)
 Stem(:V, V, AbstractQuranFeature[Lemma("Eamila"), Root("Eml"), JUS, IMPF, 3, M, S, ACT, I])
 Stem(:V, V, AbstractQuranFeature[Lemma(">a\$oraka"), Root("\$rk"), JUS, IMPF, IV, 3, M, S, ACT])
 Stem(:N, N, AbstractQuranFeature[Lemma("EibaAdat"), Root("Ebd"), F, GEN])
 Stem(:N, N, AbstractQuranFeature[Lemma("rab~"), Root("rbb"), M, GEN])
Lemmatization
Using the parsed features above, we then convert the form of each token into its lemma. This is useful for addressing inflections, since different surface forms of the same word collapse into a single lemma.
julia> lemmas = lemma.(feats)
827-element Vector{Union{Missing, String}}:
 "Hamod"
 "{ll~ah"
 ">anzala"
 "Eabod"
 "jaEala"
 "Eiwaj"
 missing
 ">an*ara"
 "l~adun"
 "bu\$~ira"
 ⋮
 "wa`Hid"
 "kaAna"
 "yarojuwA@"
 "rab~"
 missing
 "Eamila"
 ">a\$oraka"
 "EibaAdat"
 "rab~"
julia> forms1 = crpsnew[!, :form]
827-element Vector{String}:
 "Hamodu"
 "l~ahi"
 ">anzala"
 "Eabodi"
 "yajoEal"
 "EiwajaA"
 "l~i"
 "yun*ira"
 "l~aduno"
 "yuba\$~ira"
 ⋮
 "wa`HidN"
 "kaAna"
 "yarojuwA@"
 "rab~i"
 "lo"
 "yaEomalo"
 "yu\$oriko"
 "EibaAdapi"
 "rab~i"
julia> forms1[.!ismissing.(lemmas)] = lemmas[.!ismissing.(lemmas)]
795-element Vector{Union{Missing, String}}:
 "Hamod"
 "{ll~ah"
 ">anzala"
 "Eabod"
 "jaEala"
 "Eiwaj"
 ">an*ara"
 "l~adun"
 "bu\$~ira"
 "Eamila"
 ⋮
 "<ila`h"
 "wa`Hid"
 "kaAna"
 "yarojuwA@"
 "rab~"
 "Eamila"
 ">a\$oraka"
 "EibaAdat"
 "rab~"
We can also use the Root features instead, which is done by simply replacing lemma.(feats) with root.(feats).
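As an illustration, a root-based variant of the substitution above might look like the following. This is only a sketch: it assumes `feats` and `crpsnew` are as defined earlier, and that `root.(feats)` returns `missing` for tokens without a Root feature (e.g. prefixes), just as `lemma.(feats)` does.

```julia
# Sketch: replace forms with roots instead of lemmas.
roots = root.(feats)
forms2 = copy(crpsnew[!, :form])
idx = .!ismissing.(roots)      # keep the original form where no root exists
forms2[idx] = roots[idx]
```

Roots are coarser than lemmas (several lemmas can share one triliteral root), so this tends to merge more terms and shrink the vocabulary further.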
We now put the new forms back into the corpus:
julia> crpsnew[!, :form] = forms1
827-element Vector{String}:
 "Hamod"
 "{ll~ah"
 ">anzala"
 "Eabod"
 "jaEala"
 "Eiwaj"
 "l~i"
 ">an*ara"
 "l~adun"
 "bu\$~ira"
 ⋮
 "wa`Hid"
 "kaAna"
 "yarojuwA@"
 "rab~"
 "lo"
 "Eamila"
 ">a\$oraka"
 "EibaAdat"
 "rab~"
julia> crpsnew = CorpusData(crpsnew)
Quranic Arabic Corpus (morphology)
(C) 2011 Kais Dukes

827×7 DataFrame
 Row │ chapter  verse  word   part   form       tag     features
     │ Int64    Int64  Int64  Int64  String     String  String                 ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │      18      1      1      2  Hamod      N       STEM|POS:N|LEM:Hamod|R ⋯
   2 │      18      1      2      2  {ll~ah     PN      STEM|POS:PN|LEM:{ll~ah
   3 │      18      1      4      1  >anzala    V       STEM|POS:V|PERF|(IV)|L
   4 │      18      1      6      1  Eabod      N       STEM|POS:N|LEM:Eabod|R
   5 │      18      1      9      1  jaEala     V       STEM|POS:V|IMPF|LEM:ja ⋯
   6 │      18      1     11      1  Eiwaj      N       STEM|POS:N|LEM:Eiwaj|R
   7 │      18      2      2      1  l~i        PRP     PREFIX|l:PRP+
   8 │      18      2      2      2  >an*ara    V       STEM|POS:V|IMPF|(IV)|L
   ⋮ │    ⋮       ⋮      ⋮      ⋮        ⋮        ⋮               ⋮            ⋱
 821 │      18    110     14      1  yarojuwA@  V       STEM|POS:V|IMPF|LEM:ya ⋯
 822 │      18    110     16      1  rab~       N       STEM|POS:N|LEM:rab~|RO
 823 │      18    110     17      2  lo         IMPV    PREFIX|l:IMPV+
 824 │      18    110     17      3  Eamila     V       STEM|POS:V|IMPF|LEM:Ea
 825 │      18    110     21      1  >a$oraka   V       STEM|POS:V|IMPF|(IV)|L ⋯
 826 │      18    110     22      2  EibaAdat   N       STEM|POS:N|LEM:EibaAda
 827 │      18    110     23      1  rab~       N       STEM|POS:N|LEM:rab~|RO
                                                1 column and 812 rows omitted
Tokenization
We want to summarize the Qur'an at the verse level, so the tokens will be the verses of the corpus. We further clean these verses by dediacritizing and normalizing the characters:
julia> lem_vrs = verses(crpsnew)
109-element Vector{String}:
 "Hamod {ll~ah >anzala Eabod jaEala Eiwaj"
 "l~i>an*ara l~adun bu\$~ira Eamila"
 ">an*ara qaAla {t~axa*a {ll~ah"
 "Eilom A^baA' kabura xaraja >afowa`h qaAla"
 "ba`xiE >avar 'aAmana Hadiyv"
 "jaEala >aroD libalawo >aHosan"
 "lajaAEil"
 "Hasiba kahof r~aqiym kaAna 'aAyap"
 ">awaY fitoyap kahof qaAla A^taY l~adun yuhay~i}o >amor"
 "Daraba >u*unN kahof"
 ⋮
 "Hasiba kafara {t~axa*a Eabod duwn >aEotadato jahan~am ka`firuwn"
 "qaAla nab~a>a >axosariyn"
 "Dal~a saEoy Hayaw`p d~unoyaA Hasiba >aHosana"
 "kafara 'aAyap rab~ liqaA^' HabiTa Eamal >aqaAma qiya`map"
 "jazaA^' jahan~am kafara {t~axa*a 'aAyap rasuwl"
 "'aAmana Eamila S~a`liHa`t kaAna jan~ap firodawos"
 "bagaY`"
 "qaAla kaAna baHor kalima`t rab~" ⋯ 18 bytes ⋯ "fida kalima`t rab~ jaA^'a mivol"
 "qaAla ba\$ar mivol >awoHaY`^ <il" ⋯ 39 bytes ⋯ "loEamila >a\$oraka EibaAdat rab~"
julia> vrs = normalize.(dediac.(lem_vrs))
109-element Vector{String}:
 "Hmd {llh >nzl Ebd jEl Ewj"
 "l>n*r ldn b\$r Eml"
 ">n*r qAl {tx* {llh"
 "Elm AbA' kbr xrj >fwh qAl"
 "bxE >vr 'Amn Hdyv"
 "jEl >rD lblw >Hsn"
 "ljAEl"
 "Hsb khf rqym kAn 'Ayp"
 ">wY ftyp khf qAl AtY ldn yhy} >mr"
 "Drb >*n khf"
 ⋮
 "Hsb kfr {tx* Ebd dwn >Etdt jhnm kfrwn"
 "qAl nb> >xsryn"
 "Dl sEy Hywp dnyA Hsb >Hsn"
 "kfr 'Ayp rb lqA' HbT Eml >qAm qymp"
 "jzA' jhnm kfr {tx* 'Ayp rswl"
 "'Amn Eml SlHt kAn jnp frdws"
 "bgY"
 "qAl kAn bHr klmt rb lnfd bHr nfd klmt rb jA' mvl"
 "qAl b\$r mvl >wHY <lh <lh wHd kAn yrjwA@ rb lEml >\$rk EbAdt rb"
Creating a TextAnalysis Corpus
To make use of TextAnalysis.jl's APIs, we need to encode the processed Quranic corpus as a TextAnalysis.jl Corpus. In this case, we create a StringDocument for each of the verses.
julia> crps1 = Corpus(StringDocument.(vrs))
A Corpus with 109 documents:
 * 109 StringDocument's
 * 0 FileDocument's
 * 0 TokenDocument's
 * 0 NGramDocument's

Corpus's lexicon contains 0 tokens
Corpus's index contains 0 tokens
We then update the lexicon and inverse index for efficient indexing of the corpus.
julia> update_lexicon!(crps1)
julia> update_inverse_index!(crps1)
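With both updated, the corpus can be queried directly. The following is a sketch using TextAnalysis.jl's lexicon and inverse-index APIs, assuming `crps1` from above; the term "qAl" is one of the normalized, dediacritized tokens seen earlier.

```julia
# Vocabulary size after updating the lexicon
length(lexicon(crps1))

# Corpus-wide relative frequency of a (transliterated) term
lexical_frequency(crps1, "qAl")

# Indices of the documents (verses) containing the term,
# answered from the inverse index
crps1["qAl"]
```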
Next, we create a Document Term Matrix, whose rows correspond to verses and whose columns correspond to the words describing those verses.
julia> m1 = DocumentTermMatrix(crps1)
A 109 X 369 DocumentTermMatrix
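The underlying counts and vocabulary can be inspected as follows (a sketch, assuming `m1` from above; `dtm` returns the sparse counts matrix and `m1.terms` holds the vocabulary):

```julia
D = dtm(m1)       # 109×369 sparse matrix of word counts
size(D)           # (109, 369): one row per verse, one column per term
m1.terms[1:5]     # the first five vocabulary entries
sum(D)            # total number of token occurrences in the corpus
```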
Latent Dirichlet Allocation
Finally, run LDA as follows:
julia> k = 3 # number of topics
3
julia> iter = 1000 # number of Gibbs sampling iterations
1000
julia> alpha = 0.1 # Dirichlet prior on the per-document topic distributions
0.1
julia> beta = 0.1 # Dirichlet prior on the per-topic word distributions
0.1
julia> ϕ, θ = lda(m1, k, iter, alpha, beta)
(sparse([1, 3, 1, 2, 2, 1, 2, 2, 2, 3 … 2, 3, 1, 3, 2, 1, 1, 2, 2, 3], [1, 2, 3, 3, 4, 5, 6, 7, 8, 8, 9, 9, 9, 10, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 20, 21, 22, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30, 31, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 49, 50, 51, 52, 53, 54, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 89, 90, 91, 92, 93, 93, 94, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 111, 112, 113, 113, 114, 115, 116, 117, 118, 119, 120, 121, 121, 122, 123, 123, 124, 124, 125, 125, 126, 126, 127, 128, 129, 129, 130, 131, 132, 132, 133, 134, 135, 136, 137, 137, 138, 139, 140, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 158, 159, 159, 160, 161, 162, 163, 164, 164, 165, 166, 166, 167, 168, 169, 170, 171, 172, 172, 173, 174, 175, 176, 177, 177, 178, 179, 179, 180, 180, 181, 182, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 201, 202, 203, 204, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 215, 216, 217, 218, 219, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 246, 247, 248, 249, 250, 251, 252, 253, 253, 253, 254, 255, 256, 257, 258, 258, 259, 260, 261, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 271, 271, 272, 273, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 319, 320, 321, 322, 323, 324, 325, 325, 325, 326, 327, 328, 329, 330, 330, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 
354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 368, 369, 369], [0.01929260450160772, 0.019672131147540985, 0.003215434083601286, 0.002702702702702703, 0.005405405405405406, 0.003215434083601286, 0.002702702702702703, 0.002702702702702703, 0.002702702702702703, 0.003278688524590164 … 0.002702702702702703, 0.003278688524590164, 0.003215434083601286, 0.006557377049180328, 0.002702702702702703, 0.003215434083601286, 0.07395498392282958, 0.07837837837837838, 0.024324324324324326, 0.003278688524590164], 3, 369), [0.0 1.0 … 0.0 0.0; 1.0 0.0 … 0.0 0.0; 0.0 0.0 … 1.0 1.0])
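Here ϕ is the k × 369 topic-to-term matrix (each row is, approximately, a probability distribution over the vocabulary) and θ is the k × 109 topic-to-document matrix (each column distributes a verse over the k topics). A quick sanity check, assuming ϕ and θ from above:

```julia
size(ϕ)          # (3, 369): topics × vocabulary
size(θ)          # (3, 109): topics × verses
sum(ϕ, dims=2)   # each row of ϕ should sum to ≈ 1
sum(θ[:, 1])     # each column of θ should sum to ≈ 1
```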
Extract the top ntopics terms for each of the k topics:
julia> ntopics = 10
10
julia> cluster_topics = Matrix{Any}(undef, ntopics, k);
julia> for i = 1:k
           topics_idcs = sortperm(ϕ[i, :], rev=true)
           cluster_topics[:, i] = arabic.(m1.terms[topics_idcs][1:ntopics])
       end
julia> cluster_topics
10×3 Matrix{Any}:
 "ء"    "ٱ"      "قال"
 "ٱ"    "ذ"      "رب"
 "لله"   "قال"    "ء"
 "ذ"    "كان"    "اتى"
 "قال"   "جعل"    "كان"
 "ر"    "ستطاع"  "ل"
 "تخ"   "ئ"      "شى"
 "امن"   "أمر"    "أرض"
 "وجد"   "رب"     "لبث"
 "شا"   "دون"    "علم"
Tabulating this properly gives us the following:
Pkg.add("DataFrames")
Pkg.add("Latexify")
using DataFrames: DataFrame
using Latexify
mdtable(DataFrame(
topic1 = cluster_topics[:, 1],
topic2 = cluster_topics[:, 2],
topic3 = cluster_topics[:, 3]
), latex=false)
| topic1 | topic2 | topic3 |
| --- | --- | --- |
| ء | ٱ | قال |
| ٱ | ذ | رب |
| لله | قال | ء |
| ذ | كان | اتى |
| قال | جعل | كان |
| ر | ستطاع | ل |
| تخ | ئ | شى |
| امن | أمر | أرض |
| وجد | رب | لبث |
| شا | دون | علم |
As you may have noticed, the result is not particularly good, and this is mainly due to the data preprocessing. Readers are encouraged to improve on this for their use case. This section simply demonstrated how TextAnalysis.jl's LDA can be used for topic modeling of the QuranTree.jl corpus.
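One cheap way to explore improvements is to vary the number of topics and compare the resulting top terms. The following sketch reuses `m1` and the helper logic above; the choice of the range 2:5 and the top-5 cutoff are arbitrary illustration values.

```julia
# Sketch: compare the top terms across different topic counts.
for k in 2:5
    ϕk, _ = lda(m1, k, 1000, 0.1, 0.1)
    for i in 1:k
        top = sortperm(ϕk[i, :], rev=true)[1:5]
        println("k=$k, topic $i: ", join(arabic.(m1.terms[top]), " "))
    end
end
```

Varying the priors alpha and beta, or filtering out very short tokens before building the Corpus, are other directions worth trying.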
Finally, the following will extract the dominant topic for each verse:
julia> vrs_topics = []
Any[]
julia> for i = 1:dtm(m1).m
           push!(vrs_topics, sortperm(θ[:, i], rev=true)[1])
       end
julia> vrs_topics
109-element Vector{Any}:
 2
 1
 1
 3
 1
 2
 3
 1
 2
 2
 ⋮
 2
 2
 2
 1
 1
 1
 1
 3
 3
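From this, a quick tally of how many verses fall under each topic (a sketch, assuming `vrs_topics` and `k` from above):

```julia
# Count the verses assigned to each of the k topics
[count(==(i), vrs_topics) for i in 1:k]
```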