Indexing the Corpus
QuranTree.jl offers intuitive indexing for both the Quranic Arabic Corpus and the Tanzil data. Indexing follows this pattern:
# for Quranic Arabic Corpus
crpsdata[<chapters>][<verses>][<words>][<parts>]
# for Tanzil Data
tnzldata[<chapters>][<verses>]
The following are the options supported for each index:
- Chapters:
  - Int64 - crpsdata[1] (extracts Chapter 1)
  - UnitRange - crpsdata[15:24] (extracts Chapters 15 to 24)
  - Array{Int64,1} - crpsdata[[3,9,10]] (extracts Chapters 3, 9 and 10)
  - end (special) - crpsdata[end-3:end] (extracts Chapters 111 to 114)
- Verses:
  - Int64 - crpsdata[1][1] (extracts Verse 1 of Chapter 1)
  - UnitRange - crpsdata[2][15:24] (extracts verses 15 to 24 of Chapter 2)
  - Array{Int64,1} - crpsdata[10][[3,9,10]] (extracts verses 3, 9 and 10 of Chapter 10)
- Words: (not applicable for TanzilData, only CorpusData)
  - Int64 - crpsdata[1][1][1] (extracts Word 1 of Verse 1 of Chapter 1)
  - UnitRange - crpsdata[2][8][1:3] (extracts words 1 to 3 of Verse 8 of Chapter 2)
  - Array{Int64,1} - crpsdata[2][8][[1,3]] (extracts words 1 and 3 of Verse 8 of Chapter 2)
- Parts: (not applicable for TanzilData, only CorpusData)
  - Int64 - crpsdata[1][1][1][1] (extracts Part 1 of Word 1 of Verse 1 of Chapter 1)
  - UnitRange - crpsdata[2][9][1][1:2] (extracts Part 1 to Part 2 of Word 1 of Verse 9 of Chapter 2)
  - Array{Int64,1} - crpsdata[2][9][1][[1,2]] (extracts Part 1 and Part 2 of Word 1 of Verse 9 of Chapter 2)
As an example, the following will extract Verse 9 of Chapter 2 in both TanzilData and CorpusData:
julia> using QuranTree
julia> data = QuranData();
julia> crps, tnzl = load(data);
julia> crpsdata = table(crps);
julia> tnzldata = table(tnzl);
julia> crpsdata[2][9]
Chapter 2 ٱلْبَقَرَة (The Cow)
Verse 9
Table with 18 rows, 5 columns:
Columns:
# colname type
───────────────────
1 word Int64
2 part Int64
3 form String
4 tag String
5 features String
julia> tnzldata[2][9]
Chapter 2 ٱلْبَقَرَة (The Cow)
Verse 9
Table with 1 rows, 1 columns:
form
────────────────────────────────────────────────────────────
"يُخَٰدِعُونَ ٱللَّهَ وَٱلَّذِينَ ءَامَنُوا۟ وَمَا يَخْدَعُونَ إِلَّآ أَنفُسَهُمْ وَمَا يَشْعُرُونَ"
As shown above, the output of indexing is labelled with the chapter name, in both Arabic and English. Again, the rows of crpsdata[2][9] are not shown, because the table is wider than the output pane. PrettyTables.jl is therefore used to view the table:
julia> using PrettyTables
julia> @ptconf vcrop_mode=:middle tf=tf_compact
julia> @pt crpsdata[2][9]
------- ------- ----------- -------- ------------------------------------------
word part form tag ⋯
Int64 Int64 String String ⋯
------- ------- ----------- -------- ------------------------------------------
1 1 yuxa`diEu V STEM|POS:V|IMPF|(III)|LEM:yuxa`diEu|ROO ⋯
1 2 wna PRON SUFFIX ⋯
2 1 {ll~aha PN STEM|POS:PN|LEM:{ll~ah|ROO ⋯
3 1 wa CONJ PREFI ⋯
3 2 {l~a*iyna REL STEM|POS:REL|LEM:{ ⋯
4 1 'aAmanu V STEM|POS:V|PERF|(IV)|LEM:'aAmana|ROO ⋯
4 2 wA@ PRON SUFFIX ⋯
5 1 wa REM PREF ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋱
7 1 <il~aA^ RES STEM|POS:RES|L ⋯
8 1 >anfusa N STEM|POS:N|LEM:nafos|ROOT:n ⋯
8 2 humo PRON SUFFIX ⋯
9 1 wa CIRC PREFI ⋯
9 2 maA NEG STEM|POS:NE ⋯
10 1 ya$oEuru V STEM|POS:V|IMPF|LEM:ya$oEuru|ROO ⋯
10 2 wna PRON SUFFIX ⋯
------- ------- ----------- -------- ------------------------------------------
1 column and 3 rows omitted
Combinations of Indices
Combinations of these indices are also supported. For example, the following will extract Verses 1 and 3 from each of Chapters 111 to 114:
julia> @pt crpsdata[111:114][[1,3]]
--------- ------- ------- ------- ---------- -------- -------------------------
chapter verse word part form tag ⋯
Int64 Int64 Int64 Int64 String String ⋯
--------- ------- ------- ------- ---------- -------- -------------------------
111 1 1 1 tab~ato V STEM|POS:V|PERF|LEM ⋯
111 1 2 1 yadaA^ N STEM|POS:N|LEM: ⋯
111 1 3 1 >abiY N STEM|POS:N|LEM:> ⋯
111 1 4 1 lahabK N STEM|POS:N|LEM:lahab|R ⋯
111 1 5 1 wa CONJ ⋯
111 1 5 2 tab~a V STEM|POS:V|PERF|LEM ⋯
111 3 1 1 sa FUT ⋯
111 3 1 2 yaSolaY` V STEM|POS:V|IMPF|LEM:y ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
114 1 3 1 bi P ⋯
114 1 3 2 rab~i N STEM|POS:N|LEM: ⋯
114 1 4 1 {l DET ⋯
114 1 4 2 n~aAsi N STEM|POS:N|LEM:n~ ⋯
114 3 1 1 <ila`hi N STEM|POS:N|LEM:<il ⋯
114 3 2 1 {l DET ⋯
114 3 2 2 n~aAsi N STEM|POS:N|LEM:n~ ⋯
--------- ------- ------- ------- ---------- -------- -------------------------
1 column and 26 rows omitted
julia> @pt tnzldata[111:114][[1,3]]
--------- ------- --------------------------------------------
chapter verse form
Int64 Int64 String
--------- ------- --------------------------------------------
111 1 بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ تَبَّتْ يَدَآ أَبِى لَهَبٍ وَتَبَّ
111 3 سَيَصْلَىٰ نَارًا ذَاتَ لَهَبٍ
112 1 بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ هُوَ ٱللَّهُ أَحَدٌ
112 3 لَمْ يَلِدْ وَلَمْ يُولَدْ
113 1 بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ أَعُوذُ بِرَبِّ ٱلْفَلَقِ
113 3 وَمِن شَرِّ غَاسِقٍ إِذَا وَقَبَ
114 1 بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ أَعُوذُ بِرَبِّ ٱلنَّاسِ
114 3 إِلَٰهِ ٱلنَّاسِ
--------- ------- --------------------------------------------
The special index end also works in combinations. For example, crpsdata[111:114][[1,3]] is the same as crpsdata[end-3:end][[1,3]], and tnzldata[111:114][[1,3]] is equivalent to tnzldata[end-3:end][[1,3]].
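The equivalence between 111:114 and end-3:end follows from how Julia treats end inside square brackets: it is lowered to a call to lastindex on the object being indexed. The sketch below illustrates this with a hypothetical ToyChapters type that stands in for a table of 114 chapters. The type and its fields are assumptions made for illustration only and are not QuranTree.jl's own data structures.

```julia
# Hypothetical stand-in for a table of 114 chapters (not QuranTree.jl's type).
struct ToyChapters
    ids::Vector{Int}
end

# Selecting one chapter or several chapters returns a new ToyChapters.
Base.getindex(c::ToyChapters, i::Int) = ToyChapters([c.ids[i]])
Base.getindex(c::ToyChapters, r::AbstractVector{Int}) = ToyChapters(c.ids[r])

# Julia rewrites `end` inside c[...] as lastindex(c).
Base.lastindex(c::ToyChapters) = length(c.ids)

chapters = ToyChapters(collect(1:114))

with_end   = chapters[end-3:end]   # lowered to chapters[lastindex(chapters)-3:lastindex(chapters)]
with_range = chapters[111:114]

println(with_end.ids == with_range.ids)
```

Because lastindex returns 114 here, end-3:end evaluates to 111:114 before getindex is ever called, which is why both spellings select the same chapters.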
As another example, the following will extract Part 1 of Words 1 to 3 of the above CorpusData output:
julia> @pt crpsdata[111:114][[1,3]][1:3][1]
--------- ------- ------- ------- --------- -------- --------------------------
chapter verse word part form tag ⋯
Int64 Int64 Int64 Int64 String String ⋯
--------- ------- ------- ------- --------- -------- --------------------------
111 1 1 1 tab~ato V STEM|POS:V ⋯
111 1 2 1 yadaA^ N STEM|P ⋯
111 1 3 1 >abiY N STEM|PO ⋯
111 3 1 1 sa FUT ⋯
111 3 2 1 naArFA N STEM|POS:N|L ⋯
111 3 3 1 *aAta N ⋯
112 1 1 1 qulo V STEM|POS:V ⋯
112 1 2 1 huwa PRON ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
113 3 2 1 $ar~i N STEM|PO ⋯
113 3 3 1 gaAsiqK N STEM|POS:N|ACT|PCPL|LEM ⋯
114 1 1 1 qulo V STEM|POS:V ⋯
114 1 2 1 >aEuw*u V STEM|POS ⋯
114 1 3 1 bi P ⋯
114 3 1 1 <ila`hi N STEM|POS: ⋯
114 3 2 1 {l DET ⋯
--------- ------- ------- ------- --------- -------- --------------------------
1 column and 8 rows omitted
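One way to read the chained selections above is as successive filters, where each pair of brackets narrows the next level (chapter, then verse, then word, then part). The following is a minimal sketch of that reading using only Base Julia. ToyTable, toytable and the tiny made-up corpus are hypothetical names introduced for illustration; this is not how QuranTree.jl is implemented.

```julia
# Toy model of chained indexing; NOT QuranTree.jl's implementation.
const LEVELS = (:chapter, :verse, :word, :part)
const Row = NamedTuple{LEVELS,NTuple{4,Int}}

struct ToyTable
    rows::Vector{Row}
    depth::Int   # number of index levels already consumed
end

# Build a tiny fake corpus: 3 chapters x 2 verses x 2 words x 2 parts.
function toytable()
    rows = Row[(chapter=c, verse=v, word=w, part=p)
               for c in 1:3 for v in 1:2 for w in 1:2 for p in 1:2]
    return ToyTable(rows, 0)
end

# One method covers Int, UnitRange and Vector{Int}, since `in` works for all three.
function Base.getindex(t::ToyTable, idx::Union{Int,UnitRange{Int},Vector{Int}})
    t.depth < length(LEVELS) || throw(BoundsError(t, idx))
    level = LEVELS[t.depth + 1]                  # level filtered by this bracket
    kept = filter(r -> r[level] in idx, t.rows)  # keep only the matching rows
    return ToyTable(kept, t.depth + 1)
end

# Chapters 2 to 3, then Verse 1 of each, then Word 1 of each selected verse.
sel = toytable()[2:3][[1]][1]
for r in sel.rows
    println(r)
end
```

In this model the second bracket is applied to every chapter kept by the first, which mirrors how crpsdata[111:114][[1,3]] returns Verses 1 and 3 from each of the four chapters.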