Indexing the Corpus

QuranTree.jl offers intuitive indexing for both the Quranic Arabic Corpus and the Tanzil data. Indexing follows this pattern:

# for Quranic Arabic Corpus
crpsdata[<chapters>][<verses>][<words>][<parts>]

# for Tanzil Data
tnzldata[<chapters>][<verses>]

The following are the options supported for each index:

  • Chapters:
    • Int64 - crpsdata[1] (extracts Chapter 1)
    • UnitRange - crpsdata[15:24] (extracts Chapters 15 to 24)
    • Array{Int64,1} - crpsdata[[3,9,10]] (extracts Chapters 3, 9 and 10)
    • end (special) - crpsdata[end-3:end] (extracts Chapters 111 to 114)
  • Verses:
    • Int64 - crpsdata[1][1] (extracts Verse 1 of Chapter 1)
    • UnitRange - crpsdata[2][15:24] (extracts Verses 15 to 24 of Chapter 2)
    • Array{Int64,1} - crpsdata[10][[3,9,10]] (extracts Verses 3, 9 and 10 of Chapter 10)
  • Words: (not applicable for TanzilData, only CorpusData)
    • Int64 - crpsdata[1][1][1] (extracts Word 1 of Verse 1 of Chapter 1)
    • UnitRange - crpsdata[2][8][1:3] (extracts Words 1 to 3 of Verse 8 of Chapter 2)
    • Array{Int64,1} - crpsdata[2][8][[1,3]] (extracts Words 1 and 3 of Verse 8 of Chapter 2)
  • Parts: (not applicable for TanzilData, only CorpusData)
    • Int64 - crpsdata[1][1][1][1] (extracts Part 1 of Word 1 of Verse 1 of Chapter 1)
    • UnitRange - crpsdata[2][9][1][1:2] (extracts Parts 1 to 2 of Word 1 of Verse 9 of Chapter 2)
    • Array{Int64,1} - crpsdata[2][9][1][[1,2]] (extracts Parts 1 and 2 of Word 1 of Verse 9 of Chapter 2)

As an example, the following will extract Verse 9 of Chapter 2 in both TanzilData and CorpusData:

julia> using QuranTree

julia> data = QuranData();

julia> crps, tnzl = load(data);

julia> crpsdata = table(crps);

julia> tnzldata = table(tnzl);

julia> crpsdata[2][9]
Chapter 2 ٱلْبَقَرَة (The Cow)
Verse 9

Table with 18 rows, 5 columns:
Columns:
#  colname   type
───────────────────
1  word      Int64
2  part      Int64
3  form      String
4  tag       String
5  features  String

julia> tnzldata[2][9]
Chapter 2 ٱلْبَقَرَة (The Cow)
Verse 9

Table with 1 rows, 1 columns:
form
────────────────────────────────────────────────────────────
"يُخَٰدِعُونَ ٱللَّهَ وَٱلَّذِينَ ءَامَنُوا۟ وَمَا يَخْدَعُونَ إِلَّآ أَنفُسَهُمْ وَمَا يَشْعُرُونَ"

As shown above, the output of indexing includes a label with the chapter name in both Arabic and English. Again, the rows of crpsdata[2][9] are not shown because the table is wider than the output pane, so PrettyTables.jl is used to view it:

julia> using PrettyTables

julia> @ptconf vcrop_mode=:middle tf=tf_compact

julia> @pt crpsdata[2][9]
 ------- ------- ----------- -------- ------------------------------------------
   word    part        form      tag                                           ⋯
  Int64   Int64      String   String                                           ⋯
 ------- ------- ----------- -------- ------------------------------------------
      1       1   yuxa`diEu        V   STEM|POS:V|IMPF|(III)|LEM:yuxa`diEu|ROO ⋯
      1       2         wna     PRON                                    SUFFIX ⋯
      2       1     {ll~aha       PN                STEM|POS:PN|LEM:{ll~ah|ROO ⋯
      3       1          wa     CONJ                                     PREFI ⋯
      3       2   {l~a*iyna      REL                        STEM|POS:REL|LEM:{ ⋯
      4       1     'aAmanu        V      STEM|POS:V|PERF|(IV)|LEM:'aAmana|ROO ⋯
      4       2         wA@     PRON                                    SUFFIX ⋯
      5       1          wa      REM                                      PREF ⋯
    ⋮       ⋮         ⋮         ⋮                             ⋮                ⋱
      7       1     <il~aA^      RES                            STEM|POS:RES|L ⋯
      8       1     >anfusa        N               STEM|POS:N|LEM:nafos|ROOT:n ⋯
      8       2        humo     PRON                                    SUFFIX ⋯
      9       1          wa     CIRC                                     PREFI ⋯
      9       2         maA      NEG                               STEM|POS:NE ⋯
     10       1    ya$oEuru        V          STEM|POS:V|IMPF|LEM:ya$oEuru|ROO ⋯
     10       2         wna     PRON                                    SUFFIX ⋯
 ------- ------- ----------- -------- ------------------------------------------
                                                     1 column and 3 rows omitted

Combinations of Indices

Combinations of these indices are also supported. For example, the following extracts Verses 1 and 3 from each of Chapters 111 to 114:

julia> @pt crpsdata[111:114][[1,3]]
 --------- ------- ------- ------- ---------- -------- -------------------------
  chapter   verse    word    part       form      tag                          ⋯
    Int64   Int64   Int64   Int64     String   String                          ⋯
 --------- ------- ------- ------- ---------- -------- -------------------------
      111       1       1       1    tab~ato        V      STEM|POS:V|PERF|LEM ⋯
      111       1       2       1     yadaA^        N          STEM|POS:N|LEM: ⋯
      111       1       3       1      >abiY        N         STEM|POS:N|LEM:> ⋯
      111       1       4       1     lahabK        N   STEM|POS:N|LEM:lahab|R ⋯
      111       1       5       1         wa     CONJ                          ⋯
      111       1       5       2      tab~a        V      STEM|POS:V|PERF|LEM ⋯
      111       3       1       1         sa      FUT                          ⋯
      111       3       1       2   yaSolaY`        V    STEM|POS:V|IMPF|LEM:y ⋯
     ⋮        ⋮       ⋮       ⋮        ⋮         ⋮                          ⋮  ⋱
      114       1       3       1         bi        P                          ⋯
      114       1       3       2      rab~i        N          STEM|POS:N|LEM: ⋯
      114       1       4       1         {l      DET                          ⋯
      114       1       4       2     n~aAsi        N        STEM|POS:N|LEM:n~ ⋯
      114       3       1       1    <ila`hi        N       STEM|POS:N|LEM:<il ⋯
      114       3       2       1         {l      DET                          ⋯
      114       3       2       2     n~aAsi        N        STEM|POS:N|LEM:n~ ⋯
 --------- ------- ------- ------- ---------- -------- -------------------------
                                                    1 column and 26 rows omitted

julia> @pt tnzldata[111:114][[1,3]]
 --------- ------- --------------------------------------------
  chapter   verse                                         form
    Int64   Int64                                       String
 --------- ------- --------------------------------------------
      111       1   بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ تَبَّتْ يَدَآ أَبِى لَهَبٍ وَتَبَّ
      111       3                           سَيَصْلَىٰ نَارًا ذَاتَ لَهَبٍ
      112       1        بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ هُوَ ٱللَّهُ أَحَدٌ
      112       3                              لَمْ يَلِدْ وَلَمْ يُولَدْ
      113       1     بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ أَعُوذُ بِرَبِّ ٱلْفَلَقِ
      113       3                          وَمِن شَرِّ غَاسِقٍ إِذَا وَقَبَ
      114       1     بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ أَعُوذُ بِرَبِّ ٱلنَّاسِ
      114       3                                    إِلَٰهِ ٱلنَّاسِ
 --------- ------- --------------------------------------------
Note

The special index end also works in combinations: crpsdata[111:114][[1,3]] is the same as crpsdata[end-3:end][[1,3]], and tnzldata[111:114][[1,3]] is equivalent to tnzldata[end-3:end][[1,3]].

As another example, the following extracts Part 1 of Words 1 to 3 from the CorpusData output above:

julia> @pt crpsdata[111:114][[1,3]][1:3][1]
 --------- ------- ------- ------- --------- -------- --------------------------
  chapter   verse    word    part      form      tag                           ⋯
    Int64   Int64   Int64   Int64    String   String                           ⋯
 --------- ------- ------- ------- --------- -------- --------------------------
      111       1       1       1   tab~ato        V                STEM|POS:V ⋯
      111       1       2       1    yadaA^        N                    STEM|P ⋯
      111       1       3       1     >abiY        N                   STEM|PO ⋯
      111       3       1       1        sa      FUT                           ⋯
      111       3       2       1    naArFA        N              STEM|POS:N|L ⋯
      111       3       3       1     *aAta        N                           ⋯
      112       1       1       1      qulo        V                STEM|POS:V ⋯
      112       1       2       1      huwa     PRON                           ⋯
     ⋮        ⋮       ⋮       ⋮        ⋮        ⋮                              ⋱
      113       3       2       1     $ar~i        N                   STEM|PO ⋯
      113       3       3       1   gaAsiqK        N   STEM|POS:N|ACT|PCPL|LEM ⋯
      114       1       1       1      qulo        V                STEM|POS:V ⋯
      114       1       2       1   >aEuw*u        V                  STEM|POS ⋯
      114       1       3       1        bi        P                           ⋯
      114       3       1       1   <ila`hi        N                 STEM|POS: ⋯
      114       3       2       1        {l      DET                           ⋯
 --------- ------- ------- ------- --------- -------- --------------------------
                                                     1 column and 8 rows omitted
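
To make the semantics of chained indexing concrete, the sketch below models it on a small synthetic table. It is not how QuranTree.jl is implemented: the names Level, Row, COLUMNS and wanted are hypothetical, and the data is made up (3 chapters, 4 verses each, 2 words per verse). It only illustrates the behaviour seen in the outputs above, where each bracket filters on the next column (chapter, then verse, then word) and end resolves to the last available value at that level.

```julia
# Toy model of chained indexing. This is NOT QuranTree.jl's implementation;
# `Level`, `Row`, `COLUMNS` and `wanted` are hypothetical names.
const Row = NamedTuple{(:chapter, :verse, :word), NTuple{3, Int}}
const COLUMNS = (:chapter, :verse, :word)

struct Level
    rows::Vector{Row}
    depth::Int          # number of brackets applied so far
end

# Normalise each supported index type to something that works with `in`.
wanted(i::Integer) = (i,)
wanted(r::UnitRange{<:Integer}) = r
wanted(v::Vector{<:Integer}) = v

# Each bracket filters on the next column and moves one level deeper.
function Base.getindex(l::Level, idx)
    col = COLUMNS[l.depth + 1]
    keep = wanted(idx)
    return Level(filter(r -> r[col] in keep, l.rows), l.depth + 1)
end

# `end` resolves to the largest value present in the next column.
Base.lastindex(l::Level) = maximum(r -> r[COLUMNS[l.depth + 1]], l.rows)

# Synthetic data: 3 chapters, 4 verses each, 2 words per verse.
rows = [(chapter = c, verse = v, word = w) for c in 1:3 for v in 1:4 for w in 1:2]
demo = Level(rows, 0)

a = demo[2:3][[1, 3]]        # verses 1 and 3 of chapters 2 to 3
b = demo[end-1:end][[1, 3]]  # same selection written with `end`

println(length(a.rows))      # 8 (2 chapters x 2 verses x 2 words)
println(a.rows == b.rows)    # true
```

In this model an Int64, a UnitRange and an Array{Int64,1} all reduce to a membership test on one column, which is why the three index forms can be mixed freely across levels.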