Morphological Features
QuranTree.jl provides types for every morphological feature and part of speech in the Quranic Arabic Corpus.
Parsing
The features of each token are stored as a raw String. To parse that string into morphological features, use parse(Features, x), where x is the raw String input. For example, the following parses the 2nd part of the 3rd word of the 1st verse of Chapter 1:
julia> using QuranTree
julia> using JuliaDB
julia> crps, tnzl = load(QuranData());
julia> crpsdata = table(crps);
julia> tnzldata = table(tnzl);
julia> crpsdata[1][1][3][2]
Chapter 1 ٱلْفَاتِحَة (The Opening)
Verse 1
Table with 1 rows, 5 columns:
word part form tag features
─────────────────────────────────────────────────────────────────────────────
3 2 "r~aHoma`ni" "ADJ" "STEM|POS:ADJ|LEM:r~aHoma`n|ROOT:rHm|MS|GEN"
julia> token = select(crpsdata[1][1][3][2].data, :features)
1-element WeakRefStrings.StringArray{String,1}:
"STEM|POS:ADJ|LEM:r~aHoma`n|ROOT:rHm|MS|GEN"
julia> mfeat = parse(Features, token[1])
Stem(:ADJ, ADJ, AbstractFeature[Lemma("r~aHoma`n"), Root("rHm"), M, S, GEN])
julia> typeof(mfeat)
Stem
JuliaDB.jl must be installed for the code above to run:
using Pkg
Pkg.add("JuliaDB")
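To make the raw encoding concrete, the sketch below splits the feature string from the example above using only Base Julia. It is not how parse(Features, x) is implemented; the names fields, kind, keyed and flags are illustrative, and the division into KEY:VALUE fields and bare flags is an assumption read off the sample string.

```julia
# Raw feature string copied from the example above.
raw = "STEM|POS:ADJ|LEM:r~aHoma`n|ROOT:rHm|MS|GEN"

# Fields are separated by '|'; the first field is the token kind.
fields = split(raw, '|')
kind = fields[1]

# Assumption: fields of the form KEY:VALUE are keyed entries,
# and any other field is a bare flag (for example MS or GEN).
keyed = Dict{String,String}()
flags = String[]
for f in fields[2:end]
    if occursin(':', f)
        k, v = split(f, ':'; limit=2)
        keyed[k] = v
    else
        push!(flags, f)
    end
end
```

Here kind is "STEM", keyed maps POS, LEM and ROOT to their values, and flags holds the bare fields MS and GEN. The parsed Stem object shown above carries the same information in typed form.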
Extracting Detailed Description
To see a detailed description of the features, use the @desc macro:
julia> @desc mfeat
Stem
────
Adjective:
├ data: ADJ
├ desc: Adjective
└ ar_label: صفة
Lemma:
└ data: r~aHoma`n
Root:
└ data: rHm
Masculine:
├ data: M
├ desc: Masculine
└ ar_label: الجنس
Singular:
├ data: S
├ desc: Singular
└ ar_label: العدد
Genetive:
├ data: GEN
├ desc: Genetive case
└ ar_label: مجرور
Julia's dump function shows how to access the properties of the Stem object:
julia> dump(mfeat)
Stem
  data: Symbol ADJ
  pos: Adjective
    data: Symbol ADJ
    desc: String "Adjective"
    ar_label: String "صفة"
  feats: Array{AbstractFeature}((5,))
    1: Lemma
      data: String "r~aHoma`n"
    2: Root
      data: String "rHm"
    3: Masculine
      data: Symbol M
      desc: String "Masculine"
      ar_label: String "الجنس"
    4: Singular
      data: Symbol S
      desc: String "Singular"
      ar_label: String "العدد"
    5: Genetive
      data: Symbol GEN
      desc: String "Genetive case"
      ar_label: String "مجرور"
julia> # access other feats of the token
mfeat.feats
5-element Array{AbstractFeature,1}:
Lemma("r~aHoma`n")
Root("rHm")
M
S
GEN
Checking Parts of Speech
isfeature(token, pos) checks whether the token's parsed feature is a particular part of speech (pos). For example, the following checks whether mfeat above is, among other things, Masculine and Singular:
julia> isfeature(mfeat, Masculine)
true
julia> isfeature(mfeat, Feminine)
false
julia> isfeature(mfeat, Singular)
true
julia> isfeature(mfeat, Adjective) && isfeature(mfeat, Genetive)
true
Another example checks whether the token has Root and Lemma features:
julia> isfeature(mfeat, Root) && isfeature(mfeat, Lemma)
true
isfeature(...) is useful with JuliaDB.jl's filter function, as an alternative to regular expressions. For example:
using Pkg
Pkg.add("PrettyTables")
using PrettyTables
@ptconf vcrop_mode=:middle tf=tf_compact
tbl = filter(t -> isfeature(parse(Features, t.features), ActiveParticle), crpsdata.data)
@pt select(tbl, Not(:word, :part, :tag))
Lemma, Root and Special
The root, lemma and special functions extract the Root, Lemma and Special morphological features, respectively.
julia> root(mfeat)
"rHm"
julia> lemma(mfeat)
"r~aHoma`n"
julia> arabic(root(mfeat))
"رحم"
julia> arabic(lemma(mfeat))
"رَّحْمَٰن"
The following example shows a token with a Special feature:
julia> token2 = select(crpsdata.data, :features)[53]
"STEM|POS:NEG|LEM:laA|SP:<in~"
julia> mfeat2 = parse(Features, token2)
Stem(:NEG, NEG, AbstractFeature[Lemma("laA"), Special("<in~")])
julia> special(mfeat2)
"<in~"
julia> arabic(special(mfeat2))
"إِنّ"
Implied Verb Features
Some features of Quranic Arabic verbs are implied. For example, the Voice feature of a Verb defaults to Active voice, the Mood feature defaults to Indicative mood, and the Verb form feature defaults to First form.
julia> token3 = select(crpsdata.data, :features)[27]
"STEM|POS:V|IMPF|(X)|LEM:{sotaEiynu|ROOT:Ewn|1P"
token3 is a Verb with no Mood and Voice features stated. Parsing it automatically adds the default values of these features, as shown below:
julia> mfeat3 = parse(Features, token3)
Stem(:V, V, AbstractFeature[Lemma("{sotaEiynu"), Root("Ewn"), IMPF, X, 1, P, IND, ACT])
julia> @desc mfeat3
Stem
────
Verb:
├ data: V
├ desc: Verb
└ ar_label: فعل
Lemma:
└ data: {sotaEiynu
Root:
└ data: Ewn
Imperfect:
├ data: IMPF
├ desc: Imperfect verb
└ ar_label: فعل مضارع
VerbFormX:
├ data: X
├ desc: Tenth verb form
└ ar_label: فعل
FirstPerson:
├ data: 1
├ desc: First person
└ ar_label: الاسناد
Plural:
├ data: P
├ desc: Plural
└ ar_label: العدد
Indicative:
├ data: IND
├ desc: Indicative mood (default)
└ ar_label: مرفوع
Active:
├ data: ACT
├ desc: Active voice (default)
└ ar_label: مبني للمعلوم
Another example where the Voice feature of the Verb is implied:
julia> token4 = select(crpsdata.data, :features)[27]
"STEM|POS:V|IMPF|(X)|LEM:{sotaEiynu|ROOT:Ewn|1P"
julia> mfeat4 = parse(Features, token4)
Stem(:V, V, AbstractFeature[Lemma("{sotaEiynu"), Root("Ewn"), IMPF, X, 1, P, IND, ACT])
julia> @desc mfeat4
Stem
────
Verb:
├ data: V
├ desc: Verb
└ ar_label: فعل
Lemma:
└ data: {sotaEiynu
Root:
└ data: Ewn
Imperfect:
├ data: IMPF
├ desc: Imperfect verb
└ ar_label: فعل مضارع
VerbFormX:
├ data: X
├ desc: Tenth verb form
└ ar_label: فعل
FirstPerson:
├ data: 1
├ desc: First person
└ ar_label: الاسناد
Plural:
├ data: P
├ desc: Plural
└ ar_label: العدد
Indicative:
├ data: IND
├ desc: Indicative mood (default)
└ ar_label: مرفوع
Active:
├ data: ACT
├ desc: Active voice (default)
└ ar_label: مبني للمعلوم
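The defaulting rule described in this section can be sketched in Base Julia as follows. The helper implied_defaults is hypothetical and is not part of QuranTree.jl; it assumes the input is a verb token, and its assumptions about the raw encoding (a parenthesised field for a stated verb form, a MOOD: field for a stated mood, a bare PASS field for passive voice) are marked in the comments and should be checked against the corpus.

```julia
# Hypothetical helper, not part of QuranTree.jl: list the default tags that a
# raw verb feature string leaves unstated, following the rules described above.
function implied_defaults(raw::AbstractString)
    fields = split(raw, '|')
    defaults = String[]
    # Assumption: a stated verb form is a parenthesised field such as "(X)".
    any(f -> startswith(f, "(") && endswith(f, ")"), fields) || push!(defaults, "I")
    # Assumption: a stated mood is a field starting with "MOOD:".
    any(f -> startswith(f, "MOOD:"), fields) || push!(defaults, "IND")
    # Assumption: passive voice is marked by a bare "PASS" field.
    "PASS" in fields || push!(defaults, "ACT")
    return defaults
end

implied = implied_defaults("STEM|POS:V|IMPF|(X)|LEM:{sotaEiynu|ROOT:Ewn|1P")
```

For the token above, implied is ["IND", "ACT"], which matches the two features appended to the parsed Stem. No first-form default is added because the raw string already states (X).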
POS Abstract Types
The table below lists every part of speech with its corresponding type. Each type has a parent type, which is its supertype in the type hierarchy. This is useful for grouping. For example, instead of using || (or) to check for all tokens that are FirstPerson, SecondPerson, or ThirdPerson, the parent type AbstractPerson can be used.
julia> # without using parent type
function allpersons(t)
is1st = isfeature(parse(Features, t.features), FirstPerson)
is2nd = isfeature(parse(Features, t.features), SecondPerson)
is3rd = isfeature(parse(Features, t.features), ThirdPerson)
return is1st || is2nd || is3rd
end
allpersons (generic function with 1 method)
julia> tbl1 = filter(allpersons, crpsdata.data);
julia> select(tbl1, (:form, :features))
Table with 44092 rows, 2 columns:
form features
───────────────────────────────────────────────────────────────
"<iy~aAka" "STEM|POS:PRON|LEM:<iy~aA|2MS"
"naEobudu" "STEM|POS:V|IMPF|LEM:Eabada|ROOT:Ebd|1P"
"<iy~aAka" "STEM|POS:PRON|LEM:<iy~aA|2MS"
"nasotaEiynu" "STEM|POS:V|IMPF|(X)|LEM:{sotaEiynu|ROOT:Ewn|1P"
"{hodi" "STEM|POS:V|IMPV|LEM:hadaY|ROOT:hdy|2MS"
"naA" "SUFFIX|PRON:1P"
">anoEamo" "STEM|POS:V|PERF|(IV)|LEM:>anoEama|ROOT:nEm|2MS"
"ta" "SUFFIX|PRON:2MS"
"himo" "SUFFIX|PRON:3MP"
⋮
"qulo" "STEM|POS:V|IMPV|LEM:qaAla|ROOT:qwl|2MS"
">aEuw*u" "STEM|POS:V|IMPF|LEM:Eu*o|ROOT:Ew*|1S"
"xalaqa" "STEM|POS:V|PERF|LEM:xalaqa|ROOT:xlq|3MS"
"waqaba" "STEM|POS:V|PERF|LEM:waqaba|ROOT:wqb|3MS"
"Hasada" "STEM|POS:V|PERF|LEM:Hasada|ROOT:Hsd|3MS"
"qulo" "STEM|POS:V|IMPV|LEM:qaAla|ROOT:qwl|2MS"
">aEuw*u" "STEM|POS:V|IMPF|LEM:Eu*o|ROOT:Ew*|1S"
"yuwasowisu" "STEM|POS:V|IMPF|LEM:wasowasa|ROOT:wsws|3MS"
julia> # using parent type
tbl2 = filter(t -> isfeature(parse(Features, t.features), AbstractPerson), crpsdata.data);
julia> select(tbl2, (:form, :features))
Table with 44092 rows, 2 columns:
form features
───────────────────────────────────────────────────────────────
"<iy~aAka" "STEM|POS:PRON|LEM:<iy~aA|2MS"
"naEobudu" "STEM|POS:V|IMPF|LEM:Eabada|ROOT:Ebd|1P"
"<iy~aAka" "STEM|POS:PRON|LEM:<iy~aA|2MS"
"nasotaEiynu" "STEM|POS:V|IMPF|(X)|LEM:{sotaEiynu|ROOT:Ewn|1P"
"{hodi" "STEM|POS:V|IMPV|LEM:hadaY|ROOT:hdy|2MS"
"naA" "SUFFIX|PRON:1P"
">anoEamo" "STEM|POS:V|PERF|(IV)|LEM:>anoEama|ROOT:nEm|2MS"
"ta" "SUFFIX|PRON:2MS"
"himo" "SUFFIX|PRON:3MP"
⋮
"qulo" "STEM|POS:V|IMPV|LEM:qaAla|ROOT:qwl|2MS"
">aEuw*u" "STEM|POS:V|IMPF|LEM:Eu*o|ROOT:Ew*|1S"
"xalaqa" "STEM|POS:V|PERF|LEM:xalaqa|ROOT:xlq|3MS"
"waqaba" "STEM|POS:V|PERF|LEM:waqaba|ROOT:wqb|3MS"
"Hasada" "STEM|POS:V|PERF|LEM:Hasada|ROOT:Hsd|3MS"
"qulo" "STEM|POS:V|IMPV|LEM:qaAla|ROOT:qwl|2MS"
">aEuw*u" "STEM|POS:V|IMPF|LEM:Eu*o|ROOT:Ew*|1S"
"yuwasowisu" "STEM|POS:V|IMPF|LEM:wasowasa|ROOT:wsws|3MS"
julia> sum(select(tbl1, :features) .!== select(tbl2, :features))
0
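The reason the parent-type check returns the same rows can be illustrated with a miniature type hierarchy in Base Julia. All names prefixed with Demo, and the helper hasfeature, are hypothetical stand-ins; QuranTree.jl's actual definitions and the internals of isfeature may differ.

```julia
# Hypothetical stand-ins; QuranTree.jl's real definitions may differ.
abstract type DemoFeature end
abstract type DemoPerson <: DemoFeature end
abstract type DemoNumber <: DemoFeature end

struct DemoFirstPerson  <: DemoPerson end
struct DemoSecondPerson <: DemoPerson end
struct DemoThirdPerson  <: DemoPerson end
struct DemoPlural       <: DemoNumber end

# True if any feature in `feats` is an instance of `T` (concrete or abstract).
hasfeature(feats, T::Type) = any(f -> f isa T, feats)

# Features of a first-person plural token.
feats = DemoFeature[DemoFirstPerson(), DemoPlural()]

# Checking each concrete type in turn...
by_union = hasfeature(feats, DemoFirstPerson) ||
           hasfeature(feats, DemoSecondPerson) ||
           hasfeature(feats, DemoThirdPerson)

# ...gives the same answer as one check against the abstract parent.
by_parent = hasfeature(feats, DemoPerson)
```

Because every concrete person type is a subtype of the abstract parent, a single isa test against the parent covers all of them, and adding a new subtype later would not require touching the check.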
Part of Speech Types
Type | Parent Type | Tag | Description | Arabic Name |
Noun | AbstractNoun | Symbol("N") | Noun | اسم |
ProperNoun | AbstractNoun | Symbol("PN") | Proper noun | اسم علم |
Adjective | AbstractDerivedNominal | Symbol("ADJ") | Adjective | صفة |
ImperativeVerbalNoun | AbstractDerivedNominal | Symbol("IMPN") | Imperative verbal noun | اسم فعل أمر |
Personal | AbstractPronoun | Symbol("PRON") | Personal pronoun | ضمير |
Demonstrative | AbstractPronoun | Symbol("DEM") | Demonstrative pronoun | اسم اشارة |
Relative | AbstractPronoun | Symbol("REL") | Relative pronoun | اسم موصول |
Time | AbstractAdverb | Symbol("T") | Time adverb | ظرف زمان |
Location | AbstractAdverb | Symbol("LOC") | Location adverb | ظرف مكان |
Preposition | AbstractPreposition | Symbol("P") | Preposition | حرف جر |
EmphaticLam | AbstractPrefix | Symbol("EMPH") | Emphatic lam prefix | لام التوكيد |
ImperativeLam | AbstractPrefix | Symbol("IMPV") | Imperative lam prefix | لام الامر |
PurposeLam | AbstractPrefix | Symbol("PRP") | Purpose lam prefix | لام التعليل |
EmphaticNun | AbstractPrefix | Symbol("+n:EMPH") | Emphatic nun | نون التوكيد |
Coordinating | AbstractConjunction | Symbol("CONJ") | Coordinating conjunction | حرف عطف |
Subordinating | AbstractConjunction | Symbol("SUB") | Subordinating particle | حرف مصدري |
Accusative | AbstractParticle | Symbol("ACC") | Accusative particle | حرف نصب |
Amendment | AbstractParticle | Symbol("AMD") | Amendment particle | حرف استدراك |
Answer | AbstractParticle | Symbol("ANS") | Answer particle | حرف جواب |
Aversion | AbstractParticle | Symbol("AVR") | Aversion particle | حرف ردع |
Cause | AbstractParticle | Symbol("CAUS") | Particle of cause | حرف سببية |
Certainty | AbstractParticle | Symbol("CERT") | Particle of certainty | حرف تحقيق |
Circumstantial | AbstractParticle | Symbol("CIRC") | Circumstantial particle | حرف حال |
Comitative | AbstractParticle | Symbol("COM") | Comitative particle | واو المعية |
Conditional | AbstractParticle | Symbol("COND") | Conditional particle | حرف شرط |
Equalization | AbstractParticle | Symbol("EQ") | Equalization particle | حرف تسوية |
Exhortation | AbstractParticle | Symbol("EXH") | Exhortation particle | حرف تحضيض |
Explanation | AbstractParticle | Symbol("EXL") | Explanation particle | حرف تفصيل |
Exceptive | AbstractParticle | Symbol("EXP") | Exceptive particle | أداة استثناء |
Future | AbstractParticle | Symbol("FUT") | Future particle | حرف استقبال |
Inceptive | AbstractParticle | Symbol("INC") | Inceptive particle | حرف ابتداء |
Interpretation | AbstractParticle | Symbol("INT") | Particle of interpretation | حرف تفسير |
Interogative | AbstractParticle | Symbol("INTG") | Interrogative particle | حرف استفهام |
Negative | AbstractParticle | Symbol("NEG") | Negative particle | حرف نفي |
Preventive | AbstractParticle | Symbol("PREV") | Preventive particle | حرف كاف |
Prohibition | AbstractParticle | Symbol("PRO") | Prohibition particle | حرف نهي |
Resumption | AbstractParticle | Symbol("REM") | Resumption particle | |
Restriction | AbstractParticle | Symbol("RES") | Restriction particle | أداة حصر |
Retraction | AbstractParticle | Symbol("RET") | Retraction particle | حرف اضراب |
Result | AbstractParticle | Symbol("RSLT") | Result particle | حرف واقع في جواب الشرط |
Supplemental | AbstractParticle | Symbol("SUP") | Supplemental particle | حرف زائد |
Surprise | AbstractParticle | Symbol("SUR") | Surprise particle | حرف فجاءة |
Vocative | AbstractParticle | Symbol("VOC") | Vocative particle | حرف نداء |
DisconnectedLetters | AbstractDisLetters | Symbol("INL") | Quranic initials | حروف مقطعة |
FirstPerson | AbstractPerson | Symbol("1") | First person | الاسناد |
SecondPerson | AbstractPerson | Symbol("2") | Second person | الاسناد |
ThirdPerson | AbstractPerson | Symbol("3") | Third person | الاسناد |
Masculine | AbstractGender | Symbol("M") | Masculine | الجنس |
Feminine | AbstractGender | Symbol("F") | Feminine | الجنس |
Singular | AbstractNumber | Symbol("S") | Singular | العدد |
Dual | AbstractNumber | Symbol("D") | Dual | العدد |
Plural | AbstractNumber | Symbol("P") | Plural | العدد |
Verb | AbstractPartOfSpeech | Symbol("V") | Verb | فعل |
Perfect | AbstractAspect | Symbol("PERF") | Perfect verb | فعل ماض |
Imperfect | AbstractAspect | Symbol("IMPF") | Imperfect verb | فعل مضارع |
Imperative | AbstractAspect | Symbol("IMPV") | Imperative verb | فعل أمر |
Indicative | AbstractMood | Symbol("IND") | Indicative mood (default) | مرفوع |
Subjunctive | AbstractMood | Symbol("SUBJ") | Subjunctive mood | منصوب |
Jussive | AbstractMood | Symbol("JUS") | Jussive mood | مجزوم |
Active | AbstractVoice | Symbol("ACT") | Active voice (default) | مبني للمعلوم |
Passive | AbstractVoice | Symbol("PASS") | Passive voice | مبني للمجهول |
VerbFormI | AbstractForm | Symbol("I") | First verb form (default) | فعل |
VerbFormII | AbstractForm | Symbol("II") | Second verb form | فعل |
VerbFormIII | AbstractForm | Symbol("III") | Third verb form | فعل |
VerbFormIV | AbstractForm | Symbol("IV") | Fourth verb form | فعل |
VerbFormV | AbstractForm | Symbol("V") | Fifth verb form | فعل |
VerbFormVI | AbstractForm | Symbol("VI") | Sixth verb form | فعل |
VerbFormVII | AbstractForm | Symbol("VII") | Seventh verb form | فعل |
VerbFormVIII | AbstractForm | Symbol("VIII") | Eighth verb form | فعل |
VerbFormIX | AbstractForm | Symbol("IX") | Ninth verb form | فعل |
VerbFormX | AbstractForm | Symbol("X") | Tenth verb form | فعل |
VerbFormXI | AbstractForm | Symbol("XI") | Eleventh verb form | فعل |
VerbFormXII | AbstractForm | Symbol("XII") | Twelfth verb form | فعل |
ActiveParticle | AbstractDerivedNoun | Symbol("ACT PCPL") | Active participle | اسم فاعل |
PassiveParticle | AbstractDerivedNoun | Symbol("PASS PCPL") | Passive participle | اسم مفعول |
VerbalNoun | AbstractDerivedNoun | Symbol("VN") | Verbal noun | مصدر |
Definite | AbstractState | Symbol("DEF") | Definite state | معرفة |
Indefinite | AbstractState | Symbol("INDEF") | Indefinite state | نكرة |
Nominative | AbstractCase | Symbol("NOM") | Nominative case | مرفوع |
Genetive | AbstractCase | Symbol("GEN") | Genetive case | مجرور |