Transliteration

QuranTree.jl uses Buckwalter as the default transliteration, based on the Quranic Arabic Corpus encoding. Transliteration is performed with the encode function. For example, the following transliterates the first verse of Chapter 1:

julia> using QuranTree

julia> crps, tnzl = load(QuranData());

julia> crpsdata = table(crps);

julia> tnzldata = table(tnzl);

julia> vrs = verses(tnzldata[1][1])
1-element Array{String,1}:
 "بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ"

julia> encode(vrs[1])
"bisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi"

The verses function used above extracts the corresponding verses from Qur'an data of type AbstractQuran.
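Conceptually, this kind of transliteration is a character-by-character substitution. The following is a minimal sketch of that idea, assuming a hand-written table with only the five entries needed for the first word of the verse; toy_bw and toy_encode are hypothetical names and are not part of QuranTree.jl, whose encode function may work differently internally.

```julia
# Partial, hand-written table for illustration only.
# It is NOT the table used by QuranTree.jl.
toy_bw = Dict(
    '\u0628' => 'b',  # Ba
    '\u0633' => 's',  # Seen
    '\u0645' => 'm',  # Meem
    '\u0650' => 'i',  # Kasra
    '\u0652' => 'o',  # Sukun
)

# Hypothetical helper: substitute each character, passing through
# anything that is not in the table (for example, spaces).
toy_encode(s::AbstractString) = map(c -> get(toy_bw, c, c), s)

# First word of verse 1:1, written with escapes: Ba, Kasra, Seen, Sukun, Meem, Kasra
first_word = "\u0628\u0650\u0633\u0652\u0645\u0650"
println(toy_encode(first_word))  # prints "bisomi", the first word of the output above
```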

Tips

By default, verses returns only the verse form from the table, but it can also return the corresponding verse numbers instead of the form, for example:

verses(tnzldata, number=true, start_end=true)
verses(tnzldata, number=true, start_end=false)

Tips

To extract the words of the corpus, use the function words instead.

The verses function always returns an array, so multiple verses can be encoded at once using Julia's dot (.) broadcasting syntax. For example, the following transliterates all verses of Chapter 114:

julia> vrs = verses(tnzldata[114])
6-element WeakRefStrings.StringArray{String,1}:
 "بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ أَعُوذُ بِرَبِّ ٱلنَّاسِ"
 "مَلِكِ ٱلنَّاسِ"
 "إِلَٰهِ ٱلنَّاسِ"
 "مِن شَرِّ ٱلْوَسْوَاسِ ٱلْخَنَّاسِ"
 "ٱلَّذِى يُوَسْوِسُ فِى صُدُورِ ٱلنَّاسِ"
 "مِنَ ٱلْجِنَّةِ وَٱلنَّاسِ"

julia> encode.(vrs)
6-element Array{String,1}:
 "bisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi qulo >aEuw*u birab~i {ln~aAsi"
 "maliki {ln~aAsi"
 "<ila`hi {ln~aAsi"
 "min \$ar~i {lowasowaAsi {loxan~aAsi"
 "{l~a*iY yuwasowisu fiY Suduwri {ln~aAsi"
 "mina {lojin~api wa{ln~aAsi"

Decoding

To decode transliterated text back to its Arabic form, use the arabic function. For example, the following decodes the transliterated verses of Chapter 114 above back to Arabic:

julia> arabic.(encode.(vrs))
6-element Array{String,1}:
 "بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ قُلْ أَعُوذُ بِرَبِّ ٱلنَّاسِ"
 "مَلِكِ ٱلنَّاسِ"
 "إِلَٰهِ ٱلنَّاسِ"
 "مِن شَرِّ ٱلْوَسْوَاسِ ٱلْخَنَّاسِ"
 "ٱلَّذِى يُوَسْوِسُ فِى صُدُورِ ٱلنَّاسِ"
 "مِنَ ٱلْجِنَّةِ وَٱلنَّاسِ"

Alternatively, start from the CorpusData, whose verses are already in transliterated form:

julia> vrs = verses(crpsdata[114])
6-element Array{String,1}:
 "qulo >aEuw*u birab~i {ln~aAsi"
 "maliki {ln~aAsi"
 "<ila`hi {ln~aAsi"
 "min \$ar~i {lowasowaAsi {loxan~aAsi"
 "{l~a*iY yuwasowisu fiY Suduwri {ln~aAsi"
 "mina {lojin~api wa{ln~aAsi"

julia> avrs = arabic.(vrs)
6-element Array{String,1}:
 "قُلْ أَعُوذُ بِرَبِّ ٱلنَّاسِ"
 "مَلِكِ ٱلنَّاسِ"
 "إِلَٰهِ ٱلنَّاسِ"
 "مِن شَرِّ ٱلْوَسْوَاسِ ٱلْخَنَّاسِ"
 "ٱلَّذِى يُوَسْوِسُ فِى صُدُورِ ٱلنَّاسِ"
 "مِنَ ٱلْجِنَّةِ وَٱلنَّاسِ"

Tips

Dot (.) broadcasting is only needed for arrays. For a single String input (not an array of String), call arabic(...) without the dot, for example:

arabic(vrs[1])
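
The difference between the two call forms can be tried without loading any data. The following is a minimal sketch that uses a hypothetical stand-in function, shout, in place of arabic or encode; it only illustrates Julia's dot syntax and says nothing about QuranTree.jl itself.

```julia
# `shout` is a hypothetical stand-in for any String -> String function
# such as `arabic` or `encode`; it is not part of QuranTree.jl.
shout(s::AbstractString) = uppercase(s)

single = shout("maliki")             # plain call on a single String
many = shout.(["maliki", "mina"])    # dot call applies it to each element

println(single)
println(many)
```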

Custom Transliteration

Creating a custom transliteration requires only an input encoding in the form of a dictionary (Dict). For example, QuranTree.jl's Buckwalter encoding is provided by the constant BW_ENCODING, shown below:

julia> BW_ENCODING
Dict{Symbol,Symbol} with 61 entries:
  :ذ          => :*
  :ء          => Symbol("'")
  Symbol("ۜ") => :(:)
  Symbol("َ") => :a
  Symbol("ٰ") => Symbol("`")
  :ي          => :y
  :ن          => :n
  :ب          => :b
  :ص          => :S
  :ا          => :A
  :ى          => :Y
  Symbol("۫") => :+
  :ؤ          => :&
  Symbol("۟") => Symbol("@")
  Symbol("ْ") => :o
  :س          => :s
  :ۦ          => :.
  :و          => :w
  Symbol("ً") => :F
  ⋮           => ⋮

Suppose we want to create a new transliteration by simply reversing the order of the dictionary's values while keeping its keys. This is done as follows:

julia> old_keys = collect(keys(BW_ENCODING));

julia> new_vals = reverse(collect(values(BW_ENCODING)));

julia> my_encoder = Dict(old_keys .=> new_vals)
Dict{Symbol,Symbol} with 61 entries:
  :ذ          => :q
  :ء          => Symbol("(")
  Symbol("ۜ") => :f
  Symbol("َ") => Symbol("[")
  Symbol("ٰ") => :r
  :ي          => :k
  :ن          => Symbol("]")
  :ى          => :_
  :ب          => Symbol("\"")
  :ص          => :i
  :ا          => :m
  Symbol("۫") => :>
  :ؤ          => :H
  Symbol("۟") => :T
  Symbol("ْ") => :g
  :س          => :%
  :ۦ          => :$
  :و          => :-
  Symbol("ً") => :N
  ⋮           => ⋮

julia> @transliterator my_encoder "MyEncoder"

The @transliterator macro updates the transliteration. It takes two inputs: the dictionary (my_encoder) and the name of the encoding ("MyEncoder"). With this new encoding, the avrs array from above gets a new transliteration:

julia> new_vrs = encode.(avrs);

julia> new_vrs
6-element Array{String,1}:
 "*,pg +[!,-q, \"S`[\"jS zp]j[m%S"
 "A[pSyS zp]j[m%S"
 "ZSp[rKS zp]j[m%S"
 "AS] .[`jS zpg-[%g-[m%S zpg}[]j[m%S"
 "zpj[qS_ k,-[%g-S%, :S_ i,^,-`S zp]j[m%S"
 "AS][ zpg~S]j[lS -[zp]j[m%S"

To confirm this new transliteration, decoding it back to Arabic should reproduce the original verses:

julia> arabic.(new_vrs)
6-element Array{String,1}:
 "قُلْ أَعُوذُ بِرَبِّ ٱلنَّاسِ"
 "مَلِكِ ٱلنَّاسِ"
 "إِلَٰهِ ٱلنَّاسِ"
 "مِن شَرِّ ٱلْوَسْوَاسِ ٱلْخَنَّاسِ"
 "ٱلَّذِى يُوَسْوِسُ فِى صُدُورِ ٱلنَّاسِ"
 "مِنَ ٱلْجِنَّةِ وَٱلنَّاسِ"

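Decoding succeeds because reversing the order of the values only permutes them, so the new dictionary is still one-to-one. The following is a minimal sketch of that idea on a hypothetical four-entry table; toy_encoding, apply_table, and invert are illustrative names and are not part of QuranTree.jl, whose internals may differ.

```julia
# Toy mapping for illustration only; it is NOT the real BW_ENCODING table.
toy_encoding = Dict(
    Symbol("\u0628") => :b,  # Ba
    Symbol("\u0633") => :s,  # Seen
    Symbol("\u0645") => :m,  # Meem
    Symbol("\u0646") => :n,  # Noon
)

# Same construction as above: keep the keys, reverse the order of the values.
old_keys = collect(keys(toy_encoding))
new_vals = reverse(collect(values(toy_encoding)))
toy_encoder = Dict(old_keys .=> new_vals)

# Hypothetical helpers: map each character through a table, leaving
# characters that are not in the table untouched.
apply_table(table, s) = join(string(get(table, Symbol(c), Symbol(c))) for c in s)
invert(table) = Dict(v => k for (k, v) in table)

word = "\u0628\u0633\u0645"  # Ba, Seen, Meem without diacritics
encoded = apply_table(toy_encoder, word)
decoded = apply_table(invert(toy_encoder), encoded)

println(encoded, " -> ", decoded)
println(decoded == word)
```

The exact letters in encoded depend on the iteration order of the Dict, but the round trip holds for any order because the values stay distinct.
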
To reset the transliteration, simply run the following:

julia> @transliterator :default

This falls back to the Buckwalter transliteration, as shown below:

julia> bw_vrs = encode.(avrs);

julia> bw_vrs
6-element Array{String,1}:
 "qulo >aEuw*u birab~i {ln~aAsi"
 "maliki {ln~aAsi"
 "<ila`hi {ln~aAsi"
 "min \$ar~i {lowasowaAsi {loxan~aAsi"
 "{l~a*iY yuwasowisu fiY Suduwri {ln~aAsi"
 "mina {lojin~api wa{ln~aAsi"

julia> arabic.(bw_vrs)
6-element Array{String,1}:
 "قُلْ أَعُوذُ بِرَبِّ ٱلنَّاسِ"
 "مَلِكِ ٱلنَّاسِ"
 "إِلَٰهِ ٱلنَّاسِ"
 "مِن شَرِّ ٱلْوَسْوَاسِ ٱلْخَنَّاسِ"
 "ٱلَّذِى يُوَسْوِسُ فِى صُدُورِ ٱلنَّاسِ"
 "مِنَ ٱلْجِنَّةِ وَٱلنَّاسِ"

Simple Encoding

QuranTree.jl also supports the Simple Encoding. For example, the following applies the Simple Encoding to the first verse of Chapter 1:

julia> vrs = verses(tnzldata[1][1])
1-element Array{String,1}:
 "بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ"

julia> encode(SimpleEncoder, vrs[1])
"Ba+Kasra | Seen+Sukun | Meem+Kasra | <space> | HamzatWasl | Lam | Lam+Shadda+Fatha | Ha+Kasra | <space> | HamzatWasl | Lam | Ra+Shadda+Fatha | HHa+Sukun | Meem+Fatha | AlifKhanjareeya | Noon+Kasra | <space> | HamzatWasl | Lam | Ra+Shadda+Fatha | HHa+Kasra | Ya | Meem+Kasra"

Tips

To encode verses 1 to 4 of Chapter 114, use the broadcasting operator:

vrs = verses(tnzldata[114][1:4])
encode.(SimpleEncoder, vrs)
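
Judging from the Simple Encoding output shown earlier, each letter is written by name, its diacritics are attached with +, and letters are separated by |. The following is a minimal sketch that reproduces this layout for the first word of verse 1:1 only. The tables letters and diacritics and the function simple_sketch are hypothetical; the names are copied from the output above, and the actual SimpleEncoder in QuranTree.jl may be implemented differently.

```julia
# Hypothetical, partial name tables covering only the first word of the verse.
letters = Dict('\u0628' => "Ba", '\u0633' => "Seen", '\u0645' => "Meem")
diacritics = Dict('\u0650' => "Kasra", '\u0652' => "Sukun")

# Sketch: each letter starts a new token; each diacritic is appended
# to the preceding token with "+". Tokens are joined with " | ".
function simple_sketch(word::AbstractString)
    tokens = String[]
    for c in word
        if haskey(diacritics, c) && !isempty(tokens)
            tokens[end] *= "+" * diacritics[c]
        else
            # Unknown characters are kept as-is in this sketch.
            push!(tokens, get(letters, c, string(c)))
        end
    end
    return join(tokens, " | ")
end

# First word of verse 1:1: Ba, Kasra, Seen, Sukun, Meem, Kasra
word = "\u0628\u0650\u0633\u0652\u0645\u0650"
println(simple_sketch(word))  # prints "Ba+Kasra | Seen+Sukun | Meem+Kasra"
```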