Arabic Disambiguation

In this section, we apply a model trained by Maximum Likelihood Estimation (MLE) to disambiguate Arabic text that has no diacritics. As always, load the data as follows:

julia> using QuranTree

julia> crps, tnzl = load(QuranData());

julia> crpsdata = table(crps);

julia> tnzldata = table(tnzl);

For this task, we are going to use the last verse of Chapter 1.

julia> avrs1 = verses(tnzldata[1][7])[1]
"صِرَٰطَ ٱلَّذِينَ أَنْعَمْتَ عَلَيْهِمْ غَيْرِ ٱلْمَغْضُوبِ عَلَيْهِمْ وَلَا ٱلضَّآلِّينَ"

The input must contain no diacritics, so remove them with dediac:

julia> avrs1 = avrs1 |> dediac
"صرٰط ٱلذين أنعمت عليهم غير ٱلمغضوب عليهم ولا ٱلضالين"
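To make the effect of removing diacritics concrete, here is a minimal sketch that works on a plain Julia string. It is not QuranTree's dediac implementation: the helper name strip_marks and the choice of code points (U+064B to U+0653) are assumptions made for illustration, and the library may treat other marks differently.

```julia
# Combining marks removed in this sketch (an assumed set, not QuranTree's):
# tanween, short vowels, shadda and sukun (U+064B to U+0652),
# plus maddah above (U+0653).
const MARKS = '\u064B':'\u0653'

# Keep every character that is not one of the marks above.
strip_marks(s::AbstractString) = filter(c -> !(c in MARKS), s)

# The word عَلَيْهِمْ from the verse, written with escapes so that the
# code points are unambiguous.
word = "\u0639\u064E\u0644\u064E\u064A\u0652\u0647\u0650\u0645\u0652"

println(strip_marks(word))
```

Characters outside the chosen range, such as the superscript alef (U+0670) that is still present in the dediac output above, pass through this sketch unchanged.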

Inferring
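As background, the idea behind an MLE disambiguator can be sketched as a frequency lookup: for each undiacritized word, choose the form that occurred most often with it in training data. The sketch below is a simplification under stated assumptions. The training pairs are toy placeholders in Latin transliteration, the name mle_choice is hypothetical, and the real model works on full morphological analyses rather than bare strings.

```julia
# Toy training data: (undiacritized word, diacritized form) pairs.
# These are placeholders, not drawn from any real corpus.
training = [
    ("ktb", "kataba"),
    ("ktb", "kutub"),
    ("ktb", "kataba"),
    ("Elm", "Eilm"),
]

# Count how often each diacritized form occurs for each word.
counts = Dict{String, Dict{String, Int}}()
for (word, diac) in training
    forms = get!(counts, word, Dict{String, Int}())
    forms[diac] = get(forms, diac, 0) + 1
end

# The maximum likelihood choice is the form with the highest relative
# frequency for the word, which is the form with the highest count.
function mle_choice(counts, word)
    haskey(counts, word) || return word  # unseen word: return it unchanged
    return last(findmax(counts[word]))   # findmax on a Dict gives (value, key)
end

println(mle_choice(counts, "ktb"))
```

Returning an unseen word unchanged is only one possible fallback, chosen here to keep the sketch short.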

To infer the diacritics, call the MLE disambiguator of CAMeL Tools (the Python package camel_tools) from Julia through PyCall. The import succeeds only if camel_tools is installed in the Python distribution that PyCall is configured to use; otherwise it fails with ModuleNotFoundError("No module named 'camel_tools'").

julia> using Pkg

julia> Pkg.add("PyCall")

julia> using PyCall

julia> camel_disambig = pyimport("camel_tools.disambig.mle");

julia> mled = camel_disambig.MLEDisambiguator.pretrained();

julia> disambig = mled.disambiguate(split(avrs1));
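If the import of camel_tools.disambig.mle fails because the module is missing, one way to install it is through Conda.jl. This is a sketch that assumes PyCall is using the Julia-specific Python distribution managed by Conda.jl, and that the package is published on PyPI under the name camel-tools.

```julia
using Pkg
Pkg.add("Conda")

using Conda

# Allow pip to install into the Conda.jl root environment.
Conda.pip_interop(true)

# Install CAMeL Tools from PyPI (assumed package name: camel-tools).
Conda.pip("install", "camel-tools")
```

To use a different Python instead, set ENV["PYTHON"] to the path of that Python executable, run Pkg.build("PyCall"), and restart Julia. The pretrained model data may also have to be downloaded separately; the CAMeL Tools installation instructions describe the exact steps.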

Extracting Diacritized Output

Finally, join the diacritized form from the first analysis listed for each word into a single string:

julia> join([d[2][1][2]["diac"] for d in disambig], " ")
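The indexing d[2][1][2]["diac"] is easier to follow on a small stand-in. The sketch below assumes, based on that indexing, that each entry is a (word, analyses) tuple, that analyses is a list of (score, analysis) tuples, and that each analysis is a dictionary with a "diac" key. The mock values and the helper name top_diac are hypothetical and are not output of the disambiguator.

```julia
# Hypothetical stand-in for the converted Python result, one entry per word.
mock = [
    ("w1", [(0.9, Dict("diac" => "W1a")), (0.1, Dict("diac" => "W1b"))]),
    ("w2", [(1.0, Dict("diac" => "W2a"))]),
]

# entry[2]       -> the list of scored analyses for the word
# entry[2][1]    -> the first (score, analysis) tuple in that list
# entry[2][1][2] -> the analysis dictionary of that tuple
top_diac(entry) = entry[2][1][2]["diac"]

result = join([top_diac(d) for d in mock], " ")
println(result)
```

Note that Julia indexes from 1, so [2] here selects the second field of each tuple and [1] the first analysis in the list.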