CitableParserBuilder.AbbreviatedUrn
— Type. Short form of a Cite2Urn containing only the collection and object identifiers.
CitableParserBuilder.AnalysesCex
— Type. Value for CexTrait.
CitableParserBuilder.Analysis
— Type. Citable analysis of a string value.
An Analysis has seven members: string values for the orthographic token and the morphological token; four abbreviated URNs, one each for the lexeme, form, rule and stem; and a sequence ID for the morphological token.
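The following self-contained Julia sketch illustrates that seven-member layout. The struct name AnalysisSketch and its field names are hypothetical and are not the package's definition. The four URN members are plain "collection.objectid" strings here rather than LexemeUrn, FormUrn, RuleUrn and StemUrn values, and the String type of the sequence ID is likewise assumed.

```julia
# Hypothetical stand-in for Analysis: field names and types are assumptions,
# not CitableParserBuilder's actual definition.
struct AnalysisSketch
    token::String      # orthographic token
    mtoken::String     # morphological token
    lexeme::String     # abbreviated lexeme URN
    form::String       # abbreviated form URN
    rule::String       # abbreviated rule URN
    stem::String       # abbreviated stem URN
    mtokenid::String   # sequence ID for the morphological token (type assumed)
end

a = AnalysisSketch("score", "score", "lexicon.score", "forms.noun_sg",
                   "rules.sg", "stems.score1", "1")
println(fieldcount(AnalysisSketch))  # 7
println(a.lexeme)                     # lexicon.score
```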
CitableParserBuilder.AnalysisCex
— Type. Defines CexTrait for the Analysis type.
CitableParserBuilder.AnalyzedToken
— Type. Morphological analyses for a token identified by a CTS URN.
CitableParserBuilder.AnalyzedTokenCollection
— Type. A collection of analyzed tokens.
CitableParserBuilder.CexAnalyzedToken
— Type. Value for CexTrait for AnalyzedToken.
CitableParserBuilder.CitableAnalyses
— Type. Value for CitableTrait.
CitableParserBuilder.CitableByAnalysis
— Type. Value for CitableTrait.
CitableParserBuilder.DFParser
— Type. A parser that parses tokens by looking them up in a precomputed data frame of all recognized forms.
CitableParserBuilder.DFParser
— Method. Create a dataframe-backed parser from a string-backed parser.
DFParser(sp)
CitableParserBuilder.DFParserCex
— Type. Serializable trait for DFParser.
CitableParserBuilder.FormUrn
— Type. Abbreviated URN for a morphological form.
CitableParserBuilder.GettysburgParser
— Type. POS tagger keyed to the text of the Gettysburg Address. Its data field is a dictionary mapping each token to its form (POS tag).
CitableParserBuilder.LexemeUrn
— Type. Abbreviated URN for a lexeme.
CitableParserBuilder.Rule
— Type. Supertype of all concrete Rule structures.
CitableParserBuilder.RuleUrn
— Type. Abbreviated URN for a rule.
CitableParserBuilder.Stem
— Type. Supertype of all concrete Stem structures.
CitableParserBuilder.StemUrn
— Type. Abbreviated URN for a morphological stem.
CitableParserBuilder.StringParser
— Type. A parser that parses tokens by looking them up in a precomputed dictionary of all recognized forms.
CitableParserBuilder.StringParser
— Method. Construct a string-backed parser from a dataframe-backed parser.
StringParser(dfp; delim)
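The following sketch makes the lookup idea concrete with a minimal string-backed parser. Each recognized form is one delimited line, and parsing a token selects the lines whose first column matches it. LineParserSketch, parse_sketch, the | delimiter and the column order are all assumptions for illustration. The package's parsers work with Analysis objects rather than raw lines.

```julia
# Hypothetical string-backed parser: one delimited line per recognized form.
# Assumed column order: token | lexeme | form | stem | rule
struct LineParserSketch
    lines::Vector{String}
    delim::String
end

# Return every line whose first column equals the token (possibly none).
function parse_sketch(s::AbstractString, p::LineParserSketch)
    [line for line in p.lines if first(split(line, p.delim)) == s]
end

p = LineParserSketch([
    "score|lexicon.score|forms.noun_sg|stems.score1|rules.sg",
    "score|lexicon.score_v|forms.verb_pres|stems.score2|rules.pres",
    "and|lexicon.and|forms.conj|stems.and1|rules.indecl",
], "|")

println(length(parse_sketch("score", p)))   # 2
println(isempty(parse_sketch("seven", p)))  # true
```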
Base.:==
— Method. Override Base.== for Analysis.
==(a1, a2)
Base.:==
— Method. Override Base.== for AnalyzedToken.
==(atoken1, atoken2)
Base.:==
— Method. Override Base.== for AnalyzedTokenCollection.
==(at1, at2)
Base.:==
— Method. Override Base.== for AbbreviatedUrn.
==(au1, au2)
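Julia's default == for a struct falls back to ===. That operator compares Vector fields by identity, so two separately built values with equal contents are not equal. This is a likely reason to override Base.== for types that hold vectors. Below is a minimal sketch with a hypothetical TokenSketch type, which is not the package's AnalyzedToken. Julia's convention is to define Base.hash alongside == so that equal values hash equally.

```julia
# Hypothetical token type holding a vector of analysis labels.
struct TokenSketch
    passage::String
    analyses::Vector{String}
end

t1 = TokenSketch("urn:cts:demo:speech.v1:1.1", ["a1", "a2"])
t2 = TokenSketch("urn:cts:demo:speech.v1:1.1", ["a1", "a2"])

# The two Vector fields are distinct objects, so the structs are not egal.
println(t1 === t2)  # false

# Compare field contents instead of identity.
Base.:(==)(x::TokenSketch, y::TokenSketch) =
    x.passage == y.passage && x.analyses == y.analyses

# Keep hash consistent with == so the type behaves correctly in Dicts and Sets.
Base.hash(x::TokenSketch, h::UInt) = hash(x.analyses, hash(x.passage, h))

println(t1 == t2)               # true
println(length(Set([t1, t2])))  # 1
```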
Base.eltype
— Method. Implement Base.eltype for AnalyzedTokenCollection.
eltype(atc)
Base.iterate
— Method. Implement iteration with state for AnalyzedTokenCollection.
iterate(atc, state)
Base.iterate
— Method. Implement iteration for AnalyzedTokenCollection.
iterate(atc)
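These two iterate methods follow Julia's iteration protocol. iterate(c) returns the first (item, state) pair, iterate(c, state) returns the next pair, and nothing signals the end. Below is a minimal sketch on a hypothetical TokenListSketch wrapper, which is not the package's AnalyzedTokenCollection. A default state argument yields both methods from one definition.

```julia
# Hypothetical collection wrapping a vector of token strings.
struct TokenListSketch
    items::Vector{String}
end

# Defines both iterate(c) and iterate(c, state); state is the next index.
Base.iterate(c::TokenListSketch, state::Int = 1) =
    state > length(c.items) ? nothing : (c.items[state], state + 1)

# Declaring eltype and length lets collect preallocate a Vector{String}.
Base.eltype(::Type{TokenListSketch}) = String
Base.length(c::TokenListSketch) = length(c.items)

tl = TokenListSketch(["four", "score", "and"])
for t in tl
    println(t)
end
println(collect(tl))
```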
Base.objectid
— Method. Default implementation of the function that finds the object identifier of an AbbreviatedUrn.
objectid(au)
Base.show
— Method. Override Base.show for AnalyzedTokenCollection.
show(io, atc)
Base.show
— Method. Override Base.show for AnalyzedToken.
show(io, atoken)
Base.show
— Method. Override Base.show for AbbreviatedUrn.
show(io, au)
CitableBase.cex
— Method. Implementation of the cex function for an Analysis.
cex(a; delim, registry)
CitableBase.cex
— Method. Format an AnalyzedTokenCollection as a delimited-text string.
cex(atc; delimiter, registry)
Required function for the Citable abstraction.
CitableBase.cex
— Method. Serialize an AnalyzedToken as delimited text (required for the Citable interface).
cex(at; delimiter, registry)
Uses abbreviated URNs. These can be expanded to full CITE2 URNs when read back with a URN registry, or the delimited function can be used with a URN registry to write full CITE2 URNs.
CitableBase.cex
— Method. Compose a delimited-text string for a DFParser.
cex(dfp; delimiter)
CitableBase.cextrait
— Method. Define CexTrait value for AnalyzedTokenCollection.
cextrait(_)
CitableBase.cextrait
— Method. Define CexTrait value for CitableToken.
cextrait(_)
CitableBase.cextrait
— Method. Get the serializable trait for the DFParser type.
cextrait(_)
CitableBase.citablecollectiontrait
— Method. Define CitableTrait value for AnalyzedTokenCollection.
citablecollectiontrait(_)
CitableBase.citabletrait
— Method. Define CitableTrait value for CitableToken.
citabletrait(_)
CitableBase.fromcex
— Method. Parse a delimited-text string into an AnalyzedTokenCollection.
fromcex(trait, s, ; delimiter, configuration, strict)
CitableBase.fromcex
— Method. Implementation of the fromcex function for an Analysis.
fromcex(
traitvalue,
cexsrc,
T;
delimiter,
configuration,
strict
)
CitableBase.fromcex
— Method. Parse a one-line delimited-text representation into an AnalyzedToken, using abbreviated URNs for identifiers. Note that for a single CEX line, the AnalyzedToken will have a single Analysis in its vector of analyses.
fromcex(
traitvalue,
cexsrc,
T;
delimiter,
configuration,
strict
)
CitableBase.fromcex
— Method. Create a DFParser from a delimited-text source.
fromcex(trait, cexsrc, T; delimiter, configuration, strict)
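The cex and fromcex pairs above are meant to round-trip: cex writes one delimited line, and fromcex reads it back. The following minimal sketch illustrates that contract on a hypothetical three-column RowSketch type. The column order and the | default delimiter are assumptions. The real methods also take the trait, type, configuration and strict arguments shown in their signatures.

```julia
# Hypothetical record; not the package's Analysis type.
struct RowSketch
    token::String
    lexeme::String
    form::String
end

# Serialize to one delimited line.
cex_sketch(r::RowSketch; delimiter = "|") = join([r.token, r.lexeme, r.form], delimiter)

# Parse one delimited line back into a RowSketch.
function fromcex_sketch(s::AbstractString; delimiter = "|")
    parts = split(s, delimiter)
    length(parts) == 3 ||
        throw(ArgumentError("expected 3 columns, found $(length(parts))"))
    RowSketch(String.(parts)...)
end

r = RowSketch("score", "lexicon.score", "forms.noun_sg")
line = cex_sketch(r)
println(line)  # score|lexicon.score|forms.noun_sg
r2 = fromcex_sketch(line)
println(r2.lexeme == r.lexeme)  # true
```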
CitableBase.label
— Method. Label for analyses.
label(atc)
Required function for the Citable abstraction.
CitableBase.label
— Method. Label for AnalyzedToken (required for the Citable interface).
label(at)
CitableBase.urn
— Method. Unique identifier for AnalyzedToken (required for the Citable interface).
urn(at)
CitableBase.urntype
— Method. Type of URN identifying analyses in an AnalyzedTokenCollection.
urntype(analyses)
Required function for the Citable abstraction.
CitableBase.urntype
— Method. Identify the URN type for an AnalyzedToken as CtsUrn.
urntype(at)
Required function for the Citable abstraction.
CitableCorpus.text
— Method. Get URN for analysis.
text(at)
CitableParserBuilder.abbreviate
— Method. Constructs an AbbreviatedUrn string from a Cite2Urn.
abbreviate(urn)
Example:
julia> abbreviate(Cite2Urn("urn:cite2:kanones:lsj.v1:n123"))
"lsj.n123"
Example: a pipeline abbreviating a Cite2Urn and forming a LexemeUrn from the abbreviated string value.
julia> Cite2Urn("urn:cite2:kanones:lsj.v1:n123") |> abbreviate |> LexemeUrn
LexemeUrn("lsj", "n123")
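As the first example shows, abbreviation keeps the collection identifier without its version, plus the object identifier. The following string-only sketch approximates that mapping. It assumes a five-part urn:cite2:namespace:collection.version:object URN. abbreviate_sketch is hypothetical; it skips validation and does not handle other URN forms such as ranges.

```julia
# Hypothetical: abbreviate a CITE2 URN string to "collection.objectid".
# Assumes exactly five colon-separated parts and drops the collection version.
function abbreviate_sketch(urn::AbstractString)
    parts = split(urn, ":")
    length(parts) == 5 ||
        throw(ArgumentError("expected 5 colon-separated parts: $urn"))
    collection = first(split(parts[4], "."))  # "lsj.v1" -> "lsj"
    string(collection, ".", parts[5])
end

println(abbreviate_sketch("urn:cite2:kanones:lsj.v1:n123"))  # lsj.n123
```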
CitableParserBuilder.collection
— Method. Default implementation of the function that finds the collection value of an AbbreviatedUrn.
collection(au)
CitableParserBuilder.coverage
— Method. Compute the percentage of words in a list of words analyzed by a parser.
coverage(vocablist, p; data)
CitableParserBuilder.coverage
— Method. Compute the percentage of tokens in a corpus analyzed by a parser.
coverage(tokencorpus, p; data)
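Coverage is the share of items for which the parser returns at least one analysis. The following minimal sketch computes it from a plain vector of token strings and a hypothetical dictionary mapping each token to its analyses, with an empty vector when parsing fails. coverage_sketch and its inputs are assumptions. In this sketch, every occurrence of a repeated token counts.

```julia
# Hypothetical parser output: token => analyses (empty when parsing fails).
function coverage_sketch(tokens::Vector{String}, analyses::Dict{String,Vector{String}})
    isempty(tokens) && return 0.0
    analyzed = count(t -> !isempty(get(analyses, t, String[])), tokens)
    100 * analyzed / length(tokens)
end

tokens = ["four", "score", "and", "seven", "and"]
analyses = Dict(
    "four" => ["four_num"],
    "score" => String[],
    "and" => ["and_conj"],
)
println(coverage_sketch(tokens, analyses))  # 60.0
```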
CitableParserBuilder.ctoken
— Method. Get the citable token for an analysis.
ctoken(at)
CitableParserBuilder.datasource
— Method. Get the DataFrame object backing the dataframe parser.
datasource(dfp)
CitableParserBuilder.dfParser
— Function. Create a DFParser from a delimited-text file.
dfParser(delimitedfile; ...)
dfParser(delimitedfile, ortho; delimiter)
CitableParserBuilder.expand
— Method. Constructs a Cite2Urn from an AbbreviatedUrn and a dictionary mapping the collection identifiers used in AbbreviatedUrns to full Cite2Urns for versioned collections.
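Expansion reverses abbreviation by looking up the collection identifier in the dictionary. The following minimal string-only sketch assumes the dictionary maps each collection identifier to a versioned collection URN ending in a colon. expand_sketch and that value format are assumptions, not the package's signature.

```julia
# Hypothetical: expand "collection.objectid" using a registry of
# collection identifier => versioned collection URN (ending in ':').
function expand_sketch(abbr::AbstractString, registry::Dict{String,String})
    parts = split(abbr, "."; limit = 2)
    length(parts) == 2 ||
        throw(ArgumentError("expected collection.objectid: $abbr"))
    collection, objectid = parts
    base = registry[String(collection)]  # KeyError if the collection is unregistered
    string(base, objectid)
end

registry = Dict("lsj" => "urn:cite2:kanones:lsj.v1:")
println(expand_sketch("lsj.n123", registry))  # urn:cite2:kanones:lsj.v1:n123
```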
CitableParserBuilder.flatpairs
— Method. Flatten a Vector of AnalyzedTokens into passage + analysis pairs.
flatpairs(v)
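Flattening turns each token's set of analyses into a flat list with one entry per analysis. The following minimal sketch uses hypothetical NamedTuples in place of AnalyzedToken values. In this sketch, a token with no analyses contributes no pairs, which may or may not match the package's behavior.

```julia
# Hypothetical analyzed tokens: a passage URN plus a vector of analysis labels.
analyzed = [
    (passage = "urn:cts:demo:speech.v1:1.1", analyses = ["four_num"]),
    (passage = "urn:cts:demo:speech.v1:1.2", analyses = ["score_noun", "score_verb"]),
    (passage = "urn:cts:demo:speech.v1:1.3", analyses = String[]),
]

# One (passage, analysis) pair per analysis; unanalyzed tokens drop out.
flatpairs_sketch(v) = [(t.passage, a) for t in v for a in t.analyses]

flat = flatpairs_sketch(analyzed)
println(length(flat))  # 3
```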
CitableParserBuilder.formal_ambiguity
— Method. Compute morphological ambiguity in a list of words analyzed by a parser.
formal_ambiguity(vocablist, p; data)
CitableParserBuilder.formal_ambiguity
— Method. Compute morphological ambiguity in a corpus analyzed by a parser.
formal_ambiguity(c, p; data)
CitableParserBuilder.formal_frequencies
— Method. Compute frequencies of forms in a corpus analyzed by a parser.
formal_frequencies(c, p; data)
CitableParserBuilder.formally_ambiguous
— Method. True if atkn can be analyzed to more than one form.
formally_ambiguous(atkn)
CitableParserBuilder.formally_ambiguous
— Method. True if the Analysis items in a vector of analyses identify more than one form.
formally_ambiguous(vect)
CitableParserBuilder.forms
— Method. Extract a list of forms from a Vector of Analysis objects.
forms(v)
CitableParserBuilder.forms
— Method. Extract a list of forms from a Vector of AnalyzedToken objects.
forms(v)
CitableParserBuilder.forms
— Method. Extract a list of forms from an AnalyzedTokenCollection object.
forms(atokens)
CitableParserBuilder.formurn
— Method. Identify the morphological form in an analysis.
formurn(a)
CitableParserBuilder.fstsafe
— Method. Compose the SFST representation of an AbbreviatedUrn.
fstsafe(au)
Example:
julia> LexemeUrn("lexicon.lex123") |> fstsafe
"<u>lexicon\.lex123</u>"
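The fstsafe example suggests two steps. First, escape characters that SFST treats as operators (the job of protectreserved). Second, wrap the result in <u> and </u> tags. The following minimal sketch implements both steps. The reserved-character set is an assumption and is not the package's list. Both function names are hypothetical.

```julia
# Assumed set of SFST metacharacters; the package's actual list may differ.
const SFST_RESERVED = Set(collect(".:|-+*?!&^\$#%=<>()[]{}\\/\""))

# Prefix each reserved character with a backslash.
protect_sketch(s::AbstractString) =
    join(c in SFST_RESERVED ? string('\\', c) : string(c) for c in s)

# Escape an abbreviated URN string and wrap it in <u> and </u> tags.
fstsafe_sketch(abbr::AbstractString) = "<u>" * protect_sketch(abbr) * "</u>"

println(fstsafe_sketch("lexicon.lex123"))  # <u>lexicon\.lex123</u>
```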
CitableParserBuilder.generate
— Method. Generate all possible morphological analyses for a given lexeme and form.
generate(lex, mform, parser)
CitableParserBuilder.generate
— Method. Generate all possible morphological analyses for a given lexeme and form.
generate(lex, mform, parser; delim)
CitableParserBuilder.generate
— Method. Catch failure to implement the generate function for a subtype of CitableParser.
generate(lex, mform, p)
CitableParserBuilder.gettysburgParser
— Method. Instantiate a GettysburgParser as a dfParser from a local file source.
gettysburgParser(repo; delimiter)
CitableParserBuilder.gettysburgParser
— Method. Instantiate a GettysburgParser as a dfParser by downloading the source dictionary over the internet.
CitableParserBuilder.id
— Method. Function required to get the ID value of a Rule implementation.
CitableParserBuilder.id
— Method. Function required to get the ID value of a Stem implementation.
CitableParserBuilder.inflectiontype
— Method. Function required to get the string value for the inflection class of a Rule implementation.
CitableParserBuilder.inflectiontype
— Method. Function required to get the string value for the inflection class of a Stem implementation.
CitableParserBuilder.lexeme
— Method. Function required to get the lexeme value of a Stem implementation.
CitableParserBuilder.lexemedictionary
— Method. From a vector of AnalyzedTokens and an index of tokens in a corpus, construct a dictionary keyed by lexemes, mapping to a further dictionary of surface forms to passages.
lexemedictionary(parses, tokenindex)
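The following minimal sketch builds that nested structure. It assumes each parse is reduced to a (surface form, lexemes) tuple. It also assumes the token index maps each surface form to the passage URNs where that form occurs. lexemedictionary_sketch and both input shapes are hypothetical.

```julia
# parses: hypothetical (surface form, lexeme IDs) tuples.
# tokenindex: hypothetical surface form => passage URNs.
function lexemedictionary_sketch(parses, tokenindex::Dict{String,Vector{String}})
    dict = Dict{String,Dict{String,Vector{String}}}()
    for (surface, lexemes) in parses
        for lex in unique(lexemes)
            forms = get!(dict, lex, Dict{String,Vector{String}}())
            forms[surface] = get(tokenindex, surface, String[])
        end
    end
    dict
end

parses = [("score", ["lexicon.score"]), ("scores", ["lexicon.score"]), ("and", ["lexicon.and"])]
tokenindex = Dict(
    "score" => ["urn:cts:demo:speech.v1:1.2"],
    "scores" => ["urn:cts:demo:speech.v1:3.4", "urn:cts:demo:speech.v1:5.1"],
    "and" => ["urn:cts:demo:speech.v1:1.3"],
)
d = lexemedictionary_sketch(parses, tokenindex)
println(sort(collect(keys(d["lexicon.score"]))))
```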
CitableParserBuilder.lexemehisto
— Method. Compute a histogram of lexemes in an AnalyzedTokenCollection.
lexemehisto(parses)
All distinct lexemes for a token are counted; there is no weighting of counts for lexically ambiguous tokens.
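Without weighting, each token adds one to every distinct lexeme among its analyses. This holds regardless of how many analyses point to that lexeme or how many lexemes compete. The following minimal sketch assumes each token is reduced to a vector of lexeme IDs, one per analysis. lexemehisto_sketch is hypothetical and returns a plain Dict.

```julia
# Each inner vector holds the lexeme IDs of one token's analyses (hypothetical shape).
function lexemehisto_sketch(parses::Vector{Vector{String}})
    counts = Dict{String,Int}()
    for lexemes in parses
        # Count each distinct lexeme once per token: no weighting for ambiguity.
        for lex in unique(lexemes)
            counts[lex] = get(counts, lex, 0) + 1
        end
    end
    counts
end

parses = [
    ["lexicon.score", "lexicon.score"],    # two analyses, one lexeme: counted once
    ["lexicon.score", "lexicon.score_v"],  # lexically ambiguous: both counted
    ["lexicon.and"],
    String[],                              # unanalyzed token: contributes nothing
]
h = lexemehisto_sketch(parses)
println(h["lexicon.score"])  # 2
```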
CitableParserBuilder.lexemes
— Method. Extract a list of lexemes from a Vector of Analysis objects.
lexemes(v)
CitableParserBuilder.lexemes
— Method. Extract a list of lexemes from a Vector of AnalyzedToken objects.
lexemes(v)
CitableParserBuilder.lexemes
— Method. Extract a list of lexemes from an AnalyzedTokenCollection object.
lexemes(atokens)
CitableParserBuilder.lexemeurn
— Method. Identify the lexeme in an analysis.
lexemeurn(a)
CitableParserBuilder.lexical_ambiguity
— Method. Compute lexical ambiguity in a list of words analyzed by a parser.
lexical_ambiguity(vocablist, p; data)
CitableParserBuilder.lexical_ambiguity
— Method. Compute lexical ambiguity in a corpus analyzed by a parser.
lexical_ambiguity(c, p; data)
CitableParserBuilder.lexical_frequencies
— Method. Compute frequencies of lexemes in a corpus analyzed by a parser.
lexical_frequencies(c, p; data)
CitableParserBuilder.lexically_ambiguous
— Method. True if atkn can be analyzed to more than one lexeme.
lexically_ambiguous(atkn)
CitableParserBuilder.lexically_ambiguous
— Method. True if the Analysis items in a vector of analyses identify more than one lexeme.
lexically_ambiguous(vect)
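Both ambiguity tests reduce to counting distinct values across a token's analyses. More than one distinct form makes a token formally ambiguous, and more than one distinct lexeme makes it lexically ambiguous. The following minimal sketch reduces analyses to hypothetical (lexeme, form) NamedTuples. The function names are illustrative only.

```julia
formally_ambiguous_sketch(analyses) = length(unique(a.form for a in analyses)) > 1
lexically_ambiguous_sketch(analyses) = length(unique(a.lexeme for a in analyses)) > 1

# "hit": one lexeme, two forms (present or past); hypothetical IDs.
hit = [
    (lexeme = "lexicon.hit", form = "forms.verb_pres"),
    (lexeme = "lexicon.hit", form = "forms.verb_past"),
]
# "score": noun or verb, so two lexemes and two forms.
score = [
    (lexeme = "lexicon.score_n", form = "forms.noun_sg"),
    (lexeme = "lexicon.score_v", form = "forms.verb_pres"),
]

println(formally_ambiguous_sketch(hit), " ", lexically_ambiguous_sketch(hit))  # true false
```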
CitableParserBuilder.mtoken
— Method. Identify the morphological token in an analysis.
mtoken(a)
CitableParserBuilder.mtokenid
— Method. Identify the sequence ID of the morphological token in an analysis.
mtokenid(a)
CitableParserBuilder.no_id
— Method. True if any element in stringlist is empty.
CitableParserBuilder.orthography
— Method. Get the orthographic system for a dataframe parser.
orthography(dfp)
CitableParserBuilder.orthography
— Method. Catch failure to implement the orthography function for a subtype of CitableParser.
orthography(p)
CitableParserBuilder.parsecorpus
— Method. Use a CitableParser to parse a CitableTextCorpus in which each citable node contains a single token of type LexicalToken.
parsecorpus(c, p; data, countinterval)
Returns an AnalyzedTokenCollection object.
CitableParserBuilder.parselist
— Method. Parse a list of tokens with a CitableParser.
parselist(vocablist, p; countinterval)
Returns a Dict mapping strings to a (possibly empty) vector of Analysis objects. Blank lines in input are silently ignored.
CitableParserBuilder.parselist
— Method. Read a list of tokens from file f and parse them with p.
parselist(f, p, reader; countinterval)
Returns a Dict mapping strings to a (possibly empty) vector of Analysis objects.
CitableParserBuilder.parselist
— Method. Read a list of tokens from URL u and parse them with p.
parselist(u, p, reader; countinterval)
Returns a Dict mapping strings to a (possibly empty) vector of Analysis objects.
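The following minimal sketch illustrates the parselist contract, with a hypothetical lookup table standing in for the parser. Every non-blank token becomes a key, mapped to its analyses or to an empty vector. Treating whitespace-only lines as blank and stripping surrounding whitespace are assumptions of this sketch.

```julia
# Hypothetical stand-in for a parser: token => analysis labels.
function parselist_sketch(vocablist::Vector{String}, lookup::Dict{String,Vector{String}})
    results = Dict{String,Vector{String}}()
    for line in vocablist
        tkn = String(strip(line))
        isempty(tkn) && continue                    # skip blank (and whitespace-only) lines
        results[tkn] = get(lookup, tkn, String[])   # empty vector when unparsed
    end
    results
end

lookup = Dict("four" => ["four_num"], "score" => ["score_noun", "score_verb"])
parsed = parselist_sketch(["four", "", "score", "   ", "seven"], lookup)
println(length(parsed))  # 3
```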
CitableParserBuilder.parsepassage
— Method. Parse a CitablePassage containing the text of a single token, using a CitableParser.
parsepassage(cn, p; data)
Returns a single AnalyzedToken.
CitableParserBuilder.parsepassage
— Method. Parse a CitablePassage containing the text of a single token, using a CitableParser.
parsepassage(ct, p; data)
Returns a single AnalyzedToken.
CitableParserBuilder.parsetoken
— Method. Parse a single token using parser.
parsetoken(s, parser)
CitableParserBuilder.passagesforlexeme
— Method. Find URNs for all tokens in a vector of AnalyzedTokens parsed to a given lexeme.
passagesforlexeme(v, lexstr)
CitableParserBuilder.protectreserved
— Method. Escape characters reserved for SFST syntax.
protectreserved(s)
CitableParserBuilder.readfst
— Method. Read SFST output from file f and parse it into a dictionary keying tokens to a (possibly empty) array of SFST strings.
readfst(f)
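The following sketch assumes the usual fst-infl output layout. In that layout, each input token is echoed on a line starting with "> " and followed by its analyses, or by a "no result for" line when there are none. readfst_sketch is hypothetical and takes the lines in memory rather than reading file f. The analysis string in the sample is made up.

```julia
# Group fst-infl style output lines by input token (assumed layout).
function readfst_sketch(lines::Vector{String})
    results = Dict{String,Vector{String}}()
    current = ""
    for line in lines
        if startswith(line, "> ")
            current = line[3:end]           # the echoed input token
            results[current] = String[]
        elseif startswith(line, "no result for ")
            continue                        # token stays mapped to an empty vector
        elseif !isempty(current) && !isempty(strip(line))
            push!(results[current], line)   # one SFST analysis string
        end
    end
    results
end

output = [
    "> score",
    "<u>lexicon\\.score</u><noun><sg>",     # hypothetical analysis string
    "> xyz",
    "no result for xyz",
]
parsed = readfst_sketch(output)
println(parsed["score"])
```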
CitableParserBuilder.relationsblock
— Function. Compose a CEX relationset block for a set of analyses.
relationsblock(urn, label, v; ...)
relationsblock(urn, label, v, delim; registry)
CitableParserBuilder.rules
— Method. Extract a list of rules from a Vector of Analysis objects.
rules(v)
CitableParserBuilder.rules
— Method. Extract a list of rules from a Vector of AnalyzedToken objects.
rules(v)
CitableParserBuilder.rules
— Method. Extract a list of rules from an AnalyzedTokenCollection object.
rules(atokens)
CitableParserBuilder.ruleurn
— Method. Identify the inflectional rule in an analysis.
ruleurn(a)
CitableParserBuilder.stems
— Method. Extract a list of stems from a Vector of Analysis objects.
stems(v)
CitableParserBuilder.stems
— Method. Extract a list of stems from a Vector of AnalyzedToken objects.
stems(v)
CitableParserBuilder.stems
— Method. Extract a list of stems from an AnalyzedTokenCollection object.
stems(atokens)
CitableParserBuilder.stemurn
— Method. Identify the lexical stem in an analysis.
stemurn(a)
CitableParserBuilder.stringParser
— Function. Construct a string-backed parser from a dataframe.
stringParser(df)
stringParser(df, ortho)
stringParser(df, ortho, delim)
CitableParserBuilder.stringParser
— Method. Instantiate a StringParser from a set of analyses read from a file.
stringParser(f, freader; o, delim)
CitableParserBuilder.stringParser
— Method. Instantiate a StringParser from a set of analyses read from a string.
stringParser(s, freader; ortho, delim)
CitableParserBuilder.stringParser
— Method. Instantiate a StringParser from a set of analyses read from a URL.
stringParser(u, ureader; o, delim)
CitableParserBuilder.stringsforlexeme
— Method. Find token string values for all tokens in a vector of AnalyzedTokens parsed to a given lexeme.
stringsforlexeme(v, lexstr)
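Both passagesforlexeme and stringsforlexeme filter tokens by whether any of their analyses points to the requested lexeme, then project out a different field. The following minimal sketch uses hypothetical NamedTuples in place of AnalyzedToken values. Whether the package deduplicates results is not stated, so the sketch keeps every match.

```julia
# Hypothetical analyzed tokens reduced to passage, surface string and lexeme IDs.
analyzed = [
    (passage = "urn:cts:demo:speech.v1:1.2", token = "score", lexemes = ["lexicon.score_n", "lexicon.score_v"]),
    (passage = "urn:cts:demo:speech.v1:1.3", token = "and", lexemes = ["lexicon.and"]),
    (passage = "urn:cts:demo:speech.v1:5.1", token = "scores", lexemes = ["lexicon.score_n"]),
]

# Passages and surface strings for every token with an analysis of the lexeme.
passages_sketch(v, lex) = [t.passage for t in v if lex in t.lexemes]
strings_sketch(v, lex) = [t.token for t in v if lex in t.lexemes]

println(strings_sketch(analyzed, "lexicon.score_n"))
```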
CitableParserBuilder.tofile
— Method. Write a dataframe parser to a delimited file.
tofile(dfp, outfile; delimiter)
CitableParserBuilder.token
— Method. Identify the orthographic token analyzed.
token(a)
CitableParserBuilder.tokens
— Method. Extract a list of string token values from a Vector of Analysis objects.
tokens(v)
CitableParserBuilder.tokens
— Method. Extract a list of string token values from a Vector of AnalyzedToken objects.
tokens(v)
CitableParserBuilder.tokens
— Method. Extract a list of string token values from an AnalyzedTokenCollection object.
tokens(atokens)