Docstrings · CitableParserBuilder.jl

CitableParserBuilder.AbbreviatedUrn — Type

Short form of a Cite2Urn containing only collection and object ID.

CitableParserBuilder.AnalysesCex — Type

Value for CexTrait

CitableParserBuilder.Analysis — Type

Citable analysis of a string value.

An Analysis has seven members: string values for the orthographic token and the morphological token, four abbreviated URNs, one each for the lexeme, form, rule and stem, and a sequence ID for the morphological token.
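As a rough picture of those seven members, here is a hypothetical plain-Julia mirror of the structure. The type and field names (SketchUrn, SketchAnalysis, token, mtoken and so on) are assumptions for illustration, not the package's own definitions. The sequence ID is also assumed to be an Int.

```julia
# Hypothetical stand-in for an abbreviated URN: collection ID plus object ID.
struct SketchUrn
    collection::String
    objectid::String
end

# Hypothetical mirror of the seven Analysis members; field names are assumptions.
struct SketchAnalysis
    token::String        # orthographic token
    mtoken::String       # morphological token
    lexeme::SketchUrn
    form::SketchUrn
    rule::SketchUrn
    stem::SketchUrn
    mtokenid::Int        # sequence ID of the morphological token (assumed Int)
end

a = SketchAnalysis("score", "score",
    SketchUrn("lexicon", "lex123"), SketchUrn("forms", "f1"),
    SketchUrn("rules", "r1"), SketchUrn("stems", "s1"), 1)
```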

CitableParserBuilder.AnalysisCex — Type

Define CexTrait for Analysis type.

CitableParserBuilder.AnalyzedToken — Type

Morphological analyses for a token identified by CTS URN.

CitableParserBuilder.AnalyzedTokenCollection — Type

A collection of analyzed tokens.

CitableParserBuilder.CexAnalyzedToken — Type

CexTrait value for AnalyzedToken.

CitableParserBuilder.CitableAnalyses — Type

Value for CitableTrait.

CitableParserBuilder.CitableByAnalysis — Type

Value for CitableTrait.

CitableParserBuilder.DFParser — Type

A parser that analyzes tokens by looking them up in a precomputed data frame of all recognized forms.

CitableParserBuilder.DFParser — Method

Create a dataframe-backed parser from a string-backed parser.

DFParser(sp)

CitableParserBuilder.DFParserCex — Type

Serializable trait value for DFParser.

CitableParserBuilder.FormUrn — Type

Abbreviated URN for a morphological form.

CitableParserBuilder.GettysburgParser — Type

POS tagger keyed to the text of the Gettysburg Address. The data field is a dictionary mapping tokens to their form (POS tag).

CitableParserBuilder.LexemeUrn — Type

Abbreviated URN for a lexeme.

CitableParserBuilder.Rule — Type

Supertype of all concrete Rule structures.

CitableParserBuilder.RuleUrn — Type

Abbreviated URN for an inflectional rule.

CitableParserBuilder.Stem — Type

Supertype of all concrete Stem structures.

CitableParserBuilder.StemUrn — Type

Abbreviated URN for a morphological stem.

CitableParserBuilder.StringParser — Type

A parser that analyzes tokens by looking them up in a precomputed dictionary of all recognized forms.

CitableParserBuilder.StringParser — Method

Construct a string-backed parser from a dataframe-backed parser.

StringParser(dfp; delim)

Base.:== — Method

Override Base.== for Analysis.

==(a1, a2)

Base.:== — Method

Override Base.== for AnalyzedToken.

==(atoken1, atoken2)

Base.:== — Method

Override Base.== for AnalyzedTokenCollection.

==(at1, at2)

Base.:== — Method

Override Base.== for AbbreviatedUrn.

==(au1, au2)
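Why override ==? For a struct with a Vector field, Julia's fallback == is ===, which compares the vector by identity. Two tokens with equal contents would therefore not compare equal. Here is a minimal sketch using a hypothetical stand-in type (SketchAnalyzedToken, not the package's AnalyzedToken). It also overrides Base.hash, because Julia expects equal values to hash equally.

```julia
# Hypothetical stand-in: a passage identifier plus its analyses.
struct SketchAnalyzedToken
    passage::String
    analyses::Vector{String}
end

# Compare field contents instead of relying on the default (===) fallback.
Base.:(==)(t1::SketchAnalyzedToken, t2::SketchAnalyzedToken) =
    t1.passage == t2.passage && t1.analyses == t2.analyses

# Keep hash consistent with == so equal values behave as equal Dict keys.
Base.hash(t::SketchAnalyzedToken, h::UInt) = hash(t.analyses, hash(t.passage, h))

t1 = SketchAnalyzedToken("psg1", ["a1", "a2"])
t2 = SketchAnalyzedToken("psg1", ["a1", "a2"])
```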

Base.eltype — Method

Implement base element type for AnalyzedTokenCollection.

eltype(atc)

Base.iterate — Method

Implement iteration with state for AnalyzedTokenCollection.

iterate(atc, state)

Base.iterate — Method

Implement iteration for AnalyzedTokenCollection.

iterate(atc)
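The eltype method and the two iterate methods follow Julia's standard iteration protocol. iterate(x) returns the first (item, state) pair, and iterate(x, state) returns the next pair. Both return nothing when the collection is exhausted. Below is a minimal sketch of that protocol for a hypothetical wrapper type (SketchCollection, not the package's AnalyzedTokenCollection). It also defines length, so collect can preallocate its result.

```julia
# Hypothetical wrapper standing in for a collection type.
struct SketchCollection
    items::Vector{String}
end

Base.eltype(::Type{SketchCollection}) = String
Base.length(c::SketchCollection) = length(c.items)

# One-argument form starts iteration at the first item.
Base.iterate(c::SketchCollection) = iterate(c, 1)

# Two-argument form continues from `state`; nothing signals the end.
function Base.iterate(c::SketchCollection, state::Int)
    state > length(c.items) ? nothing : (c.items[state], state + 1)
end

c = SketchCollection(["a", "b", "c"])
```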

Base.objectid — Method

Default implementation of the function that finds the object identifier of an AbbreviatedUrn.

objectid(au)

Base.show — Method

Override Base.show for AnalyzedTokenCollection.

show(io, atc)

Base.show — Method

Override Base.show for AnalyzedToken.

show(io, atoken)

Base.show — Method

Override Base.show for AbbreviatedUrn.

show(io, au)

CitableBase.cex — Method

Implementation of cex function for an Analysis.

cex(a; delim, registry)

CitableBase.cex — Method

Format an AnalyzedTokenCollection as a delimited-text string.

cex(atc; delimiter, registry)

Required function for Citable abstraction.

CitableBase.cex — Method

Serialize an AnalyzedToken as delimited text (required for Citable interface).

cex(at; delimiter, registry)

Uses abbreviated URNs. These can be expanded to full CITE2 URNs when the text is read back with a URN registry. Alternatively, supply a URN registry when serializing (the registry keyword) to write full CITE2 URNs.

CitableBase.cex — Method

Compose delimited text string for a DFParser.

cex(dfp; delimiter)

CitableBase.cextrait — Method

Define CexTrait value for AnalyzedTokenCollection.

cextrait(_)

CitableBase.cextrait — Method

Define CexTrait value for CitableToken.

cextrait(_)

CitableBase.cextrait — Method

Get serializable trait for DFParser type.

cextrait(_)

CitableBase.citablecollectiontrait — Method

Define CitableTrait value for AnalyzedTokenCollection.

citablecollectiontrait(_)

CitableBase.citabletrait — Method

Define CitableTrait value for CitableToken.

citabletrait(_)

CitableBase.fromcex — Method

Parse a delimited-text string into an AnalyzedTokenCollection.

fromcex(trait, s, ; delimiter, configuration, strict)

CitableBase.fromcex — Method

Implementation of fromcex function for an Analysis.

fromcex(
    traitvalue,
    cexsrc,
    T;
    delimiter,
    configuration,
    strict
)

CitableBase.fromcex — Method

Parse a one-line delimited-text representation into an AnalyzedToken, using abbreviated URNs for identifiers. Note that for a single CEX line, the AnalyzedToken will have a single Analysis in its vector of analyses.

fromcex(
    traitvalue,
    cexsrc,
    T;
    delimiter,
    configuration,
    strict
)
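The sketch below roughly illustrates the cex/fromcex round trip. It writes one delimited line and parses it back into a record that holds a single analysis. Several details are assumptions rather than the package's documented CEX layout: the column order, the "|" default delimiter, the record shape, and the sketch_cex and sketch_fromcex names.

```julia
# Hypothetical column order: passage URN, token, lexeme, form, rule, stem.
sketch_cex(cols::AbstractVector{<:AbstractString}; delimiter = "|") = join(cols, delimiter)

# Parse one line back: one passage with a vector holding exactly one analysis.
function sketch_fromcex(line::AbstractString; delimiter = "|")
    cols = split(line, delimiter)
    length(cols) == 6 || error("expected 6 columns, got $(length(cols))")
    (passage = String(cols[1]), analyses = [Tuple(String.(cols[2:6]))])
end

line = sketch_cex(["urn:cts:demo:text.v1:1", "score",
                   "lexicon.lex123", "forms.f1", "rules.r1", "stems.s1"])
rec = sketch_fromcex(line)
```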

CitableBase.fromcex — Method

Create a DFParser from a delimited-text source.

fromcex(trait, cexsrc, T; delimiter, configuration, strict)

CitableBase.label — Method

Label for analyses.

label(atc)

Required function for Citable abstraction.

CitableBase.label — Method

Label for AnalyzedToken (required for Citable interface).

label(at)

CitableBase.urn — Method

Unique identifier for AnalyzedToken (required for Citable interface).

urn(at)

CitableBase.urntype — Method

Type of URN identifying analyses in an AnalyzedTokenCollection.

urntype(analyses)

Required function for Citable abstraction.

CitableBase.urntype — Method

Identify URN type for an AnalyzedToken as CtsUrn.

urntype(at)

Required function for Citable abstraction.

CitableCorpus.text — Method

Get URN for analysis.

text(at)

CitableParserBuilder.abbreviate — Method

Constructs an AbbreviatedUrn string from a Cite2Urn.

abbreviate(urn)

Example:

julia> abbreviate(Cite2Urn("urn:cite2:kanones:lsj.v1:n123"))
"lsj.n123"

Example: a pipeline abbreviating a Cite2Urn and forming a LexemeUrn from the abbreviated string value.

julia> Cite2Urn("urn:cite2:kanones:lsj.v1:n123") |> abbreviate |> LexemeUrn
LexemeUrn("lsj", "n123")
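In these examples, the abbreviation keeps the collection identifier without its version, plus the object identifier. Here is a string-only sketch of that transformation, assuming a well-formed five-part CITE2 URN. The name sketch_abbreviate is hypothetical, and the package's abbreviate operates on Cite2Urn values rather than raw strings.

```julia
function sketch_abbreviate(urnstr::AbstractString)
    parts = split(urnstr, ":")                  # ["urn", "cite2", "kanones", "lsj.v1", "n123"]
    length(parts) == 5 || error("expected a five-part CITE2 URN: $urnstr")
    collection = first(split(parts[4], "."))    # drop the version: "lsj.v1" -> "lsj"
    string(collection, ".", parts[5])           # "lsj.n123"
end
```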

CitableParserBuilder.collection — Method

Default implementation of the function that finds the collection value of an AbbreviatedUrn.

collection(au)

CitableParserBuilder.coverage — Method

Compute the percentage of words in a list of words that are analyzed by the parser.

coverage(vocablist, p; data)

CitableParserBuilder.coverage — Method

Compute the percentage of tokens in a corpus that are analyzed by the parser.

coverage(tokencorpus, p; data)
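Here is a hedged sketch of the word-list calculation. It assumes the input is a dictionary like the one parselist returns, mapping each word to a possibly empty vector of analyses. It also assumes the result is a percentage between 0 and 100. The name sketch_coverage is hypothetical. A corpus version would presumably count every token occurrence rather than each distinct word once.

```julia
# analyses: word => (possibly empty) vector of analyses
function sketch_coverage(analyses::Dict{String,<:AbstractVector})
    isempty(analyses) && return 0.0
    analyzed = count(v -> !isempty(v), values(analyses))   # words with at least one analysis
    100 * analyzed / length(analyses)
end

d = Dict("score" => ["a1"], "seven" => String[],
         "years" => ["a2", "a3"], "ago" => String[])
```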

CitableParserBuilder.ctoken — Method

Get citable token for analysis.

ctoken(at)

CitableParserBuilder.datasource — Method

Get DataFrame object backing the dataframe parser.

datasource(dfp)

CitableParserBuilder.dfParser — Function

Create a DFParser from a delimited-text file.

dfParser(delimitedfile; ...)
dfParser(delimitedfile, ortho; delimiter)

CitableParserBuilder.expand — Method

Constructs a Cite2Urn from an AbbreviatedUrn and a dictionary mapping collection identifiers in AbbreviatedUrns to full Cite2Urns for a versioned collection.
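A string-level sketch of this expansion follows. It assumes the registry maps a collection identifier to the URN of a versioned collection, written with its trailing colon. The registry entry and the sketch_expand name are hypothetical, and the package's expand works on AbbreviatedUrn and Cite2Urn values.

```julia
# registry: collection ID => URN of a versioned collection (hypothetical entry below)
function sketch_expand(abbr::AbstractString, registry::Dict{String,String})
    parts = split(abbr, "."; limit = 2)
    length(parts) == 2 || error("expected collection.objectid, got $abbr")
    collection = String(parts[1])
    haskey(registry, collection) || error("no registry entry for collection $collection")
    string(registry[collection], parts[2])   # append the object ID to the collection URN
end

registry = Dict("lsj" => "urn:cite2:kanones:lsj.v1:")
```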

CitableParserBuilder.flatpairs — Method

Flatten a Vector of AnalyzedTokens into passage+analysis pairs.

flatpairs(v)
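Here is a minimal sketch of the flattening. It represents each analyzed token as a hypothetical (passage, analyses) tuple, and sketch_flatpairs is likewise a hypothetical name. In this sketch, a token with no analyses contributes no pairs. Whether the package function behaves the same way is not stated here.

```julia
# Each input element: (passage, analyses), a hypothetical simplification of AnalyzedToken.
sketch_flatpairs(v) = [(psg, a) for (psg, analyses) in v for a in analyses]

v = [("psg1", ["a1", "a2"]), ("psg2", String[]), ("psg3", ["a3"])]
```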

CitableParserBuilder.formal_ambiguity — Method

Compute morphological ambiguity in list of words analyzed by parser.

formal_ambiguity(vocablist, p; data)

CitableParserBuilder.formal_ambiguity — Method

Compute morphological ambiguity in corpus analyzed by parser.

formal_ambiguity(c, p; data)

CitableParserBuilder.formal_frequencies — Method

Compute frequencies of forms in corpus analyzed by parser.

formal_frequencies(c, p; data)

CitableParserBuilder.formally_ambiguous — Method

True if atkn can be analyzed to more than one form.

formally_ambiguous(atkn)

CitableParserBuilder.formally_ambiguous — Method

True if Analysis items in a vector of analyses identify more than one form.

formally_ambiguous(vect)
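Formal and lexical ambiguity are independent. Two analyses can share a form while belonging to different lexemes. The sketch below reduces each analysis to a hypothetical named tuple holding only lexeme and form IDs, and it tests each kind of ambiguity by counting distinct values. The sketch_ names are not the package's API.

```julia
# True when the analyses identify more than one distinct form / lexeme.
sketch_formally_ambiguous(v) = length(unique(a.form for a in v)) > 1
sketch_lexically_ambiguous(v) = length(unique(a.lexeme for a in v)) > 1

# Two analyses sharing a form but belonging to different lexemes.
v = [(lexeme = "lexicon.lex1", form = "forms.f1"),
     (lexeme = "lexicon.lex2", form = "forms.f1")]
```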

CitableParserBuilder.forms — Method

Extract a list of forms from a Vector of Analysis objects.

forms(v)

CitableParserBuilder.forms — Method

Extract a list of forms from a Vector of AnalyzedToken objects.

forms(v)

CitableParserBuilder.forms — Method

Extract a list of forms from an AnalyzedTokenCollection object.

forms(atokens)

CitableParserBuilder.formurn — Method

Identify morphological form identified in analysis.

formurn(a)

CitableParserBuilder.fstsafe — Method

Compose SFST representation of an AbbreviatedUrn.

fstsafe(au)

Example:

julia> LexemeUrn("lexicon.lex123") |> fstsafe
"<u>lexicon\.lex123</u>"

CitableParserBuilder.generate — Method

Generate all possible morphological analyses for a given lexeme and form.

generate(lex, mform, parser)

CitableParserBuilder.generate — Method

Generate all possible morphological analyses for a given lexeme and form.

generate(lex, mform, parser; delim)

CitableParserBuilder.generate — Method

Catch failure to implement generate function for a subtype of CitableParser.

generate(lex, mform, p)

CitableParserBuilder.gettysburgParser — Method

Instantiate GettysburgParser as a dfParser from a local file source.

gettysburgParser(repo; delimiter)

CitableParserBuilder.gettysburgParser — Method

Instantiate a GettysburgParser as a dfParser by downloading the source dictionary over the internet.

CitableParserBuilder.id — Method

Function required to get ID value of a Rule implementation.

CitableParserBuilder.id — Method

Function required to get ID value of a Stem implementation.

CitableParserBuilder.inflectiontype — Method

Function required to get string value for inflection class of a Rule implementation.

CitableParserBuilder.inflectiontype — Method

Function required to get string value for inflection class of a Stem implementation.

CitableParserBuilder.lexeme — Method

Function required to get lexeme value of a Stem implementation.

CitableParserBuilder.lexemedictionary — Method

From a vector of AnalyzedTokens and an index of tokens in a corpus, construct a dictionary keyed by lexemes, mapping to a further dictionary of surface forms to passages.

lexemedictionary(parses, tokenindex)
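Here is a hedged sketch of the nested structure this function builds: lexeme => (surface form => passages). For brevity, it assumes each parse already carries its passage and token string, so it skips the separate token index that the package function takes. The record layout and the sketch_lexemedictionary name are hypothetical.

```julia
function sketch_lexemedictionary(parses)
    dict = Dict{String,Dict{String,Vector{String}}}()
    for p in parses, lex in unique(p.lexemes)
        forms = get!(() -> Dict{String,Vector{String}}(), dict, lex)   # surface form => passages
        push!(get!(() -> String[], forms, p.text), p.passage)
    end
    dict
end

parses = [(passage = "psg1", text = "score",  lexemes = ["lex1"]),
          (passage = "psg2", text = "scores", lexemes = ["lex1"]),
          (passage = "psg3", text = "score",  lexemes = ["lex1", "lex2"])]
```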

CitableParserBuilder.lexemehisto — Method

Compute histogram of lexemes in AnalyzedTokenCollection.

lexemehisto(parses)

All distinct lexemes for a token are counted; there is no weighting of counts for lexically ambiguous tokens.
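The counting rule can be sketched as follows. Each token adds 1 to every distinct lexeme among its analyses, and there is no fractional weighting for ambiguous tokens. The input shape (one vector of lexeme IDs per token) and the sketch_lexemehisto name are hypothetical.

```julia
# tokens: each element is the vector of lexeme IDs from one token's analyses
function sketch_lexemehisto(tokens::AbstractVector{<:AbstractVector{<:AbstractString}})
    histo = Dict{String,Int}()
    for lexemes in tokens
        for lex in unique(lexemes)          # each distinct lexeme counts once per token
            histo[lex] = get(histo, lex, 0) + 1
        end
    end
    histo
end

h = sketch_lexemehisto([["lex1", "lex1"], ["lex1", "lex2"], String[]])
```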

CitableParserBuilder.lexemes — Method

Extract a list of lexemes from a Vector of Analysis objects.

lexemes(v)

CitableParserBuilder.lexemes — Method

Extract a list of lexemes from a Vector of AnalyzedToken objects.

lexemes(v)

CitableParserBuilder.lexemes — Method

Extract a list of lexemes from an AnalyzedTokenCollection object.

lexemes(atokens)

CitableParserBuilder.lexemeurn — Method

Identify lexeme identified in analysis.

lexemeurn(a)

CitableParserBuilder.lexical_ambiguity — Method

Compute lexical ambiguity in list of words analyzed by parser.

lexical_ambiguity(vocablist, p; data)

CitableParserBuilder.lexical_ambiguity — Method

Compute lexical ambiguity in corpus analyzed by parser.

lexical_ambiguity(c, p; data)

CitableParserBuilder.lexical_frequencies — Method

Compute frequencies of lexemes in corpus analyzed by parser.

lexical_frequencies(c, p; data)

CitableParserBuilder.lexically_ambiguous — Method

True if atkn can be analyzed to more than one lexeme.

lexically_ambiguous(atkn)

CitableParserBuilder.lexically_ambiguous — Method

True if Analysis items in a vector of analyses identify more than one lexeme.

lexically_ambiguous(vect)

CitableParserBuilder.mtoken — Method

Identify morphological token identified in analysis.

mtoken(a)

CitableParserBuilder.mtokenid — Method

Identify the sequence ID of the morphological token identified in analysis.

mtokenid(a)

CitableParserBuilder.no_id — Method

True if any element in stringlist is empty.

CitableParserBuilder.orthography — Method

Get orthographic system for a dataframe parser.

orthography(dfp)

CitableParserBuilder.orthography — Method

Catch failure to implement orthography function for a subtype of CitableParser.

orthography(p)

CitableParserBuilder.parsecorpus — Method

Use a CitableParser to parse a CitableTextCorpus in which each citable node contains a single token of type LexicalToken.

parsecorpus(c, p; data, countinterval)

Returns an AnalyzedTokenCollection object.

CitableParserBuilder.parselist — Method

Parse a list of tokens with a CitableParser.

parselist(vocablist, p; countinterval)

Returns a Dict mapping strings to a (possibly empty) vector of Analysis objects. Blank lines in input are silently ignored.
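The return shape can be sketched with a lookup-table parser in the spirit of StringParser. Every non-blank word maps to its analyses, or to an empty vector when nothing matches, and blank entries are skipped. The lookup dictionary of plain strings and the sketch_parselist name are hypothetical stand-ins for a real CitableParser and Analysis objects.

```julia
# lookup: hypothetical precomputed table of surface form => analyses
function sketch_parselist(vocablist::AbstractVector{<:AbstractString},
                          lookup::Dict{String,Vector{String}})
    results = Dict{String,Vector{String}}()
    for word in vocablist
        w = String(strip(word))
        isempty(w) && continue                  # blank lines are silently ignored
        results[w] = get(lookup, w, String[])   # empty vector when no analysis is found
    end
    results
end

lookup = Dict("score" => ["a1", "a2"])
r = sketch_parselist(["score", "", "   ", "seven"], lookup)
```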

CitableParserBuilder.parselist — Method

Read a list of tokens from file f and parse with p.

parselist(f, p, reader; countinterval)

Returns a Dict mapping strings to a (possibly empty) vector of Analysis objects.

CitableParserBuilder.parselist — Method

Read a list of tokens from URL u and parse with p.

parselist(u, p, reader; countinterval)

Returns a Dict mapping strings to a (possibly empty) vector of Analysis objects.

CitableParserBuilder.parsepassage — Method

Parse a CitablePassage with text for a single token with a CitableParser.

parsepassage(cn, p; data)

Returns a single AnalyzedToken.

CitableParserBuilder.parsepassage — Method

Parse a CitablePassage with text for a single token with a CitableParser.

parsepassage(ct, p; data)

Returns a single AnalyzedToken.

CitableParserBuilder.parsetoken — Method

Parse a single token using parser.

parsetoken(s, parser)

CitableParserBuilder.passagesforlexeme — Method

Find URNs for all tokens in a vector of AnalyzedTokens parsed to a given lexeme.

passagesforlexeme(v, lexstr)
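This entry and stringsforlexeme are both filters over analyzed tokens. They keep the tokens that have an analysis for the requested lexeme, then return either the passage or the token string. In the sketch below, each token is a hypothetical named tuple, and the sketch_ names are not the package's API. Whether the package functions remove duplicates is not stated, and this sketch does not.

```julia
sketch_passagesforlexeme(v, lexstr) = [t.passage for t in v if lexstr in t.lexemes]
sketch_stringsforlexeme(v, lexstr) = [t.text for t in v if lexstr in t.lexemes]

v = [(passage = "psg1", text = "score",  lexemes = ["lex1"]),
     (passage = "psg2", text = "seven",  lexemes = ["lex2"]),
     (passage = "psg3", text = "scores", lexemes = ["lex1", "lex3"])]
```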

CitableParserBuilder.protectreserved — Method

Escape characters reserved for SFST syntax.

protectreserved(s)
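The fstsafe example earlier escapes the period in an abbreviated URN and wraps the result in <u>...</u>. Here is a sketch of that escaping step. For illustration only, it assumes '.' is the sole reserved character. The package's protectreserved may cover a larger set of SFST special characters, and the sketch_ names are hypothetical.

```julia
# Assumption: only '.' is treated as reserved in this sketch.
const SKETCH_RESERVED = ('.',)

# Prefix each reserved character with a backslash.
sketch_protect(s::AbstractString; reserved = SKETCH_RESERVED) =
    join(c in reserved ? string('\\', c) : string(c) for c in s)

# Wrap an escaped abbreviated URN in <u>...</u>, mirroring the fstsafe example.
sketch_fstsafe(abbr::AbstractString) = string("<u>", sketch_protect(abbr), "</u>")
```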

CitableParserBuilder.readfst — Method

Read SFST output from file f, and parse into a dictionary keying tokens to a (possibly empty) array of SFST strings.

readfst(f)

CitableParserBuilder.relationsblock — Function

Compose a CEX relationset block for a set of analyses.

relationsblock(urn, label, v; ...)
relationsblock(urn, label, v, delim; registry)

CitableParserBuilder.rules — Method

Extract a list of rules from a Vector of Analysis objects.

rules(v)

CitableParserBuilder.rules — Method

Extract a list of rules from a Vector of AnalyzedToken objects.

rules(v)

CitableParserBuilder.rules — Method

Extract a list of rules from an AnalyzedTokenCollection object.

rules(atokens)

CitableParserBuilder.ruleurn — Method

Identify inflectional rule identified in analysis.

ruleurn(a)

CitableParserBuilder.stems — Method

Extract a list of stems from a Vector of Analysis objects.

stems(v)

CitableParserBuilder.stems — Method

Extract a list of stems from a Vector of AnalyzedToken objects.

stems(v)

CitableParserBuilder.stems — Method

Extract a list of stems from an AnalyzedTokenCollection object.

stems(atokens)

CitableParserBuilder.stemurn — Method

Identify lexical stem identified in analysis.

stemurn(a)

CitableParserBuilder.stringParser — Function

Construct a string-backed parser from a dataframe.

stringParser(df)
stringParser(df, ortho)
stringParser(df, ortho, delim)

CitableParserBuilder.stringParser — Method

Instantiate a StringParser from a set of analyses read from a file.

stringParser(f, freader; o, delim)

CitableParserBuilder.stringParser — Method

Instantiate a StringParser from a set of analyses read from a string.

stringParser(s, freader; ortho, delim)

CitableParserBuilder.stringParser — Method

Instantiate a StringParser from a set of analyses read from a URL.

stringParser(u, ureader; o, delim)

CitableParserBuilder.stringsforlexeme — Method

Find token string values for all tokens in a vector of AnalyzedTokens parsed to a given lexeme.

stringsforlexeme(v, lexstr)

CitableParserBuilder.tofile — Method

Write dataframe parser to a delimited file.

tofile(dfp, outfile; delimiter)

CitableParserBuilder.token — Method

Identify orthographic token analyzed.

token(a)

CitableParserBuilder.tokens — Method

Extract a list of string token values from a Vector of Analysis objects.

tokens(v)

CitableParserBuilder.tokens — Method

Extract a list of string token values from a Vector of AnalyzedToken objects.

tokens(v)

CitableParserBuilder.tokens — Method

Extract a list of string token values from an AnalyzedTokenCollection object.

tokens(atokens)