FranklinParser.ALPHANUM_ALL — Constant ALPHANUM_ALL
ALPHA_ALL plus the digits.
FranklinParser.ALPHANUM_LATIN — Constant ALPHANUM_LATIN
Convenience list of characters corresponding to a-zA-Z0-9.
FranklinParser.ALPHA_ALL — Constant ALPHA_ALL
The first 10_000 characters.
FranklinParser.ALPHA_LATIN — Constant ALPHA_LATIN
Convenience list of characters corresponding to letters a-zA-Z.
FranklinParser.CODE_LANG3_PAT — Constant CODE_LANG*_PAT
FranklinParser.END_OF_LINE — Constant END_OF_LINE
All tokens that indicate the end of a line.
FranklinParser.EOS — Constant EOS
Marks the end of the string to parse (helps with corner cases where a token ends a document without being followed by a space).
FranklinParser.F_DIV_OPEN — Constant F_DIV_OPEN
Finder for @@div, checking that div matches a simplified rule for allowed CSS class names. The complete rule is -?[_a-zA-Z]+[_a-zA-Z0-9-]*, which we simplify here to [a-zA-Z]+[_a-zA-Z0-9-]*, and we allow , for separation, so that @@d1,d2 is allowed and corresponds to a setting where we pass two classes: class="d1 d2".
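The simplified rule above can be illustrated with a small regex check. This is a hedged sketch, not the actual finder; the names DIV_CLASSES_SKETCH and isdivopen are made up for illustration.

```julia
# Sketch of the simplified class-name rule described above; the real
# finder in FranklinParser may differ in details.
const DIV_CLASSES_SKETCH = r"^@@([a-zA-Z][a-zA-Z0-9_-]*(?:,[a-zA-Z][a-zA-Z0-9_-]*)*)$"

isdivopen(s::AbstractString) = match(DIV_CLASSES_SKETCH, s) !== nothing

isdivopen("@@d1,d2")  # two classes, "d1" and "d2": allowed
isdivopen("@@-bad")   # a class must start with a letter: rejected
```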
FranklinParser.F_EMOJI — Constant F_EMOJI
Finder for emojis (those will have to be validated separately to check that Julia recognises them).
FranklinParser.F_FOOTNOTE — Constant F_FOOTNOTE
Finder for footnotes.
FranklinParser.F_HTML_ENTITY — Constant F_HTML_ENTITY
Finder for HTML entities.
FranklinParser.F_LANG_3 — Constant F_LANG_*
Finder for code blocks, i.e. a sequence of 3, 4 or 5 backticks followed by a valid combination of letters defining a language.
FranklinParser.F_LINE_RETURN — Constant F_LINE_RETURN
Finder for a line return (\n) followed by any number of whitespaces or tabs. These will subsequently be checked to see whether they are followed by something that constitutes a list item or not.
FranklinParser.F_LX_COMMAND — Constant F_LX_COMMAND
Finder for a LaTeX command. The first character is [a-zA-Z]. We do allow numbers (there is no ambiguity because \com1 is not allowed to mean \com{1}, unlike in LaTeX). Underscores are allowed inside the command but not at the very start or very end, to avoid confusion with the escaped _ character and with markdown emphasis respectively. * is not allowed anywhere (including at the end). See also the check pattern.
FranklinParser.HR1_PAT — Constant HR*_PAT
Pattern to match horizontal rule indicators.
FranklinParser.HTML_ENTITY_PAT — Constant HTML_ENTITY_PAT
Pattern for an HTML entity. Ref: https://dev.w3.org/html5/html-author/charref.
Examples (all rendering as ⊓):
- &sqcap;
- &SquareIntersection;
- &#x02293;
- &#8851;
Note: the longest entity is &CounterClockwiseContourIntegral; (∳) so the max number of characters is capped at 32.
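A hypothetical regex in the same spirit (the constant name below is made up, not the package's HTML_ENTITY_PAT): named entities with the name capped at 32 characters, plus decimal and hexadecimal numeric forms.

```julia
# Illustrative only: named (capped at 32 chars), decimal and hex entities.
const ENTITY_SKETCH = r"&(?:[a-zA-Z][a-zA-Z0-9]{1,31};|#[0-9]{1,7};|#x[0-9a-fA-F]{1,6};)"

occursin(ENTITY_SKETCH, "&sqcap;")   # named form
occursin(ENTITY_SKETCH, "&#8851;")   # decimal form
occursin(ENTITY_SKETCH, "&#x2293;")  # hexadecimal form
```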
FranklinParser.HTML_TOKENS — Constant HTML_TOKENS
Dictionary of tokens for HTML. See also MD_TOKENS.
FranklinParser.LEADING_WHITESPACE_PAT — Constant *_WHITESPACE_PAT
Pattern to match the whitespaces (tabs or spaces) at the start of a line; see dedent.
FranklinParser.LX_COMMAND_PAT — Constant LX_COMMAND_PAT
Allowed LaTeX command name. Underscores are allowed inside the command but not at the extremities. The star * is not allowed anywhere.
Examples:
- \com (valid)
- \ab1_cd* (invalid: ends with *)
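The rule can be sketched as a regex (a hypothetical equivalent, not the actual LX_COMMAND_PAT): letters and digits, underscores allowed only strictly inside the name, no * anywhere.

```julia
# Hedged sketch of the command-name rule described above.
const LXCOM_SKETCH = r"^\\[a-zA-Z][a-zA-Z0-9]*(?:_[a-zA-Z0-9]+)*$"

iscommand(s::AbstractString) = match(LXCOM_SKETCH, s) !== nothing

iscommand(raw"\com")      # valid
iscommand(raw"\ab1_cd")   # valid, underscore strictly inside
iscommand(raw"\_com")     # invalid, leading underscore
iscommand(raw"\ab1_cd*")  # invalid, * is not allowed
```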
FranklinParser.MD_HEADERS — Constant MD_HEADERS
Tokens for headers.
FranklinParser.MD_IGNORE — Constant MD_IGNORE
Tokens that may be left over after partition but should be ignored in text blocks.
FranklinParser.MD_MATH_TOKENS — Constant MD_MATH_TOKENS
Tokens that should be considered within a math environment.
FranklinParser.MD_TOKENS — Constant MD_TOKENS
Dictionary of tokens for Markdown. Note that for each, there may be several possibilities to consider, in which case the order is important: the first case that works will be taken.
Dev: F* are greedy match, see `mdutils.jl`.
Try: https://spec.commonmark.org/dingus
FranklinParser.NUM_CHAR — Constant NUM_CHAR
Convenience list of characters corresponding to digits.
FranklinParser.SPACE_CHAR — Constant SPACE_CHARS
List of characters that correspond to a \s regex, plus EOS.
Ref: https://github.com/JuliaLang/julia/blob/master/base/strings/unicode.jl.
FranklinParser.AbstractSpan — Type AbstractSpan
Section of a parent String with a specific meaning for Franklin. All subtypes of AbstractBlock must have an ss field corresponding to the substring associated with the block. This field is necessarily of type SubString{String}.
FranklinParser.Block — Type Block <: AbstractSpan
Blocks are defined by an opening and a closing Token, and they may be nested. For instance, a braces block is formed of an opening { and a closing }.
FranklinParser.BlockTemplate — Type BlockTemplate
Template for a block to find. A block goes from a token with a given opening name to one of several possible closing names. Blocks can allow or disallow nesting. For instance brace blocks can be nested ({.{.}.}) but comments cannot. When nesting is enabled, Franklin will try to find the closing token taking into account the balance of opening and closing tokens.
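The balance-based search described above can be sketched as follows. This is a simplified illustration with made-up names, not the package's implementation.

```julia
# Scan forward from an opening token; +1 on each opener, -1 on each closer,
# and the block closes where the balance first returns to zero.
function find_closing_sketch(tokens::Vector{Symbol}, i::Int)
    balance = 1
    for j in i+1:length(tokens)
        tokens[j] === :open  && (balance += 1)
        tokens[j] === :close && (balance -= 1)
        balance == 0 && return j
    end
    return nothing  # left open: the real parser raises an exception here
end

find_closing_sketch([:open, :open, :close, :close], 1)  # 4
```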
FranklinParser.Chomp — Type Chomp
Structure to encapsulate rules around a token, such as whether it's fine at the end of a string, what the allowed following characters are and, in the greedy case, what characters are allowed.
FranklinParser.Group — Type Group <: AbstractSpan
A Group contains one or more Blocks and will map to either a Paragraph or something else such as a code block.
FranklinParser.Token — Type Token <: AbstractSpan
A token is a subtype of AbstractSpan which typically determines the start or end of a block. It can also be used for special characters.
FranklinParser.TokenFinder — Type TokenFinder
Structure to find a token, keeping track of how many characters should be seen, some rules with respect to positioning or following chars (see Chomp), and possibly a validator that checks whether a candidate respects a rule.
FranklinParser.TextBlock — Function TextBlock
Spans of text which should be left to the fallback engine (such as CommonMark). Text blocks can also have inner tokens that are non-block delimiters, such as emojis or HTML entities.
FranklinParser._error_message — Method _error_message(title, body, ss)
Helper function to write an informative error message.
FranklinParser._find_blocks! — Function _find_blocks!(...)
Helper function to resolve each of the passes looking at a different set of templates.
FranklinParser._hrule! — Method _hrule!(blocks, token, regex)
Helper function to match and process a hrule.
FranklinParser.aggregate! — Method aggregate!(blocks, items, acc, case)
Merge a set of blocks into one parent block. For instance, at this point there may be multiple BLOCKQUOTE_LINE blocks; this function will aggregate them into one BLOCKQUOTE block.
Arguments
* blocks: the current vector of blocks we're working with
* items: list of names of blocks that would trigger the aggregation
* acc: list of names of blocks that would be taken in the aggregation
* case: name of the block resulting from the aggregation
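Schematically, the blockquote example above looks like this (a toy version with made-up lowercase symbols; the real function works on Block objects, not symbols):

```julia
# Consecutive :blockquote_line entries collapse into a single :blockquote.
function aggregate_sketch(names::Vector{Symbol})
    out = Symbol[]
    for n in names
        if n === :blockquote_line
            # absorbed into an already-open blockquote, else open one
            (!isempty(out) && out[end] === :blockquote) || push!(out, :blockquote)
        else
            push!(out, n)
        end
    end
    return out
end

aggregate_sketch([:text, :blockquote_line, :blockquote_line, :text])
# [:text, :blockquote, :text]
```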
FranklinParser.block_not_closed_exception — Method block_not_closed_exception(ot)
Throw a FranklinParserException caused by a block opened by a token ot and left open.
FranklinParser.check — Method check(tokenfinder, ss)
Check whether a substring verifies the regex of a token finder.
FranklinParser.content — Method content(block)
Return the content of a Block; for instance the content of a {...} block would be .... Note: EOS is a special '0 length' case to deal with the fact that a text can end with a token (which would then be an overlapping token and an EOS).
FranklinParser.dedent — Method dedent(s)
Remove the common leading whitespace from each non-empty line. The returned text is decoupled from the original text (forced to String).
This is used in the context of lxdef in Franklin, for instance; see tryformlxdef.
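The behaviour can be illustrated as follows. This is a simplified sketch assuming space indentation; the package's dedent may handle more cases (tabs in particular).

```julia
# Remove the common leading whitespace of the non-empty lines.
function dedent_sketch(s::AbstractString)
    lines = split(s, '\n')
    pads  = [length(l) - length(lstrip(l)) for l in lines if !isempty(strip(l))]
    cut   = isempty(pads) ? 0 : minimum(pads)
    return join((isempty(strip(l)) ? "" : l[cut+1:end] for l in lines), '\n')
end

dedent_sketch("    a\n      b")  # "a\n  b"
```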
FranklinParser.env_not_closed_exception — Method env_not_closed_exception(b)
Throw a FranklinParserException caused by an environment opened by a block b and either not formed properly or left open.
FranklinParser.find_blocks — Method find_blocks(tokens, templates)
Given a list of tokens and a dictionary of block templates, find all blocks matching the templates. The blocks are sorted by order of appearance and inner blocks are weeded out.
FranklinParser.find_tokens — Method find_tokens(s, templates)
Go through a text left to right, one (valid) char at a time, and keep track of sequences of chars that match specific tokens. The list of tokens found is returned.
Arguments
* s: the initial text
* templates: dictionary of possible tokens
Errors
This should not throw any error; everything should be explicitly handled by a code path.
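Very schematically, the left-to-right scan can be pictured like this (a toy version with made-up names; the real templates are TokenFinder objects with lookaheads, not fixed strings):

```julia
# For each character, check whether a template keyed on that character
# matches the text starting there; record (index, match) pairs.
function scan_sketch(s::String, templates::Dict{Char,String})
    found = Tuple{Int,String}[]
    for (i, c) in pairs(s)
        t = get(templates, c, nothing)
        t !== nothing && startswith(SubString(s, i), t) && push!(found, (i, t))
    end
    return found
end

scan_sketch("a {b} c", Dict('{' => "{", '}' => "}"))  # [(3, "{"), (5, "}")]
```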
FranklinParser.fixed_lookahead — Method fixed_lookahead(tokenfinder, candidate, at_eos)
Applies a fixed lookahead step corresponding to a token finder. This is used as a helper function in find_tokens.
FranklinParser.form_dbb! — Method form_dbb!(blocks)
Find CU_BRACKETS blocks that start with {{ and end with }} and mark them as :DBB.
FranklinParser.form_links! — Method form_links!(blocks)
Here we catch the following:
* [A] LINK_A for <a href="ref(A)">html(A)</a>
* [A][B] LINK_AR for <a href="ref(B)">html(A)</a>
* [A](B) LINK_AB for <a href="escape(B)">html(A)</a>
* ![A] IMG_A for <img src="ref(A)" alt="esc(A)" />
* ![A](B) IMG_AB for <img src="escape(B)" alt="esc(A)" />
* [A]: B REF (--> aggregate B, will need to distinguish later)
where 'A' is necessarily non-empty and 'B' may be empty.
Note: for simplicity we currently DO NOT support links with titles such as the following:
- [A]: B C
- A
This allows us not to have to check whether B is a link and C is text. If the user wants links with titles, they should create a command for it. We also do not support link destinations between <...>.
Note: in the case of a LINK_A, we check whether the previous non-whitespace character and the next non-whitespace character happen to be } and {. In that specific case, the link is
FranklinParser.forward_match — Function forward_match(refstring, next_chars, is_followed)
Return a TokenFinder corresponding to a forward lookup checking whether a sequence of characters matches a refstring and is followed (or, if is_followed==false, not followed) by a char out of a list of chars (next_chars).
FranklinParser.from — Method from(o)
Given a SubString ss, return a valid string index where the substring starts. If ss is a String, return 1. Returns an Int.
FranklinParser.get_classes — Method get_classes(divblock)
Return the class(es) of a div block. E.g. @@c1,c2 will return "c1 c2" so that it can be injected in a <div class="...">.
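In effect this boils down to stripping the @@ marker and replacing commas with spaces; a hypothetical one-liner (not the actual implementation):

```julia
# Turn "@@c1,c2" into "c1 c2" for injection in a class attribute.
classes_sketch(div::AbstractString) = replace(strip(div, '@'), ',' => ' ')

classes_sketch("@@c1,c2")  # "c1 c2"
```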
FranklinParser.greedy_lookahead — Method greedy_lookahead(tokenfinder, nchars, probe_char)
Applies a greedy lookahead step corresponding to a token finder. This is used as a helper function in find_tokens.
FranklinParser.greedy_match — Method greedy_match(head_chars, tail_chars, check)
Lazily accept the next char and stop as soon as it fails to verify λ(c).
FranklinParser.insert — Method insert(token)
For tokens representing special characters, insert the relevant string.
FranklinParser.md_grouper — Method md_grouper(blocks)
Form begin-end spans, keeping track of tokens, and group text and inline blocks after partition; this helps in forming paragraphs.
FranklinParser.next_chars — Function next_chars(o, n)
Return the n characters just after the object, or an empty vector if there aren't n characters available.
FranklinParser.next_index — Method next_index(o)
Return the index just after the object o.
FranklinParser.parent_string — Method parent_string(s)
Return the parent string corresponding to s; i.e. s itself if it is a String, or the parent string if s is a SubString. Returns a String.
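The accessors from, to and parent_string relate a SubString to its parent string. With Base Julia's SubString internals this can be pictured as follows (illustrative only; relying on these fields is an internal detail, and the package functions wrap this logic):

```julia
s  = "abcdef"
ss = SubString(s, 3, 5)      # "cde"
ss.string                    # the parent string, here "abcdef"
ss.offset + 1                # 3: index where the substring starts ("from")
ss.offset + ss.ncodeunits    # 5: index where the substring ends ("to")
```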
FranklinParser.partition — Method partition(s, tokenizer, blockifier, tokens; disable, postproc)
Go through a piece of text, either with an existing tokenization or an empty one; tokenize if needed with the given tokenizer, blockify with the given blockifier, and return a partition of the text into a vector of Blocks.
Keyword arguments
* disable: list of token names to ignore (e.g. if we want to allow math)
* postproc: postprocessing to apply
FranklinParser.prepare_text — Method prepare_text(blocks)
For a text block, replace the remaining tokens for special characters.
FranklinParser.prev_index — Method prev_index(o)
Return the index just before the object o.
FranklinParser.previous_chars — Function previous_chars(o, n)
Return the n characters just before the object, or an empty vector if there aren't n characters available.
FranklinParser.process_autolink_close_tokens! — Method process_autolink_close_tokens!(tokens)
Discard :AUTOLINK_CLOSE tokens that are preceded by a space.
FranklinParser.process_emphasis_tokens! — Method process_emphasis_tokens!(tokens)
Process emphasis token candidates and either take them or discard them if they don't look correct.
Rules:
- sTs with token T is discarded if s is a space
- xTs with token T is a valid CLOSE if x is a character and s a space
- sTx with token T is a valid OPEN if x is a character and s a space
- xTy with token T is a valid MIXED if x, y are characters
FranklinParser.process_header_tokens! — Method process_header_tokens!(tokens)
Discard header tokens that are not at the start of a line (ignoring leading whitespaces).
FranklinParser.process_line_return! — Method process_line_return!(blocks, tokens, i)
Process a line return followed by any number of whitespaces and one or more characters. Depending on these characters, it will lead to a different interpretation and an update of the token.
If the next non-space character(s) is/are:
- another line return --> interpreted as a paragraph break (double line skip)
- two of -, * or _ --> a hrule candidate that will need to be validated later
- one of *, +, -, etc. --> an item candidate
- | --> a table row candidate
- > --> a blockquote
We disambiguate the different cases based on the two characters after the whitespaces of the line return (the line return token captures [ ]*).
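The dispatch on the first characters after the whitespace can be sketched as follows (schematic only, with made-up symbol names; the real function updates tokens and blocks rather than returning a label):

```julia
# Classify the two characters following the line return's whitespace.
function classify_sketch(next2::AbstractString)
    @assert !isempty(next2)
    startswith(next2, "\n")         && return :paragraph_break
    next2 in ("--", "**", "__")     && return :hrule_candidate
    startswith(next2, ">")          && return :blockquote
    startswith(next2, "|")          && return :table_row
    first(next2) in ('*', '+', '-') && return :item_candidate
    return :text
end

classify_sketch("--")  # :hrule_candidate
classify_sketch("- ")  # :item_candidate
```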
FranklinParser.remove_inner! — Method remove_inner!(blocks)
Remove blocks which are part of larger blocks (these will get re-formed and re-processed at a later step).
FranklinParser.split_args — Method split_args(s)
Take a string like 'foo "bar baz" 1' and return a vector of strings split along whitespaces while preserving quoted substrings, i.e. ["foo", "\"bar baz\"", "1"].
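A regex-based sketch of this behaviour (not necessarily how split_args is implemented):

```julia
# Match either a double-quoted run or a run of non-space characters.
split_args_sketch(s::AbstractString) =
    String[m.match for m in eachmatch(r"\"[^\"]*\"|\S+", s)]

split_args_sketch("foo \"bar baz\" 1")  # ["foo", "\"bar baz\"", "1"]
```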
FranklinParser.subs — Method subs(...)
Facilitate taking a SubString of an AS. The bounds given are expected to be valid String indices. Returns a SubString.
FranklinParser.to — Method to(o)
Given a SubString ss, return a valid string index where the substring ends. If ss is a String, return the last index. Returns an Int.
FranklinParser.tokenizer_factory — Method tokenizer_factory(; templates, postproc)
Arguments:
* templates: a dictionary of matchers to find tokens.
* postproc: a function to apply on tokens after they've been found, e.g. to merge or filter them.
Returns:
A function that takes a string and returns a vector of tokens.
FranklinParser.until_next_line_return — Method until_next_line_return(o)
Return the substring following the object and until the next line return or the end of the string.
FranklinParser.until_previous_line_return — Method until_previous_line_return(o)
Return the substring preceding the object until the preceding line return, if any.