FranklinParser.ALPHANUM_ALL — Constant ALPHANUM_ALL
ALPHA_ALL plus the digits.
FranklinParser.ALPHANUM_LATIN — Constant ALPHANUM_LATIN
Convenience list of characters corresponding to a-zA-Z0-9.
FranklinParser.ALPHA_ALL — Constant ALPHA_ALL
The first 10_000 characters.
FranklinParser.ALPHA_LATIN — Constant ALPHA_LATIN
Convenience list of characters corresponding to letters a-zA-Z.
FranklinParser.CODE_LANG3_PAT — Constant CODE_LANG*_PAT
FranklinParser.END_OF_LINE — Constant END_OF_LINE
All tokens that indicate the end of a line.
FranklinParser.EOS — Constant EOS
Marks the end of the string to parse (helps with corner cases where a token ends a document without being followed by a space).
FranklinParser.F_DIV_OPEN — Constant F_DIV_OPEN
Finder for @@div, checking that div matches a simplified rule for allowed CSS class names. The complete rule is -?[_a-zA-Z]+[_a-zA-Z0-9-]*, which we simplify here to [a-zA-Z]+[_a-zA-Z0-9-]*, and we allow , for separation, so that @@d1,d2 is allowed and corresponds to a setting where we pass two classes: class="d1 d2".
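The simplified rule above can be illustrated with a small regex check. This is a hedged sketch, not the actual finder; the names DIV_CLASSES_SKETCH and isdivopen are made up for illustration.

```julia
# Sketch of the simplified class-name rule described above; the real
# finder in FranklinParser may differ in details.
const DIV_CLASSES_SKETCH = r"^@@([a-zA-Z][a-zA-Z0-9_-]*(?:,[a-zA-Z][a-zA-Z0-9_-]*)*)$"

isdivopen(s::AbstractString) = match(DIV_CLASSES_SKETCH, s) !== nothing

isdivopen("@@d1,d2")  # two classes, "d1" and "d2": allowed
isdivopen("@@-bad")   # a class must start with a letter: rejected
```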
FranklinParser.F_EMOJI — Constant F_EMOJI
Finder for emojis (those will have to be validated separately to check that Julia recognises them).
FranklinParser.F_FOOTNOTE — Constant F_FOOTNOTE
Finder for footnotes.
FranklinParser.F_HTML_ENTITY — Constant F_HTML_ENTITY
Finder for HTML entities.
FranklinParser.F_LANG_3 — Constant F_LANG_*
Finder for code blocks, i.e. a sequence of 3, 4 or 5 backticks followed by a valid combination of letters defining a language.
FranklinParser.F_LINE_RETURN — Constant F_LINE_RETURN
Finder for a line return (\n) followed by any number of whitespaces or tabs. These will subsequently be checked to see whether they are followed by something that constitutes a list item or not.
FranklinParser.F_LX_COMMAND — Constant F_LX_COMMAND
Finder for a LaTeX command. The first character is [a-zA-Z]. We do allow numbers (there is no ambiguity because \com1 is not allowed to mean \com{1}, unlike in LaTeX). Underscores are allowed inside the command but not at the very start or very end, to avoid confusion with the escaped _ character and with markdown emphasis respectively. * is not allowed anywhere (including at the end). See also the check pattern.
FranklinParser.HR1_PAT — Constant HR*_PAT
Pattern to match horizontal rule indicators.
FranklinParser.HTML_ENTITY_PAT — Constant HTML_ENTITY_PAT
Pattern for an HTML entity. Ref: https://dev.w3.org/html5/html-author/charref.
Examples (all rendering as ⊓):
- &sqcap;
- &SquareIntersection;
- &#x02293;
- &#8851;
Note: the longest entity is &CounterClockwiseContourIntegral; (∳) so the max number of characters is capped at 32.
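A hypothetical regex in the same spirit (the constant name below is made up, not the package's HTML_ENTITY_PAT): named entities with the name capped at 32 characters, plus decimal and hexadecimal numeric forms.

```julia
# Illustrative only: named (capped at 32 chars), decimal and hex entities.
const ENTITY_SKETCH = r"&(?:[a-zA-Z][a-zA-Z0-9]{1,31};|#[0-9]{1,7};|#x[0-9a-fA-F]{1,6};)"

occursin(ENTITY_SKETCH, "&sqcap;")   # named form
occursin(ENTITY_SKETCH, "&#8851;")   # decimal form
occursin(ENTITY_SKETCH, "&#x2293;")  # hexadecimal form
```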
FranklinParser.HTML_TOKENS — Constant HTML_TOKENS
Dictionary of tokens for HTML. See also MD_TOKENS.
FranklinParser.LEADING_WHITESPACE_PAT — Constant *_WHITESPACE_PAT
Pattern to match the whitespaces (tabs or spaces) at the start of a line; see dedent.
FranklinParser.LX_COMMAND_PAT — Constant LX_COMMAND_PAT
Allowed LaTeX command name. Underscores are allowed inside the command but not at the extremities. The star * is not allowed anywhere.
Examples:
- \com (valid)
- \ab1_cd* (invalid: ends with *)
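The rule can be sketched as a regex (a hypothetical equivalent, not the actual LX_COMMAND_PAT): letters and digits, underscores allowed only strictly inside the name, no * anywhere.

```julia
# Hedged sketch of the command-name rule described above.
const LXCOM_SKETCH = r"^\\[a-zA-Z][a-zA-Z0-9]*(?:_[a-zA-Z0-9]+)*$"

iscommand(s::AbstractString) = match(LXCOM_SKETCH, s) !== nothing

iscommand(raw"\com")      # valid
iscommand(raw"\ab1_cd")   # valid, underscore strictly inside
iscommand(raw"\_com")     # invalid, leading underscore
iscommand(raw"\ab1_cd*")  # invalid, * is not allowed
```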
FranklinParser.MD_HEADERS — Constant MD_HEADERS
Tokens for headers.
FranklinParser.MD_IGNORE — Constant MD_IGNORE
Tokens that may be left over after partition but should be ignored in text blocks.
FranklinParser.MD_MATH_TOKENS — Constant MD_MATH_TOKENS
Tokens that should be considered within a math environment.
FranklinParser.MD_TOKENS — Constant MD_TOKENS
Dictionary of tokens for Markdown. Note that for each, there may be several possibilities to consider, in which case the order is important: the first case that works will be taken.
Dev: F* are greedy match, see `mdutils.jl`.
Try: https://spec.commonmark.org/dingus
FranklinParser.NUM_CHAR — Constant NUM_CHAR
Convenience list of characters corresponding to digits.
FranklinParser.SPACE_CHAR — Constant SPACE_CHARS
List of characters that correspond to a \s regex, plus EOS.
Ref: https://github.com/JuliaLang/julia/blob/master/base/strings/unicode.jl.
FranklinParser.AbstractSpan — Type AbstractSpan
Section of a parent String with a specific meaning for Franklin. All subtypes of AbstractBlock must have an ss field corresponding to the substring associated with the block. This field is necessarily of type SubString{String}.
FranklinParser.Block — Type Block <: AbstractSpan
Blocks are defined by an opening and a closing Token, and they may be nested. For instance, a braces block is formed of an opening { and a closing }.
FranklinParser.BlockTemplate — Type BlockTemplate
Template for a block to find. A block goes from a token with a given opening name to one of several possible closing names. Blocks can allow or disallow nesting. For instance brace blocks can be nested ({.{.}.}) but comments cannot. When nesting is enabled, Franklin will try to find the closing token taking into account the balance of opening and closing tokens.
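The balance-based search described above can be sketched as follows. This is a simplified illustration with made-up names, not the package's implementation.

```julia
# Scan forward from an opening token; +1 on each opener, -1 on each closer,
# and the block closes where the balance first returns to zero.
function find_closing_sketch(tokens::Vector{Symbol}, i::Int)
    balance = 1
    for j in i+1:length(tokens)
        tokens[j] === :open  && (balance += 1)
        tokens[j] === :close && (balance -= 1)
        balance == 0 && return j
    end
    return nothing  # left open: the real parser raises an exception here
end

find_closing_sketch([:open, :open, :close, :close], 1)  # 4
```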
FranklinParser.Chomp — Type Chomp
Structure to encapsulate rules around a token, such as whether it's fine at the end of a string, what the allowed following characters are and, in the greedy case, what characters are allowed.
FranklinParser.Group — Type Group <: AbstractSpan
A Group contains one or more Blocks and will map to either a Paragraph or something else such as a code block.
FranklinParser.Token — Type Token <: AbstractSpan
A token is a subtype of AbstractSpan which typically determines the start or end of a block. It can also be used for special characters.
FranklinParser.TokenFinder — Type TokenFinder
Structure to find a token, keeping track of how many characters should be seen, some rules with respect to positioning or following chars (see Chomp), and possibly a validator that checks whether a candidate respects a rule.
FranklinParser.TextBlock — Function TextBlock
Spans of text which should be left to the fallback engine (such as CommonMark). Text blocks can also have inner tokens that are non-block delimiters, such as emojis or HTML entities.
FranklinParser._error_message — Method _error_message(title, body, ss)
Helper function to write an informative error message.
FranklinParser._find_blocks! — Function _find_blocks!(...)
Helper function to resolve each of the passes looking at a different set of templates.
FranklinParser._hrule! — Method _hrule!(blocks, token, regex)
Helper function to match and process a hrule.
FranklinParser.aggregate! — Method aggregate!(blocks, items, acc, case)
Merge a set of blocks into one parent block. For instance, at this point there may be multiple BLOCKQUOTE_LINE blocks; this function will aggregate them into one BLOCKQUOTE block.
Arguments
* blocks: the current vector of blocks we're working with
* items: list of names of blocks that would trigger the aggregation
* acc: list of names of blocks that would be taken in the aggregation
* case: name of the block resulting from the aggregation
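Schematically, the blockquote example above looks like this (a toy version with made-up lowercase symbols; the real function works on Block objects, not symbols):

```julia
# Consecutive :blockquote_line entries collapse into a single :blockquote.
function aggregate_sketch(names::Vector{Symbol})
    out = Symbol[]
    for n in names
        if n === :blockquote_line
            # absorbed into an already-open blockquote, else open one
            (!isempty(out) && out[end] === :blockquote) || push!(out, :blockquote)
        else
            push!(out, n)
        end
    end
    return out
end

aggregate_sketch([:text, :blockquote_line, :blockquote_line, :text])
# [:text, :blockquote, :text]
```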
FranklinParser.block_not_closed_exception — Method block_not_closed_exception(ot)
Throw a FranklinParserException caused by a block opened by a token ot and left open.
FranklinParser.check — Method check(tokenfinder, ss)
Check whether a substring verifies the regex of a token finder.
FranklinParser.content — Method content(block)
Return the content of a Block; for instance the content of a {...} block would be .... Note: EOS is a special '0 length' case to deal with the fact that a text can end with a token (which would then be an overlapping token and an EOS).
FranklinParser.dedent — Method dedent(s)
Remove the common leading whitespace from each non-empty line. The returned text is decoupled from the original text (forced to String).
This is used in the context of lxdef in Franklin, for instance; see tryformlxdef.
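The behaviour can be illustrated as follows. This is a simplified sketch assuming space indentation; the package's dedent may handle more cases (tabs in particular).

```julia
# Remove the common leading whitespace of the non-empty lines.
function dedent_sketch(s::AbstractString)
    lines = split(s, '\n')
    pads  = [length(l) - length(lstrip(l)) for l in lines if !isempty(strip(l))]
    cut   = isempty(pads) ? 0 : minimum(pads)
    return join((isempty(strip(l)) ? "" : l[cut+1:end] for l in lines), '\n')
end

dedent_sketch("    a\n      b")  # "a\n  b"
```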
FranklinParser.env_not_closed_exception — Method env_not_closed_exception(b)
Throw a FranklinParserException caused by an environment opened by a block b and either not formed properly or left open.
FranklinParser.find_blocks — Method find_blocks(tokens, templates)
Given a list of tokens and a dictionary of block templates, find all blocks matching the templates. The blocks are sorted by order of appearance and inner blocks are weeded out.
FranklinParser.find_tokens — Method find_tokens(s, templates)
Go through a text left to right, one (valid) char at a time, and keep track of sequences of chars that match specific tokens. The list of tokens found is returned.
Arguments
* s: the initial text
* templates: dictionary of possible tokens
Errors
This should not throw any error; everything should be explicitly handled by a code path.
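Very schematically, the left-to-right scan can be pictured like this (a toy version with made-up names; the real templates are TokenFinder objects with lookaheads, not fixed strings):

```julia
# For each character, check whether a template keyed on that character
# matches the text starting there; record (index, match) pairs.
function scan_sketch(s::String, templates::Dict{Char,String})
    found = Tuple{Int,String}[]
    for (i, c) in pairs(s)
        t = get(templates, c, nothing)
        t !== nothing && startswith(SubString(s, i), t) && push!(found, (i, t))
    end
    return found
end

scan_sketch("a {b} c", Dict('{' => "{", '}' => "}"))  # [(3, "{"), (5, "}")]
```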
FranklinParser.fixed_lookahead — Method fixed_lookahead(tokenfinder, candidate, at_eos)
Applies a fixed lookahead step corresponding to a token finder. This is used as a helper function in find_tokens.
FranklinParser.form_dbb! — Method form_dbb!(blocks)
Find CU_BRACKETS blocks that start with {{ and end with }} and mark them as :DBB.
FranklinParser.form_links! — Method form_links!(blocks)
Here we catch the following:
* [A] LINK_A for <a href="ref(A)">html(A)</a>
* [A][B] LINK_AR for <a href="ref(B)">html(A)</a>
* [A](B) LINK_AB for <a href="escape(B)">html(A)</a>
* ![A] IMG_A for <img src="ref(A)" alt="esc(A)" />
* ![A](B) IMG_AB for <img src="escape(B)" alt="esc(A)" />
* [A]: B REF (--> aggregate B, will need to distinguish later)
where 'A' is necessarily non-empty and 'B' may be empty.
Note: for simplicity we currently DO NOT support links with titles such as the following:
- [A]: B C
- A
This allows us not to have to check whether B is a link and C is text. If the user wants links with titles, they should create a command for it. We also do not support link destinations between <...>.
Note: in the case of a LINK_A, we check whether the previous non-whitespace character and the next non-whitespace character happen to be } and {. In that specific case, the link is
FranklinParser.forward_match — Function forward_match(refstring, next_chars, is_followed)
Return a TokenFinder corresponding to a forward lookup checking whether a sequence of characters matches a refstring and is followed (or, if is_followed==false, not followed) by a char out of a list of chars (next_chars).
FranklinParser.from — Method from(o)
Given a SubString ss, return a valid string index where the substring starts. If ss is a String, return 1. Returns an Int.
FranklinParser.get_classes — Method get_classes(divblock)
Return the class(es) of a div block. E.g. @@c1,c2 will return "c1 c2" so that it can be injected in a <div class="...">.
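In effect this boils down to stripping the @@ marker and replacing commas with spaces; a hypothetical one-liner (not the actual implementation):

```julia
# Turn "@@c1,c2" into "c1 c2" for injection in a class attribute.
classes_sketch(div::AbstractString) = replace(strip(div, '@'), ',' => ' ')

classes_sketch("@@c1,c2")  # "c1 c2"
```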
FranklinParser.greedy_lookahead — Method greedy_lookahead(tokenfinder, nchars, probe_char)
Applies a greedy lookahead step corresponding to a token finder. This is used as a helper function in find_tokens.
FranklinParser.greedy_match — Method greedy_match(head_chars, tail_chars, check)
Lazily accept the next char and stop as soon as it fails to verify λ(c).
FranklinParser.insert — Method insert(token)
For tokens representing special characters, insert the relevant string.
FranklinParser.md_grouper — Method md_grouper(blocks)
Form begin-end spans, keeping track of tokens, and group text and inline blocks after partition; this helps in forming paragraphs.
FranklinParser.next_chars — Function next_chars(o, n)
Return the n characters just after the object, or an empty vector if there aren't n characters available.
FranklinParser.next_index — Method next_index(o)
Return the index just after the object o.
FranklinParser.parent_string — Method parent_string(s)
Return the parent string corresponding to s; i.e. s itself if it is a String, or the parent string if s is a SubString. Returns a String.
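The accessors from, to and parent_string relate a SubString to its parent string. With Base Julia's SubString internals this can be pictured as follows (illustrative only; relying on these fields is an internal detail, and the package functions wrap this logic):

```julia
s  = "abcdef"
ss = SubString(s, 3, 5)      # "cde"
ss.string                    # the parent string, here "abcdef"
ss.offset + 1                # 3: index where the substring starts ("from")
ss.offset + ss.ncodeunits    # 5: index where the substring ends ("to")
```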
FranklinParser.partition — Method partition(s, tokenizer, blockifier, tokens; disable, postproc)
Go through a piece of text, either with an existing tokenization or an empty one; tokenize if needed with the given tokenizer, blockify with the given blockifier, and return a partition of the text into a vector of Blocks.
Keyword arguments
* disable: list of token names to ignore (e.g. if we want to allow math)
* postproc: postprocessing to apply
FranklinParser.prepare_text — Method prepare_text(blocks)
For a text block, replace the remaining tokens for special characters.
FranklinParser.prev_index — Method prev_index(o)
Return the index just before the object o.
FranklinParser.previous_chars — Function previous_chars(o, n)
Return the n characters just before the object, or an empty vector if there aren't n characters available.
FranklinParser.process_autolink_close_tokens! — Method process_autolink_close_tokens!(tokens)
Discard :AUTOLINK_CLOSE tokens that are preceded by a space.
FranklinParser.process_emphasis_tokens! — Method process_emphasis_tokens!(tokens)
Process emphasis token candidates and either take them or discard them if they don't look correct.
Rules:
- sTs with token T is discarded if s is a space
- xTs with token T is a valid CLOSE if x is a character and s a space
- sTx with token T is a valid OPEN if x is a character and s a space
- xTy with token T is a valid MIXED if x, y are characters
FranklinParser.process_header_tokens! — Method process_header_tokens!(tokens)
Discard header tokens that are not at the start of a line (ignoring leading whitespaces).
FranklinParser.process_line_return! — Method process_line_return!(blocks, tokens, i)
Process a line return followed by any number of whitespaces and one or more characters. Depending on these characters, it will lead to a different interpretation and an update of the token.
If the next non-space character(s) is/are:
- another line return --> interpreted as a paragraph break (double line skip)
- two of -, * or _ --> a hrule candidate that will need to be validated later
- one of *, +, -, etc. --> an item candidate
- | --> a table row candidate
- > --> a blockquote
We disambiguate the different cases based on the two characters after the whitespaces of the line return (the line return token captures [ ]*).
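The dispatch on the first characters after the whitespace can be sketched as follows (schematic only, with made-up symbol names; the real function updates tokens and blocks rather than returning a label):

```julia
# Classify the two characters following the line return's whitespace.
function classify_sketch(next2::AbstractString)
    @assert !isempty(next2)
    startswith(next2, "\n")         && return :paragraph_break
    next2 in ("--", "**", "__")     && return :hrule_candidate
    startswith(next2, ">")          && return :blockquote
    startswith(next2, "|")          && return :table_row
    first(next2) in ('*', '+', '-') && return :item_candidate
    return :text
end

classify_sketch("--")  # :hrule_candidate
classify_sketch("- ")  # :item_candidate
```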
FranklinParser.remove_inner! — Method remove_inner!(blocks)
Remove blocks which are part of larger blocks (these will get re-formed and re-processed at a later step).
FranklinParser.split_args — Method split_args(s)
Take a string like 'foo "bar baz" 1' and return a vector of strings split along whitespaces while preserving quoted substrings, i.e. ["foo", "\"bar baz\"", "1"].
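A regex-based sketch of this behaviour (not necessarily how split_args is implemented):

```julia
# Match either a double-quoted run or a run of non-space characters.
split_args_sketch(s::AbstractString) =
    String[m.match for m in eachmatch(r"\"[^\"]*\"|\S+", s)]

split_args_sketch("foo \"bar baz\" 1")  # ["foo", "\"bar baz\"", "1"]
```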
FranklinParser.subs — Method subs(...)
Facilitate taking a SubString of an AS. The bounds given are expected to be valid String indices. Returns a SubString.
FranklinParser.to — Method to(o)
Given a SubString ss, return a valid string index where the substring ends. If ss is a String, return the last index. Returns an Int.
FranklinParser.tokenizer_factory — Method tokenizer_factory(; templates, postproc)
Arguments:
* templates: a dictionary of matchers to find tokens.
* postproc: a function to apply on tokens after they've been found, e.g. to merge or filter them.
Returns:
A function that takes a string and returns a vector of tokens.
FranklinParser.until_next_line_return — Method until_next_line_return(o)
Return the substring following the object and until the next line return or the end of the string.
FranklinParser.until_previous_line_return — Method until_previous_line_return(o)
Return the substring preceding the object until the preceding line return, if any.