Using CombinedParsers


Printing

CombinedParsers uses AbstractTrees.jl for printing. Each tree node is printed with

  1. a colored, regex-like prefix,
  2. 🗄 sub-parsers shown as child branches,
  3. CombinedParsers.WrappedParser constructors displayed with pipe |> syntax.

The last line of the printout shows the inferred result type of the CombinedParser.

Printing is useful to understand the structure of regular expressions, while also learning CombinedParser syntax:

julia> p = trim(re"(?:a+c)*b")
🗄 Sequence[2]
├─ (?>[\h]*) CharIn |> Repeat |> Atomic
├─ 🗄 Sequence
│  ├─ 🗄* Sequence |> Repeat
│  │  ├─ a+  |> Repeat
│  │  └─ c 
│  └─ b 
└─ (?>[\h]*) CharIn |> Repeat |> Atomic
::Tuple{Vector{Tuple{Vector{Char}, Char}}, Char}

Parser templates


Base.match - Function
Base.match(parser::CombinedParser,sequence::AbstractString[, idx::Integer]; log=nothing)

Search for the first match of parser in sequence and return a ParseMatch object containing the match, or nothing if the match failed.

The optional idx argument specifies an index at which to start the search.

If log!==nothing, parser is transformed with log_names(p, log).

The matching substring can be retrieved by accessing m.match.


If parser isa CombinedParsers.Regexp.ParserWithCaptures, match behaves like a drop-in replacement for the equivalent match(::Regex, sequence):

julia> m = match(re"(?<a>so)+ (or)", "soso or")
ParseMatch("soso or", a="so", 2="or")

julia> m[:a]

julia> m[2]

julia> m.match, m.captures
("soso or", SubString{String}["so", "or"])
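For comparison, here is a minimal sketch of the same pattern with Julia's built-in Regex. It uses only Base and illustrates the interface (indexing by group name or position, m.match, m.captures) that the replacement behavior described above refers to.

```julia
# Base-only sketch: the built-in Regex counterpart of the example above.
m = match(r"(?<a>so)+ (or)", "soso or")

m[:a]       # named capture group `a`
m[2]        # second capture group, by position
m.match     # the whole matching substring
m.captures  # all capture groups, in order

# A failed match returns `nothing`.
no_match = match(r"(?<a>so)+ (or)", "xyz")
```

Because a failed match returns nothing, calling code usually checks the result before indexing into it.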


A CombinedParser comprises a pattern as well as transformation functions that produce a Julia value of its result_type from a match with get.

julia> m = match(trim(re"(?:a+c)*b"), "aacacb")

julia> get(m)
([(['a', 'a'], 'c'), (['a'], 'c')], 'b')

Defining transformations is detailed in the transformation section.

Base.get - Function
Base.get(parser::Assertion{MatchState, <:Assertion}, sequence, till, after, i, state)

Most assertions return the assertion parser as a result (AtStart, AtEnd, Always, Never, NegativeLookahead, NegativeLookbehind).


Get the result of a match.

julia> m = match(re"(?<a>so)+ (or)", "soso or")
ParseMatch("soso or", a="so", 2="or")

julia> get(m)
([('s', 'o'), ('s', 'o')], ' ', ('o', 'r'))

julia> m[2]

julia> m.match, m.captures
("soso or", SubString{String}["so", "or"])

Base.get(parser::PositiveLookbehind, sequence, till, after, i, state)

Get the result of a PositiveLookbehind.


The result is currently computed for the reversed sequence, so you might find it difficult to work with a lookbehind parser match. If you require this functionality, please open an issue for discussion.

Assertions do not consume input, so typically these input chars are parsed/mapped outside of the assertion.

julia> p = Sequence(!re"a+b", PositiveLookbehind(!re"a+b"))
🗄 Sequence
├─ 🗄 Sequence |> !
│  ├─ a+  |> Repeat
│  └─ b
└─ (?<=🗄) Sequence |> ! |> PositiveLookbehind
   ├─ b
   └─ a+  |> Repeat
::Tuple{SubString{String}, SubString{String}}

julia> p("aaab")
("aaab", "baaa")

Base.get(parser::Bytes{N,T}, sequence::Vector{UInt8})

The byte order (endianness) can be switched by mapping bswap over the parser:

julia> map(bswap, Bytes(2,UInt16))([0x16,0x11])

julia> Bytes(2,UInt16)([0x16,0x11])
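
As a Base-only illustration of the byte swap itself, the sketch below reads the same two bytes in both byte orders. It does not use CombinedParsers and makes no claim about what Bytes returns; ltoh and ntoh are Base functions that convert from little-endian and network (big-endian) order to the host's order.

```julia
# Base-only sketch: the two possible readings of the bytes 0x16, 0x11.
bytes = [0x16, 0x11]
raw = reinterpret(UInt16, bytes)[1]  # value in the host's byte order

little = ltoh(raw)  # bytes interpreted as little-endian: 0x1116
big = ntoh(raw)     # bytes interpreted as big-endian:    0x1611

# bswap reverses the byte order, converting one reading into the other.
swapped = bswap(little)
```
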

Base.get(parser::Transformation{<:Function}, a...)
Base.get(parser::Transformation{<:Type}, a...)

Calls parser.transform(get(parser.parser, a...)).

Base.get(parser::Transformation{<:IndexAt}, a...)


Base.parse - Function
parse(parser::CombinedParser, sequence[, idx=firstindex(sequence)[, till=lastindex(sequence)]]; log=nothing)

Parse sequence with parser at start and produce an instance of result_type(parser). If log!==nothing, parser is transformed with log_names(p, log) before matching.

tryparse(parser::CombinedParser, sequence[, idx=firstindex(sequence)[, till=lastindex(sequence)]]; log=nothing)

Returns either a result value, or nothing if sequence does not start with a match.

tryparse_pos(parser::CombinedParser, str::AbstractString[, idx=firstindex(str)[, till=lastindex(str)]]; log=nothing)

Returns either a tuple of the result value and the position after the match, or nothing if str does not start with a match.


julia> using TextParse

julia> p = ("Number: "*TextParse.Numeric(Int))[2]
🗄 Sequence[2]
├─ Number\:\ 
└─ <Int64>

julia> parse(p,"Number: 42")
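
The nothing-on-failure behavior of tryparse can be sketched as follows. This assumes that `using CombinedParsers.Regexp` brings the re"..." string macro into scope, as the examples on this page imply; the shape of the successful result is deliberately left unstated.

```julia
using CombinedParsers
using CombinedParsers.Regexp  # assumption: provides the re"..." macro

p = re"a+b"

# Per the docstring, tryparse returns `nothing` when the sequence
# does not start with a match.
failed = tryparse(p, "xyz")

# On success a result value of result_type(p) is returned.
ok = tryparse(p, "aab")
```

Checking `failed === nothing` is a way to branch on parse failure without exception handling.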

Iterating matches

CombinedParsers iterates through all matches if parsing is ambiguous. How to write custom parser match iterations is detailed in the internals section.

Base.iterate - Function
Base.iterate(x::ParseMatch[, m::ParseMatch=x])

Returns next ParseMatch at m.offset after m.state, see iterate_state(m).

Base.iterate(x::MatchesIterator[, s::ParseMatch=ParseMatch(x)])

Iterate match s at the current position. While no match is found and s.offset<=x.stop, s.offset is incremented to continue the search.

Returns the next ParseMatch (as both return value and state), or nothing when x.stop is reached.

CombinedParsers.parse_all - Function
parse_all(parser::CombinedParser, sequence, idx=1)

Returns an iterator over all parsings of the sequence offset at idx.