PCRE Regular expressions

You can use the PCRE-syntax string macro @re_str in combination with the constructors of CombinedParsers.

Constructing Regular expressions

CombinedParsers.Regexp.@re_str - Macro

Construct a ParserWithCaptures from PCRE regex syntax, such as re"^[a-z]*$", without interpolation and unescaping (except for quotation mark " which still has to be escaped). Plug-in replacement for PCRE string macro @r_str.

The regex also accepts one or more flags, listed after the ending quote, to change its behaviour:

  • i enables case-insensitive matching
  • m treats the ^ and $ tokens as matching the start and end of individual lines, as opposed to the whole string.
  • s allows the . modifier to match newlines.
  • x enables "comment mode": whitespace is ignored except when escaped with \, and # is treated as starting a comment.
  • a disables UCP mode (enables ASCII mode). By default \B, \b, \D, \d, \S, \s, \W, \w, etc. match based on Unicode character properties. With this option, these sequences only match ASCII characters.
  • xx enables "extended comment mode": whitespace in bracket character matchers is ignored.

julia> re"a|c"i
|🗄 Either
├─ [aA] ValueIn
└─ [cC] ValueIn

julia> re"a+c"
🗄 Sequence
├─ a+  |> Repeat
└─ c
::Tuple{Vector{Char}, Char}
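
Since @re_str is described as a plug-in replacement for @r_str, the flag semantics listed above can be checked against Base's Regex. The following is a minimal sketch that uses only Base Julia and does not load CombinedParsers; that the re"..." flags behave identically is an assumption here, not something this snippet verifies.

```julia
# Base Regex only; assumed (not verified here) to mirror the re"..." flags.

# i: case-insensitive matching
@assert occursin(r"a|c"i, "C")
@assert !occursin(r"a|c", "C")

# m: ^ and $ match at the start and end of individual lines
@assert occursin(r"^b$"m, "a\nb\nc")
@assert !occursin(r"^b$", "a\nb\nc")

# s: . also matches a newline
@assert occursin(r"a.b"s, "a\nb")
@assert !occursin(r"a.b", "a\nb")

# x: unescaped whitespace is ignored and # starts a comment
@assert occursin(r"a b # trailing comment"x, "ab")
@assert !occursin(r"a b", "ab")
```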

See also Regcomb, parse_options.

CombinedParsers.Regexp - Module

A regular expression parser transforming a PCRE string to a CombinedParser equivalent to the regular expression.

Base.getindex - Method

Gets capture i as SubString.

See API of RegexMatch.

Base.getproperty - Method

Enables m.captures and m.match.

See API of RegexMatch.
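
For reference, the RegexMatch API that these methods point to looks as follows in Base Julia. This sketch uses a Base Regex and a made-up input string; that a match produced by a CombinedParsers parser answers the same calls is what the docstrings above state, and it is not exercised here.

```julia
# Base RegexMatch API: indexing by capture number and property access.
m = match(r"(\d+)-(\w+)", "id 42-abc")

@assert m.match == "42-abc"            # the whole matched substring
@assert m[1] == "42"                   # capture 1 as a SubString
@assert m[2] == "abc"                  # capture 2
@assert m.captures == ["42", "abc"]    # all captures in order
@assert m.offset == 4                  # index where the match starts
```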

CombinedParsers.regex_escape - Function

Regular expression metacharacters are escaped, along with whitespace.

Compatibility & Unit Tests

CombinedParsers.Regexp.character_class - Constant

julia> CombinedParsers.Regexp.character_class
🗄 Sequence |> map(#57)
├─ \[\:
├─ |🗄 Either
│  ├─ alpha  => [\p{L}] ValueIn
│  ├─ lower  => [\p{Ll}] ValueIn
│  ├─ upper  => [\p{Lu}] ValueIn
│  ├─ word  => [\p{L}\p{Nl}\p{Nd}\p{Pc}] ValueIn
│  ├─ digit  => [\p{Nd}] ValueIn
│  ├─ xdigit  => [[:xdigit:]] ValueIn
│  ├─ alnum  => [\p{L}\p{N}] ValueIn
│  ├─ blank  => [\t\p{Zs}] ValueIn
│  ├─ cntrl  => [\p{Cc}] ValueIn
│  ├─ graph  => [^\p{Z}\p{C}] ValueNotIn
│  ├─ print  => [\p{C}] ValueIn
│  ├─ punct  => [\p{P}] ValueIn
│  └─ space  => [\r\v\n\f\t\p{Z}] ValueIn
└─ \:\]


By default, characters with values greater than 128 do not match any of the POSIX character classes. However, if the PCRE_UCP option is passed to pcre_compile(), some of the classes are changed so that Unicode character properties are used. This is achieved by replacing certain POSIX classes by other sequences, as follows:

  • [:alnum:] becomes \p{Xan}
  • [:alpha:] becomes \p{L}
  • [:blank:] becomes \h
  • [:digit:] becomes \p{Nd}
  • [:lower:] becomes \p{Ll}
  • [:space:] becomes \p{Xps}
  • [:upper:] becomes \p{Lu}
  • [:word:] becomes \p{Xwd}
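
The effect of the UCP replacements above can be observed with Base's Regex, which enables UCP by default and disables it with the a flag. This is a small sketch in Base Julia only; the sample characters are chosen for illustration, and the behaviour of CombinedParsers itself is not tested here.

```julia
# POSIX classes with UCP (Base default) and without it (a flag).
unicode_alpha = r"^[[:alpha:]]$"     # [:alpha:] acts like \p{L}
ascii_alpha   = r"^[[:alpha:]]$"a    # UCP disabled

@assert occursin(unicode_alpha, "λ")   # Greek small lambda is a letter
@assert !occursin(ascii_alpha, "λ")    # non-ASCII, so no POSIX class matches
@assert occursin(ascii_alpha, "a")

unicode_digit = r"^[[:digit:]]$"     # [:digit:] acts like \p{Nd}
ascii_digit   = r"^[[:digit:]]$"a

@assert occursin(unicode_digit, "٣")   # ARABIC-INDIC DIGIT THREE, category Nd
@assert !occursin(ascii_digit, "٣")
@assert occursin(ascii_digit, "3")
```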
Base.:== - Method

Equal if and only if the values of .match, .offset, .ncodeunits and .captures are equal.

CombinedParsers.Regexp.@pcre_tests - Macro

Define @syntax pcre_test and @syntax pcre_tests for parsing unit test output of the PCRE library. The parser is used for testing CombinedParser and benchmarking against Regex.



Parsing Options

PCRE options are supported.

CombinedParsers.Regexp.with_options - Function

Return x if iszero(flags), otherwise a StringWithOptions with flags.


Return x if iszero(flags), otherwise a CharWithOptions with flags.


Return with_options(parse_options(options),x), see parse_options.

with_options(set_flags::UInt32, unset_flags::UInt32,x)

Set options set_flags | ( x.flags & ~unset_flags ) if x isa WithOptions, set options set_flags otherwise.
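
The flag arithmetic in this rule can be traced with plain UInt32 values. The sketch below uses Base Julia only; combine_flags is a hypothetical helper written for this illustration (it is not part of CombinedParsers), and the mask values are the ones shown for CASELESS, MULTILINE and DOTALL in the option parser listing in this section.

```julia
# Mask values as printed by pcre_options_parser in this section.
const CASELESS  = 0x00000008
const MULTILINE = 0x00000400
const DOTALL    = 0x00000020

# Hypothetical helper mirroring the documented formula
# set_flags | (x.flags & ~unset_flags); not a CombinedParsers function.
combine_flags(set_flags::UInt32, unset_flags::UInt32, current::UInt32) =
    set_flags | (current & ~unset_flags)

current = CASELESS | MULTILINE                      # flags already present
updated = combine_flags(DOTALL, CASELESS, current)  # set s, unset i

@assert updated == (MULTILINE | DOTALL)
@assert updated == 0x00000420
```

Clearing happens before setting, so a flag named in both set_flags and unset_flags ends up set.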

CombinedParsers.Regexp.parse_options - Function

Return PCRE option mask parsed from options.

Parser for flags in @re_str.

julia> CombinedParsers.Regexp.pcre_options_parser
🗄 Sequence[2]
├─ ^ AtStart
├─ 🗄* Sequence[1] |> Repeat |> map(splat_or)
│  ├─ |🗄 Either
│  │  ├─ dupnames  => 0x00000040 |> with_name(:DUPNAMES)
│  │  ├─ xx  => 0x01000000 |> with_name(:EXTENDED_MORE)
│  │  ├─ i  => 0x00000008 |> with_name(:CASELESS)
│  │  ├─ m  => 0x00000400 |> with_name(:MULTILINE)
│  │  ├─ n  => 0x00002000 |> with_name(:NO_AUTO_CAPTURE)
│  │  ├─ U  => 0x00040000 |> with_name(:UNGREEDY)
│  │  ├─ J  => 0x00000040 |> with_name(:DUPNAMES)
│  │  ├─ s  => 0x00000020 |> with_name(:DOTALL)
│  │  ├─ x  => 0x00000080 |> with_name(:EXTENDED)
│  │  ├─ B  => 0x00000000 |> with_name(:BINCODE)
│  │  └─ I  => 0x00000000 |> with_name(:INFO)
│  └─ ,? |missing
└─ $ AtEnd
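
Read as a procedure, the parser above consumes option names one after another, each optionally followed by a comma, and combines their masks with bitwise or. The following is a rough sketch of that idea in Base Julia for a subset of the options. sketch_parse_options and OPTION_MASKS are hypothetical names made up for this illustration, and this is not how parse_options is implemented.

```julia
# Subset of the option table shown above. "xx" comes before "x"
# so that the longer name is tried first, as in the listing.
const OPTION_MASKS = [
    "xx" => 0x01000000,  # EXTENDED_MORE
    "i"  => 0x00000008,  # CASELESS
    "m"  => 0x00000400,  # MULTILINE
    "s"  => 0x00000020,  # DOTALL
    "x"  => 0x00000080,  # EXTENDED
]

# Hypothetical stand-in for illustration only.
function sketch_parse_options(options::AbstractString)
    mask = 0x00000000
    rest = String(options)
    while !isempty(rest)
        idx = findfirst(p -> startswith(rest, first(p)), OPTION_MASKS)
        idx === nothing && throw(ArgumentError("unknown option in \"$options\""))
        name, bits = OPTION_MASKS[idx]
        mask |= bits                                   # combine masks with or
        rest = rest[length(name)+1:end]                # drop the option name
        startswith(rest, ",") && (rest = rest[2:end])  # optional comma
    end
    return mask
end

@assert sketch_parse_options("im") == 0x00000408
@assert sketch_parse_options("xx,s") == 0x01000020
@assert sketch_parse_options("") == 0x00000000
```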
CombinedParsers.Regexp.StringWithOptions - Type

A lazy element transformation type (e.g. AbstractString), getindex wraps elements in with_options(flags,...).

With parsing options

TODO: make flags a transformation function?

CombinedParsers.Regexp.CharWithOptions - Type

A lazy element transformation type (e.g. AbstractString), getindex wraps elements in with_options(flags,...).

With parsing options

TODO: make flags a transformation function?

CombinedParsers.Regexp.on_options - Function

Create a parser that matches if flags are set in the sequence and parser matches.

Used for PCRE parsing, e.g.

           '^' => at_linestart),
    parser('^' => AtStart())
CombinedParsers.Regexp.FilterOptions - Type

Lazy wrapper for a sequence, masking elements in getindex with MatchingNever if any of flags are not set.

TODO: make flags a filter function? resolve confound of sequence and value, like StringWithOptions, CharWithOptions

Regular Expression Types

CombinedParsers.Regexp.SequenceWithCaptures - Type

SequenceWithCaptures encapsulates a sequence to be parsed, and parsed captures.

This struct allows captures to be kept as sequence-level state. For the next version, a match-level state passed as an iterate_state argument is being considered.

See also ParserWithCaptures

CombinedParsers.Regexp.Backreference - Type

Parser matching a previously captured sequence, optionally with a name. The index field is set recursively when calling ParserWithCaptures on the parser.

CombinedParsers.Regexp.Subroutine - Type

Parser matching a preceding capture, optionally with a name. The index field is set recursively when calling ParserWithCaptures on the parser.

CombinedParsers.Regexp.Conditional - Type

Conditional parser, iterate_state cycles conditionally on iterate_state_condition through matches in field yes and no respectively.

CombinedParsers.Regexp.DupSubpatternNumbers - Type

Parser wrapper for ParserWithCaptures, setting resetindex=true in `deepmapparser(::typeof(indexedcaptures),...)`.

julia> p = re"(?|(a)|(b))\1"
🗄 Sequence |> regular expression combinator with 1 capturing groups
├─ |🗄 Either |> DupSubpatternNumbers
│  ├─ (a)  |> Capture 1
│  └─ (b)  |> Capture 1
└─ \g{1} Backreference
::Tuple{Char, AbstractString}

julia> match(p, "aa")
ParseMatch("aa", 1="a")

julia> match(p, "bb")
ParseMatch("bb", 1="b")
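
For comparison, the same branch-reset pattern can be run through Base's Regex. This sketch uses Base Julia only and shows the PCRE behaviour that the example above reproduces: both alternatives write to capture 1, and the backreference must repeat whichever one matched.

```julia
# Branch reset group (?|...): both alternatives are numbered as capture 1.
r = r"(?|(a)|(b))\1"

ma = match(r, "aa")
@assert ma.match == "aa"
@assert ma[1] == "a"

mb = match(r, "bb")
@assert mb.match == "bb"
@assert mb[1] == "b"

# The backreference must repeat the captured text, so mixed input fails.
@assert match(r, "ab") === nothing
```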

See also pcre doc