NewlineLexers.jl

Quote-aware newline finder.

julia> using NewlineLexers

julia> data = collect(codeunits(""" abc\n "efg\n" \n """));

# Quote-unaware newline finder
julia> newlines = findall(==(UInt8('\n')), data)
3-element Vector{Int64}:
  5
 11
 14

#                                escape       open quote  close quote
julia> l = Lexer(IOBuffer(data), UInt8('\\'), UInt8('"'), UInt8('"'));

julia> out = Int32[];

# Doesn't include the newline that appears inside a string
julia> find_newlines!(l, data, out); # max size of `data` is 2GiB

julia> out
2-element Vector{Int32}:
  5
 14
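
For comparison, the behaviour shown above can be reproduced with a naive scalar loop. This is only a reference sketch for checking results, not how the package works internally: `naive_find_newlines` and its keyword arguments are hypothetical names, and it assumes that the escape byte is only meaningful inside a quoted region and that the escape byte differs from the closing quote.

```julia
# Naive scalar reference for quote-aware newline finding.
# Hypothetical helper for illustration; not part of NewlineLexers.jl.
function naive_find_newlines(data::AbstractVector{UInt8};
                             escape::UInt8=UInt8('\\'),
                             open_quote::UInt8=UInt8('"'),
                             close_quote::UInt8=UInt8('"'),
                             newline::UInt8=UInt8('\n'))
    out = Int32[]
    in_string = false
    i = firstindex(data)
    while i <= lastindex(data)
        b = data[i]
        if in_string
            if b == escape
                i += 1                 # assumption: skip the byte after an escape
            elseif b == close_quote
                in_string = false      # leave the quoted region
            end
        elseif b == open_quote
            in_string = true           # enter a quoted region
        elseif b == newline
            push!(out, Int32(i))       # newline outside any quoted region
        end
        i += 1
    end
    return out
end

data = collect(codeunits(""" abc\n "efg\n" \n """))
naive_find_newlines(data)  # returns Int32[5, 14]
```

A loop like this branches on every byte, which is the cost the vectorized lexer is designed to avoid.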

Acknowledgement

This package was heavily inspired by the simdjson library by Daniel Lemire, in particular by his branchless approach to finding escape characters, which we reused almost verbatim where applicable.
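
To give a rough idea of the branchless technique, here is a sketch of the escaped-character bitmask computation as found in simdjson, ported to Julia for a single block of up to 64 bytes. It is an illustration under stated assumptions, not this package's code: `byte_mask` and `escaped_mask` are hypothetical names, bit `k` of a mask stands for byte `k + 1` of the block, and carrying an escape across block boundaries is left out.

```julia
# Bitmask with bit k set when block[k + 1] == byte (block has at most 64 bytes).
function byte_mask(block::AbstractVector{UInt8}, byte::UInt8)
    mask = UInt64(0)
    for (k, b) in enumerate(block)
        mask |= UInt64(b == byte) << (k - 1)
    end
    return mask
end

# Given the backslash bitmask, return the bitmask of escaped bytes, i.e. bytes
# preceded by an odd-length run of backslashes. No branches, no loops.
# Assumption: no escape is carried in from a previous block.
function escaped_mask(backslash::UInt64)
    even_bits = 0x5555555555555555
    follows_escape = backslash << 1
    # Starts of backslash runs that begin on an odd bit position.
    odd_sequence_starts = backslash & ~even_bits & ~follows_escape
    # The (wrapping) addition carries through those runs and clears them.
    sequences_starting_on_even_bits = odd_sequence_starts + backslash
    invert_mask = sequences_starting_on_even_bits << 1
    # Every other byte in a run is escaped; flip parity for even-start runs.
    return (even_bits ⊻ invert_mask) & follows_escape
end

block = codeunits("x\\\"y\\\\\"")      # bytes: x \ " y \ \ "
backslash = byte_mask(block, UInt8('\\'))
quotes    = byte_mask(block, UInt8('"'))
escaped   = escaped_mask(backslash)
real_quotes = quotes & ~escaped        # only the final quote (bit 6) remains
```

In this example the first quote follows a single backslash and is masked out, while the last quote follows an escaped backslash pair and is kept.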

Note

To avoid codegen issues with PackageCompiler, we disable the use of AVX2/SSE3/CLMUL instructions unless the NEWLINELEXERS_NATIVE variable is set.