Parquet2
A pure Julia implementation of the Apache Parquet format.
Installation
using Pkg; Pkg.add("Parquet2")
or, in the REPL's Pkg mode: ]add Parquet2
Basic Usage
using Parquet2, Tables, TableOperations, DataFrames
ds = Parquet2.Dataset("/path/to/file")
sch = Tables.schema(ds) # view table schema
t = Tables.columntable(ds) # load as a NamedTuple of columns
df = DataFrame(ds; copycols=false) # load entire dataset as a DataFrame
df1 = DataFrame(ds[1]; copycols=false) # load first RowGroup as a DataFrame
# load *only* columns (col1, col2) as a DataFrame
dfc = ds |> TableOperations.select(:col1, :col2) |> DataFrame
using AWSS3 # for recognizing S3 URLs
s3ds = Parquet2.Dataset("s3://path/to/file")
# can load multi-file datasets
dsd = Parquet2.Dataset("/path/to/directory/")
# multi-file datasets don't load every row group by default
append!(dsd, A="1", B="alpha") # load the row groups matching these partition column values
# or read it all eagerly (WARNING! don't do this for gigantic directories)
dsd = Parquet2.Dataset("/path/to/directory/"; load_initial=true)
# write a file
df = DataFrame(A=1:5, B=randn(5))
Parquet2.writefile("/path/to/file", df)
# write a file to S3
Parquet2.writefile("s3://path/to/file", df)
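As a quick sanity check of the write and read calls above, the following is a minimal round-trip sketch. It uses only `Parquet2.writefile`, `Parquet2.Dataset`, and the `DataFrame` constructor shown earlier; the temporary directory and the file name `roundtrip.parquet` are assumptions made for illustration.

```julia
using Parquet2, DataFrames

# Hypothetical scratch location; any writable local path should work.
dir = mktempdir()
path = joinpath(dir, "roundtrip.parquet")

# Write a small table to disk.
df = DataFrame(A=1:5, B=randn(5))
Parquet2.writefile(path, df)

# Read it back through the Tables.jl interface.
ds = Parquet2.Dataset(path)
df2 = DataFrame(ds; copycols=false)

# The columns read back should match what was written.
@assert names(df2) == ["A", "B"]
@assert df2.A == df.A
@assert df2.B == df.B
```

A temporary directory keeps the example from overwriting real data; for S3 paths the same pattern should apply once AWSS3 is loaded.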
For more information, please see the documentation.