Parquet2
A pure Julia implementation of the Apache Parquet format.
Installation
using Pkg; Pkg.add("Parquet2")
or, in the REPL ]add Parquet2
Basic Usage
using Parquet2, Tables, TableOperations, DataFrames
ds = Parquet2.Dataset("/path/to/file")
sch = Tables.schema(ds) # view table schema
t = Tables.columntable(ds) # load as a NamedTuple of columns
df = DataFrame(ds; copycols=false) # load entire dataset as a DataFrame
df1 = DataFrame(ds[1]; copycols=false) # load first RowGroup as a DataFrame
# load *only* columns (col1, col2) as a DataFrame
dfc = ds |> TableOperations.select(:col1, :col2) |> DataFrame
# understands other data sources with extensions (see docs)
s3ds = Parquet2.Dataset("s3://path/to/file")
# can load multi-file datasets
dsd = Parquet2.Dataset("/path/to/directory/")
# write a file
df = DataFrame(A=1:5, B=randn(5))
Parquet2.writefile("/path/to/file", df)
# write a file to S3
Parquet2.writefile("s3://path/to/file", df)
For more information please see the documentation.