CSV.jl Documentation
CSV.jl is built to be a fast and flexible pure-Julia library for handling delimited text files.
- CSV.jl Documentation
- Key Functions
- Examples
- Basic
- Auto-Delimiter Detection
- String Delimiter
- No Header
- Normalize Column Names
- Datarow
- Reading Chunks
- Transposed Data
- Commented Rows
- Missing Strings
- Fixed Width Files
- Quoted & Escaped Fields
- DateFormat
- Custom Decimal Separator
- Custom Bool Strings
- Matrix-like Data
- Providing Types
- Typemap
- Pooled Values
- Select/Drop Columns From File
- Reading CSV from gzip (.gz) and zip files
Key Functions
CSV.File — Type
CSV.File(source; kwargs...) => CSV.File
Read a UTF-8 CSV input (a filename given as a String or FilePaths.jl type, or any other IO source), returning a CSV.File object.
Opens the file and uses the passed arguments to detect the number of columns and column types, unless column types are provided manually via the types keyword argument. Note that passing column types manually can increase performance and reduce memory use for each column type provided (column types can be given as a Vector for all columns, or specified per column via name or index in a Dict). For text encodings other than UTF-8, see the StringEncodings.jl package for re-encoding a file or IO stream. The returned CSV.File object supports the Tables.jl interface and can iterate CSV.Row values. CSV.Row supports propertynames and getproperty to access individual row values. CSV.File also supports entire-column access, like a DataFrame, via direct property access on the file object, e.g. f = CSV.File(file); f.col1. Note that duplicate column names will be detected and adjusted to ensure uniqueness (a duplicate column name a will become a_1). For example, one could iterate over a csv file with column names a, b, and c by doing:
for row in CSV.File(file)
    println("a=$(row.a), b=$(row.b), c=$(row.c)")
end
By supporting the Tables.jl interface, a CSV.File can also be a table input to any other table sink function, for example:
# materialize a csv file as a DataFrame, without copying columns from CSV.File; these columns are read-only
df = CSV.File(file) |> DataFrame!
# load a csv file directly into an sqlite database table
db = SQLite.DB()
tbl = CSV.File(file) |> SQLite.load!(db, "sqlite_table")
Supported keyword arguments include:
- File layout options:
  - header=1: the header argument can be an Int, indicating the row to parse for column names; or a Range, indicating a span of rows to be concatenated together as column names; or an entire Vector{Symbol} or Vector{String} to use as column names; if a file doesn't have column names, either provide them as a Vector, or set header=0 or header=false and column names will be auto-generated (Column1, Column2, etc.)
  - normalizenames=false: whether column names should be "normalized" into valid Julia identifier symbols; useful when iterating rows and accessing column values of a row via getproperty (e.g. row.col1)
  - datarow: an Int argument to specify the row where the data starts in the csv file; by default, the next row after the header row is used. If header=0, then the 1st row is assumed to be the start of data
  - skipto::Int: similar to datarow, specifies the number of rows to skip before starting to read data
  - footerskip::Int: number of rows at the end of a file to skip parsing
  - limit: an Int to indicate a limited number of rows to parse in a csv file; use in combination with skipto to read a specific, contiguous chunk within a file
  - transpose::Bool: read a csv file "transposed", i.e. each column is parsed as a row
  - comment: rows that begin with this String will be skipped while parsing
  - use_mmap::Bool=!Sys.iswindows(): whether the file should be mmapped for reading, which in some cases can be faster
  - ignoreemptylines::Bool=false: whether empty rows/lines in a file should be ignored (if false, each column will be assigned missing for that empty row)
  - threaded::Bool: whether parsing should utilize multiple threads; by default threads are used on large enough files, but multi-threading isn't allowed when transpose=true or when limit is used; only available in Julia 1.3+
  - select: an AbstractVector of Int, Symbol, String, or Bool, or a "selector" function of the form (i, name) -> keep::Bool; only columns in the collection, or for which the selector function returns true, will be parsed and accessible in the resulting CSV.File. Invalid values in select are ignored.
  - drop: inverse of select; an AbstractVector of Int, Symbol, String, or Bool, or a "drop" function of the form (i, name) -> drop::Bool; columns in the collection, or for which the drop function returns true, will be ignored in the resulting CSV.File. Invalid values in drop are ignored.
- Parsing options:
  - missingstrings, missingstring: either a String, or a Vector{String}, to use as sentinel values that will be parsed as missing; by default, only an empty field (two consecutive delimiters) is considered missing
  - delim=',': a Char or String that indicates how columns are delimited in a file; if no argument is provided, parsing will try to detect the most consistent delimiter on the first 10 rows of the file
  - ignorerepeated::Bool=false: whether repeated (consecutive) delimiters should be ignored while parsing; useful for fixed-width files with delimiter padding between cells
  - quotechar='"', openquotechar, closequotechar: a Char (or different start and end characters) that indicates a quoted field which may contain textual delimiters or newline characters
  - escapechar='"': the Char used to escape quote characters in a quoted field
  - dateformat::Union{String, Dates.DateFormat, Nothing}: a date format string to indicate how Date/DateTime columns are formatted for the entire file
  - decimal='.': a Char indicating how decimals are separated in floats, i.e. 3.14 uses '.', while 3,14 uses a comma ','
  - truestrings, falsestrings: Vectors of Strings that indicate how true or false values are represented; by default, only true and false are treated as Bool
- Column Type Options:
  - type: a single type to use for parsing an entire file; i.e. all columns will be treated as the same type; useful for matrix-like data files
  - types: a Vector or Dict of types to be used for column types; a Dict can map a column index Int, or a column name Symbol or String, to a type, e.g. Dict(1=>Float64) sets the first column to Float64, while Dict(:column1=>Float64) and Dict("column1"=>Float64) set the column named column1 to Float64; if a Vector is provided, it must match the number of columns provided or detected in header
  - typemap::Dict{Type, Type}: a mapping of a type that should be replaced in every instance with another type, i.e. Dict(Float64=>String) would change every detected Float64 column to be parsed as String
  - pool::Union{Bool, Float64}=0.1: if true, all columns detected as String will be internally pooled; alternatively, the proportion of unique values below which String columns should be pooled (by default 0.1, meaning that if the number of unique strings in a column is under 10%, it will be pooled)
  - categorical::Bool=false: whether pooled columns should be copied as CategoricalArray instead of PooledArray; note that in CSV.read, by default, columns are not copied, so pooled columns will have type CSV.Column{String, PooledString}; to get CategoricalArray columns, also pass copycols=true
  - strict::Bool=false: whether invalid values should throw a parsing error or be replaced with missing
  - silencewarnings::Bool=false: if strict=false, whether invalid-value warnings should be silenced
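As a quick illustration, here is a minimal sketch combining a few of these keyword arguments; the file name and column names (data.csv, id, score) are hypothetical:
using CSV
f = CSV.File("data.csv";            # hypothetical file path
    header=1,                       # column names are on row 1
    datarow=3,                      # actual data begins on row 3
    types=Dict(:id => String),      # force the (assumed) id column to String
    select=[:id, :score])           # only parse these two (assumed) columns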
CSV.read — Function
CSV.read(source; copycols::Bool=false, kwargs...) => DataFrame
Parses a delimited file into a DataFrame. copycols determines whether a copy of columns should be made when creating the DataFrame; by default, no copy is made, and the DataFrame is built with immutable, read-only CSV.Column vectors. If mutable operations are needed on the DataFrame columns, set copycols=true.
CSV.read supports the same keyword arguments as CSV.File.
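For example, a minimal sketch; the file name data.csv and column name col1 are placeholders:
using CSV, DataFrames
df = CSV.read("data.csv")                  # default: read-only CSV.Column vectors
df = CSV.read("data.csv"; copycols=true)   # columns are copied and mutable
df.col1[1] = 0                             # mutation like this requires copycols=true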
CSV.Rows — Type
CSV.Rows(source; kwargs...) => CSV.Rows
Read a csv input (a filename given as a String or FilePaths.jl type, or any other IO source), returning a CSV.Rows object.
While similar to CSV.File, CSV.Rows provides a slightly different interface, with the following tradeoffs:
- Very minimal memory footprint; while iterating, only the current row values are buffered
- Only provides row access via iteration; to access columns, one can stream the rows into a table type
- Performs no type inference; each column/cell is essentially treated as Union{String, Missing}; users can utilize the performant Parsers.parse(T, str) to convert values to a more specific type if needed
Opens the file and uses the passed arguments to detect the number of columns, but not column types. The returned CSV.Rows object supports the Tables.jl interface and can iterate rows. Each row object supports propertynames, getproperty, and getindex to access individual row values. Note that duplicate column names will be detected and adjusted to ensure uniqueness (a duplicate column name a will become a_1). For example, one could iterate over a csv file with column names a, b, and c by doing:
for row in CSV.Rows(file)
    println("a=$(row.a), b=$(row.b), c=$(row.c)")
end
Supported keyword arguments include:
- File layout options:
  - header=1: the header argument can be an Int, indicating the row to parse for column names; or a Range, indicating a span of rows to be concatenated together as column names; or an entire Vector{Symbol} or Vector{String} to use as column names; if a file doesn't have column names, either provide them as a Vector, or set header=0 or header=false and column names will be auto-generated (Column1, Column2, etc.)
  - normalizenames=false: whether column names should be "normalized" into valid Julia identifier symbols; useful when iterating rows and accessing column values of a row via getproperty (e.g. row.col1)
  - datarow: an Int argument to specify the row where the data starts in the csv file; by default, the next row after the header row is used. If header=0, then the 1st row is assumed to be the start of data
  - skipto::Int: similar to datarow, specifies the number of rows to skip before starting to read data
  - limit: an Int to indicate a limited number of rows to parse in a csv file; use in combination with skipto to read a specific, contiguous chunk within a file
  - transpose::Bool: read a csv file "transposed", i.e. each column is parsed as a row
  - comment: rows that begin with this String will be skipped while parsing
  - use_mmap::Bool=!Sys.iswindows(): whether the file should be mmapped for reading, which in some cases can be faster
  - ignoreemptylines::Bool=false: whether empty rows/lines in a file should be ignored (if false, each column will be assigned missing for that empty row)
- Parsing options:
  - missingstrings, missingstring: either a String, or a Vector{String}, to use as sentinel values that will be parsed as missing; by default, only an empty field (two consecutive delimiters) is considered missing
  - delim=',': a Char or String that indicates how columns are delimited in a file; if no argument is provided, parsing will try to detect the most consistent delimiter on the first 10 rows of the file
  - ignorerepeated::Bool=false: whether repeated (consecutive) delimiters should be ignored while parsing; useful for fixed-width files with delimiter padding between cells
  - quotechar='"', openquotechar, closequotechar: a Char (or different start and end characters) that indicates a quoted field which may contain textual delimiters or newline characters
  - escapechar='"': the Char used to escape quote characters in a quoted field
  - strict::Bool=false: whether invalid values should throw a parsing error or be replaced with missing
  - silencewarnings::Bool=false: if strict=false, whether warnings should be silenced
- Iteration options:
  - reusebuffer=false: while iterating, whether a single row buffer should be allocated and reused on each iteration; only use if each row will be iterated once and not re-used (e.g. it's not safe to use this option if doing collect(CSV.Rows(file)))
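As a sketch of typical usage, the following streams rows with a reused buffer and converts cells manually with Parsers.parse; file and the column a come from the iteration example above, and treating that column as Int is an assumption for illustration:
using CSV, Parsers
function sum_a(file)
    total = 0
    for row in CSV.Rows(file; reusebuffer=true)  # safe: each row is consumed immediately
        v = row.a                                # cells are Union{String, Missing}
        v === missing && continue                # skip empty fields
        total += Parsers.parse(Int, v)           # explicit conversion from String
    end
    return total
end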
CSV.write — Function
CSV.write(file, table; kwargs...) => file
table |> CSV.write(file; kwargs...) => file
Write a Tables.jl interface input to a csv file, given as an IO argument or a String/FilePaths.jl type representing the file name to write to.
Supported keyword arguments include:
- delim::Union{Char, String}=',': a character or string to print out as the file's delimiter
- quotechar::Char='"': ascii character to use for quoting text fields that may contain delimiters or newlines
- openquotechar::Char: instead of quotechar, use openquotechar and closequotechar to support different starting and ending quote characters
- escapechar::Char='"': ascii character used to escape quote characters in a text field
- missingstring::String="": string to print for missing values
- dateformat=Dates.default_format(T): the date format string to use for printing out Date & DateTime columns
- append=false: whether to append to an existing file/IO; if true, column names will not be written by default
- writeheader=!append: whether to write an initial row of delimited column names; not written by default if appending
- header: pass a list of column names (Symbols or Strings) to use instead of the column names of the input table
- newline='\n': character or string to use to separate rows (lines in the csv file)
- quotestrings=false: whether to force all strings to be quoted or not
- decimal='.': character to use as the decimal point when writing floating point numbers
- transform=(col,val)->val: a function that is applied to every cell, e.g. we can transform all nothing values to missing using (col, val) -> something(val, missing)
- bom=false: whether to write a UTF-8 BOM header (0xEF 0xBB 0xBF) or not
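For example, a minimal sketch; the table and file names are made up for illustration (a NamedTuple of vectors is a valid Tables.jl source):
using CSV
tbl = (a = [1, missing, 3], b = ["x", "y", "z"])           # example column table
CSV.write("out.csv", tbl)                                  # defaults: ',' delimiter, header row
CSV.write("out.tsv", tbl; delim='\t', missingstring="NA")  # customize delimiter and missing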
Examples
Basic
File
col1,col2,col3,col4,col5,col6,col7,col8
,1,1.0,1,one,2019-01-01,2019-01-01T00:00:00,true
,2,2.0,2,two,2019-01-02,2019-01-02T00:00:00,false
,3,3.0,3.14,three,2019-01-03,2019-01-03T00:00:00,true
Syntax
CSV.File(file)
By default, CSV.File will automatically detect this file's delimiter ',' and the type of each column. By default, it treats "empty fields" as missing (the entire first column in this example). It also automatically handles promoting types: in the 4th column, the first two values are Int, but the 3rd row has a Float64 value (3.14), so the resulting column's type will be Float64. Parsing can detect Int64, Float64, Date, DateTime, and Bool types, with String as the fallback type for any column.
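Since CSV.File accepts any IO source, the example above can be reproduced without a file on disk; a minimal sketch:
using CSV
data = """
col1,col2,col3,col4,col5,col6,col7,col8
,1,1.0,1,one,2019-01-01,2019-01-01T00:00:00,true
,2,2.0,2,two,2019-01-02,2019-01-02T00:00:00,false
,3,3.0,3.14,three,2019-01-03,2019-01-03T00:00:00,true
"""
f = CSV.File(IOBuffer(data))
f.col4   # promoted to Float64 because of the 3.14 on the last row
f.col1   # all values are missing (empty fields)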
Auto-Delimiter Detection
File
col1|col2
1|2
3|4
Syntax
CSV.File(file)
By default, CSV.File will try to detect a file's delimiter from the first 10 lines of the file; candidate delimiters include ',', '\t', ' ', '|', ';', and ':'. If it can't auto-detect the delimiter, it will assume ','. If your file uses a different character or string delimiter, just pass delim=X where X is the character or string. For this file you could also do CSV.File(file; delim='|').
String Delimiter
File
col1::col2
1::2
3::4
Syntax
CSV.File(file; delim="::")
In this example, our file has fields separated by the string "::"; we can pass this as the delim keyword argument.
No Header
File
1,2,3
4,5,6
7,8,9
Syntax
CSV.File(file; header=false)
CSV.File(file; header=["col1", "col2", "col3"])
CSV.File(file; header=[:col1, :col2, :col3])
In this file, there is no header row containing column names. In the first option, we pass header=false, and column names will be auto-generated like [:Column1, :Column2, :Column3]. In the latter two examples, we pass our own explicit column names, either as strings or symbols.
Normalize Column Names
File
column one,column two, column three
1,2,3
4,5,6
Syntax
CSV.File(file; normalizenames=true)
In this file, our column names have spaces in them. It can be convenient with a CSV.File or DataFrame to access entire columns via property access: e.g. if f = CSV.File(file) has column names like [:col1, :col2], we can access the entire first column of the file like f.col1, and the second like f.col2. The call f.col1 is actually rewritten to the function call getproperty(f, :col1), which is the function implemented in CSV.jl that returns the col1 column from the file. When a column name is not a valid Julia identifier, this is inconvenient, because f.column one is not valid syntax, so we would have to manually call getproperty(f, Symbol("column one")). normalizenames=true comes to the rescue; it will replace invalid identifier characters with underscores to ensure each column name is a valid Julia identifier, so for this file, we would end up with column names like [:column_one, :column_two]. You can call propertynames(f) on any CSV.File to see the parsed column names.
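A small sketch of the effect, reading the sample data from an in-memory IO source:
using CSV
f = CSV.File(IOBuffer("column one,column two,column three\n1,2,3\n4,5,6\n");
             normalizenames=true)
propertynames(f)   # names normalized to valid identifiers, e.g. :column_one
f.column_one       # the entire first column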
Datarow
File
col1,col2,col3
metadata1,metadata2,metadata3
extra1,extra2,extra3
1,2,3
4,5,6
7,8,9
Syntax
CSV.File(file; datarow=4)
CSV.File(file; skipto=4)
This file has extra rows between our header row col1,col2,col3 and the start of our data 1,2,3 on row 4. We can use the datarow or skipto keyword arguments to provide the row number where the "data" of our file begins.
Reading Chunks
File
col1,col2,col3
1,2,3
4,5,6
7,8,9
10,11,12
13,14,15
16,17,18
19,20,21
Syntax
CSV.File(file; limit=3)
CSV.File(file; skipto=4, limit=1)
CSV.File(file; skipto=7, footerskip=1)
In this example, we only want to read a subset of rows from the file. Using the limit, skipto, and footerskip keyword arguments, we can specify the exact rows we wish to parse.
Transposed Data
File
col1,1,2,3
col2,4,5,6
col3,7,8,9
Syntax
CSV.File(file; transpose=true)
This file has the column names in the first column, and data that extends along rows horizontally. The data for col1 is all on the first row, and similarly for col2 and its data on row 2. In this case, we wish to read the file "transposed", i.e. treating rows as columns. By passing transpose=true, CSV.jl will read column names from the first column, and the data for each column from its corresponding row.
Commented Rows
File
col1,col2,col3
# this row is commented and we'd like to ignore it while parsing
1,2,3
4,5,6
Syntax
CSV.File(file; comment="#")
CSV.File(file; datarow=3)
This file has some rows that begin with the "#" string and denote breaks in the data for commentary. We wish to ignore these rows when reading the data. We can pass comment="#" and parsing will ignore any row that begins with this string. Alternatively, we can pass datarow=3 for this example specifically, since there is only the one row to skip.
Missing Strings
File
code,age,score
0,21,3.42
1,42,6.55
-999,81,NA
-999,83,NA
Syntax
CSV.File(file; missingstring="-999")
CSV.File(file; missingstrings=["-999", "NA"])
In this file, our code column has two expected codes, 0 and 1, but also a few "invalid" codes, which are input as -999. We'd like to read the column as Int64, but treat the -999 values as "missing" values. By passing missingstring="-999", we signal that this value should be replaced with the literal missing value built into the Julia language. We can then do things like iterating skipmissing(f.code) to ignore those values, for example. In the second syntax example, we also want to treat the NA values in our score column as missing, so we pass both strings like missingstrings=["-999", "NA"].
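A sketch of the second syntax, using an in-memory copy of the sample file:
using CSV
data = "code,age,score\n0,21,3.42\n1,42,6.55\n-999,81,NA\n-999,83,NA\n"
f = CSV.File(IOBuffer(data); missingstrings=["-999", "NA"])
sum(skipmissing(f.code))    # 1; the -999 values are treated as missing
count(ismissing, f.score)   # 2; the NA values are treated as missing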
Fixed Width Files
File
col1 col2 col3
123431 2 3421
2355 346 7543
Syntax
CSV.File(file; delim=' ', ignorerepeated=true)
This is an example of a "fixed width" file, where each column occupies a fixed number of characters on each row. This is different from a normal delimited file, where each occurrence of a delimiter indicates a separate field. With fixed width, however, fields are "padded" with extra delimiters (in this case ' ') so that each column starts at the same position each time. In addition to our delim, we can pass ignorerepeated=true, which tells parsing that consecutive delimiters should be treated as a single delimiter.
Quoted & Escaped Fields
File
col1,col2
"quoted field with a delimiter , inside","quoted field that contains a \\n newline and ""inner quotes"""
unquoted field,unquoted field with "inner quotes"
Syntax
CSV.File(file; quotechar='"', escapechar='"')
CSV.File(file; openquotechar='"', closequotechar='"', escapechar='"')
In this file, we have a few "quoted" fields, which means the field's value starts and ends with quotechar (or openquotechar and closequotechar, respectively). Quoted fields allow the field to contain characters that would otherwise be significant to parsing, such as delimiters or newline characters. When quoted, parsing will ignore these otherwise significant characters until the closing quote character is found. For quoted fields that need to also include the quote character itself, an escape character is provided to tell parsing to ignore the next character when looking for a close quote character. In the syntax examples, the keyword arguments are passed explicitly, but these also happen to be the default values, so just doing CSV.File(file) would result in successful parsing.
DateFormat
File
code,date
0,2019/01/01
1,2019/01/02
Syntax
CSV.File(file; dateformat="yyyy/mm/dd")
In this file, our date column has dates that are formatted like yyyy/mm/dd. We can pass just such a string to the dateformat keyword argument to tell parsing to use it when looking for Date or DateTime columns. Note that currently, only a single dateformat string can be passed to parsing, meaning multiple columns with different date formats cannot all be parsed as Date/DateTime.
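A sketch with an in-memory copy of the sample file:
using CSV, Dates
f = CSV.File(IOBuffer("code,date\n0,2019/01/01\n1,2019/01/02\n");
             dateformat="yyyy/mm/dd")
f.date   # parsed as a Date column: 2019-01-01, 2019-01-02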
Custom Decimal Separator
File
col1;col2;col3
1,01;2,02;3,03
4,04;5,05;6,06
Syntax
CSV.File(file; delim=';', decimal=',')
In many places in the world, floating point number decimals are separated with a comma instead of a period (3,14 vs. 3.14). We can correctly parse these numbers by passing the decimal=',' keyword argument. Note that we likely also need to explicitly pass delim=';' in this case, since the parser might otherwise detect ',' as the delimiter.
Custom Bool Strings
File
id,paid,attended
0,T,TRUE
1,F,TRUE
2,T,FALSE
3,F,FALSE
Syntax
CSV.File(file; truestrings=["T", "TRUE"], falsestrings=["F", "FALSE"])
By default, parsing only considers the string values true and false as valid Bool values. To consider alternative values, we can pass a Vector{String} to the truestrings and falsestrings keyword arguments.
Matrix-like Data
File
1.0 0.0 0.0
0.0 1.0 0.0
0.0 0.0 1.0
Syntax
CSV.File(file; header=false)
CSV.File(file; header=false, delim=' ', type=Float64)
This file contains a 3x3 identity matrix of Float64 values. By default, parsing will detect the delimiter and the type of each column, but we can also explicitly pass delim=' ' and type=Float64, which tells parsing to treat each column as Float64 without having to guess the type on its own.
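To materialize the parsed file as an actual Julia matrix, one option (assuming a recent Tables.jl, which provides Tables.matrix) is:
using CSV, Tables
f = CSV.File(file; header=false, delim=' ', type=Float64)
m = Tables.matrix(f)   # 3x3 Matrix{Float64}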
Providing Types
File
col1,col2,col3
1,2,3
4,5,invalid
6,7,8
Syntax
CSV.File(file; types=Dict(3 => Int))
CSV.File(file; types=Dict(:col3 => Int))
CSV.File(file; types=Dict("col3" => Int))
CSV.File(file; types=[Int, Int, Int])
CSV.File(file; types=[Int, Int, Int], silencewarnings=true)
CSV.File(file; types=[Int, Int, Int], strict=true)
In this file, our 3rd column has an invalid value on the 2nd row, invalid. Let's imagine we'd still like to treat the column as Int, ignoring the invalid value. The syntax examples show several ways we can tell parsing to treat the 3rd column as Int: by referring to the column index 3, or to the column name as a Symbol or String. We can also provide an entire Vector of types, one for each column (which must match the number of columns in the file). There are two additional keyword arguments that control parsing behavior: in the first 4 syntax examples, we would see a warning printed like "warning: invalid Int64 value on row 2, column 3". In the fifth example, passing silencewarnings=true will suppress this warning. In the last syntax example, passing strict=true will result in an error being thrown during parsing.
Typemap
File
zipcode,score
03494,9.9
12345,6.7
84044,3.4
Syntax
CSV.File(file; typemap=Dict(Int => String))
CSV.File(file; types=Dict(:zipcode => String))
In this file, we have U.S. zipcodes in the first column that we'd rather not treat as Int (note the leading zero in 03494, which would be lost), but parsing will detect the column as Int. In the first syntax example, we pass typemap=Dict(Int => String), which tells parsing to treat any detected Int columns as String instead. In the second syntax example, we alternatively set the zipcode column type manually.
Pooled Values
File
id,code
A18E9,AT
BF392,GC
93EBC,AT
54EE1,AT
8CD2E,GC
Syntax
CSV.File(file)
CSV.File(file; pool=0.4)
CSV.File(file; pool=0.6)
In this file, we have an id column and a code column. There can be advantages with various DataFrame/table operations like joining and grouping when String values are "pooled", meaning each unique value is mapped to a UInt64. By default, pool=0.1, so string columns with low cardinality are pooled automatically. The pool keyword argument provides greater control: pool=0.4 means a column will be pooled if 40% or less of its values are unique; likewise, pool=0.6 pools a column if 60% or less of its values are unique.
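For the sample file above, a quick sketch of how the threshold applies:
using CSV
f = CSV.File(file; pool=0.4)
# code: 2 unique values / 5 rows = 40% unique => pooled at this threshold
# id:   5 unique values / 5 rows = 100% unique => never pooled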
Select/Drop Columns From File
File
a,b,c
1,2,3
4,5,6
7,8,9
Syntax
# select
CSV.File(file; select=[1, 3])
CSV.File(file; select=[:a, :c])
CSV.File(file; select=["a", "c"])
CSV.File(file; select=[true, false, true])
CSV.File(file; select=(i, nm) -> i in (1, 3))
# drop
CSV.File(file; drop=[2])
CSV.File(file; drop=[:b])
CSV.File(file; drop=["b"])
CSV.File(file; drop=[false, true, false])
CSV.File(file; drop=(i, nm) -> i == 2)
For this file, we have columns a, b, and c, but we might only be interested in the data in columns a and c. The select and drop keyword arguments allow efficiently choosing which columns to parse; columns not selected (or columns dropped) will be skipped while parsing, providing a performance boost. The argument to select or drop can be one of: an AbstractVector{Int}, a collection of column indices; an AbstractVector{Symbol} or AbstractVector{String}, a collection of column names; an AbstractVector{Bool} equal in length to the number of columns, signaling whether each column should be selected or dropped; or a selector/drop function of the form (i, name) -> keep_or_drop::Bool, i.e. it takes a column index i and column name name and returns a Bool signaling whether the column should be selected or dropped.
Reading CSV from gzip (.gz) and zip files
Example: reading from a gzip (.gz) file
using CSV, DataFrames, CodecZlib
a = DataFrame(a = 1:3)
CSV.write("a.csv", a)
# Windows users who do not have gzip available on the PATH should manually gzip the CSV
;gzip a.csv
a_copy = open("a.csv.gz") do io
    CSV.read(GzipDecompressorStream(io))
end
a == a_copy # true; restored successfully
Example: reading from a zip file
using ZipFile, CSV, DataFrames
a = DataFrame(a = 1:3)
CSV.write("a.csv", a)
# zip the file; Windows users who do not have zip available on the PATH can manually zip the CSV
;zip a.zip a.csv
z = ZipFile.Reader("a.zip")
# identify the right file in the zip archive
a_file_in_zip = filter(x -> x.name == "a.csv", z.files)[1]
a_copy = CSV.read(a_file_in_zip)
a == a_copy # true; restored successfully