Module for loading and saving ARFF files.

See ARFFFiles.load and ARFFFiles.save.


Represents the header information in an ARFF file.

It has these fields:

  • relation: the @relation name.
  • attributes: vector of each @attribute as an ARFFAttribute.

An object holding an IO stream of an ARFF file, used to access its data.

Header information is in the header field, of type ARFFHeader.

It has the following functionality:

  • nextrow(r) returns the next row of data as a NamedTuple{names, types}, or nothing if everything has been read.
  • read(r, [n]) reads up to n rows as a vector.
  • read!(xs, r) reads up to length(xs) rows into the given vector, returning the number of rows read.
  • close(r) closes the underlying IO stream, unless it was created with own=false.
  • eof(r) tests whether the IO stream is at the end.
  • Iteration yields rows of r.
  • It satisfies the Tables.jl interface, so it can be passed to any Tables.jl sink; for example, DataFrame(r) collects the rows into a DataFrame.
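
The row-by-row functions listed above can be combined roughly as follows. This is a minimal sketch, assuming the ARFFFiles package is installed; the helper name sum_x, the column names x and y, and the temporary file are illustrative only. The file is written with save first so that the example is self-contained.

```julia
using ARFFFiles

# Hypothetical helper: sum the column named x by streaming one row at a time.
function sum_x(path)
    r = ARFFFiles.loadstreaming(path)
    total = 0.0
    try
        # nextrow returns a NamedTuple for each row, or nothing at the end.
        while (row = ARFFFiles.nextrow(r)) !== nothing
            total += row.x
        end
    finally
        close(r)  # release the underlying IO stream
    end
    return total
end

# Write a small illustrative file, then stream it back.
stream_path = joinpath(mktempdir(), "stream.arff")
ARFFFiles.save(stream_path, (x = [1.0, 2.0, 3.0], y = [10.0, 20.0, 30.0]))
stream_total = sum_x(stream_path)
println(stream_total)
```

The try/finally block closes the reader even if a row fails to parse.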

Abstract type of ARFF types. Concrete subtypes are ARFFNumericType, ARFFStringType, ARFFDateType and ARFFNominalType.

load(file, ...)
load(f, file, ...)

The first form loads the entire ARFF file as a table. It is equivalent to load(readcolumns, file, ...).

The second form is equivalent to f(loadstreaming(file, ...)) but ensures that the file is closed afterwards.

See loadstreaming for the available keyword parameters.

For example, load(DataFrame, file) loads the file as a DataFrame. Replace DataFrame with your favourite table type.
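
The two forms of load can be compared in a short sketch. It assumes the ARFFFiles package is installed and that the returned columnar table supports property access to its columns; the column name x and the temporary file are illustrative.

```julia
using ARFFFiles

# Write a small illustrative file to load back.
load_path = joinpath(mktempdir(), "load.arff")
ARFFFiles.save(load_path, (x = [1.0, 2.0, 3.0],))

# First form: read the whole file as a columnar table.
whole = ARFFFiles.load(load_path)

# Second form: the function receives the reader, and the file is
# closed once the function returns.
via_reader = ARFFFiles.load(load_path) do r
    ARFFFiles.readcolumns(r)
end

println(whole.x)
println(via_reader.x)
```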

load_header(file, ...)

Equivalent to load(r->r.header, file, ...), which loads just the header from the given file as an ARFFHeader.
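
A header can be inspected without reading any rows. This is a hedged sketch, assuming the ARFFFiles package is installed; the relation name, column names and temporary file are illustrative.

```julia
using ARFFFiles

# Write a small illustrative file with two numeric attributes.
header_path = joinpath(mktempdir(), "header.arff")
ARFFFiles.save(header_path, (a = [1.0, 2.0], b = [3.0, 4.0]); relation = "example")

# Load only the header and inspect its two documented fields.
header = ARFFFiles.load_header(header_path)
println(header.relation)
println(length(header.attributes))
```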

loadchunks(file, ...)
loadchunks(f, file, ...)

The first form opens the ARFF file and returns an iterator over chunks of the file. It is equivalent to Tables.partitions(loadstreaming(file, ...)).

The second form is equivalent to f(loadchunks(file, ...)) but ensures that the file is closed afterwards.
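
A sketch of the second form, counting rows chunk by chunk. It assumes the ARFFFiles package is installed and that each chunk is a columnar table with property access to its columns; the helper name count_rows and the column name x are illustrative.

```julia
using ARFFFiles

# Hypothetical helper: count rows by walking the file one chunk at a time.
function count_rows(path)
    ARFFFiles.loadchunks(path) do chunks
        n = 0
        for chunk in chunks
            # Assumes each chunk has a column named x.
            n += length(chunk.x)
        end
        n
    end
end

# Write a small illustrative file, then count its rows.
chunks_path = joinpath(mktempdir(), "chunks.arff")
ARFFFiles.save(chunks_path, (x = [1.0, 2.0, 3.0, 4.0],))
chunk_rows = count_rows(chunks_path)
println(chunk_rows)
```

A file this small will most likely arrive as a single chunk. For large files, the chunkbytes option of loadstreaming controls the approximate chunk size.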

loadstreaming(io::IO, own=false; [missingcols=:auto], [missingnan=false], [categorical=true], [chunkbytes=2^26])
loadstreaming(filename::AbstractString; ...)

Returns an ARFFReader object for reading the given ARFF file one record at a time.

Option missingcols specifies which columns can contain missing data. It can be :auto (columns with missing values are automatically detected, the default), :all or true (all columns), :none or false (no columns), a set or vector of column names, or a function taking a column name and returning true or false. Note that :auto does not apply if the table is being read in a streaming fashion, in which case it behaves like :all.

Option missingnan specifies whether or not to convert missing values in numeric columns to NaN. This is equivalent to excluding these columns in missingcols.

Option categorical specifies whether nominal columns are converted to CategoricalValue (true) or String (false).

Option chunkbytes specifies approximately how many bytes to read per chunk when iterating over chunks or rows.
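
The effect of missingnan can be sketched as follows. This assumes the ARFFFiles package is installed, that save writes missing values in a form load reads back as missing, and that the loaded table supports property access; the column name x and the temporary file are illustrative.

```julia
using ARFFFiles

# Write a numeric column containing one missing value.
opts_path = joinpath(mktempdir(), "options.arff")
ARFFFiles.save(opts_path, (x = [1.0, missing, 3.0],))

# Default options: the missing entry is kept as missing.
with_missing = ARFFFiles.load(opts_path)

# missingnan=true: missing entries in numeric columns become NaN.
with_nan = ARFFFiles.load(opts_path; missingnan = true)

println(with_missing.x)
println(with_nan.x)
```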

nextrow(r::ARFFReader{names, types}) :: Union{Nothing, NamedTuple{names, types}}

The next row of data from the given ARFFReader, or nothing if everything has been read.


Convert the given Java date format string to the equivalent Julia DateFormat.


Only the following Java format characters are currently supported: y (year), M (month), d (day), H (hour), m (minute), s (second) and S (millisecond). They map to the Julia format characters y, m, d, H, M, S and s respectively.
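
To illustrate the difference between the two conventions, the sketch below uses only the Dates standard library and a hand-translated pattern; it does not call the conversion function itself. In Java the pattern would be written yyyy-MM-dd HH:mm:ss.SSS, whereas Julia swaps the case of the month, minute, second and millisecond characters.

```julia
using Dates

# Java pattern:  yyyy-MM-dd HH:mm:ss.SSS
# Julia pattern: yyyy-mm-dd HH:MM:SS.sss
fmt = DateFormat("yyyy-mm-dd HH:MM:SS.sss")

# Parse a timestamp with the Julia pattern, then format it back.
dt = DateTime("2024-03-05 14:07:09.250", fmt)
formatted = Dates.format(dt, fmt)
println(formatted)
```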

readcolumns(r::ARFFReader, maxbytes=nothing)

Read the data from r into a columnar table.

By default the entire table is read. If maxbytes is given, approximately this many bytes of the input stream are read instead, which allows the table to be read in chunks.

The same can be achieved by iterating over Tables.partitions(r).

save(file, table; relation="data", comment=...)

Save the Tables.jl-compatible table in ARFF format to file, which must be an IO stream or a file name.

The relation name is relation. The given comment is written at the top of the file.
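
A minimal sketch of saving to an IO stream, assuming the ARFFFiles package is installed, that string columns are supported by save, and that save leaves the stream open. The table contents, relation name and comment text are illustrative.

```julia
using ARFFFiles

# A NamedTuple of vectors is a Tables.jl-compatible column table.
table = (x = [1.0, 2.5, 4.0], label = ["a", "b", "a"])

# Write to an in-memory buffer so the generated ARFF text can be inspected.
io = IOBuffer()
ARFFFiles.save(io, table; relation = "example", comment = "Written as an example.")
arff_text = String(take!(io))
print(arff_text)
```

Passing a file name instead of the buffer writes the same text to disk.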