FileIO

Build StatusBuild statusCoverage Status

FileIO aims to provide a common framework for detecting file formats and dispatching to appropriate readers/writers. The two core functions in this package are called load and save, and offer high-level support for formatted files (in contrast with julia's low-level read and write). To avoid name conflicts, packages that provide support for standard file formats through functions named load and save are encouraged to extend the definitions here. Supported Files

Installation

Install FileIO via Pkg.add("FileIO").

Usage

If your format has been registered, it might be as simple as

using FileIO
obj = load(filename)

to read data from a formatted file. Likewise, saving might be as simple as

save(filename, obj)

If you just want to inspect a file to determine its format, then

file = query(filename)
s = query(io)   # io is a stream

will return a File or Stream object that also encodes the detected file format.

Sometimes you want to read or write files that are larger than your available memory, or might be an unknown or infinite length (e.g. reading an audio or video stream from a socket). In these cases it might not make sense to process the whole file at once, but instead process it a chunk at a time. For these situations FileIO provides the loadstreaming and savestreaming functions, which return an object that you can read or write, rather than the file data itself.

This would look something like:

using FileIO
audio = loadstreaming("bigfile.wav")
try
    while !eof(audio)
        chunk = read(audio, 4096) # read 4096 frames
        # process the chunk
    end
finally
    close(audio)
end

or use do syntax to auto-close the stream:

using FileIO
loadstreaming("bigfile.wav") do audio
    while !eof(audio)
        chunk = read(audio, 4096) # read 4096 frames
        # process the chunk
    end
end

Note that in these cases you may want to use read! with a pre-allocated buffer for maximum efficiency.

Adding new formats

You register a new format by adding add_format(fmt, magic, extension) to the registry. To do so, please just open a pull request (you can just edit the file in Github). fmt is a DataFormat type, most conveniently created as format"IDENTIFIER". magic typically contains the magic bytes that identify the format. Here are some examples:

# A straightforward format
add_format(format"PNG", [0x89,0x50,0x4e,0x47,0x0d,0x0a,0x1a,0x0a], ".png")

# A format that uses only ASCII characters in its magic bytes, and can
# have one of two possible file extensions
add_format(format"NRRD", "NRRD", [".nrrd",".nhdr"])

# A format whose magic bytes might not be at the beginning of the file,
# necessitating a custom function `detecthdf5` to find them
add_format(format"HDF5", detecthdf5, [".h5", ".hdf5"])

# A fictitious format that, unfortunately, provides no magic
# bytes. Here we have to place our faith in the file extension.
add_format(format"DICEY", (), ".dcy")

You can also declare that certain formats require certain packages for I/O support:

add_loader(format"HDF5", :HDF5)
add_saver(format"PNG", :ImageMagick)

These packages will be automatically loaded as needed. You can also define the loaders and savers in a short form like this:

add_format(format"OFF", "OFF", ".off", [:MeshIO])

This means MeshIO supports loading and saving of the off format. You can add multiple loaders and specifiers like this:

add_format(
    format"BMP",
    UInt8[0x42,0x4d],
    ".bmp",
    [:OSXNativeIO, LOAD, OSX],
    [:ImageMagick]
)

This means, OSXNative has first priority (gets loaded first) and only supports loading bmp on OSX. So on windows, OSXNativeIO will be ignored and ImageMagick has first priority. You can add any combination of LOAD, SAVE, OSX, Unix, Windows and Linux.

Users are encouraged to contribute these definitions to the registry.jl file of this package, so that information about file formats exists in a centralized location.

Implementing loaders/savers

In your package, write code like the following:

using FileIO

# See important note about scope below
function load(f::File{format"PNG"})
    open(f) do s
        skipmagic(s)  # skip over the magic bytes
        # You can just call the method below...
        ret = load(s)
        # ...or implement everything here instead
    end
end

# You can support streams and add keywords:
function load(s::Stream{format"PNG"}; keywords...)
    # s is already positioned after the magic bytes
    # Do the stuff to read a PNG file
    chunklength = read(s, UInt32)
    ...
end

function save(f::File{format"PNG"}, data)
    open(f, "w") do s
        # Don't forget to write the magic bytes!
        write(s, magic(format"PNG"))
        # Do the rest of the stuff needed to save in PNG format
    end
end

Note that these are load and save, notFileIO.load and FileIO.save. Because a given format might have multiple packages that are capable of reading it, FileIO will dispatch to these using module-scoping, e.g., SomePkg.load(args...). Consequently, packages should define "private" load and save methods (also loadstreaming and savestreaming if you implement them), and not extend (import) FileIO's.

If you run into a naming conflict with the load and save functions (for example, you already have another function in your package that has one of these names), you can instead name your loaders fileio_load, fileio_save etc. Note that you cannot mix and match these styles: either all your loaders have to be named load, or all of them should be called fileio_load, but you cannot use both conventions in one module.

load(::File) and save(::File) should close any streams they open. (If you use the do syntax, this happens for you automatically even if the code inside the do scope throws an error.) Conversely, load(::Stream) and save(::Stream) should not close the input stream.

loadstreaming and savestreaming use the same query mechanism, but return a decoded stream that users can read or write. You should also implement a close method on your reader or writer type. Just like with load and save, if the user provided a filename, your close method should be responsible for closing any streams you opened in order to read or write the file. If you are given a Stream, your close method should only do the clean up for your reader or writer type, not close the stream.

struct WAVReader
    io::IO
    ownstream::Bool
end

function Base.read(reader::WAVReader, frames::Int)
    # read and decode audio samples from reader.io
end

function Base.close(reader::WAVReader)
    # do whatever cleanup the reader needs
    reader.ownstream && close(reader.io)
end

# FileIO has fallback functions that make these work using `do` syntax as well,
# and will automatically call `close` on the returned object.
loadstreaming(f::File{format"WAV"}) = WAVReader(open(f), true)
loadstreaming(s::Stream{format"WAV"}) = WAVReader(s, false)

If you choose to implement loadstreaming and savestreaming in your package, you can easily add save and load methods in the form of:

function save(q::Formatted{format"WAV"}, data, args...; kwargs...)
    savestreaming(q, args...; kwargs...) do stream
        write(stream, data)
    end
end

function load(q::Formatted{format"WAV"}, args...; kwargs...)
    loadstreaming(q, args...; kwargs...) do stream
        read(stream)
    end
end

Help

You can get an API overview by typing ?FileIO at the REPL prompt. Individual functions have their own help too, e.g., ?add_format.