HDF5.jl

Overview

HDF5 stands for Hierarchical Data Format v5 and is closely modeled on file systems. In HDF5, a "group" is analogous to a directory, a "dataset" is like a file. HDF5 also uses "attributes" to associate metadata with a particular group or dataset. HDF5 uses ASCII names for these different objects, and objects can be accessed by Unix-like pathnames, e.g., "/sample1/tempsensor/firsttrial" for a top-level group "sample1", a subgroup "tempsensor", and a dataset "firsttrial".

For simple types (scalars, strings, and arrays), HDF5 provides sufficient metadata to know how each item is to be interpreted. For example, HDF5 encodes that a given block of bytes is to be interpreted as an array of Int64, and represents them in a way that is compatible across different computing architectures.

However, to preserve Julia objects, one generally needs additional type information to be supplied, which is easy to provide using attributes. This is handled automatically by the JLD and JLD2 packages. These specific formats (conventions) provide "extra" functionality, but they are still regular HDF5 files and are therefore compatible with any HDF5 reader or writer.

Language wrappers for HDF5 are often described as either "low level" or "high level." This package contains both flavors: at the low level, it directly wraps HDF5's functions, thus copying their API and making them available from within Julia. At the high level, it provides a set of functions built on the low-level wrap which may make the usage of this library more convenient.

Installation

julia>]
pkg> add HDF5

Starting from Julia 1.3, the HDF5 binaries are by default downloaded using the HDF5_jll package.

Using custom or system provided HDF5 binaries

To use system-provided HDF5 binaries instead, set the environment variable JULIA_HDF5_PATH to the top-level installation directory of HDF5, i.e. the library should be located in ${JULIA_HDF5_PATH}/lib or ${JULIA_HDF5_PATH}/lib64. Then run import Pkg; Pkg.build("HDF5"). In particular, this is required if you need parallel HDF5 support, which is not provided by the HDF5_jll binaries.

If the library is in your library search path, then JULIA_HDF5_PATH can be set to an empty string.

For example, to use HDF5 (libhdf5-mpich-dev) with MPI using system libraries on Ubuntu 20.04, you would run:

$ sudo apt install mpich libhdf5-mpich-dev
$ export JULIA_HDF5_PATH=/usr/lib/x86_64-linux-gnu/hdf5/mpich/
$ export JULIA_MPI_BINARY=system

Then in Julia, run:

pkg> build

Opening and closing files

"Plain" (i.e., with no extra formatting conventions) HDF5 files are created and/or opened with the h5open command:

fid = h5open(filename, mode)

The mode can be any one of the following:

mode   Meaning
"r"    read-only
"r+"   read-write, preserving any existing contents
"cw"   read-write, create the file if it does not exist, preserve existing contents
"w"    read-write, destroying any existing contents

For example

julia> using HDF5
julia> fname = tempname(); # temporary file
julia> fid = h5open(fname, "w")
πŸ—‚οΈ HDF5.File: (read-write) /tmp/jl_0XIjWfcdED

This produces an object of type HDF5.File, a subtype of the abstract type DataFile. This file will have no elements (groups, datasets, or attributes) that are not explicitly created by the user.

When you're finished with a file, you should close it:

close(fid)

Closing a file also closes any other open objects (e.g., datasets, groups) in that file. In general, you need to close an HDF5 file to "release" it for use by other applications.

Creating a group or dataset

Groups can be created via the function create_group

julia> create_group(fid, "mygroup")
πŸ“‚ HDF5.Group: /mygroup (file: /tmp/jl_0XIjWfcdED)

We can create the dataset "mydataset" by indexing into fid. This also happens to write data to the dataset.

julia> fid["mydataset"] = rand()0.2646535687657876

Alternatively, we can call create_dataset, which does not write data to the dataset. It merely creates the dataset.

julia> create_dataset(fid, "myvector", Int, (10,))
πŸ”’ HDF5.Dataset: /myvector (file: /tmp/jl_0XIjWfcdED xfer_mode: 0)

Creating a dataset within a group is as simple as indexing into the group with the name of the dataset or calling create_dataset with the group as the first argument.

julia> g = fid["mygroup"]
πŸ“‚ HDF5.Group: /mygroup (file: /tmp/jl_0XIjWfcdED)
julia> g["mydataset"] = "Hello World!"
"Hello World!"
julia> create_dataset(g, "myvector", Int, (10,))
πŸ”’ HDF5.Dataset: /mygroup/myvector (file: /tmp/jl_0XIjWfcdED xfer_mode: 0)

The do syntax is also supported. The file, group, and dataset handles will automatically be closed after the do block terminates.

julia> h5open("example2.h5", "w") do fid
           g = create_group(fid, "mygroup")
           dset = create_dataset(g, "myvector", Float64, (10,))
           write(dset,rand(10))
       end

Opening and closing objects

If you have a file object fid, and this has a group or dataset called "mygroup" at the top level of a file, you can open it in the following way:

julia> obj = fid["mygroup"]
πŸ“‚ HDF5.Group: /mygroup (file: /tmp/jl_0XIjWfcdED)
β”œβ”€ πŸ”’ mydataset
└─ πŸ”’ myvector

This does not read any data or attributes associated with the object; it's simply a handle for further manipulations. For example:

julia> g = fid["mygroup"]
πŸ“‚ HDF5.Group: /mygroup (file: /tmp/jl_0XIjWfcdED)
β”œβ”€ πŸ”’ mydataset
└─ πŸ”’ myvector
julia> dset = g["mydataset"]
πŸ”’ HDF5.Dataset: /mygroup/mydataset (file: /tmp/jl_0XIjWfcdED xfer_mode: 0)

or simply

julia> dset = fid["mygroup/mydataset"]πŸ”’ HDF5.Dataset: /mygroup/mydataset (file: /tmp/jl_0XIjWfcdED xfer_mode: 0)

When you're done with an object, you can close it using close(obj). If you forget to do this, it will be closed for you anyway when the file is closed, or if obj goes out of scope and gets garbage collected.

Reading and writing data

Suppose you have a group g which contains a dataset with path "mydataset", and that you've also opened this dataset as dset = g["mydataset"]. You can read information in this dataset in any of the following ways:

A = read(dset)
A = read(g, "mydataset")
Asub = dset[2:3, 1:3]

The last syntax reads just a subset of the data array (assuming that dset is an array of sufficient size). libhdf5 has internal mechanisms for slicing arrays, and consequently if you need only a small piece of a large array, it can be faster to read just what you need rather than reading the entire array and discarding most of it.

Datasets can be created with either

g["mydataset"] = rand(3,5)
write(g, "mydataset", rand(3,5))

One can use the high-level interface load and save from FileIO, where an optional OrderedDict can be passed (track_order is inferred). Note that using track_order=true or passing an OrderedDict is a promise that the read file has been created with the appropriate ordering flags.

julia> using OrderedCollections, FileIO
julia> save("track_order.h5", OrderedDict("z"=>1, "a"=>2, "g/f"=>3, "g/b"=>4))
julia> load("track_order.h5"; dict=OrderedDict())
OrderedDict{Any, Any} with 4 entries:
  "z"   => 1
  "a"   => 2
  "g/f" => 3
  "g/b" => 4

Passing parameters

It is often required to pass parameters to specific routines, which are collected in so-called property lists in HDF5. There are different property lists for different tasks, e.g. for the access/creation of files, datasets, groups. In this high level framework multiple parameters can be simply applied by appending them at the end of function calls as keyword arguments.

g["A"] = A  # basic
g["A", chunk=(5,5)] = A # add chunks

B = h5read(fn,"mygroup/B", # two parameters
  fapl_mpio=(ccomm,cinfo), # if parameter requires multiple args use tuples
  dxpl_mpio=HDF5.H5FD_MPIO_COLLECTIVE )

This will automatically create the correct property lists, add the properties, and apply the property list while reading/writing the data. The naming of the properties generally follows that of HDF5, i.e. the key fapl_mpio returns the HDF5 functions h5pget/set_fapl_mpio and their corresponding property list type H5P_FILE_ACCESS. The complete list of routines and their interfaces is available at the H5P: Property List Interface documentation. Note that not all properties are available. When searching for a property, check whether the corresponding h5pget/set functions are available.

Chunking and compression

You can also optionally "chunk" and/or compress your data. For example,

A = rand(100,100)
g["A", chunk=(5,5)] = A

stores the matrix A in 5-by-5 chunks. Chunking improves efficiency if you write or extract small segments or slices of an array, if these are not stored contiguously.

A = rand(100,100)
g1["A", chunk=(5,5), compress=3] = A
g2["A", chunk=(5,5), shuffle=(), deflate=3] = A
using H5Zblosc # load in Blosc
g3["A", chunk=(5,5), blosc=3] = A

Standard compression in HDF5 ("compress") corresponds to ("deflate") and uses the deflate/zlib algorithm. The deflate algorithm is often more efficient if prefixed by a "shuffle" filter. Blosc is generally much faster than deflate; however, reading Blosc-compressed HDF5 files requires Blosc to be installed. This is the case for Julia, but often not for vanilla HDF5 distributions that may be used outside Julia. (In this case, the structure of the HDF5 file is still accessible, but compressed datasets cannot be read.) Compression requires chunking, and heuristic chunking is automatically used if you specify compression but don't specify chunking.
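
For example, a minimal sketch relying on that heuristic (reusing the g["A"] example from above; only compress is given, so the chunk layout is chosen automatically):

g["A", compress=3] = A  # no chunk keyword: a chunk size is picked heuristically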

It is also possible to write to subsets of an on-disk HDF5 dataset. This is useful to incrementally save to very large datasets you don't want to keep in memory. For example,

dset = create_dataset(g, "B", datatype(Float64), dataspace(1000,100,10), chunk=(100,100,1))
dset[:,1,1] = rand(1000)

creates a Float64 dataset in the file or group g, with dimensions 1000x100x10, and then writes to just the first 1000-element slice. If you know the typical size of the subset regions you'll be reading/writing, it can be beneficial to set the chunk dimensions appropriately.

More fine-grained control is also available.

Memory mapping

If you will frequently be accessing individual elements or small regions of array datasets, it can be substantially more efficient to bypass HDF5 routines and use direct memory mapping. This is possible only under particular conditions: when the dataset is an array of standard "bits" types (e.g., Float64 or Int32) and no chunking/compression is being used. You can use the ismmappable function to test whether this is possible; for example,

dset = g["x"]
if HDF5.ismmappable(dset)
    dset = HDF5.readmmap(dset)
end
val = dset[15]

Note that readmmap returns an Array rather than an HDF5 object.

Note: if you use readmmap on a dataset and subsequently close the file, the array data are still available, and the file continues to be in use, until all of the arrays are garbage-collected. This is in contrast to standard HDF5 datasets, where closing the file prevents further access to any of the datasets, but the file is also detached and can safely be rewritten immediately.

Under the default allocation-time policy, a newly added ismmappable dataset can only be memory mapped after it has been written to. The following fails:

vec_dset = create_dataset(g, "v", datatype(Float64), dataspace(10_000,1))
HDF5.ismmappable(vec_dset)    # == true
vec = HDF5.readmmap(vec_dset) # throws ErrorException("Error mmapping array")

because although the dataset description has been added, the space within the HDF5 file has not yet actually been allocated (so the file region cannot be memory mapped by the OS). The storage can be allocated by making at least one write:

vec_dset[1,1] = 0.0      # force allocation of /g/v within the file
vec = HDF5.readmmap(vec_dset) # and now the memory mapping can succeed

Alternatively, the policy can be set so that the space is allocated immediately upon creation of the data set with the alloc_time keyword:

mtx_dset = create_dataset(g, "M", datatype(Float64), dataspace(100, 1000),
                    alloc_time = HDF5.H5D_ALLOC_TIME_EARLY)
mtx = HDF5.readmmap(mtx_dset) # succeeds immediately

Supported data types

HDF5.jl knows how to store values of the following types: signed and unsigned integers of 8, 16, 32, and 64 bits, Float32, Float64; Complex versions of these numeric types; Arrays of these numeric types (including complex versions); ASCIIString and UTF8String; and Arrays of these two string types. Arrays of strings are supported using HDF5's variable-length-strings facility. By default Complex numbers are stored as compound types with r and i fields following the h5py convention. When reading data, compound types with matching field names will be loaded as the corresponding Complex Julia type. These field names are configurable with the HDF5.set_complex_field_names(real::AbstractString, imag::AbstractString) function and complex support can be completely enabled/disabled with HDF5.enable/disable_complex_support().
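
For example, a minimal sketch of round-tripping a complex array under these defaults (the file name is just a temporary placeholder):

using HDF5
fname = tempname()
h5open(fname, "w") do fid
    fid["c"] = [1.0 + 2.0im, 3.0 - 4.0im]  # stored as a compound type with fields r and i
end
h5open(fname, "r") do fid
    read(fid["c"])  # == ComplexF64[1.0 + 2.0im, 3.0 - 4.0im]
end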

For Arrays, note that the array dimensionality is preserved, including 0-length dimensions:

fid["zero_vector"] = zeros(0)
fid["zero_matrix"] = zeros(0, 0)
size(fid["zero_vector"]) # == (0,)
size(fid["zero_matrix"]) # == (0, 0)

An exception to this rule is Julia's 0-dimensional Array, which is stored as an HDF5 scalar because there is a value to be preserved:

fid["zero_dim_value"] = fill(1.0Ο€)
read(fid["zero_dim_value"]) # == 3.141592653589793, != [3.141592653589793]

HDF5 also has the concept of a null array which contains a type but has neither size nor contents, which is represented by the type HDF5.EmptyArray:

fid["empty_array"] = HDF5.EmptyArray{Float32}()
HDF5.isnull(fid["empty_array"]) # == true
size(fid["empty_array"]) # == ()
eltype(fid["empty_array"]) # == Float32

This module also supports HDF5's VLEN, OPAQUE, and REFERENCE types, which can be used to encode more complex types. In general, you need to specify how you want to combine these more advanced facilities to represent more complex data types. For many of the data types in Julia, the JLD module implements support. You can likewise define your own file format if, for example, you need to interact with some external program that has explicit formatting requirements.

Creating groups and attributes

Create a new group in the following way:

g = create_group(parent, name)

The named group will be created as a child of the parent.

Attributes can be created using

attributes(parent)[name] = value

where attributes simply indicates that the object referenced by name (a string) is an attribute, not another group or dataset. (Datasets cannot have child datasets, but groups can have either.) value must be a simple type: BitsKinds, strings, and arrays of either of these. The HDF5 standard recommends against storing large objects as attributes.

The value stored in an attribute can be retrieved like

read_attribute(parent, name)

You can also access the value of an attribute by indexing, like so:

julia> attr = attributes(parent)[name];
julia> attr[]
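
Putting these pieces together, a short sketch (the dataset and attribute names here are made up for illustration):

using HDF5
h5open(tempname(), "w") do fid
    fid["temperature"] = rand(10)
    attributes(fid["temperature"])["units"] = "K"  # create the attribute
    read_attribute(fid["temperature"], "units")    # == "K"
    attributes(fid["temperature"])["units"][]      # also == "K"
end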

Getting information

HDF5.name(obj)

will return the full HDF5 pathname of object obj.

keys(g)

returns a string array containing all objects inside group g. These are relative pathnames, not absolute pathnames.

You can iterate over the objects in a group, i.e.,

for obj in g
  data = read(obj)
  println(data)
end

This gives you a straightforward way of recursively exploring an entire HDF5 file.
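
For instance, a small sketch (the helper name print_tree is made up) that uses keys to walk the hierarchy and print each object's relative path:

using HDF5
function print_tree(parent, indent="")
    for name in keys(parent)
        obj = parent[name]
        println(indent, name)
        if obj isa HDF5.Group
            print_tree(obj, indent * "  ")
        end
    end
end
h5open("example2.h5", "r") do fid  # file created in the earlier do-block example
    print_tree(fid)
end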

If you need to know whether group g has a dataset named mydata, you can test that with

if haskey(g, "mydata")
   ...
end
tf = haskey(g, "mydata")

If instead you want to know whether g has an attribute named myattribute, do it this way:

tf = haskey(attributes(g), "myattribute")

If you have an HDF5 object, and you want to know where it fits in the hierarchy of the file, the following can be useful:

p = parent(obj)     # p is the parent object (usually a group)
fn = HDF5.filename(obj)  # fn is a string
g = HDF5.root(obj)       # g is the group "/"

For array objects (datasets and attributes) the following methods work:

dims = size(dset)
nd = ndims(dset)
len = length(dset)

Objects can be created with properties, and you can query those properties in the following way:

p = HDF5.get_create_properties(dset)
chunksz = HDF5.get_chunk(p)

The simpler syntax chunksz = HDF5.get_chunk(dset) is also available.

Finally, sometimes you need to be able to conveniently test whether a file is an HDF5 file:

tf = HDF5.ishdf5(filename)

Mid-level routines

Sometimes you might want more fine-grained control, which can be achieved using a different set of routines. For example,

g = open_group(parent, name)
dset = open_dataset(parent, name[, apl])
attr = open_attribute(parent, name)
t = open_datatype(parent, name)

These open the named group, dataset, attribute, and committed datatype, respectively. For datasets, apl stands for "access parameter list" and provides opportunities for more sophisticated control (see the HDF5 documentation).

New objects can be created in the following ways:

g = create_group(parent, name[, lcpl, gcpl]; properties...)
dset = create_dataset(parent, name, data; properties...)
attr = create_attribute(parent, name, data)

creates groups, datasets, and attributes without writing any data to them. You can then use write(obj, data) to store the data. The optional properties and property lists allow even more fine-grained control. This syntax uses data to infer the object's "HDF5.datatype" and "HDF5.dataspace"; for the most explicit control, data can be replaced with dtype, dspace, where dtype is an HDF5.Datatype and dspace is an HDF5.Dataspace.
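
For example, a minimal sketch of the two-step create-then-write pattern, using the explicit dtype, dspace form described above:

dset = create_dataset(fid, "midlevel", datatype(Float64), dataspace(3, 4))
write(dset, rand(3, 4))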

Analogously, to create committed data types, use

t = commit_datatype(parent, name, dtype[, lcpl, tcpl, tapl])

You can create and write data in one step,

write_dataset(parent, name, data; properties...)
write_attribute(parent, name, data)

You can use extendible dimensions,

d = create_dataset(parent, name, dtype, (dims, max_dims), chunk=(chunk_dims))
HDF5.set_extent_dims(d, new_dims)

where dims is a tuple of integers. For example

b = create_dataset(fid, "b", Int, ((1000,),(-1,)), chunk=(100,)) #-1 is equivalent to typemax(hsize_t)
HDF5.set_extent_dims(b, (10000,))
b[1:10000] = collect(1:10000)

When dimensions are reduced, the truncated data is lost. A maximum dimension of -1 is often referred to as an unlimited dimension, though it is limited by the maximum size of an unsigned integer.
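
For example, shrinking the dataset created above (a sketch; the discarded elements cannot be recovered):

HDF5.set_extent_dims(b, (100,))  # /b now has 100 elements; data beyond that is lost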

Finally, it's possible to delete objects:

delete_object(parent, name)   # for groups, datasets, and datatypes
delete_attribute(parent, name)   # for attributes

Low-level routines

Many of the most commonly-used libhdf5 functions have been wrapped. These are not exported, so you need to preface them with HDF5.function to use them. The library follows a consistent convention: for example, libhdf5's H5Adelete is wrapped with a Julia function called h5a_delete. The arguments are exactly as specified in the HDF5 reference manual.

HDF5 is a large library, and the low-level wrap is not complete. However, many of the most commonly used functions are wrapped, and in general wrapping a new function takes only a single line of code. Users who need additional functionality are encouraged to contribute it.

Note that Julia's HDF5 directly uses the "2" interfaces, e.g., H5Dcreate2, so you need to have version 1.8 of the HDF5 library or later.

Language interoperability with row- and column-major order arrays

There are two main methods for storing multidimensional arrays in linear storage: row-major order and column-major order. Julia, like Fortran and MATLAB, stores multidimensional arrays in column-major order, while other languages, including C and Python (NumPy), use row-major order. Therefore, when reading an array in Julia from a row-major-order language, the dimensions may be inverted.

To read a multidimensional array into the original shape from an HDF5 file written by Python (numpy and h5py) or C/C++/Objective-C, simply reverse the dimensions. For example, one may add the following line after reading the dataset dset:

dset = permutedims(dset, reverse(1:ndims(dset)))

Note that some languages or libraries use both methods, so please check the dataset's description for details. For example, NumPy arrays are row-major by default, but NumPy can use either row-major or column-major ordered arrays.

Credits

  • Konrad Hinsen initiated Julia's support for HDF5

  • Tim Holy and Simon Kornblith (primary authors)

  • Tom Short contributed code and ideas to the dictionary-like interface

  • Blake Johnson made several improvements, such as support for iterating over attributes

  • Isaiah Norton and Elliot Saba improved installation on Windows and OSX

  • Steve Johnson contributed the do syntax and Blosc compression

  • Mike Nolta and Jameson Nash contributed code or suggestions for improving the handling of HDF5's constants

  • Thanks also to the users who have reported bugs and tested fixes

API Reference

Below we include a limited number of API references. Note that not all of these are public interfaces; proceed with caution.

HDF5.CONTEXT β€” Constant
HDF5.CONTEXT

Internal API

Default HDF5Context.

HDF5.SHOW_TREE_ICONS β€” Constant
SHOW_TREE_ICONS = Ref{Bool}(true)

Configurable option to control whether emoji icons (true) or a plain-text annotation (false) is used to indicate the object type by show_tree.

HDF5.SHOW_TREE_MAX_DEPTH β€” Constant
SHOW_TREE_MAX_DEPTH = Ref{Int}(5)

Maximum recursive depth to descend during printing.

HDF5.BlockRange β€” Method
HDF5.BlockRange(;start::Integer, stride::Integer=1, count::Integer=1, block::Integer=1)

A BlockRange represents a selection along a single dimension of an HDF5 hyperslab. It is similar to a Julia range object, with some extra features for selecting multiple contiguous blocks.

  • start: the index of the first element in the first block (1-based).

  • stride: the step between the first element of each block (must be >0)

  • count: the number of blocks (can be -1 for an unlimited number of blocks)

  • block: the number of elements in each block.

    BlockRange(obj::Union{Integer, OrdinalRange})

Convert obj to a BlockRange object.
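
For example, a small sketch based on the keyword constructor above:

br = HDF5.BlockRange(; start=1, stride=4, count=3, block=2)  # blocks covering elements 1:2, 5:6, and 9:10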

HDF5.Dataspace β€” Type
HDF5.Dataspace

A dataspace defines the size and the shape of a dataset or an attribute.

A dataspace is typically constructed by calling dataspace.

The following functions have methods defined for Dataspace objects

  • ==
  • ndims
  • size
  • length
  • isempty
  • isnull
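
For example, a brief sketch constructing a Dataspace from a Julia array via dataspace (documented further below):

ds = dataspace(rand(3, 4))  # simple Dataspace for a 3x4 array
ndims(ds)  # == 2
size(ds)   # == (3, 4)
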
HDF5.HDF5Context β€” Type
HDF5Context

Internal API

An HDF5Context is a collection of HDF5 property lists. It is meant to be used as a Task local mechanism to store state and change the default property lists for new objects.

Use the function get_context_property(name::Symbol) to access a property list within the local context.

The context in task_local_storage()[:hdf5_context] will be checked first. A common global HDF5Context is stored in the constant HDF5.CONTEXT and serves as the default context if the current task does not have a :hdf5_context.

Fields

  • attribute_access
  • attribute_create
  • dataset_access
  • dataset_create
  • dataset_transfer
  • datatype_access
  • datatype_create
  • file_access
  • file_create
  • file_mount
  • group_access
  • group_create
  • link_access
  • link_create
  • object_copy
  • object_create
  • string_create

HDF5.VirtualLayout β€” Type
VirtualLayout(dcpl::DatasetCreateProperties)

The collection of VirtualMappings associated with dcpl. This is an AbstractVector{VirtualMapping}, supporting length, getindex and push!.

HDF5.VirtualMapping β€” Type
VirtualMapping(
    vspace::Dataspace,
    srcfile::AbstractString,
    srcdset::AbstractString,
    srcspace::Dataspace
)

Specify a map of elements of the virtual dataset (VDS) described by vspace to the elements of the source dataset described by srcspace. The source dataset is identified by the name of the file where it is located, srcfile, and the name of the dataset, srcdset.

Both srcfile and srcdset support "printf"-style formats with %b being replaced by the block count of the selection.

For more details on how source file resolution works, see H5P_SET_VIRTUAL.

Base.copyto! β€” Method
copyto!(output_buffer::AbstractArray{T}, obj::Union{DatasetOrAttribute}) where T

Copy [part of] an HDF5 dataset or attribute to a preallocated output buffer. The output buffer must be convertible to a pointer and have a contiguous layout.

Base.isopen β€” Method

isopen(obj::HDF5.File)

Returns true if obj has not been closed, false if it has been closed.

Base.similar β€” Method
similar(obj::DatasetOrAttribute, [::Type{T}], [dims::Integer...]; normalize = true)

Return an Array{T} or Matrix{UInt8} that can contain [part of] the dataset.

The normalize keyword will normalize the buffer for string and array datatypes.
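
For example, a short sketch combining similar with the copyto! method above (dset is assumed to be an open Dataset):

buf = similar(dset)  # allocate a buffer sized to hold the dataset
copyto!(buf, dset)   # fill it with the dataset's contents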

HDF5.attributes β€” Method
attributes(object::Union{File,Object})

The attributes of a file or object: this returns an Attributes object, which is a Dict-like object for accessing the attributes of object: getindex will return an Attribute object, and setindex! will call write_attribute.

HDF5.attrs β€” Method
attrs(object::Union{File,Group,Dataset,Datatype})

The attributes dictionary of object. Returns an AttributeDict, a Dict-like object for accessing the attributes of object.

attrs(object)["name"] = value  # create/overwrite an attribute
attr = attrs(object)["name"]   # read an attribute
delete!(attrs(object), "name") # delete an attribute
keys(attrs(object))            # list the attribute names

HDF5.create_attribute β€” Method
create_attribute(parent::Union{File,Object}, name::AbstractString, dtype::Datatype, space::Dataspace)
create_attribute(parent::Union{File,Object}, name::AbstractString, data)

Create a new Attribute object named name on the object parent, either by specifying the Datatype and Dataspace of the attribute, or by providing the data. Note that no data will be written: use write_attribute to write the data.

HDF5.create_dataset β€” Method
create_dataset(parent, path, datatype, dataspace; properties...)

Arguments

  • parent - File or Group
  • path - String describing the path of the dataset within the HDF5 file or nothing to create an anonymous dataset
  • datatype - Datatype or Type of the dataset
  • dataspace - Dataspace or Dims of the dataset
  • properties - keyword name-value pairs set properties of the dataset

Keywords

There are many keyword properties that can be set. Below are a few select keywords.

  • chunk - Dims describing the size of a chunk. Needed to apply filters.
  • filters - AbstractVector{<: Filters.Filter} describing the order of the filters to apply to the data. See Filters
  • external - Tuple{AbstractString, Integer, Integer} (filepath, offset, filesize) External dataset file location, data offset, and file size. See API.h5p_set_external.

Additionally, the initial create, transfer, and access properties can be provided as keywords.

HDF5.create_external β€” Method
create_external(source::Union{HDF5.File, HDF5.Group}, source_relpath, target_filename, target_path;
                lcpl_id=HDF5.API.H5P_DEFAULT, lapl_id=HDF5.API.H5P_DEFAULT)

Create an external link such that source[source_relpath] points to target_path within the file with path target_filename. This calls H5Lcreate_external (https://www.hdfgroup.org/HDF5/doc/RM/RM_H5L.html#Link-CreateExternal).

HDF5.create_external_dataset β€” Function
create_external_dataset(parent, name, filepath, dtype, dspace, offset = 0)

Create an external dataset with data in an external file.

  • parent - File or Group
  • name - Name of the Dataset
  • filepath - File path to where the data is stored
  • dtype - Datatype, Type, or value where datatype is applicable
  • offset - Offset, in bytes, from the beginning of the file to the location in the file where the data starts.

Use API.h5p_set_external to link to multiple segments.

See also API.h5p_set_external

HDF5.dataspace β€” Method
dataspace(data)

The default Dataspace used for representing a Julia object data:

  • strings or numbers: a scalar Dataspace
  • arrays: a simple Dataspace
  • struct types: a scalar Dataspace
  • nothing or an EmptyArray: a null dataspace

HDF5.dataspace β€” Method
dataspace(dims::Tuple; maxdims::Tuple=dims)

Construct a simple Dataspace for the given dimensions dims. The maximum dimensions maxdims specifies the maximum possible size: -1 can be used to indicate unlimited dimensions.
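
For example, a short sketch (the specific sizes are arbitrary):

ds = dataspace((100, 10); maxdims=(100, -1))  # the second dimension can grow without limit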

HDF5.delete_attribute β€” Method
delete_attribute(parent::Union{File,Object}, name::AbstractString)

Delete the Attribute named name on the object parent.

HDF5.do_read_chunk β€” Method
do_read_chunk(dataset::Dataset, offset)

Read a raw chunk at a given offset. offset is a 1-based list of rank ndims(dataset) and must fall on a chunk boundary.

HDF5.do_read_chunk β€” Method
do_read_chunk(dataset::Dataset, index::Integer)

Read a raw chunk at a given index. index is 1-based and consecutive up to the number of chunks.

HDF5.do_write_chunk β€” Function
do_write_chunk(dataset::Dataset, index, chunk_bytes::AbstractArray, filter_mask=0)

Write a raw chunk at a given linear index. chunk_bytes is an AbstractArray that can be converted to a pointer, Ptr{Cvoid}. index is 1-based and consecutive up to the number of chunks.

HDF5.do_write_chunk β€” Function
do_write_chunk(dataset::Dataset, offset, chunk_bytes::AbstractArray, filter_mask=0)

Write a raw chunk at a given offset. chunk_bytes is an AbstractArray that can be converted to a pointer, Ptr{Cvoid}. offset is a 1-based list of rank ndims(dataset) and must fall on a chunk boundary.

HDF5.get_chunk_index β€” Method
HDF5.get_chunk_index(dataset_id, offset)

Get 0-based index of chunk from 0-based offset returned in Julia's column-major order. For a 1-based API, see HDF5.ChunkStorage.

HDF5.get_chunk_length β€” Method
HDF5.get_chunk_length(dataset_id)

Retrieves the chunk size in bytes. Equivalent to API.h5d_get_chunk_info(dataset_id, index)[:size].

HDF5.get_chunk_offset β€” Method
HDF5.get_chunk_offset(dataset_id, index)

Get 0-based offset of chunk from 0-based index. The offsets are returned in Julia's column-major order rather than hdf5 row-major order. For a 1-based API, see HDF5.ChunkStorage.

HDF5.get_context_property β€” Method
get_context_property(name::Symbol)

Internal API

Retrieve a property list from the task local context, defaulting to HDF5.CONTEXT if task_local_storage()[:hdf5_context] does not exist.

HDF5.get_datasets β€” Method
get_datasets(file::HDF5.File) -> datasets::Vector{HDF5.Dataset}

Get all the datasets in an hdf5 file without loading the data.

HDF5.get_extent_dims β€” Method
HDF5.get_extent_dims(obj::Union{HDF5.Dataspace, HDF5.Dataset, HDF5.Attribute}) -> dims, maxdims

Get the array dimensions from a dataspace, dataset, or attribute and return a tuple of dims and maxdims.

HDF5.get_num_chunks β€” Method
HDF5.get_num_chunks(dataset_id)

Returns the number of chunks in a dataset. Equivalent to API.h5d_get_num_chunks(dataset_id, HDF5.H5S_ALL).

HDF5.get_num_chunks_per_dim β€” Method
HDF5.get_num_chunks_per_dim(dataset_id)

Get the number of chunks in each dimension in Julia's column-major order.

HDF5.h5open β€” Function
h5open(filename::AbstractString, mode::AbstractString="r"; swmr=false, pv...)

Open or create an HDF5 file where mode is one of:

  • "r" read only
  • "r+" read and write
  • "cw" read and write, create file if not existing, do not truncate
  • "w" read and write, create a new file (destroys any existing contents)

Pass swmr=true to enable SWMR (Single Writer Multiple Reader) write access for "w" and "r+", or SWMR read access for "r".

Properties can be specified as keywords for FileAccessProperties and FileCreateProperties.

Also the keywords fapl and fcpl can be used to provide default instances of these property lists. Property lists passed in via keyword will be closed. This is useful to set properties not currently defined by HDF5.jl.

Note that h5open uses fclose_degree = :strong by default, but this can be overridden by the fapl keyword.

HDF5.h5open β€” Method
function h5open(f::Function, args...; pv...)

Apply the function f to the result of h5open(args...; kwargs...) and close the resulting HDF5.File upon completion. For example with a do block:

h5open("foo.h5","w") do h5
    h5["foo"]=[1,2,3]
end

HDF5.h5readattr β€” Method
h5readattr(filename, name::AbstractString)

Read the attributes of the object at name in the HDF5 file filename, returning a Dict.

HDF5.h5writeattr β€” Method
h5writeattr(filename, name::AbstractString, data::Dict)

Write data as attributes to the object at name in the HDF5 file filename.

HDF5.has_parallel β€” Method
has_parallel()

Returns true if the HDF5 libraries were compiled with parallel support, and if parallel functionality was loaded into HDF5.jl.

For the second condition to be true, MPI.jl must be imported before HDF5.jl.

HDF5.has_ros3 β€” Method
has_ros3()

Returns true if the HDF5 libraries were compiled with ros3 support

HDF5.ishdf5 β€” Method
ishdf5(name::AbstractString)

Returns true if the file specified by name is in the HDF5 format, and false otherwise.

HDF5.isnull β€” Method
isnull(dspace::Union{HDF5.Dataspace, HDF5.Dataset, HDF5.Attribute})

Determines whether the given object has no size (consistent with the API.H5S_NULL dataspace).

Examples

julia> HDF5.isnull(dataspace(HDF5.EmptyArray{Float64}()))
true

julia> HDF5.isnull(dataspace((0,)))
false

HDF5.open_attribute β€” Function
open_attribute(parent::Union{File,Group,Dataset,Datatype}, name::AbstractString)

Open the Attribute named name on the object parent.

HDF5.open_group β€” Function
open_group(parent::Union{File,Group}, path::AbstractString)

Open an existing Group at path under the parent object.

HDF5.read_attribute β€” Method
read_attribute(parent::Union{File,Group,Dataset,Datatype}, name::AbstractString)

Read the value of the named attribute on the parent object.

Example

julia> HDF5.read_attribute(g, "time")
2.45
HDF5.read_chunk β€” Function
HDF5.read_chunk(dataset_id, index::Integer, [buf]; dxpl_id = HDF5.API.H5P_DEFAULT, filters = Ref{UInt32}())

Helper method to read chunks via 0-based integer index.

Argument buf is optional and defaults to a Vector{UInt8} of length determined by HDF5.h5d_get_chunk_info. Argument dxpl_id can be supplied as a keyword and defaults to HDF5.API.H5P_DEFAULT. Argument filters can be retrieved by supplying a Ref{UInt32} value via a keyword argument.

This method returns Vector{UInt8}.

HDF5.read_chunk β€” Function
HDF5.read_chunk(dataset_id, offset, [buf]; dxpl_id = HDF5.API.H5P_DEFAULT, filters = Ref{UInt32}())

Helper method to read chunks via 0-based offsets in a Tuple.

Argument buf is optional and defaults to a Vector{UInt8} of length determined by HDF5.get_chunk_length. Argument dxpl_id can be supplied as a keyword and defaults to HDF5.API.H5P_DEFAULT. Argument filters can be retrieved by supplying a Ref{UInt32} value via a keyword argument.

This method returns Vector{UInt8}.

HDF5.rename_attribute β€” Method
rename_attribute(parent::Union{File,Object}, oldname::AbstractString, newname::AbstractString)

Rename the Attribute of the object parent named oldname to newname.

HDF5.select_hyperslab! β€” Method
HDF5.select_hyperslab!(dspace::Dataspace, [op, ], idxs::Tuple)

Selects a hyperslab region of the dspace. idxs should be a tuple of integers, ranges, or BlockRange objects.

  • op determines how the new selection is to be combined with the already selected dataspace:
    • :select (default): replace the existing selection with the new selection.
    • :or: adds the new selection to the existing selection. Aliases: |, βˆͺ, union.
    • :and: retains only the overlapping portions of the new and existing selection. Aliases: &, ∩, intersect.
    • :xor: retains only the elements that are members of the new selection or the existing selection, excluding elements that are members of both selections. Aliases: ⊻, xor
    • :notb: retains only elements of the existing selection that are not in the new selection. Alias: setdiff.
    • :nota: retains only elements of the new selection that are not in the existing selection.
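
For example, a brief sketch combining two rectangular selections on a 10x10 dataspace (sizes chosen arbitrarily):

dspace = dataspace((10, 10))
HDF5.select_hyperslab!(dspace, (1:2, 1:3))       # default :select replaces any existing selection
HDF5.select_hyperslab!(dspace, :or, (5:6, 4:8))  # :or adds a second block to the selection
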
HDF5.set_extent_dims β€” Function
HDF5.set_extent_dims(dspace::HDF5.Dataspace, new_dims::Dims, max_dims::Union{Dims,Nothing} = nothing)

Change the dimensions of a dataspace dspace to new_dims, optionally with the maximum possible dimensions max_dims different from the active size new_dims. If not given, max_dims is set equal to new_dims.

HDF5.set_extent_dims β€” Method
HDF5.set_extent_dims(dset::HDF5.Dataset, new_dims::Dims)

Change the current dimensions of a dataset to new_dims, limited by max_dims = get_extent_dims(dset)[2]. Reduction is possible and leads to loss of truncated data.

HDF5.setproperties! β€” Method
setproperties!(props::Properties...; kwargs...)

For each (key, value) pair in kwargs, set the corresponding properties in each Properties object in props. Returns a Dict of any pairs which didn't match properties in props.

HDF5.write_attribute β€” Method
write_attribute(parent::Union{File,Object}, name::AbstractString, data)

Write data as an Attribute named name on the object parent.

HDF5.write_chunk β€” Method
HDF5.write_chunk(dataset_id, offset, buf::AbstractArray; dxpl_id = HDF5.API.H5P_DEFAULT, filter_mask = 0)

Helper method to write chunks via 0-based offsets offset as a Tuple.

HDF5.write_chunk β€” Method
HDF5.write_chunk(dataset_id, index::Integer, buf::AbstractArray; dxpl_id = API.H5P_DEFAULT, filter_mask = 0)

Helper method to write chunks via 0-based integer index.

HDF5.@bool_property β€” Macro
@bool_property(name)

Wrap property getter/setter API functions that use 0/1 to use Bool values

HDF5.@enum_property β€” Macro
@enum_property(name, sym1 => enumvalue1, sym2 => enumvalue2, ...)

Wrap property getter/setter API functions that use enum values to use symbols instead.

HDF5.@propertyclass β€” Macro
@propertyclass P classid

Define a new subtype of P <: Properties corresponding to an HDF5 property list with class identifier classid.

Once defined, the following interfaces can be defined:

superclass(::Type{P})

This should return the type from which P inherits. If not defined, it will inherit from GenericProperties.

class_propertynames(::Type{P})

This should return a Tuple of Symbols, being the names of the properties associated with P.

class_getproperty(::Type{P}, p::Properties, name::Symbol)

If name is an associated property of type P, this should return the value of the property, otherwise call class_getproperty(superclass(P), p, name).

class_setproperty!(::Type{P}, p::Properties, name::Symbol, val)

If name is an associated property of type P, this should set the value of the property, otherwise call class_setproperty!(superclass(P), p, name, val).