Data.toml
A collection of data sets may be encapsulated in a Data.toml
file, the structure of which is described here.
Overall structure
data_config_version=0
name="data collection name"
uuid="a UUIDv4"
plugins=["plugin1", "plugin2", ...]
[config]
# [Properties of the data collection itself]
[[mydataset]]
uuid="a UUIDv4"
# other properties...
[[mydataset.TRANSFORMER]]
driver="transformer driver"
type=["a QualifiedType", ...]
priority=1 # (optional)
# other properties...
[[mydataset]]
# There may be multiple data sets by the same name,
# but they must be uniquely identifyable by their properties
[[exampledata]]
# Another data set
Attributes of the data collection
There are four top-level non-table properties currently recognised.
data_config_version
:: The (integer) version of the format. Currently0
as this project is still in the alpha phase of development, moving towards beta.name
:: an identifying string. Cannot contain:
, and characters outside of[A-Za-z0-9_]
are recommended against.- uuid :: a UUIDv4 used to uniquely refer to the data collection, should it be renamed etc.
- plugins :: a list of plugins which should be used when working with this data collection
In addition to these four, a special table of the name config
is recognised. This holds custom attributes of the data collection, e.g.
[config]
mykey="value"
[config.defaults]
description="Ooops, somebody forgot to describe this."
[config.defaults.storage.filesystem]
priority=2
Note that as a consequence of this special table, no data set may be named "config".
DataToolkitBase reserves exactly one config attribute: locked
. This is used to indicate that the Data.toml
file should not be modified, and to override it the attribute must be changed within the Data.toml
file. By setting config.locked = true
, you protecct yourself from accidental modifications to the data file.
This functionality is provided here rather than in DataToolkitsCommon etc. because it supported via the implementation of Base.iswritable(::DataCollection)
, and so downstream packages would only be able to support this by overriding this method.
Structure of a data set
[[mydataset]]
uuid="a UUIDv4"
# other properties...
[[mydataset.TRANSFORMER]]
driver="transformer driver"
type=["a QualifiedType", ...] # probably optional
type="a QualifiedType" # single-value alternative form
priority=1 # (optional)
# other properties...
A data set is a top-level instance of an array of tables, with any name other than config
. Data set names need not be unique, but should be able to be uniquely identified by the combination of their name and parameters.
Apart from data transformers, there is one recognised data property: uuid
, a UUIDv4 string. Any number of additional properties may be given (so long as they do not conflict with the transformer names), they may have special behaviour based on plugins or extensions loaded, but will not be treated specially by DataToolkitBase.
A data set can have any number of data transformers, but at least two are needed for a functional data set. Data transformers are instances of an array of tables (like data sets), but directly under the data set table.
Structure of a data transformer
There are three data transformers types, with the following names:
storage
loader
writer
All transformers recognise three properties:
driver
, the transformer driver name, as a stringtype
, a singleQualifiedType
string, or an array of thempriority
, an integer which sets the order in which multiple transformers should be considered
The driver
property is mandatory. type
and priority
can be omitted, in which case they will adopt the default values. The default type
value is either determined dynamically from the available methods, or set for that particular transformer.