addimport!(pkg::Symbol, property::Symbol,
           alias::Union{Symbol, Nothing}=nothing; imports::ImportList)
addimport!(pkg::Union{Expr, Symbol}, property::Expr,
           alias::Union{Symbol, Nothing}=nothing; imports::ImportList)

Add the appropriate information to imports for property to be loaded from pkg, either under its default name or as alias if specified.

addpkg!(name::Union{Expr, Symbol}, alias::Union{Symbol, Nothing}=nothing;
        pkgs::PkgList, imports::ImportList)

Add the appropriate information to pkgs and perhaps imports for the package name to be loaded, either using the default name, or as alias if provided.


Flatten out the individual tokens of terms, a set of terms produced from parsing an expression like foo as bar, baz.


julia> flattenterms((:(, :as, :((bar, baz,, other.thing)), :as, :((other, more))))
9-element Vector{Union{Expr, Symbol}}:
graft(parent::Symbol, child::Union{Expr, Symbol})

Return a "grafted" expression that takes the child property of parent.


julia> graft(:a, :(b.c.d))

julia> graft(:a, :b)
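A minimal sketch of how graft could work, assuming property access is represented the way Julia's parser produces it (a.b.c becomes nested Expr(:., lhs, QuoteNode(name)) nodes). graft_sketch is a hypothetical name, not the package's implementation.

```julia
# Hypothetical re-implementation of `graft`, for illustration only.
# Julia parses b.c.d as Expr(:., Expr(:., :b, QuoteNode(:c)), QuoteNode(:d)).
graft_sketch(parent::Symbol, child::Symbol) = Expr(:., parent, QuoteNode(child))

function graft_sketch(parent::Symbol, child::Expr)
    # Only property-access expressions are handled in this sketch
    child.head === :. || throw(ArgumentError("expected a property expression"))
    lhs, rhs = child.args
    # Recurse down the left-hand side until the innermost symbol is reached,
    # then attach `parent` in front of it
    Expr(:., graft_sketch(parent, lhs), rhs)
end

graft_sketch(:a, :(b.c.d)) == :(a.b.c.d)  # true
```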

Represent a nested property expression as a vector of symbols.


julia> propertylist(:(a.b.c.d))
4-element Vector{Symbol}:
 :a
 :b
 :c
 :d

Split a nested property expression into the first term and the remaining terms.


julia> splitfirst(:(a.b.c.d))
(:a, :(b.c.d))
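propertylist and splitfirst can be sketched in the same style, again assuming Julia's nested Expr(:., lhs, QuoteNode(name)) representation. Both functions below are hypothetical illustrations; splitfirst_sketch simply folds the remaining symbols back into a property expression.

```julia
# Hypothetical re-implementations, for illustration only.
# Julia parses a.b.c as Expr(:., Expr(:., :a, QuoteNode(:b)), QuoteNode(:c)).
propertylist_sketch(s::Symbol) = [s]

function propertylist_sketch(ex::Expr)
    ex.head === :. || throw(ArgumentError("expected a property expression"))
    lhs, rhs = ex.args
    push!(propertylist_sketch(lhs), rhs.value)  # rhs is a QuoteNode
end

function splitfirst_sketch(ex::Expr)
    syms = propertylist_sketch(ex)
    # Fold the remaining symbols back into a nested property expression
    rest = foldl((lhs, s) -> Expr(:., lhs, QuoteNode(s)), syms[3:end]; init = syms[2])
    (syms[1], rest)
end

splitfirst_sketch(:(a.b.c.d))  # (:a, :(b.c.d))
```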
@import pkg1, pkg2...
@import pkg1 as name1, pkg2 as name2...
@import pkg: foo, bar...
@import pkg: foo as bar, bar as baz...

Fetch modules previously registered with @addpkg, and import them into the current namespace. This macro largely mirrors the syntax of using.

If a required package had to be loaded for the @import statement, a PkgRequiredRerunNeeded singleton will be returned.


@import pkg
# Alternative form
@import pkg: dothing

When writing a data configuration TOML file, the keys are (recursively) sorted. Some keys are particularly important, though, so to ensure they are placed higher, a mapping from such keys to a higher-priority sort string can be registered here.

For example, "config" => "\x01" ensures that the special configuration section is placed before all of the data sets.

This can cause odd behaviour if somebody gives a dataset the same name as a special key, but frankly that would be a bit silly (given the key names, e.g. "uuid") and so this is of minimal concern.
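A minimal sketch of the idea, assuming the registry is a plain Dict from key names to sort strings. KEY_PRIORITY_SKETCH (including its "uuid" entry) and sortkeys_sketch are illustrative names, not the actual definitions.

```julia
# Hypothetical priority table; "\x01" and "\x02" sort before any printable
# character, so mapped keys are placed ahead of ordinary names.
const KEY_PRIORITY_SKETCH = Dict("config" => "\x01", "uuid" => "\x02")

# Sort by a key's priority string if it has one, and by its own name otherwise.
sortkeys_sketch(ks) = sort(collect(ks), by = k -> get(KEY_PRIORITY_SKETCH, k, k))

sortkeys_sketch(["iris", "config", "abc", "uuid"])  # ["config", "uuid", "abc", "iris"]
```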


The data specification TOML format constructs a DataCollection, which itself contains DataSets, comprised of metadata and AbstractDataTransformers.

DataCollection
├─ DataSet
│  ├─ AbstractDataTransformer
│  └─ AbstractDataTransformer
├─ DataSet
⋮

Within each scope, there are certain reserved attributes. They are listed in this Dict under the following keys:

  • :collection for DataCollection
  • :dataset for DataSet
  • :transformer for AbstractDataTransformer

The set of packages loaded by each module via @addpkg, for import with @import.

More specifically, when a module M invokes @addpkg pkg id then EXTRA_PACKAGES[M][pkg] = id is set, and then this information is used with @import to obtain the package from the root module.


The help-string for the help command itself. This contains the template string "<SCOPE>", which is replaced with the relevant scope at runtime.


A mapping from severity numbers (see LINT_SEVERITY_MAPPING) to a tuple giving the color the message should be accented with and the severity title string.


A symbol identifying the Data REPL. This is used in a few places, such as the command history.


The color that should be used for question text presented in a REPL context. This should be a symbol present in Base.text_colors.


The supertype for methods producing or consuming data.

                 ╭────loader─────╮
                 ╵               ▼
Storage ◀────▶ Data          Information
                 ▲               ╷
                 ╰────writer─────╯

There are three subtypes:

  • DataStorage
  • DataLoader
  • DataWriter

Each subtype takes a Symbol type parameter designating the driver which should be used to perform the data operation. In addition, each subtype has the following fields:

  • dataset::DataSet, the data set the method operates on
  • type::Vector{<:QualifiedType}, the Julia types the method supports
  • priority::Int, the priority with which this method should be used, compared to alternatives. Lower values have higher priority.
  • parameters::SmallDict{String, Any}, any parameters applied to the method.
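As a minimal sketch of choosing between alternative methods by priority (assuming only that lower values win, as stated above), the hypothetical MethodSketch type below stands in for a real DataStorage, DataLoader, or DataWriter.

```julia
# Hypothetical stand-in for a data method, keeping only what is needed here.
struct MethodSketch
    driver::Symbol
    priority::Int
end

# Order alternatives so the preferred method (lowest priority value) comes first.
byprecedence(ms::Vector{MethodSketch}) = sort(ms, by = m -> m.priority)

ms = [MethodSketch(:web, 2), MethodSketch(:filesystem, 1), MethodSketch(:cache, 5)]
first(byprecedence(ms)).driver  # :filesystem
```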
Advice{func, context} <: Function

Advices allow for composable, highly flexible modifications of data by encapsulating a function call. They are inspired by elisp's advice system (namely its most versatile form, :around advice) and by Clojure's advisors.

An Advice is essentially a function wrapper, with a priority::Int attribute. The wrapped functions should be of the form:

(action::Function, args...; kargs...) ->
  ([post::Function], action::Function, args::Tuple, [kargs::NamedTuple])

Short-hand return values with post or kargs omitted are also accepted, in which case default values (the identity function and (;) respectively) will be automatically substituted in.

    input=(action args kwargs)
         ┃                 ┏╸post=identity
       ╭─╂────advisor 1────╂─╮
       ╰─╂─────────────────╂─╯
       ╭─╂────advisor 2────╂─╮
       ╰─╂─────────────────╂─╯
       ╭─╂────advisor 3────╂─╮
       ╰─╂─────────────────╂─╯
         ┃                 ┃
         ▼                 ▽
action(args; kargs) ━━━━▶ post╺━━▶ result

To specify which transforms an Advice should be applied to, ensure you add the relevant type parameters to your transducing function. In cases where the transducing function is not applicable, the Advice will simply act as the identity function.

After all applicable Advices have been applied, action(args...; kargs...) |> post is called to produce the final result.

The final post function is created by rightwards-composition with every post entry of the advice forms (i.e. at each stage post = post ∘ extra is run).

The overall behaviour can be thought of as shells of advice.

        ╭╌ advisor 1 ╌╌╌╌╌╌╌╌─╮
        ┆ ╭╌ advisor 2 ╌╌╌╌╌╮ ┆
        ┆ ┆                 ┆ ┆
input ━━┿━┿━━━▶ function ━━━┿━┿━━▶ result
        ┆ ╰╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╯ ┆
        ╰╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╯


Advice(priority::Int, f::Function)
Advice(f::Function) # priority is set to 1
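The chaining behaviour described above can be sketched as follows. AdviceSketch, normalise, and advised_call are hypothetical names; the sketch follows the (action, args...; kargs...) form and short-hand returns described earlier, composes post functions as post = post ∘ extra, and applies advisors in the order given rather than modelling how priority orders them.

```julia
# Hypothetical stand-in for Advice, for illustration only.
struct AdviceSketch
    priority::Int
    f::Function  # (action, args...; kwargs...) -> ([post], action, args, [kwargs])
end

# Normalise short-hand returns to (post, action, args, kwargs), substituting
# `identity` and an empty NamedTuple (i.e. `(;)`) when omitted.
normalise(r::Tuple{Function, Tuple}) = (identity, r[1], r[2], NamedTuple())
normalise(r::Tuple{Function, Function, Tuple}) = (r[1], r[2], r[3], NamedTuple())
normalise(r::Tuple{Function, Tuple, NamedTuple}) = (identity, r[1], r[2], r[3])
normalise(r::Tuple{Function, Function, Tuple, NamedTuple}) = r

# Run each advisor in turn, then call `action(args...; kwargs...) |> post`.
function advised_call(advisors::Vector{AdviceSketch}, action::Function, args...; kwargs...)
    post, kws = identity, values(kwargs)
    for adv in advisors
        extra, action, args, kws = normalise(adv.f(action, args...; kws...))
        post = post ∘ extra
    end
    action(args...; kws...) |> post
end

double = AdviceSketch(1, (action, args...; kw...) -> (action, map(x -> 2x, args)))
plusone = AdviceSketch(1, (action, args...; kw...) -> (x -> x + 1, action, args))
advised_call([double, plusone], +, 1, 2)  # 7, i.e. (2 + 4) + 1
```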


1. Logging every time a DataSet is loaded.

loggingadvisor = Advice(
    function(post::Function, f::typeof(load), loader::DataLoader, input, outtype)
        @info "Loading $(loader.dataset.name)"
        (post, f, (loader, input, outtype))
    end)

2. Automatically committing each data file write.

writecommitadvisor = Advice(
    function(post::Function, f::typeof(write), writer::DataWriter{:filesystem}, output, info)
        function writecommit(result)
            run(`git add $output`)
            run(`git commit -m "update $output"`)
            result # pass the write result through unchanged
        end
        (post ∘ writecommit, f, (writer, output, info))
    end)
(advice::Advice)((post::Function, func::Function, args::Tuple, kwargs::NamedTuple))

Apply advice to the function call func(args...; kwargs...), and return the new (post, func, args, kwargs) tuple.


A collection of Advices sourced from available Plugins.

Like individual Advices, an AdviceAmalgamation can be called as a function. However, it also supports the following convenience syntax:

(::AdviceAmalgamation)(f::Function, args...; kargs...) # -> result


AdviceAmalgamation(adviseall::Function, advisors::Vector{Advice},
                   plugins_wanted::Vector{String}, plugins_used::Vector{String})
AmbiguousIdentifier(identifier::Union{String, UUID}, matches::Vector, [collection])

Searching for identifier (optionally within collection) found multiple matches (provided as matches).

Example occurrence

julia> d"multimatch"
ERROR: AmbiguousIdentifier: "multimatch" matches multiple data sets
    ■:multimatch [45685f5f-e6ff-4418-aaf6-084b847236a8]
    ■:multimatch [92be4bda-55e9-4317-aff4-8d52ee6a5f2c]
Stacktrace: [...]

The version of the collection currently being acted on is not supported by the current version of DataToolkitBase.

Example occurrence

julia> fromspec(DataCollection, SmallDict{String, Any}("data_config_version" => -1))
ERROR: CollectionVersionMismatch: -1 (specified) ≠ 0 (current)
  The data collection specification uses the v-1 data collection format, however
  the installed DataToolkitBase version expects the v0 version of the format.
  In the future, conversion facilities may be implemented, for now though you
  will need to manually upgrade the file to the v0 format.
Stacktrace: [...]

An attempt was made to perform an operation on a collection within the data stack, but the data stack is empty.

Example occurrence

julia> getlayer(nothing) # with an empty STACK
ERROR: EmptyStackError: The data collection stack is empty
Stacktrace: [...]
struct FilePath path::String end

Crude stand-in for a file path type, which is strangely absent from Base.

This allows for load/write method dispatch, and the distinguishing of file content (as a String) from file paths.


julia> string(FilePath("some/path"))
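A minimal sketch of the kind of dispatch FilePath enables. FilePathSketch and textof are hypothetical names, and reading the file's contents is just one example of what a path-accepting method might do.

```julia
# Hypothetical mirror of `struct FilePath path::String end`.
struct FilePathSketch
    path::String
end

# A String is treated as content, a FilePathSketch as a pointer to content.
textof(content::String) = content
textof(file::FilePathSketch) = read(file.path, String)

path = tempname()
write(path, "some text")
textof("some text") == textof(FilePathSketch(path))  # true
```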
Identifier(dataset::DataSet, collection::Union{Symbol, Nothing}=:name,
           name::Symbol=something(collection, :name))

Create an Identifier referring to dataset, specifying the collection dataset comes from (when collection is not nothing) as well as all of its parameters, but without any type information.

collection and name default to the symbol :name, which signals that the collection and dataset references of the generated Identifier should use the names of the collection/dataset. If set to :uuid, the UUID is used instead. No other symbol values are supported.


A description that can be used to uniquely identify a DataSet.

Four fields are used to describe the target DataSet:

  • collection, the name or UUID of the collection (optional).
  • dataset, the name or UUID of the dataset.
  • type, the type that should be loaded from the dataset.
  • parameters, any extra parameters of the dataset that should match.


Identifier(collection::Union{AbstractString, UUID, Nothing},
           dataset::Union{AbstractString, UUID},
           type::Union{QualifiedType, Nothing},
           parameters::SmallDict{String, Any})


An Identifier can be represented as a string with the following form, with the optional components enclosed by square brackets:

[collection:]dataset[::type]

Such forms can be parsed to an Identifier by simply calling the parse function, i.e. parse(Identifier, "mycollection:dataset").
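A simplified parser for the collection:dataset::type shape can be sketched as follows. It assumes both the collection and type parts are optional, ignores parameters and UUIDs, and returns a NamedTuple rather than an Identifier; parse_ident_sketch is a hypothetical name, not the package's parse method.

```julia
# Hypothetical, simplified parser for "[collection:]dataset[::type]".
function parse_ident_sketch(s::AbstractString)
    # Split off an optional "::type" suffix first, so it is not
    # mistaken for a collection separator
    body, typename = if occursin("::", s)
        parts = split(s, "::", limit = 2)
        parts[1], String(parts[2])
    else
        s, nothing
    end
    # Then split off an optional "collection:" prefix
    collection, dataset = if occursin(":", body)
        parts = split(body, ":", limit = 2)
        String(parts[1]), String(parts[2])
    else
        nothing, String(body)
    end
    (collection = collection, dataset = dataset, type = typename)
end

parse_ident_sketch("mycollection:dataset::Int")
```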

ImpossibleTypeException(qt::QualifiedType, mod::Union{Module, Nothing})

The qualified type qt could not be converted to a Type, for some reason or another (mod is the parent module used in the attempt, should it be successfully identified, and nothing otherwise).


Representation of a lint item.


LintItem(source, severity::Union{Int, Symbol}, id::Symbol, message::String,
         fixer::Union{Function, Nothing}=nothing, autoapply::Bool=false)

source is the object that the lint applies to.

severity should be one of the following values:

  • 0 or :debug, for messages that may assist with debugging problems that may be associated with particular configuration.
  • 1 or :info, for informational messages understandable to end-users.
  • 2 or :warning, for potentially harmful situations.
  • 3 or :error, for severe issues that will prevent normal functioning.

id is a symbol representing the type of lint (e.g. :unknown_driver).

message is a message, intelligible to the end-user, describing the particular nature of the issue with respect to source. It should be as specific as possible.

fixer can be set to a function which modifies source to resolve the issue. If autoapply is set to true then fixer will be called spontaneously. The function should return true or false to indicate whether it was able to successfully fix the issue.

As a general rule, fixers that do or might require user input should not be run automatically, and fixers that can run without any user input and always "do the right thing" should be run automatically.




struct LintItem{S}
    source    ::S
    severity  ::UInt8
    id        ::Symbol
    message   ::String
    fixer     ::Union{Function, Nothing}
    autoapply ::Bool
end

The package pkg was asked for, but does not seem to be available in the current environment.

Example occurrence

julia> @addpkg Bar "00000000-0000-0000-0000-000000000000"
Bar [00000000-0000-0000-0000-000000000000]

julia> @import Bar
[ Info: Lazy-loading Bar [00000000-0000-0000-0000-000000000000]
ERROR: MissingPackage: Bar [00000000-0000-0000-0000-000000000000] has been required, but does not seem to be installed.
Stacktrace: [...]

The data set (dataset) is no longer a child of its parent collection.

This error should not occur, and is intended as a sanity check should something go quite wrong.


A representation of a Julia type that does not need the type to be defined in the Julia session, and can be stored as a string. This is done by storing the type name and the module it belongs to as Symbols.


While QualifiedType is quite capable, it is not yet able to express the full gamut of Julia types. In future this will be improved, but it will likely always be restricted to a certain subset.


While the subtype operator cannot work on QualifiedTypes (<: is a built-in), when the Julia types are defined the subset operator can be used instead. This works by simply converting the QualifiedTypes to the corresponding Type and then applying the subtype operator.

julia> QualifiedType(:Base, :Vector) ⊆ QualifiedType(:Core, :Array)

julia> Matrix ⊆ QualifiedType(:Core, :Array)

julia> QualifiedType(:Base, :Vector) ⊆ AbstractVector

julia> QualifiedType(:Base, :Foobar) ⊆ AbstractVector


QualifiedType(parentmodule::Symbol, typename::Symbol)


A QualifiedType can be expressed as a string as "$parentmodule.$typename". This can be easily parsed as a QualifiedType, e.g. parse(QualifiedType, "Core.IO").
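The ⊆ behaviour described above can be sketched with a hypothetical QualifiedTypeSketch. To keep the example self-contained it only resolves names in Core and Base (an assumption; QualifiedType itself stores an arbitrary parent module name), and it treats unresolvable types as never being subsets.

```julia
# Hypothetical stand-in for QualifiedType, for illustration only.
struct QualifiedTypeSketch
    parentmodule::Symbol
    name::Symbol
end

# Assumption: only Core and Base are searched in this sketch.
const KNOWN_MODULES = Dict{Symbol, Module}(:Core => Core, :Base => Base)

# Resolve to a real Type, or `nothing` if the module or name is unknown.
function totype(qt::QualifiedTypeSketch)
    mod = get(KNOWN_MODULES, qt.parentmodule, nothing)
    (mod !== nothing && isdefined(mod, qt.name)) || return nothing
    T = getfield(mod, qt.name)
    T isa Type ? T : nothing
end
totype(T::Type) = T

# Convert both sides to Types, then apply <:.
function qt_subtype(a, b)
    A, B = totype(a), totype(b)
    A !== nothing && B !== nothing && A <: B
end

Base.issubset(a::QualifiedTypeSketch, b::QualifiedTypeSketch) = qt_subtype(a, b)
Base.issubset(a::QualifiedTypeSketch, b::Type) = qt_subtype(a, b)
Base.issubset(a::Type, b::QualifiedTypeSketch) = qt_subtype(a, b)
```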


Modification of collection is not viable, as it is read-only.

Example occurrence

julia> lockedcollection = DataCollection(SmallDict{String, Any}("uuid" => Base.UUID(rand(UInt128)), "config" => SmallDict{String, Any}("locked" => true)))
julia> write(lockedcollection)
ERROR: ReadonlyCollection: The data collection unnamed#298 is locked
Stacktrace: [...]

A command that can be used in the Data REPL (accessible through '}').

A ReplCmd must have a:

  • name, a symbol designating the command keyword.
  • trigger, a string used as the command trigger (defaults to String(name)).
  • description, a short overview of the functionality as a string or displayable object.
  • execute, either a list of sub-ReplCmds, or a function which will perform the command's action. The function must take a single argument, the rest of the command as an AbstractString (for example, 'cmd arg1 arg2' will call the execute function with "arg1 arg2").


ReplCmd{name::Symbol}(trigger::String, description::Any, execute::Function)
ReplCmd{name::Symbol}(description::Any, execute::Function)
ReplCmd(name::Union{Symbol, String}, trigger::String, description::Any, execute::Function)
ReplCmd(name::Union{Symbol, String}, description::Any, execute::Function)


ReplCmd(:echo, "print the argument", identity)
ReplCmd(:addone, "return the input plus one", v -> 1 + parse(Int, v))
ReplCmd(:math, "A collection of basic integer arithmetic",
    [ReplCmd(:add, "a + b + ...", nums -> sum(parse.(Int, split(nums)))),
     ReplCmd(:mul, "a * b * ...", nums -> prod(parse.(Int, split(nums))))])


help(::ReplCmd) # -> print detailed help
allcompletions(::ReplCmd) # -> list all candidates
completions(::ReplCmd, sofar::AbstractString) # -> list relevant candidates
SmallDict{K, V}

A little (ordered) dictionary type for a small number of keys. Rather than doing anything clever with hashes, it just keeps a list of keys, and a list of values.

For a small number of items, this has time and particularly space advantages over a standard dictionary — a Dict{String, Any}("a" => 1) is about 4x larger than the equivalent SmallDict. Further testing indicates this provides a ~40% reduction on the overall in-memory size of a DataCollection.
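The idea behind SmallDict (parallel vectors of keys and values, searched linearly, in insertion order) can be sketched as a minimal AbstractDict subtype. TinyDict is a hypothetical name and implements only lookup, insertion, and iteration.

```julia
# Hypothetical minimal version of the SmallDict idea.
struct TinyDict{K, V} <: AbstractDict{K, V}
    keys::Vector{K}
    values::Vector{V}
end
TinyDict{K, V}() where {K, V} = TinyDict{K, V}(K[], V[])

Base.length(d::TinyDict) = length(d.keys)

# Linear search instead of hashing; getindex falls back to this via AbstractDict
function Base.get(d::TinyDict, key, default)
    i = findfirst(isequal(key), d.keys)
    i === nothing ? default : d.values[i]
end

function Base.setindex!(d::TinyDict, value, key)
    i = findfirst(isequal(key), d.keys)
    if i === nothing
        push!(d.keys, key); push!(d.values, value)  # append, keeping order
    else
        d.values[i] = value  # overwrite in place
    end
    d
end

Base.iterate(d::TinyDict, i = 1) =
    i > length(d) ? nothing : (d.keys[i] => d.values[i], i + 1)

d = TinyDict{String, Any}()
d["a"] = 1; d["b"] = "two"; d["a"] = 3
```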


A catch-all for issues involving data transformers, with details given in msg.

Example occurrence

julia> emptydata = DataSet(DataCollection(), "empty", SmallDict{String, Any}("uuid" => Base.UUID(rand(UInt128))))
DataSet empty

julia> read(emptydata)
ERROR: TransformerError: Data set "empty" could not be loaded in any form.
Stacktrace: [...]
UnregisteredPackage(pkg::Symbol, mod::Module)

The package pkg was asked for within mod, but has not been registered by mod, and so cannot be loaded.

Example occurrence

julia> @import Foo
ERROR: UnregisteredPackage: Foo has not been registered by Main, see @addpkg for more information
Stacktrace: [...]
UnresolveableIdentifier{T}(identifier::Union{String, UUID}, [collection::DataCollection])

No T (optionally from collection) could be found that matches identifier.

Example occurrences

julia> d"iirs"
ERROR: UnresolveableIdentifier: "iirs" does not match any available data sets
  Did you perhaps mean to refer to one of these data sets?
    ■:iris (75% match)
Stacktrace: [...]

julia> d"iris::Int"
ERROR: UnresolveableIdentifier: "iris::Int" does not match any available data sets
  Without the type restriction, however, the following data sets match:
    dataset:iris, which is available as a DataFrame, Matrix, CSV.File
Stacktrace: [...]
UnsatisfyableTransformer{T}(dataset::DataSet, types::Vector{QualifiedType})

A transformer (of type T) that could provide any of types was asked for, but there is no transformer that satisfies this restriction.

Example occurrence

julia> emptydata = DataSet(DataCollection(), "empty", SmallDict{String, Any}("uuid" => Base.UUID(rand(UInt128))))
DataSet empty

julia> read(emptydata, String)
ERROR: UnsatisfyableTransformer: There are no loaders for "empty" that can provide a String. The defined loaders are as follows:
Stacktrace: [...]

Remove dataset from its parent collection.


Check whether a data collection is backed by a writable file.

open(dataset::DataSet, as::Type; write::Bool=false)

Obtain the data of dataset in the form of as, with the appropriate storage provider automatically selected.

A write flag is also provided, to help the driver pick a more appropriate form of as.

This executes this component of the overall data flow:

                 ╵               ▼
Storage ◀────▶ Data          Information
read(filename::AbstractString, DataCollection; writer::Union{Function, Nothing})

Read the entire contents of a file as a DataCollection.

The default value of writer is self -> write(filename, self).

read(dataset::DataSet, as::Type)
read(dataset::DataSet) # as default type

Obtain information from dataset in the form of as, with the appropriate loader and storage provider automatically determined.

This executes this component of the overall data flow:

                 ╭────loader─────╮
                 ╵               ▼
Storage ◀────▶ Data          Information

The loader and storage provider are selected by identifying the highest priority loader that can be satisfied by a storage provider. What this looks like in practice is illustrated in the diagram below.

      read(dataset, Matrix) ⟶ ::Matrix ◀╮
         ╭───╯        ╰────────────▷┬───╯
╔═════╸dataset╺══════════════════╗  │
║ STORAGE      LOADERS           ║  │
║ (⟶ File)─┬─╮ (File ⟶ String)   ║  │
║ (⟶ IO)   ┊ ╰─(File ⟶ Matrix)─┬─╫──╯
║ (⟶ File)┄╯   (IO ⟶ String)   ┊ ║
║              (IO ⟶ Matrix)╌╌╌╯ ║
╚════════════════════════════════╝

  ─ the load path used
  ┄ an option not taken

TODO explain further
read(io::IO, DataCollection; path::Union{String, Nothing}=nothing, mod::Module=Base.Main)

Read the entirety of io, as a DataCollection.

replace!(dataset::DataSet; [name, uuid, parameters, storage, loaders, writers])

Perform an in-place update of dataset, optionally replacing any of the name, uuid, parameters, storage, loaders, or writers fields.

write(dataset::DataSet, info::Any)

TODO write docstring


Obtain the relevant AdviceAmalgamation for thing.

_dataadvisecall(func::Function, args...; kwargs...)

Identify the first data-like argument of args (i.e. a DataCollection, DataSet, or AbstractDataTransformer), obtain its advice, and perform an advised call of func(args...; kwargs...).

_read(dataset::DataSet, as::Type)

The advisable implementation of read(dataset::DataSet, as::Type). This is essentially an exercise in useful indirection.

add(::Type{DataSet}, name::String, spec::Dict{String, Any}, source::String="";
    collection::DataCollection=first(STACK), storage::Vector{Symbol}=Symbol[],
    loaders::Vector{Symbol}=Symbol[], writers::Vector{Symbol}=Symbol[],

Create a new DataSet with a name and spec, and add it to collection. The data transformers will be constructed with each of the backends listed in storage, loaders, and writers from source. If the symbol * is given, all possible drivers will be searched and the highest priority driver available (according to createpriority) used. Should no transformer of the specified driver and type exist, it will be skipped.


Obtain all possible String completion candidates for r. This defaults to the empty vector String[].

allcompletions is only called when completions(r, sofar::AbstractString) is not implemented.

complete_repl_cmd(line::AbstractString; commands::Vector{ReplCmd}=REPL_CMDS)

Return potential completion candidates for line provided by commands. More specifically, the command being completed is identified and completions(cmd::ReplCmd{:cmd}, sofar::AbstractString) called.

Special behaviour is implemented for the help command.

completions(r::ReplCmd, sofar::AbstractString)

Obtain a list of String completion candidates based on sofar. All candidates should begin with sofar.

Should this function not be implemented for the specific ReplCmd r, allcompletions(r) will be called and its results filtered to candidates that begin with sofar.

If r has subcommands, then the subcommand prefix will be removed and completions re-called on the relevant subcommand.
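The completion behaviour described above (filtering all candidates by prefix, and stripping a subcommand prefix before recursing) can be sketched with a hypothetical command tree, where a leaf holds its candidate list and a branch is a Dict of named subcommands. complete_sketch is not the package API.

```julia
# Hypothetical command tree: a leaf is a Vector{String} of candidates,
# a branch is a Dict mapping subcommand names to further trees.
complete_sketch(cmd::Vector{String}, sofar::AbstractString) =
    filter(c -> startswith(c, sofar), cmd)  # the "filter all candidates" fallback

function complete_sketch(cmd::Dict{String, Any}, sofar::AbstractString)
    if !occursin(" ", sofar)
        # Still typing a subcommand name: complete against the names themselves
        return filter(c -> startswith(c, sofar), sort(collect(keys(cmd))))
    end
    name, rest = split(sofar, " ", limit = 2)
    # Strip the subcommand prefix and recurse into the matching subcommand
    haskey(cmd, name) ? complete_sketch(cmd[name], rest) : String[]
end

tree = Dict{String, Any}(
    "math" => Dict{String, Any}("add" => String[], "mul" => String[]),
    "echo" => ["hello", "help"])
complete_sketch(tree, "math m")  # ["mul"]
```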

           collection::DataCollection=first(STACK), quiet::Bool=false)

Obtain the configuration value at propertypath in collection.

When no value is set, nothing is returned instead, and unless quiet is set, "unset" is printed.

config_set([collection::DataCollection=first(STACK)], propertypath::Vector{String}, value::Any;

Return a variation of collection with the configuration at propertypath set to value.

Unless quiet is set, a success message is printed.

Side effects

The new collection is written, if possible.

Should collection be part of STACK, the stack entry is updated in-place.

config_unset([collection::DataCollection=first(STACK)], propertypath::Vector{String};

Return a variation of collection with the configuration at propertypath removed.

Unless quiet is set, a success message is printed.

Side effects

The new collection is written, if possible.

Should collection be part of STACK, the stack entry is updated in-place.

confirm_yn(question::AbstractString, default::Bool=false)

Interactively ask question and accept y/Y/n/N as the response. If any other key is pressed, then default will be taken as the response. A " [y/n]: " string will be appended to the question, with y/n capitalised to indicate the default value.


julia> confirm_yn("Do you like chocolate?", true)
Do you like chocolate? [Y/n]: y
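The prompt and response rules above can be sketched as two pure functions, leaving out the actual keypress reading. yn_prompt and yn_response are hypothetical names.

```julia
# Hypothetical helpers: capitalise the default option in the suffix.
yn_prompt(question::AbstractString, default::Bool) =
    question * (default ? " [Y/n]: " : " [y/N]: ")

# y/Y gives true, n/N gives false, and any other key gives the default.
function yn_response(key::Char, default::Bool)
    key in ('y', 'Y') && return true
    key in ('n', 'N') && return false
    default
end
```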
create(T::Type{<:AbstractDataTransformer}, source::String, dataset::DataSet)

If source/dataset can be used to construct a data transformer of type T, do so and return it. Otherwise return nothing.

Specific transformers should implement specialised forms of this function, that either return nothing to indicate that it is not applicable, or a "create spec form". A "create spec form" is simply a list of key::String => value entries, giving properties of the to-be-created transformer, e.g.

["foo" => "bar",
 "baz" => 2]

In addition to accepting TOML-representable values, a NamedTuple value can be given that specifies an interactive prompt to put to the user.

(; prompt::String = "$key",
   type::Type{String or Bool or <:Number} = String,
   default::type = false or "",
   optional::Bool = false,
   skipvalue::Any = nothing,
   post::Function = identity)

The value can also be a Function that takes the current specification as an argument and returns a TOML-representable value or NamedTuple.

Lastly true/false can be returned as a convenient way of simply indicating whether an empty (no parameters) driver should be created.

create(T::Type{<:AbstractDataTransformer}, driver::Symbol, source::String, dataset::DataSet;
       minpriority::Int=-100, maxpriority::Int=100)

Create a new T with driver driver from source/dataset.

If driver is the symbol * then all possible drivers are checked and the highest priority (according to createpriority) valid driver used. Drivers with a priority outside the range minpriority to maxpriority will not be considered.

The created data transformer is returned, unless the given driver is not valid, in which case nothing is returned instead.


The priority with which a transformer of type T should be created. This can be any integer, but try to keep to -100–100 (see create).

dataset([collection::DataCollection], identstr::AbstractString, [parameters::Dict{String, Any}])
dataset([collection::DataCollection], identstr::AbstractString, [parameters::Pair{Symbol, Any}...])

Return the data set identified by identstr, optionally specifying the collection the data set should be found in and any parameters that apply.

dataset_parameters(source::Union{DataCollection, DataSet, AbstractDataTransformer},
                   action::Val{:extract|:resolve|:encode}, value::Any)

Obtain a form (depending on action) of value, a property within source.


  • :extract, look for DataSet references ("📇DATASET<<...>>") within value, and turn them into Identifiers (the inverse of :encode).
  • :resolve, look for Identifiers in value, and resolve them to the referenced DataSet/value.
  • :encode, look for Identifiers in value, and turn them into DataSet references (the inverse of :extract).

displaytable(headers::Vector, rows::Vector{<:Vector};
             spacing::Integer=2, maxwidth::Int=80)

Prepend the displaytable for rows with a header row given by headers.

             spacing::Integer=2, maxwidth::Int=80)

Return a vector of strings, formed from each row in rows.

Each string is of the same displaywidth, and individual values are separated by spacing spaces. Values are truncated if necessary to ensure that no row is wider than maxwidth.

                 scope::String="Data REPL")

Examine line and identify the leading command, then:

  • Show an error if the command is not given in commands
  • Show help, if help is asked for (see help_show)
  • Call the command's execute function, if applicable
  • Call execute_repl_cmd on the argument with commands set to the command's subcommands and scope set to the command's trigger, if applicable
find_repl_cmd(cmd::AbstractString; warn::Bool=false,
              scope::String="Data REPL")

Examine the command string cmd, and look for a command from commands that is uniquely identified. Either the identified command or nothing will be returned.

Should cmd start with help or ? then a ReplCmd{:help} command is returned.

If cmd is ambiguous and warn is true, then a message listing all potentially matching commands is printed.

If cmd does not match any of commands and warn is true, then a warning message is printed. Additionally, should the named command in cmd have a longest common subsequence overlap of more than 3/5 with any of commands, then those commands are printed as suggestions.

fromspec(ADT::Type{<:AbstractDataTransformer}, dataset::DataSet, spec::Dict{String, Any})

Create an ADT of dataset according to spec.

ADT can either contain the driver name as a type parameter, or it will be read from the "driver" key in spec.

fromspec(::Type{DataCollection}, spec::Dict{String, Any};
         path::Union{String, Nothing}=nothing, mod::Module=Base.Main)

Create a DataCollection from spec.

The path and mod keywords are used as the values for the corresponding fields in the DataCollection.

fromspec(::Type{DataSet}, collection::DataCollection, name::String, spec::Dict{String, Any})

Create a DataSet for collection called name, according to spec.

get_package(from::Module, name::Symbol)

Obtain a module specified by either pkg or identified by name and declared by from. Should the package not be currently loaded, in Julia ≥ 1.7 DataToolkit will attempt to lazy-load the package and return its module.

Failure to either locate name or require pkg will result in an exception being thrown.


Find the DataCollection in STACK with name/uuid.


Print the help string for r.

help(r::ReplCmd{<:Any, Vector{ReplCmd}})

Print the help string and subcommand table for r.

help_cmd_table(; maxwidth::Int=displaysize(stdout)[2],

Print a table showing the triggers and descriptions (limited to the first line) of commands, under the headers "Command" and "Action" (or "Subcommand" if sub is set). The table is truncated if necessary so it is no wider than maxwidth.

help_show(cmd::AbstractString; commands::Vector{ReplCmd}=REPL_CMDS)

If cmd refers to a command in commands, show its help (via help). If cmd is empty, list commands via help_cmd_table.


Show documentation of a particular data transformer (should it exist).

In the special case that transformer is Symbol(""), a list of all documented transformers is printed.

highlight_lcs(io::IO, a::String, b::String;
              before::String="\e[1m", after::String="\e[22m",

Print a, highlighting the longest common subsequence between a and b by inserting before prior to each subsequence region and after afterwards.

If invert is set, the before/after behaviour is switched.

init(name::Union{AbstractString, Missing},
     path::Union{AbstractString, Nothing};
     uuid::UUID=uuid4(), plugins::Vector{String}=DEFAULT_PLUGINS,
     write::Bool=true, addtostack::Bool=true, quiet::Bool=false)

Create a new data collection.

This can be an in-memory data collection, when path is set to nothing, or a collection which corresponds to a Data TOML file, in which case path should be set to either a path to a .toml file or a directory in which a Data.toml file should be placed.

When path is a string and write is set, the data collection file will be immediately written, overwriting any existing file at the path.

When addtostack is set, the data collection will also be added to the top of the data collection stack.

Unless quiet is set, a message will be sent to stderr reporting the successful creation of the data collection file.


julia> init("test", "/tmp/test/Data.toml")

Construct the Data REPL LineEdit.Prompt and configure it and the REPL to behave appropriately. Other than boilerplate, this basically consists of:

  • Setting the prompt style
  • Setting the execution function (toplevel_execute_repl_cmd)
  • Setting the completion to use DataCompletionProvider
invokepkglatest(f, args...; kwargs...)

Call f(args...; kwargs...) via invokelatest, and re-run if PkgRequiredRerunNeeded is returned.

issubseq(a, b)

Return true if a is a subsequence of b, false otherwise.
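A greedy scan is enough to decide whether a is a subsequence of b: walk through b once, advancing through a on each match. issubseq_sketch is a hypothetical illustration, not necessarily how issubseq is implemented.

```julia
# Hypothetical greedy check: advance through `a` each time its next
# element is found while walking through `b`.
function issubseq_sketch(a, b)
    i = firstindex(a)
    for x in b
        i > lastindex(a) && break  # every element of `a` already matched
        if x == a[i]
            i = nextind(a, i)
        end
    end
    i > lastindex(a)
end
```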


julia> issubseq("abc", "abc")
true

julia> issubseq("adg", "abcdefg")
true

julia> issubseq("gda", "abcdefg")
false

Call all of the relevant linter functions on obj. More specifically, the method table is searched for lint(obj::T, ::Val{:linter_id}) methods (where :linter_id is a stand-in for the actual IDs used), and each specific lint function is invoked and the results combined.


Each specific linter function should return a vector of relevant LintItems, i.e.

lint(obj::T, ::Val{:linter_id}) -> Union{Vector{LintItem{T}}, LintItem{T}, Nothing}

See the documentation on LintItem for more information on how it should be constructed.


Attempt to fix as many issues raised in report as possible.

load(loader::DataLoader{driver}, source::Any, as::Type)

Using a certain loader, obtain information in the form of as from the data given by source.

This fulfils this component of the overall data flow:

  ╭────loader─────╮
  ╵               ▼
Data          Information

When the loader produces nothing this is taken to indicate that it was unable to load the data for some reason, and that another loader should be tried if possible. This can be considered a soft failure. Any other value is considered valid information.
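The soft-failure convention means loaders can simply be tried in turn until one returns something other than nothing. The sketch below uses plain functions built on tryparse as hypothetical stand-ins for DataLoaders; first_successful is likewise a hypothetical name.

```julia
# Try each loader in order; `nothing` is a soft failure, so move on.
function first_successful(loaders, source)
    for load in loaders
        result = load(source)
        result === nothing || return result
    end
    nothing  # every loader soft-failed
end

# Hypothetical loaders: each returns `nothing` when it cannot handle the input.
parseint(s) = tryparse(Int, s)
parsefloat(s) = tryparse(Float64, s)

first_successful([parseint, parsefloat], "1.5")  # 1.5
```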

loadcollection!(source::Union{<:AbstractString, <:IO}, mod::Module=Base.Main;
                soft::Bool=false, index::Int=1)

Load a data collection from source and add it to the data stack at index. source must be accepted by read(source, DataCollection).

mod should be set to the Module within which loadcollection! is being invoked. This is important when code is run by the collection. As such, it is usually appropriate to call:

loadcollection!(source, @__MODULE__; soft)

When soft is set, should a data collection with the same UUID already exist, nothing will be done and nothing will be returned.

longest_common_subsequence(a, b)

Find the longest common subsequence of b within a, returning the indices of a that comprise the subsequence.

This function is intended for strings, but will work for any indexable objects with == equality defined for their elements.


julia> longest_common_subsequence("same", "same")
4-element Vector{Int64}:
 1
 2
 3
 4

julia> longest_common_subsequence("fooandbar", "foobar")
6-element Vector{Int64}:
 1
 2
 3
 7
 8
 9
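
A standard way to compute this is a dynamic-programming table followed by a backtrack that records the matched positions in a. This sketch (lcs_indices is a hypothetical name) assumes that classic approach, and the actual implementation may differ. Because it works on collect(a), it returns character positions, which coincide with string indices only for ASCII input:

```julia
function lcs_indices(a, b)
    A, B = collect(a), collect(b)   # index by element position, not byte
    m, n = length(A), length(B)
    # L[i+1, j+1] is the length of the LCS of A[1:i] and B[1:j].
    L = zeros(Int, m + 1, n + 1)
    for i in 1:m, j in 1:n
        L[i+1, j+1] = A[i] == B[j] ? L[i, j] + 1 : max(L[i, j+1], L[i+1, j])
    end
    # Walk back from the bottom-right corner, recording matched positions in A.
    idxs = Int[]
    i, j = m, n
    while i > 0 && j > 0
        if A[i] == B[j]
            pushfirst!(idxs, i)
            i -= 1; j -= 1
        elseif L[i, j+1] >= L[i+1, j]
            i -= 1
        else
            j -= 1
        end
    end
    idxs
end
```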

Generate a sorting key for key that, when used with sort, will put the collection in "natural order".

julia> natkeygen.(["A1", "A10", "A02", "A1.5"])
4-element Vector{Vector{AbstractString}}:
 ["a", "0\x01"]
 ["a", "0\n"]
 ["a", "0\x02"]
 ["a", "0\x015"]

julia> sort(["A1", "A10", "A02", "A1.5"], by=natkeygen)
4-element Vector{String}:
 "A1"
 "A1.5"
 "A02"
 "A10"
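
A common way to build a natural-order key is to split the string into runs of digits and non-digits, comparing digit runs numerically and text runs case-insensitively. This is a sketch of that general idea: natkey_sketch is a hypothetical name, and it uses tuples rather than the string encoding that natkeygen produces:

```julia
# Split `s` into digit / non-digit runs. Text runs become (0, lowercase text, 0)
# and digit runs become (1, "", value), so vectors of these tuples compare
# lexicographically in natural order. Very long digit runs would overflow Int.
natkey_sketch(s::AbstractString) =
    [all(isdigit, m.match) ? (1, "", parse(Int, m.match)) : (0, lowercase(m.match), 0)
     for m in eachmatch(r"[0-9]+|[^0-9]+", s)]
```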

Read the next 'word' from input. If input starts with a quote, this is the unescaped text between the opening and closing quote. Otherwise this is simply the next word.

Returns a tuple of the form (word, rest).


julia> peelword("one two")
("one", "two")

julia> peelword("\"one two\" three")
("one two", "three")
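
The following sketch shows one way to implement this behaviour. peelword_sketch is a hypothetical name, and it assumes only backslash escapes (such as \" and \\) need handling inside quotes:

```julia
# Peel one word (or one double-quoted phrase) off the front of `input`,
# returning `(word, rest)`.
function peelword_sketch(input::AbstractString)
    s = lstrip(input)
    isempty(s) && return ("", "")
    if first(s) == '"'
        buf = IOBuffer()
        escaped = false
        i = nextind(s, firstindex(s))
        while i <= lastindex(s)
            c = s[i]
            if escaped
                print(buf, c); escaped = false   # keep the escaped character
            elseif c == '\\'
                escaped = true
            elseif c == '"'                      # closing quote found
                return (String(take!(buf)), String(lstrip(SubString(s, nextind(s, i)))))
            else
                print(buf, c)
            end
            i = nextind(s, i)
        end
        return (String(take!(buf)), "")          # unterminated quote: take the rest
    else
        stop = something(findfirst(isspace, s), nextind(s, lastindex(s)))
        return (String(SubString(s, 1, prevind(s, stop))), String(lstrip(SubString(s, stop))))
    end
end
```
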

plugin_add([collection::DataCollection=first(STACK)], plugins::Vector{<:AbstractString};
           quiet::Bool=false)

Return a variation of collection with every plugin in plugins that is not already used added to the plugin list.

Unless quiet is set, an informative message is printed.

Side effects

The new collection is written, if possible.

Should collection be part of STACK, the stack entry is updated in-place.

plugin_info(plugin::AbstractString; quiet::Bool=false)

Fetch the documentation of plugin, or return nothing if documentation could not be fetched.

If quiet is not set, warning messages will be emitted when no documentation could be fetched.

plugin_list(collection::DataCollection=first(STACK); quiet::Bool=false)

Obtain a list of plugins used in collection.

quiet is unused but accepted as an argument for the sake of consistency.

plugin_remove([collection::DataCollection=first(STACK)], plugins::Vector{<:AbstractString};
              quiet::Bool=false)

Return a variation of collection with every plugin in plugins that is currently used removed from the plugin list.

Unless quiet is set, an informative message is printed.

Side effects

The new collection is written, if possible.

Should collection be part of STACK, the stack entry is updated in-place.

prompt(question::AbstractString, default::AbstractString="",
       allowempty::Bool=false, cleardefault::Bool=true,
       multiline::Bool=false)

Interactively ask question and return the response string, optionally with a default value. If multiline is true, RET must be pressed twice consecutively to submit a value.

Unless allowempty is set an empty response is not accepted. If cleardefault is set, then an initial backspace will clear the default value.

The prompt supports the following line-edit-y keys:

  • left arrow
  • right arrow
  • home
  • end
  • delete forwards
  • delete backwards


julia> prompt("What colour is the sky? ")
What colour is the sky? Blue

prompt_char(question::AbstractString, options::Vector{Char},
            default::Union{Char, Nothing}=nothing)

Interactively ask question, only accepting options keys as answers. All keys are converted to lower case on input. If default is not nothing and 'RET' is hit, then default will be returned.

Should '^C' be pressed, an InterruptException will be thrown.

refine(collection::DataCollection, datasets::Vector{DataSet}, ident::Identifier)

Filter datasets (from collection) to the data sets that match the identifier ident.

This function contains an advise entrypoint, applied to the method refine(::Vector{DataSet}, ::Identifier, ::Vector{String}), through which plugins can apply further filtering.

refine(datasets::Vector{DataSet}, ::Identifier, ignoreparams::Vector{String})

This is a stub function that exists solely as an advise point for data set filtering during the resolution of an identifier.

resolve(identstr::AbstractString, parameters::Union{SmallDict{String, Any}, Nothing}=nothing;
        resolvetype::Bool=true, stack::Vector{DataCollection}=STACK)

Attempt to resolve the identifier given by identstr and parameters against each layer of the data stack in turn.

resolve(collection::DataCollection, ident::Identifier;
        resolvetype::Bool=true, requirematch::Bool=true)

Attempt to resolve an identifier (ident) to a particular data set. Matching data sets will be searched for in collection.

When resolvetype is set and ident specifies a datatype, the identified data set will be read to that type.

When requirematch is set an error is raised should no dataset match ident. Otherwise, nothing is returned.

resolve(ident::Identifier; resolvetype::Bool=true, stack=STACK)

Attempt to resolve ident using the specified data layer, if present, trying every layer of the data stack in turn otherwise.

save(writer::DataWriter{driver}, destination::Any, information::Any)

Using a certain writer, save the information to the destination.

This fulfils this component of the overall data flow:

Data          Information
  ▲               ╷
  ╰────writer─────╯

Create a SmallDict version of dict, with all contained Dicts recursively converted into SmallDicts.

stack_index(ident::Union{Int, String, UUID, DataCollection}; quiet::Bool=false)

Obtain the index of the data collection identified by ident on the stack, if it is present. If it is not found, nothing is returned and unless quiet is set a warning is printed.

stack_move(ident::Union{Int, String, UUID, DataCollection}, shift::Int; quiet::Bool=false)

Find ident in the data collection stack, and shift its position by shift, returning the new index. shift is clamped so that the new index lies within STACK.

If ident could not be resolved, then nothing is returned and unless quiet is set a warning is printed.
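
The clamping behaviour can be sketched on a plain vector. move_clamped! is a hypothetical helper that works on an index rather than an ident, so it does not cover the lookup or warning steps:

```julia
# Move the element at index `i` of `stack` by `shift` positions, clamping the
# destination to the valid range, and return the new index.
function move_clamped!(stack::Vector, i::Int, shift::Int)
    newi = clamp(i + shift, 1, length(stack))
    item = popat!(stack, i)
    insert!(stack, newi, item)
    newi
end
```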

stack_remove!(ident::Union{Int, String, UUID, DataCollection}; quiet::Bool=false)

Find ident in the data collection stack and remove it from the stack, returning the index at which it was found.

If ident could not be resolved, then nothing is returned and unless quiet is set a warning is printed.

storage(storer::DataStorage, as::Type; write::Bool=false)

Fetch a storer in form as, appropriate for reading from or writing to (depending on write).

By default, this just calls getstorage or putstorage (when write=true).

This executes this component of the overall data flow:

Storage ◀────▶ Data

stringdist(a::AbstractString, b::AbstractString; halfcase::Bool=false)

Calculate the Restricted Damerau-Levenshtein distance (aka. Optimal String Alignment) between a and b.

This is the minimum number of edits required to transform a to b, where each edit is a deletion, insertion, substitution, or transposition of a character, with the restriction that no substring is edited more than once.

When halfcase is true, substitutions that just switch the case of a character cost half as much.


julia> stringdist("The quick brown fox jumps over the lazy dog",
                  "The quack borwn fox leaps ovver the lzy dog")

julia> stringdist("typo", "tpyo")

julia> stringdist("frog", "cat")

julia> stringdist("Thing", "thing", halfcase=true)
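
The distance can be computed with the usual Optimal String Alignment dynamic program, lowering the cost of a case-only substitution to 0.5 when halfcase is set. This sketch (osa_distance is a hypothetical name) always returns a Float64, which may differ from the return type of stringdist:

```julia
function osa_distance(a::AbstractString, b::AbstractString; halfcase::Bool=false)
    A, B = collect(a), collect(b)
    m, n = length(A), length(B)
    # D[i+1, j+1] is the distance between A[1:i] and B[1:j].
    D = zeros(Float64, m + 1, n + 1)
    D[:, 1] .= 0:m
    D[1, :] .= 0:n
    for i in 1:m, j in 1:n
        subcost = if A[i] == B[j]
            0.0
        elseif halfcase && lowercase(A[i]) == lowercase(B[j])
            0.5                                  # case switch only
        else
            1.0
        end
        D[i+1, j+1] = min(D[i, j+1] + 1,         # deletion
                          D[i+1, j] + 1,         # insertion
                          D[i, j] + subcost)     # substitution or match
        if i > 1 && j > 1 && A[i] == B[j-1] && A[i-1] == B[j]
            D[i+1, j+1] = min(D[i+1, j+1], D[i-1, j-1] + 1)  # transposition
        end
    end
    D[m+1, n+1]
end
```
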

stringsimilarity(a::AbstractString, b::AbstractString; halfcase::Bool=false)

Return one minus the stringdist as a proportion of the maximum length of a and b. When halfcase is true, case switches cost half as much.


julia> stringsimilarity("same", "same")
1.0

julia> stringsimilarity("semi", "demi")
0.75

julia> stringsimilarity("Same", "same", halfcase=true)
0.875
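
The normalisation can be written generically over a distance function. In this sketch, similarity is a hypothetical name, and the Hamming distance stands in for stringdist purely to keep the example short; it is only meaningful for strings of equal length:

```julia
# Normalise a distance into a similarity: 1 - distance / max(length(a), length(b)).
similarity(dist, a::AbstractString, b::AbstractString) =
    1 - dist(a, b) / max(length(a), length(b))

# Stand-in distance for this example only (equal-length strings).
hamming(a, b) = count(p -> p[1] != p[2], zip(a, b))
```

Note that two empty strings give 0/0, i.e. NaN, with this formula.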

Return a list of types supported by the data transformer ADT.

This is used as the default value for the type key in the Data TOML. The list of types is dynamically generated based on the available methods for the data transformer.

In some cases, it makes sense for this to be explicitly defined for a particular transformer.


Consume io representing a TOML file, and reformat it to improve readability. Currently this takes the form of the following changes:

  • Replace inline multi-line strings with multi-line TOML strings.

An IOBuffer containing the reformatted content is returned.

The processing assumes that io contains TOML.print-formatted content. Should this not be the case, mangled TOML may be emitted.


Call execute_repl_cmd(line), but gracefully catch an InterruptException if thrown.

This is the main entrypoint for command execution.


Return a Dict representation of thing for writing as TOML.

transformer_docs(name::Symbol, type::Symbol=:any)

Return the documentation for the transformer identified by name, or nothing if no documentation entry could be found.

typeify(qt::QualifiedType; mod::Module=Main)

Convert qt to a Type available in mod, if possible. If this cannot be done, nothing is returned instead.

@addpkg name::Symbol uuid::String

Register the package identified by name with UUID uuid. This package may now be used with @import $name.

All @addpkg statements should lie within a module's __init__ function.


@addpkg CSV "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"

@advise [source] f(args...; kwargs...)

Convert a function call f(args...; kwargs...) to an advised function call, where the advise collection is obtained from source or the first data-like* value of args.

* i.e. a DataCollection, DataSet, or AbstractDataTransformer

For example, @advise myfunc(other, somedataset, rest...) is equivalent to somedataset.collection.advise(myfunc, other, somedataset, rest...).

This macro performs a fairly minor code transformation, but should improve clarity.

@dataplugin plugin_variable
@dataplugin plugin_variable :default

Register the plugin given by the variable plugin_variable, along with its documentation (fetched by @doc). Should :default be given as the second argument the plugin is also added to the list of default plugins.

This effectively serves as a minor, but appreciable, convenience for the following pattern:

push!(PLUGINS, myplugin)
push!(DEFAULT_PLUGINS, # when also adding to defaults