DataToolkitBase.ImportParser.addimport!Function
addimport!(pkg::Symbol, property::Symbol,
           alias::Union{Symbol, Nothing}=nothing; imports::ImportList)
addimport!(pkg::Union{Expr, Symbol}, property::Expr,
           alias::Union{Symbol, Nothing}=nothing; imports::ImportList)

Add the appropriate information to imports for property to be loaded from pkg, either under its default name, or as alias if specified.

DataToolkitBase.ImportParser.addpkg!Function
addpkg!(name::Union{Expr, Symbol}, alias::Union{Symbol, Nothing}=nothing;
        pkgs::PkgList, imports::ImportList)

Add the appropriate information to pkgs and perhaps imports for the package name to be loaded, either using the default name, or as alias if provided.

DataToolkitBase.ImportParser.flattentermsMethod
flattenterms(terms)

Where terms is a set of terms produced from parsing an expression like foo as bar, baz, flatten out the individual tokens.

Example

julia> flattenterms((:(foo.bar), :as, :((bar, baz, baz.foo, other.thing)), :as, :((other, more))))
9-element Vector{Union{Expr, Symbol}}:
 :(foo.bar)
 :as
 :bar
 :baz
 :(baz.foo)
 :(other.thing)
 :as
 :other
 :more
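
The flattening shown above can be pictured as a single pass that splices tuple expressions into the output. The following is a minimal sketch under that assumption; flattenterms_sketch is a hypothetical name and this is not necessarily how the package implements it.

```julia
# Hypothetical sketch: splice the arguments of tuple expressions into a flat list.
function flattenterms_sketch(terms)
    out = Union{Expr, Symbol}[]
    for term in terms
        if term isa Expr && term.head == :tuple
            append!(out, term.args)  # (a, b, c.d) contributes a, b and c.d
        else
            push!(out, term)         # plain symbols and property expressions pass through
        end
    end
    out
end

terms = (:(foo.bar), :as, :((bar, baz, baz.foo, other.thing)), :as, :((other, more)))
flat = flattenterms_sketch(terms)
```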
DataToolkitBase.ImportParser.graftMethod
graft(parent::Symbol, child::Union{Expr, Symbol})

Return a "grafted" expression that takes the child property of parent.

Example

julia> graft(:a, :(b.c.d))
:(a.b.c.d)

julia> graft(:a, :b)
:(a.b)
DataToolkitBase.ImportParser.propertylistMethod
propertylist(prop::Expr)

Represent a nested property expression as a vector of symbols.

Example

julia> propertylist(:(a.b.c.d))
4-element Vector{Symbol}:
 :a
 :b
 :c
 :d
DataToolkitBase.ImportParser.splitfirstMethod
splitfirst(prop::Expr)

Split a nested property expression into the first term and the remaining terms.

Example

julia> splitfirst(:(a.b.c.d))
(:a, :(b.c.d))
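
The three helpers graft, propertylist, and splitfirst all manipulate the same structure: Julia parses a.b.c.d as nested Expr(:., parent, QuoteNode(child)) nodes. The sketch below re-creates their documented behaviour in plain Julia; every name ending in _sketch is hypothetical and the code is illustrative rather than the package's own.

```julia
# Hypothetical re-implementations for illustration only.

# Collect the symbols of a nested property expression, outermost parent first.
function propertylist_sketch(prop::Expr)
    prop.head == :. || throw(ArgumentError("expected a property expression like a.b"))
    parent, child = prop.args
    symbols = parent isa Expr ? propertylist_sketch(parent) : Symbol[parent]
    push!(symbols, child.value)  # the child is stored as a QuoteNode
end

# Build head.s1.s2... by nesting property accesses from left to right.
chain_sketch(head::Symbol, rest) =
    foldl((acc, s) -> Expr(:., acc, QuoteNode(s)), rest; init = head)

graft_sketch(parent::Symbol, child::Symbol) = chain_sketch(parent, [child])
graft_sketch(parent::Symbol, child::Expr) =
    chain_sketch(parent, propertylist_sketch(child))

function splitfirst_sketch(prop::Expr)
    symbols = propertylist_sketch(prop)
    (symbols[1], chain_sketch(symbols[2], symbols[3:end]))
end
```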
DataToolkitBase.ImportParser.@localimportMacro
@import pkg1, pkg2...
@import pkg1 as name1, pkg2 as name2...
@import pkg: foo, bar...
@import pkg: foo as bar, bar as baz...

Fetch modules previously registered with @addpkg, and import them into the current namespace. This macro tries to largely mirror the syntax of using.

If a required package had to be loaded for the @import statement, a PkgRequiredRerunNeeded singleton will be returned.

Example

@import pkg
pkg.dothing(...)
# Alternative form
@import pkg: dothing
dothing(...)
DataToolkitBase.DATA_CONFIG_KEY_SORT_MAPPINGConstant

When writing a data configuration TOML file, the keys are (recursively) sorted. Some keys are particularly important though, and so to ensure they are placed higher, a mapping from such keys to a higher sort priority string can be registered here.

For example, "config" => "x01" ensures that the special configuration section is placed before all of the data sets.

This can cause odd behaviour if somebody gives a dataset the same name as a special key, but frankly that would be a bit silly (given the key names, e.g. "uuid") and so this is of minimal concern.
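
As a rough illustration of how such a mapping can influence key order, the sketch below sorts keys by their mapped string when one is registered and by their own name otherwise. The mapping contents, the control-character priority strings, and the function names are all assumptions made for this example.

```julia
# Hypothetical mapping; the "\x01"/"\x02" priority strings are assumptions
# chosen because they sort before ordinary key names.
const KEY_SORT_MAPPING_SKETCH = Dict("config" => "\x01", "uuid" => "\x02")

# Special keys sort by their mapped string, all other keys by their own name.
sortkey_sketch(key::String) = get(KEY_SORT_MAPPING_SKETCH, key, key)

sortedkeys_sketch(keys) = sort(collect(keys); by = sortkey_sketch)

ordered = sortedkeys_sketch(["iris", "uuid", "config", "boston"])
```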

DataToolkitBase.DATA_CONFIG_RESERVED_ATTRIBUTESConstant

The data specification TOML format constructs a DataCollection, which itself contains DataSets, comprised of metadata and AbstractDataTransformers.

DataCollection
├─ DataSet
│  ├─ AbstractDataTransformer
│  └─ AbstractDataTransformer
├─ DataSet
⋮

Within each scope, there are certain reserved attributes. They are listed in this Dict under the following keys:

  • :collection for DataCollection
  • :dataset for DataSet
  • :transformer for AbstractDataTransformer
DataToolkitBase.EXTRA_PACKAGESConstant

The set of packages loaded by each module via @addpkg, for import with @import.

More specifically, when a module M invokes @addpkg pkg id then EXTRA_PACKAGES[M][pkg] = id is set, and then this information is used with @import to obtain the package from the root module.
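
The registry described above is essentially a two-level lookup, first by module and then by package name. A minimal sketch of that shape follows; the names are hypothetical, and storing a bare UUID as the value is an assumption rather than a statement about what the package stores.

```julia
# Hypothetical stand-in for the registry: module => (package name => UUID).
const EXTRA_PACKAGES_SKETCH = Dict{Module, Dict{Symbol, Base.UUID}}()

# Roughly what registering a package from within `mod` records.
function register_sketch!(mod::Module, pkg::Symbol, id::AbstractString)
    pkgs = get!(EXTRA_PACKAGES_SKETCH, mod, Dict{Symbol, Base.UUID}())
    pkgs[pkg] = Base.UUID(id)
end

# Roughly the lookup an import would perform; `nothing` means "not registered".
function lookup_sketch(mod::Module, pkg::Symbol)
    haskey(EXTRA_PACKAGES_SKETCH, mod) || return nothing
    get(EXTRA_PACKAGES_SKETCH[mod], pkg, nothing)
end

register_sketch!(Main, :Bar, "00000000-0000-0000-0000-000000000001")
```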

DataToolkitBase.HELP_CMD_HELPConstant

The help-string for the help command itself. This contains the template string "<SCOPE>", which is replaced with the relevant scope at runtime.

DataToolkitBase.LINT_SEVERITY_MESSAGESConstant

A mapping from severity numbers (see LINT_SEVERITY_MAPPING) to a tuple giving the color the message should be accented with and the severity title string.

DataToolkitBase.REPL_NAMEConstant

A symbol identifying the Data REPL. This is used in a few places, such as the command history.

DataToolkitBase.REPL_QUESTION_COLORConstant

The color that should be used for question text presented in a REPL context. This should be a symbol present in Base.text_colors.

DataToolkitBase.AbstractDataTransformerType

The supertype for methods producing or consuming data.

                 ╭────loader─────╮
                 ╵               ▼
Storage ◀────▶ Data          Information
                 ▲               ╷
                 ╰────writer─────╯

There are three subtypes:

  • DataStorage
  • DataLoader
  • DataWriter

Each subtype takes a Symbol type parameter designating the driver which should be used to perform the data operation. In addition, each subtype has the following fields:

  • dataset::DataSet, the data set the method operates on
  • type::Vector{<:QualifiedType}, the Julia types the method supports
  • priority::Int, the priority with which this method should be used, compared to alternatives. Lower values have higher priority.
  • parameters::SmallDict{String, Any}, any parameters applied to the method.
DataToolkitBase.AdviceType
Advice{func, context} <: Function

Advices allow for composable, highly flexible modifications of data by encapsulating a function call. They are inspired by elisp's advice system, namely the most versatile form — :around advice, and Clojure's advisors.

An Advice is essentially a function wrapper, with a priority::Int attribute. The wrapped functions should be of the form:

(action::Function, args...; kargs...) ->
  ([post::Function], action::Function, args::Tuple, [kargs::NamedTuple])

Short-hand return values with post or kargs omitted are also accepted, in which case default values (the identity function and (;) respectively) will be automatically substituted in.

    input=(action args kwargs)
         ┃                 ┏╸post=identity
       ╭─╂────advisor 1────╂─╮
       ╰─╂─────────────────╂─╯
       ╭─╂────advisor 2────╂─╮
       ╰─╂─────────────────╂─╯
       ╭─╂────advisor 3────╂─╮
       ╰─╂─────────────────╂─╯
         ┃                 ┃
         ▼                 ▽
action(args; kargs) ━━━━▶ post╺━━▶ result

To specify which transforms an Advice should be applied to, ensure you add the relevant type parameters to your transducing function. In cases where the transducing function is not applicable, the Advice will simply act as the identity function.

After all applicable Advices have been applied, action(args...; kargs...) |> post is called to produce the final result.

The final post function is created by rightwards-composition with every post entry of the advice forms (i.e. at each stage post = post ∘ extra is run).

The overall behaviour can be thought of as shells of advice.

        ╭╌ advisor 1 ╌╌╌╌╌╌╌╌─╮
        ┆ ╭╌ advisor 2 ╌╌╌╌╌╮ ┆
        ┆ ┆                 ┆ ┆
input ━━┿━┿━━━▶ function ━━━┿━┿━━▶ result
        ┆ ╰╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╯ ┆
        ╰╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╯
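
The flow described above can be imitated with ordinary functions. In the sketch below, each advisor receives and returns a (post, f, args, kwargs) tuple, and the final result is post(f(args...; kwargs...)). This is a simplified model that assumes plain functions in place of Advice objects and ignores priorities and type-based applicability; all names are hypothetical.

```julia
# Hypothetical advisor: rewrite the arguments before the call happens.
incrementargs = (post, f, args, kwargs) -> (post, f, map(x -> x + 1, args), kwargs)

# Hypothetical advisor: leave the call alone, but double the result afterwards.
doubleresult = (post, f, args, kwargs) -> (post ∘ (x -> 2x), f, args, kwargs)

# Thread the call through each advisor in turn, then perform it.
function advisedcall_sketch(advisors, f, args...; kwargs...)
    post, func, fargs, fkwargs = identity, f, args, (; kwargs...)
    for advise in advisors
        post, func, fargs, fkwargs = advise(post, func, fargs, fkwargs)
    end
    post(func(fargs...; fkwargs...))
end

result = advisedcall_sketch([incrementargs, doubleresult], +, 1, 2)
```

Here the arguments (1, 2) become (2, 3), the sum is 5, and the composed post function doubles it.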

Constructors

Advice(priority::Int, f::Function)
Advice(f::Function) # priority is set to 1

Examples

1. Logging every time a DataSet is loaded.

loggingadvisor = Advice(
    function(post::Function, f::typeof(load), loader::DataLoader, input, outtype)
        @info "Loading $(loader.dataset.name)"
        (post, f, (loader, input, outtype))
    end)

2. Automatically committing each data file write.

writecommitadvisor = Advice(
    function(post::Function, f::typeof(write), writer::DataWriter{:filesystem}, output, info)
        function writecommit(result)
            run(`git add $output`)
            run(`git commit -m "update $output"`)
            result
        end
        (post ∘ writecommit, f, (writer, output, info))
    end)
DataToolkitBase.AdviceMethod
(advice::Advice)((post::Function, func::Function, args::Tuple, kwargs::NamedTuple))

Apply advice to the function call func(args...; kwargs...), and return the new (post, func, args, kwargs) tuple.

DataToolkitBase.AdviceAmalgamationType

A collection of Advices sourced from available Plugins.

Like individual Advices, an AdviceAmalgamation can be called as a function. However, it also supports the following convenience syntax:

(::AdviceAmalgamation)(f::Function, args...; kargs...) # -> result

Constructors

AdviceAmalgamation(adviseall::Function, advisors::Vector{Advice},
                   plugins_wanted::Vector{String}, plugins_used::Vector{String})
AdviceAmalgamation(plugins::Vector{String})
AdviceAmalgamation(collection::DataCollection)
DataToolkitBase.AmbiguousIdentifierType
AmbiguousIdentifier(identifier::Union{String, UUID}, matches::Vector, [collection])

A search for identifier (optionally within collection) found multiple matches (provided as matches).

Example occurrence

julia> d"multimatch"
ERROR: AmbiguousIdentifier: "multimatch" matches multiple data sets
    ■:multimatch [45685f5f-e6ff-4418-aaf6-084b847236a8]
    ■:multimatch [92be4bda-55e9-4317-aff4-8d52ee6a5f2c]
Stacktrace: [...]
DataToolkitBase.CollectionVersionMismatchType
CollectionVersionMismatch(version::Int)

The version of the collection currently being acted on is not supported by the current version of DataToolkitBase.

Example occurrence

julia> fromspec(DataCollection, SmallDict{String, Any}("data_config_version" => -1))
ERROR: CollectionVersionMismatch: -1 (specified) ≠ 0 (current)
  The data collection specification uses the v-1 data collection format, however
  the installed DataToolkitBase version expects the v0 version of the format.
  In the future, conversion facilities may be implemented, for now though you
  will need to manually upgrade the file to the v0 format.
Stacktrace: [...]
DataToolkitBase.EmptyStackErrorType
EmptyStackError()

An attempt was made to perform an operation on a collection within the data stack, but the data stack is empty.

Example occurrence

julia> getlayer(nothing) # with an empty STACK
ERROR: EmptyStackError: The data collection stack is empty
Stacktrace: [...]
DataToolkitBase.FilePathType
struct FilePath path::String end

Crude stand in for a file path type, which is strangely absent from Base.

This allows for load/write method dispatch, and the distinguishing of file content (as a String) from file paths.

Examples

julia> string(FilePath("some/path"))
"some/path"
DataToolkitBase.IdentifierType
Identifier(dataset::DataSet, collection::Union{Symbol, Nothing}=:name,
           name::Symbol=something(collection, :name))

Create an Identifier referring to dataset, specifying the collection that dataset comes from (when collection is not nothing) as well as all of its parameters, but without any type information.

By default, collection and name are both the symbol :name, which signals that the collection and dataset references of the generated Identifier should use the names of the collection/dataset. If set to :uuid, the UUID is used instead. No other symbols are supported.

DataToolkitBase.IdentifierType

A description that can be used to uniquely identify a DataSet.

Four fields are used to describe the target DataSet:

  • collection, the name or UUID of the collection (optional).
  • dataset, the name or UUID of the dataset.
  • type, the type that should be loaded from the dataset.
  • parameters, any extra parameters of the dataset that should match.

Constructors

Identifier(collection::Union{AbstractString, UUID, Nothing},
           dataset::Union{AbstractString, UUID},
           type::Union{QualifiedType, Nothing},
           parameters::SmallDict{String, Any})

Parsing

An Identifier can be represented as a string with the following form, with the optional components enclosed by square brackets:

[COLLECTION:]DATASET[::TYPE]

Such forms can be parsed to an Identifier by simply calling the parse function, i.e. parse(Identifier, "mycollection:dataset").
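
To make the string form concrete, here is a minimal sketch that splits [COLLECTION:]DATASET[::TYPE] into its three components. It is a hypothetical helper that returns a NamedTuple of strings, does not handle UUIDs or parameters, and is not the package's parse method.

```julia
# Hypothetical sketch of splitting "[COLLECTION:]DATASET[::TYPE]".
function parseident_sketch(str::AbstractString)
    # Split off the optional "::TYPE" suffix first, so that the remaining
    # text contains at most a single ':' separating collection and dataset.
    if occursin("::", str)
        rest, type = split(str, "::"; limit = 2)
    else
        rest, type = str, nothing
    end
    if occursin(':', rest)
        collection, dataset = split(rest, ':'; limit = 2)
    else
        collection, dataset = nothing, rest
    end
    tostr(x) = x === nothing ? nothing : String(x)
    (collection = tostr(collection), dataset = String(dataset), type = tostr(type))
end
```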

DataToolkitBase.ImpossibleTypeExceptionType
ImpossibleTypeException(qt::QualifiedType, mod::Union{Module, Nothing})

The qualified type qt could not be converted to a Type, for some reason or another (mod is the parent module used in the attempt, should it be successfully identified, and nothing otherwise).

DataToolkitBase.LintItemType

Representation of a lint item.

Constructors

LintItem(source, severity::Union{Int, Symbol}, id::Symbol, message::String,
         fixer::Union{Function, Nothing}=nothing, autoapply::Bool=false)

source is the object that the lint applies to.

severity should be one of the following values:

  • 0 or :debug, for messages that may assist with debugging problems that may be associated with particular configuration.
  • 1 or :info, for informational messages understandable to end-users.
  • 2 or :warning, for potentially harmful situations.
  • 3 or :error, for severe issues that will prevent normal functioning.

id is a symbol representing the type of lint (e.g. :unknown_driver).

message is a message, intelligible to the end-user, describing the particular nature of the issue with respect to source. It should be as specific as possible.

fixer can be set to a function which modifies source to resolve the issue. If autoapply is set to true then fixer will be called spontaneously. The function should return true or false to indicate whether it was able to successfully fix the issue.

As a general rule, fixers that do or might require user input should not be run automatically, and fixers that can run without any user input and always "do the right thing" should be run automatically.

Examples

TODO

Structure

struct LintItem{S}
    source    ::S
    severity  ::UInt8
    id        ::Symbol
    message   ::String
    fixer     ::Union{Function, Nothing}
    autoapply ::Bool
end
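
Since severity may be given as an Int or a Symbol but is stored as a UInt8, some normalisation has to happen on construction. The sketch below shows one way this could look, using the four levels listed above; the names are hypothetical and the package's actual mapping (see LINT_SEVERITY_MAPPING) may be organised differently.

```julia
# Hypothetical table of the four documented severity levels.
const SEVERITY_LEVELS_SKETCH =
    Dict(:debug => 0x00, :info => 0x01, :warning => 0x02, :error => 0x03)

# Symbols are looked up in the table; integers are range-checked and converted.
severity_sketch(level::Symbol) = SEVERITY_LEVELS_SKETCH[level]

function severity_sketch(level::Integer)
    0 <= level <= 3 || throw(ArgumentError("severity must be between 0 and 3"))
    UInt8(level)
end
```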
DataToolkitBase.MissingPackageType
MissingPackage(pkg::Base.PkgId)

The package pkg was asked for, but does not seem to be available in the current environment.

Example occurrence

julia> @addpkg Bar "00000000-0000-0000-0000-000000000000"
Bar [00000000-0000-0000-0000-000000000000]

julia> @import Bar
[ Info: Lazy-loading Bar [00000000-0000-0000-0000-000000000000]
ERROR: MissingPackage: Bar [00000000-0000-0000-0000-000000000000] has been required, but does not seem to be installed.
Stacktrace: [...]
DataToolkitBase.OrphanDataSetType
OrphanDataSet(dataset::DataSet)

The data set (dataset) is no longer a child of its parent collection.

This error should not occur, and is intended as a sanity check should something go quite wrong.

DataToolkitBase.QualifiedTypeType

A representation of a Julia type that does not need the type to be defined in the Julia session, and can be stored as a string. This is done by storing the type name and the module it belongs to as Symbols.

Warning

While QualifiedType is currently quite capable, it is not currently able to express the full gamut of Julia types. In future this will be improved, but it will likely always be restricted to a certain subset.

Subtyping

While the subtype operator cannot work on QualifiedTypes (<: is a built-in), when the Julia types are defined the subset operator can be used instead. This works by simply converting the QualifiedTypes to the corresponding Type and then applying the subtype operator.

julia> QualifiedType(:Base, :Vector) ⊆ QualifiedType(:Core, :Array)
true

julia> Matrix ⊆ QualifiedType(:Core, :Array)
true

julia> QualifiedType(:Base, :Vector) ⊆ AbstractVector
true

julia> QualifiedType(:Base, :Foobar) ⊆ AbstractVector
false
false

Constructors

QualifiedType(parentmodule::Symbol, typename::Symbol)
QualifiedType(t::Type)

Parsing

A QualifiedType can be expressed as a string as "$parentmodule.$typename". This can be easily parsed as a QualifiedType, e.g. parse(QualifiedType, "Core.IO").
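
The round trip between the string form and the two stored Symbols can be sketched as follows. QualifiedTypeSketch and its helpers are hypothetical stand-ins that only handle the plain "Module.Type" case, without type parameters or nested modules.

```julia
# Hypothetical stand-in holding the parent module and type name as Symbols.
struct QualifiedTypeSketch
    parentmodule::Symbol
    name::Symbol
end

# Split on the last '.', so the final component is taken as the type name.
function parseqt_sketch(str::AbstractString)
    parts = rsplit(str, '.'; limit = 2)
    length(parts) == 2 || throw(ArgumentError("expected \"Module.Type\", got \"$str\""))
    QualifiedTypeSketch(Symbol(parts[1]), Symbol(parts[2]))
end

# The inverse: "$parentmodule.$typename".
tostring_sketch(qt::QualifiedTypeSketch) = string(qt.parentmodule, '.', qt.name)

qt = parseqt_sketch("Core.IO")
```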

DataToolkitBase.ReadonlyCollectionType
ReadonlyCollection(collection::DataCollection)

Modification of collection is not viable, as it is read-only.

Example occurrence

julia> lockedcollection = DataCollection(SmallDict{String, Any}("uuid" => Base.UUID(rand(UInt128)), "config" => SmallDict{String, Any}("locked" => true)))
julia> write(lockedcollection)
ERROR: ReadonlyCollection: The data collection unnamed#298 is locked
Stacktrace: [...]
DataToolkitBase.ReplCmdType

A command that can be used in the Data REPL (accessible through '}').

A ReplCmd must have a:

  • name, a symbol designating the command keyword.
  • trigger, a string used as the command trigger (defaults to String(name)).
  • description, a short overview of the functionality as a string or displayable object.
  • execute, either a list of sub-ReplCmds, or a function which will perform the command's action. The function must take a single argument, the rest of the command as an AbstractString (for example, 'cmd arg1 arg2' will call the execute function with "arg1 arg2").

Constructors

ReplCmd{name::Symbol}(trigger::String, description::Any, execute::Function)
ReplCmd{name::Symbol}(description::Any, execute::Function)
ReplCmd(name::Union{Symbol, String}, trigger::String, description::Any, execute::Function)
ReplCmd(name::Union{Symbol, String}, description::Any, execute::Function)

Examples

ReplCmd(:echo, "print the argument", identity)
ReplCmd(:addone, "return the input plus one", v -> 1 + parse(Int, v))
ReplCmd(:math, "A collection of basic integer arithmetic",
    [ReplCmd(:add, "a + b + ...", nums -> sum(parse.(Int, split(nums)))),
     ReplCmd(:mul, "a * b * ...", nums -> prod(parse.(Int, split(nums))))])

Methods

help(::ReplCmd) # -> print detailed help
allcompletions(::ReplCmd) # -> list all candidates
completions(::ReplCmd, sofar::AbstractString) # -> list relevant candidates
DataToolkitBase.SmallDictType
SmallDict{K, V}

A little (ordered) dictionary type for a small number of keys. Rather than doing anything clever with hashes, it just keeps a list of keys, and a list of values.

For a small number of items, this has time and particularly space advantages over a standard dictionary — a Dict{String, Any}("a" => 1) is about 4x larger than the equivalent SmallDict. Further testing indicates this provides a ~40% reduction on the overall in-memory size of a DataCollection.
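
The "list of keys, and a list of values" idea can be sketched in a few lines. The type below is a hypothetical, much reduced illustration that supports only indexing and assignment; it is not the package's SmallDict and makes no claims about its performance.

```julia
# Hypothetical miniature of a dictionary backed by two parallel vectors.
struct PairListDict{K, V}
    keys::Vector{K}
    values::Vector{V}
end

PairListDict{K, V}() where {K, V} = PairListDict{K, V}(K[], V[])

# Lookup is a linear scan, which is cheap for a handful of keys.
function Base.getindex(d::PairListDict, key)
    i = findfirst(isequal(key), d.keys)
    i === nothing && throw(KeyError(key))
    d.values[i]
end

# New keys are appended, so insertion order is preserved.
function Base.setindex!(d::PairListDict, value, key)
    i = findfirst(isequal(key), d.keys)
    if i === nothing
        push!(d.keys, key)
        push!(d.values, value)
    else
        d.values[i] = value
    end
    d
end

d = PairListDict{String, Int}()
d["a"] = 1
d["b"] = 2
d["a"] = 3  # overwrites in place, keeping "a" first
```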

DataToolkitBase.TransformerErrorType
TransformerError(msg::String)

A catch-all for issues involving data transformers, with details given in msg.

Example occurrence

julia> emptydata = DataSet(DataCollection(), "empty", SmallDict{String, Any}("uuid" => Base.UUID(rand(UInt128))))
DataSet empty

julia> read(emptydata)
ERROR: TransformerError: Data set "empty" could not be loaded in any form.
Stacktrace: [...]
DataToolkitBase.UnregisteredPackageType
UnregisteredPackage(pkg::Symbol, mod::Module)

The package pkg was asked for within mod, but has not been registered by mod, and so cannot be loaded.

Example occurrence

julia> @import Foo
ERROR: UnregisteredPackage: Foo has not been registered by Main, see @addpkg for more information
Stacktrace: [...]
DataToolkitBase.UnresolveableIdentifierType
UnresolveableIdentifier{T}(identifier::Union{String, UUID}, [collection::DataCollection])

No T (optionally from collection) could be found that matches identifier.

Example occurrences

julia> d"iirs"
ERROR: UnresolveableIdentifier: "iirs" does not match any available data sets
  Did you perhaps mean to refer to one of these data sets?
    ■:iris (75% match)
Stacktrace: [...]

julia> d"iris::Int"
ERROR: UnresolveableIdentifier: "iris::Int" does not match any available data sets
  Without the type restriction, however, the following data sets match:
    dataset:iris, which is available as a DataFrame, Matrix, CSV.File
Stacktrace: [...]
DataToolkitBase.UnsatisfyableTransformerType
UnsatisfyableTransformer{T}(dataset::DataSet, types::Vector{QualifiedType})

A transformer (of type T) that could provide any of types was asked for, but there is no transformer that satisfies this restriction.

Example occurrence

julia> emptydata = DataSet(DataCollection(), "empty", SmallDict{String, Any}("uuid" => Base.UUID(rand(UInt128))))
DataSet empty

julia> read(emptydata, String)
ERROR: UnsatisfyableTransformer: There are no loaders for "empty" that can provide a String. The defined loaders are as follows:
Stacktrace: [...]
Base.delete!Method
delete!(dataset::DataSet)

Remove dataset from its parent collection.

Base.iswritableMethod
iswritable(dc::DataCollection)

Check whether a data collection is backed by a writable file.

Base.openMethod
open(dataset::DataSet, as::Type; write::Bool=false)

Obtain the data of dataset in the form of as, with the appropriate storage provider automatically selected.

A write flag is also provided, to help the driver pick a more appropriate form of as.

This executes this component of the overall data flow:

                 ╭────loader─────╮
                 ╵               ▼
Storage ◀────▶ Data          Information
Base.readMethod
read(filename::AbstractString, DataCollection; writer::Union{Function, Nothing})

Read the entire contents of a file as a DataCollection.

The default value of writer is self -> write(filename, self).

Base.readMethod
read(dataset::DataSet, as::Type)
read(dataset::DataSet) # as default type

Obtain information from dataset in the form of as, with the appropriate loader and storage provider automatically determined.

This executes this component of the overall data flow:

                 ╭────loader─────╮
                 ╵               ▼
Storage ◀────▶ Data          Information

The loader and storage provider are selected by identifying the highest priority loader that can be satisfied by a storage provider. What this looks like in practice is illustrated in the diagram below.

      read(dataset, Matrix) ⟶ ::Matrix ◀╮
         ╭───╯        ╰────────────▷┬───╯
╔═════╸dataset╺══════════════════╗  │
║ STORAGE      LOADERS           ║  │
║ (⟶ File)─┬─╮ (File ⟶ String)   ║  │
║ (⟶ IO)   ┊ ╰─(File ⟶ Matrix)─┬─╫──╯
║ (⟶ File)┄╯   (IO ⟶ String)   ┊ ║
║              (IO ⟶ Matrix)╌╌╌╯ ║
╚════════════════════════════════╝

  ─ the load path used
  ┄ an option not taken

TODO explain further
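
As a rough model of the selection rule, the sketch below picks, among loaders that produce the wanted type and whose input can be supplied by some storage provider, the one with the lowest priority value (lower values have higher priority). The LoaderSketch type, its fields, and the use of Symbols for data forms are assumptions made for this illustration.

```julia
# Hypothetical, simplified description of a loader.
struct LoaderSketch
    driver::Symbol
    input::Symbol    # the form the loader consumes, e.g. :File or :IO
    output::Symbol   # the form the loader produces, e.g. :Matrix
    priority::Int    # lower values have higher priority
end

# `storageforms` lists the forms that the storage providers can supply.
function pickloader_sketch(loaders, storageforms, wanted::Symbol)
    usable = filter(l -> l.output == wanted && l.input in storageforms, loaders)
    isempty(usable) && return nothing
    first(sort(usable; by = l -> l.priority))
end

loaders = [LoaderSketch(:a, :IO, :Matrix, 1),
           LoaderSketch(:b, :File, :Matrix, 2),
           LoaderSketch(:c, :File, :String, 0)]
```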
Base.readMethod
read(io::IO, DataCollection; path::Union{String, Nothing}=nothing, mod::Module=Base.Main)

Read the entirety of io, as a DataCollection.

Base.replace!Method
replace!(dataset::DataSet; [name, uuid, parameters, storage, loaders, writers])

Perform an in-place update of dataset, optionally replacing any of the name, uuid, parameters, storage, loaders, or writers fields.

Base.writeMethod
write(dataset::DataSet, info::Any)

TODO write docstring

DataToolkitBase._dataadviseMethod
_dataadvise(thing::AdviceAmalgamation)
_dataadvise(thing::Vector{Advice})
_dataadvise(thing::Advice)
_dataadvise(thing::DataCollection)
_dataadvise(thing::DataSet)
_dataadvise(thing::AbstractDataTransformer)

Obtain the relevant AdviceAmalgamation for thing.

DataToolkitBase._dataadvisecallMethod
_dataadvisecall(func::Function, args...; kwargs...)

Identify the first data-like argument of args (i.e. a DataCollection, DataSet, or AbstractDataTransformer), obtain its advice, and perform an advised call of func(args...; kwargs...).

DataToolkitBase._readMethod
_read(dataset::DataSet, as::Type)

The advisable implementation of read(dataset::DataSet, as::Type). This is essentially an exercise in useful indirection.

DataToolkitBase.addFunction
add(::Type{DataSet}, name::String, spec::Dict{String, Any}, source::String="";
    collection::DataCollection=first(STACK), storage::Vector{Symbol}=Symbol[],
    loaders::Vector{Symbol}=Symbol[], writers::Vector{Symbol}=Symbol[],
    quiet::Bool=false)

Create a new DataSet with a name and spec, and add it to collection. The data transformers will be constructed with each of the backends listed in storage, loaders, and writers from source. If the symbol * is given, all possible drivers will be searched and the highest priority driver available (according to createpriority) used. Should no transformer of the specified driver and type exist, it will be skipped.

DataToolkitBase.allcompletionsMethod
allcompletions(r::ReplCmd)

Obtain all possible String completion candidates for r. This defaults to the empty vector String[].

allcompletions is only called when completions(r, sofar::AbstractString) is not implemented.

DataToolkitBase.complete_repl_cmdMethod
complete_repl_cmd(line::AbstractString; commands::Vector{ReplCmd}=REPL_CMDS)

Return potential completion candidates for line provided by commands. More specifically, the command being completed is identified and completions(cmd::ReplCmd{:cmd}, sofar::AbstractString) called.

Special behaviour is implemented for the help command.

DataToolkitBase.completionsMethod
completions(r::ReplCmd, sofar::AbstractString)

Obtain a list of String completion candidates based on sofar. All candidates should begin with sofar.

Should this function not be implemented for the specific ReplCmd r, allcompletions(r) will be called and filtered to candidates that begin with sofar.

If r has subcommands, then the subcommand prefix will be removed and completions re-called on the relevant subcommand.
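
The fallback described above amounts to a prefix filter over the full candidate list. A minimal sketch, using hypothetical free functions and a made-up candidate list rather than methods on ReplCmd:

```julia
# Hypothetical candidate list for some command.
allcompletions_sketch() = ["add", "adjust", "remove"]

# Keep only the candidates that begin with what has been typed so far.
completions_sketch(sofar::AbstractString) =
    filter(c -> startswith(c, sofar), allcompletions_sketch())
```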

DataToolkitBase.config_getMethod
config_get(propertypath::Vector{String};
           collection::DataCollection=first(STACK), quiet::Bool=false)

Obtain the configuration value at propertypath in collection.

When no value is set, nothing is returned instead, and unless quiet is set, "unset" is printed.

DataToolkitBase.config_setMethod
config_set([collection::DataCollection=first(STACK)], propertypath::Vector{String}, value::Any;
           quiet::Bool=false)

Return a variation of collection with the configuration at propertypath set to value.

Unless quiet is set, a success message is printed.

Side effects

The new collection is written, if possible.

Should collection be part of STACK, the stack entry is updated in-place.

DataToolkitBase.config_unsetMethod
config_unset([collection::DataCollection=first(STACK)], propertypath::Vector{String};
              quiet::Bool=false)

Return a variation of collection with the configuration at propertypath removed.

Unless quiet is set, a success message is printed.

Side effects

The new collection is written, if possible.

Should collection be part of STACK, the stack entry is updated in-place.

DataToolkitBase.confirm_ynFunction
confirm_yn(question::AbstractString, default::Bool=false)

Interactively ask question and accept y/Y/n/N as the response. If any other key is pressed, then default will be taken as the response. A " [y/n]: " string will be appended to the question, with y/n capitalised to indicate the default value.

Example

julia> confirm_yn("Do you like chocolate?", true)
Do you like chocolate? [Y/n]: y
true
DataToolkitBase.createMethod
create(T::Type{<:AbstractDataTransformer}, source::String, dataset::DataSet)

If source/dataset can be used to construct a data transformer of type T, do so and return it. Otherwise return nothing.

Specific transformers should implement specialised forms of this function, that either return nothing to indicate that it is not applicable, or a "create spec form". A "create spec form" is simply a list of key::String => value entries, giving properties of the to-be-created transformer, e.g.

["foo" => "bar",
 "baz" => 2]

In addition to accepting TOML-representable values, a NamedTuple value can be given that specifies an interactive prompt to put to the user.

(; prompt::String = "$key",
   type::Type{String or Bool or <:Number} = String,
   default::type = false or "",
   optional::Bool = false,
   skipvalue::Any = nothing,
   post::Function = identity)

The value can also be a Function that takes the current specification as an argument and returns a TOML-representable value or NamedTuple.

Lastly true/false can be returned as a convenient way of simply indicating whether an empty (no parameters) driver should be created.

DataToolkitBase.createMethod
create(T::Type{<:AbstractDataTransformer}, driver::Symbol, source::String, dataset::DataSet;
       minpriority::Int=-100, maxpriority::Int=100)

Create a new T with driver driver from source/dataset.

If driver is the symbol * then all possible drivers are checked and the highest priority (according to createpriority) valid driver is used. Drivers with a priority outside the range minpriority to maxpriority will not be considered.

The created data transformer is returned, unless the given driver is not valid, in which case nothing is returned instead.

DataToolkitBase.createpriorityMethod
createpriority(T::Type{<:AbstractDataTransformer})

The priority with which a transformer of type T should be created. This can be any integer, but try to keep to -100–100 (see create).

DataToolkitBase.datasetMethod
dataset([collection::DataCollection], identstr::AbstractString, [parameters::Dict{String, Any}])
dataset([collection::DataCollection], identstr::AbstractString, [parameters::Pair{Symbol, Any}...])

Return the data set identified by identstr, optionally specifying the collection the data set should be found in and any parameters that apply.

DataToolkitBase.dataset_parametersMethod
dataset_parameters(source::Union{DataCollection, DataSet, AbstractDataTransformer},
                   action::Val{:extract|:resolve|:encode}, value::Any)

Obtain a form (depending on action) of value, a property within source.

Actions

:extract Look for DataSet references ("📇DATASET<<...>>") within value, and turn them into Identifiers (the inverse of :encode).

:resolve Look for Identifiers in value, and resolve them to the referenced DataSet/value.

:encode Look for Identifiers in value, and turn them into DataSet references (the inverse of :extract).
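
The :extract and :encode actions are described as inverses of each other. The sketch below illustrates that relationship for a single string value, assuming the text between << and >> is simply the identifier's string form; IdentifierSketch and both functions are hypothetical, and the real actions also walk nested values.

```julia
const REF_PREFIX_SKETCH = "📇DATASET<<"
const REF_SUFFIX_SKETCH = ">>"

# Hypothetical stand-in for an identifier, holding only its string form.
struct IdentifierSketch
    text::String
end

# Identifiers become reference strings; anything else passes through unchanged.
encode_sketch(id::IdentifierSketch) = REF_PREFIX_SKETCH * id.text * REF_SUFFIX_SKETCH
encode_sketch(value) = value

# Reference strings become identifiers; anything else passes through unchanged.
function extract_sketch(value::AbstractString)
    if startswith(value, REF_PREFIX_SKETCH) && endswith(value, REF_SUFFIX_SKETCH)
        inner = chop(value; head = length(REF_PREFIX_SKETCH),
                     tail = length(REF_SUFFIX_SKETCH))
        IdentifierSketch(inner)
    else
        value
    end
end
extract_sketch(value) = value
```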

DataToolkitBase.displaytableMethod
displaytable(headers::Vector, rows::Vector{<:Vector};
             spacing::Integer=2, maxwidth::Int=80)

Prepend the displaytable for rows with a header row given by headers.

DataToolkitBase.displaytableMethod
displaytable(rows::Vector{<:Vector};
             spacing::Integer=2, maxwidth::Int=80)

Return a vector of strings, formed from each row in rows.

Each string is of the same displaywidth, and individual values are separated by spacing spaces. Values are truncated if necessary to ensure that no row is wider than maxwidth.

DataToolkitBase.execute_repl_cmdMethod
execute_repl_cmd(line::AbstractString;
                 commands::Vector{ReplCmd}=REPL_CMDS,
                 scope::String="Data REPL")

Examine line and identify the leading command, then:

  • Show an error if the command is not given in commands
  • Show help, if help is asked for (see help_show)
  • Call the command's execute function, if applicable
  • Call execute_repl_cmd on the argument with commands set to the command's subcommands and scope set to the command's trigger, if applicable
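
The core of the dispatch, splitting the leading command from the rest of the line and handing the rest to that command's execute function, can be sketched as below. This uses a hypothetical Dict of triggers to functions instead of ReplCmd objects, and leaves out help handling and subcommands.

```julia
# Hypothetical command table: trigger => function of the remaining text.
const COMMANDS_SKETCH = Dict{String, Function}(
    "echo" => identity,
    "addone" => v -> 1 + parse(Int, v))

function execute_sketch(line::AbstractString)
    # The first word is the command, everything after it is the argument text.
    parts = split(line, ' '; limit = 2)
    trigger = parts[1]
    rest = length(parts) == 2 ? parts[2] : ""
    haskey(COMMANDS_SKETCH, trigger) || return "unknown command: $trigger"
    COMMANDS_SKETCH[trigger](rest)
end
```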
DataToolkitBase.find_repl_cmdMethod
find_repl_cmd(cmd::AbstractString; warn::Bool=false,
              commands::Vector{ReplCmd}=REPL_CMDS,
              scope::String="Data REPL")

Examine the command string cmd, and look for a command from commands that is uniquely identified. Either the identified command or nothing will be returned.

Should cmd start with help or ? then a ReplCmd{:help} command is returned.

If cmd is ambiguous and warn is true, then a message listing all potentially matching commands is printed.

If cmd does not match any of commands and warn is true, then a warning message is printed. Additionally, should the named command in cmd have more than a 3/5th longest common subsequence overlap with any of commands, then those commands are printed as suggestions.
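
For readers unfamiliar with the measure, the following sketch computes the length of the longest common subsequence of two strings and uses it to pick suggestions. Dividing by the longer string's length and the names used are assumptions for illustration; only the 3/5 threshold comes from the description above.

```julia
# Length of the longest common subsequence, via the usual dynamic programme.
function lcslength_sketch(a::AbstractString, b::AbstractString)
    x, y = collect(a), collect(b)
    table = zeros(Int, length(x) + 1, length(y) + 1)
    for i in eachindex(x), j in eachindex(y)
        table[i + 1, j + 1] = x[i] == y[j] ? table[i, j] + 1 :
            max(table[i, j + 1], table[i + 1, j])
    end
    table[end, end]
end

# Hypothetical overlap measure: LCS length relative to the longer string.
overlap_sketch(a, b) = lcslength_sketch(a, b) / max(length(a), length(b))

# Suggest every trigger whose overlap with `cmd` exceeds 3/5.
suggestions_sketch(cmd, triggers) =
    filter(t -> overlap_sketch(cmd, t) > 3 / 5, triggers)
```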

DataToolkitBase.fromspecMethod
fromspec(ADT::Type{<:AbstractDataTransformer}, dataset::DataSet, spec::Dict{String, Any})

Create an ADT of dataset according to spec.

ADT can either contain the driver name as a type parameter, or it will be read from the "driver" key in spec.

DataToolkitBase.fromspecMethod
fromspec(::Type{DataCollection}, spec::Dict{String, Any};
         path::Union{String, Nothing}=nothing, mod::Module=Base.Main)

Create a DataCollection from spec.

The path and mod keywords are used as the values for the corresponding fields in the DataCollection.

DataToolkitBase.fromspecMethod
fromspec(::Type{DataSet}, collection::DataCollection, name::String, spec::Dict{String, Any})

Create a DataSet for collection called name, according to spec.

DataToolkitBase.get_packageMethod
get_package(pkg::Base.PkgId)
get_package(from::Module, name::Symbol)

Obtain a module specified by either pkg or identified by name and declared by from. Should the package not be currently loaded, in Julia ≥ 1.7 DataToolkit will attempt to lazy-load the package and return its module.

Failure to either locate name or require pkg will result in an exception being thrown.

DataToolkitBase.getlayerMethod
getlayer(name::AbstractString)
getlayer(uuid::UUID)

Find the DataCollection in STACK with name/uuid.

DataToolkitBase.helpMethod
help(r::ReplCmd)

Print the help string for r.

help(r::ReplCmd{<:Any, Vector{ReplCmd}})

Print the help string and subcommand table for r.

DataToolkitBase.help_cmd_tableMethod
help_cmd_table(; maxwidth::Int=displaysize(stdout)[2],
               commands::Vector{ReplCmd}=REPL_CMDS,
               sub::Bool=false)

Print a table showing the triggers and descriptions (limited to the first line) of commands, under the headers "Command" and "Action" (or "Subcommand" if sub is set). The table is truncated if necessary so it is no wider than maxwidth.

DataToolkitBase.help_showMethod
help_show(cmd::AbstractString; commands::Vector{ReplCmd}=REPL_CMDS)

If cmd refers to a command in commands, show its help (via help). If cmd is empty, list commands via help_cmd_table.

DataToolkitBase.help_showMethod
help_show(transformer::Symbol)

Show documentation of a particular data transformer (should it exist).

In the special case that transformer is Symbol(""), a list of all documented transformers is printed.

DataToolkitBase.highlight_lcsMethod
highlight_lcs(io::IO, a::String, b::String;
              before::String="\e[1m", after::String="\e[22m",
              invert::Bool=false)

Print a, highlighting the longest common subsequence between a and b by inserting before prior to each subsequence region and after afterwards.

If invert is set, the before/after behaviour is switched.

DataToolkitBase.initMethod
init(name::Union{AbstractString, Missing},
     path::Union{AbstractString, Nothing};
     uuid::UUID=uuid4(), plugins::Vector{String}=DEFAULT_PLUGINS,
     write::Bool=true, addtostack::Bool=true, quiet::Bool=false)

Create a new data collection.

This can be an in-memory data collection, when path is set to nothing, or a collection which corresponds to a Data TOML file, in which case path should be set to either a path to a .toml file or a directory in which a Data.toml file should be placed.

When path is a string and write is set, the data collection file will be immediately written, overwriting any existing file at the path.

When addtostack is set, the data collection will also be added to the top of the data collection stack.

Unless quiet is set, a message will be sent to stderr reporting successful creation of the data collection file.

Example

julia> init("test", "/tmp/test/Data.toml")
DataToolkitBase.init_replMethod
init_repl()

Construct the Data REPL LineEdit.Prompt and configure it and the REPL to behave appropriately. Other than boilerplate, this basically consists of:

  • Setting the prompt style
  • Setting the execution function (toplevel_execute_repl_cmd)
  • Setting the completion to use DataCompletionProvider
DataToolkitBase.invokepkglatestMethod
invokepkglatest(f, args...; kwargs...)

Call f(args...; kwargs...) via invokelatest, and re-run if PkgRequiredRerunNeeded is returned.

DataToolkitBase.issubseqMethod
issubseq(a, b)

Return true if a is a subsequence of b, false otherwise.

Examples

julia> issubseq("abc", "abc")
true

julia> issubseq("adg", "abcdefg")
true

julia> issubseq("gda", "abcdefg")
false
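
A subsequence check of this kind can be sketched with a single pass over b. The name `is_subsequence` is hypothetical and this is only an illustration of the idea, not necessarily how issubseq is implemented.

```julia
# `a` is a subsequence of `b` when every element of `a` appears in `b`
# in the same order, though not necessarily contiguously.
function is_subsequence(a, b)
    items = collect(a)
    isempty(items) && return true
    i = 1  # position in `items` of the next element to find
    for x in b
        if x == items[i]
            i += 1
            i > length(items) && return true
        end
    end
    false
end

is_subsequence("adg", "abcdefg")  # true
is_subsequence("gda", "abcdefg")  # false
```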
DataToolkitBase.lintMethod
lint(obj::T)

Call all of the relevant linter functions on obj. More specifically, the method table is searched for lint(obj::T, ::Val{:linter_id}) methods (where :linter_id is a stand-in for the actual IDs used), and each specific lint function is invoked and the results combined.

Note

Each specific linter function should return a vector of relevant LintItems, i.e.

lint(obj::T, ::Val{:linter_id}) -> Union{Vector{LintItem{T}}, LintItem{T}, Nothing}

See the documentation on LintItem for more information on how it should be constructed.
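
As a rough illustration of the Val-keyed pattern, the sketch below runs two linters and merges their results, accepting each of the three return forms listed in the note. Everything here is hypothetical: `Item`, `Thing`, `check`, `aslist`, and `runlinters` are simplified stand-ins, and the linter IDs are listed by hand rather than discovered from the method table.

```julia
# Hypothetical simplified types; the real LintItem carries more information.
struct Item
    message::String
end

struct Thing
    name::String
end

# Two illustrative linters, selected by Val-wrapped IDs.
check(t::Thing, ::Val{:empty_name}) =
    isempty(t.name) ? Item("name is empty") : nothing
check(t::Thing, ::Val{:whitespace}) =
    Item[Item("name contains '$c'") for c in t.name if isspace(c)]

# Normalise each allowed return form to a vector.
aslist(::Nothing) = Item[]
aslist(item::Item) = [item]
aslist(items::Vector{Item}) = items

# Run every listed linter and concatenate the results.
function runlinters(t::Thing, ids=(:empty_name, :whitespace))
    results = Item[]
    for id in ids
        append!(results, aslist(check(t, Val(id))))
    end
    results
end

runlinters(Thing("a b c"))  # two items, one per space
```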

DataToolkitBase.lintfixFunction
lintfix(report::LintReport)

Attempt to fix as many issues raised in report as possible.

DataToolkitBase.loadFunction
load(loader::DataLoader{driver}, source::Any, as::Type)

Using a certain loader, obtain information in the form of as from the data given by source.

This fulfils this component of the overall data flow:

  ╭────loader─────╮
  ╵               ▼
Data          Information

When the loader produces nothing this is taken to indicate that it was unable to load the data for some reason, and that another loader should be tried if possible. This can be considered a soft failure. Any other value is considered valid information.
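
The soft-failure convention can be sketched as a fallback loop: try each loader in turn and stop at the first one that returns something other than nothing. The loaders and `load_first` below are hypothetical plain functions, not DataLoader methods.

```julia
# Hypothetical loaders: each takes the raw data and returns information,
# or `nothing` to signal a soft failure.
parse_int(data::String) = tryparse(Int, data)
parse_float(data::String) = tryparse(Float64, data)
keep_text(data::String) = data

# Try each loader in order and return the first non-`nothing` result.
function load_first(loaders, data)
    for loader in loaders
        result = loader(data)
        result === nothing || return result
    end
    nothing  # every loader failed softly
end

loaders = (parse_int, parse_float, keep_text)
load_first(loaders, "42")         # 42
load_first(loaders, "4.2")        # 4.2
load_first(loaders, "forty-two")  # "forty-two"
```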

DataToolkitBase.loadcollection!Function
loadcollection!(source::Union{<:AbstractString, <:IO}, mod::Module=Base.Main;
                soft::Bool=false, index::Int=1)

Load a data collection from source and add it to the data stack at index. source must be accepted by read(source, DataCollection).

mod should be set to the Module within which loadcollection! is being invoked. This is important when code is run by the collection. As such, it is usually appropriate to call:

loadcollection!(source, @__MODULE__; soft)

When soft is set, should a data collection already exist with the same UUID, nothing will be done and nothing will be returned.

DataToolkitBase.longest_common_subsequenceMethod
longest_common_subsequence(a, b)

Find the longest common subsequence of b within a, returning the indices of a that comprise the subsequence.

This function is intended for strings, but will work for any indexable objects with == equality defined for their elements.

Example

julia> longest_common_subsequence("same", "same")
4-element Vector{Int64}:
 1
 2
 3
 4

julia> longest_common_subsequence("fooandbar", "foobar")
6-element Vector{Int64}:
 1
 2
 3
 7
 8
 9
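
For readers unfamiliar with the technique, here is a sketch of the textbook dynamic-programming approach to the same problem. `lcs_indices` is a hypothetical name, and this is not necessarily how longest_common_subsequence is implemented. Indices are character positions, which coincide with string indices for ASCII input.

```julia
function lcs_indices(a, b)
    xs, ys = collect(a), collect(b)
    m, n = length(xs), length(ys)
    # len[i, j] holds the LCS length of xs[i:end] and ys[j:end].
    len = zeros(Int, m + 1, n + 1)
    for i in m:-1:1, j in n:-1:1
        if xs[i] == ys[j]
            len[i, j] = len[i+1, j+1] + 1
        else
            len[i, j] = max(len[i+1, j], len[i, j+1])
        end
    end
    # Walk forwards through the table, recording matched positions in `a`.
    indices = Int[]
    i, j = 1, 1
    while i <= m && j <= n
        if xs[i] == ys[j]
            push!(indices, i)
            i += 1
            j += 1
        elseif len[i+1, j] >= len[i, j+1]
            i += 1
        else
            j += 1
        end
    end
    indices
end

lcs_indices("fooandbar", "foobar")  # [1, 2, 3, 7, 8, 9]
```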
DataToolkitBase.natkeygenMethod
natkeygen(key::String)

Generate a sorting key for key that when used with sort will put the collection in "natural order".

julia> natkeygen.(["A1", "A10", "A02", "A1.5"])
4-element Vector{Vector{AbstractString}}:
 ["a", "0\x01"]
 ["a", "0\n"]
 ["a", "0\x02"]
 ["a", "0\x015"]

julia> sort(["A1", "A10", "A02", "A1.5"], by=natkeygen)
4-element Vector{String}:
 "A1"
 "A1.5"
 "A02"
 "A10"
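
A simpler way to get a comparable ordering, shown here only for intuition, is to lowercase the key and zero-pad every run of digits to a fixed width. This is a different technique from the key format printed above, `natural_key` is a hypothetical name, and the padding width of 10 is an arbitrary assumption that breaks for longer numbers.

```julia
# Lowercase the key and left-pad each digit run with zeros, so that plain
# string comparison orders numbers by value.
natural_key(key::AbstractString) =
    replace(lowercase(key), r"\d+" => digits -> lpad(digits, 10, '0'))

sort(["A1", "A10", "A02", "A1.5"], by=natural_key)
# ["A1", "A1.5", "A02", "A10"]
```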
DataToolkitBase.peelwordMethod
peelword(input::AbstractString)

Read the next 'word' from input. If input starts with a quote, this is the unescaped text between the opening and closing quote. Otherwise this is simply the next word.

Returns a tuple of the form (word, rest).

Example

julia> peelword("one two")
("one", "two")

julia> peelword("\"one two\" three")
("one two", "three")
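
The behaviour described above can be approximated with the sketch below. `peel` is a hypothetical name, and two details are assumptions rather than documented behaviour: a backslash escapes the following character inside quotes, and whitespace separating the word from the rest is dropped.

```julia
function peel(input::AbstractString)
    text = lstrip(input)
    if startswith(text, "\"")
        word = IOBuffer()
        escaped = false
        i = nextind(text, firstindex(text))  # skip the opening quote
        while i <= lastindex(text)
            c = text[i]
            i = nextind(text, i)
            if escaped
                write(word, c)       # keep the escaped character as-is
                escaped = false
            elseif c == '\\'
                escaped = true
            elseif c == '"'
                break                # closing quote ends the word
            else
                write(word, c)
            end
        end
        return String(take!(word)), String(lstrip(text[i:end]))
    end
    # Unquoted input: the word runs up to the first whitespace character.
    parts = split(text, isspace; limit=2)
    rest = length(parts) == 2 ? String(lstrip(parts[2])) : ""
    String(parts[1]), rest
end

peel("one two")              # ("one", "two")
peel("\"one two\" three")    # ("one two", "three")
```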
DataToolkitBase.plugin_addMethod
plugin_add([collection::DataCollection=first(STACK)], plugins::Vector{<:AbstractString};
           quiet::Bool=false)

Return a variation of collection with all plugins not currently used added to the plugin list.

Unless quiet is set, an informative message is printed.

Side effects

The new collection is written, if possible.

Should collection be part of STACK, the stack entry is updated in-place.

DataToolkitBase.plugin_infoMethod
plugin_info(plugin::AbstractString; quiet::Bool=false)

Fetch the documentation of plugin, or return nothing if documentation could not be fetched.

If quiet is not set, a warning message is emitted when no documentation could be fetched.

DataToolkitBase.plugin_listFunction
plugin_list(collection::DataCollection=first(STACK); quiet::Bool=false)

Obtain a list of plugins used in collection.

quiet is unused but accepted as an argument for the sake of consistency.

DataToolkitBase.plugin_removeMethod
plugin_remove([collection::DataCollection=first(STACK)], plugins::Vector{<:AbstractString};
              quiet::Bool=false)

Return a variation of collection with all plugins currently used removed from the plugin list.

Unless quiet is set, an informative message is printed.

Side effects

The new collection is written, if possible.

Should collection be part of STACK, the stack entry is updated in-place.

DataToolkitBase.promptFunction
prompt(question::AbstractString, default::AbstractString="",
       allowempty::Bool=false, cleardefault::Bool=true,
       multiline::Bool=false)

Interactively ask question and return the response string, optionally with a default value. If multiline is true, RET must be pressed twice consecutively to submit a value.

Unless allowempty is set, an empty response is not accepted. If cleardefault is set, then an initial backspace will clear the default value.

The prompt supports the following line-edit-y keys:

  • left arrow
  • right arrow
  • home
  • end
  • delete forwards
  • delete backwards

Example

julia> prompt("What colour is the sky? ")
What colour is the sky? Blue
"Blue"
DataToolkitBase.prompt_charFunction
prompt_char(question::AbstractString, options::Vector{Char},
            default::Union{Char, Nothing}=nothing)

Interactively ask question, only accepting options keys as answers. All keys are converted to lower case on input. If default is not nothing and 'RET' is hit, then default will be returned.

Should '^C' be pressed, an InterruptException will be thrown.

DataToolkitBase.refineMethod
refine(collection::DataCollection, datasets::Vector{DataSet}, ident::Identifier)

Filter datasets (from collection) to data sets that match the identifier ident.

This function contains an advise entrypoint where plugins can apply further filtering, applied to the method refine(::Vector{DataSet}, ::Identifier, ::Vector{String}).

DataToolkitBase.refineMethod
refine(datasets::Vector{DataSet}, ::Identifier, ignoreparams::Vector{String})

This is a stub function that exists solely as an advise point for data set filtering during resolution of an identifier.

DataToolkitBase.resolveFunction
resolve(identstr::AbstractString, parameters::Union{SmallDict{String, Any}, Nothing}=nothing;
        resolvetype::Bool=true, stack::Vector{DataCollection}=STACK)

Attempt to resolve the identifier given by identstr and parameters against each layer of the data stack in turn.

DataToolkitBase.resolveMethod
resolve(collection::DataCollection, ident::Identifier;
        resolvetype::Bool=true, requirematch::Bool=true)

Attempt to resolve an identifier (ident) to a particular data set. Matching data sets will be searched for in collection.

When resolvetype is set and ident specifies a datatype, the identified data set will be read to that type.

When requirematch is set an error is raised should no dataset match ident. Otherwise, nothing is returned.

DataToolkitBase.resolveMethod
resolve(ident::Identifier; resolvetype::Bool=true, stack=STACK)

Attempt to resolve ident using the specified data layer, if present, trying every layer of the data stack in turn otherwise.

DataToolkitBase.saveFunction
save(writer::DataWriter{driver}, destination::Any, information::Any)

Using a certain writer, save the information to the destination.

This fulfils this component of the overall data flow:

Data          Information
  ▲               ╷
  ╰────writer─────╯
DataToolkitBase.smallifyMethod
smallify(dict::Dict)

Create a SmallDict version of dict, with all contained Dicts recursively converted into SmallDicts.

DataToolkitBase.stack_indexMethod
stack_index(ident::Union{Int, String, UUID, DataCollection}; quiet::Bool=false)

Obtain the index of the data collection identified by ident on the stack, if it is present. If it is not found, nothing is returned and unless quiet is set a warning is printed.

DataToolkitBase.stack_moveMethod
stack_move(ident::Union{Int, String, UUID, DataCollection}, shift::Int; quiet::Bool=false)

Find ident in the data collection stack, and shift its position by shift, returning the new index. shift is clamped so that the new index lies within STACK.

If ident could not be resolved, then nothing is returned and unless quiet is set a warning is printed.
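
The clamping behaviour can be illustrated on a plain vector. This is a sketch only: `move_entry!` is a hypothetical helper, it takes an index rather than an identifier, and it assumes a positive shift moves the entry towards the end of the vector.

```julia
# Move the entry at `index` by `shift` positions, clamping the target so it
# stays within the bounds of `stack`. Returns the new index.
function move_entry!(stack::Vector, index::Int, shift::Int)
    newindex = clamp(index + shift, 1, length(stack))
    entry = stack[index]
    deleteat!(stack, index)
    insert!(stack, newindex, entry)
    newindex
end

layers = [:a, :b, :c]
move_entry!(layers, 1, 5)  # 3, since the shift is clamped to the last position
layers                     # [:b, :c, :a]
```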

DataToolkitBase.stack_remove!Method
stack_remove!(ident::Union{Int, String, UUID, DataCollection}; quiet::Bool=false)

Find ident in the data collection stack and remove it from the stack, returning the index at which it was found.

If ident could not be resolved, then nothing is returned and unless quiet is set a warning is printed.

DataToolkitBase.storageMethod
storage(storer::DataStorage, as::Type; write::Bool=false)

Fetch a storer in form as, appropriate for reading from or writing to (depending on write).

By default, this just calls getstorage or putstorage (when write=true).

This executes this component of the overall data flow:

Storage ◀────▶ Data
DataToolkitBase.stringdistMethod
stringdist(a::AbstractString, b::AbstractString; halfcase::Bool=false)

Calculate the Restricted Damerau-Levenshtein distance (aka. Optimal String Alignment) between a and b.

This is the minimum number of edits required to transform a to b, where each edit is a deletion, insertion, substitution, or transposition of a character, with the restriction that no substring is edited more than once.

When halfcase is true, substitutions that just switch the case of a character cost half as much.

Examples

julia> stringdist("The quick brown fox jumps over the lazy dog",
                  "The quack borwn fox leaps ovver the lzy dog")
7

julia> stringdist("typo", "tpyo")
1

julia> stringdist("frog", "cat")
4

julia> stringdist("Thing", "thing", halfcase=true)
0.5
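
The standard dynamic-programming formulation of Optimal String Alignment distance is sketched below for reference. `osa_distance` is a hypothetical name, the halfcase option is not modelled, and this is not necessarily how stringdist is implemented.

```julia
function osa_distance(a::AbstractString, b::AbstractString)
    xs, ys = collect(a), collect(b)
    m, n = length(xs), length(ys)
    # d[i+1, j+1] is the distance between the first i characters of `a`
    # and the first j characters of `b`.
    d = zeros(Int, m + 1, n + 1)
    d[:, 1] = 0:m
    d[1, :] = 0:n
    for i in 1:m, j in 1:n
        cost = xs[i] == ys[j] ? 0 : 1
        best = min(d[i, j+1] + 1,   # deletion
                   d[i+1, j] + 1,   # insertion
                   d[i, j] + cost)  # substitution (or match)
        # Transposition of two adjacent characters.
        if i > 1 && j > 1 && xs[i] == ys[j-1] && xs[i-1] == ys[j]
            best = min(best, d[i-1, j-1] + 1)
        end
        d[i+1, j+1] = best
    end
    d[m+1, n+1]
end

osa_distance("typo", "tpyo")  # 1
osa_distance("frog", "cat")   # 4
```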
DataToolkitBase.stringsimilarityMethod
stringsimilarity(a::AbstractString, b::AbstractString; halfcase::Bool=false)

Return one minus the stringdist taken as a proportion of the maximum length of a and b. When halfcase is true, case switches cost half as much.

Example

julia> stringsimilarity("same", "same")
1.0

julia> stringsimilarity("semi", "demi")
0.75

julia> stringsimilarity("Same", "same", halfcase=true)
0.875
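
The examples above are consistent with the following arithmetic, written as a sketch that takes an already-computed distance. `similarity` is a hypothetical helper, and it does not guard against both strings being empty.

```julia
# Similarity is one minus the distance divided by the longer string's length.
similarity(distance::Real, a::AbstractString, b::AbstractString) =
    1 - distance / max(length(a), length(b))

similarity(0, "same", "same")    # 1.0
similarity(1, "semi", "demi")    # 0.75
similarity(0.5, "Same", "same")  # 0.875
```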
DataToolkitBase.supportedtypesFunction
supportedtypes(ADT::Type{<:AbstractDataTransformer})::Vector{QualifiedType}

Return a list of types supported by the data transformer ADT.

This is used as the default value for the type key in the Data TOML. The list of types is dynamically generated based on the available methods for the data transformer.

In some cases, it makes sense for this to be explicitly defined for a particular transformer.

DataToolkitBase.tomlreformat!Method
tomlreformat!(io::IO)

Consume io representing a TOML file, and reformat it to improve readability. Currently this takes the form of the following changes:

  • Replace inline multi-line strings with multi-line toml strings.

An IOBuffer containing the reformatted content is returned.

The processing assumes that io contains TOML.print-formatted content. Should this not be the case, mangled TOML may be emitted.

DataToolkitBase.toplevel_execute_repl_cmdMethod
toplevel_execute_repl_cmd(line::AbstractString)

Call execute_repl_cmd(line), but gracefully catch an InterruptException if thrown.

This is the main entrypoint for command execution.

DataToolkitBase.tospecMethod
tospec(thing::AbstractDataTransformer)
tospec(thing::DataSet)
tospec(thing::DataCollection)

Return a Dict representation of thing for writing as TOML.

DataToolkitBase.transformer_docsFunction
transformer_docs(name::Symbol, type::Symbol=:any)

Return the documentation for the transformer identified by name, or nothing if no documentation entry could be found.

DataToolkitBase.typeifyMethod
typeify(qt::QualifiedType; mod::Module=Main)

Convert qt to a Type available in mod, if possible. If this cannot be done, nothing is returned instead.

DataToolkitBase.@addpkgMacro
@addpkg name::Symbol uuid::String

Register the package identified by name with UUID uuid. This package may now be used with @import $name.

All @addpkg statements should lie within a module's __init__ function.

Example

@addpkg CSV "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataToolkitBase.@adviseMacro
@advise [source] f(args...; kwargs...)

Convert a function call f(args...; kwargs...) to an advised function call, where the advise collection is obtained from source or the first data-like* value of args.

* i.e. a DataCollection, DataSet, or AbstractDataTransformer

For example, @advise myfunc(other, somedataset, rest...) is equivalent to somedataset.collection.advise(myfunc, other, somedataset, rest...).

This macro performs a fairly minor code transformation, but should improve clarity.

DataToolkitBase.@datapluginMacro
@dataplugin plugin_variable
@dataplugin plugin_variable :default

Register the plugin given by the variable plugin_variable, along with its documentation (fetched by @doc). Should :default be given as the second argument the plugin is also added to the list of default plugins.

This effectively serves as a minor, but appreciable, convenience for the following pattern:

push!(PLUGINS, myplugin)
PLUGINS_DOCUMENTATION[myplugin.name] = @doc myplugin
push!(DEFAULT_PLUGINS, myplugin.name) # when also adding to defaults