Saving Tools

This page discusses numerous tools that can significantly improve the process of saving and loading files, always in a scientific context.

These tools are also used in the examples demonstrated in the Real World Examples page. After reading the proper documentation here it might be worth it to have a look there as well!

In DrWatson we save and load files with the functions wsave(filename, data) and wload(filename). These functions are also used by the tools below, e.g. tagsave, and can be overloaded for your own datatypes.

In addition, wsave ensures that mkpath is always called on the path you are trying to save your file at. We all know how unpleasant it is to run a 2-hour simulation and save no data because FileIO.save complains that the path you are trying to save at does not exist...

To overload the saving part, add a new method to DrWatson._wsave(filename, ::YourType) (notice the _). By overloading _wsave you get all the extra functionality of tagsave, safesave, etc., for free for your own types (tagsave requires that you save your data as a dictionary).
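
As a minimal sketch of such an overload, the block below saves a custom type as plain text. The type Trajectory and its one-value-per-line file layout are hypothetical and only serve as illustration; the only DrWatson names used are wsave and _wsave from the text above.

```julia
using DrWatson

# `Trajectory` is a hypothetical type, used only for illustration.
struct Trajectory
    values::Vector{Float64}
end

# New method for the internal `_wsave` (note the underscore):
# write one value per line to a plain-text file.
function DrWatson._wsave(filename, t::Trajectory)
    open(filename, "w") do io
        foreach(v -> println(io, v), t.values)
    end
end

# `wsave` first creates the missing "runs" directory and then
# dispatches to the method defined above.
outfile = joinpath(mktempdir(), "runs", "trajectory.txt")
wsave(outfile, Trajectory([0.1, 0.2, 0.3]))
```

Because only _wsave is extended, the public wsave keeps its path-creating behavior for the new type.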

Saving and loading fallback

By default we fall back to FileIO.save and FileIO.load for all types. This means that you have to install whatever saving backend you want to use yourself. FileIO by itself does not install a package that saves data, it only provides the interface!

The suffix of the file name determines which package will be used for actually saving the file. It is your responsibility to know how the saving package works and what input it expects!
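
A short sketch of the fallback in action, assuming that JLD2.jl has been installed as the backend for the .jld2 suffix (any other FileIO-supported backend would work along the same lines, with its own input requirements). Here the dictionary uses String keys, which is what JLD2 expects when saving through FileIO.

```julia
using DrWatson  # assumption: JLD2.jl is installed in the active environment

data = Dict("x" => [1, 2, 3], "label" => "test run")

# The ".jld2" suffix selects JLD2 as the saving package.
# The "sims" directory does not exist yet; `wsave` creates it.
file = joinpath(mktempdir(), "sims", "run.jld2")
wsave(file, data)

loaded = wload(file)  # a dictionary with the saved entries
```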

Safely saving data

Almost all packages that save data by default overwrite existing files (if given a save name of an existing file). This is the default behavior because often it is desired.

Sometimes it is not though! And the consequences of overwritten data can range from irrelevant to catastrophic. To avoid such an event we provide an alternative way to save data that will never overwrite existing files:

DrWatson.safesave (Function)
safesave(filename, data)

Safely save data in filename by ensuring that no existing files are overwritten. Do this by renaming already existing data with a backup-number ending like #1, #2, .... For example if filename = test.bson, the first time you safesave it, the file is saved normally. The second time the existing save is renamed to test_#1.bson and a new file test.bson is then saved.

If a backup file already exists then its backup-number is incremented (e.g. going from #2 to #3). For example safesaving test.bson a third time will rename the old test_#1.bson to test_#2.bson, rename the old test.bson to test_#1.bson and then save a new test.bson with the latest data.

See also tagsave.
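
The renaming scheme can be tried out with a sketch like the following, again assuming JLD2.jl as the installed backend. According to the description above, the directory should end up holding test.jld2 plus two numbered backups.

```julia
using DrWatson  # assumption: JLD2.jl is installed in the active environment

dir = mktempdir()
file = joinpath(dir, "test.jld2")

# Save three times under the same name; earlier saves are renamed, not overwritten.
for i in 1:3
    safesave(file, Dict("run" => i))
end

saved = sort(readdir(dir))    # names of all files now present in `dir`
latest = wload(file)["run"]   # the un-numbered file holds the most recent data
```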

Tagging a run using Git

For reproducibility reasons (and also to not go insane when asking "HOW DID I GET THOSE RESUUUULTS") it is useful to "tag" any simulation/result/process using the Git status of the repository.

To this end we have some functions that can be used to ensure reproducibility:

DrWatson.tagsave (Function)
tagsave(file::String, d::Dict; safe = false, gitpath = projectdir(), storepatch = true, force = false)

First tag! dictionary d and then save d in file. If safe = true save the file using safesave.

"Tagging" means that when saving the dictionary, an extra field :gitcommit is added to establish reproducibility of results using Git. If the Git repository is dirty, one more field :gitpatch is added that stores the difference string. If a dictionary already contains a key :gitcommit, it is not overwritten unless force=true. For more details, see tag!.

DrWatson.@tagsave (Macro)
@tagsave(file::String, d::Dict; kwargs...)

Same as tagsave but one more field :script is added that records the local path of the script and line number that called @tagsave, see @tag!.

The functions also incorporate safesave if need be.
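
A minimal usage sketch, assuming JLD2.jl as the backend (hence the String keys) and using only the keywords listed in the signature above. Whether a gitcommit entry appears depends on the active project being inside a Git repository, so the last line is a check rather than a guarantee.

```julia
using DrWatson  # assumption: JLD2.jl is installed in the active environment

results = Dict("x" => 3, "y" => 4)
file = joinpath(mktempdir(), "tagged.jld2")

# Tag the dictionary and save it; `safe = true` routes the save through `safesave`.
tagsave(file, results; safe = true)

loaded = wload(file)
# True only if `projectdir()` lies inside a Git repository.
was_tagged = haskey(loaded, "gitcommit")
```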

Low level functions

@tagsave internally uses the following low level functions:

DrWatson.tag! (Function)
tag!(d::Dict; gitpath = projectdir(), storepatch = true, force = false) -> d

Tag d by adding an extra field gitcommit which will have as value the gitdescribe of the repository at gitpath (by default the project directory). Do nothing if a key gitcommit already exists (unless force=true, in which case it is replaced with the new value) or if the Git repository is not found. If the Git repository is dirty, i.e. there are uncommitted changes, then the output of git diff HEAD is stored in the field gitpatch. Note that patches for binary files are not stored.

Notice that the key-type of the dictionary must be String or Symbol. If String is a subtype of the value type of the dictionary, this operation is in-place. Otherwise a new dictionary is created and returned.

To restore a repository to the state of a particular model-run do:

  1. Check out the relevant commit with git checkout xyz, where xyz is the value stored in the gitcommit field.
  2. Apply the patch with git apply patch, where the string stored in the gitpatch field first needs to be written to the file patch.
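
The two steps can be prepared from Julia with a sketch like the one below. prepare_restore is a hypothetical helper, not part of DrWatson; it assumes a dictionary tagged with Symbol keys and assumes that a trailing "_dirty" marker has to be dropped before the value can be handed to git checkout. It only writes the patch file and returns the commands as strings, it does not run Git.

```julia
# Hypothetical helper (not part of DrWatson): prepare the two restore steps
# from a dictionary that was tagged with Symbol keys.
function prepare_restore(d::AbstractDict, patchfile::AbstractString = "patch")
    # Assumption: a trailing "_dirty" marker is not part of the commit reference.
    commit = replace(string(d[:gitcommit]), r"_dirty$" => "")
    steps = ["git checkout $commit"]
    # Write the stored diff to disk so that `git apply` can read it.
    if haskey(d, :gitpatch)
        write(patchfile, d[:gitpatch])
        push!(steps, "git apply $patchfile")
    end
    return steps
end

# Made-up tagged dictionary, for illustration only.
tagged = Dict(:x => 3,
              :gitcommit => "96df587e45b29e7a46348a3d780db1f85f41de04_dirty",
              :gitpatch => "diff --git a/foo.jl b/foo.jl\n")
patchfile = joinpath(mktempdir(), "patch")
steps = prepare_restore(tagged, patchfile)
```

The returned strings are the commands to run inside the repository, in order.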

Examples

julia> d = Dict(:x => 3, :y => 4)
Dict{Symbol,Int64} with 2 entries:
  :y => 4
  :x => 3

julia> tag!(d)
Dict{Symbol,Any} with 3 entries:
  :y => 4
  :gitcommit => "96df587e45b29e7a46348a3d780db1f85f41de04"
  :x => 3

DrWatson.@tag! (Macro)
@tag!(d, gitpath = projectdir(), storepatch = true, force = false) -> d

Do the same as tag! but also add another field script that has the path of the script that called @tag!, relative with respect to gitpath. The saved string ends with #line_number, which indicates the line number within the script that @tag! was called at.

Examples

julia> d = Dict(:x => 3)
Dict{Symbol,Int64} with 1 entry:
  :x => 3

julia> @tag!(d) # running from a script or inline evaluation of Juno
Dict{Symbol,Any} with 3 entries:
  :gitcommit => "618b72bc0936404ab6a4dd8d15385868b8299d68"
  :script    => "test\\stools_tests.jl#10"
  :x         => 3

DrWatson.gitdescribe (Function)
gitdescribe(gitpath = projectdir()) -> gitstr

Return a string gitstr with the output of git describe if an annotated git tag exists, otherwise the current active commit id of the Git repository present in gitpath, which by default is the currently active project. If the repository is dirty when this function is called the string will end with "_dirty".

Return nothing if gitpath is not a directory within a Git repository.

The format of the git describe output in general is

"TAGNAME-[NUMBER_OF_COMMITS_AHEAD-]gLATEST_COMMIT_HASH[_dirty]"

If the latest tag is v1.2.3 and there are 5 additional commits while the latest commit hash is 334a0f225d9fba86161ab4c8892d4f023688159c, the output will be v1.2.3-5-g334a0f. Notice that git will shorten the hash if there are no ambiguous commits.
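
To make the format concrete, here is a small self-contained sketch that splits such a string into its parts. The regular expression and the helper parse_describe are illustrative assumptions, not part of DrWatson, and they only cover the format written above.

```julia
# Illustrative pattern for "TAGNAME-[NUMBER_OF_COMMITS_AHEAD-]gLATEST_COMMIT_HASH[_dirty]".
const DESCRIBE_PATTERN =
    r"^(?<tag>.+?)(?:-(?<ahead>\d+))?-g(?<hash>[0-9a-f]+)(?<dirty>_dirty)?$"

# Hypothetical helper: returns a named tuple, or `nothing` if `s` does not
# follow the format (e.g. a bare commit id when no tag exists).
function parse_describe(s::AbstractString)
    m = match(DESCRIBE_PATTERN, s)
    m === nothing && return nothing
    return (tag = String(m[:tag]),
            ahead = m[:ahead] === nothing ? 0 : parse(Int, m[:ahead]),
            hash = String(m[:hash]),
            dirty = m[:dirty] !== nothing)
end

parsed = parse_describe("v1.2.3-5-g334a0f")
```

For the example string from the text, parsed holds the tag v1.2.3, 5 commits ahead, and the shortened hash 334a0f.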

More information about the git describe output can be found at https://git-scm.com/docs/git-describe.

See also tag!.

Examples

julia> gitdescribe() # a tag exists
"v1.2.3-g7364ab"

julia> gitdescribe() # a tag doesn't exist
"96df587e45b29e7a46348a3d780db1f85f41de04"

julia> gitdescribe(path_to_a_dirty_repo)
"3bf684c6a115e3dce484b7f200b66d3ced8b0832_dirty"

DrWatson.gitpatch (Function)
gitpatch(gitpath = projectdir())

Generates a patch describing the changes of a dirty repository compared to its last commit; i.e. what git diff HEAD produces. The gitpath needs to point to a directory within a git repository, otherwise nothing is returned.

Be aware that gitpatch needs a working installation of Git that can be found in the current PATH.

Please notice that tag! will operate in place only when possible. If not possible then a new dictionary is returned. Also (importantly) these functions will never error as they are most commonly used when saving simulations and this could risk data not being saved!

Produce or Load

produce_or_load is a function that very conveniently integrates with savename to either load a file if it exists, or if it doesn't to produce it, save it and then return it!

This saves you the effort of checking if a file exists and then loading, or then running some code and saving, or writing a bunch of if clauses in your code. In addition, it attempts to minimize computing energy spent on getting a result.

DrWatson.produce_or_load (Function)
produce_or_load([path="",] c, f; kwargs...) -> file, s

Let s = joinpath(path, savename(prefix, c, suffix)). If a file named s exists then load it and return it, along with the global path that it is saved at (s).

If the file does not exist then call file = f(c), with f your function that produces your data. Then save file as s and return file, s. The function f must return a dictionary; the macros @dict and @strdict can help with that.

Keywords

  • tag = true : Save the file using tagsave.
  • gitpath = projectdir() : Path to search for a Git repo.
  • suffix = "bson", prefix = default_prefix(c) : Used in savename.
  • force = false : If true then don't check if file s exists and produce it and save it anyway.
  • loadfile = true : If false, this function does not actually load the file, but only checks if it exists. The return value in this case is always nothing, s, regardless of whether the file exists or not. If it doesn't exist it is still produced and saved.
  • verbose = true : print info about the process, if the file doesn't exist.
  • kwargs... : All other keywords are propagated to savename.

See also savename.

See Stopping "Did I run this?" for an example usage of produce_or_load.
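
A compact sketch of the workflow, following the argument order documented on this page (other DrWatson versions may order the arguments differently). The function simulate and its parameters are made up for illustration, and JLD2.jl is assumed to be installed, which is why suffix = "jld2" is passed and @strdict is used to get String keys.

```julia
using DrWatson  # assumption: JLD2.jl is installed in the active environment

# Hypothetical "simulation": must return a dictionary.
function simulate(config)
    N, a = config[:N], config[:a]
    result = [a * i for i in 1:N]
    return @strdict result N a
end

config = Dict(:N => 5, :a => 2.0)
path = mktempdir()

# First call: no file exists yet, so `simulate` runs and its output is saved.
file, s = produce_or_load(path, config, simulate; suffix = "jld2")

# Second call with the same config: the file named `s` is loaded instead.
file2, s2 = produce_or_load(path, config, simulate; suffix = "jld2")
```

Both calls return the data together with the file path, so downstream code does not need to know whether the result was computed or loaded.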

Converting a struct to a dictionary

savename gives great support for getting a name out of any Julia composite type. To save something though, one needs a dictionary. So the following function can be conveniently used to directly save a struct using any saving function:

DrWatson.struct2dict (Function)
struct2dict(s) -> d

Convert a Julia composite type s to a dictionary d with key type Symbol that maps each field of s to its value. This can be useful in e.g. saving:

tagsave(savename(s), struct2dict(s))
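
A brief sketch of the conversion on its own; the struct SimParams is hypothetical and only used for illustration.

```julia
using DrWatson

# Hypothetical parameter container, used only for illustration.
struct SimParams
    N::Int
    a::Float64
end

p = SimParams(5, 2.0)

d = struct2dict(p)           # dictionary with the keys :N and :a
name = savename(p, "jld2")   # file name built from the same fields
```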