Logging and Graphing

Dagger's scheduler keeps track of the important and potentially expensive actions it does, such as moving data between workers or executing thunks, and tracks how much time and memory allocations these operations consume, among other things. It does it through the TimespanLogging.jl package (which used to be directly integrated into Dagger). Saving this information somewhere accessible is disabled by default, but it's quite easy to turn it on, by setting a "log sink" in the Context being used, as ctx.log_sink. A variety of log sinks are built-in to TimespanLogging; the NoOpLog is the default log sink when one isn't explicitly specified, and disables logging entirely (to minimize overhead). There are currently two other log sinks of interest; the first and newer of the two is the MultiEventLog, which generates multiple independent log streams, one per "consumer" (details in the next section). The second and older sink is the LocalEventLog, which is explained later in this document. Most users are recommended to use the MultiEventLog since it's far more flexible and extensible, and is more performant in general.


The MultiEventLog is intended to be configurable to exclude unnecessary information, and to include any built-in or user-defined metrics. It stores a set of "sub-log" streams internally, appending a single element to each of them when an event is generated. This element can be called a "sub-event" (to distinguish it from the higher-level "event" that Dagger creates), and is created by a "consumer". A consumer is a function or callable struct that, when called with the Event object generated by TimespanLogging, returns a sub-event characterizing whatever information the consumer represents. For example, the Dagger.Events.BytesAllocd consumer calculates the total bytes allocated and live at any given time within Dagger, and returns the current value when called. Let's construct one:

ctx = Context()
ml = TimespanLogging.MultiEventLog()

# Add the BytesAllocd consumer to the log as `:bytes`
ml[:bytes] = Dagger.Events.BytesAllocd()

ctx.log_sink = ml

As we can see above, each consumer gets a unique name as a Symbol that identifies it. Now that the log sink is attached with a consumer, we can execute some Dagger tasks, and then collect the sub-events generated by BytesAllocd:

# Using the lazy API, for explanatory purposes
collect(ctx, delayed(+)(1, delayed(*)(3, 4))) # Allocates 8 bytes
log = TimspanLogging.get_logs!(ctx)[1] # Get the logs for worker 1
@show log[:bytes]

You'll then see that 8 bytes are allocated and then freed during the process of executing and completing those tasks.

Note that the MultiEventLog can also be used perfectly well when using Dagger's eager API:

ctx = Dagger.Sch.eager_context()
ctx.log_sink = ml

a = Dagger.@spawn 3*4
Dagger.@spawn 1+a

There are a variety of other consumers built-in to TimespanLogging and Dagger, under the TimespanLogging.Events and Dagger.Events modules, respectively; see Dagger Types and TimespanLogging Types for details.

The MultiEventLog also has a mechanism to call a set of functions, called "aggregators", after all consumers have been executed, and are passed the full set of log streams as a Dict{Symbol,Vector{Any}}. The only one currently shipped with TimespanLogging directly is the LogWindow, and DaggerWebDash.jl has the TableStorage which integrates with it; see DaggerWebDash Types for details.


The LocalEventLog is generally only useful when you want combined events (event start and finish combined as a single unit), and only care about a few simple built-in generated events. Let's attach one to our context:

ctx = Context()
log = TimespanLogging.LocalEventLog()
ctx.log_sink = log

Now anytime ctx is used as the context for a scheduler, the scheduler will log events into log.

Once sufficient data has been accumulated into a LocalEventLog, it can be gathered to a single host via TimespanLogging.get_logs!(log). The result is a Vector of TimespanLogging.Timespan objects, which describe some metadata about an operation that occured and the scheduler logged. These events may be introspected directly, or may also be rendered to a DOT-format string:

logs = TimespanLogging.get_logs!(log)
str = Dagger.show_plan(logs)

Dagger.show_plan can also be called as Dagger.show_plan(io::IO, logs) to write the graph to a file or other IO object. The string generated by this function may be passed to an external tool like Graphviz for rendering. Note that this method doesn't display input arguments to the DAG (non-Thunks); you can call Dagger.show_plan(logs, thunk), where thunk is the output Thunk of the DAG, to render argument nodes.


TimespanLogging.get_logs! clears out the event logs, so that old events don't mix with new ones from future DAGs.

As a convenience, it's possible to set ctx.log_file to the path to an output file, and then calls to compute(ctx, ...)/collect(ctx, ...) will automatically write the graph in DOT format to that path. There is also a benefit to this approach over manual calls to get_logs! and show_plan: DAGs which aren't Thunks (such as operations on the Dagger.DArray) will be properly rendered with input arguments (which normally aren't rendered because a Thunk is dynamically generated from such operations by Dagger before scheduling).


The FilterLog exists to allow writing events to a user-defined location (such as a database, file, or network socket). It is not currently tested or documented.