AutoGrad.AutoGrad - Module

Usage:

x = Param([1,2,3])          # The user declares parameters with Param
y = @diff sum(x .* x)       # computes gradients using @diff
grad(y,x) => [2,4,6]        # looks up the gradient of a parameter with grad

Param(x) returns a struct that acts like x but marks it as a parameter you want to compute gradients with respect to.

@diff expr evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the Params used in the computation.

grad(y, x) returns the gradient of a @diff result y with respect to any parameter x::Param. It may return nothing if the gradient is 0.

value(x) returns the value associated with x if x is a Param or the output of @diff, otherwise returns x.

params(x) returns an iterator of Params found by a recursive search of object x, which is typically a model or a @diff result.

Alternative usage:

x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
gradloss(f)(x) => ([2 4 6], 14)

Given a scalar-valued function f, grad(f,argnum=1) returns another function g that takes the same inputs as f and returns the gradient of the output with respect to the argnum'th argument. gradloss is similar, except that the resulting function also returns f's output.

AutoGrad.Param - Type

See the AutoGrad.AutoGrad module entry at the top of this page; Param shares its docstring.

AutoGrad.Sparse - Type
Sparse(container, values, indices)

Create a sparse container to make gradient calculations efficient. s::Sparse represents the value a as defined below:

a = zero(s.container)
for (idx, val) in zip(s.indices, s.values)
a[idx] .+= val
end

except that when idx contains repeated indices, the corresponding values are added together rather than overwriting one another. See https://github.com/JuliaLang/julia/issues/31392.
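The difference between accumulating and overwriting can be seen without AutoGrad at all. The following is a minimal sketch in plain Julia, not the library's implementation; the variable names are made up for illustration.

```julia
# Plain Julia illustration of repeated indices; none of these names come from AutoGrad.
idx = [1, 1, 2]                 # index 1 appears twice
val = [10.0, 20.0, 30.0]

# Broadcast assignment: the right-hand side is computed first, so a
# repeated index is written more than once and earlier writes are lost.
overwritten = zeros(3)
overwritten[idx] .+= val

# Explicit loop: every value is added, which is the behaviour described above.
accumulated = zeros(3)
for (i, v) in zip(idx, val)
    accumulated[i] += v
end

println(accumulated)            # [30.0, 30.0, 0.0]
println(overwritten == accumulated)
```

The loop result is what a Sparse value is documented to represent; the broadcast result differs whenever an index repeats.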

AutoGrad.addto! - Function
addto!(accumulator, newval)

Add newval to accumulator and return the result. Used in outgrad calculations. The outgrad values start as nothing representing a 0 gradient. Then they are incremented using values of type Number, Tuple, AbstractDict, AbstractArray, Nothing and AutoGrad.Sparse. The accumulator and the newval types must match: Nothing matches all types, Sparse matches types that match its container, other types must match themselves. addto! handles repeated indices in newval by adding all corresponding values to the accumulator.
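As a rough illustration of the accumulation rule, here is a sketch in plain Julia. The function accumulate_grad is hypothetical and only mimics the idea that nothing stands for a zero gradient; it is not AutoGrad's addto! and covers only numbers and arrays.

```julia
# Hypothetical helper, not AutoGrad's addto!: `nothing` stands for a zero gradient.
accumulate_grad(::Nothing, ::Nothing) = nothing
accumulate_grad(::Nothing, newval) = newval
accumulate_grad(acc, ::Nothing) = acc
accumulate_grad(acc::Number, newval::Number) = acc + newval
accumulate_grad(acc::AbstractArray, newval::AbstractArray) = acc .+ newval

g = nothing                              # the gradient starts out as "zero"
g = accumulate_grad(g, [1.0, 2.0])       # the first contribution is taken as is
g = accumulate_grad(g, [0.5, 0.5])       # later contributions are added
g = accumulate_grad(g, nothing)          # a zero contribution changes nothing
println(g)                               # [1.5, 2.5]
```

Dispatching on Nothing keeps the zero case cheap: no zero array has to be allocated until a real contribution arrives.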

AutoGrad.cat1d - Method
cat1d(args...)

Return vcat(vec.(args)...) but possibly more efficiently. Can be used to concatenate the contents of arrays with different shapes and sizes.
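The reference expression vcat(vec.(args)...) can be tried directly in plain Julia. This sketch uses only Base functions and made-up arrays; it shows the result that cat1d is documented to match, not how cat1d computes it.

```julia
# Three arrays with different shapes (illustrative values only).
a = [1 2; 3 4]                   # 2x2 matrix
b = [5, 6]                       # vector of length 2
c = reshape([7, 8, 9], 1, 3)     # 1x3 row

# vec flattens each array in column-major order, vcat joins the pieces.
flat = vcat(vec.((a, b, c))...)
println(flat)                    # [1, 3, 2, 4, 5, 6, 7, 8, 9]
```

Note that the matrix contributes 1, 3, 2, 4 because Julia stores arrays column by column.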

AutoGrad.gcheck - Function
gcheck(f, x...; kw, o...)
@gcheck f(x...; kw...) (opt1=val1,opt2=val2,...)

Numerically check the gradient of f(x...; kw...) and return a boolean result.

Example call: gcheck(nll,model,x,y) or @gcheck nll(model,x,y). The parameters should be marked as Param arrays in f, x, and/or kw. Only 10 random entries in each large numeric array are checked by default. If the output of f is not a number, we check the gradient of sum(f(x...; kw...)). Keyword arguments:

• kw=(): keyword arguments to be passed to f, i.e. f(x...; kw...)
• nsample=10: number of random entries from each param to check
• atol=0.01,rtol=0.05: tolerance parameters. See isapprox for their meaning.
• delta=0.0001: step size for numerical gradient calculation.
• verbose=1: 0 prints nothing, 1 shows failing tests, 2 shows all tests.

AutoGrad.grad - Function

See the AutoGrad.AutoGrad module entry at the top of this page; grad shares its docstring.

AutoGrad.gradcheck - Method
gradcheck(f, x...; kwargs...)

Numerically check the gradient of f(x...) and return a boolean result.

Each argument can be a Number, Array, Tuple or Dict which in turn can contain other Arrays etc. Only 10 random entries in each large numeric array are checked by default. If the output of f is not a number, we check the gradient of sum(f(x...)). See also gcheck for a different take on marking parameters.

Keywords

• args=: (the default is a bare colon): the argument indices to check gradients with respect to. Can be an array or range of indices, or a single index. By default, all arguments that have a length method are checked.

• kw=(): keyword arguments to be passed to f.

• nsample=10: number of random entries sampled from each numeric array in the gradient dw=(grad(f))(w,x...;o...) and compared with their numerical estimates.

• atol=0.01: tolerance parameter. See isapprox for an explanation.

• rtol=0.05: tolerance parameter. See isapprox for an explanation.

• delta=0.0001: step size for numerical gradient calculation.

• verbose=1: 0 prints nothing, 1 shows failing tests, 2 shows all tests.
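The roles of nsample, atol, rtol and delta can be sketched with a small finite-difference check in plain Julia. The helper numeric_check below is hypothetical and assumes a central-difference estimate; it is not the code behind gradcheck, only an illustration of the kind of comparison the keywords control.

```julia
# Hypothetical helper: compare an analytic gradient g(x) with central differences of f.
function numeric_check(f, g, x::Vector{Float64};
                       nsample=10, atol=0.01, rtol=0.05, delta=0.0001)
    analytic = g(x)
    n = min(nsample, length(x))
    for i in rand(1:length(x), n)            # random entries, sampled with replacement
        xplus = copy(x);  xplus[i]  += delta / 2
        xminus = copy(x); xminus[i] -= delta / 2
        estimate = (f(xplus) - f(xminus)) / delta
        # isapprox passes when the difference is within atol or within rtol of the larger value
        isapprox(analytic[i], estimate; atol=atol, rtol=rtol) || return false
    end
    return true
end

square_sum(x) = sum(x .* x)
x = [1.0, 2.0, 3.0]
println(numeric_check(square_sum, v -> 2 .* v, x))       # correct gradient
println(numeric_check(square_sum, v -> 3 .* v .+ 1, x))  # deliberately wrong gradient
```

Sampling a few entries keeps the check cheap on large arrays, at the cost of possibly missing an error in an entry that was not sampled.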

AutoGrad.gradloss - Function

See the AutoGrad.AutoGrad module entry at the top of this page; gradloss shares its docstring.

AutoGrad.params - Function

See the AutoGrad.AutoGrad module entry at the top of this page; params shares its docstring.

AutoGrad.randcheck - Function

Test a numeric function with Float32/64 randn scalars and randn arrays, possibly transforming the input to match the domain.

AutoGrad.unbroadcast - Method
unbroadcast(x,dx)

Bring dx to x's size via unbroadcasting (reduction). This is needed when defining gradients of multi-argument broadcasting functions where the arguments and the result may be of different sizes.
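The reduction can be illustrated in plain Julia. The helper reduce_to_size below is a hypothetical sketch that sums over the dimensions along which x was expanded by broadcasting; it is not AutoGrad's unbroadcast and assumes the two sizes are broadcast-compatible.

```julia
# Hypothetical sketch: sum dx over the dimensions where x was expanded by broadcasting.
function reduce_to_size(x::AbstractArray, dx::AbstractArray)
    size(x) == size(dx) && return dx
    # size(x, d) is 1 for trailing dimensions that x does not have
    dims = Tuple(d for d in 1:ndims(dx) if size(x, d) == 1 && size(dx, d) > 1)
    return reshape(sum(dx; dims=dims), size(x))
end

x = [1.0, 2.0, 3.0]        # size (3,)
y = [10.0 20.0]            # size (1, 2)
dz = ones(3, 2)            # gradient of sum(x .* y) with respect to x .* y

dx = reduce_to_size(x, dz .* y)
dy = reduce_to_size(y, dz .* x)
println(dx)                # [30.0, 30.0, 30.0]
println(dy)                # [6.0 6.0]
```

Each entry of x contributes to a whole row of x .* y, so its gradient is the sum over that row; the same holds for y and columns.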

AutoGrad.value - Function

See the AutoGrad.AutoGrad module entry at the top of this page; value shares its docstring.

AutoGrad.@diff - Macro

See the AutoGrad.AutoGrad module entry at the top of this page; @diff shares its docstring.

AutoGrad.@gcheck - Macro
See the AutoGrad.gcheck entry above; @gcheck shares its docstring.

AutoGrad.@primitive - Macro
@primitive  fx g1 g2...

Define a new primitive operation for AutoGrad and (optionally) specify its gradients. Non-differentiable functions such as sign, and non-numeric functions such as size should be defined using the @zerograd macro instead.

Examples

@primitive sin(x::Number)
@primitive hypot(x1,x2),dy,y

@primitive sin(x::Number),dy  (dy.*cos(x))
@primitive hypot(x1,x2),dy,y  (dy.*x1./y)  (dy.*x2./y)

The first example shows that fx is a typed method declaration. Julia supports multiple dispatch, i.e. a single function can have multiple methods with different arg types. AutoGrad takes advantage of this and supports multiple dispatch for primitives and gradients.

The second example specifies variable names for the output gradient dy and the output y after the method declaration which can be used in gradient expressions. Untyped, ellipsis and keyword arguments are ok as in f(a::Int,b,c...;d=1). Parametric methods such as f(x::T) where {T<:Number} cannot be used.

The method declaration can optionally be followed by gradient expressions. The third and fourth examples show how gradients can be specified. Note that the parameters, the return variable and the output gradient of the original function can be used in the gradient expressions.

Under the hood

The @primitive macro turns the first example into:

sin(x::Value{T}) where {T<:Number} = forw(sin, x)

This will cause calls to sin with a boxed argument (Value{T<:Number}) to be recorded. The recorded operations are used by AutoGrad to construct a dynamic computational graph. With multiple arguments things are a bit more complicated. Here is what happens with the second example:

hypot(x1::Value{S}, x2::Value{T}) where {S,T} = forw(hypot, x1, x2)
hypot(x1::S, x2::Value{T})        where {S,T} = forw(hypot, x1, x2)
hypot(x1::Value{S}, x2::T)        where {S,T} = forw(hypot, x1, x2)

We want the forw method to be called if any one of the arguments is a boxed Value. There is no easy way to specify this in Julia, so the macro generates all 2^N-1 boxed/unboxed argument combinations.
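The count 2^N-1 can be checked with a short enumeration in plain Julia. The function boxed_patterns is hypothetical and only lists which arguments would be boxed in each generated method; it does not reproduce the macro itself.

```julia
# Hypothetical sketch: list every boxed/unboxed pattern for n arguments,
# skipping the pattern in which no argument is boxed.
function boxed_patterns(n::Int)
    patterns = Vector{Vector{Bool}}()
    for mask in 1:(2^n - 1)                  # mask 0 (nothing boxed) is excluded
        push!(patterns, [(mask >> (i - 1)) & 1 == 1 for i in 1:n])
    end
    return patterns
end

println(length(boxed_patterns(2)))   # 3, matching the three hypot methods above
println(length(boxed_patterns(3)))   # 7
```

The all-unboxed pattern is left out because that case is already handled by the original, untracked method.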

Gradients are defined by back methods that follow this pattern:

back(f,Arg{i},dy,y,x...) => dx[i]

For the third example here is the generated gradient method:

back(::typeof(sin), ::Type{Arg{1}}, dy, y, x::Value{T}) where {T<:Number} = dy .* cos(x)

For the last example a different gradient method is generated for each argument:

back(::typeof(hypot), ::Type{Arg{1}}, dy, y, x1::Value{S}, x2::Value{T}) where {S,T} = (dy .* x1) ./ y
back(::typeof(hypot), ::Type{Arg{2}}, dy, y, x1::Value{S}, x2::Value{T}) where {S,T} = (dy .* x2) ./ y

In fact @primitive generates four more definitions for the other boxed/unboxed argument combinations.
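The gradient expressions for hypot can be sanity-checked numerically in plain Julia. This sketch uses only Base functions and arbitrary input values; the central-difference comparison is an assumption made for illustration, not part of AutoGrad.

```julia
# Check dy*x1/y and dy*x2/y against central differences at one point.
x1, x2 = 3.0, 4.0
y = hypot(x1, x2)            # 5.0
dy = 1.0                     # incoming gradient of the output

g1 = dy * x1 / y             # analytic gradient with respect to x1
g2 = dy * x2 / y             # analytic gradient with respect to x2

delta = 1e-4
n1 = (hypot(x1 + delta / 2, x2) - hypot(x1 - delta / 2, x2)) / delta
n2 = (hypot(x1, x2 + delta / 2) - hypot(x1, x2 - delta / 2)) / delta

println(isapprox(g1, n1; atol=1e-6))   # true
println(isapprox(g2, n2; atol=1e-6))   # true
```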

Broadcasting is handled by extra forw and back methods. @primitive defines the following so that broadcasting of a primitive function with a boxed value triggers forw and back.

broadcasted(::typeof(sin), x::Value{T}) where {T<:Number} = forw(broadcasted,sin,x)
back(::typeof(broadcasted), ::Type{Arg{2}}, dy, y, ::typeof(sin), x::Value{T}) where {T<:Number} = dy .* cos(x)

If you do not want the broadcasting methods, use the @primitive1 macro. If you want only the broadcasting methods, use @primitive2. As a motivating example, here is how * is defined for non-scalars:

@primitive1 *(x1,x2),dy  (dy*x2')  (x1'*dy)
@primitive2 *(x1,x2),dy  unbroadcast(x1,dy.*x2)  unbroadcast(x2,x1.*dy)

Regular * is matrix multiplication, broadcasted * is elementwise multiplication and the two have different gradients as defined above. unbroadcast(a,b) reduces b to the same shape as a by performing the necessary summations.
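The two sets of gradient expressions can be compared on small matrices in plain Julia. The sketch below assumes the loss is the sum of the result, so the incoming gradient dy is a matrix of ones; the values are made up and nothing here calls AutoGrad.

```julia
A = [1.0 2.0; 3.0 4.0]
B = [5.0 6.0; 7.0 8.0]
dy = ones(2, 2)              # gradient of sum(result) with respect to the result

# Matrix product A * B: the gradients involve transposes.
dA = dy * B'
dB = A' * dy

# Elementwise product A .* B: same shapes, so no reduction is needed.
dA_elem = dy .* B
dB_elem = A .* dy

# Finite-difference check of one entry of dA for the matrix product.
delta = 1e-4
Ap = copy(A); Ap[1, 2] += delta
numeric = (sum(Ap * B) - sum(A * B)) / delta

println(dA)                                      # [11.0 15.0; 11.0 15.0]
println(dB)                                      # [4.0 4.0; 6.0 6.0]
println(isapprox(numeric, dA[1, 2]; atol=1e-6))  # true
```

For the elementwise case the gradient of each argument is simply the other argument scaled by dy, which is why the results differ from the matrix case.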

AutoGrad.@primitive1 - Macro
See the AutoGrad.@primitive entry above; @primitive1 shares its docstring.

AutoGrad.@primitive2 - Macro
See the AutoGrad.@primitive entry above; @primitive2 shares its docstring.

AutoGrad.@zerograd - Macro
@zerograd f(args...; kwargs...)

Define f as an AutoGrad primitive operation with zero gradient.

Example:

@zerograd  floor(x::Float32)

@zerograd allows f to handle boxed Value inputs by unboxing them, as @primitive does, but unlike @primitive it does not record its actions or return a boxed Value result. Some functions, like sign(), have zero gradient. Others, like length(), have discrete or constant outputs. These functions need to handle Value inputs but do not need to record anything, and they can return regular values whose output is treated like a constant in the program. Use the @zerograd macro for those.

Use the @zerograd1 variant if you don't want to define the broadcasting version, and @zerograd2 if you only want to define the broadcasting version. Note that kwargs are NOT unboxed.
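Why such functions need no gradient bookkeeping can be seen with a finite-difference probe in plain Julia. The helper central_diff is hypothetical and the sample points are arbitrary; the sketch does not use AutoGrad.

```julia
# Hypothetical helper: central-difference slope of f at x.
central_diff(f, x; delta=1e-4) = (f(x + delta / 2) - f(x - delta / 2)) / delta

println(central_diff(floor, 2.3))    # 0.0, floor is flat away from integers
println(central_diff(sign, -1.7))    # 0.0, sign is flat away from zero
println(length([1.0, 2.0, 3.0]))     # 3, a count that does not vary with the entries
```

Since small changes in the input leave the output unchanged, treating the result as a constant loses no gradient information.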

AutoGrad.@zerograd1 - Macro
See the AutoGrad.@zerograd entry above; @zerograd1 shares its docstring.

AutoGrad.@zerograd2 - Macro
See the AutoGrad.@zerograd entry above; @zerograd2 shares its docstring.