`AutoGrad.AutoGrad`

— Module

Usage:

```
x = Param([1,2,3]) # The user declares parameters with `Param`
y = @diff sum(x .* x) # computes gradients using `@diff`
grad(y,x) => [2,4,6] # looks up the gradient of a parameter with `grad`
```

`Param(x)` returns a struct that acts like `x` but marks it as a parameter you want to compute gradients with respect to.

`@diff expr` evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the `Param`s used in the computation.

`grad(y, x)` returns the gradient of a `@diff` result `y` with respect to any parameter `x::Param` (`nothing` may be returned if the gradient is 0).

`value(x)` returns the value associated with `x` if `x` is a `Param` or the output of `@diff`; otherwise it returns `x`.

`params(x)` returns an iterator over the `Param`s found by a recursive search of object `x`, which is typically a model or a `@diff` result.

Alternative usage:

```
x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
```

Given a scalar-valued function `f`, `grad(f, argnum=1)` returns another function `g` that takes the same inputs as `f` and returns the gradient of the output with respect to the `argnum`'th argument. `gradloss` is similar, except the resulting function also returns the value of `f`.
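As an illustrative sketch of the alternative usage (hypothetical function and values, assuming the AutoGrad package is loaded), here is a gradient taken with respect to the second argument:

```julia
using AutoGrad

# A two-argument scalar-valued function (hypothetical example).
f(w, x) = sum(w .* x) + sum(x .^ 2)

w = [1.0, 2.0]
x = [3.0, 4.0]

g2 = grad(f, 2)     # gradient with respect to the 2nd argument
g2(w, x)            # == w .+ 2 .* x == [7.0, 10.0]
```

The default `grad(f)` differentiates with respect to the first argument.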

`AutoGrad.Param`

— Type

Usage:

```
x = Param([1,2,3]) # The user declares parameters with `Param`
y = @diff sum(x .* x) # computes gradients using `@diff`
grad(y,x) => [2,4,6] # looks up the gradient of a parameter with `grad`
```

`Param(x)` returns a struct that acts like `x` but marks it as a parameter you want to compute gradients with respect to.

`@diff expr` evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the `Param`s used in the computation.

`grad(y, x)` returns the gradient of a `@diff` result `y` with respect to any parameter `x::Param` (`nothing` may be returned if the gradient is 0).

`value(x)` returns the value associated with `x` if `x` is a `Param` or the output of `@diff`; otherwise it returns `x`.

`params(x)` returns an iterator over the `Param`s found by a recursive search of object `x`, which is typically a model or a `@diff` result.

Alternative usage:

```
x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
```

Given a scalar-valued function `f`, `grad(f, argnum=1)` returns another function `g` that takes the same inputs as `f` and returns the gradient of the output with respect to the `argnum`'th argument. `gradloss` is similar, except the resulting function also returns the value of `f`.

`AutoGrad.Sparse`

— Type

`Sparse(container, values, indices)`

Create a sparse container to make gradient calculations efficient. `s::Sparse` represents the value `a` defined below:

```
a = zero(s.container)
for (idx, val) in zip(s.indices, s.values)
    a[idx] .+= val
end
```

except that when there are repeated indices in `idx`, the corresponding values get added rather than overwritten. See https://github.com/JuliaLang/julia/issues/31392.
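The accumulation semantics can be illustrated in plain Julia with hypothetical scalar indices (a real `Sparse` may carry more general index objects):

```julia
# Hypothetical Sparse fields: container = zeros(3), indices = [1, 1, 3],
# values = [2.0, 5.0, 1.0]. Looping element by element makes the repeated
# index 1 accumulate instead of being overwritten.
container = zeros(3)
indices   = [1, 1, 3]
values    = [2.0, 5.0, 1.0]

a = zero(container)
for (idx, val) in zip(indices, values)
    a[idx] += val
end
a   # == [7.0, 0.0, 1.0]
```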

`AutoGrad.addto!`

— Function

`addto!(accumulator, newval)`

Add `newval` to `accumulator` and return the result. Used in outgrad calculations. The outgrad values start as `nothing`, representing a zero gradient, and are then incremented using values of type `Number`, `Tuple`, `AbstractDict`, `AbstractArray`, `Nothing` and `AutoGrad.Sparse`. The accumulator and newval types must match: `Nothing` matches all types, `Sparse` matches types that match its container, and other types must match themselves. `addto!` handles repeated indices in `newval` by adding all corresponding values to the `accumulator`.
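A minimal sketch of the accumulation pattern (assumes AutoGrad is loaded; the exact in-place semantics are AutoGrad internals):

```julia
using AutoGrad: addto!

acc = nothing                      # zero gradient
acc = addto!(acc, [1.0, 2.0])      # Nothing matches any type
acc = addto!(acc, [0.5, 0.5])      # same-type arrays are added
acc                                # running sum, [1.5, 2.5]
```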

`AutoGrad.cat1d`

— Method

`cat1d(args...)`

Return `vcat(vec.(args)...)` but possibly more efficiently. Can be used to concatenate the contents of arrays with different shapes and sizes.
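For example (note that `vec` flattens matrices in column-major order):

```julia
using AutoGrad: cat1d

a = [1 2; 3 4]       # 2×2 matrix
b = [5.0, 6.0]       # length-2 vector
cat1d(a, b)          # same result as vcat(vec(a), vec(b)) == [1, 3, 2, 4, 5, 6]
```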

`AutoGrad.dir`

— Method

Use `AutoGrad.dir(path...)` to construct paths relative to the AutoGrad root.

`AutoGrad.gcheck`

— Function

```
gcheck(f, x...; kw, o...)
@gcheck f(x...; kw...) (opt1=val1,opt2=val2,...)
```

Numerically check the gradient of `f(x...; kw...)` and return a boolean result.

Example call: `gcheck(nll, model, x, y)` or `@gcheck nll(model, x, y)`. The parameters should be marked as `Param` arrays in `f`, `x`, and/or `kw`. Only 10 random entries in each large numeric array are checked by default. If the output of `f` is not a number, we check the gradient of `sum(f(x...; kw...))`. Keyword arguments:

* `kw=()`: keyword arguments to be passed to `f`, i.e. `f(x...; kw...)`.
* `nsample=10`: number of random entries from each param to check.
* `atol=0.01, rtol=0.05`: tolerance parameters. See `isapprox` for their meaning.
* `delta=0.0001`: step size for numerical gradient calculation.
* `verbose=1`: 0 prints nothing, 1 shows failing tests, 2 shows all tests.
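A minimal sketch (hypothetical loss function; assumes AutoGrad is loaded):

```julia
using AutoGrad

w = Param(randn(3))
x = randn(3)
loss(w, x) = sum(abs2, w .* x)   # simple quadratic loss

gcheck(loss, w, x)               # true if numeric and AD gradients agree
@gcheck loss(w, x) (verbose=2,)  # macro form, showing all tests
```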

`AutoGrad.grad`

— Function

Usage:

```
x = Param([1,2,3]) # The user declares parameters with `Param`
y = @diff sum(x .* x) # computes gradients using `@diff`
grad(y,x) => [2,4,6] # looks up the gradient of a parameter with `grad`
```

`Param(x)` returns a struct that acts like `x` but marks it as a parameter you want to compute gradients with respect to.

`@diff expr` evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the `Param`s used in the computation.

`grad(y, x)` returns the gradient of a `@diff` result `y` with respect to any parameter `x::Param` (`nothing` may be returned if the gradient is 0).

`value(x)` returns the value associated with `x` if `x` is a `Param` or the output of `@diff`; otherwise it returns `x`.

`params(x)` returns an iterator over the `Param`s found by a recursive search of object `x`, which is typically a model or a `@diff` result.

Alternative usage:

```
x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
```

Given a scalar-valued function `f`, `grad(f, argnum=1)` returns another function `g` that takes the same inputs as `f` and returns the gradient of the output with respect to the `argnum`'th argument. `gradloss` is similar, except the resulting function also returns the value of `f`.

`AutoGrad.gradcheck`

— Method

`gradcheck(f, x...; kwargs...)`

Numerically check the gradient of `f(x...)` and return a boolean result.

Each argument can be a `Number`, `Array`, `Tuple` or `Dict`, which in turn can contain other Arrays etc. Only 10 random entries in each large numeric array are checked by default. If the output of `f` is not a number, we check the gradient of `sum(f(x...))`. See also `gcheck` for a different take on marking parameters.

**Keywords**

* `args=:`: the argument indices to check gradients with respect to. Can be an array or range of indices or a single index. By default all arguments that have a `length` method are checked.
* `kw=()`: keyword arguments to be passed to `f`.
* `nsample=10`: number of random entries from each numeric array in the gradient `dw=(grad(f))(w,x...;o...)` compared to their numerical estimates.
* `atol=0.01`: tolerance parameter. See `isapprox` for an explanation.
* `rtol=0.05`: tolerance parameter. See `isapprox` for an explanation.
* `delta=0.0001`: step size for numerical gradient calculation.
* `verbose=1`: 0 prints nothing, 1 shows failing tests, 2 shows all tests.
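A minimal sketch (hypothetical function; unlike `gcheck`, no `Param` marking is needed):

```julia
using AutoGrad

f(w, x) = sum(w .* x .+ x .^ 2)

gradcheck(f, randn(4), randn(4))          # check all array arguments
gradcheck(f, randn(4), randn(4); args=1)  # only the first argument
```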

`AutoGrad.gradloss`

— Function

Usage:

```
x = Param([1,2,3]) # The user declares parameters with `Param`
y = @diff sum(x .* x) # computes gradients using `@diff`
grad(y,x) => [2,4,6] # looks up the gradient of a parameter with `grad`
```

`Param(x)` returns a struct that acts like `x` but marks it as a parameter you want to compute gradients with respect to.

`@diff expr` evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the `Param`s used in the computation.

`grad(y, x)` returns the gradient of a `@diff` result `y` with respect to any parameter `x::Param` (`nothing` may be returned if the gradient is 0).

`value(x)` returns the value associated with `x` if `x` is a `Param` or the output of `@diff`; otherwise it returns `x`.

`params(x)` returns an iterator over the `Param`s found by a recursive search of object `x`, which is typically a model or a `@diff` result.

Alternative usage:

```
x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
```

Given a scalar-valued function `f`, `grad(f, argnum=1)` returns another function `g` that takes the same inputs as `f` and returns the gradient of the output with respect to the `argnum`'th argument. `gradloss` is similar, except the resulting function also returns the value of `f`.

`AutoGrad.params`

— Function

Usage:

```
x = Param([1,2,3]) # The user declares parameters with `Param`
y = @diff sum(x .* x) # computes gradients using `@diff`
grad(y,x) => [2,4,6] # looks up the gradient of a parameter with `grad`
```

`Param(x)` returns a struct that acts like `x` but marks it as a parameter you want to compute gradients with respect to.

`@diff expr` evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the `Param`s used in the computation.

`grad(y, x)` returns the gradient of a `@diff` result `y` with respect to any parameter `x::Param` (`nothing` may be returned if the gradient is 0).

`value(x)` returns the value associated with `x` if `x` is a `Param` or the output of `@diff`; otherwise it returns `x`.

`params(x)` returns an iterator over the `Param`s found by a recursive search of object `x`, which is typically a model or a `@diff` result.

Alternative usage:

```
x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
```

Given a scalar-valued function `f`, `grad(f, argnum=1)` returns another function `g` that takes the same inputs as `f` and returns the gradient of the output with respect to the `argnum`'th argument. `gradloss` is similar, except the resulting function also returns the value of `f`.

`AutoGrad.randcheck`

— Function

Test a numeric function with Float32/Float64 `randn` scalars and `randn` arrays, possibly transforming the input to match the function's domain.

`AutoGrad.unbroadcast`

— Method

`unbroadcast(x,dx)`

Bring `dx` to `x`'s size via unbroadcasting (reduction). This is needed when defining gradients of multi-argument broadcasting functions, where the arguments and the result may have different sizes.
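For instance, if a size-(3,) argument was broadcast against a (3,4) array, its gradient must be summed back down to size (3,). A sketch assuming AutoGrad is loaded:

```julia
using AutoGrad: unbroadcast

x  = [1.0, 2.0, 3.0]     # size (3,)
dx = ones(3, 4)          # gradient of a (3,4) broadcast result
unbroadcast(x, dx)       # sums over the broadcast dimension: one 4.0 per entry
```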

`AutoGrad.value`

— Function

Usage:

```
x = Param([1,2,3]) # The user declares parameters with `Param`
y = @diff sum(x .* x) # computes gradients using `@diff`
grad(y,x) => [2,4,6] # looks up the gradient of a parameter with `grad`
```

`Param(x)` returns a struct that acts like `x` but marks it as a parameter you want to compute gradients with respect to.

`@diff expr` evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the `Param`s used in the computation.

`grad(y, x)` returns the gradient of a `@diff` result `y` with respect to any parameter `x::Param` (`nothing` may be returned if the gradient is 0).

`value(x)` returns the value associated with `x` if `x` is a `Param` or the output of `@diff`; otherwise it returns `x`.

`params(x)` returns an iterator over the `Param`s found by a recursive search of object `x`, which is typically a model or a `@diff` result.

Alternative usage:

```
x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
```

Given a scalar-valued function `f`, `grad(f, argnum=1)` returns another function `g` that takes the same inputs as `f` and returns the gradient of the output with respect to the `argnum`'th argument. `gradloss` is similar, except the resulting function also returns the value of `f`.

`AutoGrad.@diff`

— Macro

Usage:

```
x = Param([1,2,3]) # The user declares parameters with `Param`
y = @diff sum(x .* x) # computes gradients using `@diff`
grad(y,x) => [2,4,6] # looks up the gradient of a parameter with `grad`
```

`Param(x)` returns a struct that acts like `x` but marks it as a parameter you want to compute gradients with respect to.

`@diff expr` evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the `Param`s used in the computation.

`grad(y, x)` returns the gradient of a `@diff` result `y` with respect to any parameter `x::Param` (`nothing` may be returned if the gradient is 0).

`value(x)` returns the value associated with `x` if `x` is a `Param` or the output of `@diff`; otherwise it returns `x`.

`params(x)` returns an iterator over the `Param`s found by a recursive search of object `x`, which is typically a model or a `@diff` result.

Alternative usage:

```
x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
```

Given a scalar-valued function `f`, `grad(f, argnum=1)` returns another function `g` that takes the same inputs as `f` and returns the gradient of the output with respect to the `argnum`'th argument. `gradloss` is similar, except the resulting function also returns the value of `f`.

`AutoGrad.@gcheck`

— Macro

```
gcheck(f, x...; kw, o...)
@gcheck f(x...; kw...) (opt1=val1,opt2=val2,...)
```

Numerically check the gradient of `f(x...; kw...)` and return a boolean result.

Example call: `gcheck(nll, model, x, y)` or `@gcheck nll(model, x, y)`. The parameters should be marked as `Param` arrays in `f`, `x`, and/or `kw`. Only 10 random entries in each large numeric array are checked by default. If the output of `f` is not a number, we check the gradient of `sum(f(x...; kw...))`. Keyword arguments:

* `kw=()`: keyword arguments to be passed to `f`, i.e. `f(x...; kw...)`.
* `nsample=10`: number of random entries from each param to check.
* `atol=0.01, rtol=0.05`: tolerance parameters. See `isapprox` for their meaning.
* `delta=0.0001`: step size for numerical gradient calculation.
* `verbose=1`: 0 prints nothing, 1 shows failing tests, 2 shows all tests.

`AutoGrad.@primitive`

— Macro

`@primitive fx g1 g2...`

Define a new primitive operation for AutoGrad and (optionally) specify its gradients. Non-differentiable functions such as `sign` and non-numeric functions such as `size` should be defined using the `@zerograd` macro instead.

**Examples**

```
@primitive sin(x::Number)
@primitive hypot(x1,x2),dy,y
@primitive sin(x::Number),dy (dy.*cos(x))
@primitive hypot(x1,x2),dy,y (dy.*x1./y) (dy.*x2./y)
```

The first example shows that `fx` is a typed method declaration. Julia supports multiple dispatch, i.e. a single function can have multiple methods with different argument types. AutoGrad takes advantage of this and supports multiple dispatch for both primitives and gradients.

The second example specifies variable names for the output gradient `dy` and the output `y` after the method declaration, which can be used in gradient expressions. Untyped, ellipsis and keyword arguments are fine, as in `f(a::Int,b,c...;d=1)`. Parametric methods such as `f(x::T) where {T<:Number}` cannot be used.

The method declaration can optionally be followed by gradient expressions. The third and fourth examples show how gradients can be specified. Note that the parameters, the return variable and the output gradient of the original function can all be used in the gradient expressions.

**Under the hood**

The @primitive macro turns the first example into:

`sin(x::Value{T}) where {T<:Number} = forw(sin, x)`

This causes calls to `sin` with a boxed argument (`Value{T<:Number}`) to be recorded. The recorded operations are used by AutoGrad to construct a dynamic computational graph. With multiple arguments things are a bit more complicated. Here is what happens with the second example:

```
hypot(x1::Value{S}, x2::Value{T}) where {S,T} = forw(hypot, x1, x2)
hypot(x1::S, x2::Value{T}) where {S,T} = forw(hypot, x1, x2)
hypot(x1::Value{S}, x2::T) where {S,T} = forw(hypot, x1, x2)
```

We want the `forw` method to be called if any one of the arguments is a boxed `Value`. There is no easy way to specify this in Julia, so the macro generates all 2^N-1 boxed/unboxed argument combinations.

In AutoGrad, gradients are defined using gradient methods that follow this pattern:

`back(f,Arg{i},dy,y,x...) => dx[i]`

For the third example, here is the generated gradient method:

`back(::typeof(sin), ::Type{Arg{1}}, dy, y, x::Value{T}) where {T<:Number} = dy .* cos(x)`

For the last example, a different gradient method is generated for each argument:

```
back(::typeof(hypot), ::Type{Arg{1}}, dy, y, x1::Value{S}, x2::Value{T}) where {S,T} = (dy .* x1) ./ y
back(::typeof(hypot), ::Type{Arg{2}}, dy, y, x1::Value{S}, x2::Value{T}) where {S,T} = (dy .* x2) ./ y
```

In fact @primitive generates four more definitions for the other boxed/unboxed argument combinations.

**Broadcasting**

Broadcasting is handled by extra `forw` and `back` methods. `@primitive` defines the following so that broadcasting a primitive function with a boxed value triggers `forw` and `back`:

```
broadcasted(::typeof(sin), x::Value{T}) where {T<:Number} = forw(broadcasted,sin,x)
back(::typeof(broadcasted), ::Type{Arg{2}}, dy, y, ::typeof(sin), x::Value{T}) where {T<:Number} = dy .* cos(x)
```

If you do not want the broadcasting methods, use the `@primitive1` macro. If you only want the broadcasting methods, use `@primitive2`. As a motivating example, here is how `*` is defined for non-scalars:

```
@primitive1 *(x1,x2),dy (dy*x2') (x1'*dy)
@primitive2 *(x1,x2),dy unbroadcast(x1,dy.*x2) unbroadcast(x2,x1.*dy)
```

Regular `*` is matrix multiplication, broadcasted `*` is elementwise multiplication, and the two have different gradients as defined above. `unbroadcast(a,b)` reduces `b` to the same shape as `a` by performing the necessary summations.
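Putting it together, here is a hedged sketch that defines a primitive for a hypothetical `square` function and differentiates through it (assumes AutoGrad is loaded):

```julia
using AutoGrad

square(x) = x * x                            # plain Julia function
@primitive square(x::Number),dy,y (dy * 2x)  # gradient: d(x^2)/dx = 2x

p = Param(3.0)
j = @diff square(p)
grad(j, p)    # == 6.0
```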

`AutoGrad.@primitive1`

— Macro

`@primitive fx g1 g2...`

Define a new primitive operation for AutoGrad and (optionally) specify its gradients. Non-differentiable functions such as `sign` and non-numeric functions such as `size` should be defined using the `@zerograd` macro instead.

**Examples**

```
@primitive sin(x::Number)
@primitive hypot(x1,x2),dy,y
@primitive sin(x::Number),dy (dy.*cos(x))
@primitive hypot(x1,x2),dy,y (dy.*x1./y) (dy.*x2./y)
```

The first example shows that `fx` is a typed method declaration. Julia supports multiple dispatch, i.e. a single function can have multiple methods with different argument types. AutoGrad takes advantage of this and supports multiple dispatch for both primitives and gradients.

The second example specifies variable names for the output gradient `dy` and the output `y` after the method declaration, which can be used in gradient expressions. Untyped, ellipsis and keyword arguments are fine, as in `f(a::Int,b,c...;d=1)`. Parametric methods such as `f(x::T) where {T<:Number}` cannot be used.

The method declaration can optionally be followed by gradient expressions. The third and fourth examples show how gradients can be specified. Note that the parameters, the return variable and the output gradient of the original function can all be used in the gradient expressions.

**Under the hood**

The @primitive macro turns the first example into:

`sin(x::Value{T}) where {T<:Number} = forw(sin, x)`

This causes calls to `sin` with a boxed argument (`Value{T<:Number}`) to be recorded. The recorded operations are used by AutoGrad to construct a dynamic computational graph. With multiple arguments things are a bit more complicated. Here is what happens with the second example:

```
hypot(x1::Value{S}, x2::Value{T}) where {S,T} = forw(hypot, x1, x2)
hypot(x1::S, x2::Value{T}) where {S,T} = forw(hypot, x1, x2)
hypot(x1::Value{S}, x2::T) where {S,T} = forw(hypot, x1, x2)
```

We want the `forw` method to be called if any one of the arguments is a boxed `Value`. There is no easy way to specify this in Julia, so the macro generates all 2^N-1 boxed/unboxed argument combinations.

In AutoGrad, gradients are defined using gradient methods that follow this pattern:

`back(f,Arg{i},dy,y,x...) => dx[i]`

For the third example, here is the generated gradient method:

`back(::typeof(sin), ::Type{Arg{1}}, dy, y, x::Value{T}) where {T<:Number} = dy .* cos(x)`

For the last example, a different gradient method is generated for each argument:

```
back(::typeof(hypot), ::Type{Arg{1}}, dy, y, x1::Value{S}, x2::Value{T}) where {S,T} = (dy .* x1) ./ y
back(::typeof(hypot), ::Type{Arg{2}}, dy, y, x1::Value{S}, x2::Value{T}) where {S,T} = (dy .* x2) ./ y
```

In fact @primitive generates four more definitions for the other boxed/unboxed argument combinations.

**Broadcasting**

Broadcasting is handled by extra `forw` and `back` methods. `@primitive` defines the following so that broadcasting a primitive function with a boxed value triggers `forw` and `back`:

```
broadcasted(::typeof(sin), x::Value{T}) where {T<:Number} = forw(broadcasted,sin,x)
back(::typeof(broadcasted), ::Type{Arg{2}}, dy, y, ::typeof(sin), x::Value{T}) where {T<:Number} = dy .* cos(x)
```

If you do not want the broadcasting methods, use the `@primitive1` macro. If you only want the broadcasting methods, use `@primitive2`. As a motivating example, here is how `*` is defined for non-scalars:

```
@primitive1 *(x1,x2),dy (dy*x2') (x1'*dy)
@primitive2 *(x1,x2),dy unbroadcast(x1,dy.*x2) unbroadcast(x2,x1.*dy)
```

Regular `*` is matrix multiplication, broadcasted `*` is elementwise multiplication, and the two have different gradients as defined above. `unbroadcast(a,b)` reduces `b` to the same shape as `a` by performing the necessary summations.

`AutoGrad.@primitive2`

— Macro

`@primitive fx g1 g2...`

Define a new primitive operation for AutoGrad and (optionally) specify its gradients. Non-differentiable functions such as `sign` and non-numeric functions such as `size` should be defined using the `@zerograd` macro instead.

**Examples**

```
@primitive sin(x::Number)
@primitive hypot(x1,x2),dy,y
@primitive sin(x::Number),dy (dy.*cos(x))
@primitive hypot(x1,x2),dy,y (dy.*x1./y) (dy.*x2./y)
```

The first example shows that `fx` is a typed method declaration. Julia supports multiple dispatch, i.e. a single function can have multiple methods with different argument types. AutoGrad takes advantage of this and supports multiple dispatch for both primitives and gradients.

The second example specifies variable names for the output gradient `dy` and the output `y` after the method declaration, which can be used in gradient expressions. Untyped, ellipsis and keyword arguments are fine, as in `f(a::Int,b,c...;d=1)`. Parametric methods such as `f(x::T) where {T<:Number}` cannot be used.

The method declaration can optionally be followed by gradient expressions. The third and fourth examples show how gradients can be specified. Note that the parameters, the return variable and the output gradient of the original function can all be used in the gradient expressions.

**Under the hood**

The @primitive macro turns the first example into:

`sin(x::Value{T}) where {T<:Number} = forw(sin, x)`

This causes calls to `sin` with a boxed argument (`Value{T<:Number}`) to be recorded. The recorded operations are used by AutoGrad to construct a dynamic computational graph. With multiple arguments things are a bit more complicated. Here is what happens with the second example:

```
hypot(x1::Value{S}, x2::Value{T}) where {S,T} = forw(hypot, x1, x2)
hypot(x1::S, x2::Value{T}) where {S,T} = forw(hypot, x1, x2)
hypot(x1::Value{S}, x2::T) where {S,T} = forw(hypot, x1, x2)
```

We want the `forw` method to be called if any one of the arguments is a boxed `Value`. There is no easy way to specify this in Julia, so the macro generates all 2^N-1 boxed/unboxed argument combinations.

In AutoGrad, gradients are defined using gradient methods that follow this pattern:

`back(f,Arg{i},dy,y,x...) => dx[i]`

For the third example, here is the generated gradient method:

`back(::typeof(sin), ::Type{Arg{1}}, dy, y, x::Value{T}) where {T<:Number} = dy .* cos(x)`

For the last example, a different gradient method is generated for each argument:

```
back(::typeof(hypot), ::Type{Arg{1}}, dy, y, x1::Value{S}, x2::Value{T}) where {S,T} = (dy .* x1) ./ y
back(::typeof(hypot), ::Type{Arg{2}}, dy, y, x1::Value{S}, x2::Value{T}) where {S,T} = (dy .* x2) ./ y
```

In fact @primitive generates four more definitions for the other boxed/unboxed argument combinations.

**Broadcasting**

Broadcasting is handled by extra `forw` and `back` methods. `@primitive` defines the following so that broadcasting a primitive function with a boxed value triggers `forw` and `back`:

```
broadcasted(::typeof(sin), x::Value{T}) where {T<:Number} = forw(broadcasted,sin,x)
back(::typeof(broadcasted), ::Type{Arg{2}}, dy, y, ::typeof(sin), x::Value{T}) where {T<:Number} = dy .* cos(x)
```

If you do not want the broadcasting methods, use the `@primitive1` macro. If you only want the broadcasting methods, use `@primitive2`. As a motivating example, here is how `*` is defined for non-scalars:

```
@primitive1 *(x1,x2),dy (dy*x2') (x1'*dy)
@primitive2 *(x1,x2),dy unbroadcast(x1,dy.*x2) unbroadcast(x2,x1.*dy)
```

Regular `*` is matrix multiplication, broadcasted `*` is elementwise multiplication, and the two have different gradients as defined above. `unbroadcast(a,b)` reduces `b` to the same shape as `a` by performing the necessary summations.

`AutoGrad.@zerograd`

— Macro

`@zerograd f(args...; kwargs...)`

Define `f` as an AutoGrad primitive operation with zero gradient.

**Example:**

`@zerograd floor(x::Float32)`

`@zerograd` allows `f` to handle boxed `Value` inputs by unboxing them like a `@primitive`, but unlike `@primitive` it does not record its actions or return a boxed `Value` result. Some functions, like `sign()`, have zero gradient. Others, like `length()`, have discrete or constant outputs. These need to handle `Value` inputs but do not need to record anything and can return regular values; their output can be treated as a constant in the program. Use the `@zerograd` macro for those. Use the `@zerograd1` variant if you don't want to define the broadcasting version, and `@zerograd2` if you only want the broadcasting version. Note that `kwargs` are NOT unboxed.
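A hedged sketch with a hypothetical step function (assumes AutoGrad is loaded):

```julia
using AutoGrad

mystep(x) = x < 0 ? 0.0 : 1.0   # piecewise constant: zero gradient a.e.
@zerograd mystep(x::Number)

p = Param(2.0)
j = @diff mystep(p) * p         # mystep's output is treated as a constant
grad(j, p)                      # == mystep(2.0) == 1.0
```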

`AutoGrad.@zerograd1`

— Macro

`@zerograd f(args...; kwargs...)`

Define `f` as an AutoGrad primitive operation with zero gradient.

**Example:**

`@zerograd floor(x::Float32)`

`@zerograd` allows `f` to handle boxed `Value` inputs by unboxing them like a `@primitive`, but unlike `@primitive` it does not record its actions or return a boxed `Value` result. Some functions, like `sign()`, have zero gradient. Others, like `length()`, have discrete or constant outputs. These need to handle `Value` inputs but do not need to record anything and can return regular values; their output can be treated as a constant in the program. Use the `@zerograd` macro for those. Use the `@zerograd1` variant if you don't want to define the broadcasting version, and `@zerograd2` if you only want the broadcasting version. Note that `kwargs` are NOT unboxed.

`AutoGrad.@zerograd2`

— Macro

`@zerograd f(args...; kwargs...)`

Define `f` as an AutoGrad primitive operation with zero gradient.

**Example:**

`@zerograd floor(x::Float32)`

`@zerograd` allows `f` to handle boxed `Value` inputs by unboxing them like a `@primitive`, but unlike `@primitive` it does not record its actions or return a boxed `Value` result. Some functions, like `sign()`, have zero gradient. Others, like `length()`, have discrete or constant outputs. These need to handle `Value` inputs but do not need to record anything and can return regular values; their output can be treated as a constant in the program. Use the `@zerograd` macro for those. Use the `@zerograd1` variant if you don't want to define the broadcasting version, and `@zerograd2` if you only want the broadcasting version. Note that `kwargs` are NOT unboxed.