`AutoGrad.AutoGrad`

— Module

Usage:

```
x = Param([1,2,3]) # The user declares parameters with `Param`
y = @diff sum(x .* x) # computes gradients using `@diff`
grad(y,x) => [2,4,6] # looks up the gradient of a parameter with `grad`
```

`Param(x)` returns a struct that acts like `x` but marks it as a parameter you want to compute gradients with respect to.

`@diff expr` evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the `Param`s used in the computation.

`grad(y, x)` returns the gradient of a `@diff` result `y` with respect to any parameter `x::Param` (`nothing` may be returned if the gradient is 0).

`value(x)` returns the value associated with `x` if `x` is a `Param` or the output of `@diff`; otherwise it returns `x`.

`params(x)` returns an iterator over the `Param`s found by a recursive search of object `x`, which is typically a model or a `@diff` result.

Alternative usage:

```
x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
```

Given a scalar-valued function `f`, `grad(f, argnum=1)` returns another function `g` that takes the same inputs as `f` and returns the gradient of the output with respect to the `argnum`'th argument. `gradloss` is similar, except the resulting function also returns the value of `f`.
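As an illustrative sketch of the alternative usage (hypothetical function and values, assuming the AutoGrad package is loaded), here is a gradient taken with respect to the second argument:

```julia
using AutoGrad

# A two-argument scalar-valued function (hypothetical example).
f(w, x) = sum(w .* x) + sum(x .^ 2)

w = [1.0, 2.0]
x = [3.0, 4.0]

g2 = grad(f, 2)     # gradient with respect to the 2nd argument
g2(w, x)            # == w .+ 2 .* x == [7.0, 10.0]
```

The default `grad(f)` differentiates with respect to the first argument.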

`AutoGrad.Param`

— Type

Usage:

```
x = Param([1,2,3]) # The user declares parameters with `Param`
y = @diff sum(x .* x) # computes gradients using `@diff`
grad(y,x) => [2,4,6] # looks up the gradient of a parameter with `grad`
```

`Param(x)` returns a struct that acts like `x` but marks it as a parameter you want to compute gradients with respect to.

`@diff expr` evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the `Param`s used in the computation.

`grad(y, x)` returns the gradient of a `@diff` result `y` with respect to any parameter `x::Param` (`nothing` may be returned if the gradient is 0).

`value(x)` returns the value associated with `x` if `x` is a `Param` or the output of `@diff`; otherwise it returns `x`.

`params(x)` returns an iterator over the `Param`s found by a recursive search of object `x`, which is typically a model or a `@diff` result.

Alternative usage:

```
x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
```

Given a scalar-valued function `f`, `grad(f, argnum=1)` returns another function `g` that takes the same inputs as `f` and returns the gradient of the output with respect to the `argnum`'th argument. `gradloss` is similar, except the resulting function also returns the value of `f`.

`AutoGrad.Sparse`

— Type

`Sparse(container, values, indices)`

Create a sparse container to make gradient calculations efficient. `s::Sparse` represents the value `a` defined below:

```
a = zero(s.container)
for (idx, val) in zip(s.indices, s.values)
    a[idx] .+= val
end
```

except that when there are repeated indices in `idx`, the corresponding values get added rather than overwritten. See https://github.com/JuliaLang/julia/issues/31392.
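The accumulation semantics can be illustrated in plain Julia with hypothetical scalar indices (a real `Sparse` may carry more general index objects):

```julia
# Hypothetical Sparse fields: container = zeros(3), indices = [1, 1, 3],
# values = [2.0, 5.0, 1.0]. Looping element by element makes the repeated
# index 1 accumulate instead of being overwritten.
container = zeros(3)
indices   = [1, 1, 3]
values    = [2.0, 5.0, 1.0]

a = zero(container)
for (idx, val) in zip(indices, values)
    a[idx] += val
end
a   # == [7.0, 0.0, 1.0]
```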

`AutoGrad.addto!`

— Function

`addto!(accumulator, newval)`

Add `newval` to `accumulator` and return the result. Used in outgrad calculations. The outgrad values start as `nothing`, representing a zero gradient, and are then incremented using values of type `Number`, `Tuple`, `AbstractDict`, `AbstractArray`, `Nothing` and `AutoGrad.Sparse`. The accumulator and newval types must match: `Nothing` matches all types, `Sparse` matches types that match its container, and other types must match themselves. `addto!` handles repeated indices in `newval` by adding all corresponding values to the `accumulator`.
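A minimal sketch of the accumulation pattern (assumes AutoGrad is loaded; the exact in-place semantics are AutoGrad internals):

```julia
using AutoGrad: addto!

acc = nothing                      # zero gradient
acc = addto!(acc, [1.0, 2.0])      # Nothing matches any type
acc = addto!(acc, [0.5, 0.5])      # same-type arrays are added
acc                                # running sum, [1.5, 2.5]
```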

`AutoGrad.cat1d`

— Method

`cat1d(args...)`

Return `vcat(vec.(args)...)` but possibly more efficiently. Can be used to concatenate the contents of arrays with different shapes and sizes.
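For example (note that `vec` flattens matrices in column-major order):

```julia
using AutoGrad: cat1d

a = [1 2; 3 4]       # 2×2 matrix
b = [5.0, 6.0]       # length-2 vector
cat1d(a, b)          # same result as vcat(vec(a), vec(b)) == [1, 3, 2, 4, 5, 6]
```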

`AutoGrad.dir`

— Method

Use `AutoGrad.dir(path...)` to construct paths relative to the AutoGrad root.

`AutoGrad.gcheck`

— Function

```
gcheck(f, x...; kw, o...)
@gcheck f(x...; kw...) (opt1=val1,opt2=val2,...)
```

Numerically check the gradient of `f(x...; kw...)` and return a boolean result.

Example call: `gcheck(nll, model, x, y)` or `@gcheck nll(model, x, y)`. The parameters should be marked as `Param` arrays in `f`, `x`, and/or `kw`. Only 10 random entries in each large numeric array are checked by default. If the output of `f` is not a number, we check the gradient of `sum(f(x...; kw...))`. Keyword arguments:

* `kw=()`: keyword arguments to be passed to `f`, i.e. `f(x...; kw...)`.
* `nsample=10`: number of random entries from each param to check.
* `atol=0.01, rtol=0.05`: tolerance parameters. See `isapprox` for their meaning.
* `delta=0.0001`: step size for numerical gradient calculation.
* `verbose=1`: 0 prints nothing, 1 shows failing tests, 2 shows all tests.
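A minimal sketch (hypothetical loss function; assumes AutoGrad is loaded):

```julia
using AutoGrad

w = Param(randn(3))
x = randn(3)
loss(w, x) = sum(abs2, w .* x)   # simple quadratic loss

gcheck(loss, w, x)               # true if numeric and AD gradients agree
@gcheck loss(w, x) (verbose=2,)  # macro form, showing all tests
```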

`AutoGrad.grad`

— Function

Usage:

```
x = Param([1,2,3]) # The user declares parameters with `Param`
y = @diff sum(x .* x) # computes gradients using `@diff`
grad(y,x) => [2,4,6] # looks up the gradient of a parameter with `grad`
```

`Param(x)` returns a struct that acts like `x` but marks it as a parameter you want to compute gradients with respect to.

`@diff expr` evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the `Param`s used in the computation.

`grad(y, x)` returns the gradient of a `@diff` result `y` with respect to any parameter `x::Param` (`nothing` may be returned if the gradient is 0).

`value(x)` returns the value associated with `x` if `x` is a `Param` or the output of `@diff`; otherwise it returns `x`.

`params(x)` returns an iterator over the `Param`s found by a recursive search of object `x`, which is typically a model or a `@diff` result.

Alternative usage:

```
x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
```

Given a scalar-valued function `f`, `grad(f, argnum=1)` returns another function `g` that takes the same inputs as `f` and returns the gradient of the output with respect to the `argnum`'th argument. `gradloss` is similar, except the resulting function also returns the value of `f`.

`AutoGrad.gradcheck`

— Method

`gradcheck(f, x...; kwargs...)`

Numerically check the gradient of `f(x...)` and return a boolean result.

Each argument can be a `Number`, `Array`, `Tuple` or `Dict`, which in turn can contain other Arrays etc. Only 10 random entries in each large numeric array are checked by default. If the output of `f` is not a number, we check the gradient of `sum(f(x...))`. See also `gcheck` for a different take on marking parameters.

**Keywords**

* `args=:`: the argument indices to check gradients with respect to. Can be an array or range of indices or a single index. By default all arguments that have a `length` method are checked.
* `kw=()`: keyword arguments to be passed to `f`.
* `nsample=10`: number of random entries from each numeric array in the gradient `dw=(grad(f))(w,x...;o...)` compared to their numerical estimates.
* `atol=0.01`: tolerance parameter. See `isapprox` for an explanation.
* `rtol=0.05`: tolerance parameter. See `isapprox` for an explanation.
* `delta=0.0001`: step size for numerical gradient calculation.
* `verbose=1`: 0 prints nothing, 1 shows failing tests, 2 shows all tests.
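A minimal sketch (hypothetical function; unlike `gcheck`, no `Param` marking is needed):

```julia
using AutoGrad

f(w, x) = sum(w .* x .+ x .^ 2)

gradcheck(f, randn(4), randn(4))          # check all array arguments
gradcheck(f, randn(4), randn(4); args=1)  # only the first argument
```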

`AutoGrad.gradloss`

— Function

Usage:

```
x = Param([1,2,3]) # The user declares parameters with `Param`
y = @diff sum(x .* x) # computes gradients using `@diff`
grad(y,x) => [2,4,6] # looks up the gradient of a parameter with `grad`
```

`Param(x)` returns a struct that acts like `x` but marks it as a parameter you want to compute gradients with respect to.

`@diff expr` evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the `Param`s used in the computation.

`grad(y, x)` returns the gradient of a `@diff` result `y` with respect to any parameter `x::Param` (`nothing` may be returned if the gradient is 0).

`value(x)` returns the value associated with `x` if `x` is a `Param` or the output of `@diff`; otherwise it returns `x`.

`params(x)` returns an iterator over the `Param`s found by a recursive search of object `x`, which is typically a model or a `@diff` result.

Alternative usage:

```
x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
```

Given a scalar-valued function `f`, `grad(f, argnum=1)` returns another function `g` that takes the same inputs as `f` and returns the gradient of the output with respect to the `argnum`'th argument. `gradloss` is similar, except the resulting function also returns the value of `f`.

`AutoGrad.params`

— Function

Usage:

```
x = Param([1,2,3]) # The user declares parameters with `Param`
y = @diff sum(x .* x) # computes gradients using `@diff`
grad(y,x) => [2,4,6] # looks up the gradient of a parameter with `grad`
```

`Param(x)` returns a struct that acts like `x` but marks it as a parameter you want to compute gradients with respect to.

`@diff expr` evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the `Param`s used in the computation.

`grad(y, x)` returns the gradient of a `@diff` result `y` with respect to any parameter `x::Param` (`nothing` may be returned if the gradient is 0).

`value(x)` returns the value associated with `x` if `x` is a `Param` or the output of `@diff`; otherwise it returns `x`.

`params(x)` returns an iterator over the `Param`s found by a recursive search of object `x`, which is typically a model or a `@diff` result.

Alternative usage:

```
x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
```

Given a scalar-valued function `f`, `grad(f, argnum=1)` returns another function `g` that takes the same inputs as `f` and returns the gradient of the output with respect to the `argnum`'th argument. `gradloss` is similar, except the resulting function also returns the value of `f`.

`AutoGrad.randcheck`

— Function

Test a numeric function with Float32/Float64 `randn` scalars and `randn` arrays, possibly transforming the input to match the function's domain.

`AutoGrad.unbroadcast`

— Method

`unbroadcast(x,dx)`

Bring `dx` to `x`'s size via unbroadcasting (reduction). This is needed when defining gradients of multi-argument broadcasting functions, where the arguments and the result may have different sizes.
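For instance, if a size-(3,) argument was broadcast against a (3,4) array, its gradient must be summed back down to size (3,). A sketch assuming AutoGrad is loaded:

```julia
using AutoGrad: unbroadcast

x  = [1.0, 2.0, 3.0]     # size (3,)
dx = ones(3, 4)          # gradient of a (3,4) broadcast result
unbroadcast(x, dx)       # sums over the broadcast dimension: one 4.0 per entry
```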

`AutoGrad.value`

— Function

Usage:

```
x = Param([1,2,3]) # The user declares parameters with `Param`
y = @diff sum(x .* x) # computes gradients using `@diff`
grad(y,x) => [2,4,6] # looks up the gradient of a parameter with `grad`
```

`Param(x)` returns a struct that acts like `x` but marks it as a parameter you want to compute gradients with respect to.

`@diff expr` evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the `Param`s used in the computation.

`grad(y, x)` returns the gradient of a `@diff` result `y` with respect to any parameter `x::Param` (`nothing` may be returned if the gradient is 0).

`value(x)` returns the value associated with `x` if `x` is a `Param` or the output of `@diff`; otherwise it returns `x`.

`params(x)` returns an iterator over the `Param`s found by a recursive search of object `x`, which is typically a model or a `@diff` result.

Alternative usage:

```
x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
```

Given a scalar-valued function `f`, `grad(f, argnum=1)` returns another function `g` that takes the same inputs as `f` and returns the gradient of the output with respect to the `argnum`'th argument. `gradloss` is similar, except the resulting function also returns the value of `f`.

`AutoGrad.@diff`

— Macro

Usage:

```
x = Param([1,2,3]) # The user declares parameters with `Param`
y = @diff sum(x .* x) # computes gradients using `@diff`
grad(y,x) => [2,4,6] # looks up the gradient of a parameter with `grad`
```

`Param(x)` returns a struct that acts like `x` but marks it as a parameter you want to compute gradients with respect to.

`@diff expr` evaluates an expression and returns a struct that contains its value (which should be a scalar) and gradients with respect to the `Param`s used in the computation.

`grad(y, x)` returns the gradient of a `@diff` result `y` with respect to any parameter `x::Param` (`nothing` may be returned if the gradient is 0).

`value(x)` returns the value associated with `x` if `x` is a `Param` or the output of `@diff`; otherwise it returns `x`.

`params(x)` returns an iterator over the `Param`s found by a recursive search of object `x`, which is typically a model or a `@diff` result.

Alternative usage:

```
x = [1 2 3]
f(x) = sum(x .* x)
f(x) => 14
grad(f)(x) => [2 4 6]
gradloss(f)(x) => ([2 4 6], 14)
```

Given a scalar-valued function `f`, `grad(f, argnum=1)` returns another function `g` that takes the same inputs as `f` and returns the gradient of the output with respect to the `argnum`'th argument. `gradloss` is similar, except the resulting function also returns the value of `f`.

`AutoGrad.@gcheck`

— Macro

```
gcheck(f, x...; kw, o...)
@gcheck f(x...; kw...) (opt1=val1,opt2=val2,...)
```

Numerically check the gradient of `f(x...; kw...)` and return a boolean result.

Example call: `gcheck(nll, model, x, y)` or `@gcheck nll(model, x, y)`. The parameters should be marked as `Param` arrays in `f`, `x`, and/or `kw`. Only 10 random entries in each large numeric array are checked by default. If the output of `f` is not a number, we check the gradient of `sum(f(x...; kw...))`. Keyword arguments:

* `kw=()`: keyword arguments to be passed to `f`, i.e. `f(x...; kw...)`.
* `nsample=10`: number of random entries from each param to check.
* `atol=0.01, rtol=0.05`: tolerance parameters. See `isapprox` for their meaning.
* `delta=0.0001`: step size for numerical gradient calculation.
* `verbose=1`: 0 prints nothing, 1 shows failing tests, 2 shows all tests.

`AutoGrad.@primitive`

— Macro

`@primitive fx g1 g2...`

Define a new primitive operation for AutoGrad and (optionally) specify its gradients. Non-differentiable functions such as `sign` and non-numeric functions such as `size` should be defined using the `@zerograd` macro instead.

**Examples**

```
@primitive sin(x::Number)
@primitive hypot(x1,x2),dy,y
@primitive sin(x::Number),dy (dy.*cos(x))
@primitive hypot(x1,x2),dy,y (dy.*x1./y) (dy.*x2./y)
```

The first example shows that `fx` is a typed method declaration. Julia supports multiple dispatch, i.e. a single function can have multiple methods with different argument types. AutoGrad takes advantage of this and supports multiple dispatch for both primitives and gradients.

The second example specifies variable names for the output gradient `dy` and the output `y` after the method declaration, which can be used in gradient expressions. Untyped, ellipsis and keyword arguments are fine, as in `f(a::Int,b,c...;d=1)`. Parametric methods such as `f(x::T) where {T<:Number}` cannot be used.

The method declaration can optionally be followed by gradient expressions. The third and fourth examples show how gradients can be specified. Note that the parameters, the return variable and the output gradient of the original function can all be used in the gradient expressions.

**Under the hood**

The @primitive macro turns the first example into:

`sin(x::Value{T}) where {T<:Number} = forw(sin, x)`

This causes calls to `sin` with a boxed argument (`Value{T<:Number}`) to be recorded. The recorded operations are used by AutoGrad to construct a dynamic computational graph. With multiple arguments things are a bit more complicated. Here is what happens with the second example:

```
hypot(x1::Value{S}, x2::Value{T}) where {S,T} = forw(hypot, x1, x2)
hypot(x1::S, x2::Value{T}) where {S,T} = forw(hypot, x1, x2)
hypot(x1::Value{S}, x2::T) where {S,T} = forw(hypot, x1, x2)
```

We want the `forw` method to be called if any one of the arguments is a boxed `Value`. There is no easy way to specify this in Julia, so the macro generates all 2^N-1 boxed/unboxed argument combinations.

In AutoGrad, gradients are defined using gradient methods that follow this pattern:

`back(f,Arg{i},dy,y,x...) => dx[i]`

For the third example, here is the generated gradient method:

`back(::typeof(sin), ::Type{Arg{1}}, dy, y, x::Value{T}) where {T<:Number} = dy .* cos(x)`

For the last example, a different gradient method is generated for each argument:

```
back(::typeof(hypot), ::Type{Arg{1}}, dy, y, x1::Value{S}, x2::Value{T}) where {S,T} = (dy .* x1) ./ y
back(::typeof(hypot), ::Type{Arg{2}}, dy, y, x1::Value{S}, x2::Value{T}) where {S,T} = (dy .* x2) ./ y
```

In fact @primitive generates four more definitions for the other boxed/unboxed argument combinations.

**Broadcasting**

Broadcasting is handled by extra `forw` and `back` methods. `@primitive` defines the following so that broadcasting a primitive function with a boxed value triggers `forw` and `back`:

```
broadcasted(::typeof(sin), x::Value{T}) where {T<:Number} = forw(broadcasted,sin,x)
back(::typeof(broadcasted), ::Type{Arg{2}}, dy, y, ::typeof(sin), x::Value{T}) where {T<:Number} = dy .* cos(x)
```

If you do not want the broadcasting methods, use the `@primitive1` macro. If you only want the broadcasting methods, use `@primitive2`. As a motivating example, here is how `*` is defined for non-scalars:

```
@primitive1 *(x1,x2),dy (dy*x2') (x1'*dy)
@primitive2 *(x1,x2),dy unbroadcast(x1,dy.*x2) unbroadcast(x2,x1.*dy)
```

Regular `*` is matrix multiplication, broadcasted `*` is elementwise multiplication, and the two have different gradients as defined above. `unbroadcast(a,b)` reduces `b` to the same shape as `a` by performing the necessary summations.
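Putting it together, here is a hedged sketch that defines a primitive for a hypothetical `square` function and differentiates through it (assumes AutoGrad is loaded):

```julia
using AutoGrad

square(x) = x * x                            # plain Julia function
@primitive square(x::Number),dy,y (dy * 2x)  # gradient: d(x^2)/dx = 2x

p = Param(3.0)
j = @diff square(p)
grad(j, p)    # == 6.0
```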

`AutoGrad.@primitive1`

— Macro

`@primitive fx g1 g2...`

Define a new primitive operation for AutoGrad and (optionally) specify its gradients. Non-differentiable functions such as `sign` and non-numeric functions such as `size` should be defined using the `@zerograd` macro instead.

**Examples**

```
@primitive sin(x::Number)
@primitive hypot(x1,x2),dy,y
@primitive sin(x::Number),dy (dy.*cos(x))
@primitive hypot(x1,x2),dy,y (dy.*x1./y) (dy.*x2./y)
```

The first example shows that `fx` is a typed method declaration. Julia supports multiple dispatch, i.e. a single function can have multiple methods with different argument types. AutoGrad takes advantage of this and supports multiple dispatch for both primitives and gradients.

The second example specifies variable names for the output gradient `dy` and the output `y` after the method declaration, which can be used in gradient expressions. Untyped, ellipsis and keyword arguments are fine, as in `f(a::Int,b,c...;d=1)`. Parametric methods such as `f(x::T) where {T<:Number}` cannot be used.

The method declaration can optionally be followed by gradient expressions. The third and fourth examples show how gradients can be specified. Note that the parameters, the return variable and the output gradient of the original function can all be used in the gradient expressions.

**Under the hood**

The @primitive macro turns the first example into:

`sin(x::Value{T}) where {T<:Number} = forw(sin, x)`

This causes calls to `sin` with a boxed argument (`Value{T<:Number}`) to be recorded. The recorded operations are used by AutoGrad to construct a dynamic computational graph. With multiple arguments things are a bit more complicated. Here is what happens with the second example:

```
hypot(x1::Value{S}, x2::Value{T}) where {S,T} = forw(hypot, x1, x2)
hypot(x1::S, x2::Value{T}) where {S,T} = forw(hypot, x1, x2)
hypot(x1::Value{S}, x2::T) where {S,T} = forw(hypot, x1, x2)
```

We want the `forw` method to be called if any one of the arguments is a boxed `Value`. There is no easy way to specify this in Julia, so the macro generates all 2^N-1 boxed/unboxed argument combinations.

In AutoGrad, gradients are defined using gradient methods that follow this pattern:

`back(f,Arg{i},dy,y,x...) => dx[i]`

For the third example, here is the generated gradient method:

`back(::typeof(sin), ::Type{Arg{1}}, dy, y, x::Value{T}) where {T<:Number} = dy .* cos(x)`

For the last example, a different gradient method is generated for each argument:

```
back(::typeof(hypot), ::Type{Arg{1}}, dy, y, x1::Value{S}, x2::Value{T}) where {S,T} = (dy .* x1) ./ y
back(::typeof(hypot), ::Type{Arg{2}}, dy, y, x1::Value{S}, x2::Value{T}) where {S,T} = (dy .* x2) ./ y
```

In fact @primitive generates four more definitions for the other boxed/unboxed argument combinations.

**Broadcasting**

Broadcasting is handled by extra `forw` and `back` methods. `@primitive` defines the following so that broadcasting a primitive function with a boxed value triggers `forw` and `back`:

```
broadcasted(::typeof(sin), x::Value{T}) where {T<:Number} = forw(broadcasted,sin,x)
back(::typeof(broadcasted), ::Type{Arg{2}}, dy, y, ::typeof(sin), x::Value{T}) where {T<:Number} = dy .* cos(x)
```

If you do not want the broadcasting methods, use the `@primitive1` macro. If you only want the broadcasting methods, use `@primitive2`. As a motivating example, here is how `*` is defined for non-scalars:

```
@primitive1 *(x1,x2),dy (dy*x2') (x1'*dy)
@primitive2 *(x1,x2),dy unbroadcast(x1,dy.*x2) unbroadcast(x2,x1.*dy)
```

Regular `*` is matrix multiplication, broadcasted `*` is elementwise multiplication, and the two have different gradients as defined above. `unbroadcast(a,b)` reduces `b` to the same shape as `a` by performing the necessary summations.

`AutoGrad.@primitive2`

— Macro

`@primitive fx g1 g2...`

Define a new primitive operation for AutoGrad and (optionally) specify its gradients. Non-differentiable functions such as `sign` and non-numeric functions such as `size` should be defined using the `@zerograd` macro instead.

**Examples**

```
@primitive sin(x::Number)
@primitive hypot(x1,x2),dy,y
@primitive sin(x::Number),dy (dy.*cos(x))
@primitive hypot(x1,x2),dy,y (dy.*x1./y) (dy.*x2./y)
```

The first example shows that `fx` is a typed method declaration. Julia supports multiple dispatch, i.e. a single function can have multiple methods with different argument types. AutoGrad takes advantage of this and supports multiple dispatch for both primitives and gradients.

The second example specifies variable names for the output gradient `dy` and the output `y` after the method declaration, which can be used in gradient expressions. Untyped, ellipsis and keyword arguments are fine, as in `f(a::Int,b,c...;d=1)`. Parametric methods such as `f(x::T) where {T<:Number}` cannot be used.

The method declaration can optionally be followed by gradient expressions. The third and fourth examples show how gradients can be specified. Note that the parameters, the return variable and the output gradient of the original function can all be used in the gradient expressions.

**Under the hood**

The @primitive macro turns the first example into:

`sin(x::Value{T}) where {T<:Number} = forw(sin, x)`

This causes calls to `sin` with a boxed argument (`Value{T<:Number}`) to be recorded. The recorded operations are used by AutoGrad to construct a dynamic computational graph. With multiple arguments things are a bit more complicated. Here is what happens with the second example:

```
hypot(x1::Value{S}, x2::Value{T}) where {S,T} = forw(hypot, x1, x2)
hypot(x1::S, x2::Value{T}) where {S,T} = forw(hypot, x1, x2)
hypot(x1::Value{S}, x2::T) where {S,T} = forw(hypot, x1, x2)
```

We want the `forw` method to be called if any one of the arguments is a boxed `Value`. There is no easy way to specify this in Julia, so the macro generates all 2^N-1 boxed/unboxed argument combinations.

In AutoGrad, gradients are defined using gradient methods that follow this pattern:

`back(f,Arg{i},dy,y,x...) => dx[i]`

For the third example, here is the generated gradient method:

`back(::typeof(sin), ::Type{Arg{1}}, dy, y, x::Value{T}) where {T<:Number} = dy .* cos(x)`

For the last example, a different gradient method is generated for each argument:

```
back(::typeof(hypot), ::Type{Arg{1}}, dy, y, x1::Value{S}, x2::Value{T}) where {S,T} = (dy .* x1) ./ y
back(::typeof(hypot), ::Type{Arg{2}}, dy, y, x1::Value{S}, x2::Value{T}) where {S,T} = (dy .* x2) ./ y
```

In fact @primitive generates four more definitions for the other boxed/unboxed argument combinations.

**Broadcasting**

Broadcasting is handled by extra `forw` and `back` methods. `@primitive` defines the following so that broadcasting a primitive function with a boxed value triggers `forw` and `back`:

```
broadcasted(::typeof(sin), x::Value{T}) where {T<:Number} = forw(broadcasted,sin,x)
back(::typeof(broadcasted), ::Type{Arg{2}}, dy, y, ::typeof(sin), x::Value{T}) where {T<:Number} = dy .* cos(x)
```

If you do not want the broadcasting methods, use the `@primitive1` macro. If you only want the broadcasting methods, use `@primitive2`. As a motivating example, here is how `*` is defined for non-scalars:

```
@primitive1 *(x1,x2),dy (dy*x2') (x1'*dy)
@primitive2 *(x1,x2),dy unbroadcast(x1,dy.*x2) unbroadcast(x2,x1.*dy)
```

Regular `*` is matrix multiplication, broadcasted `*` is elementwise multiplication, and the two have different gradients as defined above. `unbroadcast(a,b)` reduces `b` to the same shape as `a` by performing the necessary summations.

`AutoGrad.@zerograd`

— Macro

`@zerograd f(args...; kwargs...)`

Define `f` as an AutoGrad primitive operation with zero gradient.

**Example:**

`@zerograd floor(x::Float32)`

`@zerograd` allows `f` to handle boxed `Value` inputs by unboxing them like a `@primitive`, but unlike `@primitive` it does not record its actions or return a boxed `Value` result. Some functions, like `sign()`, have zero gradient. Others, like `length()`, have discrete or constant outputs. These need to handle `Value` inputs but do not need to record anything and can return regular values; their output can be treated as a constant in the program. Use the `@zerograd` macro for those. Use the `@zerograd1` variant if you don't want to define the broadcasting version, and `@zerograd2` if you only want the broadcasting version. Note that `kwargs` are NOT unboxed.
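A hedged sketch with a hypothetical step function (assumes AutoGrad is loaded):

```julia
using AutoGrad

mystep(x) = x < 0 ? 0.0 : 1.0   # piecewise constant: zero gradient a.e.
@zerograd mystep(x::Number)

p = Param(2.0)
j = @diff mystep(p) * p         # mystep's output is treated as a constant
grad(j, p)                      # == mystep(2.0) == 1.0
```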

`AutoGrad.@zerograd1`

— Macro

`@zerograd f(args...; kwargs...)`

Define `f` as an AutoGrad primitive operation with zero gradient.

**Example:**

`@zerograd floor(x::Float32)`

`@zerograd` allows `f` to handle boxed `Value` inputs by unboxing them like a `@primitive`, but unlike `@primitive` it does not record its actions or return a boxed `Value` result. Some functions, like `sign()`, have zero gradient. Others, like `length()`, have discrete or constant outputs. These need to handle `Value` inputs but do not need to record anything and can return regular values; their output can be treated as a constant in the program. Use the `@zerograd` macro for those. Use the `@zerograd1` variant if you don't want to define the broadcasting version, and `@zerograd2` if you only want the broadcasting version. Note that `kwargs` are NOT unboxed.

`AutoGrad.@zerograd2`

— Macro

`@zerograd f(args...; kwargs...)`

Define `f` as an AutoGrad primitive operation with zero gradient.

**Example:**

`@zerograd floor(x::Float32)`

`@zerograd` allows `f` to handle boxed `Value` inputs by unboxing them like a `@primitive`, but unlike `@primitive` it does not record its actions or return a boxed `Value` result. Some functions, like `sign()`, have zero gradient. Others, like `length()`, have discrete or constant outputs. These need to handle `Value` inputs but do not need to record anything and can return regular values; their output can be treated as a constant in the program. Use the `@zerograd` macro for those. Use the `@zerograd1` variant if you don't want to define the broadcasting version, and `@zerograd2` if you only want the broadcasting version. Note that `kwargs` are NOT unboxed.