# How ForwardDiff Works

ForwardDiff is an implementation of forward mode automatic differentiation (AD) in Julia. There are two key components of this implementation: the `Dual` type, and the API.

## Dual Number Implementation

Partial derivatives are stored in the `Partials` type:

```
struct Partials{N,V} <: AbstractVector{V}
    values::NTuple{N,V}
end
```

The `Partials` type is used to implement the `Dual` type:

```
struct Dual{T,V<:Real,N} <: Real
    value::V
    partials::Partials{N,V}
end
```

This type represents an `N`-dimensional dual number coupled with a tag parameter `T` in order to prevent perturbation confusion. This dual number type is implemented to have the following mathematical behavior:

\[f(a + \sum_{i=1}^N b_i \epsilon_i) = f(a) + f'(a) \sum_{i=1}^N b_i \epsilon_i\]

where the $a$ component is stored in the `value` field and the $b$ components are stored in the `partials` field. This property of dual numbers is the central feature that allows ForwardDiff to take derivatives.
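As a quick worked example of this behavior, take $f(x) = x^2$; expanding the square and applying the defining rule of dual numbers, $\epsilon_i \epsilon_j = 0$, truncates the expansion so that the coefficient of $\epsilon$ is exactly $f'(a) = 2a$:

\[(a + b\epsilon)^2 = a^2 + 2ab\epsilon + b^2\epsilon^2 = a^2 + 2ab\epsilon\]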

In order to implement the above property, elementary numerical functions on a `Dual` number are overloaded to evaluate both the original function *and* the derivative of the function, propagating the derivative via multiplication. For example, `Base.sin` can be overloaded on `Dual` like so:

```
Base.sin(d::Dual{T}) where {T} = Dual{T}(sin(value(d)), cos(value(d)) * partials(d))
```

If we assume that a general function `f` is composed entirely of these elementary functions, then the chain rule enables our derivatives to compose as well. Thus, by overloading a plethora of elementary functions, we can differentiate generic functions composed of them by passing in a `Dual` number and looking at the output.
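To make the composition concrete, here is a minimal, self-contained sketch. The `MiniDual` type below is hypothetical (a stripped-down stand-in for ForwardDiff's actual `Dual`, with no tag and a single partial), but it differentiates $\sin(\sin(x))$ using only the overload pattern shown above:

```julia
# Hypothetical single-partial dual number (a simplification of ForwardDiff's Dual).
struct MiniDual{V<:Real} <: Real
    value::V    # f(a): the original value
    partial::V  # f'(a) * b: the propagated derivative
end

# Overload sin exactly as in the example above.
Base.sin(d::MiniDual) = MiniDual(sin(d.value), cos(d.value) * d.partial)

# Differentiate f(x) = sin(sin(x)) at x = 2.0 by seeding the input with partial 1.
f(x) = sin(sin(x))
d = f(MiniDual(2.0, 1.0))

d.value    # == sin(sin(2.0)), the function value
d.partial  # == cos(sin(2.0)) * cos(2.0), the derivative via the chain rule
```

Note that the chain rule never appears explicitly in `f`: each `sin` call multiplies its local derivative into the `partial` field, and function composition does the rest.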

We won't discuss higher-order differentiation in detail, but the reader is encouraged to learn about hyper-dual numbers, which extend dual numbers to higher orders by introducing extra $\epsilon$ terms that can cross-multiply. ForwardDiff's `Dual` number implementation naturally supports hyper-dual numbers without additional code by allowing instances of the `Dual` type to nest within each other. For example, a second-order hyper-dual number has the type `Dual{T,Dual{S,V,M},N}`, a third-order hyper-dual number has the type `Dual{T,Dual{S,Dual{R,V,K},M},N}`, and so on.
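The following self-contained sketch illustrates this nesting with a hypothetical single-partial `MiniDual` type (not ForwardDiff's tagged `Dual`): because the overloads are written generically, a dual number whose fields are themselves dual numbers yields the second derivative for free.

```julia
# Hypothetical single-partial dual number, written generically so instances can nest.
struct MiniDual{V<:Real} <: Real
    value::V
    partial::V
end

Base.:-(d::MiniDual) = MiniDual(-d.value, -d.partial)
Base.:*(a::MiniDual, b::MiniDual) =  # product rule
    MiniDual(a.value * b.value, a.value * b.partial + a.partial * b.value)
Base.sin(d::MiniDual) = MiniDual(sin(d.value), cos(d.value) * d.partial)
Base.cos(d::MiniDual) = MiniDual(cos(d.value), -sin(d.value) * d.partial)

# Nest one dual inside another: the inner epsilon tracks d/dx, the outer one d^2/dx^2.
x = MiniDual(MiniDual(2.0, 1.0), MiniDual(1.0, 0.0))
d = sin(x)

d.value.value      # == sin(2.0)
d.value.partial    # first derivative:  cos(2.0)
d.partial.partial  # second derivative: -sin(2.0)
```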

## ForwardDiff's API

The second component provided by this package is the API, which abstracts away the number types and makes it easy to execute familiar calculations like gradients and Hessians. This way, users don't have to understand `Dual` numbers in order to make use of the package.

The job of the API functions is to performantly seed input values with `Dual` numbers, pass the seeded value into the target function, and extract the derivative information from the result. For example, to calculate the partial derivatives for the gradient of a function $f$ at an input vector $\vec{x}$, we would do the following:

\[\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_i \\ \vdots \\ x_N \end{bmatrix} \to \vec{x}_{\epsilon} = \begin{bmatrix} x_1 + \epsilon_1 \\ \vdots \\ x_i + \epsilon_i \\ \vdots \\ x_N + \epsilon_N \end{bmatrix} \to f(\vec{x}_{\epsilon}) = f(\vec{x}) + \sum_{i=1}^N \frac{\partial f(\vec{x})}{\partial x_i} \epsilon_i\]

In reality, ForwardDiff does this calculation in chunks of the input vector (see Configuring Chunk Size for details). To provide a simple example of this, let's examine the case where the input vector size is 4 and the chunk size is 2. It then takes two calls to $f$ to evaluate the gradient:

\[\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\]

\[\vec{x}_{\epsilon} = \begin{bmatrix} x_1 + \epsilon_1 \\ x_2 + \epsilon_2 \\ x_3 \\ x_4 \end{bmatrix} \to f(\vec{x}_{\epsilon}) = f(\vec{x}) + \frac{\partial f(\vec{x})}{\partial x_1} \epsilon_1 + \frac{\partial f(\vec{x})}{\partial x_2} \epsilon_2\]

\[\vec{x}_{\epsilon} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 + \epsilon_1 \\ x_4 + \epsilon_2 \end{bmatrix} \to f(\vec{x}_{\epsilon}) = f(\vec{x}) + \frac{\partial f(\vec{x})}{\partial x_3} \epsilon_1 + \frac{\partial f(\vec{x})}{\partial x_4} \epsilon_2\]
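The two-chunk evaluation above can be sketched in a self-contained way. The `ChunkDual` type and manual seeding below are illustrative simplifications, not ForwardDiff's actual `Dual`/`Partials` machinery, but they show how a length-4 gradient falls out of two calls to `f` with chunk size 2:

```julia
# Illustrative dual number carrying N partials, mirroring the Dual/Partials pair above.
struct ChunkDual{N,V<:Real} <: Real
    value::V
    partials::NTuple{N,V}
end

ChunkDual(v::Float64) = ChunkDual(v, (0.0, 0.0))  # unseeded entry: all partials zero

Base.:+(a::ChunkDual{N}, b::ChunkDual{N}) where {N} =
    ChunkDual(a.value + b.value, ntuple(i -> a.partials[i] + b.partials[i], N))
Base.:*(a::ChunkDual{N}, b::ChunkDual{N}) where {N} =  # product rule, per partial
    ChunkDual(a.value * b.value, ntuple(i -> a.value * b.partials[i] + a.partials[i] * b.value, N))

f(x) = x[1] * x[2] + x[3] * x[4]
x = [1.0, 2.0, 3.0, 4.0]

# First call: seed epsilon_1, epsilon_2 onto x_1, x_2; leave x_3, x_4 unseeded.
chunk1 = [ChunkDual(x[1], (1.0, 0.0)), ChunkDual(x[2], (0.0, 1.0)), ChunkDual(x[3]), ChunkDual(x[4])]
# Second call: reuse the same epsilons for x_3, x_4.
chunk2 = [ChunkDual(x[1]), ChunkDual(x[2]), ChunkDual(x[3], (1.0, 0.0)), ChunkDual(x[4], (0.0, 1.0))]

grad = [f(chunk1).partials..., f(chunk2).partials...]  # == [2.0, 1.0, 4.0, 3.0]
```

Only two epsilon slots are ever allocated, which is the point of chunking: memory cost scales with the chunk size, while the number of calls to `f` scales with the input length divided by the chunk size.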

This seeding process is similar for Jacobians, so we won't rehash it here.
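To close the loop, here is how the API looks from the user's side. This snippet assumes the ForwardDiff package is installed; it uses the package's public `ForwardDiff.gradient` and `ForwardDiff.derivative` functions, so the seeding and extraction described above happen entirely behind the scenes:

```julia
using ForwardDiff

# Users never touch Dual numbers directly: the API seeds, calls, and extracts.
f(x) = sum(abs2, x)  # f(x) = x_1^2 + ... + x_N^2, so the gradient is 2x

g = ForwardDiff.gradient(f, [1.0, 2.0, 3.0])  # == [2.0, 4.0, 6.0]
d = ForwardDiff.derivative(sin, 1.0)          # == cos(1.0)
```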