Data Generators



DataGenInput is the abstract supertype for all data generation inputs used with the makedata() function. Call subtypes(DataGenInput) to list all available data generation inputs.

LogDiffInput(nTimeStep, initial, volatility, drift)

Contains the parameters used by makedata() to synthesize data from a log-normal diffusion process of the form:

\[P_{t+1} = P_t \cdot e^{drift + volatility \cdot v}\]

Here P_t is the value of the data at timestep t, and v is a draw from a standard normal distribution. Scaling v by volatility and shifting it by drift gives a normally distributed exponent with mean drift and standard deviation volatility.
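
The recurrence above can be sketched in plain Julia. This is a minimal illustration, not the package's makedata() implementation: the helper name simulate_logdiff is hypothetical, and drift and volatility are applied per step exactly as the equation is written, without any rescaling to an implied time period.

```julia
using Random

# Hypothetical helper (not part of the package): iterate
# P_{t+1} = P_t * exp(drift + volatility * v), with v ~ N(0, 1).
function simulate_logdiff(rng::AbstractRNG, nTimeStep::Integer, initial::Real,
                          volatility::Real, drift::Real)
    path = Vector{Float64}(undef, nTimeStep + 1)
    path[1] = initial                      # value at the 0th time step
    for t in 1:nTimeStep
        v = randn(rng)                     # standard normal draw
        path[t + 1] = path[t] * exp(drift + volatility * v)
    end
    return path
end

rng = MersenneTwister(42)
path = simulate_logdiff(rng, 250, 100.0, 0.01, 0.0004)
```

The returned vector has nTimeStep + 1 entries because it keeps the initial value; with volatility set to zero the path reduces to deterministic exponential growth at rate drift.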


  • nTimeStep: The number of time steps to synthesize.
  • initial: The assumed value at the 0th time step. Defaults to 100.
  • volatility: The price volatility, given as a standard deviation over the entire implied time period. Defaults to 0.3.
  • drift: The mean of the log-normal diffusion process, given in terms of the entire implied time period (if simulating a year, drift would be the annual expected return). Defaults to 0.02.


# initialize an input with every parameter given positionally
input1 = LogDiffInput(250, 100, .05, .1)

# initialize a second input with default values
input2 = LogDiffInput(250)

# initialize a third input from a dictionary of keyword arguments
kwargs = Dict(:nTimeStep=>250, :initial=>100, :volatility=>.05, :drift=>.1)
input3 = LogDiffInput(;kwargs...)

BootstrapInput(input_data, bootstrap_method::<:TSBootMethod; kwargs...)
BootstrapInput{T <: TSBootMethod}(; kwargs...)

Contains the parameters makedata() needs to perform a block bootstrap of type T. T can be any subtype of TSBootMethod: Stationary, MovingBlock, or CircularBlock.

Keyword Arguments

  • input_data: The data to be resampled. Must be a 1-D array.
  • bootstrap_method: The type of time series bootstrap to use. Must be a subtype of TSBootMethod.
  • n: The size of the resampled output data. Defaults to 100.
  • block_size: The block size to use. Defaults to the optimal block length computed by opt_block_length().


input_data = [1,2,4,3,5,7,6,3];
kwargs = Dict(:n=>20);
input1 = BootstrapInput(input_data, Stationary; kwargs...)

kwargs = Dict(:input_data=>input_data, :n=>20, :block_size=>4);
input2 = BootstrapInput{MovingBlock}(;kwargs...)
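
To make the block idea concrete, the following sketch resamples a series in contiguous blocks using only the standard library. It is an illustration under stated assumptions, not the package's algorithm: block_resample is a hypothetical helper, the circular keyword is invented for this example, and the stationary bootstrap (which uses random block lengths) is not covered.

```julia
using Random

# Hypothetical helper: draw `n` values from `data` in contiguous blocks.
# circular=false: blocks must fit inside the series (moving block style),
#                 which requires block_size <= length(data).
# circular=true:  blocks may wrap around the end (circular block style).
function block_resample(rng::AbstractRNG, data::AbstractVector, n::Integer,
                        block_size::Integer; circular::Bool=false)
    len = length(data)
    last_start = circular ? len : len - block_size + 1
    out = similar(data, 0)
    while length(out) < n
        start = rand(rng, 1:last_start)    # random block start
        for offset in 0:(block_size - 1)
            push!(out, data[mod1(start + offset, len)])
        end
    end
    return out[1:n]                        # trim the final partial block
end

rng = MersenneTwister(1)
series = [1, 2, 4, 3, 5, 7, 6, 3]
moving = block_resample(rng, series, 20, 4)
wrapped = block_resample(rng, series, 20, 4; circular=true)
```

Sampling whole blocks instead of single observations preserves the short-range dependence inside each block, which is the reason block bootstraps are used for time series.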

makedata function

makedata(Input::LogDiffInput, nSimulation::Integer)

Generates data according to the DataGenInput struct provided.

Possible DataGenInput types are:

  • ::LogDiffInput
  • ::BootstrapInput{MovingBlock}
  • ::BootstrapInput{CircularBlock}
  • ::BootstrapInput{Stationary}


  • Input<:DataGenInput: struct with parameters to generate data
  • nSimulation: the number of simulations to run.


  • data: An nTimeStep x nSimulation array in which each column holds the data for one simulation and each row holds the data for one timestep.
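
The layout described above can be illustrated with a stand-in matrix. The array below is built by hand with the standard library and only mimics the documented nTimeStep x nSimulation shape; it is an assumption for illustration, not actual makedata() output.

```julia
using Random

rng = MersenneTwister(0)
nTimeStep, nSimulation = 100, 3

# Stand-in for makedata() output: one simulated path per column.
data = 100 .* exp.(cumsum(0.01 .* randn(rng, nTimeStep, nSimulation); dims=1))

first_path = data[:, 1]    # every timestep of the first simulation
step_10 = data[10, :]      # the 10th timestep across all simulations
```

Because Julia arrays are stored column by column, slicing out a whole simulation with data[:, j] reads contiguous memory.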


nTimeStep = 100;
input1 = LogDiffInput(nTimeStep);

# create a dataset using the log diffusion model
data1 = makedata(input1, 1)

# create another dataset with 2 simulation runs using a stationary bootstrap
# (vec flattens data1, since input_data must be a 1-D array)
input2 = BootstrapInput(vec(data1), Stationary; n=100);
data2 = makedata(input2, 2)

factory function


factory(widget::Widget, bootstrap_method::TSBootMethod, nWidgets::Signed)

Creates nWidgets widgets of the same type as widget using the given bootstrap_method. For example, if a Stock widget is passed in, the factory uses the bootstrap method to produce nWidgets Stock widgets. All widgets use first differences.

Positional Inputs

  • widget::Widget: A concrete widget struct. See the Widget documentation for more.
  • bootstrap_method: A subtype of TSBootMethod: Stationary, MovingBlock, or CircularBlock.
  • nWidgets: The number of widgets the factory should return.


prices = [1,2,5,9,8,10,5,3];
widget = Stock(prices)

list_of_widgets = factory(widget, Stationary, 2)
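
As a rough illustration of what working with first differences means, the sketch below differences a price series, resamples the differences, and rebuilds a new path by cumulative summation. This is an assumption about the general technique, not a description of how factory() is implemented; for brevity the differences are resampled independently rather than in blocks.

```julia
using Random

prices = [1, 2, 5, 9, 8, 10, 5, 3]

# First differences: d[t] = prices[t + 1] - prices[t]
d = diff(prices)

# Summing the differences from the first price recovers the original series.
rebuilt = cumsum(vcat(prices[1], d))

# Resample the differences (i.i.d. here, an illustrative simplification)
# and rebuild a synthetic path that starts at the same first price.
rng = MersenneTwister(7)
resampled = d[rand(rng, 1:length(d), length(d))]
new_prices = cumsum(vcat(prices[1], resampled))
```

Resampling differences rather than levels keeps each synthetic path anchored at the original starting price while varying the sequence of moves.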

Helper functions

opt_block_length(array, bootstrap_method::TSBootMethod)

Computes the optimal block length for a time series block bootstrap using the methods defined by Politis and White (2004).

If a bootstrap method other than Stationary or CircularBlock is passed, the function defaults to CircularBlock.


using Distributions: Normal

# create ar(1) data set
ar1 = [1.0];
for _ in 1:799
    push!(ar1, ar1[end] * 0.7 + rand(Normal()))
end

# find optimal block lengths
st_bl = opt_block_length(ar1, Stationary)
cb_bl = opt_block_length(ar1, CircularBlock)