CEEDesigns.front (Function)
front(v)
front(f, v; atol1=0, atol2=0)

Construct the Pareto front of v, with both objectives minimized. If f is given, each element x of v is compared through f(x) rather than x itself.

Consecutive points on the front must differ by at least atol1 in the first (objective) coordinate and by at least atol2 in the second, measured relative to the latest point on the front.

Examples

v = [(1, 2), (2, 3), (2, 1)]
front(v)

# output

[(1, 2), (2, 1)]

v = [(1, (1, 2)), (2, (2, 3)), (3, (2, 1))]
front(x -> x[2], v)

# output

[(1, (1, 2)), (3, (2, 1))]

v = [(1, 2), (2, 1.99), (3, 1)]
front(v; atol2 = 0.2)

# output

[(1, 2), (3, 1)]
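The filtering rule can be illustrated with a small Julia sketch. pareto_front_sketch is a hypothetical helper, not the CEEDesigns implementation; it assumes that both objectives are minimized and that candidates are scanned in order of their first objective, each one being compared with the latest point kept so far.

```julia
# Hypothetical sketch of a tolerance-aware Pareto front; not the CEEDesigns source.
# Both objectives are assumed to be minimized.
function pareto_front_sketch(f, v; atol1 = 0, atol2 = 0)
    isempty(v) && return similar(v, 0)
    # Scan candidates by the first objective, breaking ties by the second.
    sorted = sort(v; by = x -> (f(x)[1], f(x)[2]))
    result = sorted[1:1]  # the best point in the first objective is never dominated
    for x in Iterators.drop(sorted, 1)
        fx, flast = f(x), f(last(result))
        # Keep x only if it strictly improves the second objective by at least atol2
        # and lies at least atol1 away from the latest front point in the first objective.
        if fx[2] < flast[2] && flast[2] - fx[2] >= atol2 && fx[1] - flast[1] >= atol1
            push!(result, x)
        end
    end
    return result
end

# Without a mapping function, elements are compared as they are.
pareto_front_sketch(v; kwargs...) = pareto_front_sketch(identity, v; kwargs...)
```

Under these assumptions the sketch returns the same fronts as the three examples above; the package itself may treat edge cases such as ties differently.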
CEEDesigns.plot_evals (Method)
plot_evals(evals; f, ylabel="information measure")

Create a stick plot that visualizes the performance measures evaluated for subsets of experiments.

The argument evals should be the output of evaluate_experiments. The optional keyword f takes evals as input and returns a list of its keys in the order in which they are plotted on the x-axis; by default, the keys are sorted by length.
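To illustrate what f may look like, the following Julia sketch assumes that evals is a Dict keyed by collections of experiment names; the values are made-up numbers. default_order mimics the documented default (sorting by length), and by_measure is a hypothetical alternative that orders subsets by their evaluated measure.

```julia
# Hypothetical evals: experimental subset => evaluated measure (made-up values).
evals = Dict(
    Set(["A", "B"]) => 0.31,
    Set{String}() => 0.72,
    Set(["A"]) => 0.48,
    Set(["B"]) => 0.55,
)

# Default-like ordering: subsets sorted by their number of experiments.
default_order(evals) = sort(collect(keys(evals)); by = length)

# Hypothetical custom ordering: subsets sorted by their evaluated measure.
by_measure(evals) = sort(collect(keys(evals)); by = k -> evals[k])
```

A function such as by_measure could then be supplied as plot_evals(evals; f = by_measure). If the values of evals are named tuples rather than numbers, the by function would need to pick the relevant field.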

CEEDesigns.plot_front (Method)
plot_front(designs; grad=cgrad(:Paired_12), xlabel, ylabel, labels=get_labels(designs))

Render a scatter plot of efficient designs, as returned by efficient_designs.

Optionally, specify a color gradient (grad) from which the colors are drawn.

Examples

designs = efficient_designs(experiments, evals)
plot_front(designs)
plot_front(designs; grad = cgrad(:Paired_12))
CEEDesigns.StaticDesigns.ArrangementMDP (Type)
ArrangementMDP(; experiments, experimental_costs, evals, max_parallel=1, tradeoff=(1, 0))

Structure that parametrizes an MDP used to approximate the optimal experimental arrangement.

CEEDesigns.StaticDesigns.efficient_designs (Method)
efficient_designs(experiments, evals; max_parallel=1, tradeoff=(1, 0), mdp_kwargs=default_mdp_kwargs)

Return the set of Pareto-efficient experimental designs, given experimental costs, predictive accuracy (loss), and estimated filtration rates for experimental subsets.

Arguments

  • experiments: a dictionary containing pairs experiment => cost (=> features), where cost can either be a scalar cost or a tuple (monetary cost, execution time).
  • evals: a dictionary containing pairs experimental subset => (; predictive loss, filtration).

Keyword arguments

  • max_parallel: to estimate the execution time of the design, define the number of experiments that can run concurrently. The experiments are then sorted in descending order of their individual durations and iteratively allocated into consecutive groups, each representing experiments that run in parallel.
  • tradeoff: determines how to project the monetary cost and execution time of an experimental design onto a single combined cost.
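The arrangement heuristic described for max_parallel can be sketched in Julia as follows. arrange_sketch and combined_cost are hypothetical helpers, not the CEEDesigns source; the sketch assumes that each experiment maps to a (monetary cost, duration) tuple, that a group of parallel experiments lasts as long as its longest member, and that tradeoff acts as a linear weighting of monetary cost and execution time.

```julia
# Hypothetical sketch of the arrangement heuristic; not the CEEDesigns source.
# `experiments` maps experiment name => (monetary cost, duration).
function arrange_sketch(experiments::Dict{String,<:Tuple}; max_parallel = 1)
    # Sort experiments by individual duration, longest first.
    order = sort(collect(keys(experiments)); by = e -> experiments[e][2], rev = true)
    # Allocate consecutive experiments into groups of at most `max_parallel`.
    groups = [order[i:min(i + max_parallel - 1, end)] for i in 1:max_parallel:length(order)]
    monetary = sum(experiments[e][1] for e in order; init = 0.0)
    # Experiments in a group run concurrently, so the group lasts as long as its longest member.
    exec_time = sum(maximum(experiments[e][2] for e in g) for g in groups; init = 0.0)
    return groups, (monetary, exec_time)
end

# Assumed linear projection of (monetary cost, execution time) onto a single combined cost.
combined_cost(costs, tradeoff) = tradeoff[1] * costs[1] + tradeoff[2] * costs[2]
```

Under this assumed projection, tradeoff = (1, 0) scores a design by its monetary cost alone, while tradeoff = (0.0, 1) scores it by execution time alone.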

Example

evals = evaluate_experiments(
    experiments_costs,
    model,
    data[!, Not("HeartDisease")],
    data[!, "HeartDisease"];
    zero_cost_features,
    measure = LogLoss(),
)
efficient_designs(experiments_costs, evals; max_parallel = 2, tradeoff = (0.0, 1))
CEEDesigns.StaticDesigns.efficient_designs (Method)
efficient_designs(experiments, args...; eval_options, arrangement_options)

Evaluate predictive power for subsets of experiments, and return the set of Pareto-efficient experimental designs.

Internally, evaluate_experiments is called first, followed by efficient_designs.

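Keyword arguments

  • eval_options: a NamedTuple of keyword arguments passed to evaluate_experiments.
  • arrangement_options: a NamedTuple of keyword arguments passed to efficient_designs.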

Example

efficient_designs(
    experiments_costs,
    data_binary;
    eval_options = (; zero_cost_features),
    arrangement_options = (; max_parallel = 2, tradeoff = (0.0, 1)),
)
CEEDesigns.StaticDesigns.evaluate_experiments (Method)
evaluate_experiments(experiments, model, X, y; zero_cost_features=[], evaluate_empty_subset=true, return_full_metrics=false, kwargs...)

Evaluate predictive accuracy over subsets of experiments, and return the metrics. The evaluation is performed by MLJ.evaluate; any additional keyword arguments are passed through to evaluate.

Evaluations are run in parallel.

Arguments

  • experiments: a dictionary containing pairs experiment => (cost =>) features, where features is a subset of column names in X.
  • model: a predictive model whose accuracy will be evaluated.
  • X: a dataframe with features used for prediction.
  • y: the variable that we aim to predict.

Keyword arguments

  • max_cardinality: maximum cardinality of experimental subsets (defaults to the number of experiments).
  • zero_cost_features: additional zero-cost features available for each experimental subset (defaults to an empty list).
  • evaluate_empty_subset: flag indicating whether to evaluate the empty experimental subset. If zero_cost_features is empty, a constant column is added (defaults to true).
  • return_full_metrics: flag indicating whether to return the full MLJ.PerformanceEvaluation metrics; otherwise, only an aggregate "measurement" for the first measure is returned (defaults to false).

Example

evaluate_experiments(
    experiments,
    model,
    data[!, Not("HeartDisease")],
    data[!, "HeartDisease"];
    zero_cost_features,
    measure = LogLoss(),
)
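To make "subsets of experiments" concrete, here is a minimal Julia sketch that enumerates experimental subsets up to max_cardinality and lists the feature columns each subset makes available, including zero_cost_features. subsets_sketch is a hypothetical helper, not the package's code; it assumes that experiments maps each experiment directly to its features (the optional cost is omitted). The empty subset is the one that evaluate_empty_subset refers to.

```julia
# Hypothetical sketch: enumerate experimental subsets and the features each one provides.
# Not the CEEDesigns source; `experiments` maps experiment name => feature column names.
function subsets_sketch(experiments::Dict{String,Vector{String}};
                        max_cardinality = length(experiments),
                        zero_cost_features = String[])
    names = sort(collect(keys(experiments)))
    n = length(names)
    result = Dict{Vector{String},Vector{String}}()
    # Each bitmask in 0:(2^n - 1) encodes one subset of experiments.
    for mask in 0:(2^n - 1)
        subset = [names[i] for i in 1:n if (mask >> (i - 1)) & 1 == 1]
        length(subset) <= max_cardinality || continue
        # Zero-cost features are available to every subset, including the empty one.
        features = vcat(zero_cost_features, (experiments[e] for e in subset)...)
        result[subset] = unique(features)
    end
    return result
end
```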
CEEDesigns.StaticDesigns.evaluate_experiments (Method)
evaluate_experiments(experiments, X; zero_cost_features=[], evaluate_empty_subset=true)

Evaluate discriminative power for subsets of experiments, and return the metrics.

Evaluations are run in parallel.

Arguments

  • experiments: a dictionary containing pairs experiment => (cost =>) features, where features is a subset of column names in X.
  • X: a dataframe containing binary labels, where false indicates that an entity was filtered out by the experiment (and should be removed from the triage).

Keyword arguments

  • zero_cost_features: additional zero-cost features available for each experimental subset (defaults to an empty list).
  • evaluate_empty_subset: flag indicating whether to evaluate the empty experimental subset.

Example

evaluate_experiments(experiments, data_binary; zero_cost_features)
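One way to read the binary labels in X is sketched below in Julia. This is an assumption rather than the package's definition: an entity is taken to survive a subset of experiments only if all of its labels for that subset's features are true, and the surviving fraction is reported. Columns are held in a plain Dict instead of a DataFrame to keep the sketch self-contained; surviving_fraction is a hypothetical name.

```julia
# Hypothetical sketch; not the CEEDesigns source.
# `X` maps a column name to binary labels, one per entity.
function surviving_fraction(X::Dict{String,Vector{Bool}}, features::Vector{String})
    n = length(first(values(X)))
    keep = trues(n)  # with no experiments, nothing is filtered out
    for col in features
        keep .&= X[col]  # false means the entity was filtered out by this experiment
    end
    return count(keep) / n
end
```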
CEEDesigns.GenerativeDesigns.eox (Constant)

A penalized action that results in a terminal state, e.g., in situations where conducting additional experiments is not possible, but the level of uncertainty remains above an acceptable threshold.

CEEDesigns.GenerativeDesigns.EfficientValueMDP (Type)
EfficientValueMDP(costs; sampler, value, evidence=Evidence(), <keyword arguments>)

Structure that parametrizes the experimental decision-making process. It is used in the object interface of POMDPs.jl.

In this experimental setup, our objective is to maximize the value of the experimental evidence (such as clinical utility), adjusted for experimental costs.

Internally, the reward associated with particular experimental evidence, a total accumulated monetary_cost, and (optionally) a total accumulated execution_time is computed as value(evidence) - costs_tradeoff' * [monetary_cost, execution_time].
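Taking the reward formula at face value, it can be sketched in Julia as follows. reward_sketch and toy_value are hypothetical; the value function and the numbers are made up, and costs is assumed to be a (monetary_cost, execution_time) tuple.

```julia
# Hypothetical sketch of the reward formula; not the CEEDesigns source.
# costs = (monetary_cost, execution_time); costs_tradeoff weights the two components.
function reward_sketch(value, evidence, costs, costs_tradeoff)
    return value(evidence) - sum(costs_tradeoff .* costs)
end

# Made-up value function: each acquired readout is worth 10 units.
toy_value(evidence) = 10.0 * length(evidence)
```

For example, with two readouts, costs (4.0, 2.0), and costs_tradeoff (1.0, 0.5), the sketch yields 20.0 - (4.0 + 1.0) = 15.0.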

Arguments

  • costs: a dictionary containing pairs experiment => cost, where cost can either be a scalar cost (modelled as a monetary cost) or a tuple (monetary cost, execution time).

Keyword Arguments

  • sampler: a function of (evidence, features, rng), in which evidence denotes the current experimental evidence, features represent the set of features we want to sample from, and rng is a random number generator; it returns a dictionary mapping the features to outcomes.
  • value: a function of (evidence); it quantifies the utility of experimental evidence.
  • evidence=Evidence(): initial experimental evidence.
  • max_parallel: maximum number of parallel experiments.
  • discount: discount factor used in the reward computation.
CEEDesigns.GenerativeDesigns.UncertaintyReductionMDP (Type)
UncertaintyReductionMDP(costs; sampler, uncertainty, threshold, evidence=Evidence(), <keyword arguments>)

Structure that parametrizes the experimental decision-making process. It is used in the object interface of POMDPs.jl.

In this experimental setup, our objective is to minimize the expected experimental cost while ensuring the uncertainty remains below a specified threshold.

Internally, a state of the decision process is modeled as a tuple (evidence::Evidence, [total accumulated monetary cost, total accumulated execution time]).

Arguments

  • costs: a dictionary containing pairs experiment => cost, where cost can either be a scalar cost (modelled as a monetary cost) or a tuple (monetary cost, execution time).

Keyword Arguments

  • sampler: a function of (evidence, features, rng), in which evidence denotes the current experimental evidence, features represent the set of features we want to sample from, and rng is a random number generator; it returns a dictionary mapping the features to outcomes.
  • uncertainty: a function of evidence; it returns the measure of variance or uncertainty about the target variable, conditioned on the experimental evidence acquired so far.
  • threshold: a number representing the acceptable level of uncertainty about the target variable.
  • evidence=Evidence(): initial experimental evidence.
  • costs_tradeoff: tradeoff between the monetary cost and execution time of an experimental design, given as a tuple of floats.
  • max_parallel: maximum number of parallel experiments.
  • discount: discount factor used in the reward computation.
  • bigM: penalty incurred when no further experimental action is possible, yet the uncertainty still exceeds the allowable limit.
  • max_experiments: maximum number of experiments that may be conducted.
CEEDesigns.GenerativeDesigns.DistanceBased (Method)
DistanceBased(data; target, uncertainty=Entropy(), similarity=Exponential(), distance=Dict(), prior=ones(nrow(data)))

Compute distances between experimental evidence and historical readouts, and apply a 'similarity' functional to obtain probability mass for each row.

Consider using QuadraticDistance, DiscreteDistance, and SquaredMahalanobisDistance.

Return value

A named tuple with the following fields:

  • sampler: a function of (evidence, features, rng), in which evidence denotes the current experimental evidence, features represent the set of features we want to sample from, and rng is a random number generator; it returns a dictionary mapping the features to outcomes.
  • uncertainty: a function of evidence; it returns the measure of variance or uncertainty about the target variable, conditioned on the experimental evidence acquired so far.
  • weights: a function of evidence; it returns probabilities (posterior) across the rows in data.

Arguments

  • data: a dataframe with historical data.
  • target: target column name or a vector of target column names.

Keyword Arguments

  • uncertainty: a function that takes the subdataframe containing the columns in targets, along with prior, and returns an anonymous function that takes a single argument (a probability vector over observations) and returns an uncertainty measure over targets.
  • similarity: a function that, for each row, takes the distances between row[col] and readout[col], and returns a non-negative probability mass for the row.
  • distance: a dictionary of pairs colname => distance functional, where a distance functional must implement the signature (readout, col; prior). Defaults to QuadraticDistance and DiscreteDistance for Continuous and Multiclass scitypes, respectively.
  • prior: prior across rows, uniform by default.
  • filter_range: a dictionary of pairs colname => (lower bound, upper bound). If the current evidence contains a value for a column listed here, only historical observations within the defined range for that column are considered.
  • importance_weights: a dictionary of pairs colname => weights, where weights is either a vector of weights or a function x -> weight that is applied to each element of the column to obtain the vector of weights. If the current evidence contains data for a given column, the product of the corresponding weights is used to adjust the similarity vector.

Example

(; sampler, uncertainty, weights) = DistanceBased(
    data;
    target = "HeartDisease",
    uncertainty = Entropy(),
    similarity = Exponential(; λ = 5),
);
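The following Julia sketch illustrates how per-column distances can be turned into posterior weights over the rows of data. It is a hypothetical simplification, not the CEEDesigns source: distances are assumed to be precomputed for each evidence column, the similarity is assumed to be exp(-λ * total distance) (one plausible reading of Exponential(; λ)), filter_range and importance_weights are omitted, and the result is the prior multiplied by the similarity and normalized to sum to one.

```julia
# Hypothetical sketch; not the CEEDesigns source.
# `distances` maps an evidence column to a vector of distances, one per historical row.
function posterior_weights_sketch(distances::Dict{String,Vector{Float64}},
                                  prior::Vector{Float64}; λ = 1.0)
    total = zeros(length(prior))
    for d in values(distances)
        total .+= d  # accumulate distances across the observed columns
    end
    similarity = exp.(-λ .* total)  # assumed exponential similarity
    w = prior .* similarity
    return w ./ sum(w)  # normalize to a probability vector over rows
end
```

With no evidence, every distance is zero, so the weights reduce to the normalized prior.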
CEEDesigns.GenerativeDesigns.Entropy (Method)
Entropy()

Return a function of (labels; prior). When this function is called as part of an instantiation procedure in DistanceBased, it returns an internal function of weights that computes the fraction of information entropy, relative to the entropy calculated with respect to a specified prior.
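A minimal Julia sketch of such a relative entropy follows. It is hypothetical (entropy_sketch and weighted_entropy are not package functions) and assumes that the entropy is the Shannon entropy of the label distribution induced by row weights, with the ratio taken against the same quantity under the prior. The prior is assumed not to concentrate on a single label, since that would make the denominator zero.

```julia
# Hypothetical sketch; not the CEEDesigns source.
# Shannon entropy of the label distribution induced by row weights `w`.
function weighted_entropy(labels::AbstractVector, w::AbstractVector{<:Real})
    mass = Dict{eltype(labels),Float64}()
    for (l, wi) in zip(labels, w)
        mass[l] = get(mass, l, 0.0) + wi
    end
    total = sum(values(mass))
    return -sum(p / total * log(p / total) for p in values(mass) if p > 0; init = 0.0)
end

# Mirrors the documented shape: (labels; prior) returns a function of weights.
entropy_sketch(labels; prior) = w -> weighted_entropy(labels, w) / weighted_entropy(labels, prior)
```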

CEEDesigns.GenerativeDesigns.QuadraticDistance (Method)
QuadraticDistance(; λ=1, standardize=true)

Return an anonymous function (x, col; prior) -> λ * (x .- col).^2 / σ. If standardize is true, σ is the variance of col computed with respect to prior; otherwise, σ equals one.
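The formula can be mirrored in a short Julia sketch. quadratic_distance_sketch is a hypothetical stand-in, not the package's code, and it assumes that "variance with respect to prior" means the prior-weighted population variance (no bias correction).

```julia
# Hypothetical sketch of λ * (x .- col).^2 / σ; not the CEEDesigns source.
function quadratic_distance_sketch(; λ = 1, standardize = true)
    function distance(x, col; prior = ones(length(col)))
        if standardize
            p = prior ./ sum(prior)          # normalize the prior to probabilities
            μ = sum(p .* col)                # prior-weighted mean of the column
            σ = sum(p .* (col .- μ) .^ 2)    # prior-weighted (population) variance
        else
            σ = 1
        end
        return λ * (x .- col) .^ 2 / σ
    end
    return distance
end
```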

CEEDesigns.GenerativeDesigns.SquaredMahalanobisDistance (Method)
SquaredMahalanobisDistance(; diagonal=0)

Return a function that computes the squared Mahalanobis distance between each row of data and the evidence. For a singular covariance matrix, consider adding a constant to the diagonal entries via the diagonal keyword.

To accommodate missing values, we have implemented an approach described in https://www.jstor.org/stable/3559861, on page 285.

Arguments

  • diagonal: A scalar to be added to the diagonal entries of the covariance matrix.

Returns

It returns a high-level function of (data, targets, prior). When called, that function will return an internal function compute_distances that takes an Evidence and computes the squared Mahalanobis distance based on the input data and the evidence.
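For complete data, the squared Mahalanobis distance d_i = (x_i - e)' Σ⁻¹ (x_i - e) can be sketched in Julia as below. This is a hypothetical illustration, not the package's compute_distances: it omits the missing-value handling cited above, assumes X already holds only the columns present in the evidence, and uses a prior-weighted covariance regularized by diagonal.

```julia
using LinearAlgebra

# Hypothetical sketch for complete data only; not the CEEDesigns source.
# `X` holds historical rows restricted to the evidence columns; `e` is the evidence vector.
function squared_mahalanobis_sketch(X::Matrix{Float64}, e::Vector{Float64},
                                    prior::Vector{Float64}; diagonal = 0)
    p = prior ./ sum(prior)
    μ = vec(sum(p .* X; dims = 1))          # prior-weighted column means
    C = X .- μ'                             # centered data
    Σ = C' * (p .* C) + diagonal * I        # prior-weighted covariance, regularized
    D = X .- e'                             # row-wise differences to the evidence
    return vec(sum((D / Σ) .* D; dims = 2)) # one quadratic form per row
end
```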

CEEDesigns.GenerativeDesigns.Variance (Method)
Variance()

Return a function of (data; prior). When this function is called as part of an instantiation procedure in DistanceBased, it returns an internal function of weights that computes the fraction of variance in the data, relative to the variance calculated with respect to a specified prior.
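The variance fraction can be sketched in Julia as below, for a single numeric target. variance_sketch and weighted_variance are hypothetical names, and the prior-weighted population variance is an assumption about how the variance is computed.

```julia
# Hypothetical sketch; not the CEEDesigns source.
# Prior-weighted (population) variance of `y` under row weights `w`.
function weighted_variance(y::AbstractVector{<:Real}, w::AbstractVector{<:Real})
    p = w ./ sum(w)
    μ = sum(p .* y)
    return sum(p .* (y .- μ) .^ 2)
end

# Mirrors the documented shape: (data; prior) returns a function of weights.
variance_sketch(y; prior) = w -> weighted_variance(y, w) / weighted_variance(y, prior)
```

For y = [0.0, 2.0, 4.0, 6.0] and a uniform prior, the prior variance is 5.0. Putting all weight on the first two rows leaves a variance of 1.0, so the fraction is 0.2.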

CEEDesigns.GenerativeDesigns.efficient_design (Method)
efficient_design(costs; sampler, uncertainty, threshold, evidence=Evidence(), <keyword arguments>)

In the uncertainty reduction setup, minimize the expected experimental cost while ensuring the uncertainty remains below a specified threshold.

Arguments

  • costs: a dictionary containing pairs experiment => cost, where cost can either be a scalar cost (modelled as a monetary cost) or a tuple (monetary cost, execution time).

Keyword Arguments

  • sampler: a function of (evidence, features, rng), in which evidence denotes the current experimental evidence, features represent the set of features we want to sample from, and rng is a random number generator; it returns a dictionary mapping the features to outcomes.
  • uncertainty: a function of evidence; it returns the measure of variance or uncertainty about the target variable, conditioned on the experimental evidence acquired so far.
  • threshold: uncertainty threshold.
  • evidence=Evidence(): initial experimental evidence.
  • solver=default_solver: a POMDPs.jl compatible solver used to solve the decision process. The default solver is DPWSolver.
  • repetitions=0: number of runoffs used to estimate the expected experimental cost.
  • mdp_options: a NamedTuple of additional keyword arguments that will be passed to the constructor of UncertaintyReductionMDP.
  • realized_uncertainty=false: whenever the initial state uncertainty is below the selected threshold, return the actual uncertainty of this state.

Example

(; sampler, uncertainty, weights) = DistanceBased(
    data;
    target = "HeartDisease",
    uncertainty = Entropy(),
    similarity = Exponential(; λ = 5),
);
# initialize evidence
evidence = Evidence("Age" => 35, "Sex" => "M")
# set up solver (or use default)
solver = GenerativeDesigns.DPWSolver(; n_iterations = 60_000, tree_in_info = true)
design = efficient_design(
    costs;
    sampler,
    uncertainty,
    threshold = 0.6,
    evidence,
    solver,            # planner
    mdp_options = (; max_parallel = 1),
    repetitions = 5,
)
CEEDesigns.GenerativeDesigns.efficient_designs (Method)
efficient_designs(costs; sampler, uncertainty, thresholds, evidence=Evidence(), <keyword arguments>)

In the uncertainty reduction setup, minimize the expected experimental resource spend over a range of uncertainty thresholds, and return the set of Pareto-efficient designs in the dimensions of cost and uncertainty threshold.

Internally, an instance of the UncertaintyReductionMDP structure is created for every selected uncertainty threshold and the corresponding runoffs are simulated.

Arguments

  • costs: a dictionary containing pairs experiment => cost, where cost can either be a scalar cost (modelled as a monetary cost) or a tuple (monetary cost, execution time).

Keyword Arguments

  • sampler: a function of (evidence, features, rng), in which evidence denotes the current experimental evidence, features represent the set of features we want to sample from, and rng is a random number generator; it returns a dictionary mapping the features to outcomes.
  • uncertainty: a function of evidence; it returns the measure of variance or uncertainty about the target variable, conditioned on the experimental evidence acquired so far.
  • thresholds: number of thresholds to consider uniformly in the range between 0 and 1, inclusive.
  • evidence=Evidence(): initial experimental evidence.
  • solver=default_solver: a POMDPs.jl compatible solver used to solve the decision process. The default solver is DPWSolver.
  • repetitions=0: number of runoffs used to estimate the expected experimental cost.
  • mdp_options: a NamedTuple of additional keyword arguments that will be passed to the constructor of UncertaintyReductionMDP.
  • realized_uncertainty=false: whenever the initial state uncertainty is below the selected threshold, return the actual uncertainty of this state.

Example

(; sampler, uncertainty, weights) = DistanceBased(
    data;
    target = "HeartDisease",
    uncertainty = Entropy(),
    similarity = Exponential(; λ = 5),
);
# initialize evidence
evidence = Evidence("Age" => 35, "Sex" => "M")
# set up solver (or use default)
solver = GenerativeDesigns.DPWSolver(; n_iterations = 60_000, tree_in_info = true)
designs = efficient_designs(
    costs;
    sampler,
    uncertainty,
    thresholds = 6,
    evidence,
    solver,            # planner
    mdp_options = (; max_parallel = 1),
    repetitions = 5,
)
CEEDesigns.GenerativeDesigns.efficient_value (Method)
efficient_value(costs; sampler, value, evidence=Evidence(), <keyword arguments>)

Estimate the maximum value of experimental evidence (such as clinical utility), adjusted for experimental costs.

Internally, an instance of the EfficientValueMDP structure is created, and a summary over the number of runoffs given by repetitions is returned.

Arguments

  • costs: a dictionary containing pairs experiment => cost, where cost can either be a scalar cost (modelled as a monetary cost) or a tuple (monetary cost, execution time).

Keyword Arguments

  • sampler: a function of (evidence, features, rng), in which evidence denotes the current experimental evidence, features represent the set of features we want to sample from, and rng is a random number generator; it returns a dictionary mapping the features to outcomes.
  • value: a function of (evidence, (monetary costs, execution time)); it quantifies the utility of experimental evidence.
  • evidence=Evidence(): initial experimental evidence.
  • solver=default_solver: a POMDPs.jl compatible solver used to solve the decision process. The default solver is DPWSolver.
  • repetitions=0: number of runoffs used to estimate the expected experimental cost.
  • mdp_options: a NamedTuple of additional keyword arguments that will be passed to the constructor of EfficientValueMDP.

Example

(; sampler, uncertainty, weights) = DistanceBased(
    data;
    target = "HeartDisease",
    uncertainty = Entropy(),
    similarity = Exponential(; λ = 5),
);
value = (evidence, costs) -> (1 - uncertainty(evidence) + 0.005 * sum(costs));
# initialize evidence
evidence = Evidence("Age" => 35, "Sex" => "M")
# set up solver (or use default)
solver =
    GenerativeDesigns.DPWSolver(; n_iterations = 10_000, depth = 3, tree_in_info = true)
design = efficient_value(
    experiments;
    sampler,
    value,
    evidence,
    solver,            # planner
    mdp_options = (; max_parallel = 1),
    repetitions = 5,
)