`CEEDesigns.front` — Function

```
front(v)
front(f, v; atol1=0, atol2=0)
```

Construct a Pareto front of `v`. Elements of `v` will be masked by `f` in the computation, i.e., `f` is applied to each element to extract its objective coordinates.

The first and second (objective) coordinates have to differ by at least `atol1` and `atol2`, respectively, relative to the latest point on the front.

**Examples**

```
v = [(1,2), (2,3), (2,1)]
front(v)
# output
[(1, 2), (2, 1)]
```

```
v = [(1, (1, 2)), (2, (2, 3)), (3, (2, 1))]
front(x -> x[2], v)
# output
[(1, (1, 2)), (3, (2, 1))]
```

```
v = [(1, 2), (2, 1.99), (3, 1)]
front(v; atol2 = 0.2)
# output
[(1, 2), (3, 1)]
```
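The selection rule described above can be sketched in plain Julia. This is a minimal illustration, not the CEEDesigns implementation: the name `pareto_front_sketch` is hypothetical, and the exact comparison rule (strict improvement in the second coordinate, with both coordinates moving by at least the tolerances) is an assumption chosen to reproduce the three examples above.

```julia
# Hypothetical helper; a sketch of one way to build a two-objective Pareto front.
function pareto_front_sketch(f, v; atol1 = 0, atol2 = 0)
    isempty(v) && return collect(v)
    # Sort lexicographically by the two objective coordinates extracted by `f`.
    sorted = sort(collect(v); by = x -> (f(x)[1], f(x)[2]))
    kept = sorted[1:1]
    for x in sorted[2:end]
        a, b = f(x)
        a_last, b_last = f(kept[end])
        # Keep `x` only if it strictly improves the second coordinate and both
        # coordinates differ from the latest kept point by at least the tolerances.
        if b < b_last && a - a_last >= atol1 && b_last - b >= atol2
            push!(kept, x)
        end
    end
    return kept
end

# Convenience method: elements of `v` are the objective coordinates themselves.
pareto_front_sketch(v; kwargs...) = pareto_front_sketch(identity, v; kwargs...)
```

Calling `pareto_front_sketch([(1, 2), (2, 1.99), (3, 1)]; atol2 = 0.2)` drops the middle point because its second coordinate improves by only 0.01.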

`CEEDesigns.make_labels` — Method

`make_labels(designs)`

Make labels used for plotting experimental designs.

`CEEDesigns.plot_evals` — Method

`plot_evals(evals; f, ylabel="information measure")`

Create a stick plot that visualizes the performance measures evaluated for subsets of experiments.

The argument `evals` should be the output of `evaluate_experiments`, and the keyword argument `f` (if provided) is a function that takes `evals` as input and returns a list of its keys in the order in which they are to be plotted on the x-axis. By default, the keys are sorted by length.
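As an illustration of what an ordering function for `f` might look like, here is a small sketch. The dictionary below is a hypothetical stand-in for the output of `evaluate_experiments` (keys are assumed to be sets of experiment names, values are assumed to be scalar measures), and the helper names are not part of the package.

```julia
# Hypothetical stand-in for `evals`: experimental subset => performance measure.
evals = Dict(
    Set(["ECG", "BloodPressure"]) => 0.42,
    Set{String}() => 0.69,
    Set(["ECG"]) => 0.55,
)

# Order keys by subset size, mirroring the documented default.
by_length(evals) = sort(collect(keys(evals)); by = length)

# Alternative ordering: by the measure itself, lowest value first.
by_measure(evals) = sort(collect(keys(evals)); by = k -> evals[k])
```

Under these assumptions, a custom ordering would be passed as `plot_evals(evals; f = by_measure)`.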

`CEEDesigns.plot_front` — Method

`plot_front(designs; grad=cgrad(:Paired_12), xlabel, ylabel, labels=get_labels(designs))`

Render a scatter plot of efficient designs, as returned from `efficient_designs`.

You may optionally specify a color gradient from which to draw the colors.

**Examples**

```
designs = efficient_designs(experiment, state)
plot_front(designs)
plot_front(designs; grad = cgrad(:Paired_12))
```

`CEEDesigns.StaticDesigns.ArrangementMDP` — Type

`ArrangementMDP(; experiments, experimental_costs, evals, max_parallel=1, tradeoff=(1, 0))`

Structure to parametrize an MDP that is used to approximate the optimal experimental arrangement.

`CEEDesigns.StaticDesigns.efficient_designs` — Method

`efficient_designs(experiments, evals; max_parallel=1, tradeoff=(1, 0), mdp_kwargs=default_mdp_kwargs)`

Return the set of Pareto-efficient experimental designs, given experimental costs, predictive accuracy (loss), and estimated filtration rates for experimental subsets.

**Arguments**

- `experiments`: a dictionary containing pairs `experiment => cost (=> features)`, where `cost` can either be a scalar cost or a tuple `(monetary cost, execution time)`.
- `evals`: a dictionary containing pairs `experimental subset => (; predictive loss, filtration)`.

**Keyword arguments**

- `max_parallel`: to estimate the execution time of the design, define the number of experiments that can run concurrently. The experiments are arranged in descending order of their individual durations and then iteratively allocated into consecutive groups that represent parallel experiments.
- `tradeoff`: determines how to project the monetary cost and execution time of an experimental design onto a single combined cost.
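The arrangement and projection described by these two keyword arguments can be sketched as follows. This is one plausible reading, not the package's code: both helper names are hypothetical, each parallel group is assumed to take as long as its slowest experiment, and the combined cost is assumed to be a weighted sum.

```julia
# Hypothetical helper: estimated execution time when at most `max_parallel`
# experiments run concurrently.
function execution_time_sketch(durations; max_parallel = 1)
    # Arrange experiments in descending order of duration.
    sorted = sort(collect(durations); rev = true)
    # Allocate them into consecutive groups of size `max_parallel`.
    groups = Iterators.partition(sorted, max_parallel)
    # Assumption: a group takes as long as its slowest member.
    return sum(maximum, groups; init = zero(eltype(sorted)))
end

# Hypothetical helper: project (monetary cost, execution time) onto a single cost.
combined_cost_sketch(monetary_cost, time; tradeoff = (1, 0)) =
    tradeoff[1] * monetary_cost + tradeoff[2] * time
```

With durations `[3.0, 1.0, 2.0, 5.0]` and `max_parallel = 2`, the groups are `[5.0, 3.0]` and `[2.0, 1.0]`, so the estimated time is `7.0`; a `tradeoff` of `(0.0, 1)` then makes the combined cost equal to the execution time alone.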

**Example**

```
efficient_designs(
    experiments_costs,
    model,
    data[!, Not("HeartDisease")],
    data[!, "HeartDisease"];
    eval_options = (; zero_cost_features, measure = LogLoss()),
    arrangement_options = (; max_parallel = 2, tradeoff = (0.0, 1)),
)
```

`CEEDesigns.StaticDesigns.efficient_designs` — Method

`efficient_designs(experiments, args...; eval_options, arrangement_options)`

Evaluate predictive power for subsets of experiments, and return the set of Pareto-efficient experimental designs.

Internally, `evaluate_experiments` is called first, followed by `efficient_designs`.

**Keyword arguments**

- `eval_options`: keyword arguments to `evaluate_experiments`.
- `arrangement_options`: keyword arguments to `efficient_designs`.

**Example**

```
efficient_designs(
    experiments_costs,
    data_binary;
    eval_options = (; zero_cost_features),
    arrangement_options = (; max_parallel = 2, tradeoff = (0.0, 1)),
)
```

`CEEDesigns.StaticDesigns.evaluate_experiments` — Method

`evaluate_experiments(experiments, model, X, y; zero_cost_features=[], evaluate_empty_subset=true, return_full_metrics=false, kwargs...)`

Evaluate predictive accuracy over subsets of experiments, and return the metrics. The evaluation is facilitated by `MLJ.evaluate`; additional keyword arguments to this function will be passed to `evaluate`.

Evaluations are run in parallel.

**Arguments**

- `experiments`: a dictionary containing pairs `experiment => (cost =>) features`, where `features` is a subset of column names in `X`.
- `model`: a predictive model whose accuracy will be evaluated.
- `X`: a dataframe with features used for prediction.
- `y`: the variable that we aim to predict.

**Keyword arguments**

- `max_cardinality`: maximum cardinality of experimental subsets (defaults to the number of experiments).
- `zero_cost_features`: additional zero-cost features available for each experimental subset (defaults to an empty list).
- `evaluate_empty_subset`: flag indicating whether to evaluate the empty experimental subset. A constant column will be added if `zero_cost_features` is empty (defaults to true).
- `return_full_metrics`: flag indicating whether to return full `MLJ.PerformanceEvaluation` metrics. Otherwise, return an aggregate "measurement" for the first measure (defaults to false).
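To make the roles of `max_cardinality`, `zero_cost_features`, and `evaluate_empty_subset` concrete, the sketch below enumerates experimental subsets and collects the features available to each. Both helpers are hypothetical and do not reflect how the package enumerates subsets internally; `include_empty` plays the role assumed for `evaluate_empty_subset`.

```julia
# Hypothetical helper: all subsets of `names` with at most `max_cardinality` elements.
function experimental_subsets_sketch(names; max_cardinality = length(names), include_empty = true)
    names = collect(names)
    subsets = Vector{Vector{eltype(names)}}()
    # Each bitmask selects one subset of the experiments.
    for mask in 0:(2^length(names) - 1)
        subset = eltype(names)[names[i] for i in eachindex(names) if ((mask >> (i - 1)) & 1) == 1]
        isempty(subset) && !include_empty && continue
        length(subset) <= max_cardinality && push!(subsets, subset)
    end
    return sort(subsets; by = length)
end

# Hypothetical helper: features of a subset are the union of its experiments'
# features together with the zero-cost features.
subset_features_sketch(experiments, subset; zero_cost_features = String[]) =
    unique(vcat(zero_cost_features, (experiments[e] for e in subset)...))
```

For three experiments and `max_cardinality = 2`, this yields seven subsets: the empty one, three singletons, and three pairs.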

**Example**

```
evaluate_experiments(
    experiments,
    model,
    data[!, Not("HeartDisease")],
    data[!, "HeartDisease"];
    zero_cost_features,
    measure = LogLoss(),
)
```

`CEEDesigns.StaticDesigns.evaluate_experiments` — Method

`evaluate_experiments(experiments, X; zero_cost_features=[], evaluate_empty_subset=true)`

Evaluate discriminative power for subsets of experiments, and return the metrics.

Evaluations are run in parallel.

**Arguments**

- `experiments`: a dictionary containing pairs `experiment => (cost =>) features`, where `features` is a subset of column names in `X`.
- `X`: a dataframe containing binary labels, where `false` indicates that an entity was filtered out by the experiment (and should be removed from the triage).

**Keyword arguments**

- `zero_cost_features`: additional zero-cost features available for each experimental subset (defaults to an empty list).
- `evaluate_empty_subset`: flag indicating whether to evaluate the empty experimental subset.
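The binary-label convention can be illustrated with a small sketch. It assumes one plausible notion of discriminative power, namely the fraction of entities that pass every feature in a subset; the metric actually returned by `evaluate_experiments` may be defined differently. The data and the helper name are hypothetical.

```julia
# Hypothetical binary readouts: `false` means the entity was filtered out.
X = Dict(
    "assay_1" => [true, false, true, true],
    "assay_2" => [true, true, false, true],
)

# Hypothetical helper: fraction of entities that pass every feature in `features`.
function surviving_fraction_sketch(X, features)
    isempty(features) && return 1.0
    n = length(first(values(X)))
    passed = [all(X[f][i] for f in features) for i in 1:n]
    return count(passed) / n
end
```

Here a single assay keeps three of four entities, while both assays together keep only two.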

**Example**

`evaluate_experiments(experiments, data_binary; zero_cost_features)`

`CEEDesigns.GenerativeDesigns.eox` — Constant

A penalized action that results in a terminal state, e.g., in situations where conducting additional experiments is not possible, but the level of uncertainty remains above an acceptable threshold.

`CEEDesigns.GenerativeDesigns.ActionCost` — Type

Represent action as a named tuple `(; costs=(monetary cost, time), features)`.

`CEEDesigns.GenerativeDesigns.EfficientValueMDP` — Type

`EfficientValueMDP(costs; sampler, value, evidence=Evidence(), <keyword arguments>)`

Structure that parametrizes the experimental decision-making process. It is used in the object interface of POMDPs.

In this experimental setup, our objective is to maximize the value of the experimental evidence (such as clinical utility), adjusted for experimental costs.

Internally, the reward associated with a particular experimental `evidence` and with total accumulated `monetary_cost` and (optionally) `execution_time` is computed as `value(evidence) - costs_tradeoff' * [monetary_cost, execution_time]`.
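The reward formula above can be restated as a short, self-contained sketch. The helper `reward_sketch` is hypothetical, the evidence is represented by a plain dictionary rather than an `Evidence` object, and the constant `value` function is only a placeholder.

```julia
using LinearAlgebra: dot

# Hypothetical helper mirroring the documented formula:
# value(evidence) - costs_tradeoff' * [monetary_cost, execution_time]
reward_sketch(value, evidence, monetary_cost, execution_time; costs_tradeoff = (1.0, 0.0)) =
    value(evidence) - dot(collect(costs_tradeoff), [monetary_cost, execution_time])

# Placeholder value function and evidence, for illustration only.
value = evidence -> 0.8
evidence = Dict("Age" => 35)
reward = reward_sketch(value, evidence, 10.0, 2.0; costs_tradeoff = (0.01, 0.05))
```

In this example the cost penalty is `0.01 * 10.0 + 0.05 * 2.0`, which is subtracted from the value of the evidence.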

**Arguments**

- `costs`: a dictionary containing pairs `experiment => cost`, where `cost` can either be a scalar cost (modelled as a monetary cost) or a tuple `(monetary cost, execution time)`.

**Keyword Arguments**

- `sampler`: a function of `(evidence, features, rng)`, in which `evidence` denotes the current experimental evidence, `features` represent the set of features we want to sample from, and `rng` is a random number generator; it returns a dictionary mapping the features to outcomes.
- `value`: a function of `(evidence)`; it quantifies the utility of experimental evidence.
- `evidence=Evidence()`: initial experimental evidence.
- `max_parallel`: maximum number of parallel experiments.
- `discount`: the discounting factor used in reward computation.

`CEEDesigns.GenerativeDesigns.Evidence` — Type

Represent experimental evidence as an immutable dictionary.

`CEEDesigns.GenerativeDesigns.State` — Type

Represent experimental state as a tuple of experimental costs and evidence.

`CEEDesigns.GenerativeDesigns.UncertaintyReductionMDP` — Type

`UncertaintyReductionMDP(costs; sampler, uncertainty, threshold, evidence=Evidence(), <keyword arguments>)`

Structure that parametrizes the experimental decision-making process. It is used in the object interface of POMDPs.

In this experimental setup, our objective is to minimize the expected experimental cost while ensuring the uncertainty remains below a specified threshold.

Internally, a state of the decision process is modeled as a tuple `(evidence::Evidence, [total accumulated monetary cost, total accumulated execution time])`.

**Arguments**

- `costs`: a dictionary containing pairs `experiment => cost`, where `cost` can either be a scalar cost (modelled as a monetary cost) or a tuple `(monetary cost, execution time)`.

**Keyword Arguments**

- `sampler`: a function of `(evidence, features, rng)`, in which `evidence` denotes the current experimental evidence, `features` represent the set of features we want to sample from, and `rng` is a random number generator; it returns a dictionary mapping the features to outcomes.
- `uncertainty`: a function of `evidence`; it returns the measure of variance or uncertainty about the target variable, conditioned on the experimental evidence acquired so far.
- `threshold`: a number representing the acceptable level of uncertainty about the target variable.
- `evidence=Evidence()`: initial experimental evidence.
- `costs_tradeoff`: tradeoff between monetary cost and execution time of an experimental design, given as a tuple of floats.
- `max_parallel`: maximum number of parallel experiments.
- `discount`: the discounting factor used in reward computation.
- `bigM`: the penalty incurred when no further experimental action is available, yet the uncertainty still exceeds the allowable limit.
- `max_experiments`: the maximum number of experiments that may be conducted.

`CEEDesigns.GenerativeDesigns.DiscreteDistance` — Method

`DiscreteDistance(; λ=1)`

Return an anonymous function `(x, col) -> λ * (x .== col)`.

`CEEDesigns.GenerativeDesigns.DistanceBased` — Method

`DistanceBased(data; target, uncertainty=Entropy(), similarity=Exponential(), distance=Dict(), prior=ones(nrow(data)))`

Compute distances between experimental evidence and historical readouts, and apply a 'similarity' functional to obtain probability mass for each row.

Consider using `QuadraticDistance`, `DiscreteDistance`, and `SquaredMahalanobisDistance`.
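The distance-then-similarity pipeline can be sketched without the package. Everything below is an assumption made for illustration: the data is a named tuple of columns rather than a dataframe, the per-column distance functions are hypothetical choices, the evidence is a plain dictionary, and the prior is applied as a simple multiplicative weight.

```julia
# Hypothetical historical data as named columns (a stand-in for a dataframe).
data = (Age = [30.0, 40.0, 50.0], Sex = ["M", "F", "M"])

# Hypothetical per-column distance functions, chosen for illustration.
squared_diff(x, col) = (x .- col) .^ 2
mismatch(x, col) = Float64.(x .!= col)
distances = Dict(:Age => squared_diff, :Sex => mismatch)

# Hypothetical helper: normalized probability mass for each historical row.
function posterior_weights_sketch(data, evidence, distances; λ = 1.0, prior = ones(length(first(data))))
    total = zeros(length(prior))
    # Accumulate distances over every column observed in the evidence.
    for (colname, readout) in evidence
        total .+= distances[colname](readout, getproperty(data, colname))
    end
    # An exponential similarity turns summed distances into non-negative mass.
    mass = prior .* exp.(-λ .* total)
    return mass ./ sum(mass)
end

evidence = Dict(:Age => 40.0, :Sex => "F")
w = posterior_weights_sketch(data, evidence, distances; λ = 0.01)
```

The second row matches the evidence exactly, so it receives the largest weight; the other two rows are equally distant and receive equal weights.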

**Return value**

A named tuple with the following fields:

- `sampler`: a function of `(evidence, features, rng)`, in which `evidence` denotes the current experimental evidence, `features` represent the set of features we want to sample from, and `rng` is a random number generator; it returns a dictionary mapping the features to outcomes.
- `uncertainty`: a function of `evidence`; it returns the measure of variance or uncertainty about the target variable, conditioned on the experimental evidence acquired so far.
- `weights`: a function of `evidence`; it returns (posterior) probabilities across the rows in `data`.

**Arguments**

- `data`: a dataframe with historical data.
- `target`: target column name or a vector of target column names.

**Keyword Arguments**

- `uncertainty`: a function that takes the subdataframe containing the target columns along with the prior, and returns an anonymous function that takes a single argument (a probability vector over observations) and returns an uncertainty measure over the targets.
- `similarity`: a function that, for each row, takes distances between `row[col]` and `readout[col]`, and returns a non-negative probability mass for the row.
- `distance`: a dictionary of pairs `colname => distance functional`, where a distance functional must implement the signature `(readout, col; prior)`. Defaults to `QuadraticDistance` and `DiscreteDistance` for `Continuous` and `Multiclass` scitypes, respectively.
- `prior`: prior across rows, uniform by default.
- `filter_range`: a dictionary of pairs `colname => (lower bound, upper bound)`. If the current state contains data for a column specified in this dictionary, only historical observations within the defined range for that column are considered.
- `importance_weights`: a dictionary mapping `colname` to either `weights` or a function `x -> weight`, which will be applied to each element of the column to obtain the vector of weights. If data for a given column is available in the current state, the product of the corresponding weights is used to adjust the similarity vector.

**Example**

```
(; sampler, uncertainty, weights) = DistanceBased(
    data;
    target = "HeartDisease",
    uncertainty = Entropy(),
    similarity = Exponential(; λ = 5),
);
```

`CEEDesigns.GenerativeDesigns.Entropy` — Method

`Entropy()`

Return a function of `(labels; prior)`. When this function is called as part of an instantiation procedure in `DistanceBased`, it returns an internal function of `weights` that computes the fraction of information entropy, relative to the entropy calculated with respect to a specified `prior`.
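The "fraction of information entropy" can be illustrated with a short sketch. It assumes Shannon entropy of the label distribution induced by the row weights, divided by the same quantity under the prior; the helper names are hypothetical, and the sketch does not guard against a prior with zero entropy.

```julia
# Hypothetical helper: Shannon entropy of the label distribution induced by row weights.
function weighted_entropy_sketch(labels, weights)
    total = sum(weights)
    h = 0.0
    for label in unique(labels)
        p = sum(weights[labels .== label]) / total
        # Labels with zero mass contribute nothing to the entropy.
        p > 0 && (h -= p * log(p))
    end
    return h
end

# Hypothetical helper: entropy under `weights` as a fraction of the entropy under `prior`.
entropy_fraction_sketch(labels, weights; prior = ones(length(labels))) =
    weighted_entropy_sketch(labels, weights) / weighted_entropy_sketch(labels, prior)

labels = ["yes", "no", "yes", "no"]
```

Weights equal to the prior give a fraction of one, while weights concentrated on a single label give zero.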

`CEEDesigns.GenerativeDesigns.Exponential` — Method

`Exponential(; λ=1)`

Return an anonymous function `x -> exp(-λ * sum(x; init=0))`.
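A stand-alone restatement of this closure shows how distances map to probability mass. The name `exponential_sketch` is hypothetical; the body is the expression given above.

```julia
# Hypothetical re-statement of the documented closure.
exponential_sketch(; λ = 1) = x -> exp(-λ * sum(x; init = 0))

similarity = exponential_sketch(; λ = 5)
mass_close = similarity([0.0, 0.0])  # zero distances give the maximal mass
mass_far = similarity([0.1, 0.3])    # larger distances give a smaller mass
```

A larger `λ` makes the mass decay faster with distance, so the resulting weights concentrate on the closest historical rows.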

`CEEDesigns.GenerativeDesigns.QuadraticDistance` — Method

`QuadraticDistance(; λ=1, standardize=true)`

Return an anonymous function `(x, col; prior) -> λ * (x .- col).^2 / σ`. If `standardize` is set to `true`, `σ` represents the variance of `col` calculated with respect to `prior`; otherwise, `σ` equals one.
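The standardization step can be sketched as follows. The weighted variance estimator used here (weights normalized to sum to one, no bias correction) is an assumption, and both helper names are hypothetical.

```julia
# Hypothetical helper: variance of `col` weighted by `prior` (assumed estimator).
function weighted_variance_sketch(col, prior)
    w = prior ./ sum(prior)
    μ = sum(w .* col)
    return sum(w .* (col .- μ) .^ 2)
end

# Hypothetical re-statement of the documented closure.
function quadratic_distance_sketch(; λ = 1, standardize = true)
    return function (x, col; prior = ones(length(col)))
        σ = standardize ? weighted_variance_sketch(col, prior) : 1
        return λ * (x .- col) .^ 2 / σ
    end
end

raw = quadratic_distance_sketch(; standardize = false)(2.0, [1.0, 2.0, 4.0])
scaled = quadratic_distance_sketch()(1.0, [0.0, 2.0])
```

Standardizing puts columns measured on different scales on a comparable footing before their distances are summed.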

`CEEDesigns.GenerativeDesigns.SquaredMahalanobisDistance` — Method

`SquaredMahalanobisDistance(; diagonal=0)`

Return a function that computes the squared Mahalanobis distance between each row of `data` and the evidence. For a singular covariance matrix, consider adding entries to the matrix's diagonal via the `diagonal` keyword.

To accommodate missing values, we have implemented an approach described in https://www.jstor.org/stable/3559861, on page 285.

**Arguments**

- `diagonal`: a scalar to be added to the diagonal entries of the covariance matrix.

**Returns**

A high-level function of `(data, targets, prior)`. When called, that function returns an internal function `compute_distances` that takes an `Evidence` and computes the squared Mahalanobis distance based on the input data and the evidence.
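For orientation, the sketch below computes squared Mahalanobis distances for a fully observed readout, including the `diagonal` regularization. It is a simplified illustration with a hypothetical helper name: it works on a plain matrix, ignores the prior, and does not implement the missing-value handling referenced above.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper: squared Mahalanobis distance between each row of `X`
# and a fully observed readout `x`.
function squared_mahalanobis_sketch(X::AbstractMatrix, x::AbstractVector; diagonal = 0)
    # Regularize the covariance matrix by adding `diagonal` to its diagonal entries.
    Σ = cov(X) + diagonal * I
    Σinv = inv(Σ)
    return [dot(row .- x, Σinv, row .- x) for row in eachrow(X)]
end

X = [1.0 2.0; 2.0 1.0; 3.0 5.0; 4.0 3.0]
d = squared_mahalanobis_sketch(X, X[1, :]; diagonal = 1e-6)
```

The first row coincides with the readout, so its distance is zero; all other distances are positive.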

`CEEDesigns.GenerativeDesigns.Variance` — Method

`Variance()`

Return a function of `(data; prior)`. When this function is called as part of an instantiation procedure in `DistanceBased`, it returns an internal function of `weights` that computes the fraction of variance in the data, relative to the variance calculated with respect to a specified `prior`.

`CEEDesigns.GenerativeDesigns.efficient_design` — Method

`efficient_design(costs; sampler, uncertainty, threshold, evidence=Evidence(), <keyword arguments>)`

In the uncertainty reduction setup, minimize the expected experimental cost while ensuring the uncertainty remains below a specified threshold.

**Arguments**

- `costs`: a dictionary containing pairs `experiment => cost`, where `cost` can either be a scalar cost (modelled as a monetary cost) or a tuple `(monetary cost, execution time)`.

**Keyword Arguments**

- `sampler`: a function of `(evidence, features, rng)`, in which `evidence` denotes the current experimental evidence, `features` represent the set of features we want to sample from, and `rng` is a random number generator; it returns a dictionary mapping the features to outcomes.
- `uncertainty`: a function of `evidence`; it returns the measure of variance or uncertainty about the target variable, conditioned on the experimental evidence acquired so far.
- `threshold`: uncertainty threshold.
- `evidence=Evidence()`: initial experimental evidence.
- `solver=default_solver`: a POMDPs.jl compatible solver used to solve the decision process. The default solver is `DPWSolver`.
- `repetitions=0`: number of runoffs used to estimate the expected experimental cost.
- `mdp_options`: a `NamedTuple` of additional keyword arguments that will be passed to the constructor of `UncertaintyReductionMDP`.
- `realized_uncertainty=false`: whenever the initial state uncertainty is below the selected threshold, return the actual uncertainty of this state.

**Example**

```
(; sampler, uncertainty, weights) = DistanceBased(
    data;
    target = "HeartDisease",
    uncertainty = Entropy(),
    similarity = Exponential(; λ = 5),
);
# initialize evidence
evidence = Evidence("Age" => 35, "Sex" => "M")
# set up solver (or use default)
solver = GenerativeDesigns.DPWSolver(; n_iterations = 60_000, tree_in_info = true)
designs = efficient_design(
    costs;
    experiments,
    sampler,
    uncertainty,
    threshold = 0.6,
    evidence,
    solver, # planner
    mdp_options = (; max_parallel = 1),
    repetitions = 5,
)
```

`CEEDesigns.GenerativeDesigns.efficient_designs` — Method

`efficient_designs(costs; sampler, uncertainty, thresholds, evidence=Evidence(), <keyword arguments>)`

In the uncertainty reduction setup, minimize the expected experimental resource spend over a range of uncertainty thresholds, and return the set of Pareto-efficient designs in the dimension of cost and uncertainty threshold.

Internally, an instance of the `UncertaintyReductionMDP` structure is created for every selected uncertainty threshold and the corresponding runoffs are simulated.

**Arguments**

- `costs`: a dictionary containing pairs `experiment => cost`, where `cost` can either be a scalar cost (modelled as a monetary cost) or a tuple `(monetary cost, execution time)`.

**Keyword Arguments**

- `sampler`: a function of `(evidence, features, rng)`, in which `evidence` denotes the current experimental evidence, `features` represent the set of features we want to sample from, and `rng` is a random number generator; it returns a dictionary mapping the features to outcomes.
- `uncertainty`: a function of `evidence`; it returns the measure of variance or uncertainty about the target variable, conditioned on the experimental evidence acquired so far.
- `thresholds`: number of thresholds to consider uniformly in the range between 0 and 1, inclusive.
- `evidence=Evidence()`: initial experimental evidence.
- `solver=default_solver`: a POMDPs.jl compatible solver used to solve the decision process. The default solver is `DPWSolver`.
- `repetitions=0`: number of runoffs used to estimate the expected experimental cost.
- `mdp_options`: a `NamedTuple` of additional keyword arguments that will be passed to the constructor of `UncertaintyReductionMDP`.
- `realized_uncertainty=false`: whenever the initial state uncertainty is below the selected threshold, return the actual uncertainty of this state.
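To illustrate what "uniformly in the range between 0 and 1, inclusive" means for `thresholds`, the sketch below builds such a grid with Base Julia. Whether the package constructs its grid with `range` is an assumption; the variable names are illustrative.

```julia
# Number of uncertainty thresholds to consider (as in the example below).
thresholds = 6

# Evenly spaced thresholds from 0 to 1, with both endpoints included.
threshold_grid = collect(range(0.0, 1.0; length = thresholds))
```

With `thresholds = 6` the grid has a spacing of 0.2, and one decision process would be solved for each of its entries.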

**Example**

```
(; sampler, uncertainty, weights) = DistanceBased(
    data;
    target = "HeartDisease",
    uncertainty = Entropy(),
    similarity = Exponential(; λ = 5),
);
# initialize evidence
evidence = Evidence("Age" => 35, "Sex" => "M")
# set up solver (or use default)
solver = GenerativeDesigns.DPWSolver(; n_iterations = 60_000, tree_in_info = true)
designs = efficient_designs(
    costs;
    experiments,
    sampler,
    uncertainty,
    thresholds = 6,
    evidence,
    solver, # planner
    mdp_options = (; max_parallel = 1),
    repetitions = 5,
)
```

`CEEDesigns.GenerativeDesigns.efficient_value` — Method

`efficient_value(costs; sampler, value, evidence=Evidence(), <keyword arguments>)`

Estimate the maximum value of experimental evidence (such as clinical utility), adjusted for experimental costs.

Internally, an instance of the `EfficientValueMDP` structure is created and a summary over `repetitions` runoffs is returned.

**Arguments**

- `costs`: a dictionary containing pairs `experiment => cost`, where `cost` can either be a scalar cost (modelled as a monetary cost) or a tuple `(monetary cost, execution time)`.

**Keyword Arguments**

- `sampler`: a function of `(evidence, features, rng)`, in which `evidence` denotes the current experimental evidence, `features` represent the set of features we want to sample from, and `rng` is a random number generator; it returns a dictionary mapping the features to outcomes.
- `value`: a function of `(evidence, (monetary costs, execution time))`; it quantifies the utility of experimental evidence.
- `evidence=Evidence()`: initial experimental evidence.
- `solver=default_solver`: a POMDPs.jl compatible solver used to solve the decision process. The default solver is `DPWSolver`.
- `repetitions=0`: number of runoffs used to estimate the expected experimental cost.
- `mdp_options`: a `NamedTuple` of additional keyword arguments that will be passed to the constructor of `EfficientValueMDP`.

**Example**

```
(; sampler, uncertainty, weights) = DistanceBased(
    data;
    target = "HeartDisease",
    uncertainty = Entropy(),
    similarity = Exponential(; λ = 5),
);
value = (evidence, costs) -> (1 - uncertainty(evidence) + 0.005 * sum(costs));
# initialize evidence
evidence = Evidence("Age" => 35, "Sex" => "M")
# set up solver (or use default)
solver =
    GenerativeDesigns.DPWSolver(; n_iterations = 10_000, depth = 3, tree_in_info = true)
design = efficient_value(
    experiments;
    sampler,
    value,
    evidence,
    solver, # planner
    mdp_options = (; max_parallel = 1),
    repetitions = 5,
)
```