CEEDesigns.front (Function)
front(v)
front(f, v; atol1=0, atol2=0)
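Construct a Pareto front of v, treating both coordinates as objectives to be minimized. If f is given, it is applied to each element of v to obtain the objective coordinates used in the computation, while the original elements are returned.
The first and second (objective) coordinates of a point have to differ by at least atol1 and atol2, respectively, from the latest point on the front.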
Examples
v = [(1,2), (2,3), (2,1)]
front(v)
# output
[(1, 2), (2, 1)]
v = [(1, (1, 2)), (2, (2, 3)), (3, (2, 1))]
front(x -> x[2], v)
# output
[(1, (1, 2)), (3, (2, 1))]
v = [(1, 2), (2, 1.99), (3, 1)]
front(v; atol2 = 0.2)
# output
[(1, 2), (3, 1)]
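To spell out the rule, here is a minimal sketch of a two-objective Pareto sweep that reproduces the three examples above. It is a hypothetical re-implementation (pareto_front_sketch is not part of CEEDesigns) that assumes both coordinates are minimized and that the tolerances are checked against the most recently kept point, as the description states; the package's own front may differ in details such as tie handling.

```julia
# Hypothetical sketch, not CEEDesigns.front: sweep over points sorted by the
# objectives, keeping a point only if it improves the second objective and
# is far enough (atol1, atol2) from the latest point kept on the front.
function pareto_front_sketch(f, v; atol1 = 0, atol2 = 0)
    isempty(v) && return eltype(v)[]
    # Sort lexicographically by the two objective coordinates (both minimized).
    sorted = sort(v; by = x -> (f(x)[1], f(x)[2]))
    kept = [first(sorted)]
    for x in sorted[2:end]
        prev, cur = f(kept[end]), f(x)
        improves = cur[2] < prev[2]
        far_enough = cur[1] - prev[1] >= atol1 && prev[2] - cur[2] >= atol2
        if improves && far_enough
            push!(kept, x)
        end
    end
    return kept
end

# Without a mapping function, the elements themselves are the objectives.
pareto_front_sketch(v; kwargs...) = pareto_front_sketch(identity, v; kwargs...)
```

Sorting first lets a single linear pass decide membership, so the sketch runs in O(n log n).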
CEEDesigns.make_labels (Method)
make_labels(designs)
Make labels used for plotting experimental designs.
CEEDesigns.plot_evals (Method)
plot_evals(evals; f, ylabel="information measure")
Create a stick plot that visualizes the performance measures evaluated for subsets of experiments.
The argument evals should be the output of evaluate_experiments. The keyword argument f (if provided) should be a function that takes evals as input and returns a list of its keys in the order in which they are to be plotted on the x-axis. By default, the keys are sorted by length.
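As an illustration of the f keyword, the sketch below orders the keys of an evals-like dictionary by subset size and breaks ties alphabetically. It assumes, for illustration only, that the keys are sets of experiment names; the numbers are made up, and order_by_size is a hypothetical helper, not part of CEEDesigns.

```julia
# Made-up evals-like dictionary: experimental subset => information measure.
evals = Dict(
    Set(String[]) => 1.0,
    Set(["B"]) => 0.7,
    Set(["A", "B"]) => 0.5,
    Set(["A"]) => 0.8,
)

# Hypothetical ordering function: smaller subsets first,
# ties broken by the alphabetically joined experiment names.
order_by_size(evals) =
    sort(collect(keys(evals)); by = k -> (length(k), join(sort(collect(k)), ",")))
```

Such a function would then be passed as plot_evals(evals; f = order_by_size).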
CEEDesigns.plot_front (Method)
plot_front(designs; grad=cgrad(:Paired_12), xlabel, ylabel, labels=get_labels(designs))
Render a scatter plot of efficient designs, as returned by efficient_designs.
You may optionally specify a color gradient from which to draw the colors.
Examples
designs = efficient_designs(experiment, state)
plot_front(designs)
plot_front(designs; grad = cgrad(:Paired_12))
CEEDesigns.StaticDesigns.ArrangementMDP (Type)
ArrangementMDP(; experiments, experimentalcosts, evals, maxparallel=1, tradeoff=(1, 0))
Structure to parametrize an MDP that is used to approximate the optimal experimental arrangement.
CEEDesigns.StaticDesigns.efficient_designs (Method)
efficient_designs(experiments, evals; max_parallel=1, tradeoff=(1, 0), mdp_kwargs=default_mdp_kwargs)
Return the set of Pareto-efficient experimental designs, given experimental costs, predictive accuracy (loss), and estimated filtration rates for experimental subsets.
Arguments
experiments: a dictionary containing pairs experiment => cost (=> features), where cost can be either a scalar cost or a tuple (monetary cost, execution time).
evals: a dictionary containing pairs experimental subset => (; predictive loss, filtration).
Keyword arguments
max_parallel: to estimate the execution time of the design, define the number of experiments that can run concurrently. The experiments are arranged in descending order of their individual durations and then iteratively allocated into consecutive groups that represent parallel experiments.
tradeoff: determines how to project the monetary cost and execution time of an experimental design onto a single combined cost.
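The arrangement rule for max_parallel can be made concrete with a small worked example. The sketch below is hypothetical (arranged_execution_time and combined_cost are not CEEDesigns functions): it sorts durations in descending order, cuts them into consecutive groups of at most max_parallel experiments, and assumes each group lasts as long as its longest member and that groups run one after another. The linear projection in combined_cost is likewise an assumption about how tradeoff is applied, chosen to mirror the costs_tradeoff' * [monetary_cost, execution_time] form used by EfficientValueMDP.

```julia
# Hypothetical sketch: estimate a design's execution time under max_parallel.
function arranged_execution_time(durations::AbstractVector{<:Real}; max_parallel::Integer = 1)
    max_parallel >= 1 || throw(ArgumentError("max_parallel must be at least 1"))
    sorted = sort(durations; rev = true)                 # longest experiments first
    groups = Iterators.partition(sorted, max_parallel)   # consecutive parallel groups
    # Assumption: a group lasts as long as its longest experiment; groups run sequentially.
    return sum((maximum(g) for g in groups); init = zero(eltype(durations)))
end

# Hypothetical linear projection of (monetary cost, execution time) onto one number.
combined_cost(monetary, time; tradeoff = (1, 0)) = tradeoff[1] * monetary + tradeoff[2] * time
```

With durations 3, 1, 2 and 5 and max_parallel = 2, the groups are (5, 3) and (2, 1), giving an estimated 5 + 2 = 7 time units; with tradeoff = (0.0, 1), only the execution time contributes to the combined cost.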
Example
evals = evaluate_experiments(
experiments_costs,
model,
data[!, Not("HeartDisease")],
data[!, "HeartDisease"];
zero_cost_features,
measure = LogLoss(),
)
efficient_designs(experiments_costs, evals; max_parallel = 2, tradeoff = (0.0, 1))
CEEDesigns.StaticDesigns.efficient_designs (Method)
efficient_designs(experiments, args...; eval_options, arrangement_options)
Evaluate predictive power for subsets of experiments, and return the set of Pareto-efficient experimental designs.
Internally, evaluate_experiments is called first, followed by efficient_designs.
Keyword arguments
eval_options: keyword arguments to evaluate_experiments.
arrangement_options: keyword arguments to efficient_designs.
Example
efficient_designs(
experiments_costs,
data_binary;
eval_options = (; zero_cost_features),
arrangement_options = (; max_parallel = 2, tradeoff = (0.0, 1)),
)
CEEDesigns.StaticDesigns.evaluate_experiments (Method)
evaluate_experiments(experiments, model, X, y; zero_cost_features=[], evaluate_empty_subset=true, return_full_metrics=false, kwargs...)
Evaluate predictive accuracy over subsets of experiments, and return the metrics. The evaluation is facilitated by MLJ.evaluate; additional keyword arguments to this function will be passed to evaluate.
Evaluations are run in parallel.
Arguments
experiments: a dictionary containing pairs experiment => (cost =>) features, where features is a subset of column names in X.
model: a predictive model whose accuracy will be evaluated.
X: a dataframe with features used for prediction.
y: the variable that we aim to predict.
Keyword arguments
max_cardinality: maximum cardinality of experimental subsets (defaults to the number of experiments).
zero_cost_features: additional zero-cost features available for each experimental subset (defaults to an empty list).
evaluate_empty_subset: flag indicating whether to evaluate the empty experimental subset. A constant column will be added if zero_cost_features is empty (defaults to true).
return_full_metrics: flag indicating whether to return full MLJ.PerformanceEvaluation metrics. Otherwise, return an aggregate "measurement" for the first measure (defaults to false).
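To make the roles of max_cardinality, zero_cost_features and evaluate_empty_subset concrete, the sketch below enumerates experimental subsets and the feature columns each would expose to the model. It is a hypothetical helper (experimental_subsets is not part of CEEDesigns) that assumes experiments maps names directly to feature vectors, without costs, and it leaves out the model evaluation that MLJ.evaluate performs.

```julia
# Hypothetical sketch: experimental subset => features available to the model.
function experimental_subsets(experiments::AbstractDict{String,<:AbstractVector{String}};
                              max_cardinality::Integer = length(experiments),
                              zero_cost_features::AbstractVector{String} = String[],
                              evaluate_empty_subset::Bool = true)
    names = sort!(collect(keys(experiments)))
    subsets = Dict{Set{String},Vector{String}}()
    # Each bitmask over `names` encodes one subset of experiments.
    for mask in 0:(2^length(names) - 1)
        chosen = [names[i] for i in eachindex(names) if isodd(mask >> (i - 1))]
        n = length(chosen)
        (n > max_cardinality || (n == 0 && !evaluate_empty_subset)) && continue
        # Zero-cost features are always available; add the features of every chosen experiment.
        features = unique(vcat(zero_cost_features, (experiments[e] for e in chosen)...))
        subsets[Set(chosen)] = features
    end
    return subsets
end
```

Enumerating all 2^n subsets is only practical for a modest number of experiments, which is one reason a max_cardinality bound is useful.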
Example
evaluate_experiments(
experiments,
model,
data[!, Not("HeartDisease")],
data[!, "HeartDisease"];
zero_cost_features,
measure = LogLoss(),
)
CEEDesigns.StaticDesigns.evaluate_experiments (Method)
evaluate_experiments(experiments, X; zero_cost_features=[], evaluate_empty_subset=true)
Evaluate discriminative power for subsets of experiments, and return the metrics.
Evaluations are run in parallel.
Arguments
experiments: a dictionary containing pairs experiment => (cost =>) features, where features is a subset of column names in X.
X: a dataframe containing binary labels, where false indicates that an entity was filtered out by the experiment (and should be removed from the triage).
Keyword arguments
zero_cost_features: additional zero-cost features available for each experimental subset (defaults to an empty list).
evaluate_empty_subset: flag indicating whether to evaluate the empty experimental subset.
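The binary-label convention for X can be illustrated with a short sketch. Assuming, for illustration, that an entity survives a subset of experiments only if every experiment in the subset keeps it (true), the remaining fraction is the share of rows that are true in all selected columns. remaining_fraction is a hypothetical helper and a dictionary of Bool columns stands in for the dataframe; this is not necessarily the metric CEEDesigns reports.

```julia
# Hypothetical sketch: share of entities that pass every experiment in `subset`.
# X maps experiment (column) names to Bool vectors; false = filtered out.
function remaining_fraction(X::AbstractDict{String,<:AbstractVector{Bool}}, subset)
    n = length(first(values(X)))
    keep = trues(n)                 # with no experiments, every entity remains
    for experiment in subset
        keep .&= X[experiment]      # an entity must pass each selected experiment
    end
    return count(keep) / n
end
```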
Example
evaluate_experiments(experiments, data_binary; zero_cost_features)
CEEDesigns.GenerativeDesigns.eox (Constant)
A penalized action that results in a terminal state, e.g., in situations where conducting additional experiments is not possible but the level of uncertainty remains above an acceptable threshold.
CEEDesigns.GenerativeDesigns.ActionCost (Type)
Represent an action as a named tuple (; costs=(monetary cost, time), features).
CEEDesigns.GenerativeDesigns.EfficientValueMDP (Type)
EfficientValueMDP(costs; sampler, value, evidence=Evidence(), <keyword arguments>)
Structure that parametrizes the experimental decision-making process. It is used in the object interface of POMDPs.jl.
In this experimental setup, our objective is to maximize the value of the experimental evidence (such as clinical utility), adjusted for experimental costs.
Internally, the reward associated with a particular experimental evidence and with total accumulated monetary_cost and (optionally) execution_time is computed as value(evidence) - costs_tradeoff' * [monetary_cost, execution_time].
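The reward formula can be checked with a tiny worked example. The sketch below is hypothetical (efficient_value_reward is not a CEEDesigns function); it simply evaluates value(evidence) - costs_tradeoff' * [monetary_cost, execution_time] for a given value and accumulated costs, with costs_tradeoff passed as a tuple and collected into a vector so that the adjoint product is defined.

```julia
# Hypothetical helper mirroring the documented reward:
# value(evidence) - costs_tradeoff' * [monetary_cost, execution_time]
function efficient_value_reward(value_of_evidence, costs_tradeoff, monetary_cost, execution_time)
    # collect turns a tuple such as (1.0, 0.5) into a vector, so ' (adjoint) applies
    return value_of_evidence - collect(costs_tradeoff)' * [monetary_cost, execution_time]
end
```

For instance, a value of 0.75 with monetary cost 0.25, execution time 0.5 and costs_tradeoff = (1.0, 0.5) gives 0.75 - (0.25 + 0.25) = 0.25.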
Arguments
costs: a dictionary containing pairs experiment => cost, where cost can be either a scalar cost (modelled as a monetary cost) or a tuple (monetary cost, execution time).
Keyword Arguments
sampler: a function of (evidence, features, rng), in which evidence denotes the current experimental evidence, features represents the set of features we want to sample from, and rng is a random number generator; it returns a dictionary mapping the features to outcomes.
value: a function of (evidence); it quantifies the utility of experimental evidence.
evidence=Evidence(): initial experimental evidence.
max_parallel: maximum number of parallel experiments.
discount: the discounting factor used in reward computation.
CEEDesigns.GenerativeDesigns.Evidence (Type)
Represent experimental evidence as an immutable dictionary.
CEEDesigns.GenerativeDesigns.State (Type)
Represent experimental state as a tuple of experimental costs and evidence.
CEEDesigns.GenerativeDesigns.UncertaintyReductionMDP (Type)
UncertaintyReductionMDP(costs; sampler, uncertainty, threshold, evidence=Evidence(), <keyword arguments>)
Structure that parametrizes the experimental decision-making process. It is used in the object interface of POMDPs.jl.
In this experimental setup, our objective is to minimize the expected experimental cost while ensuring the uncertainty remains below a specified threshold.
Internally, a state of the decision process is modeled as a tuple (evidence::Evidence, [total accumulated monetary cost, total accumulated execution time]).
Arguments
costs: a dictionary containing pairs experiment => cost, where cost can be either a scalar cost (modelled as a monetary cost) or a tuple (monetary cost, execution time).
Keyword Arguments
sampler: a function of (evidence, features, rng), in which evidence denotes the current experimental evidence, features represents the set of features we want to sample from, and rng is a random number generator; it returns a dictionary mapping the features to outcomes.
uncertainty: a function of evidence; it returns the measure of variance or uncertainty about the target variable, conditioned on the experimental evidence acquired so far.
threshold: a number representing the acceptable level of uncertainty about the target variable.
evidence=Evidence(): initial experimental evidence.
costs_tradeoff: tradeoff between monetary cost and execution time of an experimental design, given as a tuple of floats.
max_parallel: maximum number of parallel experiments.
discount: the discounting factor used in reward computation.
bigM: the penalty incurred when no further experimental action is possible, yet the uncertainty still exceeds the allowable limit.
max_experiments: the maximum number of experiments that may be conducted.
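The state representation described above can be sketched as follows. This is a simplified, hypothetical model of one transition (step_state and is_terminal are not CEEDesigns functions): a plain Dict stands in for the immutable Evidence type, one experiment is performed at a time so its monetary cost and execution time simply add up, and the process is treated as finished once the uncertainty drops to the threshold. How the package handles max_parallel, discount and bigM is not modeled here.

```julia
# Hypothetical transition: merge new outcomes into the evidence and accumulate costs.
function step_state(state, outcomes::AbstractDict, action_cost::NTuple{2,Real})
    evidence, costs = state
    new_evidence = merge(evidence, outcomes)     # new Dict; the old evidence is unchanged
    new_costs = costs .+ collect(action_cost)    # [monetary cost, execution time]
    return (new_evidence, new_costs)
end

# Hypothetical stopping rule: uncertainty at or below the acceptable threshold.
is_terminal(state, uncertainty, threshold) = uncertainty(state[1]) <= threshold
```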
CEEDesigns.GenerativeDesigns.DiscreteDistance (Method)
DiscreteDistance(; λ=1)
Return an anonymous function (x, col) -> λ * (x .== col).
CEEDesigns.GenerativeDesigns.DistanceBased (Method)
DistanceBased(data; target, uncertainty=Entropy(), similarity=Exponential(), distance=Dict(); prior=ones(nrow(data)))
Compute distances between experimental evidence and historical readouts, and apply a 'similarity' functional to obtain probability mass for each row.
Consider using QuadraticDistance, DiscreteDistance, and SquaredMahalanobisDistance.
Return value
A named tuple with the following fields:
sampler: a function of (evidence, features, rng), in which evidence denotes the current experimental evidence, features represents the set of features we want to sample from, and rng is a random number generator; it returns a dictionary mapping the features to outcomes.
uncertainty: a function of evidence; it returns the measure of variance or uncertainty about the target variable, conditioned on the experimental evidence acquired so far.
weights: a function of evidence; it returns probabilities (posterior) across the rows in data.
Arguments
data: a dataframe with historical data.
target: target column name or a vector of target column names.
Keyword Arguments
uncertainty: a function that takes the subdataframe containing the target columns, along with prior, and returns an anonymous function that takes a single argument (a probability vector over observations) and returns an uncertainty measure over the targets.
similarity: a function that, for each row, takes the distances between row[col] and readout[col], and returns a non-negative probability mass for the row.
distance: a dictionary of pairs colname => distance functional, where a distance functional must implement the signature (readout, col; prior). Defaults to QuadraticDistance and DiscreteDistance for Continuous and Multiclass scitypes, respectively.
prior: prior across rows, uniform by default.
filter_range: a dictionary of pairs colname => (lower bound, upper bound). If the current state contains data for a column listed here, only historical observations within the defined range for that column are considered.
importance_weights: a dictionary of pairs colname => weights, where weights is either a vector of weights or a function x -> weight that is applied to each element of the column to obtain the vector of weights. If data for a given column is available in the current state, the product of the corresponding weights is used to adjust the similarity vector.
Example
(; sampler, uncertainty, weights) = DistanceBased(
data;
target = "HeartDisease",
uncertainty = Entropy(),
similarity = Exponential(; λ = 5),
);
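The weighting step can be pictured with a simplified sketch that chains a quadratic distance, an exponential similarity and a prior, then normalizes the result into probabilities over rows. It is a hypothetical re-implementation for illustration only: numeric columns in a Dict stand in for the dataframe, the variance is the ordinary sample variance from Statistics rather than one computed with respect to prior, and quadratic_distance, exponential_similarity and posterior_weights are not CEEDesigns functions.

```julia
using Statistics: var

# Squared difference between readout x and each historical value, scaled by the column variance.
quadratic_distance(x, col; λ = 1.0) = λ .* (x .- col) .^ 2 ./ var(col)

# Turns the per-column distances of one row into a non-negative mass.
exponential_similarity(distances; λ = 1.0) = exp(-λ * sum(distances; init = 0.0))

function posterior_weights(data::AbstractDict{String,<:AbstractVector{<:Real}},
                           evidence::AbstractDict{String,<:Real};
                           prior::AbstractVector{<:Real} = ones(length(first(values(data)))),
                           λ::Real = 1.0)
    # One distance vector per evidence column, each with one entry per historical row.
    per_col = [quadratic_distance(evidence[c], data[c]) for c in keys(evidence)]
    # Similarity of each row (summing its per-column distances), scaled by the prior.
    mass = [exponential_similarity([d[i] for d in per_col]; λ) * prior[i] for i in eachindex(prior)]
    return mass ./ sum(mass)
end
```

With no evidence every distance sum is zero, so the weights reduce to the normalized prior.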
CEEDesigns.GenerativeDesigns.Entropy (Method)
Entropy()
Return a function of (labels; prior). When this function is called as part of an instantiation procedure in DistanceBased, it returns an internal function of weights that computes the fraction of information entropy, relative to the entropy calculated with respect to a specified prior.
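A minimal sketch of such an entropy fraction, assuming a single categorical target and natural-log Shannon entropy. entropy_fraction is a hypothetical helper, not the package's Entropy; it returns NaN when the prior-weighted entropy is zero (a single class).

```julia
# Hypothetical sketch: ratio of the weighted label entropy to the prior-weighted entropy.
function entropy_fraction(labels::AbstractVector; prior::AbstractVector{<:Real} = ones(length(labels)))
    classes = unique(labels)
    # Shannon entropy (natural log) of the class distribution induced by row weights w.
    function weighted_entropy(w)
        p = [sum(w[labels .== c]) for c in classes] ./ sum(w)
        return -sum(x -> x > 0 ? x * log(x) : 0.0, p)
    end
    h_prior = weighted_entropy(prior)
    # The returned closure plays the role of the internal function of weights.
    return w -> weighted_entropy(w) / h_prior
end
```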
CEEDesigns.GenerativeDesigns.Exponential (Method)
Exponential(; λ=1)
Return an anonymous function x -> exp(-λ * sum(x; init=0)).
CEEDesigns.GenerativeDesigns.QuadraticDistance (Method)
QuadraticDistance(; λ=1, standardize=true)
Return an anonymous function (x, col; prior) -> λ * (x .- col).^2 / σ. If standardize is set to true, σ represents col's variance calculated with respect to prior; otherwise, σ equals one.
CEEDesigns.GenerativeDesigns.SquaredMahalanobisDistance (Method)
SquaredMahalanobisDistance(; diagonal=0)
Return a function that computes the squared Mahalanobis distance between each row of data and the evidence. For a singular covariance matrix, consider adding entries to the matrix's diagonal via the diagonal keyword.
To accommodate missing values, we have implemented an approach described in https://www.jstor.org/stable/3559861, on page 285.
Arguments
diagonal: a scalar to be added to the diagonal entries of the covariance matrix.
Returns
A high-level function of (data, targets, prior). When called, that function returns an internal function compute_distances that takes an Evidence and computes the squared Mahalanobis distance based on the input data and the evidence.
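For fully observed rows, the squared Mahalanobis distance can be sketched directly with the LinearAlgebra and Statistics standard libraries. This hypothetical helper (squared_mahalanobis_to_rows is not part of CEEDesigns) takes a numeric matrix instead of a dataframe, does not implement the missing-value handling referenced above, and adds diagonal to the covariance diagonal in the same spirit as the diagonal keyword.

```julia
using LinearAlgebra, Statistics

# Hypothetical sketch: squared Mahalanobis distance between each row of `data`
# (observations × features) and the evidence vector `x`.
function squared_mahalanobis_to_rows(data::AbstractMatrix{<:Real}, x::AbstractVector{<:Real};
                                     diagonal::Real = 0.0)
    Σ = cov(data) + diagonal * I     # feature covariance, optionally regularized
    return map(axes(data, 1)) do i
        r = data[i, :] .- x
        dot(r, Σ \ r)                # r' Σ⁻¹ r without forming the inverse explicitly
    end
end
```

Without the diagonal term, a singular covariance (for example, two identical columns) makes the linear solve fail, which is the situation the diagonal keyword is meant to address.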
CEEDesigns.GenerativeDesigns.Variance (Method)
Variance()
Return a function of (data; prior). When this function is called as part of an instantiation procedure in DistanceBased, it returns an internal function of weights that computes the fraction of variance in the data, relative to the variance calculated with respect to a specified prior.
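Analogously to the entropy case, a variance fraction can be sketched for a single numeric target. variance_fraction and weighted_variance are hypothetical helpers, not the package's Variance; they normalize the weights into probabilities and divide the weighted variance by the prior-weighted variance.

```julia
# Weighted (population-style) variance of y under non-negative weights w.
function weighted_variance(y::AbstractVector{<:Real}, w::AbstractVector{<:Real})
    p = w ./ sum(w)            # normalize weights into probabilities
    μ = sum(p .* y)            # weighted mean
    return sum(p .* (y .- μ) .^ 2)
end

# Hypothetical sketch: returns a function of weights, relative to the prior.
function variance_fraction(y::AbstractVector{<:Real}; prior::AbstractVector{<:Real} = ones(length(y)))
    v_prior = weighted_variance(y, prior)
    return w -> weighted_variance(y, w) / v_prior
end
```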
CEEDesigns.GenerativeDesigns.efficient_design (Method)
efficient_design(costs; sampler, uncertainty, threshold, evidence=Evidence(), <keyword arguments>)
In the uncertainty reduction setup, minimize the expected experimental cost while ensuring the uncertainty remains below a specified threshold.
Arguments
costs: a dictionary containing pairs experiment => cost, where cost can be either a scalar cost (modelled as a monetary cost) or a tuple (monetary cost, execution time).
Keyword Arguments
sampler: a function of (evidence, features, rng), in which evidence denotes the current experimental evidence, features represents the set of features we want to sample from, and rng is a random number generator; it returns a dictionary mapping the features to outcomes.
uncertainty: a function of evidence; it returns the measure of variance or uncertainty about the target variable, conditioned on the experimental evidence acquired so far.
threshold: uncertainty threshold.
evidence=Evidence(): initial experimental evidence.
solver=default_solver: a POMDPs.jl-compatible solver used to solve the decision process. The default solver is DPWSolver.
repetitions=0: number of runoffs used to estimate the expected experimental cost.
mdp_options: a NamedTuple of additional keyword arguments that will be passed to the constructor of UncertaintyReductionMDP.
realized_uncertainty=false: whenever the initial state uncertainty is below the selected threshold, return the actual uncertainty of this state.
Example
(; sampler, uncertainty, weights) = DistanceBased(
data;
target = "HeartDisease",
uncertainty = Entropy(),
similarity = Exponential(; λ = 5),
);
# initialize evidence
evidence = Evidence("Age" => 35, "Sex" => "M")
# set up solver (or use default)
solver = GenerativeDesigns.DPWSolver(; n_iterations = 60_000, tree_in_info = true)
design = efficient_design(
costs;
sampler,
uncertainty,
threshold = 0.6,
evidence,
solver, # planner
mdp_options = (; max_parallel = 1),
repetitions = 5,
)
CEEDesigns.GenerativeDesigns.efficient_designs (Method)
efficient_designs(costs; sampler, uncertainty, thresholds, evidence=Evidence(), <keyword arguments>)
In the uncertainty reduction setup, minimize the expected experimental resource spend over a range of uncertainty thresholds, and return the set of Pareto-efficient designs in the dimensions of cost and uncertainty threshold.
Internally, an instance of the UncertaintyReductionMDP structure is created for every selected uncertainty threshold, and the corresponding runoffs are simulated.
Arguments
costs: a dictionary containing pairs experiment => cost, where cost can be either a scalar cost (modelled as a monetary cost) or a tuple (monetary cost, execution time).
Keyword Arguments
sampler: a function of (evidence, features, rng), in which evidence denotes the current experimental evidence, features represents the set of features we want to sample from, and rng is a random number generator; it returns a dictionary mapping the features to outcomes.
uncertainty: a function of evidence; it returns the measure of variance or uncertainty about the target variable, conditioned on the experimental evidence acquired so far.
thresholds: number of thresholds to consider, spaced uniformly in the range between 0 and 1, inclusive.
evidence=Evidence(): initial experimental evidence.
solver=default_solver: a POMDPs.jl-compatible solver used to solve the decision process. The default solver is DPWSolver.
repetitions=0: number of runoffs used to estimate the expected experimental cost.
mdp_options: a NamedTuple of additional keyword arguments that will be passed to the constructor of UncertaintyReductionMDP.
realized_uncertainty=false: whenever the initial state uncertainty is below the selected threshold, return the actual uncertainty of this state.
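The thresholds keyword can be read as a uniform grid on [0, 1]. The sketch below builds that grid for thresholds = 6 and then keeps the (expected cost, threshold) pairs that are not dominated, which is one way to read "Pareto-efficient in the dimensions of cost and uncertainty threshold". The expected costs are made-up numbers standing in for the planner's output, and dominated is a hypothetical helper, not a CEEDesigns function.

```julia
thresholds = 6
# Uniform grid between 0 and 1, inclusive: 0.0, 0.2, ..., 1.0.
grid = collect(range(0, 1; length = thresholds))

# Made-up expected costs, one per threshold (a stricter threshold usually costs more).
expected_costs = [9.0, 7.0, 7.0, 4.0, 1.0, 0.0]

# A pair is dominated if another pair is no worse in both cost and threshold
# and strictly better in at least one of them.
dominated(i) = any(eachindex(grid)) do j
    j != i && expected_costs[j] <= expected_costs[i] && grid[j] <= grid[i] &&
        (expected_costs[j] < expected_costs[i] || grid[j] < grid[i])
end

efficient = [(expected_costs[i], grid[i]) for i in eachindex(grid) if !dominated(i)]
```

Here the pair at threshold 0.4 is dropped, since the same expected cost already achieves the stricter threshold 0.2.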
Example
(; sampler, uncertainty, weights) = DistanceBased(
data;
target = "HeartDisease",
uncertainty = Entropy(),
similarity = Exponential(; λ = 5),
);
# initialize evidence
evidence = Evidence("Age" => 35, "Sex" => "M")
# set up solver (or use default)
solver = GenerativeDesigns.DPWSolver(; n_iterations = 60_000, tree_in_info = true)
designs = efficient_designs(
costs;
sampler,
uncertainty,
thresholds = 6,
evidence,
solver, # planner
mdp_options = (; max_parallel = 1),
repetitions = 5,
)
CEEDesigns.GenerativeDesigns.efficient_value (Method)
efficient_value(costs; sampler, value, evidence=Evidence(), <keyword arguments>)
Estimate the maximum value of experimental evidence (such as clinical utility), adjusted for experimental costs.
Internally, an instance of the EfficientValueMDP structure is created, and a summary over repetitions runoffs is returned.
Arguments
costs: a dictionary containing pairs experiment => cost, where cost can be either a scalar cost (modelled as a monetary cost) or a tuple (monetary cost, execution time).
Keyword Arguments
sampler: a function of (evidence, features, rng), in which evidence denotes the current experimental evidence, features represents the set of features we want to sample from, and rng is a random number generator; it returns a dictionary mapping the features to outcomes.
value: a function of (evidence, (monetary costs, execution time)); it quantifies the utility of experimental evidence.
evidence=Evidence(): initial experimental evidence.
solver=default_solver: a POMDPs.jl-compatible solver used to solve the decision process. The default solver is DPWSolver.
repetitions=0: number of runoffs used to estimate the expected experimental cost.
mdp_options: a NamedTuple of additional keyword arguments that will be passed to the constructor of EfficientValueMDP.
Example
(; sampler, uncertainty, weights) = DistanceBased(
data;
target = "HeartDisease",
uncertainty = Entropy(),
similarity = Exponential(; λ = 5),
);
value = (evidence, costs) -> (1 - uncertainty(evidence) + 0.005 * sum(costs));
# initialize evidence
evidence = Evidence("Age" => 35, "Sex" => "M")
# set up solver (or use default)
solver =
GenerativeDesigns.DPWSolver(; n_iterations = 10_000, depth = 3, tree_in_info = true)
design = efficient_value(
experiments;
sampler,
value,
evidence,
solver, # planner
mdp_options = (; max_parallel = 1),
repetitions = 5,
)