ClustForOpt.ClustDataType
  ClustData <: TSData

Contains time series data by attribute (e.g. wind, solar, electricity demand) and respective information.

Fields:

  • region::String: optional information to specify the region data belongs to
  • K::Int: number of periods
  • T::Int: time steps per period
  • data::Dict{String,Array}: Dictionary with an entry for each attribute [file name (attribute: e.g. technology)]-[column name (node: e.g. location)]. Each entry of the dictionary is a 2-dimensional time-steps T x periods K-Array holding the data
  • weights::Array{Float64}: 1-dimensional periods K-Array with the absolute weight for each period. The weight of a period corresponds to the number of days it represents. E.g. for a year of 365 days, sum(weights)=365
  • mean::Dict{String,Array}: Dictionary with an entry for each attribute [file name (e.g. technology)]-[column name (e.g. location)]. Each entry of the dictionary is a 1-dimensional periods K-Array holding the shift of the mean. This is used internally for normalization.
  • sdv::Dict{String,Array}: Dictionary with an entry for each attribute [file name (e.g. technology)]-[column name (e.g. location)]. Each entry of the dictionary is a 1-dimensional periods K-Array holding the standard deviation. This is used internally for normalization.
  • delta_t::Array{Float64,2}: 2-dimensional time-steps T x periods K-Array with the temporal duration Δt for each timestep. The default is that all timesteps have the same length.
  • k_ids::Array{Int}: 1-dimensional original periods I-Array indicating which original period is represented by which period K. E.g. if the data is a year of 365 periods, the array has length 365. If an original period is not represented by any period within this ClustData, the entry will be 0.
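
The field layout described above can be pictured with plain Julia containers. This is only a sketch of the shapes involved, not the package's ClustData constructor; the keys "wind-germany" and "el_demand-germany" and the random values are made up for illustration.

```julia
# Hypothetical stand-in for the fields of a ClustData struct (not the package type).
T, K = 24, 365                      # time steps per period, number of periods

# One T x K matrix per "[file name]-[column name]" key.
data = Dict{String,Array}(
    "wind-germany"      => rand(T, K),
    "el_demand-germany" => rand(T, K),
)

weights = ones(K)                   # each period represents exactly one day
delta_t = ones(T, K)                # every time step has the same duration
k_ids   = collect(1:K)              # original period i is represented by period i

println(size(data["wind-germany"]))  # (24, 365)
println(sum(weights))                # 365.0
```

For unclustered data, every period has weight 1, so the weights sum to the number of days.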
ClustForOpt.ClustDataMethod
ClustData(data::ClustDataMerged)

constructor 2: Convert ClustDataMerged to ClustData

ClustForOpt.ClustDataMethod
ClustData(data::FullInputData,K,T)

constructor 3: Convert FullInputData to ClustData

ClustForOpt.ClustDataMethod
ClustData(region::String,
                  years::Array{Int,1},
                  K::Int,
                  T::Int,
                  data::Dict{String,Array},
                  weights::Array{Float64},
                  delta_t::Array{Float64,2},
                  k_ids::Array{Int,1};
                  mean::Dict{String,Array}=Dict{String,Array}(),
                  sdv::Dict{String,Array}=Dict{String,Array}()
                  )

constructor 1 for ClustData: provide data as dict

ClustForOpt.ClustDataMergedType
  ClustDataMerged <: TSData

Contains time series data by attribute (e.g. wind, solar, electricity demand) and respective information.

Fields:

  • region::String: optional information to specify the region data belongs to
  • K::Int: number of periods
  • T::Int: time steps per period
  • data::Array: Array of the dimension (time-steps T * length(data_type)) x periods K. The first T rows are datatype 1, the second T rows are datatype 2, ...
  • data_type::Array{String}: The data types (attributes) of the data.
  • weights::Array{Float64}: 1-dimensional periods K-Array with the absolute weight for each period. E.g. for a year of 365 days, sum(weights)=365
  • mean::Dict{String,Array}: Dictionary with an entry for each attribute [file name (e.g. technology)]-[column name (e.g. location)]. Each entry of the dictionary is a 1-dimensional periods K-Array holding the shift of the mean
  • sdv::Dict{String,Array}: Dictionary with an entry for each attribute [file name (e.g. technology)]-[column name (e.g. location)]. Each entry of the dictionary is a 1-dimensional periods K-Array holding the standard deviation
  • delta_t::Array{Float64,2}: 2-dimensional time-steps T x periods K-Array with the temporal duration Δt for each timestep in [h]
  • k_ids::Array{Int}: 1-dimensional original periods I-Array indicating which original period is represented by which period K. If an original period is not represented by any period within this ClustDataMerged, the entry will be 0.
ClustForOpt.ClustDataMergedMethod
ClustDataMerged(region::String,
                    years::Array{Int,1},
                    K::Int,
                    T::Int,
                    data::Array,
                    data_type::Array{String},
                    weights::Array{Float64},
                    k_ids::Array{Int,1};
                    delta_t::Array{Float64,2}=ones(T,K),
                    mean::Dict{String,Array}=Dict{String,Array}(),
                    sdv::Dict{String,Array}=Dict{String,Array}()
                    )

constructor 1: construct ClustDataMerged

ClustForOpt.ClustResultType
ClustResult <: AbstractClustResult

Contains the results from a clustering run: The data, the cost in terms of the clustering algorithm, and a config file describing the clustering method used.

Fields:

  • clust_data::ClustData
  • cost::Float64: Cost of the clustering algorithm
  • config::Dict{String,Any}: Details on the clustering method used
ClustForOpt.ClustResultAllType
ClustResultAll <: AbstractClustResult

Contains the results from a clustering run for all locally converged solutions

Fields:

  • clust_data::ClustData: The best centers, weights, clustids in terms of cost of the clustering algorithm
  • cost::Float64: Cost of the clustering algorithm
  • config::Dict{String,Any}: Details on the clustering method used
  • centers_all::Array{Array{Float64},1}
  • weights_all::Array{Array{Float64},1}
  • clustids_all::Array{Array{Int,1},1}
  • cost_all::Array{Float64,1}
  • iter_all::Array{Int,1}
ClustForOpt.SimpleExtremeValueDescrType
SimpleExtremeValueDescr

Defines a simple extreme day by its characteristics

Fields:

  • data_type::String : Choose one of the attributes from the data you have loaded into ClustData
  • extremum::String : min,max
  • peak_def::String : absolute,integral
  • consecutive_periods::Int: For a single extreme day, set as 1
ClustForOpt.SimpleExtremeValueDescrMethod
SimpleExtremeValueDescr(data_type::String,
                             extremum::String,
                             peak_def::String)

Defines a simple extreme day by its characteristics

Input options:

  • data_type::String : Choose one of the attributes from the data you have loaded into ClustData
  • extremum::String : min,max
  • peak_def::String : absolute,integral
ClustForOpt.calc_SSEMethod
 calc_SSE(data::Array,centers::Array,assignments::Array)

Calculates the sum of squared errors (SSE) between the cluster representations and the data.
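
As a rough illustration of the quantity involved, the sketch below sums the squared differences between every data column and the center it is assigned to. The name sse_sketch and the column-per-period layout are assumptions made for this example; it is not the package's calc_SSE.

```julia
# Sketch: sum of squared errors between each column of `data` and its assigned center.
# data: T x N, centers: T x K, assignments: length-N vector with values in 1:K.
function sse_sketch(data::AbstractMatrix, centers::AbstractMatrix,
                    assignments::AbstractVector{<:Integer})
    total = 0.0
    for (i, k) in enumerate(assignments)
        total += sum(abs2, data[:, i] .- centers[:, k])
    end
    return total
end

data = [1.0 2.0 10.0; 1.0 2.0 10.0]
centers = [1.5 10.0; 1.5 10.0]
assignments = [1, 1, 2]
println(sse_sketch(data, centers, assignments))  # 1.0
```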

ClustForOpt.combine_timeseries_weather_dataMethod
    combine_timeseries_weather_data(ts::ClustData,ts_weather::ClustData)

  • ts is the shorter time series, e.g. with the demand
  • ts_weather is the longer time series with the weather information

The ts time series is repeated to match the number of periods of the longer ts_weather time series. If the number of periods of ts_weather is not a multiple of the number of periods of ts, the necessary number of ts periods 1 to x is appended to the end of the new combined time series.
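
The repetition rule can be sketched on a plain matrix with one column per period. The helper repeat_periods is hypothetical and only mirrors the description above; it does not operate on ClustData structs.

```julia
# Sketch: repeat the K_short columns of `short` until there are K_long columns.
# If K_long is not a multiple of K_short, periods 1 to x are appended at the end.
function repeat_periods(short::AbstractMatrix, K_long::Integer)
    K_short = size(short, 2)
    idx = [mod1(k, K_short) for k in 1:K_long]   # 1,2,...,K_short,1,2,...
    return short[:, idx]
end

demand = [1 2 3; 4 5 6]          # T = 2 time steps, 3 periods
combined = repeat_periods(demand, 7)
println(combined[1, :])          # [1, 2, 3, 1, 2, 3, 1]
```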

ClustForOpt.extreme_val_outputMethod
extreme_val_output(data::ClustData,
                        extr_val_idcs::Array{Int,1};
                        rep_mod_method="feasibility")

Takes indices as input and returns ClustData struct that contains the extreme vals from within data.

ClustForOpt.extreme_val_outputMethod

extreme_val_output(data::ClustData,extr_val_idcs::Array{Int,1};rep_mod_method="feasibility")

Wrapper function for a single extreme val. Takes indices as input and returns ClustData struct that contains the extreme vals from within data.

ClustForOpt.kmedoids_exactMethod
kmedoids_exact(
 data::Array{Float64},
 nclust::Int,
 _dist::SemiMetric = SqEuclidean(),
 env::Any;
 )

results = kmedoids_exact()

  • data: { HOURS,DAYS }
  • optimizer: optimizer=Gurobi.Optimizer

Performs the exact k-medoids algorithm as in Kotzur et al., 2017.

ClustForOpt.load_timeseries_dataMethod
function load_timeseries_data(data_path::String;
                          region::String="none",
                          T::Int=24,
                          years::Array{Int,1}=[2016],
                          att::Array{String,1}=Array{String,1}())

Return all time series as ClustData struct that are stored as csv files in the specified path.

  • Loads *.csv files in the folder or the file data_path
  • Loads all attributes (all *.csv files) if the att-Array is empty or only the files specified in att
  • The *.csv files shall have the following structure and must have the same length:
    | Timestamp  | Year   | [column names...] |
    |------------|--------|-------------------|
    | [iterator] | [year] | [values]          |
  • The first column of a .csv file should be called Timestamp if it contains a time iterator
  • The second column should be called Year and contains the corresponding year
  • Each other column should contain the time series data. For one node systems, only one column is used; for an N-node system, N columns need to be used. In an N-node system, each column specifies time series data at a specific geolocation.
  • Returns time series as ClustData struct
  • The .data field of the ClustData struct is a Dictionary where each column in [file name].csv file is the key (called "[file name]-[column name]"). file name should correspond to the attribute name, and column name should correspond to the node name.

Optional inputs to load_timeseries_data:

  • region: region descriptor
  • T: number of segments
  • years::Array{Int,1}: the years to be selected from the csv file as specified in the years column
  • att::Array{String,1}: the attributes to be loaded. If left empty, all attributes will be loaded.
ClustForOpt.load_timeseries_dataMethod
function load_timeseries_data(existing_data::Symbol;
                          region::String="none",
                          T::Int=24,
                          years::Array{Int,1}=[2016],
                          att::Array{String,1}=Array{String,1}())

Return time series of example data sets as ClustData struct.

The choice of example data set is given by e.g. existing_data=:CEP_GER1. Example data sets are:

  • :DAM_CA : Hourly Day Ahead Market Electricity prices for California-Stanford 2015
  • :DAM_GER : Hourly Day Ahead Market Electricity prices for Germany 2015
  • :CEP_GER1 : Hourly Wind, Solar, Demand data Germany one node
  • :CEP_GER18: Hourly Wind, Solar, Demand data Germany 18 nodes

Optional inputs to load_timeseries_data:

  • region: region descriptor
  • T: number of segments
  • years::Array{Int,1}: the years to be selected from the csv file as specified in the years column
  • att::Array{String,1}: the attributes to be loaded. If left empty, all attributes will be loaded.
ClustForOpt.representation_modificationMethod
representation_modification(extr_vals::ClustData,clust_data::ClustData)

Merges the clustered data and extreme vals into one ClustData struct. Weights are chosen according to the rep_mod_method.

ClustForOpt.representation_modificationMethod
representation_modification(full_data::ClustData,clust_data::ClustData,extr_val_idcs::Array{Int,1};rep_mod_method::String="feasibility")

Merges the clustered data and extreme vals into one ClustData struct. Weights are chosen according to the rep_mod_method.

ClustForOpt.representation_modificationMethod
representation_modification(full_data::ClustData,clust_data::ClustData,extr_val_idcs::Int;rep_mod_method::String="feasibility")

Wrapper function for a single extreme val. Merges the clustered data and extreme vals into one ClustData struct. Weights are chosen according to the rep_mod_method.

ClustForOpt.resize_medoidsMethod
resize_medoids(data::Array,centers::Array,weights::Array)

This is the DEFAULT resize medoids function. It takes in centers (typically medoids) and normalizes them such that the yearly average of the clustered data is the same as the yearly average of the original data.
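
One way to read this description is a single multiplicative factor applied to all centers so that their weighted average matches the average of the original data. The sketch below implements that reading; the function name rescale_centers and the use of one global factor are assumptions, not the package's confirmed implementation.

```julia
using Statistics

# Sketch: scale the centers by one factor so that their weighted mean
# equals the mean of the original data.
# data: T x N, centers: T x K, weights: length-K vector with sum(weights) == N.
function rescale_centers(data::AbstractMatrix, centers::AbstractMatrix,
                         weights::AbstractVector)
    target = mean(data)
    current = sum(centers .* weights') / (size(centers, 1) * sum(weights))
    return centers .* (target / current)
end

data = [1.0 3.0 5.0 7.0]       # T = 1, N = 4 original periods
centers = [1.0 5.0]            # K = 2 representative periods
weights = [2.0, 2.0]           # each center represents two original periods
scaled = rescale_centers(data, centers, weights)
```

In this toy case the original mean is 4 and the weighted mean of the centers is 3, so every center is multiplied by 4/3.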

ClustForOpt.run_battery_optMethod
run_battery_opt(data::ClustData)

Operational battery storage optimization problem. Runs every day separately and adds up the results at the end.

ClustForOpt.run_clustMethod
run_clust(
  data::ClustData,
  n_clust_ar::Array{Int,1};
  norm_op::String="zscore",
  norm_scope::String="full",
  method::String="kmeans",
  representation::String="centroid",
  n_init::Int=100,
  iterations::Int=300,
  save::String="",
  kwargs...)

Runs the clustering for each number of clusters k in n_clust_ar and returns an array of results. This function is a wrapper around run_clust().

ClustForOpt.run_clustMethod
run_clust(data::ClustData;
  norm_op::String="zscore",
  norm_scope::String="full",
  method::String="kmeans",
  representation::String="centroid",
  n_clust::Int=5,
  n_seg::Int=data.T,
  n_init::Int=1000,
  iterations::Int=300,
  attribute_weights::Dict{String,Float64}=Dict{String,Float64}(),
  save::String="",
  get_all_clust_results::Bool=false,
  kwargs...)

Take input data data of dimensionality N x T and cluster into data of dimensionality K x T.

The following combinations of method and representation are supported by run_clust:

| Name | method | representation | comment |
|------|--------|----------------|---------|
| k-means clustering | <kmeans> | <centroid> | - |
| k-means clustering with medoid representation | <kmeans> | <medoid> | - |
| k-medoids clustering (partitional) | <kmedoids> | <medoid> | - |
| k-medoids clustering (exact) | <kmedoids_exact> | <medoid> | requires Gurobi and the additional keyword argument kmexact_optimizer. See [examples] folder for example use. Set n_init=1 |
| hierarchical clustering with centroid representation | <hierarchical> | <centroid> | set n_init=1 |
| hierarchical clustering with medoid representation | <hierarchical> | <medoid> | set n_init=1 |

The other optional inputs are:

| Keyword | options | comment |
|---------|---------|---------|
| norm_op | zscore | Normalization operation. 0-1 not yet implemented |
| norm_scope | full,sequence,hourly | Normalization scope. The default (full) is used in most of the current literature. |
| n_clust | e.g. 5 | Number of clusters that you want to obtain |
| n_seg | e.g. 10 | Number of segments per period. Not yet implemented, keep as default value. |
| n_init | e.g. 1000 | Number of initializations of locally converging clustering algorithms. 10000 often yields very stable results. |
| iterations | e.g. 300 | Internal parameter of the partitional clustering algorithms. |
| attribute_weights | e.g. Dict("wind-germany"=>3,"solar-germany"=>1,"el_demand-germany"=>5) | Weights the respective attributes when clustering. In this example, demand and wind are deemed more important than solar. |
| save | false | Save clustered data as csv or jld2 file. Not yet implemented. |
| get_all_clust_results | true,false | false gives a ClustResult struct with only the best locally converged solution in terms of clustering measure. true gives a ClustResultAll struct as output, with all locally converged solutions. |
| kwargs | e.g. kmexact_optimizer | Optional keyword arguments that are required for specific methods, for example k-medoids exact. |
ClustForOpt.run_gas_optMethod
run_gas_opt(data::ClustData)

Operational gas turbine optimization problem. Runs every day separately and adds up the results at the end.

ClustForOpt.sakoe_chiba_bandMethod
 sakoe_chiba_band(r::Int,l::Int)

Calculates the minimum and maximum allowed indices for an l x l windowed matrix for the Sakoe-Chiba band (see Sakoe and Chiba, 1978).

Input:

  • r: radius, such that |i(k)-j(k)| <= r
  • l: length, dimension 2 of the matrix
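
The band constraint |i - j| <= r translates into a lower and an upper column limit per row, clipped to the matrix bounds. The following sketch shows that computation; the name band_limits and the return format (two vectors) are assumptions for illustration and may differ from the package function.

```julia
# Sketch: for each row i of an l x l matrix, the column range allowed by |i - j| <= r.
function band_limits(r::Int, l::Int)
    jmin = [max(1, i - r) for i in 1:l]   # lower limit, clipped at the first column
    jmax = [min(l, i + r) for i in 1:l]   # upper limit, clipped at the last column
    return jmin, jmax
end

jmin, jmax = band_limits(1, 4)
println(jmin)  # [1, 1, 2, 3]
println(jmax)  # [2, 3, 4, 4]
```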

ClustForOpt.simple_extr_val_selMethod
simple_extr_val_sel(data::ClustData,
                    extreme_value_descr_ar::Array{SimpleExtremeValueDescr,1};
                    rep_mod_method::String="feasibility")

Selects simple extreme values and returns modified data, extreme values, and the corresponding indices.

Input options for rep_mod_method:

  • rep_mod_method::String : feasibility,append
ClustForOpt.simple_extr_val_selMethod
simple_extr_val_sel(data::ClustData,
                    extreme_value_descr::SimpleExtremeValueDescr;
                    rep_mod_method::String="feasibility")

Wrapper function for only one simple extreme value. Selects simple extreme values and returns modified data, extreme values, and the corresponding indices.

ClustForOpt.sort_centersMethod
sort_centers(centers::Array,weights::Array)
  • centers: hours x days e.g.[24x9]
  • weights: days [e.g. 9], unsorted

sorts the centers by weights from largest to smallest
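
A minimal sketch of this ordering, assuming centers are stored with one column per cluster as in the hours x days layout above. The helper sort_by_weight is hypothetical and not the package function.

```julia
# Sketch: reorder the columns of `centers` so that weights are descending.
function sort_by_weight(centers::AbstractMatrix, weights::AbstractVector)
    order = sortperm(weights; rev=true)   # indices from largest to smallest weight
    return centers[:, order], weights[order]
end

centers = [1 2 3; 4 5 6]
weights = [10, 30, 20]
c, w = sort_by_weight(centers, weights)
println(w)        # [30, 20, 10]
println(c[1, :])  # [2, 3, 1]
```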

ClustForOpt.undo_z_normalizeMethod
undo_z_normalize(data_norm, mn, sdv; idx=[])

Undoes the z-normalization of data with mean and sdv by hour.

  • normalized data: input format: (1st dimension: 24 hours, 2nd dimension: # of days)
  • hourly_mean: 24 hour vector with hourly means
  • hourly_sdv: 24 hour vector with hourly standard deviations

ClustForOpt.undo_z_normalizeMethod
undo_z_normalize(data_norm_merged::Array,mn::Dict{String,Array},sdv::Dict{String,Array};idx=[])

Although idx is optional, it should usually be provided in the function call in order to enable sequence-based normalization.

ClustForOpt.z_normalizeMethod
 z_normalize(data::Array;scope="full")

z-normalizes data with mean and sdv by hour.

  • data: input format: (1st dimension: 24 hours, 2nd dimension: # of days)
  • scope: "full": one mean and sdv for the full data set; "hourly": univariate scaling, each hour is scaled separately; "sequence": sequence-based scaling
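
The three scopes can be sketched with the Statistics standard library. This is an illustration under assumptions, not the package's z_normalize: the name z_normalize_sketch is made up, "sequence" is read here as scaling each day (column) on its own, and no special handling of a zero standard deviation is included.

```julia
using Statistics

# Sketch of z-normalization on a (hours x days) matrix.
# Returns the normalized data together with the mean and sdv that were used.
function z_normalize_sketch(data::AbstractMatrix; scope::String="full")
    if scope == "full"
        mn, sd = mean(data), std(data)                   # one value for everything
    elseif scope == "hourly"
        mn, sd = mean(data, dims=2), std(data, dims=2)   # one value per hour (row)
    elseif scope == "sequence"
        mn, sd = mean(data, dims=1), std(data, dims=1)   # one value per day (column)
    else
        error("unsupported scope: $scope")
    end
    return (data .- mn) ./ sd, mn, sd
end

data = [1.0 2.0 3.0; 4.0 6.0 8.0]
normed, mn, sd = z_normalize_sketch(data; scope="hourly")
restored = normed .* sd .+ mn    # undoing the normalization recovers the input
```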

ClustForOpt.add_timeseries_data!Method
add_timeseries_data!(dt::Dict{String,Array}, data::DataFrame; K::Int=0, T::Int=24, years::Array{Int,1}=[2016])

selects first the years and second the data_points so that their number is a multiple of T and same with the other timeseries

ClustForOpt.attribute_weightingMethod

function attribute_weighting(data::ClustData,attribute_weights::Dict{String,Float64})

Applies the different attribute weights based on the dictionary entry for each tech or exact name.

ClustForOpt.calc_centroidsMethod
calc_centroids(data::Array,assignments::Array)

Given the data and cluster assignments, this function finds the centroid of the respective clusters.
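
A centroid is the element-wise mean of all periods assigned to a cluster. The sketch below assumes one column per period and that every cluster index from 1 to the largest assignment has at least one member; centroids_sketch is a hypothetical name, not the package function.

```julia
using Statistics

# Sketch: centroid of each cluster as the mean over its assigned columns.
function centroids_sketch(data::AbstractMatrix, assignments::AbstractVector{<:Integer})
    K = maximum(assignments)
    centers = zeros(size(data, 1), K)
    for k in 1:K
        centers[:, k] = vec(mean(data[:, assignments .== k], dims=2))
    end
    return centers
end

data = [1.0 3.0 10.0; 2.0 4.0 20.0]
centers = centroids_sketch(data, [1, 1, 2])
println(centers)  # [2.0 10.0; 3.0 20.0]
```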

ClustForOpt.calc_medoidsMethod
 calc_medoids(data::Array,assignments::Array)

Given the data and cluster assignments, this function finds the medoids that are closest to the cluster center.
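
Unlike a centroid, a medoid is always one of the original periods. A minimal sketch, assuming squared Euclidean distance to the cluster mean as the selection criterion and at least one member per cluster; medoids_sketch is a hypothetical name, and the package may use a different distance.

```julia
# Sketch: for each cluster, pick the member column closest to the cluster mean.
function medoids_sketch(data::AbstractMatrix, assignments::AbstractVector{<:Integer})
    K = maximum(assignments)
    medoids = zeros(size(data, 1), K)
    for k in 1:K
        members = data[:, assignments .== k]
        center = sum(members, dims=2) ./ size(members, 2)
        dists = vec(sum(abs2, members .- center, dims=1))  # squared distance per member
        medoids[:, k] = members[:, argmin(dists)]
    end
    return medoids
end

data = [1.0 2.0 6.0 10.0]                 # T = 1, four periods
medoids = medoids_sketch(data, [1, 1, 1, 2])
println(medoids)  # [2.0 10.0]
```

In the first cluster the mean is 3, so the member 2.0 is selected even though 3.0 itself does not occur in the data.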

ClustForOpt.calc_weightsMethod
calc_weights(clustids::Array{Int}, n_clust::Int)

Calculates weights for clusters, based on clustids that are assigned to a certain cluster. The weights are absolute: weights[i]>=1
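
Absolute weights of this kind amount to counting how many periods are assigned to each cluster. A sketch, with the hypothetical name weights_sketch:

```julia
# Sketch: weight of cluster k = number of periods assigned to cluster k.
function weights_sketch(clustids::AbstractVector{<:Integer}, n_clust::Integer)
    weights = zeros(n_clust)
    for id in clustids
        weights[id] += 1
    end
    return weights
end

println(weights_sketch([1, 2, 2, 3, 2], 3))  # [1.0, 3.0, 1.0]
```

The weights sum to the number of original periods, which matches the sum(weights)=365 convention for a full year of daily periods.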

ClustForOpt.check_kw_argsMethod
check_kw_args(region,opt_problems,norm_op,norm_scope,method,representation)

checks if the arguments supplied for run_clust are supported

ClustForOpt.find_column_nameMethod
    find_column_name(df::DataFrame, name_itr::Array{Symbol,1})

Finds which of the supported names in name_itr is used as a column name in df.

ClustForOpt.get_mean_dataMethod
  get_mean_data(data::Array, clustids::Array{Int,1})

Calculates the mean of data. The number of columns is kept the same: the mean is calculated over all columns that share a clustid, and each of those columns is set to that mean.
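
A small sketch of this behaviour on a plain matrix, assuming one column per period. The name mean_data_sketch is made up for the example and it is not the package function.

```julia
using Statistics

# Sketch: replace every column by the mean of all columns with the same clustid.
function mean_data_sketch(data::AbstractMatrix, clustids::AbstractVector{<:Integer})
    out = similar(data, Float64)
    for id in unique(clustids)
        cols = findall(==(id), clustids)
        out[:, cols] .= mean(data[:, cols], dims=2)   # same mean in all member columns
    end
    return out
end

data = [1.0 3.0 10.0; 2.0 6.0 20.0]
println(mean_data_sketch(data, [1, 1, 2]))  # [2.0 2.0 10.0; 4.0 4.0 20.0]
```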

ClustForOpt.input_data_modificationMethod
input_data_modification(data::ClustData,
                             extr_val_idcs::Array{Int,1})

Returns ClustData structs with extreme vals and with remaining input data [data-extreme_vals]. Gives extreme vals the weight that they had in data. This function is needed for the append method for representation modification. Warning: the k_ids have to be monotonically increasing; do not modify clustered data.

ClustForOpt.input_data_modificationMethod
input_data_modification(data::ClustData,extr_val_idcs::Int)

Wrapper function for a single extreme val. Returns ClustData structs with extreme vals and with remaining input data [data-extreme_vals]. Gives extreme vals the weight that they had in data.

ClustForOpt.intraperiod_segmentationMethod
  intraperiod_segmentation(data_merged::ClustDataMerged;n_seg::Int=24,iterations::Int=300,norm_scope::String="full")

!!! Not yet proven implementation of segmentation introduced by Bahl et al. 2018

ClustForOpt.merge_clustids!Method
  merge_clustids!(clustids::Array{Int,1},index::Int)

Calculate the new clustids by merging the cluster of the index provided with the cluster of index+1

ClustForOpt.run_clust_hierarchicalMethod
run_clust_hierarchical(
  data_norm::ClustDataMerged,
  n_clust::Int,
  iterations::Int;
  _dist::SemiMetric = SqEuclidean()
)

Helper function to run run_clust_hierarchical_centroids and run_clust_hierarchical_medoid

ClustForOpt.run_clust_methodMethod
run_clust_method(data::ClustData;
              norm_op::String="zscore",
              norm_scope::String="full",
              method::String="kmeans",
              representation::String="centroid",
              n_clust::Int=5,
              n_seg::Int=data.T,
              n_init::Int=100,
              iterations::Int=300,
              orig_k_ids::Array{Int,1}=Array{Int,1}(),
              kwargs...)

  • method: "kmeans","kmedoids","kmedoids_exact","hierarchical"
  • representation: "centroid","medoid"

ClustForOpt.run_clust_segmentationMethod
  run_clust_segmentation(period::Array{Float64,2};n_seg::Int=24,iterations::Int=300,norm_scope::String="full")

!!! Not yet proven implementation of segmentation introduced by Bahl et al. 2018

ClustForOpt.run_pure_clustMethod
run_pure_clust(data::ClustData; norm_op::String="zscore", norm_scope::String="full", method::String="kmeans", representation::String="centroid", n_clust_1::Int=5, n_clust_2::Int=3, n_seg::Int=data.T, n_init::Int=100, iterations::Int=300, attribute_weights::Dict{String,Float64}=Dict{String,Float64}(), clust::Array{String,1}=Array{String,1}(), get_all_clust_results::Bool=false, kwargs...)

Replace the original timeseries of the attributes in clust with their clustered value

ClustForOpt.simple_extr_val_identMethod
simple_extr_val_ident(data::ClustData,
                           extreme_value_descr_ar::Array{SimpleExtremeValueDescr,1})

Identifies multiple simple extreme values from the data and returns array of column indices of extreme value within data

  • data_type: any attribute from the attributes contained within data
  • extremum: "min" or "max"
  • peak_def: "absolute" or "integral"
ClustForOpt.simple_extr_val_identMethod
simple_extr_val_ident(data::ClustData,
                           extreme_value_descr::SimpleExtremeValueDescr)

Wrapper function for only one simple extreme value: identifies a single simple extreme value from the data and returns column index of extreme value

  • data_type: any attribute from the attributes contained within data
  • extremum: "min" or "max"
  • peak_def: "absolute" or "integral"
  • consecutive_periods: number of consecutive_periods combined to analyze
ClustForOpt.simple_extr_val_identMethod
simple_extr_val_ident(clust_data::ClustData,
                           data_type::String;
                           extremum::String="max",
                           peak_def::String="absolute",
                           consecutive_periods::Int=1)

Identifies a single simple extreme period from the data and returns column index of extreme period

  • data_type: any attribute from the attributes contained within data
  • extremum: "min" or "max"
  • peak_def: "absolute" or "integral"
  • consecutive_periods: The number of consecutive periods that are summed to identify a maximum or minimum. A rolling approach is used: e.g. for consecutive_periods=2: 1) 1st & 2nd periods summed, 2) 2nd & 3rd periods summed, 3) 3rd & 4th ... The min/max of 1), 2), 3)... is determined, and the indices of the two periods where the min/max was identified are returned
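
The rolling approach for consecutive_periods can be sketched on a vector with one value per period (for example the daily peak or the daily integral). The helper rolling_extreme is hypothetical and only illustrates the windowing; it is not the package function.

```julia
# Sketch: sum every window of n consecutive periods and return the indices
# of the periods in the window with the largest (or smallest) sum.
function rolling_extreme(vals::AbstractVector, n::Integer; extremum::String="max")
    sums = [sum(vals[i:i+n-1]) for i in 1:length(vals)-n+1]
    start = extremum == "max" ? argmax(sums) : argmin(sums)
    return collect(start:start+n-1)
end

peaks = [3.0, 9.0, 8.0, 1.0, 2.0]
println(rolling_extreme(peaks, 2))                  # [2, 3]
println(rolling_extreme(peaks, 2; extremum="min"))  # [4, 5]
```

With n = 1 the function reduces to picking the single extreme period.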