ClustForOpt.ClustData
— Type ClustData <: TSData
Contains time series data by attribute (e.g. wind, solar, electricity demand) and respective information.
Fields:
- region::String: optional information to specify the region the data belongs to
- K::Int: number of periods
- T::Int: time steps per period
- data::Dict{String,Array}: dictionary with an entry for each attribute [file name (attribute, e.g. technology)]-[column name (node, e.g. location)]. Each entry of the dictionary is a 2-dimensional time steps T x periods K Array holding the data
- weights::Array{Float64,2}: 1-dimensional periods K Array with the absolute weight for each period. The weight of a period corresponds to the number of days it represents, e.g. for a year of 365 days, sum(weights)=365
- mean::Dict{String,Array}: dictionary with an entry for each attribute [file name (e.g. technology)]-[column name (e.g. location)]. Each entry of the dictionary is a 1-dimensional periods K Array holding the shift of the mean. This is used internally for normalization
- sdv::Dict{String,Array}: dictionary with an entry for each attribute [file name (e.g. technology)]-[column name (e.g. location)]. Each entry of the dictionary is a 1-dimensional periods K Array holding the standard deviation. This is used internally for normalization
- delta_t::Array{Float64,2}: 2-dimensional time steps T x periods K Array with the temporal duration Δt for each time step. The default is that all time steps have the same length
- k_ids::Array{Int}: 1-dimensional original periods I Array indicating which original period is represented by which period K, e.g. if the data is a year of 365 periods, the array has length 365. If an original period is not represented by any period within this ClustData, the entry is 0.
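The field shapes above can be sketched with toy sizes in plain Julia (this is not the package constructor; the attribute name "el_demand-germany" is an assumption):

```julia
# Toy sizes: K = 3 representative periods of T = 24 time steps,
# representing I = 365 original periods.
T, K, I = 24, 3, 365

data    = Dict("el_demand-germany" => rand(T, K))  # one T x K entry per attribute-node
weights = [200.0, 100.0, 65.0]                     # absolute weights: sum(weights) == 365
delta_t = ones(T, K)                               # all time steps have equal length
k_ids   = rand(1:K, I)                             # representative period of each original period

sum(weights) == 365.0  # weights add up to the number of original periods
```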
ClustForOpt.ClustData
— Method ClustData(data::ClustDataMerged)
Constructor 2: convert ClustDataMerged to ClustData.
ClustForOpt.ClustData
— Method ClustData(data::FullInputData,K,T)
Constructor 3: convert FullInputData to ClustData.
ClustForOpt.ClustData
— Method ClustData(region::String,
years::Array{Int,1},
K::Int,
T::Int,
data::Dict{String,Array},
weights::Array{Float64},
delta_t::Array{Float64,2},
k_ids::Array{Int,1};
mean::Dict{String,Array}=Dict{String,Array}(),
sdv::Dict{String,Array}=Dict{String,Array}()
)
Constructor 1 for ClustData: provide data as a dict.
ClustForOpt.ClustDataMerged
— Type ClustDataMerged <: TSData
Contains time series data by attribute (e.g. wind, solar, electricity demand) and respective information.
Fields:
- region::String: optional information to specify the region the data belongs to
- K::Int: number of periods
- T::Int: time steps per period
- data::Array: Array of dimension (time steps T * length(data_type)) x periods K. The first T rows are data type 1, the second T rows are data type 2, ...
- data_type::Array{String}: the data types (attributes) of the data
- weights::Array{Float64,2}: 1-dimensional periods K Array with the absolute weight for each period, e.g. for a year of 365 days, sum(weights)=365
- mean::Dict{String,Array}: dictionary with an entry for each attribute [file name (e.g. technology)]-[column name (e.g. location)]. Each entry of the dictionary is a 1-dimensional periods K Array holding the shift of the mean
- sdv::Dict{String,Array}: dictionary with an entry for each attribute [file name (e.g. technology)]-[column name (e.g. location)]. Each entry of the dictionary is a 1-dimensional periods K Array holding the standard deviation
- delta_t::Array{Float64,2}: 2-dimensional time steps T x periods K Array with the temporal duration Δt for each time step in [h]
- k_ids::Array{Int}: 1-dimensional original periods I Array indicating which original period is represented by which period K. If an original period is not represented by any period within this ClustData, the entry is 0.
ClustForOpt.ClustDataMerged
— Method ClustDataMerged(data::ClustData)
Constructor 2: convert ClustData into merged format.
ClustForOpt.ClustDataMerged
— Method ClustDataMerged(region::String,
years::Array{Int,1},
K::Int,
T::Int,
data::Array,
data_type::Array{String},
weights::Array{Float64},
k_ids::Array{Int,1};
delta_t::Array{Float64,2}=ones(T,K),
mean::Dict{String,Array}=Dict{String,Array}(),
sdv::Dict{String,Array}=Dict{String,Array}()
)
Constructor 1: construct ClustDataMerged.
ClustForOpt.ClustResult
— Type ClustResult <: AbstractClustResult
Contains the results from a clustering run: the data, the cost in terms of the clustering algorithm, and a config dictionary describing the clustering method used.
Fields:
- clust_data::ClustData
- cost::Float64: Cost of the clustering algorithm
- config::Dict{String,Any}: Details on the clustering method used
ClustForOpt.ClustResultAll
— Type ClustResultAll <: AbstractClustResult
Contains the results from a clustering run for all locally converged solutions.
Fields:
- clust_data::ClustData: The best centers, weights, clustids in terms of cost of the clustering algorithm
- cost::Float64: Cost of the clustering algorithm
- config::Dict{String,Any}: Details on the clustering method used
- centers_all::Array{Array{Float64},1}
- weights_all::Array{Array{Float64},1}
- clustids_all::Array{Array{Int,1},1}
- cost_all::Array{Float64,1}
- iter_all::Array{Int,1}
ClustForOpt.SimpleExtremeValueDescr
— Type SimpleExtremeValueDescr
Defines a simple extreme day by its characteristics.
Fields:
- data_type::String: choose one of the attributes from the data you have loaded into ClustData
- extremum::String: "min" or "max"
- peak_def::String: "absolute" or "integral"
- consecutive_periods::Int: for a single extreme day, set as 1
ClustForOpt.SimpleExtremeValueDescr
— Method SimpleExtremeValueDescr(data_type::String,
extremum::String,
peak_def::String)
Defines a simple extreme day by its characteristics.
Input options:
- data_type::String: choose one of the attributes from the data you have loaded into ClustData
- extremum::String: "min" or "max"
- peak_def::String: "absolute" or "integral"
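A minimal usage sketch (requires ClustForOpt; the attribute name is an assumption):

```julia
using ClustForOpt

# describe the single day with the highest absolute electricity demand
ev = SimpleExtremeValueDescr("el_demand-germany", "max", "absolute")
```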
ClustForOpt.calc_SSE
— Method calc_SSE(data::Array,centers::Array,assignments::Array)
Calculates the sum of squared errors (SSE) between the cluster representations and the data.
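The SSE computation can be sketched in plain Julia (toy example, not the package internals):

```julia
# data is T x K (columns are periods), centers is T x n_clust,
# assignments[k] is the cluster index of period k.
function sse(data::Matrix{Float64}, centers::Matrix{Float64}, assignments::Vector{Int})
    s = 0.0
    for k in 1:size(data, 2)
        s += sum(abs2, data[:, k] .- centers[:, assignments[k]])
    end
    return s
end

data = [1.0 3.0; 1.0 3.0]            # two periods of length 2
centers = reshape([2.0, 2.0], 2, 1)  # one center
sse(data, centers, [1, 1])           # → 4.0
```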
ClustForOpt.combine_timeseries_weather_data
— Method combine_timeseries_weather_data(ts::ClustData,ts_weather::ClustData)
- ts is the shorter time series, e.g. the demand
- ts_weather is the longer time series with the weather information
The ts time series is repeated to match the number of periods of the longer ts_weather time series. If the number of periods of the ts_weather data is not a multiple of the ts time series, the necessary number of ts periods 1 to x is appended to the end of the new combined time series.
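The wrap-around repetition can be sketched with period indices (hypothetical helper in plain Julia, not the package code):

```julia
# Repeat K_short period indices until they cover K_long periods,
# wrapping around to periods 1 to x at the end.
repeat_period_ids(K_short::Int, K_long::Int) = [mod1(k, K_short) for k in 1:K_long]

repeat_period_ids(3, 8)  # → [1, 2, 3, 1, 2, 3, 1, 2]
```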
ClustForOpt.extreme_val_output
— Method extreme_val_output(data::ClustData,
extr_val_idcs::Array{Int,1};
rep_mod_method="feasibility")
Takes indices as input and returns a ClustData struct that contains the extreme vals from within data.
ClustForOpt.extreme_val_output
— Method extreme_val_output(data::ClustData,extr_val_idcs::Int;rep_mod_method="feasibility")
Wrapper function for a single extreme val. Takes an index as input and returns a ClustData struct that contains the extreme val from within data.
ClustForOpt.get_sup_kw_args
— Method get_sup_kw_args
Returns the supported keyword arguments for the clustering function run_clust().
ClustForOpt.kmedoids_exact
— Method kmedoids_exact(
data::Array{Float64},
nclust::Int,
_dist::SemiMetric = SqEuclidean(),
env::Any;
)
Performs the exact k-medoids algorithm as in Kotzur et al., 2017. data has the dimensions hours x days. Requires an optimizer, e.g. optimizer=Gurobi.Optimizer.
ClustForOpt.load_timeseries_data
— Method load_timeseries_data(data_path::String;
region::String="none",
T::Int=24,
years::Array{Int,1}=[2016],
att::Array{String,1}=Array{String,1}())
Return all time series as ClustData struct that are stored as csv files in the specified path.
- Loads *.csv files in the folder or the file data_path
- Loads all attributes (all *.csv files) if the att-Array is empty, or only the files specified in att
- The *.csv files shall have the following structure and must have the same length:
Timestamp | Year | [column names...] |
---|---|---|
[iterator] | [year] | [values] |
- The first column of a .csv file should be called Timestamp if it contains a time iterator
- The second column should be called Year and contain the corresponding year
- Each other column should contain the time series data. For one-node systems, only one column is used; for an N-node system, N columns need to be used. In an N-node system, each column specifies time series data at a specific geolocation.
- Returns the time series as a ClustData struct
- The .data field of the ClustData struct is a dictionary where each column in a [file name].csv file is a key (called "[file name]-[column name]"). The file name should correspond to the attribute name, and the column name should correspond to the node name.
Optional inputs to load_timeseries_data:
- region: region descriptor
- T: number of time steps per period
- years::Array{Int,1}: the years to be selected from the csv file, as specified in the Year column
- att::Array{String,1}: the attributes to be loaded. If left empty, all attributes will be loaded.
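A usage sketch (requires ClustForOpt; the folder name and its csv contents are assumptions):

```julia
using ClustForOpt

# "data/GER_1" is assumed to contain e.g. el_demand.csv and wind.csv,
# each with a Timestamp column, a Year column, and one column per node
ts_input_data = load_timeseries_data("data/GER_1"; region="GER_1", T=24, years=[2016])
```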
ClustForOpt.load_timeseries_data
— Method load_timeseries_data(existing_data::Symbol;
region::String="none",
T::Int=24,
years::Array{Int,1}=[2016],
att::Array{String,1}=Array{String,1}())
Return time series of example data sets as ClustData struct.
The choice of example data set is given by e.g. existing_data=:CEP_GER1. Example data sets are:
- :DAM_CA: hourly day-ahead market electricity prices for California-Stanford, 2015
- :DAM_GER: hourly day-ahead market electricity prices for Germany, 2015
- :CEP_GER1: hourly wind, solar, and demand data for Germany, one node
- :CEP_GER18: hourly wind, solar, and demand data for Germany, 18 nodes
Optional inputs to load_timeseries_data:
- region: region descriptor
- T: number of time steps per period
- years::Array{Int,1}: the years to be selected from the csv file, as specified in the Year column
- att::Array{String,1}: the attributes to be loaded. If left empty, all attributes will be loaded.
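A usage sketch for the example data sets (requires ClustForOpt):

```julia
using ClustForOpt

# hourly wind, solar, and demand data for Germany, one node
ts_input_data = load_timeseries_data(:CEP_GER1; T=24, years=[2016])
```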
ClustForOpt.representation_modification
— Method representation_modification(extr_vals::ClustData,clust_data::ClustData)
Merges the clustered data and the extreme vals into one ClustData struct. The weights are chosen according to the rep_mod_method.
ClustForOpt.representation_modification
— Method representation_modification(full_data::ClustData,clust_data::ClustData,extr_val_idcs::Array{Int,1};rep_mod_method::String="feasibility")
Merges the clustered data and the extreme vals into one ClustData struct. The weights are chosen according to the rep_mod_method.
ClustForOpt.representation_modification
— Method representation_modification(full_data::ClustData,clust_data::ClustData,extr_val_idcs::Int;rep_mod_method::String="feasibility")
Wrapper function for a single extreme val. Merges the clustered data and the extreme vals into one ClustData struct. The weights are chosen according to the rep_mod_method.
ClustForOpt.resize_medoids
— Method resize_medoids(data::Array,centers::Array,weights::Array)
This is the DEFAULT resize-medoids function. Takes in centers (typically medoids) and normalizes them such that the yearly average of the clustered data is the same as the yearly average of the original data.
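The rescaling idea can be sketched in plain Julia (hypothetical helper, not the package implementation):

```julia
# Scale the centers by a single factor so that the weighted average over the
# centers matches the average over the original periods.
function rescale_centers(data::Matrix{Float64}, centers::Matrix{Float64}, weights::Vector{Float64})
    target  = sum(data) / size(data, 2)                # average period total of the original data
    current = sum(centers .* weights') / sum(weights)  # weighted average period total of the centers
    return centers .* (target / current)
end

rescale_centers([1.0 3.0], reshape([1.0], 1, 1), [2.0])  # → 1x1 matrix containing 2.0
```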
ClustForOpt.run_battery_opt
— Method run_battery_opt(data::ClustData)
Operational battery storage optimization problem. Runs every day separately and adds up the results in the end.
ClustForOpt.run_clust
— Method run_clust(
data::ClustData,
n_clust_ar::Array{Int,1};
norm_op::String="zscore",
norm_scope::String="full",
method::String="kmeans",
representation::String="centroid",
n_init::Int=100,
iterations::Int=300,
save::String="",
kwargs...)
Run multiple number of clusters k and return an array of results. This function is a wrapper function around run_clust().
ClustForOpt.run_clust
— Method run_clust(data::ClustData;
norm_op::String="zscore",
norm_scope::String="full",
method::String="kmeans",
representation::String="centroid",
n_clust::Int=5,
n_seg::Int=data.T,
n_init::Int=1000,
iterations::Int=300,
attribute_weights::Dict{String,Float64}=Dict{String,Float64}(),
save::String="",
get_all_clust_results::Bool=false,
kwargs...)
Take input data of dimensionality N x T and cluster it into data of dimensionality K x T.
The following combinations of method and representation are supported by run_clust:
Name | method | representation | comment |
---|---|---|---|
k-means clustering | <kmeans> | <centroid> | - |
k-means clustering with medoid representation | <kmeans> | <medoid> | - |
k-medoids clustering (partitional) | <kmedoids> | <medoid> | - |
k-medoids clustering (exact) | <kmedoids_exact> | <medoid> | requires Gurobi and the additional keyword argument kmexact_optimizer . See [examples] folder for example use. Set n_init=1 |
hierarchical clustering with centroid representation | <hierarchical> | <centroid> | set n_init=1 |
hierarchical clustering with medoid representation | <hierarchical> | <medoid> | set n_init=1 |
The other optional inputs are:
Keyword | options | comment |
---|---|---|
norm_op | zscore | Normalization operation. 0-1 not yet implemented |
norm_scope | full ,sequence ,hourly | Normalization scope. The default (full ) is used in most of the current literature. |
n_clust | e.g. 5 | Number of clusters that you want to obtain |
n_seg | e.g. 10 | Number of segments per period. Not yet implemented, keep as default value. |
n_init | e.g. 1000 | Number of initializations of locally converging clustering algorithms. 10000 often yields very stable results. |
iterations | e.g. 300 | Internal parameter of the partitional clustering algorithms. |
attribute_weights | e.g. Dict("wind-germany"=>3,"solar-germany"=>1,"el_demand-germany"=>5) | weights the respective attributes when clustering. In this example, demand and wind are deemed more important than solar. |
save | false | Save clustered data as csv or jld2 file. Not yet implemented. |
get_all_clust_results | true, false | false gives a ClustResult struct with only the best locally converged solution in terms of clustering measure. true gives a ClustResultAll struct as output, with all locally converged solutions. |
kwargs | e.g. kmexact_optimizer | optional keyword arguments that are required for specific methods, for example k-medoids exact. |
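A typical call (sketch; requires ClustForOpt and a loaded ClustData struct):

```julia
using ClustForOpt

ts_input_data = load_timeseries_data(:CEP_GER1)
clust_res = run_clust(ts_input_data; method="kmeans", representation="centroid",
                      n_clust=5, n_init=100)
clust_res.clust_data  # ClustData with K = 5 representative periods
```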
ClustForOpt.run_gas_opt
— Method run_gas_opt(data::ClustData)
Operational gas turbine optimization problem. Runs every day separately and adds up the results in the end.
ClustForOpt.sakoe_chiba_band
— Method sakoe_chiba_band(r::Int,l::Int)
Calculates the minimum and maximum allowed indices for an l x l windowed matrix for the Sakoe-Chiba band (see Sakoe and Chiba, 1978). Input: radius r, such that |i(k)-j(k)| <= r; length l: dimension 2 of the matrix.
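The band bounds can be sketched in plain Julia (hypothetical helper, not the package code):

```julia
# For each column j of an l x l matrix, only rows i with |i - j| <= r
# lie inside the Sakoe-Chiba band.
function band_bounds(r::Int, l::Int)
    i_min = [max(1, j - r) for j in 1:l]
    i_max = [min(l, j + r) for j in 1:l]
    return i_min, i_max
end

band_bounds(1, 4)  # → ([1, 1, 2, 3], [2, 3, 4, 4])
```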
ClustForOpt.simple_extr_val_sel
— Method simple_extr_val_sel(data::ClustData,
extreme_value_descr_ar::Array{SimpleExtremeValueDescr,1};
rep_mod_method::String="feasibility")
Selects simple extreme values and returns modified data, extreme values, and the corresponding indices.
Input options for rep_mod_method::String: "feasibility" or "append"
ClustForOpt.simple_extr_val_sel
— Method simple_extr_val_sel(data::ClustData,
extreme_value_descr::SimpleExtremeValueDescr;
rep_mod_method::String="feasibility")
Wrapper function for only one simple extreme value. Selects simple extreme values and returns modified data, extreme values, and the corresponding indices.
ClustForOpt.sort_centers
— Method sort_centers(centers::Array,weights::Array)
- centers: hours x days, e.g. [24 x 9]
- weights: days, e.g. [9], unsorted
Sorts the centers by their weights, from largest to smallest.
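The sorting step can be sketched in plain Julia (hypothetical helper):

```julia
# Reorder the columns of centers so the largest weight comes first.
function sort_centers_sketch(centers::Matrix{Float64}, weights::Vector{Float64})
    order = sortperm(weights; rev=true)  # indices of weights, largest first
    return centers[:, order], weights[order]
end

c, w = sort_centers_sketch([1.0 2.0 3.0], [5.0, 20.0, 10.0])
# → c == [2.0 3.0 1.0], w == [20.0, 10.0, 5.0]
```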
ClustForOpt.undo_z_normalize
— Method undo_z_normalize(data_norm, mn, sdv; idx=[])
Undo the z-normalization of data with mean and sdv by hour. data_norm: normalized data, input format (1st dimension: 24 hours, 2nd dimension: number of days). mn: 24-hour vector with the hourly means. sdv: 24-hour vector with the hourly standard deviations.
ClustForOpt.undo_z_normalize
— Method undo_z_normalize(data_norm_merged::Array,mn::Dict{String,Array},sdv::Dict{String,Array};idx=[])
Although idx is optional, it should usually be provided (as is done by default within the function call) in order to enable sequence-based normalization.
ClustForOpt.z_normalize
— Method z_normalize(data::Array;scope="full")
Z-normalize data with mean and sdv by hour. data: input format (1st dimension: 24 hours, 2nd dimension: number of days). scope: "full": one mean and sdv for the full data set; "hourly": univariate scaling, each hour is scaled separately; "sequence": sequence-based scaling.
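The "full" scope can be sketched in plain Julia (hypothetical helper, not the package code):

```julia
using Statistics

# One mean and standard deviation for the full data set.
function z_normalize_full(data::Matrix{Float64})
    mn, sd = mean(data), std(data)
    return (data .- mn) ./ sd, mn, sd
end

norm_data, mn, sd = z_normalize_full([1.0 2.0; 3.0 4.0])
norm_data .* sd .+ mn  # recovers the original data (up to floating point)
```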
ClustForOpt.z_normalize
— Method z_normalize(data::ClustData;scope="full")
scope: "full", "sequence", "hourly"
ClustForOpt.kmedoidsResult
— Type kmedoidsResult
Holds the results of a k-medoids run.
ClustForOpt.add_timeseries_data!
— Method add_timeseries_data!(dt::Dict{String,Array}, data::DataFrame; K::Int=0, T::Int=24, years::Array{Int,1}=[2016])
Selects first the years and then the data points, so that their number is a multiple of T and matches the other time series.
ClustForOpt.attribute_weighting
— Method attribute_weighting(data::ClustData,attribute_weights::Dict{String,Float64})
Applies the different attribute weights based on the dictionary entry for each technology or exact name.
ClustForOpt.calc_centroids
— Method calc_centroids(data::Array,assignments::Array)
Given the data and cluster assignments, this function finds the centroid of the respective clusters.
ClustForOpt.calc_medoids
— Method calc_medoids(data::Array,assignments::Array)
Given the data and cluster assignments, this function finds the medoids that are closest to the cluster center.
ClustForOpt.calc_weights
— Method calc_weights(clustids::Array{Int}, n_clust::Int)
Calculates weights for clusters, based on clustids that are assigned to a certain cluster. The weights are absolute: weights[i]>=1
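The weight calculation can be sketched in plain Julia (hypothetical helper):

```julia
# Absolute weights: the number of original periods assigned to each cluster.
function calc_weights_sketch(clustids::Vector{Int}, n_clust::Int)
    weights = zeros(Float64, n_clust)
    for id in clustids
        weights[id] += 1.0
    end
    return weights
end

calc_weights_sketch([1, 2, 2, 3, 2], 3)  # → [1.0, 3.0, 1.0]
```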
ClustForOpt.check_kw_args
— Method check_kw_args(region,opt_problems,norm_op,norm_scope,method,representation)
Checks whether the arguments supplied for run_clust are supported.
ClustForOpt.find_column_name
— Method find_column_name(df::DataFrame, name_itr::Array{Symbol,1})
Finds which of the supported names in name_itr is used as a column name in df.
ClustForOpt.get_mean_data
— Method get_mean_data(data::Array, clustids::Array{Int,1})
Calculates the mean of the data: the number of columns is kept the same; the mean is calculated over the aggregated columns and is the same in all columns with the same clustid.
ClustForOpt.input_data_modification
— Method input_data_modification(data::ClustData,
extr_val_idcs::Array{Int,1})
Returns ClustData structs with the extreme vals and with the remaining input data [data - extreme_vals]. Gives the extreme vals the weight that they had in data. This function is needed for the append method of representation modification. Note: the k_ids have to be monotonically increasing, so don't apply this to already clustered data!
ClustForOpt.input_data_modification
— Method input_data_modification(data::ClustData,extr_val_idcs::Int)
Wrapper function for a single extreme val. Returns ClustData structs with the extreme val and with the remaining input data [data - extreme_vals]. Gives the extreme val the weight that it had in data.
ClustForOpt.intraperiod_segmentation
— Method intraperiod_segmentation(data_merged::ClustDataMerged;n_seg::Int=24,iterations::Int=300,norm_scope::String="full")
!!! Not yet proven: implementation of the segmentation introduced by Bahl et al., 2018.
ClustForOpt.merge_clustids!
— Method merge_clustids!(clustids::Array{Int,1},index::Int)
Calculates the new clustids by merging the cluster at the provided index with the cluster at index+1.
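The merge can be sketched in plain Julia (hypothetical helper mirroring the described behavior):

```julia
# ids equal to index+1 are merged into index; all higher ids shift down by one
function merge_clustids_sketch!(clustids::Vector{Int}, index::Int)
    for (i, id) in enumerate(clustids)
        if id > index
            clustids[i] = id - 1
        end
    end
    return clustids
end

merge_clustids_sketch!([1, 2, 3, 2, 4], 2)  # → [1, 2, 2, 2, 3]
```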
ClustForOpt.run_clust_hierarchical
— Method run_clust_hierarchical(
data_norm::ClustDataMerged,
n_clust::Int,
iterations::Int;
_dist::SemiMetric = SqEuclidean()
)
Helper function to run run_clust_hierarchical_centroid and run_clust_hierarchical_medoid.
ClustForOpt.run_clust_hierarchical_centroid
— Method run_clust_hierarchical_centroid(
data_norm::ClustDataMerged,
n_clust::Int,
iterations::Int;
_dist::SemiMetric = SqEuclidean()
)
ClustForOpt.run_clust_hierarchical_medoid
— Method run_clust_hierarchical_medoid(
data_norm::ClustDataMerged,
n_clust::Int,
iterations::Int;
_dist::SemiMetric = SqEuclidean()
)
ClustForOpt.run_clust_hierarchical_partitional
— Method run_clust_hierarchical_partitional(data::Array, n_seg::Int)
!!! Not yet proven: uses the provided data and number of segments to aggregate them together.
ClustForOpt.run_clust_kmeans_centroid
— Method run_clust_kmeans_centroid(data_norm::ClustDataMerged,n_clust::Int,iterations::Int)
ClustForOpt.run_clust_kmeans_medoid
— Method run_clust_kmeans_medoid(
data_norm::ClustDataMerged,
n_clust::Int,
iterations::Int
)
ClustForOpt.run_clust_kmedoids_exact_medoid
— Method run_clust_kmedoids_exact_medoid(
data_norm::ClustDataMerged,
n_clust::Int,
iterations::Int;
gurobi_env=0
)
ClustForOpt.run_clust_kmedoids_medoid
— Method run_clust_kmedoids_medoid(
data_norm::ClustDataMerged,
n_clust::Int,
iterations::Int
)
ClustForOpt.run_clust_method
— Method run_clust_method(data::ClustData;
norm_op::String="zscore",
norm_scope::String="full",
method::String="kmeans",
representation::String="centroid",
n_clust::Int=5,
n_seg::Int=data.T,
n_init::Int=100,
iterations::Int=300,
orig_k_ids::Array{Int,1}=Array{Int,1}(),
kwargs...)
method: "kmeans", "kmedoids", "kmedoids_exact", "hierarchical"; representation: "centroid", "medoid"
ClustForOpt.run_clust_segmentation
— Method run_clust_segmentation(period::Array{Float64,2};n_seg::Int=24,iterations::Int=300,norm_scope::String="full")
!!! Not yet proven: implementation of the segmentation introduced by Bahl et al., 2018.
ClustForOpt.run_pure_clust
— Method run_pure_clust(data::ClustData; norm_op::String="zscore", norm_scope::String="full", method::String="kmeans", representation::String="centroid", n_clust_1::Int=5, n_clust_2::Int=3, n_seg::Int=data.T, n_init::Int=100, iterations::Int=300, attribute_weights::Dict{String,Float64}=Dict{String,Float64}(), clust::Array{String,1}=Array{String,1}(), get_all_clust_results::Bool=false, kwargs...)
Replaces the original time series of the attributes in clust with their clustered values.
ClustForOpt.set_clust_config
— Method set_clust_config(;kwargs...)
Adds the kwargs to a new dictionary with the variables as entries.
ClustForOpt.simple_extr_val_ident
— Method simple_extr_val_ident(data::ClustData,
extreme_value_descr_ar::Array{SimpleExtremeValueDescr,1})
Identifies multiple simple extreme values from the data and returns an array of the column indices of the extreme values within data.
- data_type: any attribute from the attributes contained within data
- extremum: "min" or "max"
- peak_def: "absolute" or "integral"
ClustForOpt.simple_extr_val_ident
— Method simple_extr_val_ident(data::ClustData,
extreme_value_descr::SimpleExtremeValueDescr)
Wrapper function for only one simple extreme value: identifies a single simple extreme value from the data and returns the column index of the extreme value.
- data_type: any attribute from the attributes contained within data
- extremum: "min" or "max"
- peak_def: "absolute" or "integral"
- consecutive_periods: number of consecutive periods combined for the analysis
ClustForOpt.simple_extr_val_ident
— Method simple_extr_val_ident(clust_data::ClustData,
data_type::String;
extremum::String="max",
peak_def::String="absolute",
consecutive_periods::Int=1)
Identifies a single simple extreme period from the data and returns the column index of the extreme period.
- data_type: any attribute from the attributes contained within data
- extremum: "min" or "max"
- peak_def: "absolute" or "integral"
- consecutive_periods: the number of consecutive periods that are summed to identify a maximum or minimum. A rolling approach is used, e.g. for consecutive_periods=2: 1) 1st & 2nd periods summed, 2) 2nd & 3rd periods summed, 3) 3rd & 4th ... The min/max over 1), 2), 3), ... is determined, and the indices of the periods where the min/max was identified are returned.
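The rolling approach can be sketched in plain Julia (hypothetical helper):

```julia
# Sum each window of n consecutive period totals and return the starting
# index of the maximal window.
function max_window_start(period_totals::Vector{Float64}, n::Int)
    sums = [sum(period_totals[k:k+n-1]) for k in 1:length(period_totals)-n+1]
    return argmax(sums)
end

max_window_start([1.0, 4.0, 5.0, 2.0], 2)  # → 2 (periods 2 & 3 sum to 9.0)
```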