ClustForOpt.ClustData
— Type ClustData <: TSData
Contains time series data by attribute (e.g. wind, solar, electricity demand) and respective information.
Fields:
- region::String: optional information to specify the region the data belongs to
- K::Int: number of periods
- T::Int: time steps per period
- data::Dict{String,Array}: dictionary with an entry for each attribute [file name (attribute, e.g. technology)]-[column name (node, e.g. location)]. Each entry of the dictionary is a 2-dimensional time steps T x periods K Array holding the data
- weights::Array{Float64,2}: 1-dimensional periods K Array with the absolute weight for each period. The weight of a period corresponds to the number of days it represents, e.g. for a year of 365 days, sum(weights)=365
- mean::Dict{String,Array}: dictionary with an entry for each attribute [file name (e.g. technology)]-[column name (e.g. location)]. Each entry of the dictionary is a 1-dimensional periods K Array holding the shift of the mean. This is used internally for normalization
- sdv::Dict{String,Array}: dictionary with an entry for each attribute [file name (e.g. technology)]-[column name (e.g. location)]. Each entry of the dictionary is a 1-dimensional periods K Array holding the standard deviation. This is used internally for normalization
- delta_t::Array{Float64,2}: 2-dimensional time steps T x periods K Array with the temporal duration Δt for each time step. The default is that all time steps have the same length
- k_ids::Array{Int}: 1-dimensional original periods I Array indicating which original period is represented by which period K, e.g. if the data is a year of 365 periods, the array has length 365. If an original period is not represented by any period within this ClustData, the entry is 0.
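The field shapes above can be sketched with toy sizes in plain Julia (this is not the package constructor; the attribute name "el_demand-germany" is an assumption):

```julia
# Toy sizes: K = 3 representative periods of T = 24 time steps,
# representing I = 365 original periods.
T, K, I = 24, 3, 365

data    = Dict("el_demand-germany" => rand(T, K))  # one T x K entry per attribute-node
weights = [200.0, 100.0, 65.0]                     # absolute weights: sum(weights) == 365
delta_t = ones(T, K)                               # all time steps have equal length
k_ids   = rand(1:K, I)                             # representative period of each original period

sum(weights) == 365.0  # weights add up to the number of original periods
```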
ClustForOpt.ClustData
— Method ClustData(data::ClustDataMerged)
Constructor 2: convert ClustDataMerged to ClustData.
ClustForOpt.ClustData
— Method ClustData(data::FullInputData,K,T)
Constructor 3: convert FullInputData to ClustData.
ClustForOpt.ClustData
— Method ClustData(region::String,
years::Array{Int,1},
K::Int,
T::Int,
data::Dict{String,Array},
weights::Array{Float64},
delta_t::Array{Float64,2},
k_ids::Array{Int,1};
mean::Dict{String,Array}=Dict{String,Array}(),
sdv::Dict{String,Array}=Dict{String,Array}()
)
Constructor 1 for ClustData: provide data as a dict.
ClustForOpt.ClustDataMerged
— Type ClustDataMerged <: TSData
Contains time series data by attribute (e.g. wind, solar, electricity demand) and respective information.
Fields:
- region::String: optional information to specify the region the data belongs to
- K::Int: number of periods
- T::Int: time steps per period
- data::Array: Array of dimension (time steps T * length(data_type)) x periods K. The first T rows are data type 1, the second T rows are data type 2, ...
- data_type::Array{String}: the data types (attributes) of the data
- weights::Array{Float64,2}: 1-dimensional periods K Array with the absolute weight for each period, e.g. for a year of 365 days, sum(weights)=365
- mean::Dict{String,Array}: dictionary with an entry for each attribute [file name (e.g. technology)]-[column name (e.g. location)]. Each entry of the dictionary is a 1-dimensional periods K Array holding the shift of the mean
- sdv::Dict{String,Array}: dictionary with an entry for each attribute [file name (e.g. technology)]-[column name (e.g. location)]. Each entry of the dictionary is a 1-dimensional periods K Array holding the standard deviation
- delta_t::Array{Float64,2}: 2-dimensional time steps T x periods K Array with the temporal duration Δt for each time step in [h]
- k_ids::Array{Int}: 1-dimensional original periods I Array indicating which original period is represented by which period K. If an original period is not represented by any period within this ClustData, the entry is 0.
ClustForOpt.ClustDataMerged
— Method ClustDataMerged(data::ClustData)
Constructor 2: convert ClustData into merged format.
ClustForOpt.ClustDataMerged
— Method ClustDataMerged(region::String,
years::Array{Int,1},
K::Int,
T::Int,
data::Array,
data_type::Array{String},
weights::Array{Float64},
k_ids::Array{Int,1};
delta_t::Array{Float64,2}=ones(T,K),
mean::Dict{String,Array}=Dict{String,Array}(),
sdv::Dict{String,Array}=Dict{String,Array}()
)
Constructor 1: construct ClustDataMerged.
ClustForOpt.ClustResult
— Type ClustResult <: AbstractClustResult
Contains the results from a clustering run: the data, the cost in terms of the clustering algorithm, and a config dictionary describing the clustering method used.
Fields:
- clust_data::ClustData
- cost::Float64: Cost of the clustering algorithm
- config::Dict{String,Any}: Details on the clustering method used
ClustForOpt.ClustResultAll
— Type ClustResultAll <: AbstractClustResult
Contains the results from a clustering run for all locally converged solutions.
Fields:
- clust_data::ClustData: The best centers, weights, clustids in terms of cost of the clustering algorithm
- cost::Float64: Cost of the clustering algorithm
- config::Dict{String,Any}: Details on the clustering method used
- centers_all::Array{Array{Float64},1}
- weights_all::Array{Array{Float64},1}
- clustids_all::Array{Array{Int,1},1}
- cost_all::Array{Float64,1}
- iter_all::Array{Int,1}
ClustForOpt.SimpleExtremeValueDescr
— Type SimpleExtremeValueDescr
Defines a simple extreme day by its characteristics.
Fields:
- data_type::String: choose one of the attributes from the data you have loaded into ClustData
- extremum::String: "min" or "max"
- peak_def::String: "absolute" or "integral"
- consecutive_periods::Int: for a single extreme day, set as 1
ClustForOpt.SimpleExtremeValueDescr
— Method SimpleExtremeValueDescr(data_type::String,
extremum::String,
peak_def::String)
Defines a simple extreme day by its characteristics.
Input options:
- data_type::String: choose one of the attributes from the data you have loaded into ClustData
- extremum::String: "min" or "max"
- peak_def::String: "absolute" or "integral"
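A minimal usage sketch (requires ClustForOpt; the attribute name is an assumption):

```julia
using ClustForOpt

# describe the single day with the highest absolute electricity demand
ev = SimpleExtremeValueDescr("el_demand-germany", "max", "absolute")
```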
ClustForOpt.calc_SSE
— Method calc_SSE(data::Array,centers::Array,assignments::Array)
Calculates the sum of squared errors (SSE) between the cluster representations and the data.
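The SSE computation can be sketched in plain Julia (toy example, not the package internals):

```julia
# data is T x K (columns are periods), centers is T x n_clust,
# assignments[k] is the cluster index of period k.
function sse(data::Matrix{Float64}, centers::Matrix{Float64}, assignments::Vector{Int})
    s = 0.0
    for k in 1:size(data, 2)
        s += sum(abs2, data[:, k] .- centers[:, assignments[k]])
    end
    return s
end

data = [1.0 3.0; 1.0 3.0]            # two periods of length 2
centers = reshape([2.0, 2.0], 2, 1)  # one center
sse(data, centers, [1, 1])           # → 4.0
```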
ClustForOpt.combine_timeseries_weather_data
— Method combine_timeseries_weather_data(ts::ClustData,ts_weather::ClustData)
- ts is the shorter time series, e.g. the demand
- ts_weather is the longer time series with the weather information
The ts time series is repeated to match the number of periods of the longer ts_weather time series. If the number of periods of the ts_weather data is not a multiple of the ts time series, the necessary number of ts periods 1 to x is appended to the end of the new combined time series.
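The wrap-around repetition can be sketched with period indices (hypothetical helper in plain Julia, not the package code):

```julia
# Repeat K_short period indices until they cover K_long periods,
# wrapping around to periods 1 to x at the end.
repeat_period_ids(K_short::Int, K_long::Int) = [mod1(k, K_short) for k in 1:K_long]

repeat_period_ids(3, 8)  # → [1, 2, 3, 1, 2, 3, 1, 2]
```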
ClustForOpt.extreme_val_output
— Method extreme_val_output(data::ClustData,
extr_val_idcs::Array{Int,1};
rep_mod_method="feasibility")
Takes indices as input and returns a ClustData struct that contains the extreme vals from within data.
ClustForOpt.extreme_val_output
— Method extreme_val_output(data::ClustData,extr_val_idcs::Int;rep_mod_method="feasibility")
Wrapper function for a single extreme val. Takes an index as input and returns a ClustData struct that contains the extreme val from within data.
ClustForOpt.get_sup_kw_args
— Method get_sup_kw_args
Returns the supported keyword arguments for the clustering function run_clust().
ClustForOpt.kmedoids_exact
— Method kmedoids_exact(
data::Array{Float64},
nclust::Int,
_dist::SemiMetric = SqEuclidean(),
env::Any;
)
Performs the exact k-medoids algorithm as in Kotzur et al., 2017. data has the dimensions hours x days. Requires an optimizer, e.g. optimizer=Gurobi.Optimizer.
ClustForOpt.load_timeseries_data
— Method load_timeseries_data(data_path::String;
region::String="none",
T::Int=24,
years::Array{Int,1}=[2016],
att::Array{String,1}=Array{String,1}())
Return all time series as ClustData struct that are stored as csv files in the specified path.
- Loads *.csv files in the folder or the file data_path
- Loads all attributes (all *.csv files) if the att-Array is empty, or only the files specified in att
- The *.csv files shall have the following structure and must have the same length:
Timestamp | Year | [column names...] |
---|---|---|
[iterator] | [year] | [values] |
- The first column of a .csv file should be called Timestamp if it contains a time iterator
- The second column should be called Year and contain the corresponding year
- Each other column should contain the time series data. For one-node systems, only one column is used; for an N-node system, N columns need to be used. In an N-node system, each column specifies time series data at a specific geolocation.
- Returns the time series as a ClustData struct
- The .data field of the ClustData struct is a dictionary where each column in a [file name].csv file is a key (called "[file name]-[column name]"). The file name should correspond to the attribute name, and the column name should correspond to the node name.
Optional inputs to load_timeseries_data:
- region: region descriptor
- T: number of time steps per period
- years::Array{Int,1}: the years to be selected from the csv file, as specified in the Year column
- att::Array{String,1}: the attributes to be loaded. If left empty, all attributes will be loaded.
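A usage sketch (requires ClustForOpt; the folder name and its csv contents are assumptions):

```julia
using ClustForOpt

# "data/GER_1" is assumed to contain e.g. el_demand.csv and wind.csv,
# each with a Timestamp column, a Year column, and one column per node
ts_input_data = load_timeseries_data("data/GER_1"; region="GER_1", T=24, years=[2016])
```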
ClustForOpt.load_timeseries_data
— Method load_timeseries_data(existing_data::Symbol;
region::String="none",
T::Int=24,
years::Array{Int,1}=[2016],
att::Array{String,1}=Array{String,1}())
Return time series of example data sets as ClustData struct.
The choice of example data set is given by e.g. existing_data=:CEP_GER1. Example data sets are:
- :DAM_CA: hourly day-ahead market electricity prices for California-Stanford, 2015
- :DAM_GER: hourly day-ahead market electricity prices for Germany, 2015
- :CEP_GER1: hourly wind, solar, and demand data for Germany, one node
- :CEP_GER18: hourly wind, solar, and demand data for Germany, 18 nodes
Optional inputs to load_timeseries_data:
- region: region descriptor
- T: number of time steps per period
- years::Array{Int,1}: the years to be selected from the csv file, as specified in the Year column
- att::Array{String,1}: the attributes to be loaded. If left empty, all attributes will be loaded.
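A usage sketch for the example data sets (requires ClustForOpt):

```julia
using ClustForOpt

# hourly wind, solar, and demand data for Germany, one node
ts_input_data = load_timeseries_data(:CEP_GER1; T=24, years=[2016])
```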
ClustForOpt.representation_modification
— Method representation_modification(extr_vals::ClustData,clust_data::ClustData)
Merges the clustered data and the extreme vals into one ClustData struct. The weights are chosen according to the rep_mod_method.
ClustForOpt.representation_modification
— Method representation_modification(full_data::ClustData,clust_data::ClustData,extr_val_idcs::Array{Int,1};rep_mod_method::String="feasibility")
Merges the clustered data and the extreme vals into one ClustData struct. The weights are chosen according to the rep_mod_method.
ClustForOpt.representation_modification
— Method representation_modification(full_data::ClustData,clust_data::ClustData,extr_val_idcs::Int;rep_mod_method::String="feasibility")
Wrapper function for a single extreme val. Merges the clustered data and the extreme vals into one ClustData struct. The weights are chosen according to the rep_mod_method.
ClustForOpt.resize_medoids
— Method resize_medoids(data::Array,centers::Array,weights::Array)
This is the DEFAULT resize-medoids function. Takes in centers (typically medoids) and normalizes them such that the yearly average of the clustered data is the same as the yearly average of the original data.
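The rescaling idea can be sketched in plain Julia (hypothetical helper, not the package implementation):

```julia
# Scale the centers by a single factor so that the weighted average over the
# centers matches the average over the original periods.
function rescale_centers(data::Matrix{Float64}, centers::Matrix{Float64}, weights::Vector{Float64})
    target  = sum(data) / size(data, 2)                # average period total of the original data
    current = sum(centers .* weights') / sum(weights)  # weighted average period total of the centers
    return centers .* (target / current)
end

rescale_centers([1.0 3.0], reshape([1.0], 1, 1), [2.0])  # → 1x1 matrix containing 2.0
```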
ClustForOpt.run_battery_opt
— Method run_battery_opt(data::ClustData)
Operational battery storage optimization problem. Runs every day separately and adds up the results in the end.
ClustForOpt.run_clust
— Method run_clust(
data::ClustData,
n_clust_ar::Array{Int,1};
norm_op::String="zscore",
norm_scope::String="full",
method::String="kmeans",
representation::String="centroid",
n_init::Int=100,
iterations::Int=300,
save::String="",
kwargs...)
Run multiple number of clusters k and return an array of results. This function is a wrapper function around run_clust().
ClustForOpt.run_clust
— Method run_clust(data::ClustData;
norm_op::String="zscore",
norm_scope::String="full",
method::String="kmeans",
representation::String="centroid",
n_clust::Int=5,
n_seg::Int=data.T,
n_init::Int=1000,
iterations::Int=300,
attribute_weights::Dict{String,Float64}=Dict{String,Float64}(),
save::String="",
get_all_clust_results::Bool=false,
kwargs...)
Take input data of dimensionality N x T and cluster it into data of dimensionality K x T.
The following combinations of method and representation are supported by run_clust:
Name | method | representation | comment |
---|---|---|---|
k-means clustering | <kmeans> | <centroid> | - |
k-means clustering with medoid representation | <kmeans> | <medoid> | - |
k-medoids clustering (partitional) | <kmedoids> | <medoid> | - |
k-medoids clustering (exact) | <kmedoids_exact> | <medoid> | requires Gurobi and the additional keyword argument kmexact_optimizer . See [examples] folder for example use. Set n_init=1 |
hierarchical clustering with centroid representation | <hierarchical> | <centroid> | set n_init=1 |
hierarchical clustering with medoid representation | <hierarchical> | <medoid> | set n_init=1 |
The other optional inputs are:
Keyword | options | comment |
---|---|---|
norm_op | zscore | Normalization operation. 0-1 not yet implemented |
norm_scope | full ,sequence ,hourly | Normalization scope. The default (full ) is used in most of the current literature. |
n_clust | e.g. 5 | Number of clusters that you want to obtain |
n_seg | e.g. 10 | Number of segments per period. Not yet implemented, keep as default value. |
n_init | e.g. 1000 | Number of initializations of locally converging clustering algorithms. 10000 often yields very stable results. |
iterations | e.g. 300 | Internal parameter of the partitional clustering algorithms. |
attribute_weights | e.g. Dict("wind-germany"=>3,"solar-germany"=>1,"el_demand-germany"=>5) | weights the respective attributes when clustering. In this example, demand and wind are deemed more important than solar. |
save | false | Save clustered data as csv or jld2 file. Not yet implemented. |
get_all_clust_results | true, false | false gives a ClustResult struct with only the best locally converged solution in terms of clustering measure. true gives a ClustResultAll struct as output, with all locally converged solutions. |
kwargs | e.g. kmexact_optimizer | optional keyword arguments that are required for specific methods, for example k-medoids exact. |
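A typical call (sketch; requires ClustForOpt and a loaded ClustData struct):

```julia
using ClustForOpt

ts_input_data = load_timeseries_data(:CEP_GER1)
clust_res = run_clust(ts_input_data; method="kmeans", representation="centroid",
                      n_clust=5, n_init=100)
clust_res.clust_data  # ClustData with K = 5 representative periods
```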
ClustForOpt.run_gas_opt
— Method run_gas_opt(data::ClustData)
Operational gas turbine optimization problem. Runs every day separately and adds up the results in the end.
ClustForOpt.sakoe_chiba_band
— Method sakoe_chiba_band(r::Int,l::Int)
Calculates the minimum and maximum allowed indices for an l x l windowed matrix for the Sakoe-Chiba band (see Sakoe and Chiba, 1978). Input: radius r, such that |i(k)-j(k)| <= r; length l: dimension 2 of the matrix.
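The band bounds can be sketched in plain Julia (hypothetical helper, not the package code):

```julia
# For each column j of an l x l matrix, only rows i with |i - j| <= r
# lie inside the Sakoe-Chiba band.
function band_bounds(r::Int, l::Int)
    i_min = [max(1, j - r) for j in 1:l]
    i_max = [min(l, j + r) for j in 1:l]
    return i_min, i_max
end

band_bounds(1, 4)  # → ([1, 1, 2, 3], [2, 3, 4, 4])
```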
ClustForOpt.simple_extr_val_sel
— Method simple_extr_val_sel(data::ClustData,
extreme_value_descr_ar::Array{SimpleExtremeValueDescr,1};
rep_mod_method::String="feasibility")
Selects simple extreme values and returns modified data, extreme values, and the corresponding indices.
Input options for rep_mod_method::String: "feasibility" or "append"
ClustForOpt.simple_extr_val_sel
— Method simple_extr_val_sel(data::ClustData,
extreme_value_descr::SimpleExtremeValueDescr;
rep_mod_method::String="feasibility")
Wrapper function for only one simple extreme value. Selects simple extreme values and returns modified data, extreme values, and the corresponding indices.
ClustForOpt.sort_centers
— Method sort_centers(centers::Array,weights::Array)
- centers: hours x days, e.g. [24 x 9]
- weights: days, e.g. [9], unsorted
Sorts the centers by their weights, from largest to smallest.
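The sorting step can be sketched in plain Julia (hypothetical helper):

```julia
# Reorder the columns of centers so the largest weight comes first.
function sort_centers_sketch(centers::Matrix{Float64}, weights::Vector{Float64})
    order = sortperm(weights; rev=true)  # indices of weights, largest first
    return centers[:, order], weights[order]
end

c, w = sort_centers_sketch([1.0 2.0 3.0], [5.0, 20.0, 10.0])
# → c == [2.0 3.0 1.0], w == [20.0, 10.0, 5.0]
```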
ClustForOpt.undo_z_normalize
— Method undo_z_normalize(data_norm, mn, sdv; idx=[])
Undo the z-normalization of data with mean and sdv by hour. data_norm: normalized data, input format (1st dimension: 24 hours, 2nd dimension: number of days). mn: 24-hour vector with the hourly means. sdv: 24-hour vector with the hourly standard deviations.
ClustForOpt.undo_z_normalize
— Method undo_z_normalize(data_norm_merged::Array,mn::Dict{String,Array},sdv::Dict{String,Array};idx=[])
Although idx is optional, it should usually be provided (as is done by default within the function call) in order to enable sequence-based normalization.
ClustForOpt.z_normalize
— Method z_normalize(data::Array;scope="full")
Z-normalize data with mean and sdv by hour. data: input format (1st dimension: 24 hours, 2nd dimension: number of days). scope: "full": one mean and sdv for the full data set; "hourly": univariate scaling, each hour is scaled separately; "sequence": sequence-based scaling.
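The "full" scope can be sketched in plain Julia (hypothetical helper, not the package code):

```julia
using Statistics

# One mean and standard deviation for the full data set.
function z_normalize_full(data::Matrix{Float64})
    mn, sd = mean(data), std(data)
    return (data .- mn) ./ sd, mn, sd
end

norm_data, mn, sd = z_normalize_full([1.0 2.0; 3.0 4.0])
norm_data .* sd .+ mn  # recovers the original data (up to floating point)
```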
ClustForOpt.z_normalize
— Method z_normalize(data::ClustData;scope="full")
scope: "full", "sequence", "hourly"
ClustForOpt.kmedoidsResult
— Type kmedoidsResult
Holds the results of a k-medoids run.
ClustForOpt.add_timeseries_data!
— Method add_timeseries_data!(dt::Dict{String,Array}, data::DataFrame; K::Int=0, T::Int=24, years::Array{Int,1}=[2016])
Selects first the years and then the data points, so that their number is a multiple of T and matches the other time series.
ClustForOpt.attribute_weighting
— Method attribute_weighting(data::ClustData,attribute_weights::Dict{String,Float64})
Applies the different attribute weights based on the dictionary entry for each technology or exact name.
ClustForOpt.calc_centroids
— Method calc_centroids(data::Array,assignments::Array)
Given the data and cluster assignments, this function finds the centroid of the respective clusters.
ClustForOpt.calc_medoids
— Method calc_medoids(data::Array,assignments::Array)
Given the data and cluster assignments, this function finds the medoids that are closest to the cluster center.
ClustForOpt.calc_weights
— Method calc_weights(clustids::Array{Int}, n_clust::Int)
Calculates weights for clusters, based on clustids that are assigned to a certain cluster. The weights are absolute: weights[i]>=1
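The weight calculation can be sketched in plain Julia (hypothetical helper):

```julia
# Absolute weights: the number of original periods assigned to each cluster.
function calc_weights_sketch(clustids::Vector{Int}, n_clust::Int)
    weights = zeros(Float64, n_clust)
    for id in clustids
        weights[id] += 1.0
    end
    return weights
end

calc_weights_sketch([1, 2, 2, 3, 2], 3)  # → [1.0, 3.0, 1.0]
```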
ClustForOpt.check_kw_args
— Method check_kw_args(region,opt_problems,norm_op,norm_scope,method,representation)
Checks whether the arguments supplied for run_clust are supported.
ClustForOpt.find_column_name
— Method find_column_name(df::DataFrame, name_itr::Array{Symbol,1})
Finds which of the supported names in name_itr is used as a column name in df.
ClustForOpt.get_mean_data
— Method get_mean_data(data::Array, clustids::Array{Int,1})
Calculates the mean of the data: the number of columns is kept the same; the mean is calculated over the aggregated columns and is the same in all columns with the same clustid.
ClustForOpt.input_data_modification
— Method input_data_modification(data::ClustData,
extr_val_idcs::Array{Int,1})
Returns ClustData structs with the extreme vals and with the remaining input data [data - extreme_vals]. Gives the extreme vals the weight that they had in data. This function is needed for the append method of representation modification. Note: the k_ids have to be monotonically increasing, so don't apply this to already clustered data!
ClustForOpt.input_data_modification
— Method input_data_modification(data::ClustData,extr_val_idcs::Int)
Wrapper function for a single extreme val. Returns ClustData structs with the extreme val and with the remaining input data [data - extreme_vals]. Gives the extreme val the weight that it had in data.
ClustForOpt.intraperiod_segmentation
— Method intraperiod_segmentation(data_merged::ClustDataMerged;n_seg::Int=24,iterations::Int=300,norm_scope::String="full")
!!! Not yet proven: implementation of the segmentation introduced by Bahl et al., 2018.
ClustForOpt.merge_clustids!
— Method merge_clustids!(clustids::Array{Int,1},index::Int)
Calculates the new clustids by merging the cluster at the provided index with the cluster at index+1.
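The merge can be sketched in plain Julia (hypothetical helper mirroring the described behavior):

```julia
# ids equal to index+1 are merged into index; all higher ids shift down by one
function merge_clustids_sketch!(clustids::Vector{Int}, index::Int)
    for (i, id) in enumerate(clustids)
        if id > index
            clustids[i] = id - 1
        end
    end
    return clustids
end

merge_clustids_sketch!([1, 2, 3, 2, 4], 2)  # → [1, 2, 2, 2, 3]
```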
ClustForOpt.run_clust_hierarchical
— Method run_clust_hierarchical(
data_norm::ClustDataMerged,
n_clust::Int,
iterations::Int;
_dist::SemiMetric = SqEuclidean()
)
Helper function to run run_clust_hierarchical_centroid and run_clust_hierarchical_medoid.
ClustForOpt.run_clust_hierarchical_centroid
— Method run_clust_hierarchical_centroid(
data_norm::ClustDataMerged,
n_clust::Int,
iterations::Int;
_dist::SemiMetric = SqEuclidean()
)
ClustForOpt.run_clust_hierarchical_medoid
— Method run_clust_hierarchical_medoid(
data_norm::ClustDataMerged,
n_clust::Int,
iterations::Int;
_dist::SemiMetric = SqEuclidean()
)
ClustForOpt.run_clust_hierarchical_partitional
— Method run_clust_hierarchical_partitional(data::Array, n_seg::Int)
!!! Not yet proven: uses the provided data and number of segments to aggregate them together.
ClustForOpt.run_clust_kmeans_centroid
— Method run_clust_kmeans_centroid(data_norm::ClustDataMerged,n_clust::Int,iterations::Int)
ClustForOpt.run_clust_kmeans_medoid
— Method run_clust_kmeans_medoid(
data_norm::ClustDataMerged,
n_clust::Int,
iterations::Int
)
ClustForOpt.run_clust_kmedoids_exact_medoid
— Method run_clust_kmedoids_exact_medoid(
data_norm::ClustDataMerged,
n_clust::Int,
iterations::Int;
gurobi_env=0
)
ClustForOpt.run_clust_kmedoids_medoid
— Method run_clust_kmedoids_medoid(
data_norm::ClustDataMerged,
n_clust::Int,
iterations::Int
)
ClustForOpt.run_clust_method
— Method run_clust_method(data::ClustData;
norm_op::String="zscore",
norm_scope::String="full",
method::String="kmeans",
representation::String="centroid",
n_clust::Int=5,
n_seg::Int=data.T,
n_init::Int=100,
iterations::Int=300,
orig_k_ids::Array{Int,1}=Array{Int,1}(),
kwargs...)
method: "kmeans", "kmedoids", "kmedoids_exact", "hierarchical"; representation: "centroid", "medoid"
ClustForOpt.run_clust_segmentation
— Method run_clust_segmentation(period::Array{Float64,2};n_seg::Int=24,iterations::Int=300,norm_scope::String="full")
!!! Not yet proven: implementation of the segmentation introduced by Bahl et al., 2018.
ClustForOpt.run_pure_clust
— Method run_pure_clust(data::ClustData; norm_op::String="zscore", norm_scope::String="full", method::String="kmeans", representation::String="centroid", n_clust_1::Int=5, n_clust_2::Int=3, n_seg::Int=data.T, n_init::Int=100, iterations::Int=300, attribute_weights::Dict{String,Float64}=Dict{String,Float64}(), clust::Array{String,1}=Array{String,1}(), get_all_clust_results::Bool=false, kwargs...)
Replaces the original time series of the attributes in clust with their clustered values.
ClustForOpt.set_clust_config
— Method set_clust_config(;kwargs...)
Adds the kwargs to a new dictionary with the variables as entries.
ClustForOpt.simple_extr_val_ident
— Method simple_extr_val_ident(data::ClustData,
extreme_value_descr_ar::Array{SimpleExtremeValueDescr,1})
Identifies multiple simple extreme values from the data and returns an array of the column indices of the extreme values within data.
- data_type: any attribute from the attributes contained within data
- extremum: "min" or "max"
- peak_def: "absolute" or "integral"
ClustForOpt.simple_extr_val_ident
— Method simple_extr_val_ident(data::ClustData,
extreme_value_descr::SimpleExtremeValueDescr)
Wrapper function for only one simple extreme value: identifies a single simple extreme value from the data and returns the column index of the extreme value.
- data_type: any attribute from the attributes contained within data
- extremum: "min" or "max"
- peak_def: "absolute" or "integral"
- consecutive_periods: number of consecutive periods combined for the analysis
ClustForOpt.simple_extr_val_ident
— Method simple_extr_val_ident(clust_data::ClustData,
data_type::String;
extremum::String="max",
peak_def::String="absolute",
consecutive_periods::Int=1)
Identifies a single simple extreme period from the data and returns the column index of the extreme period.
- data_type: any attribute from the attributes contained within data
- extremum: "min" or "max"
- peak_def: "absolute" or "integral"
- consecutive_periods: the number of consecutive periods that are summed to identify a maximum or minimum. A rolling approach is used, e.g. for consecutive_periods=2: 1) 1st & 2nd periods summed, 2) 2nd & 3rd periods summed, 3) 3rd & 4th ... The min/max over 1), 2), 3), ... is determined, and the indices of the periods where the min/max was identified are returned.
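The rolling approach can be sketched in plain Julia (hypothetical helper):

```julia
# Sum each window of n consecutive period totals and return the starting
# index of the maximal window.
function max_window_start(period_totals::Vector{Float64}, n::Int)
    sums = [sum(period_totals[k:k+n-1]) for k in 1:length(period_totals)-n+1]
    return argmax(sums)
end

max_window_start([1.0, 4.0, 5.0, 2.0], 2)  # → 2 (periods 2 & 3 sum to 9.0)
```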