# Methods

## Time Domain Reduction

Rather than modeling and optimizing power grid operations at a high temporal resolution (e.g., hourly) while evaluating new capacity investments, which can be computationally expensive for large-scale studies with many resources, it may be useful to model annual grid operations at a reduced temporal resolution. Such time-domain reduction is often employed in capacity expansion models (CEMs) as a way to balance spatial and temporal resolution and the representation of dispatch while keeping computational times reasonable. The time-domain reduction method provided here allows the user to automate these features by specifying the various parameters of the time-domain reduction algorithm (via time_domain_reduction_settings.yml, described under Model Inputs/Outputs documentation/Inputs), including the desired level of temporal resolution to be used in formulating the resulting optimization model.

`Dolphyn.RemoveConstCols`

— Function`RemoveConstCols(all_profiles, all_col_names)`

Remove and store the columns that do not vary during the period.
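Dolphyn is written in Julia; as an illustration only, the idea behind this function can be sketched in Python/numpy (the function name and return shape here are hypothetical, not the package's actual API):

```python
import numpy as np

def remove_const_cols(all_profiles, all_col_names):
    """Illustrative sketch: split out columns whose values never vary.

    all_profiles: 2-D array (timesteps x columns); all_col_names: column names.
    Returns (varying_profiles, varying_names, const_values, const_names).
    """
    profiles = np.asarray(all_profiles, dtype=float)
    # a column is constant if every entry equals its first entry
    is_const = np.all(profiles == profiles[0, :], axis=0)
    varying = profiles[:, ~is_const]
    varying_names = [n for n, c in zip(all_col_names, is_const) if not c]
    const_values = profiles[0, is_const]  # one stored value per constant column
    const_names = [n for n, c in zip(all_col_names, is_const) if c]
    return varying, varying_names, const_values, const_names
```

Storing the constant columns separately lets them be reattached to the clustered output without inflating the clustering problem.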

`Dolphyn.check_condition`

— Method`check_condition(Threshold, R, OldColNames, ScalingMethod, TimestepsPerRepPeriod)`

Check whether the greatest Euclidean deviation between the input data and the clustered representation is within a given proportion of the "maximum" possible deviation.

(A spread of 1 covers 100% of normalized data; a spread of 4 covers ~95% of standardized data.)
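As a rough sketch of this test (in Python rather than the package's Julia; the exact scaling Dolphyn applies internally is an assumption here): the worst per-period distance is compared against the threshold times the largest deviation a period of that length could exhibit.

```python
import numpy as np

def check_condition(threshold, R, scaling_method, timesteps_per_rep_period):
    """Hedged sketch: is the worst period's distance to its representative
    within `threshold` of the maximum possible deviation?

    R: distances, one per period, to its representative (Euclidean).
    scaling_method: 'N' (normalized to [0,1], spread 1 per timestep) or
                    'S' (standardized; spread ~4, i.e. +/-2 sigma covers ~95%).
    """
    spread = 1.0 if scaling_method == "N" else 4.0
    # deviation if every timestep in a period were maximally off
    max_possible = spread * np.sqrt(timesteps_per_rep_period)
    return bool(np.max(R) <= threshold * max_possible)
```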

`Dolphyn.cluster`

— Function`cluster(ClusterMethod, ClusteringInputDF, NClusters, nIters)`

Get representative periods using cluster centers from kmeans or kmedoids.

K-Means: https://juliastats.org/Clustering.jl/dev/kmeans.html

K-Medoids: https://juliastats.org/Clustering.jl/stable/kmedoids.html
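Dolphyn delegates the clustering itself to Clustering.jl. For intuition, a minimal Python/numpy version of Lloyd's k-means (the algorithm behind `kmeans`; this sketch is not the package's implementation) looks like:

```python
import numpy as np

def cluster_kmeans(X, n_clusters, n_iters=50, seed=0):
    """Minimal k-means sketch: X is (n_periods x features), one row per
    candidate period. Returns (labels, centers): each period's cluster
    index and the cluster centers (the representative periods)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iters):
        # assign each period to its nearest center (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned periods
        new_centers = np.array([X[labels == k].mean(axis=0)
                                if np.any(labels == k) else centers[k]
                                for k in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

K-medoids differs in that each center is constrained to be an actual observed period rather than a mean, which is why its representatives are real weeks from the input data.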

`Dolphyn.cluster_inputs`

— Function`cluster_inputs(inpath, settings_path, v=false, norm_plot=false, silh_plot=false, res_plots=false, indiv_plots=false, pair_plots=false)`

Use kmeans or kmedoids to cluster raw load profiles and resource capacity factor profiles into representative periods. Use Extreme Periods to capture noteworthy periods or periods with notably poor fits.

In Load_data.csv, include the following:

- Timesteps_per_Rep_Period - Typically 168 timesteps (e.g., hours) per period, this designates the length of each representative period.
- UseExtremePeriods - Either 1 or 0, this designates whether or not to include outliers (by performance or load/resource extreme) as their own representative periods. This setting automatically includes the periods with maximum load, minimum solar cf and minimum wind cf as extreme periods.
- ClusterMethod - Either 'kmeans' or 'kmedoids', this designates the method used to cluster periods and determine each point's representative period.
- ScalingMethod - Either 'N' or 'S', this directs the module to normalize ([0,1]) or standardize (mean 0, variance 1) the input data.
- MinPeriods - The minimum number of periods used to represent the input data. If using UseExtremePeriods, this must be at least three. If IterativelyAddPeriods is off, this will be the total number of periods.
- MaxPeriods - The maximum number of periods - both clustered periods and extreme periods - that may be used to represent the input data.
- IterativelyAddPeriods - Either 1 or 0, this designates whether or not to add periods until the error threshold between input data and represented data is met or the maximum number of periods is reached.
- Threshold - Iterative period addition will end if the period farthest (by Euclidean distance) from its representative period is within this fraction of the total possible error (for normalization) or ~95% of the total possible error (for standardization). E.g., with a threshold of 0.01, every period must fit within 1% of the spread of possible error before the clustering iterations terminate (unless the maximum number of periods is reached first).
- IterateMethod - Either 'cluster' or 'extreme', this designates whether to add clusters to the kmeans/kmedoids method or to set aside the worst-fitting periods as new extreme periods.
- nReps - The number of times to repeat each kmeans/kmedoids clustering at the same setting.
- LoadWeight - Default 1, this is an optional multiplier on load columns in order to prioritize better fits for load profiles over resource capacity factor profiles.
- WeightTotal - Default 8760, the sum to which the relative weights of representative periods will be scaled.
- ClusterFuelPrices - Either 1 or 0, this indicates whether or not to use the fuel price time series in Fuels_data.csv in the clustering process. If 0, this function will still write Fuels_data_clustered.csv with reshaped fuel prices based on the number and size of the representative periods, assuming a constant fuel price time series with length equal to the number of timesteps in the raw input data.
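Collected in one place, the settings above might look like the following illustrative time_domain_reduction_settings.yml fragment. All values here are examples, and the exact key names should be checked against the settings file shipped with the package:

```yaml
TimestepsPerRepPeriod: 168   # one week at hourly resolution
UseExtremePeriods: 1
ClusterMethod: 'kmeans'
ScalingMethod: 'N'
MinPeriods: 8
MaxPeriods: 11
IterativelyAddPeriods: 1
Threshold: 0.05
IterateMethod: 'cluster'
nReps: 100
LoadWeight: 1
WeightTotal: 8760
ClusterFuelPrices: 1
```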

`Dolphyn.get_absolute_extreme`

— Method`get_absolute_extreme(DF, statKey, col_names, ConstCols)`

Get the period index of the single timestep with the minimum or maximum load or capacity factor.

`Dolphyn.get_extreme_period`

— Function`get_extreme_period(DF, GDF, profKey, typeKey, statKey, ConstCols, load_col_names, solar_col_names, wind_col_names)`

Identify an extreme week by specifying the profile type (Load, PV, Wind), the measurement type (absolute (timestep with min/max value) vs. integral (period with min/max summed value)), and the statistic (minimum or maximum). E.g., if the user wants the hour with the highest system-wide load included among the extreme periods, they would select "Load", "System", "Absolute", and "Max".

`Dolphyn.get_integral_extreme`

— Method`get_integral_extreme(GDF, statKey, col_names, ConstCols)`

Get the period index with the minimum or maximum load or capacity factor summed over the period.
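The absolute/integral distinction can be illustrated with a small Python sketch (function names and signatures here are hypothetical, not the package's Julia API): the absolute extreme is the period containing the single most extreme timestep, while the integral extreme is the period whose total is most extreme.

```python
import numpy as np

def absolute_extreme_period(profile, timesteps_per_period, stat="max"):
    """Sketch: index of the period containing the single most extreme timestep."""
    x = np.asarray(profile, dtype=float).reshape(-1, timesteps_per_period)
    per_period = x.max(axis=1) if stat == "max" else x.min(axis=1)
    return int(per_period.argmax() if stat == "max" else per_period.argmin())

def integral_extreme_period(profile, timesteps_per_period, stat="max"):
    """Sketch: index of the period whose summed value is most extreme."""
    sums = np.asarray(profile, dtype=float).reshape(-1, timesteps_per_period).sum(axis=1)
    return int(sums.argmax() if stat == "max" else sums.argmin())
```

A single load spike can make one period the absolute extreme even when a different, consistently high period has the larger total, so the two criteria can legitimately select different weeks.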

`Dolphyn.get_load_multipliers`

— Function`get_load_multipliers(ClusterOutputData, ModifiedData, M, W, LoadCols, TimestepsPerRepPeriod, NewColNames, NClusters, Ncols)`

Get multipliers to linearly scale the clustered load profiles L zone-wise so that their weighted sum equals the original zonal total load. The load profiles are scaled with these multipliers afterward, so that a copy of the original load is kept for validation.

Find $k_z$ such that:

\[\sum_{i \in I} L_{i,z} = \sum_{t \in T, m \in M} C_{t,m,z} \cdot \frac{w_m}{T} \cdot k_z \: \: \: \forall z \in Z\]

where $Z$ is the set of zones, $I$ is the full time domain, $T$ is the length of one period (e.g., 168 for one week in hours), $M$ is the set of representative periods, $L_{i,z}$ is the original zonal load profile over time (hour) index $i$, $C_{t,m,z}$ is the load in timestep $t$ of representative period $m$ in zone $z$, $w_m$ is the weight of representative period $m$, equal to the total number of hours that one hour in that period represents in the original profile, and $k_z$ is the zonal load multiplier returned by the function.
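Solving the equation above for $k_z$ gives $k_z = \sum_i L_{i,z} \big/ \sum_{t,m} C_{t,m,z}\, w_m / T$, which can be sketched for a single zone in Python (a hypothetical signature, not the package's actual function):

```python
import numpy as np

def get_load_multiplier(original_load, clustered_load, weights, timesteps_per_rep_period):
    """Sketch of the k_z formula for one zone.

    original_load: full-resolution zonal load L_i (length = total timesteps).
    clustered_load: (n_rep_periods x timesteps_per_rep_period) matrix C_{t,m}.
    weights: w_m, total hours represented by each representative period.
    Returns k such that sum_i L_i == sum_{t,m} C_{t,m} * (w_m / T) * k.
    """
    C = np.asarray(clustered_load, dtype=float)
    # weighted representative load: sum over t per period, scaled by w_m / T
    weighted_sum = np.sum(C.sum(axis=1) * np.asarray(weights, dtype=float)
                          / timesteps_per_rep_period)
    return float(np.sum(original_load) / weighted_sum)
```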

`Dolphyn.get_worst_period_idx`

— Method`get_worst_period_idx(R)`

Get the index of the period that is farthest from its representative period by Euclidean distance.

`Dolphyn.parse_data`

— Method`parse_data(myinputs)`

Get load, solar, wind, and other curves from the input data.

`Dolphyn.rmse_score`

— Method`rmse_score(y_true, y_pred)`

Calculates Root Mean Square Error.

\[RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{d_i - f_i}{\sigma_i}\right)^2}\]
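A direct Python/numpy transcription of this formula (a sketch, not the package's Julia code; the optional per-point scaling $\sigma_i$ defaults to 1, reducing it to the plain RMSE):

```python
import numpy as np

def rmse_score(y_true, y_pred, sigma=None):
    """Root mean square error of y_pred against y_true, with optional
    per-point scaling sigma_i as in the formula above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    s = np.ones_like(y_true) if sigma is None else np.asarray(sigma, dtype=float)
    return float(np.sqrt(np.mean(((y_true - y_pred) / s) ** 2)))
```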

`Dolphyn.scale_weights`

— Function`scale_weights(W, H)`

Linearly scale weights W such that they sum to the desired number of timesteps (hours) H.

\[w_j \leftarrow H \cdot \frac{w_j}{\sum_i w_i} \: \: \: \forall w_j \in W\]
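In Python this scaling is a one-liner (illustrative sketch, not the package's Julia implementation); e.g., with `H = 8760`, the weights are rescaled so the representative periods jointly account for a full year of hours:

```python
import numpy as np

def scale_weights(W, H):
    """Linearly rescale weights W so they sum to H timesteps (e.g., 8760 hours)."""
    W = np.asarray(W, dtype=float)
    return H * W / W.sum()
```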