smote(X, y, k, under, over)

This function implements the SMOTE algorithm, which generates synthetic cases to rebalance the proportion of positive and negative observations. The pct_under and pct_over parameters control the amount of under-sampling of the majority class and over-sampling of the minority class, respectively. Note that pct_under controls under-sampling by selecting pct_under/100 majority class observations for each newly created minority class observation. The value of k determines how many nearest neighbors are considered when generating synthetic cases.
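
The description above can be illustrated with a minimal sketch. This is not the package's implementation: the name smote_sketch, the keyword defaults, the rng argument, and the Bool encoding of y (true = minority) are all assumptions made for illustration. It also assumes the usual SMOTE convention that pct_over/100 synthetic cases are created per minority observation, which the text above does not state explicitly.

```julia
using Random

# Hypothetical helper, NOT the package's smote: a bare-bones SMOTE sketch.
# X is an n-by-p numeric matrix; y is a Bool vector where true marks the minority class.
function smote_sketch(X::AbstractMatrix{<:Real}, y::AbstractVector{Bool};
                      k::Int = 5, pct_under::Int = 100, pct_over::Int = 200,
                      rng::AbstractRNG = Random.default_rng())
    minority = findall(y)
    majority = findall(!, y)
    n_min = length(minority)
    # A minority observation cannot have more neighbors than there are other minority rows.
    k_eff = min(k, n_min - 1)
    k_eff >= 1 || throw(ArgumentError("need at least two minority observations"))
    # Assumption: pct_over/100 synthetic cases per minority observation.
    per_case = div(pct_over, 100)
    synthetic = Matrix{Float64}(undef, n_min * per_case, size(X, 2))
    row = 0
    for i in minority
        # Squared Euclidean distance from row i to every minority row.
        dists = [sum(abs2, X[i, :] .- X[j, :]) for j in minority]
        order = sortperm(dists)
        # Exclude row i itself, then keep its k_eff nearest minority neighbors.
        neighbors = [minority[o] for o in order if minority[o] != i][1:k_eff]
        for _ in 1:per_case
            nb = rand(rng, neighbors)   # pick one neighbor at random
            gap = rand(rng)             # interpolation weight in [0, 1)
            row += 1
            synthetic[row, :] = X[i, :] .+ gap .* (X[nb, :] .- X[i, :])
        end
    end
    # pct_under/100 majority observations per newly created minority observation.
    n_keep = min(length(majority), div(pct_under * row, 100))
    kept = shuffle(rng, majority)[1:n_keep]
    X_new = vcat(Float64.(X[kept, :]), Float64.(X[minority, :]), synthetic)
    y_new = vcat(falses(n_keep), trues(n_min + row))
    return X_new, y_new
end
```

With 3 minority and 7 majority rows, pct_over = 200 and pct_under = 100, this sketch creates 6 synthetic rows and keeps 6 majority rows, so the result has 15 rows, 9 of them minority. Each synthetic row lies on the line segment between a minority observation and one of its k nearest minority neighbors.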


Given a column from a DataFrames.DataFrame, this function returns the majority/minority class label.
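
As a rough illustration of this behavior, the sketch below counts labels in a plain vector, which is what a DataFrame column is. The names label_counts, extreme_label, majority_label, and minority_label are hypothetical and are not taken from the package; ties are broken arbitrarily, and an empty column raises an error.

```julia
# Hypothetical helper: count how often each label occurs in a column.
function label_counts(col::AbstractVector)
    counts = Dict{eltype(col), Int}()
    for v in col
        counts[v] = get(counts, v, 0) + 1
    end
    return counts
end

# Return the label whose count is selected by `pick` (argmax or argmin).
function extreme_label(col::AbstractVector, pick)
    counts = label_counts(col)
    labels = collect(keys(counts))
    return labels[pick([counts[l] for l in labels])]
end

majority_label(col::AbstractVector) = extreme_label(col, argmax)
minority_label(col::AbstractVector) = extreme_label(col, argmin)
```

For a column such as ["no", "no", "yes", "no", "yes", "no"], majority_label gives "no" and minority_label gives "yes".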