Examples

This section contains examples showing how to use CluGen. Each example assumes that the following packages have been loaded:

using CluGen, Distributions, Plots

The plots of each example are generated with helper functions available here.

2D examples

The 2D examples were plotted with the plot_examples_2d() helper function, available here. To plot an example directly, e.g., e001, run:

plot(e001.points[:, 1], e001.points[:, 2], seriestype=:scatter, group=e001.clusters)

Manipulating the direction of cluster-supporting lines

Using the direction parameter

# PRNG seed
seed = 1
e001 = clugen(2, 4, 200, [1, 0], 0, [10, 10], 10, 1.5, 0.5; rng=seed)
e002 = clugen(2, 4, 200, [1, 1], 0, [10, 10], 10, 1.5, 0.5; rng=seed)
e003 = clugen(2, 4, 200, [0, 1], 0, [10, 10], 10, 1.5, 0.5; rng=seed)
plt = plot_examples_2d(
    e001, "e001: direction = [1, 0]",
    e002, "e002: direction = [1, 1]",
    e003, "e003: direction = [0, 1]")

Changing the angle_disp parameter and using a custom angle_deltas_fn function

# PRNG seed
seed = 1

# Custom angle_deltas function: arbitrarily rotate some clusters by 90 degrees
angdel_90_fn(nclu, astd; rng=nothing) = rand(rng, [0, pi / 2], nclu)
e004 = clugen(2, 6, 500, [1, 0], 0, [10, 10], 10, 1.5, 0.5; rng=seed)
e005 = clugen(2, 6, 500, [1, 0], pi / 8, [10, 10], 10, 1.5, 0.5; rng=seed)
e006 = clugen(2, 6, 500, [1, 0], 0, [10, 10], 10, 1.5, 0.5;
    angle_deltas_fn=angdel_90_fn, rng=seed)
plt = plot_examples_2d(
    e004, "e004: angle_disp = 0",
    e005, "e005: angle_disp = π/8",
    e006, "e006: custom angle_deltas function")
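
To see what a custom angle_deltas_fn returns, the function from the example above can be called on its own. The following is a minimal sketch, assuming the function may be given any AbstractRNG through its rng keyword; the MersenneTwister seed is an arbitrary choice made here for illustration.

```julia
using Random

# Same definition as in the example above
angdel_90_fn(nclu, astd; rng=nothing) = rand(rng, [0, pi / 2], nclu)

# Call the function directly with an explicit RNG (arbitrary seed, illustration only)
deltas = angdel_90_fn(6, 0.0; rng=MersenneTwister(1))

# One angle delta per cluster, each either 0 or pi / 2
println(deltas)
```

Note that this function ignores its second argument, which is why e006 shows rotated clusters even though angle_disp is 0.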

Manipulating the length of cluster-supporting lines

Using the llength parameter

# PRNG seed
seed = 11
e007 = clugen(2, 5, 800, [1, 0], pi / 10, [10, 10], 0, 0, 0.5; point_dist_fn="n", rng=seed)
e008 = clugen(2, 5, 800, [1, 0], pi / 10, [10, 10], 10, 0, 0.5; point_dist_fn="n", rng=seed)
e009 = clugen(2, 5, 800, [1, 0], pi / 10, [10, 10], 30, 0, 0.5; point_dist_fn="n", rng=seed)
plt = plot_examples_2d(
    e007, "e007: llength = 0",
    e008, "e008: llength = 10",
    e009, "e009: llength = 30")

Changing the llength_disp parameter and using a custom llengths_fn function

# PRNG seed
seed = 11

# Custom llengths function: line lengths grow for each new cluster
llen_grow_fn(nclu, llen, llenstd; rng=nothing) =
    llen * (collect(0:(nclu - 1)) + llenstd * randn(rng, nclu))
e010 = clugen(2, 5, 800, [1, 0], pi / 10, [10, 10], 15,  0.0, 0.5;
    point_dist_fn="n", rng=seed)
e011 = clugen(2, 5, 800, [1, 0], pi / 10, [10, 10], 15, 10.0, 0.5;
    point_dist_fn="n", rng=seed)
e012 = clugen(2, 5, 800, [1, 0], pi / 10, [10, 10], 10,  0.1, 0.5;
    llengths_fn=llen_grow_fn, point_dist_fn="n", rng=seed)
plt = plot_examples_2d(
    e010, "e010: llength_disp = 0.0",
    e011, "e011: llength_disp = 10.0",
    e012, "e012: custom llengths function")
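
The custom llengths function above can also be inspected in isolation. This is a small sketch, assuming an explicit MersenneTwister is an acceptable value for rng; the seed and the argument values are chosen only for illustration.

```julia
using Random

# Same definition as in the example above
llen_grow_fn(nclu, llen, llenstd; rng=nothing) =
    llen * (collect(0:(nclu - 1)) + llenstd * randn(rng, nclu))

# With zero dispersion, the lengths are exact multiples of llen
lengths_exact = llen_grow_fn(5, 10, 0.0; rng=MersenneTwister(11))

# With the dispersion used in e012, the lengths are perturbed around those multiples
lengths_noisy = llen_grow_fn(5, 10, 0.1; rng=MersenneTwister(11))

println(lengths_exact)
println(lengths_noisy)
```

With llenstd set to 0, cluster k gets a line of length llen * (k - 1), so the first cluster collapses to a point and each subsequent line is longer.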

Manipulating relative cluster positions

Using the cluster_sep parameter

# PRNG seed
seed = 1
e013 = clugen(2, 8, 1000, [1, 1], pi / 4, [10, 10], 10, 2, 2.5; rng=seed)
e014 = clugen(2, 8, 1000, [1, 1], pi / 4, [30, 10], 10, 2, 2.5; rng=seed)
e015 = clugen(2, 8, 1000, [1, 1], pi / 4, [10, 30], 10, 2, 2.5; rng=seed)
plt = plot_examples_2d(
    e013, "e013: cluster_sep = [10, 10]",
    e014, "e014: cluster_sep = [30, 10]",
    e015, "e015: cluster_sep = [10, 30]")

Changing the cluster_offset parameter and using a custom clucenters_fn function

# PRNG seed
seed = 1

# Custom clucenters function: places clusters on a diagonal
centers_diag_fn(nclu, csep, coff; rng=nothing) =
    ones(nclu, length(csep)) .* (1:nclu) * maximum(csep) .+ coff'
e016 = clugen(2, 8, 1000, [1, 1], pi / 4, [10, 10], 10, 2, 2.5; rng=seed)
e017 = clugen(2, 8, 1000, [1, 1], pi / 4, [10, 10], 10, 2, 2.5;
    cluster_offset=[20, -20], rng=seed)
e018 = clugen(2, 8, 1000, [1, 1], pi / 4, [10, 10], 10, 2, 2.5;
    cluster_offset=[-50, -50], clucenters_fn=centers_diag_fn, rng=seed)
plt = plot_examples_2d(
    e016, "e016: default",
    e017, "e017: cluster_offset = [20, -20]",
    e018, "e018: custom clucenters function")
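
Calling the custom clucenters function directly shows the matrix of centers it produces. The sketch below uses a small number of clusters and the offset from e018; the argument values are illustrative.

```julia
# Same definition as in the example above
centers_diag_fn(nclu, csep, coff; rng=nothing) =
    ones(nclu, length(csep)) .* (1:nclu) * maximum(csep) .+ coff'

# Three clusters in 2D, separation [10, 10], offset [-50, -50]
centers = centers_diag_fn(3, [10, 10], [-50, -50])

# One row per cluster center: row k equals k * maximum(csep) plus the offset
display(centers)
```

Each row holds one cluster center, so here the centers are (-40, -40), (-30, -30) and (-20, -20), which lie on the main diagonal.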

Lateral dispersion and placement of point projections on the line

Normal projection placement (default): proj_dist_fn = "norm"

# PRNG seed
seed = 6
e019 = clugen(2, 4, 1000, [1, 0], pi / 2, [20, 20], 13, 2, 0.0; rng=seed)
e020 = clugen(2, 4, 1000, [1, 0], pi / 2, [20, 20], 13, 2, 1.0; rng=seed)
e021 = clugen(2, 4, 1000, [1, 0], pi / 2, [20, 20], 13, 2, 3.0; rng=seed)
plt = plot_examples_2d(
    e019, "e019: lateral_disp = 0",
    e020, "e020: lateral_disp = 1",
    e021, "e021: lateral_disp = 3")

Uniform projection placement: proj_dist_fn = "unif"

# PRNG seed
seed = 6
e022 = clugen(2, 4, 1000, [1, 0], pi / 2, [20, 20], 13, 2, 0.0;
    proj_dist_fn="unif", rng=seed)
e023 = clugen(2, 4, 1000, [1, 0], pi / 2, [20, 20], 13, 2, 1.0;
    proj_dist_fn="unif", rng=seed)
e024 = clugen(2, 4, 1000, [1, 0], pi / 2, [20, 20], 13, 2, 3.0;
    proj_dist_fn="unif", rng=seed)
plt = plot_examples_2d(
    e022, "e022: lateral_disp = 0",
    e023, "e023: lateral_disp = 1",
    e024, "e024: lateral_disp = 3")

Custom projection placement using the Laplace distribution

# PRNG seed
seed = 6

# Custom proj_dist_fn: point projections placed using the Laplace distribution
proj_laplace(len, n, rng) = rand(rng, Laplace(0, len / 6), n)
e025 = clugen(2, 4, 1000, [1, 0], pi / 2, [20, 20], 13, 2, 0.0;
    proj_dist_fn=proj_laplace, rng=seed)
e026 = clugen(2, 4, 1000, [1, 0], pi / 2, [20, 20], 13, 2, 1.0;
    proj_dist_fn=proj_laplace, rng=seed)
e027 = clugen(2, 4, 1000, [1, 0], pi / 2, [20, 20], 13, 2, 3.0;
    proj_dist_fn=proj_laplace, rng=seed)
plt = plot_examples_2d(
    e025, "e025: lateral_disp = 0",
    e026, "e026: lateral_disp = 1",
    e027, "e027: lateral_disp = 3")

Controlling final point positions from their projections on the cluster-supporting line

Points on hyperplane orthogonal to cluster-supporting line (default): point_dist_fn = "n-1"

# PRNG seed
seed = 5

# Custom proj_dist_fn: point projections placed using the Laplace distribution
proj_laplace(len, n, rng) = rand(rng, Laplace(0, len / 6), n)
e028 = clugen(2, 5, 1500, [1, 0], pi / 3, [20, 20], 12, 3, 1.0; rng=seed)
e029 = clugen(2, 5, 1500, [1, 0], pi / 3, [20, 20], 12, 3, 1.0;
    proj_dist_fn="unif", rng=seed)
e030 = clugen(2, 5, 1500, [1, 0], pi / 3, [20, 20], 12, 3, 1.0;
    proj_dist_fn=proj_laplace, rng=seed)
plt = plot_examples_2d(
    e028, "e028: proj_dist_fn=\"norm\" (default)",
    e029, "e029: proj_dist_fn=\"unif\"",
    e030, "e030: custom proj_dist_fn (Laplace)")

Points around projection on cluster-supporting line: point_dist_fn = "n"

# PRNG seed
seed = 5

# Custom proj_dist_fn: point projections placed using the Laplace distribution
proj_laplace(len, n, rng) = rand(rng, Laplace(0, len / 6), n)
e031 = clugen(2, 5, 1500, [1, 0], pi / 3, [20, 20], 12, 3, 1.0;
    point_dist_fn="n", rng=seed)
e032 = clugen(2, 5, 1500, [1, 0], pi / 3, [20, 20], 12, 3, 1.0;
    point_dist_fn="n", proj_dist_fn="unif", rng=seed)
e033 = clugen(2, 5, 1500, [1, 0], pi / 3, [20, 20], 12, 3, 1.0;
    point_dist_fn="n", proj_dist_fn=proj_laplace, rng=seed)
plt = plot_examples_2d(
    e031, "e031: proj_dist_fn=\"norm\" (default)",
    e032, "e032: proj_dist_fn=\"unif\"",
    e033, "e033: custom proj_dist_fn (Laplace)")

Custom point placement using the exponential distribution

# PRNG seed
seed = 5

# Custom point_dist_fn: final points placed using the Exponential distribution
function clupoints_n_1_exp(projs, lat_std, len, clu_dir, clu_ctr; rng=nothing)
    dist_exp(npts, lstd, rg) = lstd .* rand(rg, Exponential(2 / lstd), npts, 1)
    return CluGen.clupoints_n_1_template(projs, lat_std, clu_dir, dist_exp; rng=rng)
end

# Custom proj_dist_fn: point projections placed using the Laplace distribution
proj_laplace(len, n, rng) = rand(rng, Laplace(0, len / 6), n)
e034 = clugen(2, 5, 1500, [1, 0], pi / 3, [20, 20], 12, 3, 1.0;
    point_dist_fn=clupoints_n_1_exp, rng=seed)
e035 = clugen(2, 5, 1500, [1, 0], pi / 3, [20, 20], 12, 3, 1.0;
    point_dist_fn=clupoints_n_1_exp, proj_dist_fn="unif", rng=seed)
e036 = clugen(2, 5, 1500, [1, 0], pi / 3, [20, 20], 12, 3, 1.0;
    point_dist_fn=clupoints_n_1_exp, proj_dist_fn=proj_laplace, rng=seed)
plt = plot_examples_2d(
    e034, "e034: proj_dist_fn=\"norm\" (default)",
    e035, "e035: proj_dist_fn=\"unif\"",
    e036, "e036: custom proj_dist_fn (Laplace)")

Manipulating cluster sizes

# PRNG seed
seed = 9

# Custom clusizes_fn (e038): cluster sizes determined via the uniform distribution,
# no correction for total points
clusizes_unif(nclu, npts, ae; rng=nothing) =
    rand(rng, DiscreteUniform(1, 2 * npts / nclu), nclu)

# Custom clusizes_fn (e039): clusters all have the same size, no correction for total points
clusizes_equal(nclu, npts, ae; rng=nothing) = (npts ÷ nclu) .* ones(Integer, nclu)

# Custom clucenters_fn (all): yields fixed positions for the clusters
centers_fixed(nclu, csep, coff; rng=nothing) =
    [-csep[1] -csep[2]; csep[1] -csep[2]; -csep[1] csep[2]; csep[1] csep[2]]
e037 = clugen(2, 4, 1500, [1, 1], pi, [20, 20], 0, 0, 5;
    clucenters_fn=centers_fixed, point_dist_fn="n", rng=seed)
e038 = clugen(2, 4, 1500, [1, 1], pi, [20, 20], 0, 0, 5;
    clucenters_fn=centers_fixed, clusizes_fn=clusizes_unif, point_dist_fn="n", rng=seed)
e039 = clugen(2, 4, 1500, [1, 1], pi, [20, 20], 0, 0, 5;
    clucenters_fn=centers_fixed, clusizes_fn=clusizes_equal, point_dist_fn="n", rng=seed)
plt = plot_examples_2d(
    e037, "e037: normal dist. (default)",
    e038, "e038: unif. dist. (custom)",
    e039, "e039: equal size (custom)")
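
The phrase "no correction for total points" can be checked by calling the custom function directly. This sketch uses a point count that is not divisible by the number of clusters; the value 1502 is chosen here only for illustration, and the third argument (named ae in the example) is simply ignored by this function.

```julia
# Same definition as in the example above
clusizes_equal(nclu, npts, ae; rng=nothing) = (npts ÷ nclu) .* ones(Integer, nclu)

# 1502 points requested over 4 clusters
sizes = clusizes_equal(4, 1502, true)

# Integer division drops the remainder, so the sizes sum to less than requested
println(sizes)
println(sum(sizes))
```

Every cluster receives 375 points, for a total of 1500 rather than 1502.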

Direct specification of optional parameters

# PRNG seed
seed = 3
e040 = clugen(2, 4, 1000, [-1, 1], 0, [0, 0], 0, 0, 0.2;
    proj_dist_fn="unif", point_dist_fn="n", clusizes_fn=[50, 200, 500, 2000],
    llengths_fn=[0, 2, 4, 6], clucenters_fn=[-5 -5; -2.5 -2.5; 0 0; 2.5 2.5],
    rng=seed)

e041 = clugen(2, 5, 1000, [1 1; 1 0; 1 0; 0 1; 0 1], 0, [0, 0], 0, 0, 0.2;
    proj_dist_fn="unif", point_dist_fn="n", clusizes_fn=[200, 500, 500, 500, 500],
    llengths_fn=[0, 5, 5, 5, 5], clucenters_fn=[0 0; 0 5; 0 -5; 5 0; -5 0],
    rng=seed)

e042 = clugen(2, 5, 1000, [0 1; 0.25 0.75; 0.5 0.5; 0.75 0.25; 1 0], 0, [0, 0], 5, 0, 0.2;
    proj_dist_fn="unif", point_dist_fn="n", clusizes_fn=500 * ones(Int32,5),
    clucenters_fn=[-5 0; -3 -0.3; -1 -0.8; 1 -1.6; 3 -2.5],
    rng=seed)
plt = plot_examples_2d(
    e040, "e040: direct params 1",
    e041, "e041: direct params 2",
    e042, "e042: direct params 3")

3D examples

The 3D examples were plotted with the plot_examples_3d() helper function, available here. To plot an example directly, e.g., e043, run:

plot(e043.points[:, 1], e043.points[:, 2], e043.points[:, 3], seriestype=:scatter, group=e043.clusters)

Manipulating the direction of cluster-supporting lines

Using the direction parameter

# PRNG seed
seed = 1
e043 = clugen(3, 4, 500, [1, 0, 0], 0, [10, 10, 10], 15, 1.5, 0.5; rng=seed)
e044 = clugen(3, 4, 500, [1, 1, 1], 0, [10, 10, 10], 15, 1.5, 0.5; rng=seed)
e045 = clugen(3, 4, 500, [0, 0, 1], 0, [10, 10, 10], 15, 1.5, 0.5; rng=seed)
plt = plot_examples_3d(
    e043, "e043: direction = [1, 0, 0]",
    e044, "e044: direction = [1, 1, 1]",
    e045, "e045: direction = [0, 0, 1]")

Changing the angle_disp parameter and using a custom angle_deltas_fn function

# PRNG seed
seed = 2

# Custom angle_deltas function: arbitrarily rotate some clusters by 90 degrees
angdel_90_fn(nclu, astd; rng=nothing) = rand(rng, [0, pi / 2], nclu)
e046 = clugen(3, 6, 1000, [1, 0, 0], 0, [10, 10, 10], 15, 1.5, 0.5; rng=seed)
e047 = clugen(3, 6, 1000, [1, 0, 0], pi / 8, [10, 10, 10], 15, 1.5, 0.5; rng=seed)
e048 = clugen(3, 6, 1000, [1, 0, 0], 0, [10, 10, 10], 15, 1.5, 0.5;
    angle_deltas_fn=angdel_90_fn, rng=seed)
plt = plot_examples_3d(
    e046, "e046: angle_disp = 0",
    e047, "e047: angle_disp = π / 8",
    e048, "e048: custom angle_deltas function")

Specifying a main direction for each cluster and changing angle_disp

# PRNG seed
seed = 7

# Directions for each cluster
dirs = [[0 0 1];[1 1 0];[-1 1 0];[1 0 0];[0 1 0]]
e049 = clugen(3, 5, 1000, dirs, 0, zeros(3), 10, 0, 0.1; rng=seed)
e050 = clugen(3, 5, 1000, dirs, π/12, zeros(3), 10, 0, 0.1; rng=seed)
e051 = clugen(3, 5, 1000, dirs, π/4, zeros(3), 10, 0, 0.1; rng=seed)
plt = plot_examples_3d(
    e049, "e049: angle_disp = 0",
    e050, "e050: angle_disp = π / 12",
    e051, "e051: angle_disp = π / 4")

Manipulating the length of cluster-supporting lines

Using the llength parameter

# PRNG seed
seed = 11
e052 = clugen(3, 5, 800, [1, 0, 0], pi / 10, [10, 10, 10], 0, 0, 0.5;
    point_dist_fn="n", rng=seed)
e053 = clugen(3, 5, 800, [1, 0, 0], pi / 10, [10, 10, 10], 10, 0, 0.5;
    point_dist_fn="n", rng=seed)
e054 = clugen(3, 5, 800, [1, 0, 0], pi / 10, [10, 10, 10], 30, 0, 0.5;
    point_dist_fn="n", rng=seed)
plt = plot_examples_3d(
    e052, "e052: llength = 0",
    e053, "e053: llength = 10",
    e054, "e054: llength = 30")

Changing the llength_disp parameter and using a custom llengths_fn function

# PRNG seed
seed = 11

# Custom llengths function: line lengths tend to grow for each new cluster
llen_grow_fn(nclu, llen, llenstd; rng=nothing) =
    llen * (collect(0:(nclu - 1)) + llenstd * randn(rng, nclu))
e055 = clugen(3, 5, 800, [1, 0, 0], pi / 10, [10, 10, 10], 15,  0.0, 0.5;
    point_dist_fn="n", rng=seed)
e056 = clugen(3, 5, 800, [1, 0, 0], pi / 10, [10, 10, 10], 15, 10.0, 0.5;
    point_dist_fn="n", rng=seed)
e057 = clugen(3, 5, 800, [1, 0, 0], pi / 10, [10, 10, 10], 10,  0.1, 0.5;
    llengths_fn=llen_grow_fn, point_dist_fn="n", rng=seed)
plt = plot_examples_3d(
    e055, "e055: llength_disp = 0.0",
    e056, "e056: llength_disp = 10.0",
    e057, "e057: custom llengths function")

Manipulating relative cluster positions

Using the cluster_sep parameter

# PRNG seed
seed = 1
e058 = clugen(3, 8, 1000, [1, 1, 1], pi / 4, [30, 10, 10], 25, 4, 3; rng=seed)
e059 = clugen(3, 8, 1000, [1, 1, 1], pi / 4, [10, 30, 10], 25, 4, 3; rng=seed)
e060 = clugen(3, 8, 1000, [1, 1, 1], pi / 4, [10, 10, 30], 25, 4, 3; rng=seed)
plt = plot_examples_3d(
    e058, "e058: cluster_sep = [30, 10, 10]",
    e059, "e059: cluster_sep = [10, 30, 10]",
    e060, "e060: cluster_sep = [10, 10, 30]")

Changing the cluster_offset parameter and using a custom clucenters_fn function

# PRNG seed
seed = 1

# Custom clucenters function: places clusters on a diagonal
centers_diag_fn(nclu, csep, coff; rng=nothing) = ones(nclu, length(csep)) .* (1:nclu) * maximum(csep) .+ coff'
e061 = clugen(3, 8, 1000, [1, 1, 1], pi / 4, [10, 10, 10], 12, 3, 2.5; rng=seed)
e062 = clugen(3, 8, 1000, [1, 1, 1], pi / 4, [10, 10, 10], 12, 3, 2.5;
    cluster_offset=[20, -20, 20], rng=seed)
e063 = clugen(3, 8, 1000, [1, 1, 1], pi / 4, [10, 10, 10], 12, 3, 2.5;
    cluster_offset=[-50, -50, -50], clucenters_fn=centers_diag_fn, rng=seed)
plt = plot_examples_3d(
    e061, "e061: default",
    e062, "e062: cluster_offset = [20, -20, 20]",
    e063, "e063: custom clucenters function")

Lateral dispersion and placement of point projections on the line

Normal projection placement (default): proj_dist_fn = "norm"

# PRNG seed
seed = 6
e064 = clugen(3, 4, 1000, [1, 0, 0], pi / 2, [20, 20, 20], 13, 2, 0.0; rng=seed)
e065 = clugen(3, 4, 1000, [1, 0, 0], pi / 2, [20, 20, 20], 13, 2, 1.0; rng=seed)
e066 = clugen(3, 4, 1000, [1, 0, 0], pi / 2, [20, 20, 20], 13, 2, 3.0; rng=seed)
plt = plot_examples_3d(
    e064, "e064: lateral_disp = 0",
    e065, "e065: lateral_disp = 1",
    e066, "e066: lateral_disp = 3")

Uniform projection placement: proj_dist_fn = "unif"

# PRNG seed
seed = 6
e067 = clugen(3, 4, 1000, [1, 0, 0], pi / 2, [20, 20, 20], 13, 2, 0.0;
    proj_dist_fn="unif", rng=seed)
e068 = clugen(3, 4, 1000, [1, 0, 0], pi / 2, [20, 20, 20], 13, 2, 1.0;
    proj_dist_fn="unif", rng=seed)
e069 = clugen(3, 4, 1000, [1, 0, 0], pi / 2, [20, 20, 20], 13, 2, 3.0;
    proj_dist_fn="unif", rng=seed)
plt = plot_examples_3d(
    e067, "e067: lateral_disp = 0",
    e068, "e068: lateral_disp = 1",
    e069, "e069: lateral_disp = 3")

Custom projection placement using the Laplace distribution

# PRNG seed
seed = 6

# Custom proj_dist_fn: point projections placed using the Laplace distribution
proj_laplace(len, n, rng) = rand(rng, Laplace(0, len / 6), n)
e070 = clugen(3, 4, 1000, [1, 0, 0], pi / 2, [20, 20, 20], 13, 2, 0.0;
    proj_dist_fn=proj_laplace, rng=seed)
e071 = clugen(3, 4, 1000, [1, 0, 0], pi / 2, [20, 20, 20], 13, 2, 1.0;
    proj_dist_fn=proj_laplace, rng=seed)
e072 = clugen(3, 4, 1000, [1, 0, 0], pi / 2, [20, 20, 20], 13, 2, 3.0;
    proj_dist_fn=proj_laplace, rng=seed)
plt = plot_examples_3d(
    e070, "e070: lateral_disp = 0",
    e071, "e071: lateral_disp = 1",
    e072, "e072: lateral_disp = 3")

Controlling final point positions from their projections on the cluster-supporting line

Points on hyperplane orthogonal to cluster-supporting line (default): point_dist_fn = "n-1"

# PRNG seed
seed = 4

# Custom proj_dist_fn: point projections placed using the Laplace distribution
proj_laplace(len, n, rng) = rand(rng, Laplace(0, len / 6), n)
e073 = clugen(3, 5, 1500, [1, 0, 0], pi / 3, [20, 20, 20], 22, 3, 2; rng=seed)
e074 = clugen(3, 5, 1500, [1, 0, 0], pi / 3, [20, 20, 20], 22, 3, 2;
    proj_dist_fn="unif", rng=seed)
e075 = clugen(3, 5, 1500, [1, 0, 0], pi / 3, [20, 20, 20], 22, 3, 2;
    proj_dist_fn=proj_laplace, rng=seed)
plt = plot_examples_3d(
    e073, "e073: proj_dist_fn=\"norm\" (default)",
    e074, "e074: proj_dist_fn=\"unif\"",
    e075, "e075: custom proj_dist_fn (Laplace)")

Points around projection on cluster-supporting line: point_dist_fn = "n"

# PRNG seed
seed = 4

# Custom proj_dist_fn: point projections placed using the Laplace distribution
proj_laplace(len, n, rng) = rand(rng, Laplace(0, len / 6), n)
e076 = clugen(3, 5, 1500, [1, 0, 0], pi / 3, [20, 20, 20], 22, 3, 2;
    point_dist_fn="n", rng=seed)
e077 = clugen(3, 5, 1500, [1, 0, 0], pi / 3, [20, 20, 20], 22, 3, 2;
    point_dist_fn="n", proj_dist_fn="unif", rng=seed)
e078 = clugen(3, 5, 1500, [1, 0, 0], pi / 3, [20, 20, 20], 22, 3, 2;
    point_dist_fn="n", proj_dist_fn=proj_laplace, rng=seed)
plt = plot_examples_3d(
    e076, "e076: proj_dist_fn=\"norm\" (default)",
    e077, "e077: proj_dist_fn=\"unif\"",
    e078, "e078: custom proj_dist_fn (Laplace)")

Custom point placement using the exponential distribution

# PRNG seed
seed = 4

# Custom point_dist_fn: final points placed using the Exponential distribution
function clupoints_n_1_exp(projs, lat_std, len, clu_dir, clu_ctr; rng=nothing)
    dist_exp(npts, lstd, rg) = lstd .* rand(rg, Exponential(2 / lstd), npts, 1)
    return CluGen.clupoints_n_1_template(projs, lat_std, clu_dir, dist_exp; rng=rng)
end

# Custom proj_dist_fn: point projections placed using the Laplace distribution
proj_laplace(len, n, rng) = rand(rng, Laplace(0, len / 6), n)
e079 = clugen(3, 5, 1500, [1, 0, 0], pi / 3, [20, 20, 20], 22, 3, 2;
    point_dist_fn=clupoints_n_1_exp, rng=seed)
e080 = clugen(3, 5, 1500, [1, 0, 0], pi / 3, [20, 20, 20], 22, 3, 2;
    point_dist_fn=clupoints_n_1_exp, proj_dist_fn="unif", rng=seed)
e081 = clugen(3, 5, 1500, [1, 0, 0], pi / 3, [20, 20, 20], 22, 3, 2;
    point_dist_fn=clupoints_n_1_exp, proj_dist_fn=proj_laplace, rng=seed)
plt = plot_examples_3d(
    e079, "e079: proj_dist_fn=\"norm\" (default)",
    e080, "e080: proj_dist_fn=\"unif\"",
    e081, "e081: custom proj_dist_fn (Laplace)")

Manipulating cluster sizes

# PRNG seed
seed = 9

# Custom clusizes_fn (e083): cluster sizes determined via the uniform distribution,
# no correction for total points
clusizes_unif(nclu, npts, ae; rng=nothing) =
    rand(rng, DiscreteUniform(1, 2 * npts / nclu), nclu)

# Custom clusizes_fn (e084): clusters all have the same size, no correction for total points
clusizes_equal(nclu, npts, ae; rng=nothing) = (npts ÷ nclu) .* ones(Integer, nclu)

# Custom clucenters_fn (all): yields fixed positions for the clusters
centers_fixed(nclu, csep, coff; rng=nothing) =
    [ -csep[1] -csep[2] -csep[3]; csep[1] -csep[2] -csep[3];
      -csep[1] csep[2] csep[3]; csep[1] csep[2] csep[3] ]
e082 = clugen(3, 4, 1500, [1, 1, 1], pi, [20, 20, 20], 0, 0, 5;
    clucenters_fn=centers_fixed, point_dist_fn="n", rng=seed)
e083 = clugen(3, 4, 1500, [1, 1, 1], pi, [20, 20, 20], 0, 0, 5;
    clucenters_fn=centers_fixed, clusizes_fn=clusizes_unif, point_dist_fn="n",
    rng=seed)
e084 = clugen(3, 4, 1500, [1, 1, 1], pi, [20, 20, 20], 0, 0, 5;
    clucenters_fn=centers_fixed, clusizes_fn=clusizes_equal, point_dist_fn="n",
    rng=seed)
plt = plot_examples_3d(
    e082, "e082: normal dist. (default)",
    e083, "e083: unif. dist. (custom)",
    e084, "e084: equal size (custom)")

Examples in other dimensions

Basic 1D example with density plot

The following example was plotted with the plot_examples_1d() function available here.

# PRNG seed
seed = 27

# Custom proj_dist_fn: point projections placed using the Laplace distribution
proj_laplace(len, n, rng) = rand(rng, Laplace(0, len / 6), n)
e085 = clugen(1, 3, 2000, [1], 0, [10], 6, 1.5, 0; rng=seed)
e086 = clugen(1, 3, 2000, [1], 0, [10], 6, 1.5, 0; proj_dist_fn="unif", rng=seed)
e087 = clugen(1, 3, 2000, [1], 0, [10], 6, 1.5, 0; proj_dist_fn=proj_laplace, rng=seed)
plt = plot_examples_1d(
    e085, "e085: proj_dist_fn=\"norm\" (default)",
    e086, "e086: proj_dist_fn=\"unif\"",
    e087, "e087: custom proj_dist_fn (Laplace)")

5D example with default optional arguments

The following examples were plotted with the plot_examples_nd() function available here.

# Number of dimensions
nd = 5
# PRNG seed
seed = 59
e088 = clugen(nd, 6, 1500, [1, 1, 0.5, 0, 0], pi / 16, 30 .* ones(nd), 30, 4, 3; rng=seed)
plt = plot_examples_nd(e088, "e088: 5D with optional parameters set to defaults")

5D example with proj_dist_fn = "unif" and point_dist_fn = "n"

# Number of dimensions
nd = 5
# PRNG seed
seed = 99
e089 = clugen(nd, 6, 1500, [0.1, 0.3, 0.5, 0.3, 0.1], pi / 12, 30 .* ones(nd), 35, 5, 3.5;
    proj_dist_fn="unif", point_dist_fn="n", rng=seed)
plt = plot_examples_nd(e089, "e089: 5D with proj_dist_fn=\"unif\" and point_dist_fn=\"n\"")

Merging and hierarchical cluster examples

Merging two data sets generated with clugen()

The clumerge() function allows merging two or more data sets. For example:

# PRNG seeds
seed1 = 444
seed2 = 555
e090 = clugen(2, 5, 1000, [1, 1], pi / 12, [20, 20], 14, 1.2, 1.5;
    proj_dist_fn="unif", point_dist_fn="n", rng=seed1)
e091 = clugen(2, 3, 1500, [1, 0], 0.05, [20, 20], 0, 0, 4;
    point_dist_fn="n", cluster_offset=[20, 0], rng=seed2)
e092 = clumerge(e090, e091)
plt = plot_examples_2d(
    e090, "e090: data set 1",
    e091, "e091: data set 2",
    e092, "e092: merged data sets")

In the previous example, clusters from the individual data sets remain separate clusters in the merged data set. To keep the original cluster labels instead, set the clusters_field parameter to nothing:

e093 = clumerge(e090, e091; clusters_field=nothing)
plt = plot_examples_2d(
    e090, "e090: data set 1",
    e091, "e091: data set 2",
    e093, "e093: merged data sets")
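
The difference between the two merging modes is easier to see on tiny hand-made data sets. The following is a sketch under the assumption that clumerge() accepts plain named tuples with points and clusters fields (as the noise example in this section does) and returns an object whose fields can be accessed with dot syntax; the data values are hypothetical.

```julia
using CluGen

# Two tiny hand-made data sets (hypothetical values, for illustration only)
a = (points=[0.0 0.0; 0.1 0.1; 1.0 1.0; 1.1 1.1], clusters=[1, 1, 2, 2])
b = (points=[5.0 5.0; 6.0 6.0; 6.1 6.1], clusters=[1, 2, 2])

# Default: clusters from `a` and `b` are kept apart in the merged data set
merged_sep = clumerge(a, b)

# clusters_field=nothing: labels are kept as they are, so equal labels coincide
merged_raw = clumerge(a, b; clusters_field=nothing)

# Number of distinct cluster labels in each merged data set
println(length(unique(merged_sep.clusters)))
println(length(unique(merged_raw.clusters)))
```

In the first case the two clusters of each data set should yield four distinct labels, while in the second case only the two original labels remain.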

Adding noise to a clugen()-generated data set

e094 = (points=120 * rand(500, 2) .- 60, clusters=ones(Int32, 500))
e095 = clumerge(e094, e092) # clumerge(e094, e090, e091) would also work
plt = plot_examples_2d(
    e092, "e092: original merged data sets",
    e094, "e094: random uniform noise",
    e095, "e095: data sets with noise";
    pmargin=0)

Merging with data not generated with clugen()

Data generated with clugen() can be merged with other data sets, for example data created with MLJ's scikit-learn-like generators:

# PRNG seeds
seed1 = 333
seed2 = 444
# make_moons() comes from the MLJ package (requires `using MLJ`)
X, y = make_moons(100; noise=0.05, as_table=false, rng=seed1)
e096 = (points=X, clusters=y)
e097 = clugen(2, 5, 200, [1, 1], pi / 12, [1, 1], 0.1, 0.01, 0.25;
           proj_dist_fn="unif", point_dist_fn="n", rng=seed2)
e098 = clumerge(e096, e097)
plt = plot_examples_2d(
    e096, "e096: generated with MLJ's make_moons()",
    e097, "e097: generated with clugen()",
    e098, "e098: merged data")

We can also hierarchize clusters from different sources:

# Copy each data set into a Dict and add a top-level :hclusters label field
e099 = Dict(pairs(e096))
e099[:hclusters] = ones(Int32, length(e099[:clusters]))
e100 = Dict(pairs(e097))
e100[:hclusters] = ones(Int32, length(e100[:clusters]))
e101 = clumerge(e099, e100; clusters_field=:hclusters)
plt = plot_examples_2d(
    e099, "e099: generated with MLJ's make_moons()",
    e100, "e100: generated with clugen()",
    e101, "e101: merged data";
    clusters_field=:hclusters)