ProportionalFitting

Multidimensional iterative proportional fitting in Julia.

ProportionalFitting implements a multidimensional version of the factor estimation method for iterative proportional fitting (also called the RAS algorithm, raking, or matrix scaling).

In the two-dimensional case, iterative proportional fitting means adjusting a matrix $X$ so that it has the marginal sum totals $u$ (row sums) and $v$ (column sums). A prime use is survey data analysis, where $X$ could be your data's cross-tabulation of demographic characteristics, and $u, v$ the known population proportions of those characteristics.
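As a rough illustration of the two-dimensional case, the textbook algorithm alternately rescales the rows of $X$ to match $u$ and the columns to match $v$ until both sets of sums agree. The sketch below uses only Base Julia and the data from the next section; the function name `ras` and its keyword arguments are hypothetical, and this row/column loop is the classic scheme, not necessarily how ProportionalFitting's factor estimation is implemented.

```julia
# Textbook two-dimensional IPF (RAS): alternate row and column scaling.
# Illustrative sketch only; `ras` is a hypothetical helper, not part of ProportionalFitting.
function ras(X::AbstractMatrix, u::AbstractVector, v::AbstractVector;
             maxiter::Int = 1000, tol::Real = 1e-10)
    Z = float.(X)                              # floating-point working copy
    for _ in 1:maxiter
        Z .*= u ./ vec(sum(Z; dims = 2))       # rescale rows so row sums equal u
        Z .*= (v ./ vec(sum(Z; dims = 1)))'    # rescale columns so column sums equal v
        # Column sums are exact after the last step; stop once the row sums agree too.
        maximum(abs.(vec(sum(Z; dims = 2)) .- u)) < tol && break
    end
    return Z
end

X = [40 30 20 10; 35 50 100 75; 30 80 70 120; 20 30 40 50]
u = [150, 300, 400, 150]
v = [200, 300, 400, 100]
Z = ras(X, u, v)
```

For consistent margins and a strictly positive X, this loop should converge to the same adjusted matrix that ipf produces in the example below (for instance Z[1, 1] ≈ 64.5585).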

Getting started

Assume you have a matrix X:

X = [40 30 20 10; 35 50 100 75; 30 80 70 120; 20 30 40 50]
4×4 Matrix{Int64}:
 40  30   20   10
 35  50  100   75
 30  80   70  120
 20  30   40   50

Suppose you also know the row and column margins (u and v, respectively) of another matrix Y, but not the full matrix Y:

u = [150, 300, 400, 150]
v = [200, 300, 400, 100]
4-element Vector{Int64}:
 200
 300
 400
 100

Then the ipf function from ProportionalFitting finds the array factors that adjust matrix X to have the margins u and v:

fac = ipf(X, [u, v])
Factors for 2D array:
  [1]: [0.9986403503185242, 0.8833622306385376, 1.1698911437112522, 0.8895042701910321]
  [2]: [1.616160156063788, 1.5431801747375655, 1.771623700829941, 0.38299396265192226]

The result is an ArrayFactors object, a type exported by ProportionalFitting that supports a few methods. For example, Array() expands the factors into a full array of adjustment factors:

Array(fac)
4×4 Matrix{Float64}:
 1.61396  1.54108  1.76921  0.382473
 1.42765  1.36319  1.56499  0.338322
 1.89073  1.80535  2.07261  0.448061
 1.43758  1.37267  1.57587  0.340675
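Judging from the printed values, this matrix is the outer product of the two factor vectors: entry (i, j) is the i-th row factor times the j-th column factor. That reading is inferred from the output above (a few entries were checked by hand), not from the package documentation. A Base-Julia check using the factors copied from the ipf output:

```julia
# Factor vectors copied from the ipf output above.
f1 = [0.9986403503185242, 0.8833622306385376, 1.1698911437112522, 0.8895042701910321]
f2 = [1.616160156063788, 1.5431801747375655, 1.771623700829941, 0.38299396265192226]

# Outer product: entry (i, j) is f1[i] * f2[j].
F = f1 .* f2'
```

F should reproduce the matrix printed by Array(fac), e.g. F[1, 1] ≈ 1.61396 and F[4, 4] ≈ 0.340675.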

To create the adjusted matrix Z with the margins u and v, multiply Array(fac) elementwise with X:

Z = Array(fac) .* X
4×4 Matrix{Float64}:
 64.5585   46.2325   35.3843   3.82473
 49.9679   68.1594  156.499   25.3742
 56.7219  144.428   145.082   53.7673
 28.7516   41.18     63.0347  17.0337
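In two dimensions, multiplying X elementwise by an outer-product factor matrix is the same as scaling the rows and columns of X with diagonal matrices, which is where the name matrix scaling comes from. A small check with hypothetical numbers (not taken from the example above), using the LinearAlgebra standard library:

```julia
using LinearAlgebra

# Hypothetical factor vectors and data, chosen only for illustration.
a = [2.0, 0.5, 1.0]   # one factor per row
b = [1.0, 3.0]        # one factor per column
X = [1 2; 3 4; 5 6]

Z1 = (a .* b') .* X                  # elementwise product with the factor matrix
Z2 = Diagonal(a) * X * Diagonal(b)   # row scaling followed by column scaling
Z1 ≈ Z2
```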

We can then check that the marginal sum totals are correct:

ArrayMargins(Z)
Margins of 2D array:
  [1]: [150.0000000009452, 299.99999999962523, 399.99999999949796, 149.99999999993148]
  [2]: [200.0, 299.99999999999994, 399.99999999999994, 99.99999999999997]
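Each line of this printout appears to be the sum of the array over all other dimensions: [1] holds the row sums and [2] the column sums. A minimal Base-Julia sketch of that computation; the helper `margin` is hypothetical and not part of ProportionalFitting:

```julia
# Hypothetical helper: the margin of A along dimension d is the sum over all other dimensions.
function margin(A::AbstractArray, d::Integer)
    others = Tuple(setdiff(1:ndims(A), d))   # every dimension except d
    return vec(sum(A; dims = others))
end

X = [40 30 20 10; 35 50 100 75; 30 80 70 120; 20 30 40 50]
margin(X, 1)   # row sums
margin(X, 2)   # column sums
```

For the original X, margin(X, 1) gives [100, 260, 300, 140] and margin(X, 2) gives [125, 190, 230, 255]; the same helper works for arrays with more dimensions.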

Inconsistent margins

If the margins are inconsistent (i.e., they do not all sum to the same total), both X and the margins are transformed to proportions. For example, the margins below sum to 84 and 106, respectively:

m = ArrayMargins([[12, 23, 14, 35], [17, 44, 12, 33]])
af = ipf(X, m)
Factors for 2D array:
  [1]: [1.0426463240564372, 0.9297504397571145, 0.41888533101520664, 2.4187234600325582]
  [2]: [0.949061653550789, 1.8063342017380022, 0.3775175481010491, 0.9908356392601086]

In that case, Z must be computed from X expressed as proportions:

X_prop = X ./ sum(X)
Z = X_prop .* Array(af)
4×4 Matrix{Float64}:
 0.0494768  0.0706263  0.00984043  0.0129136
 0.0386046  0.104965   0.0438746   0.0863653
 0.0149081  0.0756647  0.0138369   0.062257
 0.0573879  0.163838   0.0456555   0.149785
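To illustrate the proportions idea, the sketch below rescales each margin and X so that they sum to 1, then runs plain alternating row/column scaling. This is a Base-Julia stand-in for ipf (the textbook RAS loop, not the package's factor method); the variable names are our own.

```julia
X = [40 30 20 10; 35 50 100 75; 30 80 70 120; 20 30 40 50]
m1 = [12, 23, 14, 35]   # sums to 84
m2 = [17, 44, 12, 33]   # sums to 106, so the margins are inconsistent

# Rescale everything to proportions so that all totals equal 1.
p1 = m1 ./ sum(m1)
p2 = m2 ./ sum(m2)
Z = X ./ sum(X)

# Alternating row/column scaling (a sketch, not ProportionalFitting's algorithm).
for _ in 1:1000
    Z .*= p1 ./ vec(sum(Z; dims = 2))      # match row proportions
    Z .*= (p2 ./ vec(sum(Z; dims = 1)))'   # match column proportions
end
```

The row and column sums of Z should now equal p1 and p2, and Z should agree with the matrix above up to rounding (e.g. Z[1, 1] ≈ 0.0494768).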

Multidimensional arrays

ProportionalFitting can also deal with multidimensional arrays of arbitrary shape. For example, consider the following (2, 3, 2) array and target margins:

X = reshape(1:12, 2, 3, 2)
m = [[48, 60], [28, 36, 44], [34, 74]]
3-element Vector{Vector{Int64}}:
 [48, 60]
 [28, 36, 44]
 [34, 74]

Now we can run ipf to compute the adjustment:

fac = ipf(X, m)
Factors for 3D array:
  [1]: [0.7012649814229596, 0.7413620380098563]
  [2]: [1.59452605457307, 1.3830398765538434, 1.2753933840995484]
  [3]: [1.6474060813606772, 1.2880517029245548]

And we can create the adjusted array Z:

Z = Array(fac) .* X
2×3×2 Array{Float64, 3}:
[:, :, 1] =
 1.84211  4.79335  7.36711
 3.89487  6.75656  9.34601

[:, :, 2] =
 10.082   11.2433  12.6722
 12.1811  13.2068  14.6147
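The same idea extends to any number of dimensions: for each dimension d in turn, compute the current margin (the sum over all other dimensions) and rescale every slice along d by target / current. A Base-Julia sketch under that assumption; `ipf_nd` is a hypothetical name, and this plain cyclic scaling is not claimed to be ProportionalFitting's factor estimation method.

```julia
# Sketch of N-dimensional IPF with one-dimensional margins (Base Julia only).
function ipf_nd(X::AbstractArray, margins::Vector{<:AbstractVector}; maxiter::Int = 1000)
    Z = float.(X)
    N = ndims(Z)
    for _ in 1:maxiter
        for d in 1:N
            others = Tuple(setdiff(1:N, d))
            current = vec(sum(Z; dims = others))              # current margin along d
            shape = ntuple(k -> k == d ? size(Z, d) : 1, N)   # broadcast along d only
            Z .*= reshape(margins[d] ./ current, shape)       # rescale slices along d
        end
    end
    return Z
end

X = reshape(1:12, 2, 3, 2)
m = [[48, 60], [28, 36, 44], [34, 74]]
Z = ipf_nd(X, m)
```

Since the fitted array is unique for consistent margins and a positive X, Z should match the array above up to rounding (e.g. Z[1, 1, 1] ≈ 1.84211).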

Multidimensional margins

ProportionalFitting can also deal with multidimensional margins of arbitrary shape. For example, consider the same (2, 3, 2) array as before:

X = reshape(1:12, 2, 3, 2)
2×3×2 reshape(::UnitRange{Int64}, 2, 3, 2) with eltype Int64:
[:, :, 1] =
 1  3  5
 2  4  6

[:, :, 2] =
 7   9  11
 8  10  12

This time, the target margins are multidimensional (a 1D vector and a 2D matrix):

m1 = [48, 60]
m2 = [9 11 14; 19 25 30]
mar = [m1, m2]
2-element Vector{Array{Int64}}:
 [48, 60]
 [9 11 14; 19 25 30]

Here, m1 belongs to the first dimension of the target array, and m2 belongs to the third and second dimensions (in that order): its rows run along dimension 3 and its columns along dimension 2. This can be encoded in a DimIndices object as follows:

dimid = DimIndices([1, [3, 2]])
Indices for 3D array:
[1][3, 2]
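Read this way, the [3, 2] entry means: sum over the remaining dimension (1), then order the result's axes as (dimension 3, dimension 2). This interpretation is inferred from the shapes in this example. A Base-Julia sketch of how such a two-dimensional margin can be computed from X:

```julia
X = reshape(1:12, 2, 3, 2)

# Sum over dimension 1 (giving a 1×3×2 array), then drop that singleton dimension.
S = dropdims(sum(X; dims = 1); dims = 1)   # 3×2, indexed by (dim 2, dim 3)

# Swap the axes so rows follow dimension 3 and columns follow dimension 2,
# matching the [3, 2] order in the DimIndices object above.
M = permutedims(S, (2, 1))                 # 2×3
```

Here M is [3 7 11; 15 19 23], a 2×3 matrix with the same layout as m2.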

Together, the margins and the dimension indices they belong to constitute an ArrayMargins object:

m = ArrayMargins(mar, dimid)
Margins of 3D array:
  [1]: [48, 60]
  [3, 2]: [9 11 14; 19 25 30]

Now we can run ipf to compute the adjustment:

fac = ipf(X, m)
Factors for 3D array:
  [1]: [0.9767649889221948, 1.0193941000886413]
  [3, 2]: [2.9845270289155916 1.569663514602841 1.272705342262194; 1.2672996639944127 1.3168411514607572 1.3056452924080841]

And we can create the adjusted array Z:

Z = Array(fac) .* X
2×3×2 Array{Float64, 3}:
[:, :, 1] =
 2.91518  4.59958  6.21567
 6.08482  6.40042  7.78433

[:, :, 2] =
  8.66498  11.5762  14.0284
 10.335    13.4238  15.9716

Finally, we use ArrayMargins with the same dimension indices to check that the margins of Z are as expected:

ArrayMargins(Z, dimid)
Margins of 3D array:
  [1]: [48.000000000010544, 59.99999999998945]
  [3, 2]: [9.0 11.0 14.000000000000002; 19.0 24.999999999999996 30.0]
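To make the multidimensional-margin case concrete, the sketch below alternates between matching m1 along dimension 1 and matching m2 on every (dimension 2, dimension 3) cell. It is a Base-Julia illustration with a hypothetical function name, written for this specific pair of margins; it is not ProportionalFitting's general implementation.

```julia
# Sketch: IPF with a 1-D margin on dimension 1 and a 2-D margin on dimensions (3, 2).
function fit_multimargin(X, m1, m2; maxiter::Int = 1000)
    Z = float.(X)
    # m2 has rows = dim 3 and columns = dim 2; reorder it to (dim 2, dim 3)
    # and add a leading singleton axis so it broadcasts against Z.
    t2 = reshape(permutedims(m2, (2, 1)), 1, size(X, 2), size(X, 3))
    for _ in 1:maxiter
        Z .*= m1 ./ vec(sum(Z; dims = (2, 3)))   # match m1 along dimension 1
        Z .*= t2 ./ sum(Z; dims = 1)             # match m2 on each (dim 2, dim 3) cell
    end
    return Z
end

X = reshape(1:12, 2, 3, 2)
m1 = [48, 60]
m2 = [9 11 14; 19 25 30]
Z = fit_multimargin(X, m1, m2)
```

Because the margins are consistent (both sum to 108), this should converge to the same Z as ipf above, e.g. Z[1, 1, 1] ≈ 2.91518.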