ProportionalFitting
Multidimensional iterative proportional fitting in Julia.
ProportionalFitting implements a multidimensional version of the factor estimation method for performing iterative proportional fitting (also called the RAS algorithm, raking, or matrix scaling).
In the two-dimensional case, iterative proportional fitting means rescaling a matrix $X$ so that its row and column sums match the target marginal totals $u$ and $v$. One prime use is in survey data analysis, where $X$ could be your data's cross-tabulation of demographic characteristics, and $u$ and $v$ the known population proportions of those characteristics.
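To make the two-dimensional case concrete, here is a minimal sketch of the classic alternating row and column scaling in plain Julia. It is not the factor estimation method that ProportionalFitting implements; the helper name `ipf2d`, its keyword arguments, and the small example values are assumptions made for illustration only.

```julia
# Classic two-dimensional IPF sketch in plain Julia (no packages needed).
# `ipf2d` is a hypothetical helper, not part of ProportionalFitting.
function ipf2d(X::AbstractMatrix, u::AbstractVector, v::AbstractVector;
               maxiter::Int = 1000, tol::Float64 = 1e-10)
    Z = float.(X)                               # floating-point working copy
    for _ in 1:maxiter
        Z .*= u ./ vec(sum(Z; dims = 2))        # scale each row to match u
        Z .*= (v ./ vec(sum(Z; dims = 1)))'     # scale each column to match v
        # Column sums are exact after the column step, so test the row sums.
        maximum(abs.(vec(sum(Z; dims = 2)) .- u)) < tol && break
    end
    return Z
end

# Illustrative values; both margins sum to 10, so they are consistent.
X_small = [1 2; 3 4]
u_small = [4, 6]
v_small = [5, 5]
Z_small = ipf2d(X_small, u_small, v_small)
```

The row sums of `Z_small` should then be close to `u_small` and its column sums close to `v_small`, up to the chosen tolerance.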
Getting started
Assume you have a matrix X:
X = [40 30 20 10; 35 50 100 75; 30 80 70 120; 20 30 40 50]
4×4 Matrix{Int64}:
40 30 20 10
35 50 100 75
30 80 70 120
20 30 40 50
Assume you also know the row and column margins of another matrix Y (u and v, respectively), but not the full matrix:
u = [150, 300, 400, 150]
v = [200, 300, 400, 100]
4-element Vector{Int64}:
200
300
400
100
Then the ipf function from ProportionalFitting will find the array factors which adjust matrix X to have the margins u and v:
fac = ipf(X, [u, v])
Factors for 2D array:
[1]: [0.9986403503185242, 0.8833622306385376, 1.1698911437112522, 0.8895042701910321]
[2]: [1.616160156063788, 1.5431801747375655, 1.771623700829941, 0.38299396265192226]
Array factors (ArrayFactors) are a specific type exported by ProportionalFitting with a few methods, for example Array():
Array(fac)
4×4 Matrix{Float64}:
1.61396 1.54108 1.76921 0.382473
1.42765 1.36319 1.56499 0.338322
1.89073 1.80535 2.07261 0.448061
1.43758 1.37267 1.57587 0.340675
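The printed values are consistent with this matrix being the outer product of the two factor vectors. A minimal sketch in plain Julia, using the factor values shown above rounded to five decimals (the names `f1`, `f2`, and `A` are introduced here only for illustration):

```julia
# Factor values copied from the printed output above, rounded to 5 decimals.
f1 = [0.99864, 0.88336, 1.16989, 0.88950]   # factors for dimension 1 (rows)
f2 = [1.61616, 1.54318, 1.77162, 0.38299]   # factors for dimension 2 (columns)

# Outer product: A[i, j] == f1[i] * f2[j]
A = f1 * f2'
```

Up to rounding, `A` matches the 4×4 matrix printed by `Array(fac)`.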
To create the adjusted matrix Z with the margins u and v, we perform elementwise multiplication of this matrix with X:
Z = Array(fac) .* X
4×4 Matrix{Float64}:
64.5585 46.2325 35.3843 3.82473
49.9679 68.1594 156.499 25.3742
56.7219 144.428 145.082 53.7673
28.7516 41.18 63.0347 17.0337
We can then check that the marginal sum totals are correct:
ArrayMargins(Z)
Margins of 2D array:
[1]: [150.0000000009452, 299.99999999962523, 399.99999999949796, 149.99999999993148]
[2]: [200.0, 299.99999999999994, 399.99999999999994, 99.99999999999997]
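The same check can be sketched without the package by summing over rows and columns in plain Julia. The matrix below is the rounded printout of Z from above, so the sums only match the targets approximately; `Z_printed` is a name introduced here for illustration.

```julia
# Rounded values of Z copied from the printed output above.
Z_printed = [64.5585  46.2325  35.3843   3.82473;
             49.9679  68.1594 156.499   25.3742;
             56.7219 144.428  145.082   53.7673;
             28.7516  41.18    63.0347  17.0337]

row_margins = vec(sum(Z_printed; dims = 2))   # compare with u
col_margins = vec(sum(Z_printed; dims = 1))   # compare with v

u = [150, 300, 400, 150]
v = [200, 300, 400, 100]
isapprox(row_margins, u; atol = 1e-2) && isapprox(col_margins, v; atol = 1e-2)
```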
Inconsistent margins
If the margins are inconsistent (i.e., they do not all sum to the same total), then both X and the margins are transformed to proportions.
m = ArrayMargins([[12, 23, 14, 35], [17, 44, 12, 33]])
af = ipf(X, m)
Factors for 2D array:
[1]: [1.0426463240564372, 0.9297504397571145, 0.41888533101520664, 2.4187234600325582]
[2]: [0.949061653550789, 1.8063342017380022, 0.3775175481010491, 0.9908356392601086]
In that case, Z must be computed from the proportions of X rather than from X itself:
X_prop = X ./ sum(X)
Z = X_prop .* Array(af)
4×4 Matrix{Float64}:
0.0494768 0.0706263 0.00984043 0.0129136
0.0386046 0.104965 0.0438746 0.0863653
0.0149081 0.0756647 0.0138369 0.062257
0.0573879 0.163838 0.0456555 0.149785
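To see what "transformed to proportions" means here, the following plain-Julia sketch divides each target margin by its own total and compares the result with the row and column sums of the printed Z. The values of Z are the rounded printout from above, and the names `u_prop`, `v_prop`, and `Z_printed` are introduced only for illustration.

```julia
# Inconsistent target margins from the example: they sum to 84 and 106.
u = [12, 23, 14, 35]
v = [17, 44, 12, 33]

# Each margin expressed as proportions of its own total.
u_prop = u ./ sum(u)
v_prop = v ./ sum(v)

# Rounded values of Z copied from the printed output above.
Z_printed = [0.0494768 0.0706263 0.00984043 0.0129136;
             0.0386046 0.104965  0.0438746  0.0863653;
             0.0149081 0.0756647 0.0138369  0.062257;
             0.0573879 0.163838  0.0456555  0.149785]

row_ok = isapprox(vec(sum(Z_printed; dims = 2)), u_prop; atol = 1e-5)
col_ok = isapprox(vec(sum(Z_printed; dims = 1)), v_prop; atol = 1e-5)
```

Both `row_ok` and `col_ok` should hold, and the entries of Z sum to approximately 1.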
Multidimensional arrays
ProportionalFitting can also deal with multidimensional arrays of arbitrary shape. For example, consider the following (2, 3, 2) array and target margins:
X = reshape(1:12, 2, 3, 2)
m = [[48, 60], [28, 36, 44], [34, 74]]
3-element Vector{Vector{Int64}}:
[48, 60]
[28, 36, 44]
[34, 74]
Now we can run ipf to compute the adjustment:
fac = ipf(X, m)
Factors for 3D array:
[1]: [0.7012649814229596, 0.7413620380098563]
[2]: [1.59452605457307, 1.3830398765538434, 1.2753933840995484]
[3]: [1.6474060813606772, 1.2880517029245548]
And we can create the adjusted array Z:
Z = Array(fac) .* X
2×3×2 Array{Float64, 3}:
[:, :, 1] =
1.84211 4.79335 7.36711
3.89487 6.75656 9.34601
[:, :, 2] =
10.082 11.2433 12.6722
12.1811 13.2068 14.6147
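The printed numbers are consistent with the factor array being a three-way outer product of the per-dimension factors. The plain-Julia sketch below rebuilds it by broadcasting and checks the one-dimensional margins by summing over the other two dimensions. The factor values are copied from the output above; the names `f1`, `f2`, `f3`, `A`, and `Z_check` are assumptions for illustration.

```julia
# Factors copied from the printed output above.
f1 = [0.7012649814229596, 0.7413620380098563]
f2 = [1.59452605457307, 1.3830398765538434, 1.2753933840995484]
f3 = [1.6474060813606772, 1.2880517029245548]

# Three-way outer product via broadcasting: A[i, j, k] == f1[i] * f2[j] * f3[k]
A = f1 .* reshape(f2, 1, :) .* reshape(f3, 1, 1, :)   # size (2, 3, 2)

X = reshape(1:12, 2, 3, 2)
Z_check = A .* X

# Margin of one dimension = sum over the other two dimensions.
margin1 = vec(sum(Z_check; dims = (2, 3)))   # target [48, 60]
margin2 = vec(sum(Z_check; dims = (1, 3)))   # target [28, 36, 44]
margin3 = vec(sum(Z_check; dims = (1, 2)))   # target [34, 74]
```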
Multidimensional margins
ProportionalFitting can also deal with multidimensional margins of arbitrary shape. For example, consider the same (2, 3, 2) array as before:
X = reshape(1:12, 2, 3, 2)
2×3×2 reshape(::UnitRange{Int64}, 2, 3, 2) with eltype Int64:
[:, :, 1] =
1 3 5
2 4 6
[:, :, 2] =
7 9 11
8 10 12
We have multidimensional target margins (a 1D vector and a 2D matrix):
m1 = [48, 60]
m2 = [9 11 14; 19 25 30]
mar = [m1, m2]
2-element Vector{Array{Int64}}:
[48, 60]
[9 11 14; 19 25 30]
Here, m1 belongs to the first dimension of the target array, and m2 belongs to the third and second dimensions (in that order). This can be encoded in a DimIndices object as follows:
dimid = DimIndices([1, [3, 2]])
Indices for 3D array:
[1][3, 2]
Together, the margins and the dimension indices they belong to constitute an ArrayMargins object:
m = ArrayMargins(mar, dimid)
Margins of 3D array:
[1]: [48, 60]
[3, 2]: [9 11 14; 19 25 30]
Now we can run ipf to compute the adjustment:
fac = ipf(X, m)
Factors for 3D array:
[1]: [0.9767649889221948, 1.0193941000886413]
[3, 2]: [2.9845270289155916 1.569663514602841 1.272705342262194; 1.2672996639944127 1.3168411514607572 1.3056452924080841]
And we can create the adjusted array Z:
Z = Array(fac) .* X
2×3×2 Array{Float64, 3}:
[:, :, 1] =
2.91518 4.59958 6.21567
6.08482 6.40042 7.78433
[:, :, 2] =
8.66498 11.5762 14.0284
10.335 13.4238 15.9716
We can then again use ArrayMargins to check that the margins of this array are as expected:
ArrayMargins(Z, dimid)
Margins of 3D array:
[1]: [48.000000000010544, 59.99999999998945]
[3, 2]: [9.0 11.0 14.000000000000002; 19.0 24.999999999999996 30.0]
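To illustrate what the [3, 2] ordering means, the following plain-Julia sketch rebuilds the adjusted array from the printed factors and recomputes both margins by hand. It assumes, consistent with the printed numbers, that the 2D factor is indexed as (dimension 3, dimension 2); the names `f1`, `f32`, `A`, and `Z_check` are introduced only for illustration.

```julia
# Factors copied from the printed output above.
f1  = [0.9767649889221948, 1.0193941000886413]
f32 = [2.9845270289155916 1.569663514602841  1.272705342262194;
       1.2672996639944127 1.3168411514607572 1.3056452924080841]

# f32 is indexed [k, j] (dimension 3, then dimension 2), so permute it to
# [j, k] and add a leading singleton dimension: A[i, j, k] == f1[i] * f32[k, j]
A = f1 .* reshape(permutedims(f32), 1, 3, 2)

X = reshape(1:12, 2, 3, 2)
Z_check = A .* X

# Margin over dimension 1: sum out dimensions 2 and 3.
margin1 = vec(sum(Z_check; dims = (2, 3)))                          # target [48, 60]

# Margin over dimensions (3, 2): sum out dimension 1, then reorder to [k, j].
margin32 = permutedims(dropdims(sum(Z_check; dims = 1); dims = 1))  # 2×3 matrix
```

`margin32` should be close to the target `[9 11 14; 19 25 30]`.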