
Introduction

1  The Latent Budget Model

The latent budget model (LBM) is a mixture model for compositional data. It gives insight into a compositional data set without the worries of a troubled covariance matrix. By performing latent budget analysis (LBA) we approximate $I$ observed budgets, which may represent persons, groups or objects, by a small number of latent budgets, consisting of typical characteristics of the sample. Such an approximation can be used for classification, for example.

The idea of the LBM was proposed by Goodman (1974a), and elaborated by Clogg (1981) by interpreting a simple latent class model in an asymmetric way. Independently, de Leeuw & van der Heijden (1988) introduced the model and named it ``latent budget analysis'' because they used it to analyze time-budget data. The model was also introduced independently in geology by Renner (1988; 1993), where it is known as the endmember model.

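Consider an $I \times J$ compositional data matrix $\mathbf{P}$, consisting of $I$ observed budgets $\mathbf{p}_i$ with components $p_{j|i}$. In the LBM the observed budgets $\mathbf{p}_i$ are approximated by expected budgets $\boldsymbol{\pi}_i$, which are mixtures of $K$ ($K \le \min(I,J)$) typical compositions or latent budgets. The latent budgets are denoted by $\boldsymbol{\beta}_k$ ($k = 1, \ldots, K$), and the model can be written as

\[
\boldsymbol{\pi}_i = \alpha_{1|i}\,\boldsymbol{\beta}_1 + \cdots + \alpha_{k|i}\,\boldsymbol{\beta}_k + \cdots + \alpha_{K|i}\,\boldsymbol{\beta}_K \qquad (i = 1, \ldots, I) \tag{1}
\]

where $\alpha_{k|i}$ ($i = 1, \ldots, I$; $k = 1, \ldots, K$) are the mixing parameters. The elements of $\boldsymbol{\pi}_i$ are $\pi_{j|i}$ ($j = 1, \ldots, J$) and are called expected components. The elements of $\boldsymbol{\beta}_k$ are $\beta_{j|k}$ ($j = 1, \ldots, J$) and are called latent components. Alternative notations for (1) are the scalar notation

\[
\pi_{j|i} = \sum_{k=1}^{K} \alpha_{k|i}\,\beta_{j|k} \qquad (i = 1, \ldots, I;\; j = 1, \ldots, J) \tag{2}
\]

and matrix notation

\[
\boldsymbol{\Pi} = \mathbf{A}\mathbf{B}^{T} \tag{3}
\]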
In (3) $\boldsymbol{\Pi}$ is an $I \times J$ matrix whose rows are the expected budgets; $\mathbf{A}$ is an $I \times K$ matrix of mixing parameters, and $\mathbf{B}$ is a $J \times K$ matrix whose columns are the latent budgets. The superscript ``T'' denotes the transpose of a matrix. The latent budget model with $K$ latent budgets is denoted as LBM($K$). Similar to the observed components, the parameters of the LBM are subject to the sum constraints
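
\[
\sum_{j=1}^{J} \pi_{j|i} = \sum_{k=1}^{K} \alpha_{k|i} = \sum_{j=1}^{J} \beta_{j|k} = 1 \tag{4}
\]

and the nonnegativity constraints

\[
0 \le \pi_{j|i} \le 1, \qquad 0 \le \alpha_{k|i} \le 1, \qquad 0 \le \beta_{j|k} \le 1. \tag{5}
\]
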
Thus, all parameters are proportions and this facilitates the interpretation of the model. In fact, it has been argued frequently that its ease of interpretation is one of the main reasons to use LBA (for example de Leeuw & van der Heijden, 1988; de Leeuw et al., 1990; van der Ark & van der Heijden, 1998; van den Brakel, 1996).
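
As a minimal sketch of how the scalar form of the model works in practice, the Python snippet below builds the expected budget of one row as a mixture of three latent budgets and checks the sum and nonnegativity constraints. The numbers are the latent components and the mixing parameters for the Rural row reported in Table 2 later in this section; the function names `expected_budget` and `is_budget` are illustrative choices, not part of any existing LBA software.

```python
# Latent components beta_{j|k}: one row per party, one column per latent budget
# (k = 1, 2, 3). Values copied from Table 2.
PARTIES = ["PvdA", "CDA", "VVD", "D66", "Left-wing", "Right-wing"]
BETA = [
    [0.70, 0.28, 0.00],
    [0.16, 0.63, 0.08],
    [0.00, 0.01, 0.71],
    [0.06, 0.00, 0.18],
    [0.08, 0.00, 0.03],
    [0.00, 0.08, 0.00],
]

# Mixing parameters alpha_{k|i} for the row "Rural" (Table 2).
ALPHA_RURAL = [0.12, 0.65, 0.23]


def expected_budget(alpha, beta):
    """Scalar form of the model: pi_{j|i} = sum_k alpha_{k|i} * beta_{j|k}."""
    return [sum(a * b for a, b in zip(alpha, row)) for row in beta]


def is_budget(values, tol=1e-9):
    """Check that all entries lie in [0, 1] and that they sum to 1."""
    in_range = all(0.0 <= v <= 1.0 for v in values)
    return in_range and abs(sum(values) - 1.0) < tol


pi_rural = expected_budget(ALPHA_RURAL, BETA)

for party, value in zip(PARTIES, pi_rural):
    print(f"{party:<11s}{value:.3f}")
```

Because the mixing parameters and every latent budget are themselves budgets, the mixture is a budget as well, so `is_budget(pi_rural)` holds. The PvdA component of the expected budget comes out at about 0.266, close to the observed value of .263 in Table 1.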

If the data have a product-multinomial distribution, we can compute the unconditional expected probabilities $\pi_{ij}$ from the expected components $\pi_{j|i}$. The following properties hold for the expected components and the corresponding unconditional probabilities, where a subscript $+$ denotes summation over the index it replaces:

\[
\pi_{ij} = p_{i+}\,\pi_{j|i}, \qquad \pi_{i+} = p_{i+}, \qquad \pi_{+j} = p_{+j}
\]

(see de Leeuw et al., 1990).

Van der Heijden, Mooijaart & de Leeuw (1992) proposed two ways to interpret the model (see also van der Ark & van der Heijden, 1998), which we will call the mixture model interpretation and the MIMIC-model interpretation (Multiple Indicator Multiple Cause-model, Goodman, 1974a; see also Clogg, 1981). Thus far we have treated the LBM as a mixture model, and the interpretation given earlier is as follows: the LBM writes the expected budgets as a mixture of a small number of typical, or latent, budgets. Hence, each expected budget is built up out of the $K$ latent budgets, and the mixing parameters determine to what extent.

The latent budgets can be characterized by comparing them with the latent budget of LBM(1). LBM(1) is the independence model, with $\alpha_{1|i} = 1$ ($i = 1, \ldots, I$) and $\beta_{j|1} = p_{+j}$ ($j = 1, \ldots, J$); in this case $\boldsymbol{\pi}_i = \boldsymbol{\beta}_1$ for $i = 1, \ldots, I$. Hence, if latent component $\beta_{j|k}$ is greater than that component in the independence model, $p_{+j}$, then $\boldsymbol{\beta}_k$ is characterized by the $j$-th category. On the other hand, if $\beta_{j|k}$ is less than $p_{+j}$, then the $j$-th category is of lesser importance.

The relative importance of each latent budget, in terms of how much of the expected data it accounts for, is expressed by the budget proportions $\pi_k = \sum_i p_{i+}\,\alpha_{k|i}$. The budget proportion $\pi_k$ ($k = 1, \ldots, K$) also denotes the probability of latent budget $k$ when there is no information about the level of the row variable. To understand how the expected budgets are constructed from the latent budgets we must compare the mixing parameters to $\pi_k$. If $\alpha_{k|i} > \pi_k$ then expected budget $\boldsymbol{\pi}_i$ is characterized more than average by latent budget $\boldsymbol{\beta}_k$, and if $\alpha_{k|i} < \pi_k$ then it is characterized less than average by latent budget $\boldsymbol{\beta}_k$. In practice the mixture model interpretation is carried out most easily when we first characterize the latent budgets and then interpret the expected budgets in terms of the latent budgets.
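
The budget proportions are a weighted average of the mixing parameters, with the row proportions as weights. The following Python sketch illustrates this with a small made-up example: the two row proportions and the 2 × 2 matrix of mixing parameters are hypothetical values chosen only for illustration, and the function name `budget_proportions` is likewise an assumption.

```python
def budget_proportions(row_masses, alpha):
    """Budget proportions: pi_k = sum_i p_{i+} * alpha_{k|i}.

    row_masses -- list of row proportions p_{i+} (should sum to 1)
    alpha      -- list of rows; alpha[i][k] is the mixing parameter alpha_{k|i}
    """
    n_latent = len(alpha[0])
    return [
        sum(mass * row[k] for mass, row in zip(row_masses, alpha))
        for k in range(n_latent)
    ]


# Hypothetical illustration: two rows with proportions 0.25 and 0.75.
ROW_MASSES = [0.25, 0.75]
ALPHA = [
    [0.8, 0.2],  # row 1 leans towards latent budget 1
    [0.4, 0.6],  # row 2 leans towards latent budget 2
]

pi_k = budget_proportions(ROW_MASSES, ALPHA)

# Under LBM(1) every row has mixing parameter 1 for the single latent budget,
# so the only budget proportion is 1.
pi_independence = budget_proportions(ROW_MASSES, [[1.0], [1.0]])
```

In this made-up case both budget proportions are about one half, so row 1 (mixing parameter 0.8) is characterized more than average by the first latent budget and row 2 (mixing parameter 0.4) less than average.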

Table 1: Voting behavior by city type in the 1986 elections in the Netherlands, in frequencies (upper half) and in components (lower half). The political parties are PvdA (labor party), CDA (Christian democrats), VVD (liberals), D66 (democrats), left (other left-wing parties), right (other right-wing parties).

City type                PvdA    CDA    VVD    D66   left  right  Total
Frequencies
Rural                     285    482    186     49     21     60   1083
Rural industrialized      620    914    308    102     42     97   2083
Commuter                  355    460    347    104     36     47   1349
Small city                336    337    168     62     27     46    976
Middle large city         548    455    233     91     47     43   1417
Large city                903    516    343    153    110     37   2062
Components
Rural                    .263   .445   .172   .045   .019   .055   1.00
Rural industrialized     .298   .439   .148   .049   .020   .047   1.00
Commuter                 .263   .341   .257   .077   .027   .035   1.00
Small city               .344   .345   .172   .064   .028   .047   1.00
Middle large city        .387   .321   .164   .064   .033   .030   1.00
Large city               .438   .250   .166   .074   .053   .018   1.00
Source: Statistics Netherlands (1987)
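
The lower half of Table 1 follows from the upper half by dividing each frequency by its row total. A minimal Python sketch of that step is given below, together with the row proportions and the independence budget that are used in the interpretation of the model. The symbol n_ij for the frequencies and all function names are assumptions introduced for this illustration.

```python
CITY_TYPES = [
    "Rural", "Rural industrialized", "Commuter",
    "Small city", "Middle large city", "Large city",
]
PARTIES = ["PvdA", "CDA", "VVD", "D66", "left", "right"]

# Frequencies n_ij from the upper half of Table 1 (rows: city types).
FREQ = [
    [285, 482, 186, 49, 21, 60],
    [620, 914, 308, 102, 42, 97],
    [355, 460, 347, 104, 36, 47],
    [336, 337, 168, 62, 27, 46],
    [548, 455, 233, 91, 47, 43],
    [903, 516, 343, 153, 110, 37],
]


def observed_budgets(freq):
    """Observed components p_{j|i} = n_ij / n_{i+} (each row sums to 1)."""
    return [[n / sum(row) for n in row] for row in freq]


def row_proportions(freq):
    """Row proportions p_{i+} = n_{i+} / n_{++}."""
    total = sum(sum(row) for row in freq)
    return [sum(row) / total for row in freq]


def independence_budget(freq):
    """Independence budget p_{+j} = n_{+j} / n_{++}."""
    total = sum(sum(row) for row in freq)
    return [sum(column) / total for column in zip(*freq)]


budgets = observed_budgets(FREQ)
masses = row_proportions(FREQ)
indep = independence_budget(FREQ)

for city, budget in zip(CITY_TYPES, budgets):
    print(f"{city:<22s}" + " ".join(f"{p:.3f}" for p in budget))
```

Each observed budget is a row of conditional proportions, whereas the independence budget pools all rows and is therefore the budget that every row would have if voting behavior did not depend on city type.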

For interpreting the LBM as a MIMIC-model, we view the observed components as conditional proportions of the row variable $X$ (for example ``city type'' in Table 1), with $I$ categories, and the column variable $Y$ (for example ``political party'' in Table 1), with $J$ categories. If we assume that the row variable and the column variable are independent given some latent variable $Z$ with $K$ categories, then the LBM describes the relationship between the row variable and the column variable in an asymmetric way, i.e. $\pi_{j|i} = P(Y = j \mid X = i)$ denotes the probability of responding in category $j$ of $Y$, given that one belongs to the $i$-th category of $X$; these probabilities are explained by $\alpha_{k|i} = P(Z = k \mid X = i)$, which is the probability that a member of row category $i$ belongs to latent category $k$, and $\beta_{j|k} = P(Y = j \mid Z = k)$, which is the probability that a member of latent category $k$ responds in the $j$-th category of $Y$.
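
The step from conditional independence to the model equation can be made explicit. The short derivation below is a sketch that uses only the law of total probability and the assumption that the row and column variables are independent given the latent variable:

\[
\begin{aligned}
P(Y = j \mid X = i)
  &= \sum_{k=1}^{K} P(Y = j,\, Z = k \mid X = i) \\
  &= \sum_{k=1}^{K} P(Z = k \mid X = i)\, P(Y = j \mid Z = k,\, X = i) \\
  &= \sum_{k=1}^{K} P(Z = k \mid X = i)\, P(Y = j \mid Z = k)
   = \sum_{k=1}^{K} \alpha_{k|i}\, \beta_{j|k} .
\end{aligned}
\]

The conditional independence assumption is used in the third line, where the conditioning on the row category is dropped; the result is the scalar form of the LBM.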

If the compositional data do not have a product-multinomial distribution, then the MIMIC-model interpretation may be troublesome: for example, if each observed budget represents a multivariate observation on a single subject, then it is unclear what $P(Y = j \mid Z = k)$ means. If the rows of the compositional data are not independent, for example if they denote groups and people may belong to more than one group, then $P(Z = k \mid X = i)$ is not well defined.

A graphic representation of a mixture model and a MIMIC-model is given in the accompanying figure. In the left panel the squares represent the expected budgets $\boldsymbol{\pi}_i$ and the circles the latent budgets $\boldsymbol{\beta}_k$. The arrows represent the mixing parameters $\alpha_{k|i}$. In the right panel the squares on the left-hand side represent the row categories and the squares on the right-hand side represent the column categories. The arrows on the left-hand side represent the mixing parameters $\alpha_{k|i}$ and the arrows on the right-hand side represent the latent components $\beta_{j|k}$.

Figure: Graphic display of a mixture model (left panel) and a MIMIC-model (right panel).

As an example of LBA we analyzed the data in Table 1 with LBM(3). The parameters are in Table 2 and have been identified (see footnote 1).

Table 2: Mixing parameters and latent components of the LBM(3) solution of the election data in Table 1. The budget proportions ($\pi_k$) and the independence budget ($p_{+1}, \ldots, p_{+j}, \ldots, p_{+J}$) are also given.

Row categories       Mixing parameters
                     k = 1   k = 2   k = 3   indep
Rural                 0.12    0.65    0.23    1.00
Rural Ind.            0.18    0.61    0.20    1.00
Commuter              0.20    0.44    0.36    1.00
Small city            0.30    0.46    0.24    1.00
Medium city           0.39    0.38    0.22    1.00
Large city            0.54    0.32    0.23    1.00
$\pi_k$               0.31    0.45    0.24    1.00

Column categories    Latent components
                     k = 1   k = 2   k = 3   indep
PvdA                  0.70    0.28    0.00    0.34
CDA                   0.16    0.63    0.08    0.35
VVD                   0.00    0.01    0.71    0.18
D66                   0.06    0.00    0.18    0.06
Left-wing             0.08    0.00    0.03    0.03
Right-wing            0.00    0.08    0.00    0.03

The mixture model interpretation of Table 2 is as follows: first, we interpret the latent budgets by comparing them to the independence budget. The first latent budget has greater proportions in the components ``PvdA'' (labor) and the ``small left-wing'' parties than the independence budget, and can be described as a ``socialist budget''. The second latent budget has greater proportions in the components ``CDA'' (Christian Democrats) and ``small right-wing'' parties than the independence budget. Since the right-wing parties are conservative Christian parties, we can describe this budget as a ``christian/conservative budget''. The third latent budget has greater proportions in the components ``VVD'' (right-wing liberals) and ``D66'' (left-wing liberals) and can be described as a ``liberal'' budget.
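
The comparison with the independence budget can be carried out mechanically. The Python sketch below lists, for each latent budget in Table 2, the categories whose latent component exceeds the corresponding component of the independence budget. The function name `characteristic_categories` is an illustrative assumption, and the strict inequality means that ties (as for D66 in the first latent budget) are not counted.

```python
PARTIES = ["PvdA", "CDA", "VVD", "D66", "Left-wing", "Right-wing"]

# Latent budgets beta_k (one list per k) and the independence budget, from Table 2.
LATENT_BUDGETS = {
    1: [0.70, 0.16, 0.00, 0.06, 0.08, 0.00],
    2: [0.28, 0.63, 0.01, 0.00, 0.00, 0.08],
    3: [0.00, 0.08, 0.71, 0.18, 0.03, 0.00],
}
INDEPENDENCE = [0.34, 0.35, 0.18, 0.06, 0.03, 0.03]


def characteristic_categories(latent_budget, independence, labels):
    """Return the labels j for which beta_{j|k} > p_{+j}."""
    return [
        label
        for label, beta, p in zip(labels, latent_budget, independence)
        if beta > p
    ]


characterization = {
    k: characteristic_categories(budget, INDEPENDENCE, PARTIES)
    for k, budget in LATENT_BUDGETS.items()
}

for k, labels in characterization.items():
    print(f"latent budget {k}: {', '.join(labels)}")
```

The three lists that result (PvdA and Left-wing; CDA and Right-wing; VVD and D66) are the categories on which the ``socialist'', ``christian/conservative'' and ``liberal'' labels are based.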

By comparing the mixing parameters to the budget proportions $\pi_k$, we see that the rural and rural industrialized areas predominantly have a christian/conservative voting pattern. Commuters are predominantly liberal. The small cities display the average voting pattern, because the mixing parameters are almost equal to $\pi_k$. In the larger cities the socialist budget is most important.
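
The same reading of Table 2 can be sketched in Python. For each city type the snippet below finds the latent budget whose mixing parameter exceeds the budget proportion by the largest margin. The function name `dominant_budget` and the threshold of 0.05, below which a row is treated as showing the average pattern, are assumptions made for this illustration and are not part of the original analysis.

```python
# Budget proportions pi_k and mixing parameters alpha_{k|i}, copied from Table 2.
PI_K = [0.31, 0.45, 0.24]
ALPHA = {
    "Rural": [0.12, 0.65, 0.23],
    "Rural Ind.": [0.18, 0.61, 0.20],
    "Commuter": [0.20, 0.44, 0.36],
    "Small city": [0.30, 0.46, 0.24],
    "Medium city": [0.39, 0.38, 0.22],
    "Large city": [0.54, 0.32, 0.23],
}


def dominant_budget(alpha, pi_k, threshold=0.05):
    """Return the 1-based index k with the largest alpha_{k|i} - pi_k.

    Returns None when even the largest difference does not exceed the
    (assumed) threshold, i.e. when the row is close to the average pattern.
    """
    differences = [a - p for a, p in zip(alpha, pi_k)]
    best = max(range(len(differences)), key=lambda k: differences[k])
    return best + 1 if differences[best] > threshold else None


dominant = {city: dominant_budget(alpha, PI_K) for city, alpha in ALPHA.items()}

for city, k in dominant.items():
    label = "average pattern" if k is None else f"latent budget {k}"
    print(f"{city:<12s}{label}")
```

With these values the rural and rural industrialized rows point to the second latent budget, commuters to the third, the medium and large cities to the first, and the small cities fall below the threshold.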

Alternatively, from the MIMIC-model interpretation we may conclude that subjects from the rural areas have a higher than average probability to be a member of latent stage 2, commuters have a probability higher than average to be a member of latent stage 3, and subjects from the bigger cities have a probability higher than average to be a member of latent stage 1. Subjects of latent stage 1 predominantly vote left-wing and PvdA (labor), subjects of latent stage 2 vote right-wing and CDA (Christian Democrats) and subjects of latent stage 3 predominantly vote liberal (VVD and D66). Interpretation of the latent stages is often difficult. However, we can describe latent stage 1 as the level dominated by left-wing oriented city people, latent stage 2 as a level dominated by religious rural people, and latent stage 3 as a level dominated by liberal commuters.

We conclude this Section with the remark that alternative notations for the parameters are in use, which indicate more or less explicitly that the parameters are conditional proportions. A review is presented in Table 3. Sometimes, for example in the discussion on the relationship between LBA and Latent Class Analysis (LCA; see van der Ark & van der Heijden, 1996, Section 5), an alternative notation is more convenient.

Table 3: Alternative notations for the observed components and the latent budget parameters.

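Component            1               2                3               4
Latent component     $\beta_{jk}$    $\beta_{j|k}$    $\pi_{j|k}$     $\pi_{jk}^{\bar{X}Z}$
Observed component   $p_{ij}$        $p_{j|i}$        $p_{j|i}$       $p_{ij}^{X\bar{Y}}$
Expected component   $\pi_{ij}$      $\pi_{j|i}$      $\pi_{j|i}$     $\pi_{ij}^{X\bar{Y}}$
Mixing parameter     $\alpha_{ik}$   $\alpha_{k|i}$   $\pi_{k|i}$     $\pi_{ik}^{X\bar{Z}}$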
1 = de Leeuw et al. (1990).
2 = van der Ark et al. (in press), this monograph.
3 = van der Ark & van der Heijden (1998).
4 = van der Heijden et al. (1992); LCA literature.


Footnotes:

1 The parameter estimates of the LBM should be identified before the latent budget solution can be interpreted. This problem is discussed in Chapter 2. Here we identified the parameter estimates with the outer extreme solution (OES), see van der Ark, van der Heijden & Sikkel (1999).
