* Display syntax commands in the Output Viewer .
SET Printback=On Length=59 Width=80.

* ===========================================================================
  File:   anova2.sps
  Author: Bruce Weaver, weaverb@mcmaster.ca
  Date:   9-May-2002
  Notes:  Two-way ANOVA examples.
* =========================================================================== .

* BHSc students may wish to skip everything after the
* "Further analysis of the AB interaction" section.  From
* that point on, the material is more advanced, and not
* covered in the BHSc course.

* NOTE:  This syntax has been tested in version 11 of SPSS.  It may
* be that some things are slightly different if you have another
* version of SPSS .

* First, an example of two-way ANOVA with equal sample sizes .

DATA LIST LIST / a (f2.0) b (f2.0) y (f5.0).
BEGIN DATA.
1 1 5
1 1 6
1 1 2
1 1 4
1 1 2
1 2 5
1 2 4
1 2 2
1 2 4
1 2 5
1 3 3
1 3 2
1 3 6
1 3 5
1 3 4
2 1 5
2 1 6
2 1 2
2 1 5
2 1 6
2 2 9
2 2 5
2 2 3
2 2 7
2 2 3
2 3 7
2 3 10
2 3 11
2 3 12
2 3 12
END DATA.

var lab a 'Difficulty of problems'
        b 'Number of distractors'
        y 'Time to solve problems'.
val lab b 1 'B1- 1 distractor'
          2 'B2- 2 distractors'
          3 'B3- 3 distractors' /
        a 1 'A1- easy problems'
          2 'A2- difficult problems'.

save outfile = 'c:\anova2.sav' /compressed.

* List cases to show the structure of the data file .
list all.

* Note that there are 3 columns in the data file: 2 to code
  the levels of factors A and B (these must be numeric codes
  in SPSS--use value labels to give them more meaningful names)
  and 1 column to record the DV .

**** Compute deviation scores and squared deviation scores;
     then compute the various sums of squares as in Table 1
     in the chapter on two-way ANOVA .

**** Use AGGREGATE to get the needed cell means .
**** One of these will be the Grand Mean, which will require a BREAK
     variable with the same value for all records in the file .

compute all_ = 1.
exe.
formats all_ (f2.0).

**** Save B-means in 'C:\bmeans.sav' .

sort cases by b(a).
AGGREGATE
  /OUTFILE='C:\bmeans.sav'
  /BREAK=b
  /n_b = n
  /bmean = MEAN(y).

**** Save A-means in 'C:\ameans.sav', AB-means in 'C:\abmeans.sav' .

sort cases by a b (a).
AGGREGATE
  /OUTFILE='C:\ameans.sav'
  /BREAK=a
  /n_a = n
  /amean = MEAN(y).
AGGREGATE
  /OUTFILE='C:\abmeans.sav'
  /BREAK=a b
  /n_ab = n
  /abmean = MEAN(y).

**** Save Grand Mean to 'C:\grand.sav' .

AGGREGATE
  /OUTFILE='C:\grand.sav'
  /BREAK=all_
  /N_tot=N
  /gm = MEAN(y).

* Now use MATCH FILES with the /TABLE subcommand to add the A, B, AB,
  and Grand Means to the working file.
* Data-->Merge Files-->Add variables.

MATCH FILES /FILE=*
  /TABLE='C:\grand.sav'
  /BY all_.
EXECUTE.

**** Add A-means .

MATCH FILES /FILE=*
  /TABLE='C:\ameans.sav'
  /BY a.
EXECUTE.

**** Add B-means .

sort cases by b(a).
MATCH FILES /FILE=*
  /TABLE='C:\bmeans.sav'
  /BY b.
EXECUTE.

**** Add AB-means .

sort cases by a b(a).
MATCH FILES /FILE=*
  /TABLE='C:\abmeans.sav'
  /BY a b.
EXECUTE.

* Now compute deviation scores.

compute dev1 = y-gm.
compute dev2 = abmean - gm.
compute dev3 = y - abmean.
compute dev4 = amean - gm.
compute dev5 = bmean - gm.
compute dev6 = (abmean-gm)-(amean - gm) - (bmean - gm).
exe.

* Now compute squared deviation scores, as in Table 1 of the chapter
  on two-way ANOVA .

compute sqdev1 = dev1**2.
compute sqdev2 = dev2**2.
compute sqdev3 = dev3**2.
compute sqdev4 = dev4**2.
compute sqdev5 = dev5**2.
compute sqdev6 = dev6**2.
exe.

var lab
  dev1 '(Y - GM)'
  dev2 '(cell mean - GM)'
  dev3 '(Y - cell mean)'
  dev4 '(A mean - GM)'
  dev5 '(B mean - GM)'
  dev6 '(AB mean - A mean - B mean + GM)'
  sqdev1 'Sums to SS(Total)'
  sqdev2 'Sums to SS(cells)'
  sqdev3 'Sums to SS(error)'
  sqdev4 'Sums to SS(A)'
  sqdev5 'Sums to SS(B)'
  sqdev6 'Sums to SS(AB)'.

formats dev1 to sqdev6 (f8.3).

means sqdev1 to sqdev6 /cells = sum count.

* Get some pieces before using AGGREGATE to calculate sums .
* After MATCH FILES every case carries n_a, n_b and n_ab, so a
  flag of 1 per case would count cases rather than levels; giving
  each case a weight of 1/n instead makes the weights sum to the
  number of levels of A (or B), and (n_ab - 1)/n_ab sums to the
  within-cell degrees of freedom .
* dftot and dfcells are recomputed after the AGGREGATE that follows .

compute dftot = n_tot - 1.
compute aflag = 1/n_a.
compute bflag = 1/n_b.
compute dfwab = (n_ab - 1)/n_ab.
compute dfcells = 2*3-1.
exe.
AGGREGATE
  /OUTFILE= *
  /BREAK=all_
  /n_tot = first(n_tot)
  /k_a = sum(aflag)
  /k_b = sum(bflag)
  /sstot = sum(sqdev1)
  /sscells = sum(sqdev2)
  /sserror = sum(sqdev3)
  /ssa = sum(sqdev4)
  /ssb = sum(sqdev5)
  /ssab = sum(sqdev6)
  /dferror = sum(dfwab) .

compute k_ab = k_a * k_b.
compute dfa = k_a - 1.
compute dfb = k_b - 1.
compute dfab = dfa*dfb.
compute dfcells = k_ab - 1.
compute dftot = n_tot - 1.
compute mscells = sscells/dfcells.
compute msa = ssa/dfa.
compute msb = ssb/dfb.
compute msab = ssab/dfab.
compute mserror = sserror/dferror.
compute fa = msa/mserror.
compute fb = msb/mserror.
compute fab = msab/mserror.
compute pa = 1 - cdf.f(fa,dfa,dferror).
compute pb = 1 - cdf.f(fb,dfb,dferror).
compute pab = 1 - cdf.f(fab,dfab,dferror).
exe.

formats ssa ssb ssab sscells sserror sstot
        msa msb msab mscells mserror
        fa fb fab pa pb pab (f8.3).
formats dfa dfb dfab dfcells dftot (f4.0).

**** Display ANOVA results .

**** List information for Between cells, within cells & total .
list var SScells dfcells sserror dferror sstot dftot.

* Main effect of A.
list var SSA dfa msa mserror fa pa.

* Main effect of B.
list var SSb dfb msb mserror fb pb.

* The AB interaction.
list var SSab dfab msab mserror fab pab.

* Finally, show that SS(A) + SS(B) + SS(AB) = SS(cells).

compute sumofss = ssa + ssb + ssab.
exe.
formats sumofss (f8.3).
list var ssa ssb ssab sumofss sscells.

* SSA + SSB + SSAB = SScells, because with equal n's, the
  effects of A, B, and the AB interaction are orthogonal .

**** Finally, erase the temporary files we created earlier .

erase file = 'C:\ameans.sav'.
erase file = 'C:\bmeans.sav'.
erase file = 'C:\abmeans.sav'.
erase file = 'C:\grand.sav'.

* ================================================================= .

**** Now use SPSS GLM UNIVARIATE procedure to produce the same
     results more easily .

get file = 'c:\anova2.sav'.
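
* Optional, not part of the original file: a one-command sketch that
  displays the cell means, n's and standard deviations, so that the
  GLM output can be checked against the hand calculations above .
MEANS TABLES = y BY a BY b /CELLS = MEAN COUNT STDDEV.
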
UNIANOVA
  y BY a b                  /* Y is the DV; A and B the IV's */
  /METHOD = SSTYPE(3)       /* Type III sum of squares is the default */
  /INTERCEPT = INCLUDE
  /CRITERIA = ALPHA(.05)
  /emmeans = table(a)
  /emmeans = table(b)
  /emmeans = table(a*b)
  /plot = profile(b*a)
  /DESIGN = a b a*b.

**** In the ANOVA summary table shown above:
     SS(Corrected Total) = SS(Total) - SS(Intercept);
     SS(Corrected Model) = SS(Between Cells).

**** NOTE: SS(Corrected Total) = same thing we previously called SS(Total).

**** Everything else is as we saw before (Table 4 in the notes).

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .
* Further analysis of the AB interaction .
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .

**** One strategy for determining or describing the nature of an
     interaction effect is to look at the "simple main effects" of
     one of the variables; in this case, for example, we might wish
     to look at the simple main effects of variable B (amount of
     distraction); to do so, we would look at the effect of B at
     level 1 of A, and the effect of B at level 2 of A.

**** It is not immediately obvious how to get SPSS to produce
     this analysis; I only know how because I read about it in
     the SPSS newsgroup.

**** One way to do it is to use EMMEANS with COMPARE in GLM UNIANOVA.

UNIANOVA
  y BY a b
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /CRITERIA = ALPHA(.05)
  /emmeans = table(a)
  /emmeans = table(b)
  /emmeans = table(a*b) COMPARE(b)   /* <---- COMPARE option added here */
  /plot = profile(b*a)
  /DESIGN = a b a*b.

**** The Univariate Tests table shows F-tests for the simple effects
     of variable B (number of distractors): one F-test for easy
     problems (A1), and one for difficult problems (A2).

**** Simple main effects can also be analyzed using MANOVA syntax;
     as far as I know, MANOVA has not been accessible through the
     pull-down menus since version 9 (possibly earlier), so you have
     to write the syntax yourself (or copy it from a newsgroup post,
     like I did!) .
MANOVA y BY a(1 2) b(1 3)
  /NOPRINT PARAM(ESTIM)
  /METHOD=UNIQUE
  /ERROR within+residual
  /DESIGN= a, b W a(1), b W a(2).

**** As Howell (1997) puts it: "All sums of squares in the analysis
     of variance (other than SS_total) represent a partitioning of some
     larger sum of squares, and the simple effects are no exception." .

**** In our example, the simple main effects of B represent a
     partitioning of SS(B) and SS(AB); in other words,
     SS(B at A1) + SS(B at A2) = SS(B) + SS(AB) .

**** SS(B at A1) + SS(B at A2) = 0.13 + 94.53 = 94.66 .
**** SS(B) + SS(AB) = 49.400 + 45.267 = 94.667 .

* ===================================================================== .
* IF YOU ARE IN THE BHSc CLASS, YOU MAY WISH TO STOP HERE.
* THE MATERIAL IN THE REMAINDER OF THIS FILE IS BEYOND THE
* SCOPE OF YOUR COURSE.
* OF COURSE, IF YOU ARE DESPERATELY CURIOUS ABOUT FIXED AND
* RANDOM FACTORS AND UNBALANCED DESIGNS, PLEASE FEEL FREE
* TO READ ON.
* ===================================================================== .

* Same data, but with problem difficulty as a random factor .

UNIANOVA
  y BY a b
  /RANDOM = a               /* Problem difficulty is a random factor */
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /CRITERIA = ALPHA(.05)
  /DESIGN = a b a*b.

* Note that SPSS has used MS(AB) as the error term for both
* main effects.  Many textbooks suggest that MS(AB) is the
* appropriate error term for the FIXED factor, and MS(error)
* is the appropriate error term for the RANDOM factor.

* For more on why SPSS has done things this way, read the following:
* http://www.angelfire.com/wv/bwhomedir/spss/SPSS_GLM_mixed_model.html .

* We can produce the desired F-tests by doing an analysis that
  treats both A and B as fixed (this will yield the appropriate
  F-test for the random factor), and adding a custom hypothesis
  test for the fixed factor, B .

UNIANOVA
  y BY a b
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /CRITERIA = ALPHA(.05)
  /test = b vs a*b          /* B a fixed factor; use MS(AB) as error term */
  /DESIGN = a b a*b.

* If problem difficulty (variable A) is a random factor, the
  appropriate F-test for B (the fixed factor) uses MS(AB) as the
  error term; this F-test is shown above as the Custom Hypothesis Test.

* Note that the effect of A (difficulty of problems) was not
  statistically significant in the first analysis, but is when
  MS(error) is used as the error term .

* =========================================================================== .
* Unbalanced design, data from Table 13 in the chapter on two-way ANOVA,
  or Table 20-1 in Kleinbaum et al (1988, 2nd ed) .
* =========================================================================== .

DATA LIST LIST / a (f2.0) b (f2.0) y (f5.0).
BEGIN DATA.
1 1 2
1 1 5
1 1 8
1 1 6
1 1 2
1 1 4
1 1 3
1 1 10
1 2 7
1 2 5
1 2 8
1 2 6
1 2 3
1 2 5
1 2 6
1 2 4
1 2 5
1 2 6
1 2 8
1 2 9
2 1 4
2 1 6
2 1 3
2 1 3
2 2 7
2 2 7
2 2 8
2 2 6
2 2 4
2 2 9
2 2 8
2 2 7
3 1 8
3 1 7
3 1 5
3 1 9
3 1 9
3 1 10
3 1 8
3 1 6
3 1 8
3 1 10
3 2 5
3 2 8
3 2 6
3 2 6
3 2 9
3 2 7
3 2 7
3 2 8
END DATA.

var lab
  a 'Affective Communication'
  b 'Patient Worry'
  Y 'DV'.

val lab
  a 1 'A1- High'
    2 'A2- Medium'
    3 'A3- Low' /
  b 1 'B1- Negative'
    2 'B2- Positive'.

save outfile = 'c:\tab20-1.sav' /compressed.

**** Compute sums of squares as we did previously for the balanced design .

**** Use AGGREGATE to get the needed cell means .
**** One of these will be the Grand Mean, which will require a BREAK
     variable with the same value for all records in the file .

compute all_ = 1.
exe.
formats all_ (f2.0).

**** Save B-means in 'C:\bmeans.sav' .

sort cases by b(a).
AGGREGATE
  /OUTFILE='C:\bmeans.sav'
  /BREAK=b
  /n_b = n
  /bmean = MEAN(y).

**** Save A-means in 'C:\ameans.sav', AB-means in 'C:\abmeans.sav' .

sort cases by a b (a).
AGGREGATE
  /OUTFILE='C:\ameans.sav'
  /BREAK=a
  /n_a = n
  /amean = MEAN(y).
AGGREGATE
  /OUTFILE='C:\abmeans.sav'
  /BREAK=a b
  /n_ab = n
  /abmean = MEAN(y).

**** Save Grand Mean to 'C:\grand.sav' .

AGGREGATE
  /OUTFILE='C:\grand.sav'
  /BREAK=all_
  /N_tot=N
  /gm = MEAN(y).

* Now use MATCH FILES with the /TABLE subcommand to add the A, B, AB,
  and Grand Means to the working file.
* Data-->Merge Files-->Add variables.

MATCH FILES /FILE=*
  /TABLE='C:\grand.sav'
  /BY all_.
EXECUTE.

**** Add A-means .

MATCH FILES /FILE=*
  /TABLE='C:\ameans.sav'
  /BY a.
EXECUTE.

**** Add B-means .

sort cases by b(a).
MATCH FILES /FILE=*
  /TABLE='C:\bmeans.sav'
  /BY b.
EXECUTE.

**** Add AB-means .

sort cases by a b(a).
MATCH FILES /FILE=*
  /TABLE='C:\abmeans.sav'
  /BY a b.
EXECUTE.

* Now compute deviation scores.

compute dev1 = y-gm.
compute dev2 = abmean - gm.
compute dev3 = y - abmean.
compute dev4 = amean - gm.
compute dev5 = bmean - gm.
compute dev6 = (abmean-gm)-(amean - gm) - (bmean - gm).
exe.

* Now compute squared deviation scores, as in Table 1 of the chapter
  on two-way ANOVA .

compute sqdev1 = dev1**2.
compute sqdev2 = dev2**2.
compute sqdev3 = dev3**2.
compute sqdev4 = dev4**2.
compute sqdev5 = dev5**2.
compute sqdev6 = dev6**2.
exe.

var lab
  dev1 '(Y - GM)'
  dev2 '(cell mean - GM)'
  dev3 '(Y - cell mean)'
  dev4 '(A mean - GM)'
  dev5 '(B mean - GM)'
  dev6 '(AB mean - A mean - B mean + GM)'
  sqdev1 'Sums to SS(Total)'
  sqdev2 'Sums to SS(cells)'
  sqdev3 'Sums to SS(error)'
  sqdev4 'Sums to SS(A)'
  sqdev5 'Sums to SS(B)'
  sqdev6 'Sums to SS(AB)'.

formats dev1 to sqdev6 (f8.3).

means sqdev1 to sqdev6 /cells = sum count.

* Get some pieces before using AGGREGATE to calculate sums .
* After MATCH FILES every case carries n_a, n_b and n_ab, so a
  flag of 1 per case would count cases rather than levels; giving
  each case a weight of 1/n instead makes the weights sum to the
  number of levels of A (or B), and (n_ab - 1)/n_ab sums to the
  within-cell degrees of freedom .
* dftot and dfcells are recomputed after the AGGREGATE that follows .

compute dftot = n_tot - 1.
compute aflag = 1/n_a.
compute bflag = 1/n_b.
compute dfwab = (n_ab - 1)/n_ab.
compute dfcells = 2*3-1.
exe.

AGGREGATE
  /OUTFILE= *
  /BREAK=all_
  /n_tot = first(n_tot)
  /k_a = sum(aflag)
  /k_b = sum(bflag)
  /sstot = sum(sqdev1)
  /sscells = sum(sqdev2)
  /sserror = sum(sqdev3)
  /ssa = sum(sqdev4)
  /ssb = sum(sqdev5)
  /ssab = sum(sqdev6)
  /dferror = sum(dfwab) .

compute k_ab = k_a * k_b.
compute dfa = k_a - 1.
compute dfb = k_b - 1.
compute dfab = dfa*dfb.
compute dfcells = k_ab - 1.
compute dftot = n_tot - 1.
compute mscells = sscells/dfcells.
compute msa = ssa/dfa.
compute msb = ssb/dfb.
compute msab = ssab/dfab.
compute mserror = sserror/dferror.
compute fa = msa/mserror.
compute fb = msb/mserror.
compute fab = msab/mserror.
compute pa = 1 - cdf.f(fa,dfa,dferror).
compute pb = 1 - cdf.f(fb,dfb,dferror).
compute pab = 1 - cdf.f(fab,dfab,dferror).
exe.

formats ssa ssb ssab sscells sserror sstot
        msa msb msab mscells mserror
        fa fb fab pa pb pab (f8.3).
formats dfa dfb dfab dfcells dftot (f4.0).

**** Display ANOVA results .

**** List information for Between cells, within cells & total .
list var SScells dfcells sserror dferror sstot dftot.

* Main effect of A.
list var SSA dfa msa mserror fa pa.

* Main effect of B.
list var SSb dfb msb mserror fb pb.

* The AB interaction.
list var SSab dfab msab mserror fab pab.

* Finally, show that SS(A) + SS(B) + SS(AB) = SS(cells).

compute sumofss = ssa + ssb + ssab.
exe.
formats sumofss (f8.3).
list var ssa ssb ssab sumofss sscells.

* SSA + SSB + SSAB <> SScells, because with unequal n's, the
* effects of A, B, and the AB interaction are NOT orthogonal .

**** Erase the temporary files we created earlier .

erase file = 'C:\ameans.sav'.
erase file = 'C:\bmeans.sav'.
erase file = 'C:\abmeans.sav'.
erase file = 'C:\grand.sav'.

* ================================================================= .

* The Kleinbaum et al text suggests using regression to analyze these
  data; we need to use a series of regression models to get the
  required sums of squares.

get file = 'c:\tab20-1.sav' .

* First, compute dummy variables .

compute a2 = (a=2).
compute a3 = (a=3).
compute b2 = (b=2).
compute a2b2 = a2*b2.
compute a3b2 = a3*b2.
exe.
formats a2 to a3b2 (f2.0).

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT y
  /METHOD= test (a2 a3)
  /test(a2 a3) (b2)
  /test (a2 a3) (b2) (a2b2 a3b2)
  /RESIDUALS HIST(ZRESID) .

* SS(A)  = 38.756 = SS for subset (A2,A3) in Model 1 .
* SS(B)  =  5.861 = SS for subset (B2) in Model 2 .
* SS(AB) = 27.383 = SS for subset (A2B2,A3B2) in Model 3 .

* Note that SS(A) + SS(B) + SS(AB) = 72 = SS(cells) .

* Kleinbaum et al use the F-tests from Models 1, 2, and 3 for
  the effects of A, B, and AB respectively; in other words, they
  do not use the same error term for all of these tests.

* Note that these F-tests are identical to the F Change tests
  presented in the Model Summary output table.

* In SPSS, there is an easier way to get essentially the same
  results: i.e., using the GLM UNIANOVA procedure with Type I SS;
  I say essentially the same, because the SS for A, B, and AB will
  be exactly as we saw above; but the F-ratios will be calculated
  using a common error term (the error term from Model 3 above) .

* NOTE: Using GLM UNIANOVA also removes the need to compute dummy
  variables--they are generated internally by UNIANOVA .

UNIANOVA
  y BY a b                  /* Y is the DV; A and B the IV's */
  /METHOD = SSTYPE(1)       /* Type I (sequential) SS; Type III is the default */
  /INTERCEPT = INCLUDE
  /CRITERIA = ALPHA(.05)
  /print etasq
  /DESIGN = a b a*b.

**** One problem with this approach is that the results depend on
     the order in which you enter the variables; note how things
     change when we enter B, then A, then AB .

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT y
  /METHOD= test (b2)
  /test (b2) (a2 a3)
  /test (b2) (a2 a3)(a2b2 a3b2)
  /RESIDUALS HIST(ZRESID) .

* SS(A)  = 42.747 = SS for subset (A2,A3) in Model 2 .
* SS(B)  =  1.870 = SS for subset (B2) in Model 1 .
* SS(AB) = 27.383 = SS for subset (A2B2,A3B2) in Model 3 .

* Now the same analysis using UNIANOVA and Type I SS .

UNIANOVA
  y BY a b                  /* Y is the DV; A and B the IV's */
  /METHOD = SSTYPE(1)       /* Type I (sequential) SS; Type III is the default */
  /INTERCEPT = INCLUDE
  /CRITERIA = ALPHA(.05)
  /print etasq
  /DESIGN = b a a*b.

**** Type I SS is sometimes called the "hierarchical" sum of squares,
     or the "sequential" sum of squares; as you can see above,
     SS(A) + SS(B) + SS(AB) = SS(Corrected Model) = SS(cells);
     but the values of SS(A) and SS(B) depend on the order in which
     you enter A and B in the model; this is because the portion of
     variance that is explained by BOTH A and B will be included as
     part of SS(A) when A is entered first, and as part of SS(B)
     when B is entered first.

**** An alternative approach is to use a method that includes only
     that portion of the variance that is UNIQUE to each of A, B,
     and AB; this is what you get if you select Type III sums of
     squares (which is the default option in SPSS); because it
     includes only the unique portions of variance, the order in
     which variables are entered no longer matters .

* GLM using Type III SS .

UNIANOVA
  y BY a b                  /* Y is the DV; A and B the IV's */
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /CRITERIA = ALPHA(.05)
  /print etasq
  /DESIGN = a b a*b.

* Now change order of entry (on the /DESIGN line) to B, A, AB,
  and show that results do not change .

UNIANOVA
  y BY a b                  /* Y is the DV; A and B the IV's */
  /METHOD = SSTYPE(3)       /* Type III sum of squares is the default */
  /INTERCEPT = INCLUDE
  /CRITERIA = ALPHA(.05)
  /print etasq
  /DESIGN = b a a*b.

**** SS(A) = 44.577, SS(B) = 11.134, and SS(AB) = 27.383 for both
     of the foregoing analyses: The order in which terms are entered
     does not matter when you use Type III (or 'unique') SS .

**** Finally, note that GLM UNIANOVA with Type III SS produces results
     that are approximated by the method of "unweighted means" .

* Repeat and produce tables/plots of estimated marginal means .

UNIANOVA
  y BY a b                  /* Y is the DV; A and B the IV's */
  /METHOD = SSTYPE(3)       /* Type III sum of squares is the default */
  /INTERCEPT = INCLUDE
  /CRITERIA = ALPHA(.05)
  /print etasq
  /emmeans = table(a)
  /emmeans = table(b)
  /emmeans = table(a*b)
  /plot = profile(b*a)
  /plot = profile(a*b)
  /DESIGN = a b a*b.

new file.
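
* Optional check, not part of the original file: a minimal sketch
  showing that, for a full factorial model, the estimated marginal
  means that EMMEANS reports for A are unweighted averages of the
  cell means, whereas the observed marginal means weight each cell
  by its n; the variable names cellmean, n_cell, wsum, unwtd_a,
  sum_w, n_a and wtd_a are made up for this illustration .

get file = 'c:\tab20-1.sav'.
AGGREGATE
  /OUTFILE=*
  /BREAK=a b
  /cellmean = MEAN(y)
  /n_cell = N.
compute wsum = cellmean*n_cell.
AGGREGATE
  /OUTFILE=*
  /BREAK=a
  /unwtd_a = MEAN(cellmean)
  /sum_w = SUM(wsum)
  /n_a = SUM(n_cell).
compute wtd_a = sum_w/n_a.
exe.
formats unwtd_a wtd_a (f8.3).
list var a unwtd_a wtd_a.

* Compare unwtd_a with the EMMEANS table for A, and wtd_a with the
  observed marginal means; then clear the working file again .

new file.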
erase file = 'C:\anova2.sav'.
erase file = 'C:\tab20-1.sav'.

* =========================================================================== .