Orthogonal contrasts for analysis of variance are independent linear comparisons between the groups of a factor with at least three fixed levels. The sum of squares for a factor A with a levels is partitioned into a set of a - 1 orthogonal contrasts each with two levels (so each has p = 1 test degree of freedom), to be tested against the same error MS as for the factor. Each contrast is assigned a coefficient at each level of A such that its a coefficients sum to zero, with coefficients of equal value indicating pooled levels of the factor, coefficients of opposite sign indicating factor levels to be contrasted, and a zero indicating an excluded factor level. With this coding, two contrasts are orthogonal to each other if the products of their coefficients at each level of A sum to zero across all levels.

For example, a three-level factor A has the following coefficients for its two orthogonal contrasts B and C:

 Factor   Contrast *
 A        B      C(B)
 1         2      0
 2        -1      1
 3        -1     -1

Contrast B compares group A1 to the average of groups A2 and A3; contrast C (which is nested in B) compares group A2 to group A3. If A1 is a control and A2 and A3 are treatments, then the contrasts test respectively for a difference between the control and the pooled treatments, and for a difference between the treatments. The contrasts are orthogonal because they have a zero sum of the products of their coefficients: (2 × 0) + (-1 × 1) + (-1 × -1) = 0. If the control belongs to a different level of A, then the rows of the contrast coefficients can be rearranged accordingly without losing orthogonality. These two contrasts can be analysed in GLM with sequential SS by requesting the terms: B + C(B) as fixed factors. This will give SS[B] + SS[C(B)] = SS[A], and df[B] + df[C(B)] = df[A].
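As a minimal sketch of the partition described above, the following Python checks that the two three-level contrasts are orthogonal and that their sums of squares add up to SS[A]. The response values are hypothetical, invented only for illustration, and the design is assumed to be balanced (equal replication at each level of A); the function names are not from any package.

```python
# Hypothetical balanced data: 3 replicates at each of the 3 levels of A.
data = {
    1: [4.0, 5.0, 6.0],     # A1, e.g. a control
    2: [7.0, 8.0, 9.0],     # A2, e.g. a treatment
    3: [10.0, 12.0, 14.0],  # A3, e.g. a treatment
}

# Contrast coefficients for the three-level factor, keyed by level of A.
contrast_B = {1: 2, 2: -1, 3: -1}   # A1 versus pooled A2 and A3
contrast_C = {1: 0, 2: 1, 3: -1}    # A2 versus A3, with A1 excluded


def mean(values):
    return sum(values) / len(values)


def sums_to_zero(c):
    """A valid contrast has coefficients that sum to zero."""
    return sum(c.values()) == 0


def are_orthogonal(c, d):
    """Orthogonal if the level-by-level products of coefficients sum to zero."""
    return sum(c[level] * d[level] for level in c) == 0


def contrast_ss(c, data):
    """Single-df contrast SS for a balanced design with n replicates per level:
    n * (sum of c_i * mean_i)^2 / (sum of c_i^2)."""
    n = len(next(iter(data.values())))
    estimate = sum(c[level] * mean(data[level]) for level in c)
    return n * estimate ** 2 / sum(v * v for v in c.values())


def factor_ss(data):
    """Between-groups SS for factor A."""
    grand_mean = mean([y for ys in data.values() for y in ys])
    return sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in data.values())


ss_A = factor_ss(data)
ss_B = contrast_ss(contrast_B, data)
ss_C = contrast_ss(contrast_C, data)
print(ss_B, ss_C, ss_A)
```

With these made-up values the two contrast SS sum to the factor SS, as expected for an orthogonal set on a balanced design.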

A four-level factor A can have the following alternative sets of three orthogonal contrasts B to D (in any permutation of coefficient rows for each set, and analysed in GLM by requesting the fixed contrast terms with sequential SS):

 Factor   Contrast set 1 *
 A        B      C(B)   D(C B)
 1         3      0      0
 2        -1      2      0
 3        -1     -1      1
 4        -1     -1     -1

 Factor   Contrast set 2
 A        B      C(B)   D(B)
 1         1      1      0
 2         1     -1      0
 3        -1      0      1
 4        -1      0     -1

 Factor   Contrast set 3
 A        B      C      D
 1         1      1     -1
 2         1     -1      1
 3        -1      1      1
 4        -1     -1     -1

Note that the analysis of contrast set 3, by running GLM with sequential SS on terms B + C + D, is equivalent to running a balanced ANOVA either on terms B + C + C*B or on terms B + D + D*B or on terms C + D + D*C.
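One way to see why contrast set 3 behaves like a two-factor cross is that each of its contrasts equals, up to sign, the level-by-level product of the other two, so each can stand in for the interaction of the others. The sketch below checks this in Python; the helper names are illustrative only, and the sign of a contrast is ignored because it does not affect its sum of squares.

```python
# Coefficients of contrast set 3 for the four-level factor A (levels 1 to 4).
B = [1, 1, -1, -1]
C = [1, -1, 1, -1]
D = [-1, 1, 1, -1]


def elementwise_product(x, y):
    """Level-by-level product of two contrasts (the coding of their interaction)."""
    return [a * b for a, b in zip(x, y)]


def same_up_to_sign(x, y):
    """Two codings define the same contrast if they differ only in overall sign."""
    return x == y or x == [-v for v in y]


print(same_up_to_sign(elementwise_product(B, C), D))  # D plays the role of C*B
print(same_up_to_sign(elementwise_product(B, D), C))  # C plays the role of D*B
print(same_up_to_sign(elementwise_product(C, D), B))  # B plays the role of D*C
```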

A five-level factor A can have the following alternative sets of four orthogonal contrasts B to E (in any permutation of coefficient rows for each set, and analysed in GLM by requesting the fixed contrast terms with sequential SS):

 Factor   Contrast set 1
 A        B      C(B)   D(B)   E(D B)
 1         3      1      0      0
 2         3     -1      0      0
 3        -2      0      2      0
 4        -2      0     -1      1
 5        -2      0     -1     -1

 Factor   Contrast set 2 *
 A        B      C(B)   D(C B) E(D C B)
 1         4      0      0      0
 2        -1      3      0      0
 3        -1     -1      2      0
 4        -1     -1     -1      1
 5        -1     -1     -1     -1

 Factor   Contrast set 3
 A        B      C(B)   D(C B) E(C B)
 1         4      0      0      0
 2        -1      1      1      0
 3        -1      1     -1      0
 4        -1     -1      0      1
 5        -1     -1      0     -1

 Factor   Contrast set 4
 A        B      C(B)   D(B)   E(B)
 1         4      0      0      0
 2        -1      1      1     -1
 3        -1      1     -1      1
 4        -1     -1      1      1
 5        -1     -1     -1     -1
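The four five-level sets above can be checked mechanically. The sketch below stores each contrast as a list of five coefficients, confirms that every contrast sums to zero and that every pair within a set is orthogonal, and generates the asterisked (Helmert) pattern for any number of levels. The function names are illustrative assumptions, not taken from any package.

```python
from itertools import combinations

# Each set lists its four contrasts; each contrast has one coefficient per level of A.
five_level_sets = {
    1: [[3, 3, -2, -2, -2], [1, -1, 0, 0, 0], [0, 0, 2, -1, -1], [0, 0, 0, 1, -1]],
    2: [[4, -1, -1, -1, -1], [0, 3, -1, -1, -1], [0, 0, 2, -1, -1], [0, 0, 0, 1, -1]],
    3: [[4, -1, -1, -1, -1], [0, 1, 1, -1, -1], [0, 1, -1, 0, 0], [0, 0, 0, 1, -1]],
    4: [[4, -1, -1, -1, -1], [0, 1, 1, -1, -1], [0, 1, -1, 1, -1], [0, -1, 1, 1, -1]],
}


def is_orthogonal_set(contrasts):
    """True if every contrast sums to zero and every pair has a zero sum of products."""
    if any(sum(c) != 0 for c in contrasts):
        return False
    return all(
        sum(x * y for x, y in zip(c, d)) == 0
        for c, d in combinations(contrasts, 2)
    )


def helmert_set(a):
    """Contrasts in the asterisked pattern for a levels: contrast j excludes the
    first j levels, then compares the next level to all remaining levels pooled."""
    return [[0] * j + [a - 1 - j] + [-1] * (a - 1 - j) for j in range(a - 1)]


for number, contrasts in five_level_sets.items():
    print(number, is_orthogonal_set(contrasts))

print(helmert_set(5) == five_level_sets[2])
```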

Analysis of contrasts on a factor A does not require a significant A effect. If it is significant, however, at least one of the orthogonal sets will contain at least one significant contrast. For a priori planned orthogonal contrasts, the conceptual unit for error rate is conventionally taken to be the individual contrast (rather than the family of contrasts in the full set), just as it is taken to be the individual term in multi-factorial ANOVA partitioned into treatment effects and interactions (rather than the full experiment). The family-wise Type-I error must apply, however, if contrasts are used for post hoc comparisons to locate the biggest differences amongst levels of a treatment. The family-wise error rate for m independent tests, each with an individual error rate α, is 1 - (1 - α)^m; the family-wise error rate for m orthogonal contrasts is some small amount less than this because their significance tests are not independent (since all use the same error mean square, even though the contrasts are independent since orthogonal). The size of α can be reduced to control the family-wise error rate, though at a cost of substantially diminishing power to detect individual differences.
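The family-wise error rate formula for independent tests is easy to evaluate directly. The sketch below computes it, and also inverts it to find the per-test α that would hold the family-wise rate at a chosen target (this inversion is commonly called the Dunn-Šidák adjustment; it is shown here as one possible way to reduce α, not as a method prescribed by the text). Because the tests of orthogonal contrasts share an error mean square, the values apply to them only approximately, as an upper bound on the family-wise rate.

```python
def familywise_error_rate(alpha, m):
    """Family-wise Type-I error rate for m independent tests, each at rate alpha."""
    return 1 - (1 - alpha) ** m


def per_test_alpha(target_familywise_rate, m):
    """Per-test alpha that gives the target family-wise rate over m independent
    tests, obtained by inverting 1 - (1 - alpha)^m."""
    return 1 - (1 - target_familywise_rate) ** (1 / m)


# Example: the four contrasts of a five-level factor, each tested at alpha = 0.05.
print(round(familywise_error_rate(0.05, 4), 3))
print(round(per_test_alpha(0.05, 4), 4))
```

For four tests at α = 0.05, the family-wise rate rises to about 0.185; holding it at 0.05 would require a per-test α of about 0.0127.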

In the usual application of orthogonal contrasts, for a priori planned comparisons, the choice of contrast set for a factor A with 4 or more levels will be informed by the study design. For example, a 4-level factor A may be suited to set 1 when the levels include a control and three treatments, whereas it may be suited to set 3 when the levels include cross-factored treatment combinations (e.g., +/+, +/-, -/+, -/-).

Significance tests should be reported for all orthogonal contrasts in the set, because the set partitions the variation due to factor A. For example, consider the two contrasts B and C(B) comprising the set for a 3-level factor A applied to a control and two treatments. Although the contrasts test independent hypotheses, since they are orthogonal, interpretation of the difference between the two treatments in contrast C(B) depends on their combined difference from a control in contrast B, since both contrasts share the same error mean square.

A set of orthogonal contrasts is balanced only if each level of A has the same number of replicates, and if all pairs of crossed contrasts in the set have a consistent number of levels of A representing each pair of contrast levels. For example, in contrast set 3 of the 4-level factor A above, all three of its crossed contrast pairs have one level of factor A representing each pair of contrast levels (1, 1 and 1, -1, and -1, 1, and -1, -1). The same is true of contrast set 4 of the 5-level factor A. For a factor A with eight or more levels, it is possible - though not desirable - to construct unbalanced orthogonal contrast sets with pairs of crossed contrasts having inconsistent numbers of levels of A representing each pair of contrast levels.
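The second balance condition can be checked by counting how many levels of A fall into each combination of contrast levels for a pair of crossed contrasts. The sketch below is one interpretation of that condition, under two assumptions: a contrast level is identified by the sign of its coefficient, and levels of A excluded by either contrast (coefficient zero) are ignored. It is intended for crossed pairs only; a contrast paired with one nested inside it will not fill all four combinations.

```python
from itertools import combinations, product


def sign(value):
    return (value > 0) - (value < 0)


def cell_counts(x, y):
    """Number of levels of A at each pair of contrast levels (+/+, +/-, -/+, -/-),
    skipping levels of A that either contrast excludes with a zero."""
    counts = {cell: 0 for cell in product((1, -1), repeat=2)}
    for a, b in zip(x, y):
        if a != 0 and b != 0:
            counts[(sign(a), sign(b))] += 1
    return counts


def pair_is_balanced(x, y):
    """True if all four pairs of contrast levels hold the same number of levels of A."""
    return len(set(cell_counts(x, y).values())) == 1


# Contrast set 3 of the four-level factor: B, C and D are all crossed.
four_level_set_3 = [[1, 1, -1, -1], [1, -1, 1, -1], [-1, 1, 1, -1]]

# Contrast set 4 of the five-level factor: C, D and E are crossed within B.
five_level_set_4_crossed = [[0, 1, 1, -1, -1], [0, 1, -1, 1, -1], [0, -1, 1, 1, -1]]

for crossed in (four_level_set_3, five_level_set_4_crossed):
    print(all(pair_is_balanced(x, y) for x, y in combinations(crossed, 2)))
```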

These web pages include examples of balanced orthogonal contrasts for a priori planned comparisons amongst three- and five-level single factors, examples for three- and four-level factors in cross-factored designs, including contrast-by-contrast interactions, an example of contrasts for a one-factor randomized block and an example for a two-factor randomized block, and an example of contrasts for a three-factor split plot. A suite of commands in R (freeware statistical package, R Development Core Team 2010) is available to analyze each of the example datasets.

Above five levels for a factor, the number of alternative sets of orthogonal contrasts starts to increase rapidly with each additional level (sequence A165438 in OEIS). The program Contrasts.exe will provide coefficients for all possible sets of balanced orthogonal contrasts on a factor with any number of levels up to a maximum of 12. For a chosen set or range of sets, it will store contrast coefficients in a text file for any specified number of replicates, and will identify the (unique) GLM model for analysing the set (with sequential SS, after each data line has been tagged with the response value for the replicate).

__________________

* This orthogonal set is also known as the set of Helmert contrasts for a factor with this number of levels.

Doncaster, C. P. & Davey, A. J. H. (2007) Analysis of Variance and Covariance: How to Choose and Construct Models for the Life Sciences. Cambridge: Cambridge University Press.

http://www.southampton.ac.uk/~cpd/anovas/datasets/

R Development Core Team (2010). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.