Perform one-way analysis of variance (ANOVA) to test, using the F-test, whether the means of multiple groups are statistically different.
The table below shows critical F-values at the 0.05 significance level. If your calculated F-statistic exceeds the critical value for your degrees of freedom, the result is statistically significant.
| df1 (Between) | df2 = 10 | df2 = 15 | df2 = 20 | df2 = 30 | df2 = 60 | df2 = 120 |
|---|---|---|---|---|---|---|
| 2 | 4.10 | 3.68 | 3.49 | 3.32 | 3.15 | 3.07 |
| 3 | 3.71 | 3.29 | 3.10 | 2.92 | 2.76 | 2.68 |
| 4 | 3.48 | 3.06 | 2.87 | 2.69 | 2.53 | 2.45 |
| 5 | 3.33 | 2.90 | 2.71 | 2.53 | 2.37 | 2.29 |
| 6 | 3.22 | 2.79 | 2.60 | 2.42 | 2.25 | 2.18 |
| 7 | 3.14 | 2.71 | 2.51 | 2.33 | 2.17 | 2.09 |
| 8 | 3.07 | 2.64 | 2.45 | 2.27 | 2.10 | 2.02 |
| 9 | 3.02 | 2.59 | 2.39 | 2.21 | 2.04 | 1.96 |
| 10 | 2.98 | 2.54 | 2.35 | 2.16 | 1.99 | 1.91 |
These values represent Fcritical for α = 0.05. df1 = k - 1 (number of groups minus 1). df2 = N - k (total observations minus number of groups).
Analysis of Variance (ANOVA) is a statistical method developed by Ronald Fisher in the 1920s that tests whether the means of three or more groups are significantly different from each other. Despite its name referring to "variance," ANOVA is fundamentally a test about means. It works by comparing the variance between group means to the variance within groups. If the between-group variance is substantially larger than the within-group variance, this provides evidence that at least one group mean differs from the others.
One-way ANOVA examines the effect of a single categorical independent variable (called a factor) on a continuous dependent variable. For example, comparing test scores across three different teaching methods, or comparing plant growth across four different fertilizers. Two-way ANOVA extends this to two factors simultaneously and can detect interaction effects between them. There are also repeated measures ANOVA for within-subjects designs and MANOVA for multiple dependent variables.
ANOVA is preferred over running multiple t-tests because performing many pairwise comparisons inflates the Type I error rate (the probability of a false positive). With three groups, you would need three t-tests, and the chance of at least one false positive rises from 5% to about 14%. With five groups, that probability jumps to nearly 40%. ANOVA controls this by testing all groups simultaneously in a single test, maintaining the overall error rate at the chosen significance level (typically 0.05).
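The inflation figures above follow from the complement rule: with m independent tests each run at α = 0.05, the probability of at least one false positive is 1 − 0.95^m. A quick sketch (the independence assumption is a simplification, since pairwise t-tests on shared groups are correlated):

```python
from math import comb

def familywise_error(groups: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive when running all
    pairwise t-tests among `groups` groups at level alpha,
    assuming (for illustration) that the tests are independent."""
    m = comb(groups, 2)          # number of pairwise comparisons
    return 1 - (1 - alpha) ** m

print(f"{familywise_error(3):.3f}")  # 3 tests  -> 0.143
print(f"{familywise_error(5):.3f}")  # 10 tests -> 0.401
```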
The core output of ANOVA is the F-statistic, which follows an F-distribution under the null hypothesis. When the F-statistic is large enough to exceed the critical value (or when the p-value is less than the significance level), you reject the null hypothesis. However, a significant ANOVA result only tells you that differences exist somewhere among the groups. To identify which specific groups differ, you must follow up with post-hoc tests such as Tukey's HSD, Bonferroni correction, or Scheffe's method.
SSBetween = ∑ ni × (x̄i - x̄grand)²
SSWithin = ∑∑ (xij - x̄i)²
dfBetween = k - 1 | dfWithin = N - k
MSBetween = SSBetween ÷ dfBetween
MSWithin = SSWithin ÷ dfWithin
F = MSBetween ÷ MSWithin
Where k = number of groups, N = total observations, ni = size of group i, x̄i = mean of group i, x̄grand = grand mean
1. **Compute the means.** Compute the arithmetic mean of each group (x̄i) and the overall grand mean (x̄grand) of all data points combined.
2. **Compute SSBetween.** For each group, multiply the group size by the squared difference between the group mean and the grand mean. Sum these values across all groups.
3. **Compute SSWithin.** For each observation, calculate the squared difference from its group mean. Sum all these squared deviations across every observation in every group.
4. **Compute the mean squares and F.** Divide SSBetween by dfBetween (k - 1) to get MSBetween. Divide SSWithin by dfWithin (N - k) to get MSWithin. The F-statistic is MSBetween ÷ MSWithin.
5. **Compare with the critical value.** Look up the critical F-value for your degrees of freedom at your chosen α level. If F > Fcritical (or p < α), reject the null hypothesis.
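The computation steps map directly to pure Python. A minimal sketch (the three small groups are made up purely for illustration):

```python
# Illustrative data: three groups of three observations each
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]

k = len(groups)                          # number of groups
N = sum(len(g) for g in groups)          # total observations
group_means = [sum(g) / len(g) for g in groups]
grand_mean = sum(sum(g) for g in groups) / N

# Between-group sum of squares: n_i * (group mean - grand mean)^2
ss_between = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, group_means))
# Within-group sum of squares: squared deviations from each group mean
ss_within = sum((x - m) ** 2
                for g, m in zip(groups, group_means) for x in g)

ms_between = ss_between / (k - 1)        # mean square between
ms_within = ss_within / (N - k)          # mean square within
F = ms_between / ms_within

print(F)  # 21.0 for this data
```

The final step is a table lookup: here df1 = 2 and df2 = 6, and the resulting F of 21.0 far exceeds any α = 0.05 critical value.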
Problem: A researcher tests three teaching methods on student exam scores.
Data: Method A: 85, 90, 88, 92, 86 | Method B: 78, 82, 80, 76, 84 | Method C: 92, 95, 89, 91, 93
Problem: An agronomist tests four fertilizers on crop yield (kg per plot).
Data: Fert A: 20, 22, 19, 21 | Fert B: 25, 28, 26, 27 | Fert C: 18, 20, 17, 21 | Fert D: 23, 24, 22, 25
Problem: A company tests whether three office layouts affect employee productivity (tasks per hour).
Data: Open: 12, 14, 11, 13, 15 | Cubicle: 13, 12, 14, 11, 15 | Private: 14, 13, 12, 15, 11
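The three example datasets above can be run through a compact one-way ANOVA sketch (pure Python, no external libraries; the function name is ours):

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    F = (ss_between / (k - 1)) / (ss_within / (N - k))
    return F, k - 1, N - k

teaching   = [[85, 90, 88, 92, 86], [78, 82, 80, 76, 84], [92, 95, 89, 91, 93]]
fertilizer = [[20, 22, 19, 21], [25, 28, 26, 27], [18, 20, 17, 21], [23, 24, 22, 25]]
layouts    = [[12, 14, 11, 13, 15], [13, 12, 14, 11, 15], [14, 13, 12, 15, 11]]

for name, data in [("teaching", teaching), ("fertilizer", fertilizer),
                   ("layouts", layouts)]:
    F, df1, df2 = one_way_anova(data)
    print(f"{name}: F({df1}, {df2}) = {F:.2f}")
# teaching:   F(2, 12) = 24.32
# fertilizer: F(3, 12) = 21.24
# layouts:    F(2, 12) = 0.00
```

The first two examples land far above any plausible α = 0.05 critical value, while the office-layout data gives F = 0 because all three group means are identical (13 tasks per hour).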
If the group means are very close together relative to the spread within each group, the F-value will be small (near 1) and the result will not be significant. A quick visual check: if the group means overlap considerably with the ranges within each group, ANOVA is unlikely to be significant.
A clinical trial compares three drug dosages (Placebo, Low, High) on pain reduction scores.
| Source | SS | df | MS | F | p-value |
|---|---|---|---|---|---|
| Between Groups | 210.00 | 2 | 105.00 | 8.40 | 0.002 |
| Within Groups | 300.00 | 24 | 12.50 | - | - |
| Total | 510.00 | 26 | - | - | - |
Interpretation: F(2, 24) = 8.40, p = 0.002. Significant at α = 0.05. At least one dosage group has a different mean pain reduction score.
An HR department compares four training programs on employee performance scores (n = 10 per group).
| Source | SS | df | MS | F | p-value |
|---|---|---|---|---|---|
| Between Groups | 450.00 | 3 | 150.00 | 5.00 | 0.005 |
| Within Groups | 1080.00 | 36 | 30.00 | - | - |
| Total | 1530.00 | 39 | - | - | - |
Interpretation: F(3, 36) = 5.00, p = 0.005. Significant at α = 0.05. At least one training program leads to different performance.
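The arithmetic in both ANOVA tables can be reconstructed from the SS and df columns alone. A quick check (the helper name is ours; the numbers come from the two tables above):

```python
def anova_table(ss_between, df_between, ss_within, df_within):
    """Rebuild MS and F from sums of squares and degrees of freedom."""
    ms_b = ss_between / df_between
    ms_w = ss_within / df_within
    return ms_b, ms_w, ms_b / ms_w

# Clinical trial: F(2, 24)
print(anova_table(210.0, 2, 300.0, 24))   # (105.0, 12.5, 8.4)
# Training programs: F(3, 36)
print(anova_table(450.0, 3, 1080.0, 36))  # (150.0, 30.0, 5.0)
```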
| Eta-Squared (η²) | Effect Size | Interpretation |
|---|---|---|
| 0.01 - 0.06 | Small | 1-6% of variance explained by group membership |
| 0.06 - 0.14 | Medium | 6-14% of variance explained by group membership |
| 0.14+ | Large | 14%+ of variance explained by group membership |
η² = SSBetween ÷ SSTotal. Cohen's guidelines for interpreting effect size in ANOVA.
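Applying η² = SSBetween ÷ SSTotal to the two worked examples above shows that both effects are large by Cohen's guidelines:

```python
# Clinical trial table: SS_between = 210, SS_total = 510
eta_sq_clinical = 210.0 / 510.0
# Training programs table: SS_between = 450, SS_total = 1530
eta_sq_training = 450.0 / 1530.0

print(round(eta_sq_clinical, 3))  # 0.412 -> large (>= 0.14)
print(round(eta_sq_training, 3))  # 0.294 -> large (>= 0.14)
```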
ANOVA is the backbone of experimental research, allowing scientists to compare treatment groups, test hypotheses about drug efficacy, and evaluate intervention outcomes across multiple conditions simultaneously.
Manufacturing and engineering teams use ANOVA to compare production processes, identify optimal machine settings, and determine whether batch-to-batch differences in product quality are statistically significant.
Clinical trials rely on ANOVA to compare treatment effectiveness across multiple dosage levels or drug combinations, helping determine which therapies provide the best patient outcomes.
Educators and researchers use ANOVA to compare teaching methods, evaluate curriculum effectiveness across different schools, and assess whether interventions improve student performance across demographic groups.
Always verify the three key assumptions: independence of observations, approximate normality within groups (use Shapiro-Wilk test), and homogeneity of variances (use Levene's test). If variances are unequal, use Welch's ANOVA instead.
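Levene's test can itself be computed as a one-way ANOVA on absolute deviations from each group's center; using the median as the center is the Brown-Forsythe variant, which is more robust to non-normality. A minimal sketch with made-up data (the function name is ours):

```python
import statistics

def brown_forsythe_W(groups):
    """Levene-type statistic: one-way ANOVA F computed on absolute
    deviations from each group's median (Brown-Forsythe variant)."""
    devs = [[abs(x - statistics.median(g)) for x in g] for g in groups]
    k = len(devs)
    N = sum(len(g) for g in devs)
    grand = sum(sum(g) for g in devs) / N
    means = [sum(g) / len(g) for g in devs]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(devs, means))
    ssw = sum((x - m) ** 2 for g, m in zip(devs, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (N - k))

# Hypothetical groups with very different spreads
print(brown_forsythe_W([[1, 2, 3], [10, 20, 30]]))  # ~3.21
```

The resulting W is compared against the F(k - 1, N - k) distribution; large values suggest the variances are unequal.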
Running pairwise t-tests for 4 groups requires 6 comparisons, inflating the family-wise error rate to about 26%. ANOVA tests all groups simultaneously while maintaining the overall α at 0.05. Use ANOVA first, then post-hoc tests if significant.
A significant ANOVA result only tells you that differences exist, not where. Use Tukey's HSD for all pairwise comparisons, Dunnett's test for comparing to a control group, or Bonferroni for a small number of planned comparisons.
Statistical significance alone is not enough. With large sample sizes, even tiny differences can be "significant." Always report effect size (eta-squared or omega-squared) to convey the practical importance of the difference. A significant p-value with η² = 0.01 means groups differ but the effect is trivial.
ANOVA is robust to mild violations of normality with larger samples (generally n ≥ 20 per group). With small samples, the test has low statistical power and may fail to detect real differences. Use power analysis to determine the minimum sample size needed.
If the normality or homogeneity of variances assumptions are severely violated and sample sizes are small, consider the Kruskal-Wallis test (the non-parametric counterpart of one-way ANOVA). It compares rank distributions (often interpreted as a comparison of medians) rather than means and does not require normality, though it has less statistical power when ANOVA's assumptions do hold.
ANOVA (Analysis of Variance) is a statistical test used to determine whether there are significant differences between the means of three or more groups. You should use ANOVA when you have one categorical independent variable with three or more levels and one continuous dependent variable. For comparing only two groups, a t-test is more appropriate.
One-way ANOVA tests the effect of a single factor (independent variable) on a dependent variable across multiple groups. Two-way ANOVA tests the effects of two factors simultaneously and can also detect interaction effects between those factors. For example, one-way ANOVA might test the effect of three different diets on weight loss, while two-way ANOVA could test both diet and exercise level together.
The F-statistic is the ratio of between-group variance to within-group variance (F = MS_between / MS_within). A larger F-value indicates that the differences between group means are large relative to the variability within groups. When F is close to 1, it suggests the group means are similar. When F is much larger than 1, it suggests at least one group mean significantly differs from the others.
One-way ANOVA has three main assumptions: (1) Independence - observations within and between groups must be independent. (2) Normality - the dependent variable should be approximately normally distributed within each group. (3) Homogeneity of variances - the variance of the dependent variable should be roughly equal across all groups (Levene's test can check this). Moderate violations of normality are tolerable with larger sample sizes.
The null hypothesis (H0) in one-way ANOVA states that all group population means are equal: H0: mu_1 = mu_2 = mu_3 = ... = mu_k. The alternative hypothesis (H1) states that at least one group mean is different from the others. Importantly, rejecting H0 does not tell you which specific groups differ - you need post-hoc tests for that.
If ANOVA shows a significant result, post-hoc tests identify which specific group pairs differ. Common choices include Tukey's HSD (best for equal sample sizes), Bonferroni correction (conservative, good for few comparisons), Scheffe's test (most conservative, for complex contrasts), and Games-Howell (when variances are unequal). Tukey's HSD is the most widely used for pairwise comparisons.
The p-value in ANOVA represents the probability of observing an F-statistic at least as extreme as the calculated value if the null hypothesis were true. If p < 0.05 (at the 5% significance level), you reject the null hypothesis and conclude that at least one group mean is significantly different. If p >= 0.05, you fail to reject the null hypothesis: the data do not provide evidence of a difference between group means (which is not the same as proving the means are equal).
Yes, one-way ANOVA can handle unequal sample sizes (unbalanced designs), though equal sample sizes provide the most statistical power and robustness. With unequal sizes, the test is more sensitive to violations of the homogeneity of variances assumption. If sample sizes and variances are both unequal, consider using Welch's ANOVA instead, which does not assume equal variances.
ANOVA is a generalization of the independent samples t-test. When comparing exactly two groups, one-way ANOVA produces an F-statistic that equals the square of the t-statistic (F = t-squared), and the p-values are identical. ANOVA extends this comparison to three or more groups while controlling the overall Type I error rate, which would inflate if multiple t-tests were used instead.
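The F = t² identity is easy to verify numerically on any two samples (made-up data below; the helper name is ours):

```python
def two_group_t_and_F(a, b):
    """Pooled two-sample t and one-way ANOVA F for the same data."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    sp2 = (ssa + ssb) / (na + nb - 2)             # pooled variance = MS_within
    t = (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5
    grand = (sum(a) + sum(b)) / (na + nb)
    ss_between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2
    F = ss_between / sp2                          # df_between = 1
    return t, F

t, F = two_group_t_and_F([5.1, 4.8, 5.5, 5.0], [6.2, 6.0, 5.8, 6.4])
print(abs(F - t * t) < 1e-9)  # True: F equals t squared
```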
Sum of squares (SS) quantifies variability in the data. SS_Between measures variability between group means and the grand mean - larger values indicate groups differ more. SS_Within measures variability of individual observations around their group means - this represents random error. SS_Total equals SS_Between + SS_Within. The ratio SS_Between / SS_Total gives eta-squared, a measure of effect size.
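The decomposition SS_Total = SS_Between + SS_Within can be checked directly on any dataset (made-up numbers here, chosen so the sums come out whole):

```python
groups = [[4, 5, 6], [7, 8, 9], [1, 2, 3]]
N = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / N
means = [sum(g) / len(g) for g in groups]

ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
# SS_Total computed independently, from deviations around the grand mean
ss_total = sum((x - grand) ** 2 for g in groups for x in g)

print(ss_between, ss_within, ss_total)  # 54.0 6.0 60.0
print(ss_between / ss_total)            # eta-squared = 0.9
```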