This is a very rough introduction to some of the statistical concepts and terms that you will encounter in your ecology discussion papers. It is intended mostly for those who have never encountered ANOVAs and p-values before, so that you can make some sense of the results sections in your readings (and contribute meaningfully to discussion!). For a much better introduction to statistics and experimental design in ecology, you can take BIOL 1420 Experimental Design in Ecology (offered next Fall). PHP 2510 Principles of Biostatistics and Data Analysis (every Fall) is a good summary course that covers probability, descriptive statistics and inference. The Applied Math and Sociology departments also offer basic statistics courses.
Statistics are vitally important in experimental ecology (and in most other fields as well!). Ecologists use a variety of statistical models and tests to describe patterns and relationships in the natural world, and to make inferences about them from the data they collect.
For example, suppose you want to investigate whether there is a relationship between location and the size of periwinkle snails on the New England rocky shore. You cannot measure every single snail on the shore, so you measure 50 snails from Maine and 50 snails from Rhode Island, and find that the average length of the snails from Maine is greater than the average length of the snails from Rhode Island. How can you tell whether the difference you observe is related to location (Maine snails are bigger than RI snails), or whether you just happened, by chance, to pick up 50 bigger snails in Maine and 50 smaller snails in RI? We know that snail length also varies within a single shore, because of other factors like age and genetics. Snail length cannot be predicted exactly; some of the variation is random and cannot be explained by known factors.
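To see this kind of chance variation directly, here is a small Python simulation. It is only a sketch: the shore of 10,000 snails, the 15 mm mean and the 3 mm standard deviation are made-up assumptions, not real periwinkle data. Both samples of 50 come from the same population, so there is no location effect at all, yet their means still differ.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# A made-up shore of 10,000 snails with NO location effect:
# lengths drawn from one normal distribution (mean 15 mm, SD 3 mm, both assumed).
shore = [random.gauss(15.0, 3.0) for _ in range(10_000)]

# Two "surveys" of 50 snails each, drawn from the same shore
sample_a = random.sample(shore, 50)
sample_b = random.sample(shore, 50)

mean_a = statistics.fmean(sample_a)
mean_b = statistics.fmean(sample_b)
print(f"Sample A mean: {mean_a:.2f} mm")
print(f"Sample B mean: {mean_b:.2f} mm")
print(f"Difference:    {mean_a - mean_b:+.2f} mm, from chance alone")
```

Changing the seed gives a different difference each time; that spread is the random variation a hypothesis test has to take into account.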
Statistical analysis allows us to take this random variation into account by using statistical models to quantify the uncertainty in our samples. We can then evaluate the evidence and draw conclusions using a hypothesis test. A hypothesis test (e.g. t-test, chi-square test, ANOVA) weighs the evidence for one hypothesis against another. In ecology, we are usually testing the hypothesis that there is a relationship between variables against the 'default' or 'null' hypothesis that there is no relationship, and that any differences you see are due to random variation and chance. Hypothesis tests are generally associated with two things: a p-value and a significance level.
p-values and significance
A p-value measures how compatible your data are with the null hypothesis: the smaller the p-value, the stronger the evidence against the null hypothesis. More specifically, it is the probability of obtaining a result at least as extreme as the one you observed if the null hypothesis is true (and there is no relationship at all). If you test for a difference between periwinkle snail length in Maine and RI and get a p-value of 0.03, you can say that "if there were no relationship between location and snail size, there would only be a 3% chance of seeing a difference between Maine and RI snails at least as large as the one I observed." A 3% probability is pretty low, but is it low enough to conclude that a relationship exists?
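One way to make this definition concrete is a permutation test, which estimates a p-value by simulation instead of with a formula. It is not the same calculation a t-test or ANOVA uses, but it asks the same question. If location doesn't matter, the 'Maine' and 'RI' labels are interchangeable, so we shuffle them many times and count how often the shuffled difference in means is at least as large as the observed one. The snail lengths below are invented for illustration, and `permutation_p_value` is a hypothetical helper, not a library function.

```python
import random
import statistics

def permutation_p_value(group1, group2, n_permutations=10_000, seed=0):
    """Two-sided permutation-test p-value for a difference in means.

    Estimates the probability of a difference in means at least as large
    as the observed one if the group labels are meaningless (the null
    hypothesis).
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group1) - statistics.fmean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the location labels
        diff = abs(statistics.fmean(pooled[:n1]) - statistics.fmean(pooled[n1:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical shell lengths (mm), invented for illustration
maine = [17.1, 16.4, 18.0, 15.9, 17.5, 16.8, 18.3, 16.1, 17.7, 16.9]
ri = [15.2, 16.0, 14.8, 15.7, 16.3, 15.1, 14.9, 15.8, 16.2, 15.4]

p = permutation_p_value(maine, ri)
print(f"Estimated p-value: {p:.4f}")
```

Note that the p-value is computed entirely under the assumption that the null hypothesis is true (the labels don't matter), which is why it describes the data, not the hypothesis.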
To decide, you need a cut-off value to compare it to. This cut-off is your significance level (often written as α). It is an arbitrary value that indicates how much evidence is required to conclude that the null hypothesis is not true. By convention, the significance level used in ecology (and most other fields) is 0.05. That is, if your p-value is less than 0.05 (i.e. there is less than a 5% chance of observing a result at least this extreme if there is no actual relationship), you reject the null hypothesis and conclude that there is a 'statistically significant' relationship.
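The 0.05 cut-off also has a practical meaning: if there really is no relationship, about 5% of experiments will still produce p < 0.05 by chance alone. The sketch below checks this by simulating 2,000 hypothetical experiments in which both samples come from the same population. To stay within Python's standard library it uses a z-test, which assumes the population standard deviation is known (here, a made-up 3 mm); a real analysis would normally use a t-test instead.

```python
import random
from statistics import NormalDist, fmean

def z_test_p_value(sample1, sample2, sigma):
    """Two-sided z-test p-value for a difference in means.

    Assumes the population standard deviation `sigma` is known, a
    simplification that avoids needing the t distribution.
    """
    se = sigma * (1 / len(sample1) + 1 / len(sample2)) ** 0.5
    z = (fmean(sample1) - fmean(sample2)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

ALPHA = 0.05
N_EXPERIMENTS = 2000
rng = random.Random(42)

significant = 0
for _ in range(N_EXPERIMENTS):
    # The null hypothesis is true by construction: both "sites" share one population.
    site_a = [rng.gauss(15.0, 3.0) for _ in range(50)]
    site_b = [rng.gauss(15.0, 3.0) for _ in range(50)]
    if z_test_p_value(site_a, site_b, sigma=3.0) < ALPHA:
        significant += 1

rate = significant / N_EXPERIMENTS
print(f"Share of no-effect experiments with p < {ALPHA}: {rate:.3f}")  # expected to be near 0.05
```

In other words, choosing 0.05 means accepting that roughly 1 in 20 tests of a true null hypothesis will come out 'significant' anyway.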
Note: A p-value is not the probability that the null hypothesis is true! You would not be able to say "there is a 3% probability that there is no relationship between location and snail size" or "there is a 97% probability that there is a relationship." The probability refers to the data, not the hypothesis.
T-tests
Probably the simplest test of them all. A t-test tests whether two averages (means) are different from each other. For example, you could use a t-test on your snail measurements to test whether the mean lengths of snails from Maine and RI are significantly different.
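Here is a minimal sketch of the arithmetic behind a two-sample t-test, using Welch's version (which does not assume the two groups have equal variances). The t statistic is the difference in means divided by its standard error. Turning t into a p-value requires the t distribution, which Python's standard library doesn't provide, so that step is left to a statistics package (for example, SciPy's `scipy.stats.ttest_ind` with `equal_var=False`). The snail lengths are invented for illustration.

```python
import math
import statistics

def welch_t(sample1, sample2):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    # Squared standard error of each mean: sample variance (n - 1 denominator) / n
    se1_sq = statistics.variance(sample1) / n1
    se2_sq = statistics.variance(sample2) / n2
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return t, df

# Hypothetical shell lengths (mm), invented for illustration
maine = [17.1, 16.4, 18.0, 15.9, 17.5, 16.8, 18.3, 16.1, 17.7, 16.9]
ri = [15.2, 16.0, 14.8, 15.7, 16.3, 15.1, 14.9, 15.8, 16.2, 15.4]

t, df = welch_t(maine, ri)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The larger |t| is, the bigger the difference in means relative to the noise in the samples, and the smaller the resulting p-value.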
ANOVAs
One of the most common tests you will see in discussion papers is the Analysis of Variance (ANOVA). This tests the relationship between one or more categorical factors (independent variables) and a continuous response (dependent) variable.
Some types of ANOVAs:
- One-way ANOVA: one categorical factor. Very similar to a t-test, but more than two groups can be compared (e.g. mean snail lengths from Maine, RI, and Massachusetts)
- Two-/three-way ANOVA: two/three categorical factors.
- ANCOVA (analysis of covariance): one or more categorical factors plus a continuous factor; tests for effects of the categorical factor(s) on the response variable after the effects of the continuous factor have been accounted for.
- Repeated-measures ANOVA: the same experimental units (e.g. individual plots or animals) are measured more than once, usually over time (e.g. change in response to a factor over time)
- MANOVA (multivariate analysis of variance): more than one response variable
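The 'analysis of variance' name comes from how the test works: it compares the variation between group means with the variation of individuals within groups, as a ratio called F. A large F means the group means differ by more than the within-group scatter would lead you to expect. Below is a minimal one-way ANOVA sketch with invented lengths for three hypothetical sites; it computes F and its degrees of freedom only. The p-value comes from the F distribution, which a statistics package supplies (e.g. SciPy's `scipy.stats.f_oneway`).

```python
import statistics

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    group_means = [statistics.fmean(g) for g in groups]
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: scatter of individuals around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical shell lengths (mm) from three sites, invented for illustration
maine = [17.1, 16.4, 18.0, 15.9, 17.5]
massachusetts = [16.2, 15.8, 16.9, 16.5, 15.6]
rhode_island = [15.2, 16.0, 14.8, 15.7, 16.3]

f, df_b, df_w = one_way_anova_f(maine, massachusetts, rhode_island)
print(f"F({df_b}, {df_w}) = {f:.2f}")
```

This is why papers often report ANOVA results in a form like "F(2, 12) = ..., p = ...": the two numbers in brackets are the between-group and within-group degrees of freedom.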
Simple linear regressions
A basic linear regression tests for a linear relationship between a continuous factor (independent variable) X and a continuous response (dependent) variable Y: basically, a relationship of the familiar Y = aX + b form. Two values are usually reported. The first is a p-value, which indicates how significant the linear relationship is (it tests whether the slope a differs from zero; see above for a description of p-values). The second is an r² or R² value, which indicates how well the Y = aX + b line describes the data: it is the proportion of the variation in Y that is explained by the line, ranging from 0 (no linear fit) to 1 (perfect linear fit). A linear regression with a low (<0.05) p-value and a high r² value would indicate a strong linear relationship between your factor and response variables.
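A minimal least-squares sketch of the Y = aX + b fit and its r² value follows. The water temperatures and snail lengths are made-up numbers used only to show the calculation (not real observations), and `simple_linear_regression` is a hypothetical helper. The p-value for the slope again needs the t distribution, so it is not computed here. (In Python 3.10+, `statistics.linear_regression` returns the same slope and intercept.)

```python
import statistics

def simple_linear_regression(x, y):
    """Least-squares fit of y = a*x + b. Returns (a, b, r_squared)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    a = s_xy / s_xx          # slope
    b = y_bar - a * x_bar    # intercept
    # r^2 = 1 - (unexplained variation / total variation in y)
    ss_residual = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_residual / ss_total
    return a, b, r_squared

# Made-up example: x = water temperature (deg C), y = mean shell length (mm)
temperature = [8, 10, 12, 14, 16]
length = [18.2, 17.5, 16.9, 16.0, 15.6]

a, b, r2 = simple_linear_regression(temperature, length)
print(f"Y = {a:.3f}X + {b:.2f}, r^2 = {r2:.3f}")
```

If every point fell exactly on the line, the residual sum of squares would be zero and r² would be 1; the more scatter around the line, the closer r² gets to 0.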

