Introduction to Hypothesis Testing
A town has a local law governing the recall of the mayor:
At any time, anyone can request a referendum of non-confidence in the mayor. If more than 30% of all registered voters express non-confidence, the mayor will be recalled, and a new mayoral election will be held. If 30% or less of the registered voters vote for non-confidence, the mayor stays, and the person requesting the referendum must cover the cost.
Some residents are unhappy with the current mayor, and they want to request a referendum of non-confidence. However, they want to be reasonably certain that they will win, so they conduct a survey of 200 randomly selected registered voters, asking them whether they support the recall. Out of the 200 voters, 72 say they support the non-confidence vote.
The important question: Do more than 30% of the registered voters in the town support the non-confidence vote?
In symbols: if \(p\) is the proportion of all registered voters in the town who support the non-confidence vote, are we reasonably certain that \(p > .3\)?
The sample proportion is \(\widehat{p} = \frac{72}{200} = .36 > .3\)
However, samples vary!
Assuming that only 30% of the registered voters support the recall, how likely is it that a simple random sample of 200 voters would give \(\widehat{p} \ge .36\)?
\(\operatorname{P}(\widehat{p} \ge .36 \mid p = .3)\)
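To see how much \(\widehat{p}\) varies from sample to sample, one can simulate the model directly. The Python sketch below assumes \(p = .3\) exactly, treats each of the 200 voters as an independent Bernoulli trial, repeats the survey many times, and reports the fraction of simulated samples with \(\widehat{p} \ge .36\). The function names, the number of repetitions, and the seed are illustrative choices, not part of the original example. For comparison it also computes the exact binomial probability; both can differ slightly from a value obtained with the normal approximation.

```python
import random
from math import comb


def simulate_tail_probability(p=0.3, n=200, threshold=0.36, reps=10_000, seed=1):
    """Estimate P(p_hat >= threshold | p) by simulating `reps` surveys of n voters.

    Each voter is a Bernoulli(p) trial: True means "supports the recall".
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        supporters = sum(rng.random() < p for _ in range(n))  # count of "yes" voters
        if supporters / n >= threshold:
            hits += 1
    return hits / reps


def exact_tail_probability(p=0.3, n=200, k_min=72):
    """Exact P(X >= k_min) for X ~ Binomial(n, p), i.e. P(p_hat >= k_min / n)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


print("simulated:", simulate_tail_probability())
print("exact:    ", exact_tail_probability())
```

Rerunning with a different seed gives a slightly different simulated value, which is exactly the point: samples vary.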
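\(n = 200\), \(p = .3\), \(SE_{\widehat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.3\cdot .7}{200}} \approx 0.0324\)
Using the normal approximation, \(z = \frac{.36 - .3}{0.0324} \approx 1.85\), and \(\operatorname{P}(\widehat{p} \ge .36 \mid p = .3) \approx \operatorname{P}(Z \ge 1.85) \approx 0.0322\), where \(Z\) is a standard normal random variable.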
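The normal-approximation calculation can be reproduced in a few lines. The sketch below (the function name `normal_tail_probability` is hypothetical) uses Python's standard-library `statistics.NormalDist` to standardize the observed .36 and take the upper tail. Without rounding, \(z \approx 1.852\) and the tail probability is about 0.0320; rounding \(z\) to 1.85 before looking it up, as with a printed normal table, gives about 0.0322, which matches the value quoted in this example.

```python
from math import sqrt
from statistics import NormalDist


def normal_tail_probability(p_hat_obs, p0, n):
    """Return (SE, z, P(p_hat >= p_hat_obs | p = p0)) using the normal approximation."""
    se = sqrt(p0 * (1 - p0) / n)   # standard error of p_hat when p = p0
    z = (p_hat_obs - p0) / se      # how many standard errors p_hat_obs is above p0
    return se, z, 1 - NormalDist().cdf(z)


se, z, tail = normal_tail_probability(0.36, 0.3, 200)
print(f"SE = {se:.4f}, z = {z:.2f}, P = {tail:.4f}")
# Rounding z to 1.85 first, as a printed normal table would require:
print(f"P(Z >= 1.85) = {1 - NormalDist().cdf(1.85):.4f}")
```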
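| | \(p \le .3\) | \(p > .3\) |
|---|---|---|
| Vote requested | Scenario 1: the referendum fails, and the requesters pay its cost | OK |
| Vote not requested | OK | Scenario 2: the mayor could have been recalled, but no vote is held |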
Suppose \(p = .3\). What probability of scenario 1 are they willing to accept? Are they OK with 5%? Or maybe only 1%? Or even smaller?
Let’s call this probability \(\alpha\); for example, \(\alpha = 0.05\) or \(\alpha = 0.01\).
They want the probability of accidentally requesting the referendum when \(p = .3\) to be less than \(\alpha\).
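Using the normal approximation, we calculated \(\operatorname{P}(\widehat{p} \ge .36 \mid p = .3) = 0.0322 = 3.22\%\). This is below 5% but above 1%, so the residents would request the referendum if they chose \(\alpha = 0.05\), but not if they chose \(\alpha = 0.01\).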
They choose \(\alpha\) to determine the degree of “scenario 1 risk” they are willing to accept.
They collect a sample and calculate its proportion of supporters. We call it the observed sample proportion, denoted \(\widehat{p}_{\text{obs}}\).
They calculate \(\operatorname{P}(\widehat{p} \ge \widehat{p}_{\text{obs}} \mid p = .3)\).
If this probability is less than \(\alpha\), they decide it is safe to go ahead with the referendum.
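The four steps above can be collected into one small function. This is a minimal sketch, assuming the normal approximation used in this example; `safe_to_request` is a hypothetical name, and the rule returns `True` when \(\operatorname{P}(\widehat{p} \ge \widehat{p}_{\text{obs}} \mid p = .3) < \alpha\).

```python
from math import sqrt
from statistics import NormalDist


def safe_to_request(supporters, n, alpha, p0=0.3):
    """Return (probability, decision) for the referendum rule.

    probability = P(p_hat >= p_hat_obs | p = p0), via the normal approximation.
    decision    = True if that probability is less than alpha.
    """
    p_hat_obs = supporters / n               # observed sample proportion
    se = sqrt(p0 * (1 - p0) / n)             # standard error assuming p = p0
    prob = 1 - NormalDist().cdf((p_hat_obs - p0) / se)
    return prob, prob < alpha


for alpha in (0.05, 0.01):
    prob, go = safe_to_request(72, 200, alpha)
    print(f"alpha = {alpha}: P = {prob:.4f}, request referendum: {go}")
```

With 72 supporters out of 200, the rule says to go ahead at \(\alpha = 0.05\) but not at \(\alpha = 0.01\).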
What we just did is called hypothesis testing.
It is a formal way of evaluating mathematical models of random variables or their association.
\(H_0\), the null hypothesis, represents the model being evaluated: for a randomly selected registered voter, supporting the non-confidence vote is a Bernoulli random variable with probability of success \(p = 0.3\).
Question: Does this model explain our observed data?
We collect a sample and calculate the sample statistic \(\widehat{p}_{\text{obs}}\).
Question: Is \(\widehat{p}_{\text{obs}}\) different from the value of \(p\) in the null hypothesis?
No, it is (almost) the same: then the model does explain the data, and we are done!
Yes, the \(\widehat{p}_{\text{obs}}\) is different from \(p\).
New question: is it “different enough”? Is the difference significant?
| | \(H_0\) is “true” | \(H_0\) is false |
|---|---|---|
| Rejecting \(H_0\) | Type I Error | OK |
| Not rejecting \(H_0\) | OK | Type II Error |
We choose the probability of Type I Error.
Denoted as \(\alpha\), it is called the significance level.
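The significance level has a direct frequency interpretation: if \(H_0\) is true and the test is applied to many independent samples, the fraction of samples that lead to rejecting \(H_0\) (each one a Type I Error) should be close to \(\alpha\). The sketch below checks this by simulation for the one-sided test of the non-confidence vote example, assuming the normal approximation; the function name, repetition count, and seed are illustrative. Because counts are whole numbers and the normal curve is only an approximation, the simulated rate is near \(\alpha\) rather than exactly equal to it.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_one_error_rate(p0=0.3, n=200, alpha=0.05, reps=5_000, seed=2):
    """Simulate `reps` samples with H0 true (p = p0) and return the share that reject H0.

    Uses the one-sided rule from the non-confidence vote example:
    reject H0 when P(p_hat >= p_hat_obs | p = p0), by normal approximation, is below alpha.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = sqrt(p0 * (1 - p0) / n)          # standard error of p_hat when H0 is true
    rejections = 0
    for _ in range(reps):
        p_hat_obs = sum(rng.random() < p0 for _ in range(n)) / n
        p_value = 1 - std_normal.cdf((p_hat_obs - p0) / se)
        if p_value < alpha:
            rejections += 1               # H0 is true here, so every rejection is a Type I Error
    return rejections / reps


print("alpha = 0.05:", type_one_error_rate(alpha=0.05))
print("alpha = 0.01:", type_one_error_rate(alpha=0.01))
```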
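Then we calculate the so-called p-value:
The probability, given that \(H_0\) is true, that the random variable \(\widehat{p}\) is at least as far from \(p\) as \(\widehat{p}_{\text{obs}}\) is. When only deviations in one direction matter, as in the non-confidence vote example (where only \(p > .3\) is of interest), only distances in that direction are counted.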
In the non-confidence vote example, the p-value was
\(\operatorname{P}(\widehat{p} \ge .36 \mid p = .3) = 0.0322 = 3.22\%\).
If the p-value is less than \(\alpha\), we reject \(H_0\).
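The p-value definition can be written as a small function for a one-proportion z-test. This sketch assumes the normal approximation; `p_value` and its `alternative` argument are hypothetical names chosen for illustration. With `alternative="greater"` only samples with \(\widehat{p}\) at least as far above \(p\) as \(\widehat{p}_{\text{obs}}\) count (the non-confidence vote case); with `"two-sided"` distances in both directions count, which doubles the tail probability.

```python
from math import sqrt
from statistics import NormalDist


def p_value(p_hat_obs, p0, n, alternative="greater"):
    """p-value for a one-proportion z-test (normal approximation).

    alternative="greater":   count only samples with p_hat at least as far ABOVE p0.
    alternative="two-sided": count samples at least as far from p0 in either direction.
    """
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat_obs - p0) / se
    std_normal = NormalDist()
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater' or 'two-sided'")


alpha = 0.05
pv = p_value(0.36, 0.3, 200)   # non-confidence vote example
print(f"p-value = {pv:.4f}, reject H0: {pv < alpha}")
```

For the non-confidence vote data the one-sided p-value is about 0.032, so \(H_0\) is rejected at \(\alpha = 0.05\); the two-sided p-value for the same data would be about 0.064, which is not below 0.05. Which version applies depends on the question being asked.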
According to historical data, the population of a certain species of fish endemic to a freshwater lake on a remote tropical island is 45.2% female. Researchers want to know whether a recent increase in water temperature has affected this proportion. They collect a random sample of 64 individuals of this species from the lake and find that only 22 of them are female. They want to perform the test at the 5% significance level.
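Here is a sketch of the test for the fish data, assuming a one-proportion z-test with the normal approximation (reasonable here, since \(64 \cdot 0.452 \approx 28.9\) and \(64 \cdot 0.548 \approx 35.1\) are both well above 10). Because the question is whether the temperature change affected the proportion, in either direction, the sketch uses a two-sided p-value; this reading of the question is an assumption.

```python
from math import sqrt
from statistics import NormalDist

# Data from the fish example
p0 = 0.452        # historical proportion of females (the H0 value)
n = 64            # fish sampled
females = 22      # females in the sample
alpha = 0.05      # significance level

p_hat_obs = females / n                      # observed sample proportion
se = sqrt(p0 * (1 - p0) / n)                 # standard error assuming H0 is true
z = (p_hat_obs - p0) / se
# Two-sided: count deviations at least this large in either direction.
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"p_hat = {p_hat_obs:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

With these numbers \(\widehat{p}_{\text{obs}} \approx 0.344\), \(z \approx -1.74\), and the two-sided p-value is about 0.082. That is above 0.05, so \(H_0\) (the proportion is still 45.2%) is not rejected at the 5% level.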
For the same situation (historically 45.2% female; 22 females in a random sample of 64 fish), suppose the researchers instead decide to construct a 95% confidence interval estimating the current proportion of females in the population.
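A sketch of the standard normal-approximation (Wald) interval \(\widehat{p} \pm z^* \sqrt{\widehat{p}(1-\widehat{p})/n}\) for the fish data follows. That this is the intended interval is an assumption; other intervals for a proportion (for example, the Wilson interval) give somewhat different endpoints. The standard error here uses \(\widehat{p}\), because no hypothesized value of \(p\) is assumed.

```python
from math import sqrt
from statistics import NormalDist

n = 64            # fish sampled
females = 22      # females in the sample
confidence = 0.95

p_hat = females / n
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
se = sqrt(p_hat * (1 - p_hat) / n)                        # SE uses p_hat, not a hypothesized p
margin = z_star * se
lower, upper = p_hat - margin, p_hat + margin

print(f"95% CI: ({lower:.4f}, {upper:.4f})")
print("0.452 inside the interval:", lower <= 0.452 <= upper)
```

The interval is roughly (0.227, 0.460). It contains the historical value 0.452, so at the 95% level the data do not rule out that the proportion of females is unchanged.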