Math 132A

Introduction to Hypothesis Testing

Starting with an Example

A town has a local law governing the recall of the mayor:

At any time, anyone can request a referendum of non-confidence in the mayor. If more than 30% of all registered voters express non-confidence, the mayor will be recalled, and a new mayoral election will be held. If 30% or less of the registered voters vote for non-confidence, the mayor stays, and the person requesting the referendum must cover the cost.

Some residents are unhappy with the current mayor, and they want to request a referendum of non-confidence. However, they want to be reasonably certain that they will win, so they conduct a survey of 200 randomly selected registered voters, asking them whether they support the recall. Out of the 200 voters, 72 say they support the non-confidence vote.

Should they go ahead with it?

The important question: Does more than 30% of the registered voters in the town support the non-confidence vote?

In symbols: If \(p\) is the proportion of all registered voters in the town that support the non-confidence vote, are we reasonably certain that \(p > .3\)?

The sample proportion is \(\widehat{p} = \frac{72}{200} = .36 > .3\)

However, samples vary!

Another Question

Assuming that only 30% of the registered voters support the recall, how likely is it that a simple random sample of 200 voters would result in \(\widehat{p} \ge .36\)?

\(\operatorname{P}(\widehat{p} \ge .36 \mid p = .3)\)

\(n = 200\), \(p = .3\), \(SE_{\widehat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.3\cdot .7}{200}} \approx .0324\)
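As a numerical check, the short Python sketch below evaluates the standard error and the tail probability. It assumes the normal approximation to the sampling distribution of \(\widehat{p}\) and uses `statistics.NormalDist` from the standard library; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

n = 200          # sample size
p0 = 0.3         # proportion assumed in the question
p_hat = 72 / n   # observed sample proportion, 0.36

# Standard error of the sample proportion when p = p0
se = sqrt(p0 * (1 - p0) / n)

# Standardized distance of the observed proportion from p0
z = (p_hat - p0) / se

# Upper-tail probability P(p_hat >= 0.36 | p = 0.3), normal approximation
tail = 1 - NormalDist().cdf(z)

# Same probability with z rounded to two decimals, as in a printed z-table
tail_table = 1 - NormalDist().cdf(round(z, 2))

print(f"SE = {se:.4f}, z = {z:.2f}")
print(f"tail = {tail:.4f}, with rounded z = {tail_table:.4f}")
```

Both versions come out at roughly 3.2%; the last digit depends on whether \(z\) is rounded before the lookup.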

Possible Scenarios

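	\(p \le .3\)	\(p > .3\)
vote requested	scenario 1: the referendum fails and the requesters cover the cost	the mayor is recalled
vote not requested	the mayor stays	a chance to recall the mayor is missed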

How much are they willing to risk?

Suppose \(p = .3\). What probability are they willing to allow for scenario 1 (requesting the vote and then losing it)? Are they OK with 5%? Or maybe they want only 1%? Or maybe even smaller?

Let’s call this probability \(\alpha\), so \(\alpha = 0.05\), or \(\alpha = 0.01\), …

They want the probability of mistakenly requesting the referendum when \(p = .3\) to be less than \(\alpha\).

We calculated \(\operatorname{P}(\widehat{p} \ge .36 \mid p = .3) = 0.0322 = 3.22\%\).

Strategy

They choose \(\alpha\) to determine the degree of “scenario 1 risk” they are willing to accept.

They collect a sample and calculate the sample proportion of that sample. We call it the observed sample proportion, denoted \(\widehat{p}_{\text{obs}}\).

They calculate \(\operatorname{P}(\widehat{p} \ge \widehat{p}_{\text{obs}} \mid p = .3)\).

If this probability is less than \(\alpha\), they decide it is safe to go ahead with the referendum.
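The four steps can be sketched as a small Python function. This is only an illustration under the normal approximation; the function name `safe_to_request` and its parameters are hypothetical and do not come from the notes.

```python
from math import sqrt
from statistics import NormalDist

def safe_to_request(supporters, n, p0=0.3, alpha=0.05):
    """Return (probability, decision) for the one-sided strategy.

    probability approximates P(p_hat >= p_obs | p = p0) with a normal curve;
    decision is True when that probability is below alpha.
    """
    p_obs = supporters / n                # observed sample proportion
    se = sqrt(p0 * (1 - p0) / n)          # standard error when p = p0
    prob = 1 - NormalDist().cdf((p_obs - p0) / se)
    return prob, prob < alpha

# The survey from the example: 72 supporters out of 200
prob, go_ahead = safe_to_request(72, 200, alpha=0.05)
print(f"alpha = 0.05: probability = {prob:.4f}, request the vote: {go_ahead}")

# A more cautious group that accepts only a 1% risk
prob_strict, go_ahead_strict = safe_to_request(72, 200, alpha=0.01)
print(f"alpha = 0.01: probability = {prob_strict:.4f}, request the vote: {go_ahead_strict}")
```

With the same survey, the decision depends on \(\alpha\): the probability of about 3.2% is below 5% but not below 1%.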

Hypothesis Testing

What we just did is called hypothesis testing.

It is a formal way of evaluating mathematical models of random variables or their association.

Null hypothesis \(H_0: p = 0.3\) (the referendum will fail)
Alternative hypothesis \(H_A: p > 0.3\) (the referendum will succeed)

\(H_0\) represents the evaluated model: a randomly selected registered voter supporting the non-confidence vote is a Bernoulli random variable with probability of success \(p = 0.3\).
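Under this model, the number of supporters in a sample of 200 is a sum of independent Bernoulli variables, that is, a binomial random variable. The sketch below computes the tail probability directly from the binomial distribution instead of the normal approximation. The helper name `binomial_upper_tail` is hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, p0, observed = 200, 0.3, 72

# P(at least 72 supporters among 200 | each voter supports with probability 0.3)
exact_tail = binomial_upper_tail(observed, n, p0)
print(f"P(X >= {observed} | n = {n}, p = {p0}) = {exact_tail:.4f}")
```

The exact value need not coincide with the normal-approximation figure, because the binomial distribution is discrete and slightly skewed.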

Question: Does this model explain our observed data?

Observed data

We collect a sample and calculate the sample statistic \(\widehat{p}_{\text{obs}}\).

Question: Is \(\widehat{p}_{\text{obs}}\) different from the value of \(p\) in the null hypothesis?

- No, it is (almost) the same: then the model does explain the data, and we are done!
- Yes, \(\widehat{p}_{\text{obs}}\) is different from \(p\).

New question: is it “different enough”? Is the difference significant?
- If we decide that it is different enough, we conclude that the model does not explain the data, and we say that we reject the null hypothesis.
- If we decide that it is not different enough, we say that we do not reject the null hypothesis.

Possible Scenarios (and errors)

	The \(H_0\) is “true”	The \(H_0\) is false
Rejecting \(H_0\)	Type I Error	OK
Not rejecting \(H_0\)	OK	Type II Error

We choose the probability of Type I Error.

Denoted as \(\alpha\), it is called the significance level.
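To see what choosing \(\alpha\) means in practice, the following Python simulation draws many samples from a population in which \(H_0\) is true and counts how often the test rejects anyway. It is a sketch only: the seed, the number of trials, and the use of the normal approximation are assumptions made here, not part of the notes.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(132)  # arbitrary seed so the run is reproducible

n, p0, alpha = 200, 0.3, 0.05
trials = 4000
se = sqrt(p0 * (1 - p0) / n)
std_normal = NormalDist()

rejections = 0
for _ in range(trials):
    # Draw one sample of n voters from a population where H0 is true
    supporters = sum(random.random() < p0 for _ in range(n))
    p_hat = supporters / n
    # One-sided p-value under the normal approximation
    p_value = 1 - std_normal.cdf((p_hat - p0) / se)
    if p_value < alpha:
        rejections += 1  # H0 is true here, so every rejection is a Type I error

type_i_rate = rejections / trials
print(f"estimated Type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
```

The estimated rate should land near \(\alpha\) but not exactly on it, since \(\widehat{p}\) takes only discrete values and the normal tail is an approximation.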

The p-value and the conclusion

Then we calculate the so-called p-value:

The probability that the distance of the random variable \(\widehat{p}\) from \(p\) is at least as large as the distance of \(\widehat{p}_{\text{obs}}\) from \(p\), given that \(H_0\) is true.

In the non-confidence vote example, where only values of \(\widehat{p}\) above \(p\) count as evidence for \(H_A\), the p-value was

\(\operatorname{P}(\widehat{p} \ge .36 \mid p = .3) = 0.0322 = 3.22\%\).

If the p-value is less than \(\alpha\), we reject \(H_0\).
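The definition of the p-value can be written as a single Python function that covers both the one-sided case from the referendum example and the two-sided "distance in either direction" case. This is a sketch under the normal approximation; the function name `proportion_p_value` and the `alternative` parameter are hypothetical names introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_p_value(p_obs, n, p0, alternative="two-sided"):
    """Normal-approximation p-value for H0: p = p0.

    alternative is "greater", "less", or "two-sided".
    """
    se = sqrt(p0 * (1 - p0) / n)   # standard error assuming H0 is true
    z = (p_obs - p0) / se
    std_normal = NormalDist()
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "two-sided":
        # Distance from p0 at least as large as observed, in either direction
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")

one_sided = proportion_p_value(0.36, 200, 0.3, alternative="greater")
two_sided = proportion_p_value(0.36, 200, 0.3, alternative="two-sided")
print(f"one-sided: {one_sided:.4f}, two-sided: {two_sided:.4f}")
```

For a symmetric normal curve the two-sided p-value is twice the one-sided one, so the choice of alternative can change the conclusion at a given \(\alpha\).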

Another example

According to historical data, the population of a certain species of fish endemic to a freshwater lake on a remote tropical island is 45.2% female. Researchers want to know whether a recent increase in the water temperature has affected this proportion. They collect a random sample of 64 individuals of this species from the lake and find that only 22 of them are female. They want to perform the test at the 5% significance level.
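One way to carry out this test is sketched below in Python. It assumes the hypotheses \(H_0: p = 0.452\) and \(H_A: p \ne 0.452\) (two-sided, since the question is whether the proportion changed at all) and the normal approximation; both are readings of the problem, not statements from the notes.

```python
from math import sqrt
from statistics import NormalDist

n, females = 64, 22
p0 = 0.452      # historical proportion of females (H0)
alpha = 0.05    # significance level

p_obs = females / n               # observed sample proportion
se = sqrt(p0 * (1 - p0) / n)      # standard error assuming H0 is true
z = (p_obs - p0) / se

# Two-sided p-value: deviations in either direction count
p_value = 2 * NormalDist().cdf(-abs(z))
reject_h0 = p_value < alpha

print(f"p_obs = {p_obs:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Under these assumptions the p-value is about 8%, which is above 5%, so the sample does not give enough evidence to reject \(H_0\).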

Comparison to Confidence Intervals

According to historical data, the population of a certain species of fish endemic to a freshwater lake on a remote tropical island is 45.2% female. Researchers want to know whether a recent increase in the water temperature has affected this proportion. They collect a random sample of 64 individuals of this species from the lake and find that only 22 of them are female. They decide to construct a 95% confidence interval estimating the current proportion of females in the population.
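A possible computation of this interval is sketched below in Python. It assumes the usual large-sample interval \(\widehat{p} \pm z^{*}\, SE\), with the standard error estimated from \(\widehat{p}\) itself; the notes do not say which interval method is intended.

```python
from math import sqrt
from statistics import NormalDist

n, females = 64, 22
p_hat = females / n

# Standard error estimated from the sample (no hypothesized p is used here)
se_hat = sqrt(p_hat * (1 - p_hat) / n)

# Critical value for 95% confidence: the 97.5th percentile of the standard normal
z_star = NormalDist().inv_cdf(0.975)

lower = p_hat - z_star * se_hat
upper = p_hat + z_star * se_hat
contains_historical = lower <= 0.452 <= upper

print(f"95% CI: ({lower:.4f}, {upper:.4f})")
print(f"contains the historical value 0.452: {contains_historical}")
```

The interval uses a standard error based on \(\widehat{p}\), while a test of \(H_0\) uses one based on the hypothesized \(p\), so the two approaches usually agree but are not guaranteed to do so in borderline cases.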