Random Variables
A random variable is a mathematical model of a random quantity, such as a measurement or a count.
It is a function that assigns a number to each outcome in a sample space.
Example: The number of dots on the side of a die that is facing up.
Example: The total number of dots on the sides of two dice that are facing up.
Example: The number of patients in a 200-person sample who test positive for COVID-19.
Example: The height of a plant randomly selected from a field.
Example: The amount of money a customer spends at a store.
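\(X = {}\) the number of heads when flipping 3 coins.
\(X\) models “the number of heads in 3 tosses of a fair coin”, and it equally models the count in each of the following experiments:
Tossing a fair coin 3 times. Counting the number of heads.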
Taking a token 3 times from a bag with one red and one blue token, with replacement. Counting the number of red tokens.
Asking 3 people to randomly select one of two characters, when they have no preference. Counting the number of times the friendly character was selected.
Randomly selecting 3 people from a large crowd that has 50% males and 50% females. Counting the number of females.
Randomly selecting 3 plants from a field in which 50% of the plants have some specific genetic mutation. Counting the number of plants with the mutation.
Mathematically, all of those are the same.
The distribution of a discrete random variable is the collection of its values and the probabilities associated with those values.
The probability distribution for \(X\) is as follows:
\(x_i\) | 0 | 1 | 2 | 3 |
---|---|---|---|---|
\(P(X = x_i)\) | 1/8 | 3/8 | 3/8 | 1/8 |
The probabilities must add up to 1: \(1/8 + 3/8 + 3/8 + 1/8 = 1\).
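The distribution of \(X\) can be checked by listing all outcomes of three tosses and counting heads in each. The following is a minimal Python sketch, assuming a fair coin so that all 8 outcomes are equally likely; the names `outcomes`, `X`, and `distribution` are illustrative and not part of the original material.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely outcomes of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))


def X(outcome):
    """The random variable: assign to each outcome its number of heads."""
    return outcome.count("H")


# Count how many outcomes give each value of X, then divide by 8.
counts = Counter(X(o) for o in outcomes)
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")

# The probabilities of a distribution must add up to 1.
print("sum of probabilities:", sum(distribution.values()))
```

Using `Fraction` keeps the probabilities exact, so the printed values match the table (1/8, 3/8, 3/8, 1/8) rather than rounded decimals.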
A small bakery sells three kinds of cookies by the package: snickerdoodles (3.25 per package), chocolate chip (2.50), and peanut butter (2.00).
The owner has found that 15% of customers buy only a package of snickerdoodles, 20% buy only a package of chocolate chip cookies, 18% buy only a package of peanut butter cookies, 15% buy one package each of snickerdoodles and chocolate chip cookies, 10% buy one package each of peanut butter cookies and snickerdoodles, 17% buy one package each of chocolate chip and peanut butter cookies, and 5% buy one package of each of the three kinds.
How much money will a random customer spend in the bakery?
Cookies bought | Total cost | Probability |
---|---|---|
only snickerdoodles | 3.25 | .15 |
only chocolate chip | 2.50 | .20 |
only peanut butter | 2.00 | .18 |
snickerdoodles and chocolate chip | 5.75 | .15 |
snickerdoodles and peanut butter | 5.25 | .10 |
chocolate chip and peanut butter | 4.50 | .17 |
all three | 7.75 | .05 |
\(X = {}\) the amount of money a randomly selected customer spends in the bakery.
Distribution:
\(x\) | 2.00 | 2.50 | 3.25 | 4.50 | 5.25 | 5.75 | 7.75 |
---|---|---|---|---|---|---|---|
\(P(X = x)\) | .18 | .20 | .15 | .17 | .10 | .15 | .05 |
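The step from the purchase table to the distribution of \(X\) can be sketched in Python. The package prices below are read off the single-item rows of the purchase table and stored in cents to avoid rounding; the currency is not stated in the text, and the names `PRICE_CENTS` and `purchases` are illustrative assumptions.

```python
from fractions import Fraction

# Package prices in cents, taken from the single-item rows of the table.
PRICE_CENTS = {"snickerdoodles": 325, "chocolate chip": 250, "peanut butter": 200}

# Each purchase pattern together with its probability in percent.
purchases = [
    (("snickerdoodles",), 15),
    (("chocolate chip",), 20),
    (("peanut butter",), 18),
    (("snickerdoodles", "chocolate chip"), 15),
    (("snickerdoodles", "peanut butter"), 10),
    (("chocolate chip", "peanut butter"), 17),
    (("snickerdoodles", "chocolate chip", "peanut butter"), 5),
]

# X maps each purchase pattern to its total cost; patterns with the
# same total (none here) would have their probabilities added together.
distribution = {}
for items, percent in purchases:
    total = sum(PRICE_CENTS[item] for item in items)
    distribution[total] = distribution.get(total, 0) + Fraction(percent, 100)

for cents, p in sorted(distribution.items()):
    print(f"x = {cents / 100:.2f}   P(X = x) = {float(p):.2f}")

print("sum of probabilities:", sum(distribution.values()))
```

The seven totals and their probabilities reproduce the distribution table, and the probabilities add up to 1.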
Question: What is the average amount a customer spends in the bakery?
Question: How much does the amount spent vary from customer to customer?
If \(X\) has values \(x_1\), …, \(x_k\) with probabilities \(P(X=x_1)\), …, \(P(X=x_k)\), the expected value of \(X\), also called the mean of \(X\), is the sum of each value multiplied by its corresponding probability:
\[\begin{aligned} E(X) &= x_1 P(X=x_1) + x_2 P(X = x_2) + \cdots + x_k P(X=x_k)\\ &= \sum_{i=1}^{k}x_iP(X=x_i) \end{aligned}\]
The Greek letter \(\mu\) may be used in place of the notation \(E(X)\) and is sometimes written \(\mu_X\).
In the coin tossing example,
\(x_i\) | 0 | 1 | 2 | 3 |
---|---|---|---|---|
\(P(X = x_i)\) | \(\color{red}1/8\) | \(\color{green}3/8\) | \(\color{blue}3/8\) | \(\color{orange}1/8\) |
\[\begin{align*} E(X) &= 0\cdot{\color{red}P(X=0)} + 1\cdot{\color{green}P(X=1)} + 2\cdot{\color{blue}P(X=2)} + 3\cdot{\color{orange}P(X = 3)} \\[1.2em] &= 0\cdot{\color{red}\frac{1}{8}} + 1\cdot{\color{green}\frac{3}{8}} + 2\cdot{\color{blue}\frac{3}{8}} + 3\cdot{\color{orange}\frac{1}{8}} \\[1.2em] &= \frac{12}{8} \\[1.2em] &= 1.5 \end{align*}\]
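The expected-value formula translates directly into a short function. The sketch below assumes a distribution is stored as a `{value: probability}` dictionary; the function name `expected_value` and the variables `coin` and `bakery` are illustrative. It reproduces the coin result and applies the same formula to the bakery distribution given earlier.

```python
from fractions import Fraction


def expected_value(distribution):
    """Sum of x * P(X = x) over a {value: probability} mapping."""
    return sum(x * p for x, p in distribution.items())


# Number of heads in 3 tosses of a fair coin.
coin = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
print(expected_value(coin))  # 3/2

# Amount spent in the bakery, values and probabilities from the table.
bakery = {
    Fraction("2.00"): Fraction("0.18"),
    Fraction("2.50"): Fraction("0.20"),
    Fraction("3.25"): Fraction("0.15"),
    Fraction("4.50"): Fraction("0.17"),
    Fraction("5.25"): Fraction("0.10"),
    Fraction("5.75"): Fraction("0.15"),
    Fraction("7.75"): Fraction("0.05"),
}
print(float(expected_value(bakery)))  # 3.8875
```

Under these figures, a randomly selected customer spends about 3.89 on average, which answers the first bakery question.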
If \(X\) takes on values \(x_1\), …, \(x_k\) with probabilities \(P(X=x_1)\), …, \(P(X=x_k)\) and has the expected value \(\mu=E(X)\), then the variance of \(X\), denoted by \(\operatorname{Var}(X)\) or \(\sigma^2\), is
\[\begin{align*} \operatorname{Var}(X) &= (x_1-\mu)^2 P(X=x_1) + (x_2 - \mu)^2 P(X = x_2) + \\ &\qquad \cdots+ (x_k-\mu)^2 P(X=x_k) \\ &= \sum_{j=1}^{k} (x_j - \mu)^2 P(X=x_j) \end{align*}\]
The standard deviation of \(X\), written as \(\operatorname{SD}(X)\) or \(\sigma\), is the square root of the variance. It is sometimes written \(\sigma_X\).
\[\operatorname{SD}(X) = \sqrt{\operatorname{Var}(X)}\]
In the coin tossing example, where \(\mu_X = E(X) = 3/2\),
\(x_i\) | 0 | 1 | 2 | 3 |
---|---|---|---|---|
\(P(X = x_i)\) | \(\color{red}1/8\) | \(\color{green}3/8\) | \(\color{blue}3/8\) | \(\color{orange}1/8\) |
\[\begin{align*} \sigma_X^2 &= (0-\mu_X)^2{\color{red}P(X=0)} + (1-\mu_X)^2{\color{green}P(X=1)} + (2-\mu_X)^2{\color{blue}P(X=2)} \\ &\qquad + (3 - \mu_X)^2{\color{orange}P(X = 3)} \\[1.2em] &= \left(0-\frac{3}{2}\right)^2{\color{red}\frac{1}{8}} + \left(1 - \frac{3}{2}\right)^2{\color{green}\frac{3}{8}} + \left(2 -\frac{3}{2}\right)^2{\color{blue}\frac{3}{8}} + \left(3-\frac{3}{2}\right)^2{\color{orange}\frac{1}{8}} \\[1.2em] &= \frac{3}{4} \end{align*}\]
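The variance and standard deviation formulas can be coded the same way. This is a minimal sketch, again assuming a `{value: probability}` dictionary; the function and variable names are illustrative. It checks the coin result \(3/4\) and applies the formulas to the bakery distribution.

```python
import math
from fractions import Fraction


def expected_value(distribution):
    """Sum of x * P(X = x) over a {value: probability} mapping."""
    return sum(x * p for x, p in distribution.items())


def variance(distribution):
    """Sum of (x - mu)**2 * P(X = x), where mu is the expected value."""
    mu = expected_value(distribution)
    return sum((x - mu) ** 2 * p for x, p in distribution.items())


def standard_deviation(distribution):
    """Square root of the variance, returned as a float."""
    return math.sqrt(variance(distribution))


coin = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
print(variance(coin))  # 3/4
print(f"{standard_deviation(coin):.3f}")  # 0.866

# Bakery distribution, values and probabilities from the table.
bakery = {
    Fraction("2.00"): Fraction("0.18"),
    Fraction("2.50"): Fraction("0.20"),
    Fraction("3.25"): Fraction("0.15"),
    Fraction("4.50"): Fraction("0.17"),
    Fraction("5.25"): Fraction("0.10"),
    Fraction("5.75"): Fraction("0.15"),
    Fraction("7.75"): Fraction("0.05"),
}
print(f"Var(X) = {float(variance(bakery)):.4f}")
print(f"SD(X)  = {standard_deviation(bakery):.4f}")
```

The bakery standard deviation is in the same units as the amounts spent, which makes it the more natural answer to the question of how much spending varies between customers.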
The expected value of \(X\) is approximately the mean you would get if you observed a large number of values of \(X\) and averaged them.
Similarly, the standard deviation of \(X\) is approximately the standard deviation of that large collection of observed values.
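This long-run interpretation can be illustrated with a small simulation. The sketch below assumes Python's pseudo-random generator is an adequate stand-in for a fair coin; the sample size of 100,000 and the seed are arbitrary choices, and the name `toss_three_coins` is illustrative.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable; the value 0 is arbitrary


def toss_three_coins():
    """Return one observed value of X: the number of heads in 3 fair tosses."""
    return sum(random.random() < 0.5 for _ in range(3))


# Observe X many times and summarize the resulting data.
sample = [toss_three_coins() for _ in range(100_000)]

sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

print(f"sample mean = {sample_mean:.3f}   (E(X)  = 1.5)")
print(f"sample SD   = {sample_sd:.3f}   (SD(X) = {0.75 ** 0.5:.3f})")
```

With this many observations the sample mean and sample standard deviation should land close to \(E(X) = 1.5\) and \(\operatorname{SD}(X) = \sqrt{3/4} \approx 0.866\), though not exactly on them; a smaller sample would typically show larger deviations.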