Math 132A

Random Variables

Definition of a Random Variable

A random variable is a mathematical model of a random quantity, such as a measurement or a count.

It is a function that assigns a number to each outcome in a sample space.

Example: The number of dots on the side of a die that is facing up.

Example: The total number of dots on the sides of two dice that are facing up.

Example: The number of patients in a 200-person sample who tested positive for COVID-19.

Example: The height of a plant randomly selected from a field.

Example: The amount of money a customer spends at a store.
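
To make the "function on a sample space" idea concrete, here is a minimal Python sketch of the two-dice example. The names `sample_space` and `total_dots` are illustrative choices, not notation from the course.

```python
from itertools import product

# Sample space: every ordered pair (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))


def total_dots(outcome):
    """The random variable: maps an outcome to the total number of dots."""
    first, second = outcome
    return first + second


# The set of values the random variable can take.
values = sorted({total_dots(outcome) for outcome in sample_space})

print(len(sample_space))   # 36
print(total_dots((3, 5)))  # 8
print(values)              # the integers 2 through 12
```

The sample space has 36 outcomes, but the random variable takes only 11 distinct values, because several outcomes map to the same total.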

A Very Simple Example

\(X = {}\) the number of heads when flipping 3 coins.

Example cont.

\(X\) models “the number of heads in 3 tosses of a fair coin”.

  • \(X\) can take on the values 0, 1, 2, 3.

3 coin tosses
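
As a rough sketch, the 8 possible outcomes of 3 tosses and the value \(X\) assigns to each can be listed in Python. The string encoding of outcomes ("H" for heads, "T" for tails) and the variable names are assumptions made for this illustration.

```python
from itertools import product

# All 2 * 2 * 2 = 8 outcomes of tossing a coin 3 times, e.g. "HHT".
outcomes = ["".join(tosses) for tosses in product("HT", repeat=3)]

# X assigns to each outcome its number of heads.
heads = {outcome: outcome.count("H") for outcome in outcomes}

for outcome, x in heads.items():
    print(outcome, x)
```

Each outcome maps to exactly one of the values 0, 1, 2, 3, which is why those are the only values \(X\) can take.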

More Examples:

  • Taking a token 3 times from a bag with one red and one blue token, with replacement. Counting the number of red tokens.

  • Asking 3 people to randomly select one of two characters, when they have no preference. Counting the number of times the friendly character was selected.

  • Randomly selecting 3 people from a large crowd that has 50% males and 50% females. Counting the number of females.

  • Randomly selecting 3 plants from a field in which 50% of the plants have some specific genetic mutation. Counting the number of plants with the mutation.

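Mathematically, all of these are the same as the coin example: each counts the number of successes in 3 trials with a 50% chance of success on each trial.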
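
One way to check this for the token example is to enumerate the equally likely sequences in both settings and compare the tallies. The helper `count_distribution` below is a hypothetical name introduced for this sketch. The crowd and field examples match only approximately, because people and plants are drawn without replacement from a large population.

```python
from collections import Counter
from itertools import product


def count_distribution(options, success, trials=3):
    """Tally how many equally likely sequences contain 0, 1, 2, ... successes."""
    tally = Counter(seq.count(success) for seq in product(options, repeat=trials))
    return dict(sorted(tally.items()))


coins = count_distribution(["heads", "tails"], "heads")
tokens = count_distribution(["red", "blue"], "red")

print(coins)            # {0: 1, 1: 3, 2: 3, 3: 1}
print(coins == tokens)  # True
```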

Distribution of a Random Variable

The distribution of a discrete random variable is the collection of its values and the probabilities associated with those values.

The probability distribution for \(X\) is as follows:

\(x_i\) 0 1 2 3
\(P(X = x_i)\) 1/8 3/8 3/8 1/8

The probabilities must add up to 1: \(1/8 + 3/8 + 3/8 + 1/8 = 1\).
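
The table can be reproduced by counting outcomes, assuming the coin is fair so that all 8 outcomes are equally likely. The following is a minimal sketch using exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))
counts = Counter(outcome.count("H") for outcome in outcomes)

# Each outcome has probability 1/8, so
# P(X = x) = (number of outcomes with x heads) / 8.
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)  # prints 0 1/8, then 1 3/8, then 2 3/8, then 3 1/8

print(sum(distribution.values()))  # 1
```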

Bar graph showing the distribution of \(X\): bars of height 1/8, 3/8, 3/8, 1/8 at \(x_i = 0, 1, 2, 3\).

More Interesting Example

A small bakery sells three kinds of cookies:

  • Snickerdoodles, which cost $3.25 a package
  • Chocolate chip cookies, for $2.50 a package
  • Peanut butter cookies, costing $2 a package

The owner has found that 15% of customers buy only a package of snickerdoodles, 20% buy only a package of chocolate chip cookies, and 18% buy only a package of peanut butter cookies. A further 15% buy one package each of snickerdoodles and chocolate chip cookies, 10% buy one package each of peanut butter cookies and snickerdoodles, 17% buy one package each of chocolate chip and peanut butter cookies, and 5% buy one package of each of the three kinds. These percentages add up to 100%, so every customer falls into exactly one group.

How much money will a random customer spend in the bakery?

More Interesting Example cont.

Cookies bought                       Total cost ($)   Probability
only snickerdoodles                  3.25             0.15
only chocolate chip                  2.50             0.20
only peanut butter                   2.00             0.18
snickerdoodles and chocolate chip    5.75             0.15
snickerdoodles and peanut butter     5.25             0.10
chocolate chip and peanut butter     4.50             0.17
all three                            7.75             0.05

This is a Random Variable

\(X = {}\) the amount of money a randomly selected customer spends in the bakery.

Distribution:

\(x\) 2.00 2.50 3.25 4.50 5.25 5.75 7.75
\(\operatorname{P}(X = x)\) .18 .20 .15 .17 .10 .15 .05
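
The step from the purchase table to this distribution can be sketched in Python: add up the package prices for each purchase pattern, then list the totals in increasing order. The dictionary keys and variable names are assumptions made for this illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Package prices in dollars, from the bakery example.
prices = {
    "snickerdoodle": Fraction("3.25"),
    "chocolate chip": Fraction("2.50"),
    "peanut butter": Fraction("2.00"),
}

# Each purchase pattern with its probability.
purchases = [
    (("snickerdoodle",), Fraction("0.15")),
    (("chocolate chip",), Fraction("0.20")),
    (("peanut butter",), Fraction("0.18")),
    (("snickerdoodle", "chocolate chip"), Fraction("0.15")),
    (("snickerdoodle", "peanut butter"), Fraction("0.10")),
    (("chocolate chip", "peanut butter"), Fraction("0.17")),
    (("snickerdoodle", "chocolate chip", "peanut butter"), Fraction("0.05")),
]

# Map each total cost to its probability (adding up patterns with equal totals).
distribution = {}
for cookies, probability in purchases:
    total = sum(prices[c] for c in cookies)
    distribution[total] = distribution.get(total, 0) + probability

for total, probability in sorted(distribution.items()):
    print(f"{float(total):.2f}  {float(probability):.2f}")

print(sum(distribution.values()))  # 1
```

Here every purchase pattern happens to give a different total, so the distribution has 7 values, one per row of the table.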

Question: What is the average amount a customer spends in the bakery?

Question: How much does the amount spent vary from customer to customer?

Expectation of a Random Variable

If \(X\) has values \(x_1\), …, \(x_k\) with probabilities \(P(X=x_1)\), …, \(P(X=x_k)\), the expected value of \(X\), also called the mean of \(X\), is the sum of each value multiplied by its corresponding probability:

\[\begin{aligned} E(X) &= x_1 P(X=x_1) + x_2 P(X = x_2) + \cdots + x_k P(X=x_k)\\\\ &= \sum_{i=1}^{k}x_iP(X=x_i) \end{aligned}\]

The Greek letter \(\mu\) may be used in place of the notation \(E(X)\) and is sometimes written \(\mu_X\).
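
The definition translates directly into a short function. The sketch below is a minimal illustration, with a distribution stored as a dictionary from values to probabilities (a representation chosen here, not one used in the course), applied to the single-die example.

```python
from fractions import Fraction


def expected_value(distribution):
    """Return the sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in distribution.items())


# A fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_value(die))  # 7/2
```

Note that the expected value, 3.5, is not itself a value the die can show.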

Expectation…

In the coin tossing example,

\(x_i\) 0 1 2 3
\(P(X = x_i)\) \(\color{red}1/8\) \(\color{green}3/8\) \(\color{blue}3/8\) \(\color{orange}1/8\)

\[\begin{align*} E(X) &= 0{\color{red}P(X=0)} + 1{\color{green}P(X=1)} + 2{\color{blue}P(X=2)} + 3{\color{orange}P(X = 3)} \\[1.2em] &\class{fragment}{{}= 0\cdot{\color{red}\frac{1}{8}} + 1\cdot{\color{green}\frac{3}{8}} + 2\cdot{\color{blue}\frac{3}{8}} + 3\cdot{\color{orange}\frac{1}{8}}} \\[1.2em] &\class{fragment}{{}= \frac{12}{8}} \\[1.2em] &\class{fragment}{{}= 1.5} \end{align*}\]
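
The same calculation answers the earlier bakery question about the average amount a customer spends. The following sketch uses the bakery distribution as given, stored as exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Bakery distribution: amount spent (dollars) -> probability.
bakery = {
    Fraction("2.00"): Fraction("0.18"),
    Fraction("2.50"): Fraction("0.20"),
    Fraction("3.25"): Fraction("0.15"),
    Fraction("4.50"): Fraction("0.17"),
    Fraction("5.25"): Fraction("0.10"),
    Fraction("5.75"): Fraction("0.15"),
    Fraction("7.75"): Fraction("0.05"),
}

# E(X) = sum of each value times its probability.
mean = sum(x * p for x, p in bakery.items())

print(float(mean))  # 3.8875
```

So a randomly selected customer spends about $3.89 on average.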

Variance and SD of a Random Variable

If \(X\) takes on values \(x_1\), …, \(x_k\) with probabilities \(P(X=x_1)\), …, \(P(X=x_k)\) and has the expected value \(\mu=E(X)\), then the variance of \(X\), denoted by \(\operatorname{Var}(X)\) or \(\sigma^2\), is

\[\begin{align*} \operatorname{Var}(X) &= (x_1-\mu)^2 P(X=x_1) + (x_2 - \mu)^2 P(X = x_2) + \\ &\qquad \cdots+ (x_k-\mu)^2 P(X=x_k) \\ &= \sum_{j=1}^{k} (x_j - \mu)^2 P(X=x_j) \end{align*}\]

The standard deviation of \(X\), written as \(\operatorname{SD}(X)\) or \(\sigma\), is the square root of the variance. It is sometimes written \(\sigma_X\).

\[\operatorname{SD}(X) = \sqrt{\operatorname{Var}(X)}\]
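
As a minimal sketch, both definitions can be written as short functions. The dictionary representation of a distribution and the function names are choices made for this illustration; the single-die example is used as input.

```python
import math
from fractions import Fraction


def variance(distribution):
    """Return the sum of (x - mu)^2 * P(X = x), where mu = E(X)."""
    mu = sum(x * p for x, p in distribution.items())
    return sum((x - mu) ** 2 * p for x, p in distribution.items())


def standard_deviation(distribution):
    """Return the square root of the variance."""
    return math.sqrt(variance(distribution))


# A fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(variance(die))            # 35/12
print(standard_deviation(die))  # roughly 1.708
```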

Variance and SD…

\(x_i\) 0 1 2 3
\(P(X = x_i)\) \(\color{red}1/8\) \(\color{green}3/8\) \(\color{blue}3/8\) \(\color{orange}1/8\)

\[\begin{align*} \sigma_X^2 &= (0-\mu_X)^2{\color{red}P(X=0)} + (1-\mu_X)^2{\color{green}P(X=1)} + (2-\mu_X)^2{\color{blue}P(X=2)} \\ &\qquad + (3 - \mu_X)^2{\color{orange}P(X = 3)} \\[1.2em] &= \left(0-\frac{3}{2}\right)^2{\color{red}\frac{1}{8}} + \left(1 - \frac{3}{2}\right)^2{\color{green}\frac{3}{8}} + \left(2 -\frac{3}{2}\right)^2{\color{blue}\frac{3}{8}} + \left(3-\frac{3}{2}\right)^2{\color{orange}\frac{1}{8}} \\[1.2em] &\class{fragment}{{}= \frac{3}{4}} \end{align*}\]
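
The same steps applied to the bakery distribution address the earlier question about how much spending varies. This is a sketch with exact fractions and illustrative variable names.

```python
import math
from fractions import Fraction

# Bakery distribution: amount spent (dollars) -> probability.
bakery = {
    Fraction("2.00"): Fraction("0.18"),
    Fraction("2.50"): Fraction("0.20"),
    Fraction("3.25"): Fraction("0.15"),
    Fraction("4.50"): Fraction("0.17"),
    Fraction("5.25"): Fraction("0.10"),
    Fraction("5.75"): Fraction("0.15"),
    Fraction("7.75"): Fraction("0.05"),
}

mu = sum(x * p for x, p in bakery.items())
var = sum((x - mu) ** 2 * p for x, p in bakery.items())
sd = math.sqrt(var)

print(float(var))    # 2.60296875
print(round(sd, 2))  # 1.61
```

The variance is in squared dollars, so the standard deviation of about $1.61 is the easier number to interpret.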

Intuitively…

the expected value of \(X\) is approximately the mean you would get if you observed a large number of values of \(X\) and averaged them.

Similarly, the standard deviation of \(X\) is approximately the standard deviation you would get if you observed a large number of values of \(X\) and calculated the standard deviation of that data.
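
A small simulation can illustrate this for the coin example. The sketch below assumes a fair coin, uses Python's pseudorandom generator with a fixed seed so the run is repeatable, and draws 100,000 values of \(X\); the sample size and names are arbitrary choices for the illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable


def toss_three_coins():
    """One observed value of X: the number of heads in 3 fair tosses."""
    return sum(rng.random() < 0.5 for _ in range(3))


sample = [toss_three_coins() for _ in range(100_000)]

sample_mean = statistics.fmean(sample)
sample_sd = statistics.pstdev(sample)

print(sample_mean)  # close to E(X) = 1.5
print(sample_sd)    # close to SD(X) = sqrt(3/4), about 0.866
```

The sample values will not match 1.5 and 0.866 exactly, but they get closer as the number of simulated values grows.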