# Maximum Likelihood Estimates of Multinomial Cell Probabilities

## Definition: Multinomial Distribution (generalization of Binomial)

Section $$8.5.1$$ of Rice discusses multinomial cell probabilities.

Data consisting of: $X_1, X_2, \ldots, X_m$

are counts in cells $$1, \ldots, m$$ and follow a multinomial distribution

$f(x_1, \ldots, x_n \mid p_1, \ldots, p_m ) = { n! \over \prod_{j=1}^m x_i ! } \prod_{j=1}^m p_j ^{x_j}$

where

• $$(p_1, \ldots, p_m)$$ is the vector of cell probabilities with $$\sum_{i=1}^m p_i=1.$$

• $$n=\sum_{j=1}^m x_i$$ is the total count.

These data arise from a random sample of single-count Multinomial random variables, which are a generalization of Bernoulli random variables ($$m$$ distinct outcomes versus 2 distinct outcomes).

Suppose

• $$W_1, W_2, \ldots, W_n$$ are iid $$W \sim Multinomial(n, probs=(p_1, \ldots, p_m))$$ random variables:

• The sample space of each $$W_i$$ is $${\cal W} = \{1, 2, \ldots, m\}$$, a set of $$m$$ distinct outcomes.

• $$P( W_i=k) = p_k,$$ $$k=1,2, \ldots, m.$$

Define

• $$X_k = \sum_{i=1}^n 1( W_i =k)$$, (sum of indicators of outcome $$k$$), $$k=1, \ldots, m$$

• X$$=(X_1, \ldots, X_m)$$

## Maximum Likelihood Estimation

### Likelihood function of Multinomial

$\begin{array}{lcl} lik(p_1, \ldots, p_m) &=& log [ f(x_1, \ldots, x_m \mid p_1, \ldots, p_m )] \\ &=& log (n!) - \sum_{j=1}^m log( x_j !) + \sum_{j=1}^m x_j log(p_j) \\\end{array}$

### Maximum Likelihood Estimate (MLE)

The MLE of $$(p_1, \ldots, p_m)$$ maximizes $$lik(p_1, \ldots, p_m)$$ (with $$x_1, \ldots, x_m$$ fixed!)

• Maximum achieved when differential is zero

• Constraint: $$\sum_{j=1}^m p_j =1$$

• Apply method of Lagrange multipliers

• Solution: $$\hat p_j = x_j /n$$, $$j=1, \ldots, m.$$

Note: if any $$x_j =0,$$ then $$\hat p_j=0$$ solved as limit

## Example 8.5.1.A

### Hardy-Weinberg Equilibrium

• Equilibrium frequency of genotypes: $$AA$$, $$Aa$$, and $$aa$$

• $$P(a)=\theta$$ and $$P(A)=1-\theta$$

• Equilibrium probabilities of genotypes: $$(1-\theta)^2$$, $$2(\theta)(1-\theta)$$, and $$\theta^2.$$

• Multinomial Data: $$(X_1, X_2, X_3)$$ corresponding to counts of $$AA$$, $$Aa$$, and $$aa$$ in a sample of size $$n.$$

See, e.g.

http://www.nature.com/scitable/definition/hardy-weinberg-equation-299

http://www.nature.com/scitable/definition/hardy-weinberg-equilibrium-122

Sample Data

$\begin{array}{|l|lll|}\hline Genotype& AA & Aa & aa & Total \\ Count &X_1 & X_2 & X_3 & n\\ Frequency & 342 & 500 & 187 & 1029 \\ \hline \end{array}$

### Estimation of $$\theta$$

• $$X_3 \sim Binomial( n, p=\theta^2)$$

$$\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;$$ $$\tilde \theta = (\tilde p)^{1/2} = (X_3/n)^{1/2}$$ $$=\sqrt{187/1029}=.4263.$$

• $$(X_1, X_2, X_3) \sim Multinomial(n, p=( (1-\theta)^2, 2\theta(1-\theta), \theta^2))$$

$$\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;$$ Solve for MLE

$\begin{array}{rcl} l(\theta ) &=& log( f(x_1, x_2, x_3 \mid p_1(\theta), p_2(\theta), p_3(\theta)))\\ &=& log( { n! \over x_1! x_2! x_3!} p_1(\theta)^{x_1} p_2(\theta)^{x_2} p_3(\theta)^{x_3} )\\ &=&x_1 log((1-\theta)^2) + x_2 log( 2 \theta ( 1-\theta)) + x_3 log ( \theta^2) + (\text{non-}\theta \; terms)\\ &=& (2 x_1 + x_2) log(1-\theta) + (2x_3 + x_2) log(\theta) + (\text{non-}\theta \; terms)\\ &&\\ \Longrightarrow \;\; \hat \theta & =& {2 x_3 + x_2 \over 2 x_1 + 2x_2 + 2x_3} = {2 x_3 + x_2 \over 2n} = 0.4247\\ \end{array}$

• Which estimate is better?

• Conduct Parametric Bootstrap Simulation!