Prob #1:
---------
In Equation 3, the second exponent should be outside the parenthesis,
i.e., the right-hand side of the equation should read as:
\theta_{i|y}^{\frac{x_i+1}{2}} (1-\theta_{i|y})^{\frac{1-x_i}{2}}
Also, the "empty" Equation 4 was a LaTeX error on our part :-) There's
nothing that is supposed to be there.
The answers to this problem will not depend upon the particular prior you
choose for the distribution of y. In other words, if P(y=1) = \theta_y and
P(y=-1) = 1- \theta_y, your answers will, essentially, not depend on the
prior distribution of \theta_y. In particular, you are free to choose a
uniform prior P(\theta_y) = 1, if it simplifies your math. Your answers
will *not* be penalized on this aspect.
1(a)
-----
The right-hand side of the equation should have a normalization
constant to ensure that the posterior P(\theta|D) is a valid probability
distribution, i.e., the expression \prod_{i} \prod_{y} (....) should be
preceded by a normalization constant.
1(d)
----
You may assume that -A2 is positive definite. Also, there is an
error in the statement: instead of the determinant of A2 i.e. |A2| = n
C(r) (approx.), it should be that the determinant of -A2 i.e. |-A2| =
(n^r) C(r), approximately.
1(e)
----
The d in exponent should be replaced by r, i.e., it should be
(2\pi/n)^{r/2}
Prob #2:
--------
- Please see hw4/prob2/README for some clarifications on what the split
function should return.
- Also, some people were having difficulty loading the cepstra data.
Please try the updated files and contact Ali if you still run into
problems.
- In part (a), t_0 should be 1, and t_m should be N+1.
Hint for (b):
- A d-dimensional multivariate Gaussian has PDF:
f(X; \mu, \Sigma) = {1 \over \sqrt{(2\pi)^d|\Sigma|}} \exp\left(-{1 \over 2}(X - \mu)' \Sigma^{-1} (X - \mu)\right)
- Maximum likelihood estimators for its parameters are:
\mu = {1 \over N}\sum_{i=1}^N X_i
\Sigma = {1 \over N}\sum_{i=1}^N (X_i - \mu)(X_i - \mu)'
- The number of degrees of freedom is d for mu and d(d+1)/2 for Sigma
(not one for each).
- You will need to assume that splitpoints don't occur sufficiently early
in the data (skip, say, the first and last 30 possible splitpoints). This
will mitigate small det(Sigma) problems that some of you have encountered
and also removes some serious artifiacts.
Prob #3:
--------
(e) Only run call_boosting.m for the first 30 iterations. The m-file
call_boosting.m has been updated.