This problem is to help you review the concepts of Section 7.3 we covered
in class.
Suppose we collect data (from some experiment or some survey); we plot our
data points (x_1, y_1), ..., (x_k, y_k) and suppose it turns out that they
look almost like they lie on a straight line. We want to find the line that best
fits our data points.
Ideally, we'd like to find a line y = mx + b that fits all our data
points exactly. Explain why we can write this as an equation Au = y, where u is the column vector [b m]^T,
y is the column vector
[y_1 ... y_k]^T, and A is the k by 2 matrix whose first column is all 1s and whose second column consists of x_1, ..., x_k.
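As a concrete illustration (with made-up data points, not from the book), here is how A, u, and y line up in NumPy when the points happen to lie exactly on a line:

```python
import numpy as np

# Hypothetical data points lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1

# A: first column all 1s (multiplying b), second column the x_i (multiplying m)
A = np.column_stack([np.ones_like(x), x])

# With u = [b, m]^T, row i of Au is b + m*x_i, i.e., the line evaluated at x_i
u = np.array([1.0, 2.0])  # b = 1, m = 2
assert np.allclose(A @ u, y)  # every data point satisfies the line, so Au = y
```

Each row of Au reads b + m*x_i, so Au = y says exactly that every data point satisfies y_i = m*x_i + b.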
If our equation Au = y turns out to have no solution, then we'll want
to find a line that "best" fits our data; i.e., find u such
that Au is as close as possible to y; equivalently, find a vector y' as
close as possible to y such that Au = y' has a solution. Explain
carefully why y'
= proj of y onto col(A).
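A numerical sketch (again with hypothetical data, here noisy so that Au = y has no exact solution) of computing y' = proj of y onto col(A), using the formula y' = A(A^T A)^{-1}(A^T)y, which is valid when the columns of A are linearly independent:

```python
import numpy as np

# Hypothetical data points that do NOT lie exactly on a line
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

A = np.column_stack([np.ones_like(x), x])

# y' = A (A^T A)^{-1} A^T y, assuming A has linearly independent columns
y_proj = A @ np.linalg.solve(A.T @ A, A.T @ y)

# y' lies in col(A) by construction, and y - y' is orthogonal to col(A):
assert np.allclose(A.T @ (y - y_proj), 0)
```

The final assertion is the defining property of the projection: the error y - y' is orthogonal to every column of A, which is what makes y' the closest point in col(A) to y.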
Instead of actually computing y' = proj of y onto col(A) (which is a
lot of work) and then
solving the equation Au = y', we find u by a different method. Explain
what that method is.
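The method being asked about is solving the normal equations (A^T)Au = (A^T)y (this is the content of the Least Squares Theorem proved at the end of this problem set). A sketch with hypothetical data, cross-checked against NumPy's built-in least-squares routine:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
A = np.column_stack([np.ones_like(x), x])

# Solve the normal equations (A^T)A u = (A^T)y directly -- no need to
# compute the projection y' first
u = np.linalg.solve(A.T @ A, A.T @ y)

# Cross-check against NumPy's least-squares solver
u_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(u, u_lstsq)
```

Note the saving: (A^T)A is only 2 by 2 here, so the normal equations are a small square system, whereas computing y' first requires working with vectors in R^k.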
(Optional) Explain why minimizing the distance in R^k between y and y' is
equivalent to minimizing the sum of the squares of the distances between
each of our data points (x_i, y_i) and the point directly below or above
(x_i, y_i) on the line we're looking for, where i = 1, ..., k. Draw a
diagram to accompany your explanation. Is what the book refers to as
"the least squares error" the same as ||y-y'||? Why is it
called "least squares"?
Redo Problem 7 of Section 7.3 the "long way": in the notation
of the above problem, compute y' = proj of y onto col(A), and then solve the
equation Au = y'. Did you get the same answer as in the back of the book? Draw
the two columns of A as vectors in R^3. Draw col(A) and y and y' in R^3 as
best you can.
As a review of some topics from Chapter 5, let's prove the easier
part of the Least Squares Theorem: Let A be an m by n matrix. Let y be any vector in R^m. If Ax = proj of y onto col(A),
then (A^T)Ax = (A^T)y. Proof:
There exist vectors c and d such that y = c + d where c is in col(A) and d
is in col(A)^perp. Explain why.
Show d is in null(A^T).
Show (A^T)y = (A^T)c.
Use the above to show if Ax = proj of y onto col(A), then (A^T)Ax =
(A^T)y.
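A quick numerical sanity check of the theorem's statement (not a substitute for the proof), using a hypothetical random A and y: if x satisfies the projection equation, then (A^T)Ax = (A^T)y holds:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # hypothetical 5 by 3 matrix
y = rng.standard_normal(5)

# x with Ax = proj of y onto col(A), computed via least squares
x, *_ = np.linalg.lstsq(A, y, rcond=None)

# The theorem's conclusion: (A^T)A x = (A^T) y
assert np.allclose(A.T @ A @ x, A.T @ y)
```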