Machine Learning Notes 03

 

Multivariate Linear Regression

  • Hypothesis: \( h_{\theta}(x) = \sum_{i=0}^{n}\theta_{i}x_{i} \)

\( x=\begin{bmatrix}x_{0}\\ x_{1}\\ \vdots\\ x_{n}\end{bmatrix}\in \mathbb{R}^{n+1},\quad \theta=\begin{bmatrix}\theta_{0}\\ \theta_{1}\\ \vdots\\ \theta_{n}\end{bmatrix}\in \mathbb{R}^{n+1}\)

  Also, \(h_{\theta}(x) = \theta^{T} x\)
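  A minimal NumPy sketch of the vectorized hypothesis (the feature and parameter values are made up for illustration):

    import numpy as np

    # Feature vector with x0 = 1 prepended, so x is in R^(n+1)
    x = np.array([1.0, 2104.0, 3.0])      # e.g. [x0, house size, number of bedrooms]
    theta = np.array([50.0, 0.1, 20.0])   # parameters theta_0 ... theta_n (made-up values)

    # h_theta(x) = theta^T x
    h = theta @ x                         # 50 + 0.1*2104 + 20*3 = 320.4
    print(h)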


Gradient descent

  • Algorithm:

   Repeat{
      \(\theta_{j}:=\theta_{j}-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x_{j}^{(i)}\)
      (simultaneously update \(\theta_{j}\) for every \(j=0,\dots,n\))
   }
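  A minimal NumPy sketch of this update loop (function and variable names are my own; X is the m×(n+1) design matrix with a leading column of 1s, y is the vector of targets):

    import numpy as np

    def gradient_descent(X, y, theta, alpha, num_iters):
        """Batch gradient descent: X is m x (n+1), y has length m, theta has length n+1."""
        m = len(y)
        for _ in range(num_iters):
            errors = X @ theta - y            # h_theta(x^(i)) - y^(i) for every example i
            gradient = (X.T @ errors) / m     # (1/m) * sum_i (error_i * x_j^(i)) for each j
            theta = theta - alpha * gradient  # simultaneous update of all theta_j
        return theta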


Feature Scaling

  • Idea: Make sure features are on a similar scale.
  • Mean normalization
      Replace \(x_{i}\) with \(x_{i}-\mu_{i}\) so that features have approximately zero mean (do not apply to \(x_{0}\), which we assume equals 1):
    \(x_{i}:=\frac{x_{i}-\mu_{i}}{s_{i}}\quad(i\neq 0)\)
    (\(\mu_{i}\): average value of \(x_{i}\) in the training set; \(s_{i}\): range (= max - min) or standard deviation)
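  A small sketch of mean normalization applied to the feature columns before the \(x_{0}=1\) column is added (the function name is my own; it uses the standard deviation as \(s_{i}\)):

    import numpy as np

    def mean_normalize(X):
        """Scale each feature column of X (not including x0) to a similar range."""
        mu = X.mean(axis=0)           # mu_i: average value of feature i in the training set
        s = X.std(axis=0)             # s_i: standard deviation (max - min would also work)
        return (X - mu) / s, mu, s    # keep mu and s so new inputs can be scaled the same way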

Learning Rate

  • If \(\alpha\) is too small: slow convergence;
  • If \(\alpha\) is too large: \(J(\theta)\) may not decrease on every iteration and may not converge (slow convergence is also possible).
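  In practice one tries a few values of \(\alpha\) (e.g. 0.001, 0.003, 0.01, ...) and checks that \(J(\theta)\) decreases on every iteration. A tiny self-contained sketch of that check (the data, \(\alpha\), and iteration count are made up for illustration):

    import numpy as np

    X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # design matrix, x0 = 1 column included
    y = np.array([2.0, 3.0, 4.0])
    theta = np.zeros(2)
    alpha, m = 0.1, len(y)

    prev_cost = np.inf
    for it in range(100):
        errors = X @ theta - y
        cost = np.sum(errors ** 2) / (2 * m)              # J(theta)
        if cost > prev_cost:
            print(f"J(theta) increased at iteration {it}: alpha is probably too large")
            break
        prev_cost = cost
        theta -= alpha * (X.T @ errors) / m               # gradient descent step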

Normal Equation: solve for \(\theta\) analytically

  • For the input \(X\), add a column of 1s on the left (the \(x_{0}\) feature) to form the design matrix;
  • then \(\theta = (X^{T}X)^{-1}X^{T}y\)
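  A minimal NumPy sketch of the normal equation (np.linalg.pinv is used so a non-invertible \(X^{T}X\) does not break it; the example data are made up):

    import numpy as np

    def normal_equation(X, y):
        """theta = (X^T X)^{-1} X^T y for a design matrix X that already has the column of 1s."""
        return np.linalg.pinv(X.T @ X) @ X.T @ y

    X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    y = np.array([2.0, 3.0, 4.0])
    print(normal_equation(X, y))   # approximately [1.0, 1.0], i.e. y = 1 + x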

The difference between Gradient Descent and the Normal Equation:

  • Gradient Descent: needs to choose \(\alpha\); needs many iterations; works well even when the number of features \(n\) is large.
  • Normal Equation: no need to choose \(\alpha\); no iterations; needs to compute \((X^{T}X)^{-1}\), which costs roughly \(O(n^{3})\), so it becomes slow when \(n\) is very large.



This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
