# Linear regression

## Linear regression with a single output

Suppose we have $n$ sets of training data; the $i$-th set consists of $m$ inputs $x_1^i,...,x_m^i$ and one output $y^i$. Linear regression is used to predict the output for new inputs.

### Hypothesis

Under the linear regression hypothesis, the function $h_\theta( {\bf{x}}^i)$ is represented as $h_\theta({\bf {x}}^i) = \theta _0 +\theta_1 x_1^i + ... + \theta_m x_m^i$

where ${\bf {x}}^i = [1,x_1^i,...,x_m^i]^T$

The vectorized form is ${h_\theta({\bf X})} = {\bf X} { \boldsymbol \theta}$

where ${h_\theta({\bf X})} = [ h_\theta({\bf x}^1), ... , h_\theta({\bf x}^n)]^T$, ${\boldsymbol \theta} = [\theta_0, ..., \theta_m]^T$, ${\bf X} =\left [ {\bf x}^1, ..., {\bf x}^n \right ]^T$

The value of $\boldsymbol \theta$ is determined by minimizing the cost function.
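As a quick illustration, here is a minimal numpy sketch of the vectorized hypothesis; the shapes and values (`n`, `m`, `X_raw`, `theta`) are made up for the example:

```python
import numpy as np

n, m = 5, 3                    # n samples, m input features (arbitrary)
X_raw = np.random.rand(n, m)   # raw inputs x_1^i, ..., x_m^i

# design matrix: prepend a column of ones so theta_0 acts as the intercept
X = np.hstack([np.ones((n, 1)), X_raw])   # shape (n, m+1)

theta = np.random.rand(m + 1)  # parameter vector [theta_0, ..., theta_m]
h = X @ theta                  # h_theta(X) = X theta, shape (n,)
```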

### Cost function

The mean squared error is chosen as the cost function $J(\boldsymbol \theta)$: $J(\boldsymbol \theta) = \frac{1}{2n}\sum_{i=1}^{n} (h_\theta ({\bf x}^i) - y^i)^2 = \frac{1}{2n}({h_\theta ({\bf X})} - {\bf y})^T({h_\theta ({\bf X})} - {\bf y} )$

where ${\bf {y}} = [y^1,...,y^n]^T$
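With `X`, `theta`, and `y` as above, the cost is a one-liner in numpy; this is a sketch for illustration, not part of the example program below:

```python
import numpy as np

def cost(theta, X, y):
    """Mean squared error cost J(theta) = (1/2n) * ||X theta - y||^2."""
    n = X.shape[0]
    r = X @ theta - y          # residual vector
    return (r @ r) / (2 * n)
```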

### Find $\boldsymbol \theta$

To minimize the cost function $J({\boldsymbol \theta})$, set each partial derivative of $J({\boldsymbol \theta})$ equal to 0: $\begin{matrix} &\frac{\partial J(\boldsymbol\theta)}{\partial \theta_0} = \frac{1}{n}\sum_{i=1}^{n}({h_\theta ({\bf x}^i)} - y^i ) = 0\\ &\frac{\partial J(\boldsymbol\theta)}{\partial \theta_1} = \frac{1}{n}\sum_{i=1}^{n}({h_\theta ({\bf x}^i)} - y^i )x_1^i = 0\\ &\vdots \\ &\frac{\partial J(\boldsymbol\theta)}{\partial \theta_m} = \frac{1}{n}\sum_{i=1}^{n}({h_\theta ({\bf x}^i)} - y^i )x_m^i =0\\ \end{matrix}$

Stacking these $m+1$ equations and substituting ${h_\theta({\bf X})} = {\bf X} {\boldsymbol \theta}$ gives the vectorized form $\frac{1}{n}( {\bf X}^T {\bf X} {\boldsymbol \theta} - {\bf X}^T {\bf y} ) = 0$

Solving for $\boldsymbol \theta$ (assuming ${\bf X}^T {\bf X}$ is invertible) gives ${\boldsymbol \theta} = ({\bf X}^T {\bf X})^{-1} {\bf X}^T {\bf y }$
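In code, it is usually better to solve the normal equations than to form the inverse explicitly; a sketch (the function name `fit_theta` is made up):

```python
import numpy as np

def fit_theta(X, y):
    # Solve X^T X theta = X^T y without explicitly inverting X^T X.
    # If X^T X is singular, np.linalg.pinv or np.linalg.lstsq can be
    # used instead, as in the example program below.
    return np.linalg.solve(X.T @ X, X.T @ y)
```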

## Linear regression with multiple outputs

Suppose we have $n$ sets of training data; the $i$-th set consists of $m$ inputs $x_1^i,...,x_m^i$ and $k$ outputs $y_1^i,...,y_k^i$.

Linear regression with multiple outputs is similar to linear regression with a single output.

The hypothesis function $h^j_\theta({\bf x}^i)$ for output $j =1,...,k$ is $h^j_\theta({\bf x}^i) = \theta^j _0 +\theta^j_1 x_1^i + ... + \theta^j_m x_m^i$

Each hypothesis $h^j_\theta({\bf x}^i)$ is a single-output hypothesis, and the $k$ hypotheses are independent of one another.

The vectorized form can be written as ${H_\Theta({\bf X})} = {\bf X} \boldsymbol \Theta$

where $H_\Theta({\bf X}) = [h_\theta^1({\bf X}),...,h_\theta^k({\bf X})]$ and $\boldsymbol \Theta = [\boldsymbol \theta^1,...,\boldsymbol \theta^k]$

The cost function for multiple-output linear regression decomposes into $k$ independent single-output regressions, and setting the derivatives to zero works exactly as before. The result is $\boldsymbol \Theta = ({\bf X}^T {\bf X})^{-1} {\bf X}^T {\bf Y}$

where ${\bf Y} = [{\bf y}^1,...,{\bf y}^k]$.
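Because the $k$ regressions are independent, the $j$-th column of $\boldsymbol \Theta$ equals the single-output solution for ${\bf y}^j$; a small self-contained check on synthetic (made-up) data:

```python
import numpy as np

n, m, k = 50, 3, 2
X = np.hstack([np.ones((n, 1)), np.random.rand(n, m)])  # design matrix
Y = np.random.rand(n, k)                                # k outputs per sample

Theta = np.linalg.solve(X.T @ X, X.T @ Y)         # all k outputs at once
theta0 = np.linalg.solve(X.T @ X, X.T @ Y[:, 0])  # first output alone
assert np.allclose(Theta[:, 0], theta0)
```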

## Python example of linear regression with multiple outputs

```python
import numpy as np
from sklearn import linear_model

# f and f2 are assumed to be (6, N) arrays loaded elsewhere, one column per
# sample: rows 0-2 hold the outputs and rows 3-5 the inputs
# (e.g. f = np.loadtxt('train.txt'); the file names are placeholders).

# X: training input data    y: training output data
# Note: a Python slice excludes its upper bound, so f[0:3, :] selects
# rows 0, 1, 2 and f[3:6, :] selects rows 3, 4, 5.
X = np.transpose(f[3:6, :])
y = np.transpose(f[0:3, :])

# prepend a column of ones so theta_0 acts as the intercept
XX = np.ones((X.shape[0], X.shape[1] + 1))
XX[:, 1:4] = X

# randomly pick some data for testing; here 10 samples are chosen
n_test = 10
L = np.zeros((n_test, 6))
for i in range(n_test):
    L[i, :] = f2[:, np.random.randint(low=0, high=f2.shape[1])]
X2 = L[:, 3:6]   # test input data
Y3 = L[:, 0:3]   # exact test outputs

##### Calculate with the linear regression equations #####
# add the intercept column to the test inputs as well
XX2 = np.ones((X2.shape[0], X2.shape[1] + 1))
XX2[:, 1:4] = X2

# calculate theta by the normal equations: theta = (X^T X)^{-1} X^T y
XXT = np.transpose(XX)
pinvXX = np.linalg.pinv(np.matmul(XXT, XX))
theta = np.matmul(np.matmul(pinvXX, XXT), y)

# predict with theta
y2 = np.matmul(XX2, theta)

#### Calculate with the sklearn library ####
# LinearRegression fits the intercept itself, so the raw X is passed here
lm = linear_model.LinearRegression()

# linear fit
lm.fit(X, y)

# predict the results from the test inputs
y3 = lm.predict(X2)

## Outputs
print(' Input X Centroid, Y centroid, Area')
print(X2)
print(' ')

# norm X, norm Y, intercept predicted by the normal equations
print(' predicted norm X norm, Y norm, Intercept (py)')
print(y2)
print(' ')

# norm X, norm Y, intercept predicted by sklearn
print(' predicted norm X norm, Y norm, Intercept (sklearn)')
print(y3)
print(' ')

# exact norm X, norm Y, intercept
print(' exact norm X norm, Y norm, Intercept')
print(Y3)
print(' ')
```

Sample output:

```
 Input X Centroid, Y centroid, Area
[[ 0.011728  0.461987 -0.43144 ]
 [ 0.722013  0.09947   0.094082]
 [ 0.5       0.248299 -0.023811]
 [ 0.47532   0.216514 -0.126761]
 [ 0.630344  0.114905 -0.134785]
 [ 0.974811  0.011673  0.009922]
 [ 0.560012  0.219963 -0.003007]
 [ 0.860352  0.063292  0.035395]
 [ 0.964991  0.014939 -0.01494 ]
 [ 0.762323  0.088118  0.080133]]

 predicted norm X norm, Y norm, Intercept (py)
[[ 9.34969586e-01 -1.06476457e+00  5.23932912e-01]
 [ 2.01307450e-01  2.32187975e-01 -2.38219364e-01]
 [ 5.02507675e-01 -5.87639280e-02  5.73360462e-06]
 [ 4.38181172e-01 -3.12837524e-01  2.64879459e-02]
 [ 2.32544812e-01 -3.32640210e-01 -1.39856401e-01]
 [ 2.36238249e-02  2.44868209e-02 -5.09477480e-01]
 [ 4.45161260e-01 -7.42107141e-03 -6.43885340e-02]
 [ 1.28090390e-01  8.73524518e-02 -3.86660318e-01]
 [ 3.02335578e-02 -3.68709035e-02 -4.98940392e-01]
 [ 1.78333265e-01  1.97762792e-01 -2.81472928e-01]]

 predicted norm X norm, Y norm, Intercept (sklearn)
[[ 9.34969586e-01 -1.06476457e+00  5.23932912e-01]
 [ 2.01307450e-01  2.32187975e-01 -2.38219364e-01]
 [ 5.02507675e-01 -5.87639280e-02  5.73360462e-06]
 [ 4.38181172e-01 -3.12837524e-01  2.64879459e-02]
 [ 2.32544812e-01 -3.32640210e-01 -1.39856401e-01]
 [ 2.36238249e-02  2.44868209e-02 -5.09477480e-01]
 [ 4.45161260e-01 -7.42107141e-03 -6.43885340e-02]
 [ 1.28090390e-01  8.73524518e-02 -3.86660318e-01]
 [ 3.02335578e-02 -3.68709035e-02 -4.98940392e-01]
 [ 1.78333265e-01  1.97762792e-01 -2.81472928e-01]]

 exact norm X norm, Y norm, Intercept
[[ 0.874571 -0.484898  0.58    ]
 [ 0.726715  0.686939 -0.18    ]
 [ 0.989948 -0.141429  0.      ]
 [ 0.81037  -0.585918  0.02    ]
 [ 0.664204 -0.747551 -0.1     ]
 [ 0.923384  0.383878 -0.52    ]
 [ 0.999796 -0.020204 -0.06    ]
 [ 0.931526  0.363673 -0.34    ]
 [ 0.707071 -0.707143 -0.52    ]
 [ 0.745295  0.666735 -0.22    ]]
```